TW202426060A - Engineered retrons and methods of use - Google Patents

Engineered retrons and methods of use

Info

Publication number
TW202426060A
TW202426060A TW112132199A TW112132199A TW202426060A TW 202426060 A TW202426060 A TW 202426060A TW 112132199 A TW112132199 A TW 112132199A TW 112132199 A TW112132199 A TW 112132199A TW 202426060 A TW202426060 A TW 202426060A
Authority
TW
Taiwan
Prior art keywords
ncrna
retrotranscript
sequence
engineered
nucleic acid
Prior art date
Application number
TW112132199A
Other languages
Chinese (zh)
Inventor
亞凜 拉達
維拉迪米爾 普雷斯尼艾
茵娜 施雪巴科瓦
布萊恩 古曼
馬瑞奧 羅德里圭茲 梅斯崔
權 慶林 傑弗
索奇塔 萊
狄文 史考特 昆蘭
穆瑟薩米 賈亞拉曼
史蒂芬 斯克里
雅各 萊爾
亞布里爾 弗雷塔斯 貝坦斯
衣仲夏
Original Assignee
美商雷納嘉德醫療管理公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/087,673 (US11866728B2)
Priority claimed from PCT/US2023/061038 (WO2023141602A2)
Priority claimed from PCT/US2023/072872 (WO2024044723A1)
Application filed by 美商雷納嘉德醫療管理公司
Publication of TW202426060A

Landscapes

  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Disclosed are engineered retrons and methods of use, such as to modify the genome of a host (e.g., mammalian) cell by delivering the engineered retron or the encoded ncRNA in vitro or in vivo to the host (e.g., mammalian) cell.

Description

Engineered retrons and methods of use

The present disclosure generally relates to systems, methods, and compositions for precise genome editing, including nucleic acid insertion, replacement, and deletion at targeted and precise genomic sites, wherein the systems, methods, and compositions are based on novel and/or engineered retrons.

Precise genome editing by programmable nucleases (e.g., RNA-guided nucleases such as CRISPR nucleases, zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs)) generally relies on homology-directed repair (HDR) and on the presence of a donor DNA template at the site of a double-strand break (DSB) induced by the programmable nuclease. It is generally believed that a limiting step of HDR-dependent precise genome editing is the delivery of the donor DNA template to the nuclease-induced DSB (see, e.g., Ling et al., “Improving the efficiency of precise genome editing with site-specific Cas9-oligonucleotide conjugates,” Science Advances, 2020, Vol. 6, No. 15, pp. 1-8). A variety of approaches intended to enhance the efficiency of HDR-dependent editing have been reported, many of which involve physically tethering the DNA donor to components of the precise editing system. Exemplary methods are discussed in: K. Lee et al., “Synthetically modified guide RNA and donor DNA are a versatile platform for CRISPR-Cas9 engineering,” eLife 6, e25312 (2017); J. Carlson-Stevermer et al., “Assembly of CRISPR ribonucleoproteins with biotinylated oligonucleotides via an RNA aptamer for precise gene editing,” Nat. Commun. 8, 1711 (2017); N. Savic et al., “Covalent linkage of the DNA repair template to the CRISPR-Cas9 nuclease enhances homology-directed repair,” eLife 7, e33761 (2018); and E. J. Aird et al., “Increasing Cas9-mediated homology-directed repair efficiency through covalent tethering of DNA repair template,” Commun. Biol. 1, 54 (2018), each of which is incorporated herein by reference. Despite these efforts, the efficiency of HDR-dependent precise editing remains unsatisfactory.

Retrons are defined by their unique ability to generate an unusual satellite DNA known as msDNA (multicopy single-stranded DNA). The DNA encoding a retron includes a reverse transcriptase (RT)-encoding gene (ret) and a nucleic acid sequence encoding a non-coding RNA (ncRNA) that contains two contiguous, inverted non-coding sequences known as msr and msd. The ret gene and the non-coding RNA (including msr and msd) are transcribed as a single RNA transcript, which folds into a specific secondary structure after post-transcriptional processing. Once translated, the RT binds the RNA template downstream of the msd locus and initiates reverse transcription of the RNA toward its 5' end, assisted by the 2'-OH group of a conserved branching guanosine residue that acts as a primer. Reverse transcription stops before reaching the msr locus, and the resulting DNA (msDNA) remains attached to the RNA template through a covalent 2'-5' phosphodiester bond and through base pairing between the 3' ends of the msDNA and the RNA template. The external regions at the 5' and 3' ends of the msd/msr transcript (a1 and a2, respectively) are complementary and can hybridize, leaving the structures located in the msr and msd regions in internal positions (see Figure 1A). The msr locus, which is not reverse transcribed, forms one to three short stem-loops of variable size (ranging from 3 to 10 base pairs), whereas the msd locus folds into a single or double long hairpin with a highly variable stem 10-50 bp in length, which is also present in the final msDNA form.
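The complementarity of the a1 and a2 end regions described above can be illustrated with a minimal Python sketch. The helper names and the toy sequence are hypothetical and are not taken from the disclosure; the check also assumes strict Watson-Crick pairing and ignores G-U wobble pairs and bulges, which real retron ncRNAs may contain.

```python
# Minimal sketch: test whether two RNA end regions (a1 at the 5' end, a2 at the
# 3' end) could form a perfect antiparallel duplex.
# Assumption: Watson-Crick pairs only (A-U, G-C); wobble pairs are ignored.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def a1_a2_can_pair(a1: str, a2: str) -> bool:
    """True if a2 is the exact reverse complement of a1."""
    return reverse_complement_rna(a1) == a2.upper()


if __name__ == "__main__":
    a1 = "GGCAUCA"                      # toy sequence, purely illustrative
    a2 = reverse_complement_rna(a1)     # the partner that pairs perfectly with a1
    print(a1, a2, a1_a2_can_pair(a1, a2))
```

A structure-prediction tool would be needed to model the msr and msd stem-loops themselves; this sketch only covers the simple end-pairing condition.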

It has recently been reported that retrons can be used as a means of providing donor DNA templates for HDR-dependent genome editing (see, e.g., Lopez et al., “Precise genome editing across kingdoms of life using retron-derived DNA,” Nature Chemical Biology, Dec. 12, 2021, 18, pp. 199-206 (2022)); however, generating sufficient levels of donor DNA template within cells to fully support efficient HDR-dependent editing remains a major challenge. Improved retron-based genome modification systems are highly desirable in the art.

In one aspect, the present disclosure provides recombinant retrons comprising one or more genetic modifications that improve the function and/or properties of the retron. Such genetic modifications may include mutations, insertions, deletions, inversions, replacements, substitutions, or translocations of one or more contiguous or non-contiguous nucleobases in a nucleic acid molecule encoding a retron or a retron component (such as an ncRNA or a reverse transcriptase). In various aspects, the retron that is modified with the one or more genetic modifications (i.e., the “pre-modified” or “unmodified” retron or retron component) is a naturally occurring retron or retron component (e.g., a naturally occurring ncRNA or RT of Table A) that is capable of promoting homology-dependent recombination (or HDR) in a cell, thereby resulting in a relative increase in the concentration or amount of msDNA comprising a DNA donor template. In certain embodiments, the recombinant retron is based on and/or derived from a naturally occurring retron, such as any retron-related sequence provided in Table X (i.e., one or more genetic modifications are introduced into the collection of 7257 previously unknown retrons discovered by the computational methods described herein (see, e.g., the Examples)). In other embodiments, the recombinant retron is based on introducing one or more genetic modifications into a previously available retron sequence (e.g., Mestre et al., “Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems,” Nucleic Acids Research, Vol. 48, No. 22, Dec. 16, 2020, pp. 12632-12647, incorporated herein by reference) to obtain recombinant retrons having an enhanced ability to produce increased concentrations or amounts of msDNA comprising a DNA donor template.

In another aspect, the present disclosure further provides nucleic acid molecules encoding recombinant retrons and/or recombinant retron components (e.g., recombinant ncRNA and/or recombinant retron RT). In yet another aspect, the present disclosure provides genome editing systems comprising a recombinant retron component (e.g., recombinant ncRNA and/or recombinant RT), a programmable nuclease (e.g., an RNA-guided nuclease such as a CRISPR-Cas protein, or a ZFP or TALEN), and a guide RNA (where an RNA-guided nuclease is used in the genome editing system). In another aspect, the present disclosure provides nucleic acid molecules encoding the genome editing systems and their components, as well as polypeptides constituting components of the genome editing systems. In another aspect, the present disclosure provides vectors for transferring and/or expressing the genome editing systems, e.g., under in vitro, ex vivo, and in vivo conditions. In another aspect, the present disclosure provides cell delivery compositions and methods, including compositions for passive and/or active transport into cells (e.g., plasmids), delivery by virus-based recombinant vectors (e.g., AAV and/or lentiviral vectors), delivery by non-virus-based systems (e.g., liposomes and LNPs), and delivery by virus-like particles. Depending on the delivery system employed, the retron-based genome editing systems described herein can be delivered in the form of DNA (e.g., plasmids or DNA-based viral vectors), RNA (e.g., ncRNA and mRNA delivered by LNPs), mixtures of DNA and RNA, proteins (e.g., virus-like particles), and ribonucleoprotein (RNP) complexes. Any suitable combination of methods for delivering the components of the retron-based genome editing systems disclosed herein can be employed. In one embodiment, each component of the retron-based genome editing system is delivered by an all-RNA system, e.g., one or more RNA molecules (e.g., mRNA and/or ncRNA) delivered by one or more LNPs, wherein the one or more RNA molecules form the ncRNA and the guide RNA (as needed) and/or are translated into the polypeptide components (e.g., the RT and the programmable nuclease). In another aspect, the present disclosure provides methods of genome editing by introducing the retron-based genome editing systems described herein into a cell comprising a target edit site (e.g., under in vitro, in vivo, or ex vivo conditions), thereby resulting in an edit at the target edit site. In other aspects, the disclosure provides formulations comprising any of the foregoing components for delivery to cells and/or tissues, including in vitro, in vivo, and ex vivo delivery; recombinant cells and/or tissues modified by the recombinant retron-based genome modification systems and methods described herein; and methods of modifying cells by performing genome editing and related DNA donor-dependent methods (such as recombineering or cell recording) using the retron-based genome modification systems disclosed herein. The present disclosure also provides methods of making the recombinant retrons, retron-based genome modification systems, vectors, compositions, and formulations described herein, as well as pharmaceutical compositions and kits for modifying cells under in vitro, in vivo, and ex vivo conditions, which comprise the genome editing and/or modification systems disclosed herein.

In one embodiment, the present disclosure or the invention herein provides a gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo; the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) a retron reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease; and each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c); whereby one delivery vehicle or more than one delivery vehicle delivers (a)(i), (a)(ii), (b), and (c).
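The coverage condition in this embodiment (each vehicle carries some subset of the cargo, and the vehicles together deliver all four components) can be expressed as a small Python sketch. The labels "a_i", "a_ii", "b", and "c" and the function name are hypothetical shorthand chosen here for illustration; they stand for the nuclease mRNA, the reverse transcriptase mRNA, the engineered ncRNA, and the guide RNA.

```python
from typing import Iterable, Set

# Hypothetical labels for the four RNA cargo components named in the embodiment.
REQUIRED = frozenset({"a_i", "a_ii", "b", "c"})


def missing_components(vehicles: Iterable[Set[str]]) -> Set[str]:
    """Return the components that no delivery vehicle carries.

    Each vehicle is modeled as the set of component labels it contains.
    An empty result means the vehicles jointly deliver the full cargo.
    """
    delivered: Set[str] = set()
    for cargo in vehicles:
        if not cargo:
            raise ValueError("each delivery vehicle must carry at least one component")
        unknown = set(cargo) - REQUIRED
        if unknown:
            raise ValueError(f"unknown component labels: {sorted(unknown)}")
        delivered |= set(cargo)
    return set(REQUIRED - delivered)


if __name__ == "__main__":
    # One vehicle carrying everything, or two vehicles splitting the cargo,
    # both satisfy the condition; dropping the guide RNA does not.
    print(missing_components([{"a_i", "a_ii", "b", "c"}]))
    print(missing_components([{"a_i", "a_ii"}, {"b", "c"}]))
    print(missing_components([{"a_i", "a_ii"}, {"b"}]))
```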

In one embodiment, in the gene editing system, (a)(i) and (a)(ii) comprise a single mRNA molecule encoding the nucleic acid programmable nuclease and the retron reverse transcriptase.

In one embodiment, in the gene editing system, (a)(i) and (a)(ii) are encoded and expressed as a fusion protein.

In one embodiment, in the gene editing system, (a)(i) and (a)(ii) are encoded and expressed as a fusion protein, and the fusion protein comprises the C-terminus of the nucleic acid programmable nuclease fused to the N-terminus of the retron reverse transcriptase (nuclease:RT fusion); or the fusion protein comprises the N-terminus of the nucleic acid programmable nuclease fused to the C-terminus of the retron reverse transcriptase (RT:nuclease fusion).

In one embodiment, in the gene editing system, (a)(i) and (a)(ii) comprise a first mRNA molecule encoding the nucleic acid programmable nuclease and a second mRNA molecule encoding the retron reverse transcriptase.

In one embodiment, in the gene editing system, (c) is provided separately from (a)(i), (a)(ii), and (b), or in trans.

In one embodiment, in the gene editing system, (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis.

In one embodiment, in the gene editing system, (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis, and the guide RNA is fused to the 5' end of the retron ncRNA.

In one embodiment, in the gene editing system, (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis, and the guide RNA is fused to the 3' end of the retron ncRNA.

In one embodiment, in the gene editing system, (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis, and the engineered ncRNA comprises a first guide RNA fused to the 5' end of the retron ncRNA and a second guide RNA fused to the 3' end of the retron ncRNA, and the first and second guide RNAs target different sequences. Thus, more broadly, in one embodiment, in the gene editing system, (c) the guide RNA for the programmable nuclease may comprise one or more guides targeting the same or different target sequences. In one embodiment, such a guide RNA may be a single guide RNA (sgRNA), for example when the nucleic acid programmable nuclease comprises Cas9.
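The cis arrangements described above (a guide RNA fused to the 5' end of the ncRNA, to the 3' end, or one at each end) can be sketched as an ordered list of segments. This is only an illustrative model: the function names and toy sequences are hypothetical, and any linker or processing element that a real construct might need between segments is deliberately omitted.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, str]  # (label, RNA sequence)


def cis_layout(ncrna: str,
               guide_5p: Optional[str] = None,
               guide_3p: Optional[str] = None) -> List[Segment]:
    """Return the 5'-to-3' order of segments for an ncRNA with optional fused guides."""
    segments: List[Segment] = []
    if guide_5p:
        segments.append(("guide_5p", guide_5p))
    segments.append(("ncRNA", ncrna))
    if guide_3p:
        segments.append(("guide_3p", guide_3p))
    return segments


def to_sequence(segments: List[Segment]) -> str:
    """Concatenate the segments into one contiguous RNA sequence."""
    return "".join(seq for _, seq in segments)


if __name__ == "__main__":
    # Toy sequences: two guides targeting different sequences, one at each end.
    layout = cis_layout("GGCAUCAAAAUGAUGCC", guide_5p="ACGUACGU", guide_3p="UUGGCCAA")
    print([label for label, _ in layout])
    print(to_sequence(layout))
```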

In one embodiment, in the gene editing system, the one or more delivery vehicles comprise liposomes or lipid nanoparticles (LNPs).

In one embodiment, in the gene editing system, (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in the same delivery vehicle.

In one embodiment, in the gene editing system, (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in separate delivery vehicles.

In one embodiment, in the gene editing system, the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in the same delivery vehicle.

In one embodiment, in the gene editing system, the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in different delivery vehicles.

In one embodiment, in the gene editing system, the engineered retron ncRNA includes a sequence of interest encoding a donor polynucleotide, the donor polynucleotide comprising an intended edit to be integrated at a target sequence in a cell, wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence. In one embodiment, the donor polynucleotide may be heterologous to the cell. In one embodiment, the donor polynucleotide may be endogenous to the cell; for example, the cell may contain a sequence characteristic of those in a population having a disease state, and the donor polynucleotide may be a sequence characteristic of those in a population not having the disease state (e.g., the donor may be used for gene correction or repair of the cell, modifying the cell from having a mutation or modification that causes the disease state to having a sequence characteristic of the absence of the disease state). This may be done in animal cells or mammalian cells (e.g., of a primate, a non-human primate, or a domesticated mammal such as a cat, dog, or horse) or human cells, for example to correct, address, treat, or alleviate a genetic disorder in an animal, mammal, domesticated mammal, cat, dog, horse, or human. This may also be done in plant cells to introduce mutations that produce advantageous phenotypic traits, such as disease resistance or other advantageous plant traits.
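As a hedged illustration of a donor flanked by 5' and 3' homology arms, the following Python sketch builds a donor for a simple insertion at a cut position. It is a simplification under stated assumptions: the function name, arm length, and toy sequences are hypothetical, the arms are taken from the same strand as the target, and strand orientation, replacement edits, and arm-length optimization are not modeled.

```python
def build_insertion_donor(target: str, cut_index: int, insert: str, arm_len: int) -> str:
    """Build a donor sequence: 5' homology arm + intended insertion + 3' homology arm.

    target    -- genomic sequence around the edit site (toy example here)
    cut_index -- position of the double-strand break; the cut falls between
                 target[cut_index - 1] and target[cut_index]
    insert    -- sequence to be inserted at the cut
    arm_len   -- length of each homology arm (an illustrative parameter)
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut_index < arm_len or len(target) - cut_index < arm_len:
        raise ValueError("target is too short for the requested arm length")
    arm_5p = target[cut_index - arm_len:cut_index]   # matches sequence 5' of the cut
    arm_3p = target[cut_index:cut_index + arm_len]   # matches sequence 3' of the cut
    return arm_5p + insert + arm_3p


if __name__ == "__main__":
    donor = build_insertion_donor("AAAACCCCGGGGTTTT", cut_index=8, insert="TAG", arm_len=4)
    print(donor)
```

In a retron-based system the donor would be encoded within the msd region of the ncRNA and reverse transcribed into RT-DNA, so the encoded strand would need to be chosen accordingly; that step is outside this sketch.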

In one embodiment, in the gene editing system, the nucleic acid programmable nuclease comprises a Cas9 nuclease, a TnpB nuclease, or a Cas12a nuclease.

In one embodiment, in the gene editing system, the engineered retron ncRNA comprises: A) a pre-msr sequence having a first complementary region of the retron ncRNA; B) an msr sequence including an msr stem-loop structure; C) an msd sequence including an msd stem-loop structure and a sequence of interest, wherein the msd sequence templates a single-stranded DNA product (RT-DNA) in the presence of the retron reverse transcriptase; and D) a post-msd sequence having a second complementary region, wherein the first and second complementary regions form the a1/a2 duplex region of the retron ncRNA, wherein the msr stem-loop structure, the msd stem-loop structure, or the a1/a2 duplex comprises a modification that results in increased editing efficiency in the presence of the nucleic acid programmable nuclease associated with one or more guide RNAs, and wherein, optionally, the one or more guide RNAs of (c) are coupled to the pre-msr sequence, the post-msd sequence, or both the pre-msr sequence and the post-msd sequence. In such embodiments in which the engineered retron ncRNA comprises A), B), C), and D), the sequence of interest may encode a donor polynucleotide comprising an intended edit to be integrated at a target sequence of a cell, wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence. In such embodiments in which the engineered retron ncRNA comprises A), B), C), and D) (and either the sequence of interest encoding the donor polynucleotide or simply the sequence of interest), the ncRNA has a nucleotide sequence of Table B, or a nucleotide sequence having at least 75%, 80%, 85%, 90%, 95%, 99%, or 100% sequence identity to a sequence of Table B. The donor polynucleotide may be heterologous to the cell. Alternatively, the donor polynucleotide may be endogenous to the cell. For example, the cell may contain a sequence characteristic of those in a population having a disease state, and the donor polynucleotide may be a sequence characteristic of those in a population not having the disease state (e.g., the donor may be used for gene correction or repair of the cell, modifying the cell from having a mutation or modification that causes the disease state to having a sequence characteristic of the absence of the disease state).
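The sequence identity thresholds recited above (at least 75%, 80%, 85%, 90%, 95%, 99%, or 100%) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it uses alignment length as the denominator; other definitions of percent identity exist, and the disclosure may rely on a different one. The function names and toy sequences are hypothetical.

```python
from typing import Optional, Sequence


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumptions: both strings have equal length, '-' marks a gap, and the
    denominator is the full alignment length (gap columns count as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def highest_threshold_met(
    identity: float,
    thresholds: Sequence[int] = (75, 80, 85, 90, 95, 99, 100),
) -> Optional[int]:
    """Return the largest recited threshold satisfied by `identity`, or None."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Toy 10-nt alignment with a single mismatch at the last position.
    identity = percent_identity("ACGUACGUAC", "ACGUACGUAG")
    print(identity, highest_threshold_met(identity))
```

For real sequences, a dedicated alignment tool would be needed to produce the alignment before this calculation applies.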

In one embodiment of the gene editing system, the gene editing system may comprise any combination of the foregoing embodiments of the gene editing system.

In one embodiment, the present disclosure or the invention herein provides a cell, such as an isolated cell, comprising a gene editing system disclosed herein (such as in any of the preceding paragraphs). In one embodiment, the cell (e.g., the isolated cell) may be a eukaryotic cell. In one embodiment, the eukaryotic cell may be a plant cell, an animal cell, or a mammalian cell, e.g., an isolated plant cell, an isolated animal cell, or an isolated mammalian cell. In one embodiment, the mammalian cell (e.g., the isolated mammalian cell) may be a human cell. In one embodiment, the cell may be a prokaryotic cell, e.g., a bacterial cell. In such embodiments in which the cell is a bacterial cell, the donor polynucleotide may encode antibiotic sensitivity; thus, the invention may relate to a means of addressing antibiotic-resistant bacteria by rendering them sensitive to an antibiotic (and an individual administered the gene editing system may then also receive the antibiotic to which the gene editing system has rendered the bacteria sensitive).

In one embodiment, the present disclosure or the invention herein provides a composition comprising: a) a gene editing system disclosed herein (such as in any of the preceding paragraphs); and b) a pharmaceutically or veterinarily acceptable carrier. In one embodiment, in the composition, the delivery vehicle may comprise a lipid nanoparticle comprising: a) one or more ionizable lipids; b) one or more structural lipids; c) one or more PEGylated lipids; and d) one or more phospholipids. In one embodiment, in the composition, the one or more ionizable lipids comprise the ionizable lipids set forth in Table 2.

In one embodiment, the present disclosure or the invention herein provides uses of the gene editing system embodiments and/or compositions disclosed herein (such as in any of the preceding paragraphs); for example, use in modifying or genetically modifying cells in vivo, in vitro, or ex vivo, e.g., eukaryotic or prokaryotic cells and/or animal cells and/or mammalian cells and/or human cells and/or bacterial cells and/or plant cells (e.g., any cell discussed herein, including where the cell is an isolated cell). In one embodiment, the present disclosure or the invention herein provides uses of the gene editing system embodiments and/or compositions disclosed herein (such as in any of the preceding paragraphs); for example, use in treating or addressing a genetic disorder in an individual.

In one embodiment, the present disclosure or the invention herein provides methods of genetically modifying a cell, the methods comprising contacting the cell with a gene editing system as discussed herein (such as in any of the preceding paragraphs), or with a composition as discussed herein (such as in any of the preceding paragraphs) that comprises such a gene editing system, advantageously a gene editing system that includes a sequence of interest encoding a donor polynucleotide comprising an intended edit to be integrated at a target sequence in the cell. The method comprises contacting the composition or the gene editing system with the cell, thereby delivering the RNA cargo to the cell, wherein: the nucleic acid programmable nuclease forms a complex with the guide RNA, and the guide RNA directs the complex to the target sequence; the nucleic acid programmable nuclease creates a double-strand break in the target sequence; the retron reverse transcriptase and the engineered retron ncRNA generate RT-DNA comprising the donor polynucleotide; and the donor polynucleotide is integrated at the target sequence; whereby the edited cell is genetically modified. In one embodiment, the cell may be a eukaryotic cell, a prokaryotic cell, an animal cell, a mammalian cell, a human cell, a bacterial cell, or a plant cell.

Exemplary and non-limiting aspects and embodiments of the present disclosure are summarized below in the form of numbered paragraphs.

1.    一種包含一或多種遞送媒劑之基因編輯系統,其中: 該(等)遞送媒劑包含RNA貨物, 該RNA貨物包含(a)至少一種編碼(i)核酸可程式化核酸酶及(ii)逆轉錄子逆轉錄酶之mRNA分子,(b)經工程改造之逆轉錄子ncRNA,及(c)用於核酸可程式化核酸酶之引導RNA, 每種遞送媒劑含有(a)(i)及/或(a)(ii)及/或(b)及/或(c), 藉此一種遞送媒劑或超過一種遞送媒劑遞送(a)(i)、(a)(ii)、(b)及(c)。 1. A gene editing system comprising one or more delivery vehicles, wherein: The delivery vehicle(s) comprises an RNA cargo, The RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) a retrotranscriptase, (b) an engineered retrotranscript ncRNA, and (c) a guide RNA for the nucleic acid programmable nuclease, Each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c), and (a)(i), (a)(ii), (b) and (c) are delivered by one or more delivery vehicles.

2.    段落1之基因編輯系統, 其中該經工程改造之逆轉錄子ncRNA包含取代至逆轉錄子ncRNA中之HDR核苷酸序列; 其中該逆轉錄子逆轉錄酶具有與表A之逆轉錄子逆轉錄酶包含至少90%序列一致性之胺基酸序列; 其中該逆轉錄子ncRNA與表B之逆轉錄子ncRNA具有約85%至98%序列一致性。 2.    The gene editing system of paragraph 1, wherein the engineered retrovirus ncRNA comprises an HDR nucleotide sequence substituted into the retrovirus ncRNA; wherein the retrovirus reverse transcriptase has an amino acid sequence having at least 90% sequence identity with the retrovirus reverse transcriptase of Table A; wherein the retrovirus ncRNA has about 85% to 98% sequence identity with the retrovirus ncRNA of Table B.

3. The gene editing system of paragraph 2, wherein the retron ncRNA and the retron reverse transcriptase are from the same clade.

4. The gene editing system of paragraph 2, wherein the retron ncRNA nucleotide sequence has about 85% to 98% sequence identity to SEQ ID NO:15327, and the retron reverse transcriptase has at least 90% sequence identity to a type I-C retron reverse transcriptase.

5. The gene editing system of paragraph 4, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO:1262.

6. The gene editing system of paragraph 4, wherein the retron ncRNA nucleotide sequence has about 85% to 98% sequence identity to SEQ ID NO:16411, and the retron reverse transcriptase has at least 90% sequence identity to a type III retron reverse transcriptase.

7. The gene editing system of paragraph 6, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO:2781.

8. The gene editing system of paragraph 6, wherein the retron ncRNA nucleotide sequence has about 85% to 98% sequence identity to SEQ ID NO:18731, and the retron reverse transcriptase has at least 90% sequence identity to a type XIII retron reverse transcriptase.

9. The gene editing system of paragraph 8, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO:6342.

10. The gene editing system of paragraph 1, wherein the retron reverse transcriptase comprises at least one amino acid substitution that increases processivity and/or fidelity.

11. The gene editing system of paragraph 10, wherein the retron reverse transcriptase comprises an amino acid substitution at an amino acid residue corresponding to one of the following amino acid residues in Eco1 RT: Q190, E302 or T306.

12. The gene editing system of paragraph 10, wherein the retron reverse transcriptase comprises an amino acid substitution corresponding to one of the following amino acid substitutions in Eco1 RT: Q190F, E302R or T306K.
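The substitution notation used in paragraphs 11 and 12 (wild-type residue, 1-based position, new residue, e.g. Q190F) can be applied mechanically, as in the sketch below. The function name and the five-residue toy sequence are hypothetical. The real positions refer to Eco1 RT numbering, so applying them to a different retron RT would first require mapping "corresponding" residues through an alignment, which this sketch does not attempt.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein, substitution):
    """Apply a substitution such as 'Q190F' to a protein sequence (1-based positions)."""
    match = _SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"not a valid substitution: {substitution!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + replacement + protein[position:]


# Toy five-residue sequence (not Eco1 RT): glutamine at position 3 becomes phenylalanine.
toy_variant = apply_substitution("MKQLE", "Q3F")
```

Checking the wild-type residue before substituting catches numbering mistakes, which are easy to make when positions are carried over from a reference enzyme.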

13. A gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo; the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) an engineered retron reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease; each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c); whereby one delivery vehicle or more than one delivery vehicle delivers (a)(i), (a)(ii), (b) and (c); and wherein the engineered retron reverse transcriptase comprises a processivity-enhancing domain or a fidelity-enhancing domain.

14. The gene editing system of paragraph 13, wherein the processivity-enhancing domain comprises Sso7d or Sac7d.

15. The gene editing system of paragraph 13, wherein the fidelity-enhancing domain comprises a 3' to 5' exonuclease domain.

16. The gene editing system of paragraph 15, wherein the exonuclease domain comprises POLE1, POLD1, POLG, Pfu or KOD.

17. The gene editing system of paragraph 13, wherein the engineered retron ncRNA comprises an HDR nucleotide sequence substituted into a retron ncRNA; wherein the retron reverse transcriptase has an amino acid sequence with at least 90% sequence identity to a retron reverse transcriptase of Table A; and wherein the retron ncRNA has about 85% to 98% sequence identity to a retron ncRNA of Table B.

18. The gene editing system of paragraph 13, wherein the retron ncRNA and the retron reverse transcriptase are from the same clade.

19. A gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo; the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) an engineered reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease; each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c); whereby one delivery vehicle or more than one delivery vehicle delivers (a)(i), (a)(ii), (b) and (c); and wherein the engineered reverse transcriptase comprises a Y region domain from the retron RT that corresponds to the engineered retron ncRNA.

20. The gene editing system of paragraph 19, wherein the engineered reverse transcriptase is a chimera comprising MMLV RT fused to the Y region of the retron RT.

21. The gene editing system of paragraph 1, wherein (a)(i) and (a)(ii) comprise a single mRNA molecule encoding the nucleic acid programmable nuclease and the retron reverse transcriptase.

22. The gene editing system of paragraph 21, wherein (a)(i) and (a)(ii) are encoded and expressed as a fusion protein.

23. The gene editing system of paragraph 22, wherein the fusion protein comprises the C-terminus of the nucleic acid programmable nuclease fused to the N-terminus of the retron reverse transcriptase (nuclease:RT fusion).

24. The gene editing system of paragraph 22, wherein the fusion protein comprises the N-terminus of the nucleic acid programmable nuclease fused to the C-terminus of the retron reverse transcriptase (RT:nuclease fusion).

25. The gene editing system of paragraph 1, wherein (a)(i) and (a)(ii) comprise a first mRNA molecule encoding the nucleic acid programmable nuclease and a second mRNA molecule encoding the retron reverse transcriptase.

26. The gene editing system of paragraph 1, wherein (c) is provided separately from, or in trans to, (a)(i), (a)(ii) and (b).

27. The gene editing system of paragraph 1, wherein (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis.

28. The gene editing system of paragraph 27, wherein the guide RNA is fused to the 5' end of the retron ncRNA.

29. The gene editing system of paragraph 27, wherein the guide RNA is fused to the 3' end of the retron ncRNA.

30. The gene editing system of paragraph 27, wherein the engineered ncRNA comprises a first guide RNA fused to the 5' end of the retron ncRNA and a second guide RNA fused to the 3' end of the retron ncRNA, and the first and second guide RNAs target different sequences.
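The cis arrangements of paragraphs 28 to 30 (guide RNA fused at the 5' end, at the 3' end, or at both ends of the retron ncRNA) amount to ordered concatenation of RNA segments. The following is a minimal sketch with a hypothetical function name, an optional linker and toy sequences. Comparing the two guide strings is only a crude stand-in for "target different sequences", since real guides share a scaffold and differ in their spacer.

```python
def fuse_guides(ncrna, guide_5p=None, guide_3p=None, linker=""):
    """Return a 5'-to-3' RNA string with optional guides fused to either end of the ncRNA."""
    if guide_5p is None and guide_3p is None:
        raise ValueError("a cis construct needs at least one guide RNA")
    if guide_5p is not None and guide_5p == guide_3p:
        raise ValueError("a dual-guide construct is expected to carry two different guides")
    parts = []
    if guide_5p is not None:
        parts += [guide_5p, linker]   # guide sits upstream of the ncRNA
    parts.append(ncrna)
    if guide_3p is not None:
        parts += [linker, guide_3p]   # guide sits downstream of the ncRNA
    return "".join(parts)


# Toy sequences only; real ncRNA and guide sequences are far longer.
five_prime_fusion = fuse_guides("AUGCAUGC", guide_5p="GGGG")
three_prime_fusion = fuse_guides("AUGCAUGC", guide_3p="CCCC")
dual_fusion = fuse_guides("AUGCAUGC", guide_5p="GGGG", guide_3p="CCCC", linker="AA")
```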

31. The gene editing system of paragraph 1, wherein the one or more delivery vehicles comprise liposomes or lipid nanoparticles (LNPs).

32. The gene editing system of paragraph 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in the same delivery vehicle.

33. The gene editing system of paragraph 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in separate delivery vehicles.

34. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in the same delivery vehicle.

35. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in different delivery vehicles.

36. The gene editing system of paragraph 1, wherein the engineered retron ncRNA comprises a sequence of interest encoding a donor polynucleotide, the donor polynucleotide comprising an intended edit to be integrated at a target sequence in a cell, and wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence.
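The donor layout in paragraph 36 (an intended edit flanked by a 5' homology arm and a 3' homology arm) can be sketched as simple string assembly around a cut position. The function name, arm length and toy sequence are hypothetical. The sketch writes the donor on the same strand as the target for readability, so whether the arms or their complements are the strands that physically hybridize depends on strand choices it does not model.

```python
def build_donor(target, cut, edit, arm_len):
    """Assemble 5' arm + edit + 3' arm around a 0-based cut position in the target."""
    if arm_len < 1:
        raise ValueError("homology arms must be at least one nucleotide long")
    if cut - arm_len < 0 or cut + arm_len > len(target):
        raise ValueError("homology arms would extend past the target sequence")
    arm_5p = target[cut - arm_len:cut]   # sequence immediately 5' of the cut
    arm_3p = target[cut:cut + arm_len]   # sequence immediately 3' of the cut
    return arm_5p + edit + arm_3p


# Toy example: insert "TAG" at position 8 with 4-nt arms (real arms are much longer).
toy_donor = build_donor("AAAACCCCGGGGTTTT", cut=8, edit="TAG", arm_len=4)
```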

37.  段落1之基因編輯系統,其中該核酸可程式化核酸酶包含Cas9核酸酶、TnpB核酸酶或Cas12a核酸酶。37. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease comprises Cas9 nuclease, TnpB nuclease or Cas12a nuclease.

38.  段落1之基因編輯系統,其中該核酸可程式化核酸酶包含Cas9核酸酶。38. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease comprises a Cas9 nuclease.

39.  段落1之基因編輯系統,其中該核酸可程式化核酸酶包含Cas9切口酶。39. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease comprises a Cas9 nickase.

40.  一種經分離之細胞,其包含段落1之基因編輯系統。40. An isolated cell comprising the gene editing system of paragraph 1.

41.  段落40之經分離之細胞,其中該經分離之細胞為哺乳動物細胞。41. The isolated cell of paragraph 40, wherein the isolated cell is a mammalian cell.

42.  段落41之經分離之細胞,其中該哺乳動物細胞為人類細胞。42. The isolated cell of paragraph 41, wherein the mammalian cell is a human cell.

43.  一種組合物,其包含: a)  段落1之基因編輯系統;及 b)  醫藥學上或獸醫學上可接受之載劑。 43. A composition comprising: a) the gene editing system of paragraph 1; and b) a pharmaceutically or veterinarily acceptable carrier.

44. The composition of paragraph 43, wherein the delivery vehicle is a lipid nanoparticle comprising: a) one or more ionizable lipids; b) one or more structural lipids; c) one or more PEGylated lipids; and d) one or more phospholipids.

45. The composition of paragraph 44, wherein the one or more ionizable lipids comprise an ionizable lipid set forth in Table 2.

46. A method of genetically modifying a cell, the method comprising: contacting the cell with the gene editing system of paragraph 1, thereby delivering the RNA cargo to the cell, wherein: the nucleic acid programmable nuclease forms a complex with the guide RNA, wherein the guide RNA directs the complex to the target sequence; the nucleic acid programmable nuclease creates a double-strand break in the target sequence; the retron reverse transcriptase and the engineered retron ncRNA generate RT DNA comprising the donor polynucleotide; and the donor polynucleotide is integrated at the target sequence, whereby the edited cell is genetically modified.

本揭示案之其他例示性及非限制性態樣及實施例以編號段落之形式概述如下。Other exemplary and non-limiting aspects and embodiments of the present disclosure are summarized below in numbered paragraphs.

1. An engineered nucleic acid construct comprising: a) a first polynucleotide encoding a non-coding RNA (ncRNA), the first polynucleotide comprising: 1) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and 2) an msd locus encoding the msd RNA portion of the msDNA; and b) one or more heterologous nucleic acids inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus, wherein the ncRNA comprises: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or (II) an ncRNA having the conserved structure of any one of the ncRNA structures of Figures 2-27; and wherein the ncRNA optionally excludes any ncRNA that is associated in nature with any retron reverse transcriptase of Table X.

2.    段落1之經工程改造之核酸構築體,其進一步包含編碼逆轉錄酶(RT)或其部分之第二多核苷酸,其中該經編碼RT或其部分能夠合成編碼該msDNA之該 msd基因座中的至少一部分之DNA複本。 2. The engineered nucleic acid construct of paragraph 1, further comprising a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT or a portion thereof is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA.

3. The engineered nucleic acid construct of paragraph 2, wherein the second polynucleotide comprises: III) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or IV) a sequence encoding a consensus amino acid sequence of Table C, or encoding an amino acid sequence having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and/or wherein the second polynucleotide encodes: V) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or VI) a polypeptide comprising a polypeptide consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and/or wherein the second polynucleotide optionally does not encode an amino acid sequence listed in Table X.

4. An engineered nucleic acid construct comprising: a) a first polynucleotide encoding a non-coding RNA (ncRNA), the first polynucleotide comprising: 1) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and 2) an msd locus encoding the msd RNA portion of the msDNA; b) one or more heterologous nucleic acids inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and c) a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT or portion thereof is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA, and wherein the non-coding RNA (ncRNA) of the first polynucleotide optionally has the conserved structure of any one of the ncRNA structures of Figures 2-27; wherein the second polynucleotide comprises: I) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or wherein the second polynucleotide encodes: II) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or IV) a polypeptide comprising a polypeptide consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and wherein the second polynucleotide optionally does not encode an amino acid sequence of Table X.

4a. An engineered nucleic acid construct comprising: 1) an msr locus encoding the msr RNA portion of an msDNA; 2) an msd locus encoding the msd RNA portion of the msDNA; 3) a sequence encoding a retron reverse transcriptase (RT), wherein the msd RNA is capable of being reverse transcribed by the retron reverse transcriptase (RT) to form the msDNA; and 4) a heterologous nucleic acid inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus; wherein the engineered nucleic acid construct optionally has a) the secondary structure of the wild-type ncRNA of any one of Figures 2-27, or b) a variant of a) having: i) up to 1, 2 or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; ii) up to 4, 5 or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or iii) up to 7, 8 or 9 (e.g., up to 3 or 4) nucleotide changes per 10 grey-letter nucleotides; and/or optionally further comprising: i) 7, 8, 9 or 10 (e.g., 9 or 10) nucleotides present per 10 red-circle nucleotides; ii) 6, 7, 8, 9 or 10 (e.g., 8, 9 or 10) nucleotides present per 10 black-circle nucleotides; iii) 4, 5, 6, 7, 8, 9 or 10 (e.g., 6, 7, 8, 9 or 10) nucleotides present per 10 grey-circle nucleotides; and/or iv) 2, 3, 4, 5, 6, 7, 8, 9 or 10 (e.g., 4, 5, 6, 7, 8, 9 or 10) nucleotides present per 10 white-circle nucleotides.
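The per-10-nucleotide variant limits in paragraph 4a can be read as a rate check per conservation class. The sketch below assumes each position carries a class label taken from the figure colors ("red", "black", "grey"), applies the broadest recited limits (3, 6 and 9 changes per 10), and treats "per 10" as an overall rate across all positions of a class. These readings, the names and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Broadest limits recited in paragraph 4a, as changes allowed per 10 nucleotides.
LIMITS_PER_10 = {"red": 3, "black": 6, "grey": 9}


def changes_by_class(wild_type, variant, classes):
    """Map each conservation class to (changed positions, total positions)."""
    if not (len(wild_type) == len(variant) == len(classes)):
        raise ValueError("sequences and class labels must have equal length")
    totals = Counter(classes)
    changed = Counter(c for w, v, c in zip(wild_type, variant, classes) if w != v)
    return {c: (changed[c], totals[c]) for c in totals}


def within_limits(wild_type, variant, classes, limits=LIMITS_PER_10):
    """True if every class stays at or below its allowed change rate."""
    for cls, (n_changed, n_total) in changes_by_class(wild_type, variant, classes).items():
        if cls not in limits:
            raise ValueError(f"no limit defined for class {cls!r}")
        if n_changed * 10 > limits[cls] * n_total:
            return False
    return True


def mutate(seq, positions):
    """Toy helper: swap A<->C and G<->U at the given 0-based positions."""
    swap = {"A": "C", "C": "A", "G": "U", "U": "G"}
    return "".join(swap[b] if i in positions else b for i, b in enumerate(seq))


toy_wild_type = "ACGUACGUAC" * 2
toy_classes = ["red"] * 10 + ["grey"] * 10
toy_ok = mutate(toy_wild_type, {0, 10, 11, 12, 13, 14})   # 1 red and 5 grey changes
toy_too_many = mutate(toy_wild_type, {0, 1, 2, 3})        # 4 red changes
```

A windowed reading (every run of 10 consecutive nucleotides of a class) would be stricter than this global rate; the text does not say which is intended.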

5. The engineered nucleic acid construct of any one of paragraphs 1 to 4a, comprising one or more sequence modifications (e.g., insertion, deletion and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus, wherein the one or more sequence modifications: a) modulate (e.g., enhance) the reverse transcription, processivity, accuracy/fidelity and/or production of the msDNA (e.g., in mammalian cells); b) modulate (e.g., reduce) the immunogenicity, in a host (e.g., a host comprising mammalian cells), of the ncRNA encoded by the engineered retron (e.g., the msr locus and/or the msd locus); c) modulate (e.g., permanently or transiently inhibit) the function of the msDNA; and/or d) modulate (e.g., improve) the efficiency of targeted genome editing/engineering.

6. The engineered nucleic acid construct of any one of paragraphs 1 to 4, wherein the engineered nucleic acid construct has the secondary structure of a wild-type retron encoding a wild-type retron ncRNA encompassed by: a) any one of the structures depicted in Figures 2-27, or b) a variant of a) having: i) up to 1, 2 or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; ii) up to 4, 5 or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or iii) up to 7, 8 or 9 (e.g., up to 3 or 4) nucleotide changes per 10 grey-letter nucleotides; and/or optionally further comprising: i) 7, 8, 9 or 10 (e.g., 9 or 10) nucleotides present per 10 red-circle nucleotides; ii) 6, 7, 8, 9 or 10 (e.g., 8, 9 or 10) nucleotides present per 10 black-circle nucleotides; iii) 4, 5, 6, 7, 8, 9 or 10 (e.g., 6, 7, 8, 9 or 10) nucleotides present per 10 grey-circle nucleotides; and/or iv) 2, 3, 4, 5, 6, 7, 8, 9 or 10 (e.g., 4, 5, 6, 7, 8, 9 or 10) nucleotides present per 10 white-circle nucleotides.

7. The engineered nucleic acid construct of any one of paragraphs 1-6, wherein the nucleic acid construct is engineered by introducing the one or more sequence modifications into a wild-type retron encoding a wild-type ncRNA listed in Table B.

8. The engineered nucleic acid construct of any one of paragraphs 1-7, wherein the one or more sequence modifications in the ncRNA comprise one or more of: (i) a modified (e.g., mutated, reduced, or eliminated) bulge in a1, a2, or both a1 and a2; (ii) a lengthened or shortened a1, a2, or both a1 and a2; (iii) a lengthened or shortened spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4); (iv) an additional or modified (e.g., mutated or eliminated) bulge in a hairpin loop (e.g., L2 and/or L3), for example by removing the unpaired bases in the bulge or by replacing the unpaired bases with an equal number of base pairs; (v) a modified (e.g., lengthened or shortened) hairpin loop length (e.g., L1, L2, L3 and/or L4); (vi) a replacement L1 and/or L2 having the complement, reverse, or reverse complement sequence; (vii) a modified (e.g., increased) number of unpaired bases at the tip of a hairpin loop (e.g., L1, L2, L3 and/or L4); (viii) a modified (e.g., increased or decreased) GC content in a hairpin loop (e.g., L1, L2, L3 and/or L4); (ix) insertion of the heterologous nucleic acid in a spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4) or at the tip of a hairpin loop (e.g., L1, L2, L3 and/or L4); (x) deletion of one or more hairpin loops (e.g., L1, L2, L3 and/or L4); (xi) addition of a new loop in a spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4); (xii) circularization of the ncRNA, wherein the 5' end and the 3' end of the ncRNA are joined directly or via a spacer sequence; (xiii) a repositioned branching guanosine capable of priming reverse transcription; (xiv) staggered end sequences that reduce the immunogenicity of the retron ncRNA, generated by, for example, adding or removing 5' a1 nucleotides and/or 3' a2 nucleotides; and/or (xv) an antisense sequence complementary to a CRISPR/Cas guide RNA (gRNA) sequence encoded by the heterologous nucleic acid, wherein the antisense sequence hybridizes to and inhibits the gRNA in the encoded retron ncRNA, and wherein the antisense sequence is removed after reverse transcription of the msDNA.
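Two of the modifications listed in paragraph 8 are simple sequence operations: item (vi) replaces a loop with its complement, reverse, or reverse complement, and item (viii) changes GC content. The helpers below are a minimal sketch for RNA strings; the function names and the toy loop sequence are hypothetical, and no secondary-structure prediction is attempted.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def _check_rna(rna):
    """Reject empty strings and anything other than uppercase A, C, G, U."""
    if not rna or set(rna) - set("ACGU"):
        raise ValueError("expected a non-empty RNA string over A, C, G, U")


def complement(rna):
    """Base-wise complement, keeping the original left-to-right order."""
    _check_rna(rna)
    return rna.translate(_RNA_COMPLEMENT)


def reverse(rna):
    """Same bases in the opposite order."""
    _check_rna(rna)
    return rna[::-1]


def reverse_complement(rna):
    """Complement read in the opposite direction."""
    return complement(rna)[::-1]


def gc_content(rna):
    """Fraction of G and C bases, between 0.0 and 1.0."""
    _check_rna(rna)
    return sum(base in "GC" for base in rna) / len(rna)


toy_loop = "GGGAUC"  # toy sequence, not an actual L1 or L2 loop
```

Note that a complement or reverse-complement replacement leaves GC content unchanged, so item (viii) requires actual base substitutions rather than one of the item (vi) transformations.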

9.    段落1-8中任一項之經工程改造之核酸構築體,其中該一或多個異源核酸序列包含: a)  插入該 msr基因座或該 msd基因座中(諸如S區(例如,S1、S2、S3及/或S4)中,或L區(例如,L1、L2、L3及/或L4)之尖端,或該 msr基因座或該 msd基因座上游或下游中之異源核酸(諸如RNA適體或核酶之編碼序列);或 b)  插入該 msd基因座中之第一異源核酸,及插入該 msr基因座上游或該 msd基因座下游之第二異源核酸,其中該第二異源核酸編碼引導RNA。 9. The engineered nucleic acid construct of any one of paragraphs 1-8, wherein the one or more heterologous nucleic acid sequences comprise: a) a heterologous nucleic acid (such as a coding sequence for an RNA aptamer or a ribozyme) inserted into the msr locus or the msd locus (such as in the S region (e.g., S1, S2, S3 and/or S4), or at the tip of the L region (e.g., L1, L2, L3 and/or L4), or upstream or downstream of the msr locus or the msd locus; or b) a first heterologous nucleic acid inserted into the msd locus, and a second heterologous nucleic acid inserted upstream of the msr locus or downstream of the msd locus, wherein the second heterologous nucleic acid encodes a guide RNA.

10.  段落1-9中任一項之經工程改造之核酸構築體,其中該異源核酸編碼: (a) 相關蛋白質或肽,或其中該異源核酸包含; (b) DNA供體模板序列; (c) 選自啟動子、增強子、蛋白質結合序列、甲基化位點、用於輔助基因編輯之同源區及其類似元件之功能性DNA元件;或 (d) 選自引導RNA及ncRNA之功能性RNA元件的編碼序列。 10. An engineered nucleic acid construct of any one of paragraphs 1-9, wherein the heterologous nucleic acid encodes: (a) a protein or peptide of interest, or wherein the heterologous nucleic acid comprises; (b) a DNA donor template sequence; (c) a functional DNA element selected from promoters, enhancers, protein binding sequences, methylation sites, homologous regions for assisting gene editing, and similar elements; or (d) a coding sequence for a functional RNA element selected from guide RNA and ncRNA.

11. The engineered nucleic acid construct of paragraph 10, wherein the protein or peptide of interest comprises a therapeutic protein useful for treating a disease.

12. The engineered nucleic acid construct of paragraph 10, wherein the DNA donor template sequence corrects, repairs, or removes a mutation at a target genomic site.

13. The engineered nucleic acid construct of any one of paragraphs 1-12, further comprising or encoding a sequence-specific nuclease (such as a CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB or restriction endonuclease (RE)) and/or a DNA repair regulatory biomolecule.

13b. The engineered nucleic acid construct of any one of paragraphs 1-13, wherein the engineered nucleic acid is an all-RNA component system.

13c. The engineered nucleic acid construct of any one of paragraphs 1-13, wherein the engineered nucleic acid is an all-DNA molecule system.

14. The engineered nucleic acid construct of paragraph 13, wherein the sequence-specific nuclease is fused to the RT, optionally via a flexible linker (e.g., a flexible linker comprising a Gly- and Ser-rich sequence, such as G4S repeats or GS repeats) or via a generally disordered protein sequence (e.g., an unstructured, hydrophilic, biodegradable protein polymer, such as an XTEN peptide polymer).

15. The engineered nucleic acid construct of paragraph 13 or 14, wherein the nuclease is a CRISPR/Cas effector enzyme that forms a complex with a guide RNA (gRNA) that recognizes a target sequence, wherein the gRNA is linked to the ncRNA and/or the msDNA directly or via a linker/spacer polynucleotide.

16. The engineered nucleic acid construct of paragraph 13, wherein the DNA repair regulatory biomolecule is a regulatory protein that regulates (e.g., enhances) HDR, and the regulatory protein is fused to the RT or the sequence-specific nuclease, optionally via a flexible linker (e.g., a flexible linker comprising a Gly- and Ser-rich sequence, such as G4S repeats or GS repeats) or via a generally disordered protein sequence (e.g., an unstructured, hydrophilic, biodegradable protein polymer, such as an XTEN peptide polymer).

17. A vector system comprising one or more vectors, the one or more vectors comprising the engineered nucleic acid construct of any one of paragraphs 1-16, wherein the vector system is optionally all-RNA.

18. The vector system of paragraph 17, wherein the msr locus, the msd locus and the polynucleotide encoding the RT are contained in the same vector.

19. The vector system of paragraph 17 or 18, wherein the same vector further comprises a promoter operably linked to the msr locus and/or the msd locus.

20. The vector system of paragraph 19, wherein the promoter is further operably linked to the polynucleotide encoding the RT.

21. A vector system comprising one or more vectors, the vector system comprising the engineered nucleic acid construct of paragraph 1 or 2, wherein the vector system further comprises a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA, and wherein the msr locus, the msd locus and the second polynucleotide encoding the RT are provided by at least two different vectors.

22. The vector system of paragraph 21, wherein:
    a) the second polynucleotide comprises:
        i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or
    b) the second polynucleotide encodes:
        i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or
        ii) a polypeptide listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and
    wherein the second polynucleotide optionally does not encode a polypeptide listed in Table X.
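The "at least X% sequence identity" thresholds recited in paragraph 22 can be illustrated with a minimal sketch. The paragraph itself does not state an alignment method or identity formula, so the code below assumes two sequences that have already been aligned (gaps written as `-`) and defines identity as matching columns divided by aligned columns, ignoring columns that are gaps in both sequences. The function names `percent_identity` and `meets_threshold` and the toy sequences are hypothetical and are not taken from Table A or Table C.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns (an assumed definition, not from the source)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True when the pair reaches a recited 'at least X%' identity level."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy, hypothetical pre-aligned peptide fragments.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQR"  # one substitution in ten columns

print(f"{percent_identity(reference, candidate):.1f}% identity")  # 90.0% identity
print(meets_threshold(reference, candidate, 90.0))  # True
print(meets_threshold(reference, candidate, 95.0))  # False
```

In practice the result depends on the alignment algorithm and on how gaps are counted, so a real comparison would need to fix those choices before applying any of the recited thresholds.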

23. The vector system of paragraph 21 or 22, wherein the polynucleotide encoding the RT is provided in trans relative to the msr gene and/or the msd gene.

24. The vector system of any one of paragraphs 17-23, wherein the one or more vectors comprise a viral vector.

25. The vector system of paragraph 24, wherein the viral vector is a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a vaccinia viral vector, a poxvirus vector or a herpes simplex viral vector.

26. The vector system of any one of paragraphs 17-23, wherein the one or more vectors comprise a non-viral vector.

27. The vector system of paragraph 26, wherein the non-viral vector comprises a plasmid.

28. The vector system of paragraph 26, wherein the non-viral vector comprises a liposome, a lipid nanoparticle (LNP), a cationic polymer, a vesicle or a gold nanoparticle.

29. The vector system of any one of paragraphs 17-28, comprising a vector encoding a sequence-specific nuclease.

30. The vector system of paragraph 29, wherein the sequence-specific nuclease comprises an RNA-guided sequence-specific nuclease (e.g., a CRISPR/Cas effector enzyme, an engineered RNA-guided FokI nuclease (e.g., dCas-FokI), an RNA-guided DNA endonuclease, TnpB, IscB or a transposon-associated nuclease) or a non-RNA-guided sequence-specific nuclease (e.g., a meganuclease, a zinc finger nuclease (ZFN), a TALE nuclease (TALEN) or a restriction endonuclease (RE)).

31. The vector system of paragraph 30, wherein the Cas effector enzyme is a Class 1, type I, type II or type III Cas; a Class 2, type II Cas (e.g., Cas9); or a Class 2, type V Cas (e.g., Cpf1).

32. The vector system of paragraph 30, wherein:
    1) the RNA-guided sequence-specific nuclease comprises the CRISPR/Cas effector enzyme, the engineered RNA-guided FokI nuclease (e.g., dCas-FokI), the RNA-guided DNA endonuclease, TnpB, IscB, IsrB or a transposon-associated nuclease; or
    2) the non-RNA-guided sequence-specific nuclease comprises the meganuclease, the zinc finger nuclease (ZFN), the TALE nuclease (TALEN) or the restriction endonuclease (RE).

33. The vector system of any one of paragraphs 17-32, further comprising a vector encoding a homologous recombination enhancer protein.

34. An RNA molecule encoded by the engineered nucleic acid construct of any one of paragraphs 1-16.

35. An engineered nucleic acid-enzyme construct comprising:
    a) a non-coding RNA (ncRNA) comprising:
        i) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and
        ii) an msd locus encoding the msd RNA portion of the msDNA;
    b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and
    c) a sequence encoding a reverse transcriptase (RT) or a domain thereof, comprising:
        i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or
        ii) a polypeptide listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and
    wherein the RT optionally does not comprise a polypeptide listed in Table X.

36. An engineered nucleic acid-enzyme construct comprising:
    a) a non-coding RNA (ncRNA) comprising:
        i) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and
        ii) an msd locus encoding the msd RNA portion of the msDNA;
       wherein the ncRNA comprises:
        i) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or
        ii) an ncRNA having a conserved structure of any one of the ncRNA structures of Figures 2-27; and
       wherein the ncRNA optionally excludes any ncRNA associated in nature with any retron reverse transcriptase of Table X;
    b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus; upstream of the msr locus; upstream of the msd locus; and downstream of the msd locus; and
    c) a reverse transcriptase (RT) or a portion thereof, wherein the RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA.

37. An engineered nucleic acid-enzyme construct comprising:
    a) a non-coding RNA (ncRNA) comprising:
        i) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and
        ii) an msd locus encoding the msd RNA portion of the msDNA;
       wherein the ncRNA comprises:
        i) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or
        ii) an ncRNA having a conserved structure of any one of the ncRNA structures of Figures 2-27; and
       wherein the ncRNA optionally excludes any ncRNA associated in nature with any retron reverse transcriptase of Table X;
    b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and
    c) a reverse transcriptase (RT) or a domain thereof, wherein the RT comprises:
        i) an RT listed in Table A, or an RT having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an RT listed in Table A; and/or
        ii) a consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and
       wherein the RT optionally does not comprise an RT listed in Table X.

38. An isolated host cell comprising the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, or the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37.

39. The isolated host cell of paragraph 38, wherein the host cell is a prokaryotic, archaeal or eukaryotic host cell.

40. The isolated host cell of paragraph 38, wherein the eukaryotic host cell is a mammalian host cell.

41. The isolated host cell of paragraph 39, wherein the eukaryotic host cell is a non-human host cell.

42. The isolated host cell of paragraph 40, wherein the mammalian host cell is a human host cell.

43. The isolated host cell of any one of paragraphs 38-42, wherein the host cell is an artificial cell or a genetically modified cell.

44. A pharmaceutical composition comprising:
    a) the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37, and/or the isolated host cell of any one of paragraphs 38-43; and
    b) a pharmaceutically acceptable carrier.

45. A pharmaceutical composition comprising:
    a) a lipid nanoparticle (LNP); and
    b) the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, and/or the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37.

46. The pharmaceutical composition of paragraph 45, wherein the LNP encapsulates the engineered nucleic acid construct, the ncRNA, the vector system, the RNA molecule and/or the engineered nucleic acid-enzyme construct.

47. The pharmaceutical composition of paragraph 45 or 46, wherein the lipid nanoparticle comprises:
    a) one or more ionizable lipids;
    b) one or more structural lipids;
    c) one or more PEGylated lipids; and
    d) one or more phospholipids.

48. The pharmaceutical composition of paragraph 47, wherein the one or more ionizable lipids are selected from the group consisting of those disclosed in Table 2.

49. The pharmaceutical composition of paragraph 47 or 48, wherein the one or more structural lipids are selected from the group consisting of: cholesterol, fecosterol, β-sitosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, α-tocopherol, prednisolone, dexamethasone, prednisone and hydrocortisone.

50. The pharmaceutical composition of any one of paragraphs 47-49, wherein the one or more PEGylated lipids are selected from the group consisting of: PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC and PEG-DSPE.

51. The pharmaceutical composition of any one of paragraphs 47-50, wherein the one or more phospholipids are selected from the group consisting of: 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG) and sphingomyelin.

52. The pharmaceutical composition of any one of paragraphs 47-51, wherein the lipid nanoparticle comprises about 48.5 mol% ionizable lipid, about 10 mol% phospholipid, about 40 mol% structural lipid and about 1.5 mol% PEG lipid.

53. The pharmaceutical composition of any one of paragraphs 47-52, wherein the lipid nanoparticle comprises about 48.5 mol% ionizable lipid, about 10 mol% phospholipid, about 39 mol% structural lipid and about 2.5 mol% PEG lipid.
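As a quick arithmetic check on the two molar compositions recited in paragraphs 52 and 53, the sketch below confirms that each set of mol% values sums to 100 and shows how such a composition would split a given total lipid amount. The 10 µmol total, the dictionary layout, and the function names are illustrative assumptions only; the source states the compositions as approximate ("about") values and gives no batch size.

```python
# Molar compositions (mol%) as recited in paragraphs 52 and 53.
FORMULATIONS = {
    "paragraph 52": {
        "ionizable lipid": 48.5,
        "phospholipid": 10.0,
        "structural lipid": 40.0,
        "PEG lipid": 1.5,
    },
    "paragraph 53": {
        "ionizable lipid": 48.5,
        "phospholipid": 10.0,
        "structural lipid": 39.0,
        "PEG lipid": 2.5,
    },
}


def total_mol_percent(composition):
    """Sum of all component mol% values; 100 for a complete composition."""
    return sum(composition.values())


def micromoles_per_component(composition, total_lipid_umol):
    """Split an assumed total lipid amount (in micromoles) according to mol%."""
    return {name: total_lipid_umol * pct / 100.0
            for name, pct in composition.items()}


for label, composition in FORMULATIONS.items():
    print(label, "sums to", total_mol_percent(composition), "mol%")
    # The 10 umol total is a hypothetical batch size chosen for illustration.
    print(micromoles_per_component(composition, total_lipid_umol=10.0))
```

The two formulations differ only in trading 1 mol% of structural lipid for 1 mol% of PEG lipid, which is why both totals come to 100 mol%.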

54. The pharmaceutical composition of any one of paragraphs 47-53, wherein the LNP further comprises a targeting moiety operably linked to the LNP.

55. The pharmaceutical composition of any one of paragraphs 47-54, wherein the LNP further comprises one or more additional components selected from the group consisting of: DDAB, EPC, 14PA, 18BMP, DODAP, DOTAP and C12-200.

56. The pharmaceutical composition of paragraph 45, wherein the lipid nanoparticle comprises at least one cationic lipid selected from the group consisting of: a lipid in Table 2, a lipid having a structure of formula (I), a lipid having a structure of formula (II), a lipid having a structure of formula (III), a lipid having a structure of formula (IV), a lipid having a structure of formula (V), a lipid having a structure of formula (VI), and combinations thereof.

57. A kit comprising the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37, the host cell of any one of paragraphs 38-43, or the pharmaceutical composition of any one of paragraphs 44-56, and instructions for genetically modifying a cell using the engineered nucleic acid construct, the ncRNA, the vector system, the host cell, or the pharmaceutical composition.

58. A method of modifying a target DNA sequence in a host (e.g., mammalian) cell, the method comprising introducing the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37, or the pharmaceutical composition of any one of paragraphs 44-56 into the host (e.g., mammalian) cell to allow production of the msDNA in the host (e.g., mammalian) cell, wherein the heterologous nucleic acid in the msDNA is integrated at the target DNA sequence in the genome of the host (e.g., mammalian) cell by homology-dependent recombination.

59. The method of paragraph 58, wherein the modification comprises introducing an insertion, deletion and/or substitution into the target DNA sequence.

60. A method of treating a disease or condition in a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the engineered nucleic acid-enzyme construct of any one of paragraphs 35-37, the host cell of any one of paragraphs 38-43, or the pharmaceutical composition of any one of paragraphs 44-56, thereby treating the disease or condition in the subject.

61. A method of treating a disease or condition in a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of the host cell of any one of paragraphs 38-43, thereby treating the disease or condition in the subject.

62. The method of paragraph 61, wherein the host cell is autologous to the subject.

63. The method of paragraph 61, wherein the host cell is allogeneic to the subject.

Related Applications

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 63/373,545 (RTX003-P1), filed August 25, 2022; U.S. Application No. 18/087,673 (RTX004-T1), filed December 22, 2022; U.S. Provisional Application No. 63/476,900 (RTX005-P1), filed December 22, 2022; International PCT Application No. PCT/US2023/61038 (RTX006-PCT1), filed January 20, 2023; U.S. Provisional Application No. 63/488,317 (RTX007-P1), filed March 3, 2023; U.S. Provisional Application No. 63/491,603 (RTX008-P1), filed March 22, 2023; and U.S. Provisional Application No. 63/515,783 (RTX009-P1), filed July 26, 2023, each of which is incorporated herein by reference in its entirety.

This application makes reference to the following applications: U.S. Provisional Application No. 63/301,936 (RTX001-P1), filed January 21, 2022, and U.S. Provisional Application No. 63/370,880 (RTX002-P1), filed August 9, 2022, each of which is incorporated herein by reference in its entirety.

The above-mentioned applications and all documents cited therein or during their prosecution ("application cited documents"), all documents cited or referenced in the application cited documents, all documents cited or referenced herein ("herein cited documents"), and all documents cited or referenced in the herein cited documents, together with any manufacturer's instructions, descriptions, product specifications and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document were specifically and individually indicated to be incorporated by reference.

Sequence Listing

This application as originally filed includes a sequence listing submitted electronically in Extensible Markup Language (XML) format, titled J0356-99004.xml, created on August 24, 2023, and 36,884,841 bytes in size. The contents of the sequence listing are incorporated herein in their entirety.

Definitions

All technical and scientific terms used herein have the meanings commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with general definitions of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale and Marham, The Harper Collins Dictionary of Biology (1991).

General methods in molecular and cellular biochemistry can be found in standard textbooks such as Molecular Cloning: A Laboratory Manual, 3rd ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt and Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle and Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range, and any other stated or intervening value in that stated range, is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.

It must be noted that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "an ncRNA" includes a plurality of ncRNAs, and reference to "the reverse transcriptase" includes reference to one or more RTs and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for the use of such exclusive terminology as "solely," "only," and the like in connection with the recitation of claim elements, or for the use of a "negative" limitation. For example, the claims may be drafted to exclude certain RT sequences.

It is appreciated that certain features of the invention which are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the invention and are disclosed herein just as if each and every combination were individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the invention and are disclosed herein just as if each and every such sub-combination were individually and explicitly disclosed herein.

Biological activity

As used herein, the term "biologically active" refers to a characteristic of an agent (e.g., DNA, RNA, or protein) that has activity in a biological system (including in vitro and in vivo biological systems), and particularly in a living organism, such as in a mammal, including human and non-human mammals. For instance, an agent that has a biological effect on an organism when administered to that organism is considered to be biologically active.

Bulge

As used herein, the term "bulge" refers to a small region of unpaired base(s) that interrupts a "stem" of base-paired nucleotides. A bulge may comprise one or two single-stranded or non-base-paired stretches of nucleotides that are joined at both ends by base-paired nucleotides of the stem. A bulge can be symmetrical (i.e., the two non-base-paired single-stranded regions have the same number of nucleotides), asymmetrical (i.e., the non-base-paired single-stranded regions have different or unequal numbers of nucleotides), or present on only one strand as a single stretch of non-base-paired nucleotides. A bulge can be described as A/B (such as a "2/2 bulge" or a "1/0 bulge"), where A represents the number of unpaired nucleotides on the upstream strand of the stem and B represents the number of unpaired nucleotides on the downstream strand of the stem. In the primary nucleotide sequence, the upstream strand of a bulge lies 5' of the downstream strand of the bulge.

cDNA

As used herein, the term "cDNA" refers to a strand of DNA copied from an RNA template, e.g., by a reverse transcriptase.

Cognate

The term "cognate" refers to two biomolecules that normally interact or co-exist in nature.

Complementary

As used herein, the terms "complementary" and "substantially complementary" mean that a nucleic acid (e.g., RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind (i.e., form Watson-Crick base pairs and/or G/U base pairs), "anneal," or "hybridize" to another nucleic acid in a sequence-specific, antiparallel manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. Standard Watson-Crick base pairing includes adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]. In addition, for hybridization between two RNA molecules (e.g., dsRNA), and for hybridization of a DNA molecule with an RNA molecule (e.g., when a DNA target nucleic acid base pairs with a guide RNA), guanine (G) can also base pair with uracil (U). For example, G/U base pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anticodon base pairing with codons in mRNA. Thus, in the context of this disclosure, a guanine (G) is considered complementary to both a uracil (U) and an adenine (A). For example, when a G/U base pair can be made at a given nucleotide position of a dsRNA duplex of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
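
The pairing rules in this definition can be illustrated with a short Python sketch. It is not part of the disclosure: the function names `can_pair`, `complementary_positions`, and `percent_complementarity` are hypothetical, and the sketch assumes two gap-free strands of equal length, both written 5' to 3', with only the standard Watson-Crick pairs (A/T, A/U, G/C) and, optionally, the G/U pair counted as complementary.

```python
# Illustrative sketch only; names and simplifications are assumptions, not the patent's method.

# Standard Watson-Crick pairs for DNA and RNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# G/U pairing, which the definition also treats as complementary for RNA-containing duplexes.
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b, allow_wobble=True):
    """Return True if nucleotides a and b can base pair under the rules above."""
    pair = (a.upper(), b.upper())
    return pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE)


def complementary_positions(strand, other, allow_wobble=True):
    """Count paired positions when `other` is aligned antiparallel to `strand`.

    Both strands are given 5'->3'; `other` is reversed so that position i of
    `strand` faces position i of the reversed `other`.
    """
    if len(strand) != len(other):
        raise ValueError("this sketch assumes equal-length, gap-free strands")
    return sum(can_pair(a, b, allow_wobble) for a, b in zip(strand, reversed(other)))


def percent_complementarity(strand, other, allow_wobble=True):
    """Percentage of positions in `strand` that pair with the antiparallel `other`."""
    return 100.0 * complementary_positions(strand, other, allow_wobble) / len(strand)
```

With these assumptions, `percent_complementarity("GGAU", "AUCU")` counts the G/U position as complementary, whereas passing `allow_wobble=False` treats that position as a mismatch. Real duplexes with bulges or loops would need an alignment step that this sketch omits.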

It is understood that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure, or a hairpin structure). A polynucleotide can comprise 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it will hybridize. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90% complementarity. In this example, the remaining non-complementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined using any convenient method. Example methods include the BLAST programs (Basic Local Alignment Search Tool) and PowerBLAST programs (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656); the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), e.g., using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489); and the like.

DNA-guided nuclease

As used herein, a "DNA-guided nuclease" is a type of "programmable nuclease" and a specific type of "nucleic acid-guided nuclease." Examples of DNA-guided nucleases are reported in Varshney et al., DNA-guided genome editing using structure-guided endonucleases, Genome Biology, 2016, 17(1), 187, which can be used in the context of the present disclosure and is incorporated herein by reference. As used herein, the term "DNA-guided nuclease" or "DNA-guided endonuclease" refers to a nuclease that associates covalently or non-covalently with a guide RNA, thereby forming a complex between the guide RNA and the DNA-guided nuclease. The guide RNA comprises a spacer sequence that comprises a nucleotide sequence having complementarity with a strand of a target DNA sequence. Thus, the DNA-guided nuclease is indirectly guided or programmed to localize to a specific site in a DNA molecule through its association with the guide RNA, which directly binds or anneals to a strand of the target DNA through its region of complementarity via Watson-Crick base pairing.

DNA regulatory sequence

As used herein, the terms "DNA regulatory sequence," "control element," and "regulatory element" are used interchangeably to refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a guide RNA) or a coding sequence and/or regulate translation of an mRNA into an encoded polypeptide.

Donor nucleic acid

A "donor nucleic acid," "donor polynucleotide," "donor DNA," or "HDR donor DNA" means a single-stranded DNA to be inserted at a site cleaved by a programmable nuclease (e.g., a CRISPR/Cas effector protein; a TALEN; a ZFN; a meganuclease) (e.g., after dsDNA cleavage, after nicking a target DNA, after dual nicking a target DNA, and the like). The donor polynucleotide can contain sufficient homology to a genomic sequence at the target site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the target site, e.g., within about 200 bases or less of the target site, e.g., within about 190, about 180, about 170, about 160, about 150, about 140, about 130, about 120, about 110, about 100, about 90, about 80, about 70, or about 60 bases or less of the target site, e.g., within 50 bases or less of the target site, e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between it and the genomic sequence to which it bears homology.

Encoding

As used herein, a DNA sequence that "encodes" a particular RNA is a DNA nucleotide sequence that is transcribed into RNA. A DNA polynucleotide may encode an RNA (mRNA) that is translated into protein (and therefore the DNA and the mRNA both encode the protein), or a DNA polynucleotide may encode an RNA that is not translated into protein (e.g., tRNA, rRNA, microRNA (miRNA), a "non-coding" RNA (ncRNA), a guide RNA, etc.). In the case of a retron, the retron DNA may encode the ncRNA locus (which includes the msr and msd regions) as well as the retron RT.

Engineered retron

As used herein, the term "engineered retron," or equivalently "recombinant retron," refers to a retron that does not occur in nature. In one embodiment, an engineered retron can include a wild-type or naturally occurring retron that has been modified to contain at least one modification, including a single nucleotide substitution, insertion, or deletion, or a substitution, insertion, or deletion of more than one nucleotide, i.e., in which up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or up to 100, or up to 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or up to 2000 nucleotides of the starting-point retron (e.g., a wild-type retron) are substituted, inserted, or deleted. When more than one nucleotide of the starting-point retron (e.g., a wild-type retron) is substituted, deleted, or inserted, the nucleotides may be contiguous or non-contiguous. Although an engineered retron as a whole is not naturally occurring, it may include components that do occur in nature, such as nucleotide sequences. For example, an engineered retron can have nucleotide sequences from different organisms (e.g., from different bacterial species) or from fully synthetic/artificial/recombinant nucleic acid sequences. Thus, an engineered retron can have bacterial nucleotide sequences, human nucleotide sequences, viral nucleotide sequences, and/or synthetic/artificial/recombinant nucleotide sequences, and/or combinations of such sequences. Examples of modifications of the recombinant retrons disclosed herein include insertion of a heterologous nucleic acid sequence into the retron, e.g., into the ncRNA locus, such as into the msr or msd locus. Linking a guide RNA molecule to the 5' and/or 3' end (i.e., linking one molecule to the 5' end of the ncRNA and/or linking one molecule to the 3' end of the ncRNA) represents another modification contemplated for the recombinant retrons disclosed herein. In such embodiments, the guide RNA molecule may also be classified, or more generally referred to, as a type of heterologous nucleic acid sequence used to modify the starting-point retron.

Exosome

As used herein, the term "exosome" refers to a small membrane-bound vesicle with an endocytic origin. Without wishing to be bound by theory, exosomes are generally released into the extracellular environment from host/progenitor cells after fusion of multivesicular bodies with the plasma membrane. As such, exosomes can include components of the progenitor cell membrane in addition to designed components (e.g., an engineered retron). Exosome membranes are generally lamellar, composed of a bilayer of lipids, with an aqueous inter-nanoparticle space.

Expression vector

As used herein, the term "expression vector" or "expression construct" refers to a vector that includes one or more expression control sequences, and an "expression control sequence" is a DNA sequence that controls and regulates the transcription and/or translation of another DNA sequence. Suitable expression vectors include, without limitation, plasmids and viral vectors derived from, for example, bacteriophages, baculoviruses, tobacco mosaic virus, herpes viruses, cytomegalovirus, retroviruses, vaccinia viruses, adenoviruses, and adeno-associated viruses. Numerous vectors and expression systems are commercially available from such corporations as Novagen (Madison, WI), Clontech (Palo Alto, CA), Stratagene (La Jolla, CA), and Invitrogen/Life Technologies (Carlsbad, CA). The invention encompasses recombinant vectors, which can include viral vectors, bacterial vectors, protozoan vectors, DNA vectors, or recombinants thereof.

Guide RNA

The RNA molecule that binds to the programmable nuclease of a retron-based gene editing system and targets the nuclease to a specific location within a targeted polynucleotide sequence is referred to herein as a "guide RNA" or "guide RNA polynucleotide" (also referred to herein as a "gRNA" or "crRNA"). In certain embodiments (depending on the specific nuclease with which it interacts), a guide RNA comprises two segments, a "DNA-targeting segment" and a "protein-binding segment." By "segment" it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in an RNA. As an illustrative, non-limiting example, a protein-binding segment of a guide RNA can comprise base pairs 5-20 of an RNA molecule that is 40 base pairs in length, and the DNA-targeting segment can comprise base pairs 21-40 of that RNA molecule. The definition of "segment," unless otherwise specifically defined in a particular context, is not limited to a specific number of total base pairs, is not limited to any particular number of base pairs from a given RNA molecule, is not limited to a particular number of separate molecules within a complex, may include regions of RNA molecules of any total length, and may or may not include regions with complementarity to other molecules.
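
The illustrative 40-base example above can be sketched in Python as plain coordinate slicing. This is only a reading aid under stated assumptions: the helper name `segment` and the placeholder sequence are hypothetical, and positions are taken to be 1-based and inclusive, as in "5-20" and "21-40."

```python
# Illustrative sketch only; the sequence below is a placeholder, not a real guide RNA.

def segment(rna, start, end):
    """Return the contiguous stretch covering positions start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(rna)):
        raise ValueError("segment coordinates must lie within the molecule")
    return rna[start - 1:end]


# Hypothetical 40-nucleotide RNA used only to show the coordinates.
guide = "ACGU" * 10

protein_binding_segment = segment(guide, 5, 20)   # positions 5-20, i.e. 16 nucleotides
dna_targeting_segment = segment(guide, 21, 40)    # positions 21-40, i.e. 20 nucleotides
```

As the definition stresses, these coordinates are one non-limiting example; a segment may have any length and need not map to fixed positions.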

The DNA-targeting segment (or "DNA-targeting sequence") comprises a nucleotide sequence that is complementary to a specific sequence within a targeted polynucleotide sequence (the complementary strand of the targeted polynucleotide sequence), designated herein the "protospacer-like" sequence. The protein-binding segment (or "protein-binding sequence") interacts with a site-directed modifying polypeptide. When the site-directed modifying polypeptide is a CRISPR nuclease, site-specific cleavage of the targeted polynucleotide sequence can occur at a location determined by both (i) base-pairing complementarity between the guide RNA and the targeted polynucleotide sequence; and (ii) a short motif in the targeted polynucleotide sequence, referred to as the protospacer adjacent motif (PAM).

Heterologous nucleic acid sequence

As used herein, the term "heterologous nucleic acid" refers to an entity that is genotypically distinct from the rest of the entity to which it is compared or into which it is introduced or incorporated. For example, a polynucleotide introduced by genetic engineering techniques into a different cell type is a heterologous polynucleotide (e.g., DNA or RNA) and, when expressed, can encode a heterologous polypeptide. Similarly, a cellular sequence (e.g., a gene or portion thereof) that is incorporated into a viral vector is a heterologous nucleotide sequence with respect to the vector. In some embodiments, a heterologous sequence inserted into a region of a wild-type retron is not naturally inserted in such a region (e.g., an engineered retron with the inserted heterologous sequence does not occur naturally). For example, the heterologous sequence may be from a bacterium of the same species in which the wild-type retron is normally found, so long as the heterologous sequence is not naturally inserted into the wild-type retron at the location where it is inserted. In certain embodiments, the heterologous sequence is a mammalian sequence (e.g., a human sequence) or its reverse complement. A heterologous nucleic acid sequence introduced into a retron may include, but is not limited to, a guide RNA sequence, an HDR donor template, a protein-coding gene, or a non-coding functional RNA element (e.g., a stem-loop, a hairpin, or a bulge).

Homology-directed repair

As used herein, "homology-directed repair (HDR)" refers to the specialized form of DNA repair that takes place, for example, during repair of double-strand breaks in cells. This process requires nucleotide sequence homology, uses a "donor" molecule to template repair of a "target" molecule (i.e., the one that experienced the double-strand break), and leads to the transfer of genetic information from the donor to the target. Homology-directed repair may result in an alteration of the sequence of the target molecule (e.g., insertion, deletion, mutation) if the donor polynucleotide differs from the target molecule and part or all of the sequence of the donor polynucleotide is incorporated into the targeted polynucleotide sequence.

Identical

As used herein, the term "identical" refers to two or more sequences or subsequences that are the same. In addition, the term "substantially identical," as used herein, refers to two or more sequences that have a percentage of sequential units that are the same when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using comparison algorithms or by manual alignment and visual inspection. By way of example only, two or more sequences may be "substantially identical" if the sequential units are about 60% identical, about 65% identical, about 70% identical, about 75% identical, about 80% identical, about 85% identical, about 90% identical, or about 95% identical over a specified region. Such percentages describe the "percent identity" of two or more sequences. The identity of a sequence can exist over a region that is at least about 75-100 sequential units in length, over a region that is about 50 sequential units in length, or, where not specified, across the entire sequence. This definition also refers to the complement of a test sequence.
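
The percent-identity arithmetic described above can be sketched in Python as follows. This is a simplified illustration rather than the comparison algorithm contemplated by the disclosure: it assumes the two sequences have already been aligned to equal length without gaps, uses 0-based half-open window coordinates, and the names `percent_identity` and `substantially_identical` and the default 60% threshold are hypothetical choices taken from the lowest example value in the text.

```python
# Illustrative sketch only; assumes pre-aligned, gap-free sequences of equal length.

def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent of positions in the window [start, end) at which both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    end = len(seq_a) if end is None else end
    if not (0 <= start < end <= len(seq_a)):
        raise ValueError("comparison window must lie within the sequences")
    matches = sum(a == b for a, b in zip(seq_a[start:end], seq_b[start:end]))
    return 100.0 * matches / (end - start)


def substantially_identical(seq_a, seq_b, threshold=60.0, start=0, end=None):
    """True if percent identity over the window meets the chosen threshold."""
    return percent_identity(seq_a, seq_b, start, end) >= threshold
```

For example, two 10-unit sequences that differ at their last two positions are 80% identical over the whole sequence and 100% identical over a window covering only the first eight positions, which shows why the comparison region must be specified. Gapped alignments require a dedicated alignment tool, which this sketch does not attempt to replace.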

Alternatively, substantial identity or similarity exists when a nucleic acid or fragment thereof hybridizes to another nucleic acid, to a strand of another nucleic acid, or to the complementary strand thereof, under stringent hybridization conditions. "Stringent hybridization conditions" and "stringent wash conditions" in the context of nucleic acid hybridization experiments depend upon a number of different physical parameters. Nucleic acid hybridization will be affected by such conditions as salt concentration, temperature, solvents, the base composition of the hybridizing species, the length of the complementary regions, and the number of nucleotide base mismatches between the hybridizing nucleic acids, as will be readily appreciated by those skilled in the art. One of ordinary skill knows how to vary these parameters to achieve a particular stringency of hybridization.

Lipid nanoparticle (LNP)

As used herein, the term "lipid nanoparticle" or LNP refers to a type of lipid particle delivery system formed of small solid or semi-solid particles possessing an exterior lipid layer with a hydrophilic exterior surface that is exposed to the non-LNP environment; an interior space, which may be aqueous (vesicle-like) or non-aqueous (micelle-like); and at least one hydrophobic inter-membrane space. LNP membranes may be lamellar or non-lamellar and may comprise 1, 2, 3, 4, 5, or more layers. In some embodiments, LNPs may comprise a nucleic acid (e.g., an engineered retron) in their interior space, in the inter-membrane space, on their exterior surface, or any combination thereof. In some embodiments, an LNP of the present disclosure comprises an ionizable lipid, a structural lipid, a PEGylated lipid (also known as a PEG lipid), and a phospholipid. In alternative embodiments, an LNP comprises an ionizable lipid, a structural lipid, a PEGylated lipid (also known as a PEG lipid), and a zwitterionic amino acid lipid.

Further discussion of liposomes can be found, for example, in Tenchov et al., "Lipid Nanoparticles – From Liposomes to mRNA Vaccine Delivery, a Landscape of Diversity and Advancement," ACS Nano, 2021, 15, pp. 16982-17015 (the contents of which are incorporated by reference).

Linker

As used herein, the term "linker" refers to a molecule that links or joins two other molecules or moieties. Where a linker joins two fused proteins, the linker may be an amino acid sequence. For example, an RNA-guided nuclease (e.g., Cas12a) may be fused to a retron reverse transcriptase by an amino acid linker sequence. Where two nucleotide sequences are joined together, the linker may also be a nucleotide sequence. For example, in the present case, the ncRNA may be linked at its 5' and/or 3' end to one or more guide RNAs by a nucleotide sequence linker. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

Liposome

As used herein, the term "liposome" refers to a class of lipid particle delivery systems comprising small vesicles that contain at least one lipid membrane surrounding an aqueous intra-nanoparticle space, which is generally not derived from a progenitor/host cell. Liposomes are a versatile carrier platform because they can transport hydrophobic or hydrophilic molecules, including small molecules, proteins, and nucleic acids, into cells. They were the earliest generation of nanoscale drug delivery platforms to be developed. A number of liposomal drug formulations have been approved for use in human medicine, such as Doxil, a lipid nanoparticle formulation of the antitumor agent doxorubicin. Further discussion of liposomes can be found, for example, in Tenchov et al., "Lipid Nanoparticles – From Liposomes to mRNA Vaccine Delivery, a Landscape of Diversity and Advancement," ACS Nano, 2021, 15, pp. 16982-17015 (the contents of which are incorporated by reference).

As used herein, the term "loop" in a polynucleotide refers to a single-stranded stretch of one or more nucleotides, such as 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides, wherein the 5'-most nucleotide and the 3'-most nucleotide of the loop are each linked to a base-paired nucleotide in a stem.

Micelle

As used herein, the term "micelle" refers to a small particle that does not have an aqueous intra-particle space.

Nanoparticle

As used herein, the term "nanoparticle" refers to any nanoscale particle, typically ranging in size from about 1 nm to about 1000 nm.

Nuclear localization sequence (NLS)

As used herein, the term "nuclear localization sequence" or "NLS" refers to an amino acid sequence that promotes import of a protein (e.g., an RNA-guided nuclease) into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art. For example, NLS sequences are described in International PCT Application PCT/EP2000/011690 of Plank et al., filed November 23, 2000 and published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.

Nucleic acid

As used herein, the term "nucleic acid," "nucleic acid molecule," "nucleic acid sequence," or "polynucleotide" generally refers to a deoxyribonucleic acid or ribonucleic acid oligonucleotide in either single- or double-stranded form. The term may encompass oligonucleotides containing known analogues of natural nucleotides. The term may also encompass nucleic acid-like structures with synthetic backbones; see, e.g., Eckstein, 1991; Baserga et al., 1992; Milligan, 1993; WO 97/03211; WO 96/39154; Mata, 1997; Strauss-Soukup, 1997; and Samstag, 1996. The term encompasses both ribonucleic acid (RNA) and DNA, including cDNA (including RT-DNA), genomic DNA, synthetic (e.g., chemically synthesized) DNA, and/or DNA (or RNA) containing nucleic acid analogs. The nucleotides adenine (A), thymine (T), guanine (G), and cytosine (C) may or may not also encompass nucleotide modifications, e.g., methylated and/or hydroxylated nucleotides; for example, cytosine (C) encompasses 5-methylcytosine and 5-hydroxymethylcytosine.

Nucleic acid-guided nuclease

As used herein, the term "nucleic acid-guided nuclease" or "nucleic acid-guided endonuclease" refers to a nuclease that associates covalently or non-covalently with a guide nucleic acid (e.g., a guide RNA or a guide DNA), thereby forming a complex between the guide nucleic acid and the nucleic acid-guided nuclease. The guide nucleic acid comprises a spacer sequence, which comprises a nucleotide sequence having complementarity with a strand of a target DNA sequence. Thus, the nucleic acid-guided nuclease is indirectly guided or programmed to localize to a specific site in a DNA molecule through its association with the guide nucleic acid, which directly binds or anneals to a strand of the target DNA through its region of complementarity via Watson-Crick base pairing. In some embodiments, the nucleic acid-guided nuclease will include DNA-binding activity (e.g., as in the case of CRISPR Cas9). Most commonly, a nucleic acid-guided nuclease is programmed by association with a guide RNA molecule, and in such cases the nuclease may be referred to as an "RNA-guided nuclease." When programmed by a guide DNA, the nuclease may be referred to as a "DNA-guided nuclease." Nucleic acid-guided, RNA-guided, or DNA-guided nucleases may also be referred to as "programmable nucleases," a term that also includes other classes of programmable nucleases that associate with a specific DNA sequence through amino acid/nucleotide sequence recognition (e.g., zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs)) rather than through a guide RNA.

In addition, any nuclease contemplated herein may also be engineered to remove, inactivate, or otherwise eliminate one or more nuclease activities (e.g., by introducing a nuclease-inactivating mutation in an active site of the nuclease). A nuclease that has been modified to remove, inactivate, or otherwise eliminate all nuclease activity may be referred to as a "dead" nuclease. A dead nuclease cannot cut either strand of a double-stranded DNA molecule. A nuclease that has been modified to remove, inactivate, or otherwise eliminate at least one nuclease activity, but that still retains at least one nuclease activity, may be referred to as a "nickase" nuclease. A nickase nuclease cuts one strand of a double-stranded DNA molecule, but not both strands. For example, CRISPR Cas9 naturally comprises two distinct nuclease domains, namely, the HNH domain and the RuvC domain. The HNH domain cuts the DNA strand bound to the guide RNA (the target strand), and the RuvC domain cuts the protospacer strand (the non-target strand). A nickase Cas9 can be obtained by inactivating either the HNH domain or the RuvC domain. A dead Cas9 can be obtained by inactivating both the HNH domain and the RuvC domain. Likewise, other RNA-guided nucleases can be converted into nickases and/or dead nucleases by inactivating one or more of their existing nuclease domains.
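The Cas9 example above can be restated as a small lookup: which strands are still cut depends on which nuclease domains remain active. The following Python sketch is illustrative only. The function name `classify_cas9` and the labels it returns are hypothetical and are not part of the disclosure; the strand assignment simply follows the text above (HNH cuts the strand bound to the guide RNA, RuvC cuts the protospacer strand).

```python
# Illustrative sketch only: names and labels are hypothetical, not from the disclosure.
# Strand assignment follows the text: HNH cuts the guide-bound strand,
# RuvC cuts the protospacer strand.
CAS9_DOMAIN_STRAND = {
    "HNH": "guide-bound strand",
    "RuvC": "protospacer strand",
}


def classify_cas9(inactivated_domains):
    """Return (kind, strands_cut) for a Cas9 with the given domains inactivated."""
    inactivated = set(inactivated_domains)
    unknown = inactivated - set(CAS9_DOMAIN_STRAND)
    if unknown:
        raise ValueError(f"unknown nuclease domain(s): {sorted(unknown)}")

    # Strands that are still cut are those whose domain remains active.
    strands_cut = sorted(
        strand
        for domain, strand in CAS9_DOMAIN_STRAND.items()
        if domain not in inactivated
    )

    if len(strands_cut) == 2:
        kind = "nuclease"  # both domains active: double-strand cut
    elif len(strands_cut) == 1:
        kind = "nickase"   # one domain active: single-strand nick
    else:
        kind = "dead"      # no domain active: no cutting
    return kind, strands_cut


if __name__ == "__main__":
    for domains in ([], ["HNH"], ["RuvC"], ["HNH", "RuvC"]):
        print(domains, "->", classify_cas9(domains))
```

The same pattern would extend to other RNA-guided nucleases by swapping in their own domain-to-strand table, which is why the mapping is kept as data rather than hard-coded in the branches.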

Operably linked

As used herein, the term "operably linked" or "under transcriptional control," when used in conjunction with the description of a promoter, refers to the correct location and orientation in relation to a polynucleotide (e.g., a coding sequence) to control the initiation of transcription by RNA polymerase and the expression of the coding sequence, such as the coding sequence of an msr gene, an msd gene, and/or a ret gene. Other transcriptional control regulatory elements (e.g., enhancer sequences, transcription factor binding sites) are also operably linked to a gene if their location relative to the gene controls or regulates expression of the gene.

Programmable nuclease

As used herein, the term "programmable nuclease" is intended to refer to a polypeptide that, by virtue of one or more targeting functions, has the property of selectively localizing to a specific desired nucleotide sequence target (e.g., a specific gene target) in a nucleic acid molecule. Such targeting functions can include one or more DNA-binding domains, such as the zinc finger domains characteristic of many different types of DNA-binding proteins or the TALE domains characteristic of TALEN proteins. Such targeting functions can also include the ability to associate and/or form a complex with a guide RNA, which then localizes to a specific site on the DNA that bears a sequence complementary to a portion of the guide RNA (i.e., the spacer of the guide RNA). In some embodiments, a programmable nuclease may be a single protein comprising both a domain that binds a target DNA site directly (e.g., a ZF protein) or indirectly (e.g., an RNA-guided protein) and a nuclease domain. In other embodiments, a programmable nuclease may be a complex of two or more separate proteins or domains (from different proteins) that together provide the requisite functions of selective DNA binding and nuclease activity. For example, a programmable nuclease may comprise (a) a nuclease-inactive RNA-guided nuclease (which can still bind a guide RNA, localize to target DNA, and bind the target DNA, but cannot cut or nick either strand), fused to (b) a nuclease protein or domain, such as the FokI nuclease.

Promoter

As used herein, the term "promoter" is art-recognized and refers to a nucleic acid molecule that has a sequence recognized by the cellular transcription machinery and is able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is active only in the presence of a specific condition. For example, a conditional promoter may be active only in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. Within the promoter sequence will be found a transcription initiation site, as well as protein-binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain a "TATA" box and a "CAT" box. Various promoters, including inducible promoters, may be used to drive expression from the various vectors of the present disclosure.

Recombinant nucleic acid

"Recombinant nucleic acid" or "recombinant nucleotide" refers to a molecule that is constructed by joining nucleic acid molecules and that, optionally, can self-replicate in a living cell.

Retron

As used herein, the term "retron" refers to a specific type of naturally occurring, distinct DNA sequence found in the genomes of many bacteria, which typically encodes three distinct components, namely, (a) a non-coding RNA ("ncRNA") comprising contiguous inverted sequences (msr and msd); (b) a reverse transcriptase (RT)-encoding gene (ret); and (c) in many cases, a retron-associated gene of unknown function. Retrons are defined in particular by their unique ability to produce a satellite DNA known as msDNA (multicopy single-stranded DNA). The ncRNA (comprising the msr and msd elements) and the ret gene are transcribed as a single polycistronic RNA transcript, which is processed into an ncRNA transcript and a transcript encoding the ret gene. The ncRNA then folds into a specific secondary structure. Once translated, the RT binds the folded ncRNA and reverse transcribes the msd region to form a single strand of cDNA (the msDNA), which remains attached to the RNA template through a covalent 2'-5' phosphodiester bond and through base pairing between the msDNA and the 3' end of the RNA template. See Figure 1A, which provides a schematic of the production of msDNA from a naturally occurring retron.

Retron components

As used herein, the term "retron components" refers to the distinct elements or features of a retron, namely, (a) a non-coding RNA ("ncRNA") comprising contiguous inverted sequences (msr and msd); (b) a reverse transcriptase (RT)-encoding gene (ret); and (c) in many cases, a retron-associated gene of unknown function.

RNA-guided nuclease

As used herein, an "RNA-guided nuclease" is a type of "programmable nuclease" and a specific type of "nucleic acid-guided nuclease." As used herein, the term "RNA-guided nuclease" or "RNA-guided endonuclease" refers to a nuclease that associates covalently or non-covalently with a guide RNA, thereby forming a complex between the guide RNA and the RNA-guided nuclease. The guide RNA comprises a spacer sequence, which comprises a nucleotide sequence having complementarity with a strand of a target DNA sequence. Thus, the RNA-guided nuclease is indirectly guided or programmed to localize to a specific site in a DNA molecule through its association with the guide RNA, which directly binds or anneals to a strand of the target DNA through its region of complementarity via Watson-Crick base pairing.

Sequence identity

As used herein, the term "sequence identity" refers to the overall relatedness between polymeric molecules, e.g., between polynucleotide molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. For example, calculation of the percent identity of two polynucleotide sequences can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second nucleic acid sequence for optimal alignment, and non-identical sequences can be disregarded for comparison purposes). For example, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the length of the reference sequence. The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, that must be introduced for optimal alignment of the two sequences.

The comparison of sequences and the determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percent identity between two nucleotide sequences can be determined using methods such as those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; each of which is incorporated herein by reference. For example, the percent identity between two nucleotide sequences can be determined using the algorithm of Meyers and Miller (CABIOS, 1989, 4:11-17), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. Alternatively, the percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package with an NWSgapdna.CMP matrix. Methods commonly employed to determine percent identity between sequences include, but are not limited to, those disclosed in Carrillo, H., and Lipman, D., SIAM J Applied Math., 48:1073 (1988), incorporated herein by reference. Techniques for determining identity are codified in publicly available computer programs. Exemplary computer software for determining homology between two sequences includes, but is not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research, 12(1), 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S. F., et al., J. Molec. Biol., 215, 403 (1990)).
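The position-by-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the ALIGN, GAP, or BLAST implementation: it assumes the two sequences have already been aligned (gaps written as `-`), and it takes the number of aligned columns, gap columns included, as the denominator. That denominator is only one of several conventions in use, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption (not from the source): the denominator is the number of
    alignment columns, so every gap column lowers the score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Drop columns where both rows hold a gap; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")

    # A column is identical when both rows hold the same non-gap residue.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # One gap in an 11-column alignment: 10 identical positions out of 11.
    print(round(percent_identity("GTCAATGGACC", "GTCA-TGGACC"), 2))
```

Producing the optimal alignment itself, with its gap penalties and scoring matrix, is the job of the programs cited above; the sketch covers only the final counting step.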

It should be noted that where the present disclosure (anywhere in this specification, including Table A and the Examples) refers to a polypeptide having a percent identity to another amino acid sequence (a reference amino acid sequence), such as a polypeptide that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identical to the reference amino acid sequence, it is advantageous that, in the polypeptide having that percent identity, the conserved regions of the reference amino acid sequence are retained (e.g., regions conserved when compared with other retron RTs, such as those identified herein), and/or that the polypeptide has at least one activity selected from reverse transcriptase activity, endonuclease activity, endoribonuclease activity, and RNA-guided DNase activity, and/or that the polypeptide comprises: a. one or more α-helical recognition lobes (REC) and nuclease lobes (NUC); b. wedge (WED), α-helical recognition lobe (REC), PAM-interacting (PI), RuvC nuclease, bridge helix (BH), and NUC domains; or c. one or more domains selected from RuvC, REC, WED, BH, PI, and NUC domains, and/or that the polypeptide optionally recognizes or binds a guide RNA or ncRNA.

Likewise, where the present disclosure refers to a nucleic acid sequence or molecule having a percent identity to another nucleic acid sequence or molecule (a reference nucleic acid sequence), such as a nucleic acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identical to the reference nucleic acid sequence, it is advantageous that, in the nucleic acid sequence having that percent identity, the conserved regions of the reference nucleic acid sequence are retained (e.g., regions conserved when compared with other retron RTs, such as those identified herein), and/or that the polypeptide expressed from the nucleic acid sequence having that percent identity contains conserved regions (e.g., regions conserved when compared with other retron sequences, such as those identified herein), and/or that the polypeptide has at least one activity selected from reverse transcriptase activity, endonuclease activity, endoribonuclease activity, and RNA-guided DNase activity, and/or that the polypeptide comprises: a. one or more α-helical recognition lobes (REC) and nuclease lobes (NUC); b. wedge (WED), α-helical recognition lobe (REC), PAM-interacting (PI), RuvC nuclease, bridge helix (BH), and NUC domains; or c. one or more domains selected from RuvC, REC, WED, BH, PI, and NUC domains, and/or that the polypeptide recognizes or binds a guide RNA.

Subject

As used herein, the term "subject" refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development. The terms "individual," "subject," "host," and "patient" are used interchangeably herein.

As used herein, the term "stem" refers to two or more base pairs, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more base pairs, formed by inverted repeat sequences connected at a "tip," wherein the more 5' or "upstream" strand of the stem bends back to allow the more 3' or "downstream" strand to base-pair with the upstream strand. The number of base pairs in a stem is the "length" of the stem. The tip of the stem is typically at least 3 nucleotides, but can be 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides. A larger tip with more than 5 nucleotides is also referred to as a "loop." An otherwise continuous stem may be interrupted by one or more bulges as defined herein. The number of unpaired nucleotides in a bulge is not included in the length of the stem. The position of the bulge closest to the tip can be described by the number of base pairs between the bulge and the tip (e.g., the bulge is 4 bp from the tip). The position of any other bulge further from the tip can be described by the number of base pairs in the stem between that bulge and the tip, excluding any unpaired bases of other bulges in between.

Synthetic or artificial nucleic acid

"Synthetic or artificial nucleic acid" refers to a nucleic acid whose sequence does not occur in nature. Such a sequence does not originate from any living organism, or is not known to be present in any living organism (e.g., based on a sequence search of existing sequence databases). Recombinant nucleic acids and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing. The engineered nucleic acid constructs of the present disclosure (such as the engineered retrons described herein) may be encoded by a single molecule (e.g., encoded by or present on the same plasmid or other suitable vector) or by multiple different molecules (e.g., multiple independently replicating vectors).

Target site

As used herein, a "target site" is a polynucleotide (e.g., DNA, such as genomic DNA) that includes a site or specific locus ("target site" or "target sequence") targeted by a recombinant retron genome modification system disclosed herein. In the context of a retron genome modification system disclosed herein that comprises an RNA-guided nuclease, the target sequence is the sequence to which the guide sequence of a guide nucleic acid (e.g., a guide RNA) will hybridize. For example, the target site (or target sequence) 5'-GTCAATGGACC-3' (SEQ ID NO:19933) within a target nucleic acid is targeted by (or bound by, or hybridizes with, or is complementary to) the sequence 5'-GGTCCATTGAC-3' (SEQ ID NO:19934). Suitable hybridization conditions include the physiological conditions normally present in a cell. For a double-stranded target nucleic acid, the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the "complementary strand" or "target strand," whereas the strand of the target nucleic acid that is complementary to the "target strand" (and is therefore not complementary to the guide RNA) is referred to as the "non-target strand" or "non-complementary strand."
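The relationship between the two example sequences above (SEQ ID NO:19933 and SEQ ID NO:19934) is that each is the reverse complement of the other, which is what allows them to hybridize in antiparallel orientation. The short Python sketch below checks this under plain Watson-Crick pairing in the DNA alphabet; the helper name `reverse_complement` is hypothetical and is not drawn from the disclosure.

```python
# Watson-Crick complement table for the DNA alphabet (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq_5_to_3: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq_5_to_3.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    target_site = "GTCAATGGACC"        # 5'->3', SEQ ID NO:19933 in the text
    targeting_sequence = "GGTCCATTGAC" # 5'->3', SEQ ID NO:19934 in the text

    # The targeting sequence pairs with the target site base by base
    # when the two strands are antiparallel.
    print(reverse_complement(target_site) == targeting_sequence)
```

An actual guide RNA spacer would carry U in place of T; the sequences are kept in the DNA alphabet here because that is how the text writes them.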

Treatment

As used herein, the terms "treatment", "treat", and "treating" refer to a clinical intervention intended to reverse or alleviate a disease or disorder or one or more symptoms thereof, to delay its onset, or to inhibit its progression, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, for example to prevent or delay the onset of symptoms or to inhibit the onset or progression of a disease. For example, treatment may be administered to a susceptible individual before the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example to prevent or delay their recurrence.

Upstream and Downstream

As used herein, the terms "upstream" and "downstream" are relative terms that define the linear position of at least two elements located in a nucleic acid molecule (single-stranded or double-stranded) oriented in the 5'-to-3' direction. A first element is said to be upstream of a second element in a nucleic acid molecule in which the first element is located somewhere 5' of the second element. Conversely, a first element is downstream of a second element in a nucleic acid molecule in which the first element is located somewhere 3' of the second element.

Variants

As used herein, the term "variant" should be understood to mean exhibiting qualities that have a pattern deviating from what occurs in nature. For example, a variant retron RT is a retron RT comprising one or more changes in amino acid residues as compared to a wild-type retron RT amino acid sequence. The term "variant" encompasses homologous proteins having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with a reference sequence and having one or more functional activities that are the same or substantially the same as those of the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence that exhibit one or more functional activities that are the same or substantially the same as those of the reference sequence.

Vector

As used herein, the term "vector" refers to a means that permits or facilitates the transfer of a polynucleotide from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment (e.g., an engineered retron of the present invention) can be inserted so as to bring about replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. The term "vector" can include cloning and expression vectors, as well as viral vectors and integrating vectors.

Wild type

As used herein, the term "wild type" is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene, protein, or characteristic as it occurs in nature, as distinguished from mutant or variant forms.

The present disclosure provides systems, methods, and compositions for precise genome editing, including installing nucleic acid insertions, substitutions, and deletions at targeted and precise genomic sites, wherein the systems, methods, and compositions are based on novel and/or modified retrons or components thereof, such as modified forms of the retron RTs of Table X, modified forms of the ncRNAs of Table A, and modified forms of the RTs of Table B.

In one aspect, the present disclosure provides recombinant retrons comprising one or more genetic modifications that improve the function and/or properties of the retron. Such genetic modifications may include mutations, insertions, deletions, inversions, replacements, substitutions, or translocations of one or more contiguous or non-contiguous nucleobases in a nucleic acid molecule encoding a retron or a retron component (such as an ncRNA or a reverse transcriptase). In various aspects, the retron that is modified by the one or more genetic modifications (i.e., the "pre-modified" or "unmodified" retron or retron component) is a naturally occurring retron or retron component (e.g., a naturally occurring ncRNA or RT of Table A) that is capable of promoting homology-dependent recombination (HDR) in a cell, thereby resulting in a relative increase in the concentration or amount of msDNA comprising a DNA donor template. In particular embodiments, the recombinant retron is based on and/or derived from a naturally occurring retron, such as any retron-related sequence provided in Table X (by introducing one or more genetic modifications into the collection of 7257 previously unknown retrons discovered by the computational methods described herein; see, e.g., the Examples). In other embodiments, the recombinant retron is based on introducing one or more genetic modifications into a previously available retron sequence (e.g., Mestre et al., "Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of the Encoded Tripartite Systems", Nucleic Acids Research, Vol. 48, No. 22, 16 December 2020, pp. 12632-12647, incorporated herein by reference) to obtain recombinant retrons having an enhanced ability to produce increased concentrations or amounts of msDNA comprising a DNA donor template.

In another aspect, the present disclosure further provides nucleic acid molecules encoding recombinant retrons and/or recombinant retron components (e.g., a recombinant ncRNA and/or a recombinant retron RT). In yet another aspect, the present disclosure provides genome editing systems comprising a recombinant retron component (e.g., a recombinant ncRNA and/or a recombinant RT), a programmable nuclease (e.g., RNA-guided nucleases, such as CRISPR-Cas proteins, ZFPs, and TALENs), and a guide RNA (where an RNA-guided nuclease is used in the genome editing system). In another aspect, the present disclosure provides nucleic acid molecules encoding the genome editing systems and their components, as well as polypeptides constituting components of the genome editing systems. In another aspect, the present disclosure provides vectors for transferring and/or expressing the genome editing systems, e.g., under in vitro, ex vivo, and in vivo conditions. In another aspect, the present disclosure provides cell delivery compositions and methods, including compositions for passive and/or active transport into cells (e.g., plasmids), delivery by virus-based recombinant vectors (e.g., AAV and/or lentiviral vectors), delivery by non-virus-based systems (e.g., liposomes and LNPs), and delivery by virus-like particles. Depending on the delivery system employed, the retron-based genome editing systems described herein can be delivered in the form of DNA (e.g., plasmids or DNA-based viral vectors), RNA (e.g., ncRNA and mRNA delivered by LNPs), mixtures of DNA and RNA, proteins (e.g., virus-like particles), and ribonucleoprotein (RNP) complexes. Any suitable combination of methods for delivering the components of the retron-based genome editing systems disclosed herein can be employed. In one embodiment, each component of the retron-based genome editing system is delivered by an all-RNA system, e.g., one or more RNA molecules (e.g., mRNA and/or ncRNA) are delivered by one or more LNPs, wherein the one or more RNA molecules form the ncRNA and the guide RNA (as needed) and/or are translated into the polypeptide components (e.g., the RT and the programmable nuclease). In another aspect, the present disclosure provides methods of genome editing by introducing a retron-based genome editing system described herein into a cell comprising a target edit site (e.g., under in vitro, in vivo, or ex vivo conditions), thereby resulting in an edit at the target edit site. In other aspects, the present disclosure provides formulations comprising any of the foregoing components for delivery to cells and/or tissues, including in vitro, in vivo, and ex vivo delivery; recombinant cells and/or tissues modified by the recombinant retron-based genome modification systems and methods described herein; and methods of modifying cells by genome editing and related DNA donor-dependent methods (such as recombineering or cell recording) using the retron-based genome modification systems disclosed herein. The present disclosure also provides methods of making the recombinant retrons, retron-based genome modification systems, vectors, compositions, and formulations described herein, as well as pharmaceutical compositions and kits for modifying cells under in vitro, in vivo, and ex vivo conditions, which comprise the genome editing and/or modification systems disclosed herein.

Described herein are engineered retrons comprising one or more heterologous nucleic acids. The one or more heterologous nucleic acids can be inserted, for example, at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus. In some embodiments, the engineered retron has structural improvements over its naturally occurring counterpart or wild-type retron, at least in the encoded ncRNA and/or reverse transcriptase (RT), such that the engineered retron or its encoded ncRNA, when delivered to a host cell (such as a mammalian host cell), exhibits multiple functional improvements over the corresponding naturally occurring/wild-type retron elements.

Exemplary, non-limiting functional improvements may include any one or more of the features described herein. For example, in some embodiments, the engineered retron may comprise a sequence modification (e.g., insertion, deletion, and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus that: i) modulates (e.g., enhances) the reverse transcription, processivity, accuracy/fidelity, and/or production of msDNA (e.g., in mammalian cells); ii) modulates (e.g., reduces) the immunogenicity, in a host (e.g., a host comprising mammalian cells), of the ncRNA encoded by the engineered retron (e.g., encoded by the msr locus and/or the msd locus); iii) comprises a nucleotide sequence that modulates (e.g., inhibits or antagonizes) msDNA function; and/or iv) modulates (e.g., improves) the efficiency of targeted genome engineering.

Thus, in general, an engineered retron is an engineered nucleic acid construct comprising: a) a first polynucleotide encoding a non-coding RNA (ncRNA), the first polynucleotide comprising: i) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and ii) an msd locus encoding the msd RNA portion of the msDNA; and b) one or more heterologous nucleic acids inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus.
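
The four insertion positions named above can be pictured with a small data model. The following Python sketch is purely illustrative: it assumes a simplified linear layout in which the msr locus is followed directly by a non-overlapping msd locus, and the names `InsertSite`, `EngineeredRetron`, and `msd_offset` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class InsertSite(Enum):
    """The four insertion positions listed in the text."""
    WITHIN_MSD = "at or within the msd locus"
    UPSTREAM_OF_MSR = "upstream of the msr locus"
    UPSTREAM_OF_MSD = "upstream of the msd locus"
    DOWNSTREAM_OF_MSD = "downstream of the msd locus"


@dataclass(frozen=True)
class EngineeredRetron:
    msr: str             # msr locus sequence (hypothetical placeholder)
    msd: str             # msd locus sequence (hypothetical placeholder)
    insert: str          # heterologous nucleic acid
    site: InsertSite
    msd_offset: int = 0  # only used when site is WITHIN_MSD

    def ncrna_coding_sequence(self) -> str:
        """Assemble the ncRNA-coding sequence under the assumed
        simplified layout: msr followed by msd, no overlap."""
        if self.site is InsertSite.UPSTREAM_OF_MSR:
            return self.insert + self.msr + self.msd
        if self.site is InsertSite.UPSTREAM_OF_MSD:
            return self.msr + self.insert + self.msd
        if self.site is InsertSite.DOWNSTREAM_OF_MSD:
            return self.msr + self.msd + self.insert
        if not 0 <= self.msd_offset <= len(self.msd):
            raise ValueError("msd_offset must fall inside the msd locus")
        head, tail = self.msd[:self.msd_offset], self.msd[self.msd_offset:]
        return self.msr + head + self.insert + tail


retron = EngineeredRetron("AAAA", "CCCC", "GG", InsertSite.WITHIN_MSD, 2)
assert retron.ncrna_coding_sequence() == "AAAACCGGCC"
```

Real retron loci are more intricate than this linear model, so the sketch is only a way to keep the four named positions distinct.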

The engineered nucleic acid construct (e.g., engineered retron) can further comprise a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA.

In certain embodiments, the engineered retron of the present invention encodes a reverse transcriptase (RT) or a functional domain thereof comprising: i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or ii) a polypeptide listed in Table C. In some embodiments, the RT does not comprise a polypeptide listed in Table X.

In certain embodiments, the engineered retron of the present invention encodes a reverse transcriptase (RT) or a functional domain thereof comprising: i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or ii) a consensus polynucleotide sequence listed in Table C. In some embodiments, the polynucleotide encoding the RT does not comprise a polynucleotide of Table X.

In certain embodiments, the engineered retron of the present invention encodes an ncRNA comprising: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA in Table B.

In certain embodiments, the engineered retron of the present invention encodes an ncRNA and a reverse transcriptase (RT) or a functional domain thereof, wherein the ncRNA and the RT or functional domain thereof are as described above.

In particular, in such embodiments, the ncRNA may comprise: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA listed in Table B.

Also in such embodiments, the reverse transcriptase (RT) or functional domain thereof comprises: (A) i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or ii) a polypeptide listed in Table C; optionally, the RT does not comprise a polypeptide listed in Table X; or (B) i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polynucleotide in Table A; and/or optionally, the polynucleotide encoding the RT does not comprise a polynucleotide in Table X.
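
To make the tiered identity thresholds above concrete, the following Python sketch computes percent identity for two sequences that have already been aligned to equal length and reports the highest listed threshold that is met. It assumes identity is counted as identical, non-gap columns divided by total alignment columns; the text does not fix a particular definition or alignment method, and the names `percent_identity` and `highest_threshold_met` are illustrative.

```python
# Thresholds as enumerated in the text, in percent.
THRESHOLDS = [50, 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96,
              97, 98, 99, 99.1, 99.2, 99.3, 99.4, 99.5, 99.6, 99.7, 99.8, 99.9]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed definition: identical non-gap columns / alignment columns.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def highest_threshold_met(identity: float):
    """Return the largest listed threshold satisfied, or None if below 50%."""
    return max((t for t in THRESHOLDS if identity >= t), default=None)


# Hypothetical 8-residue peptides differing at one position (7/8 identical).
identity = percent_identity("MKTAYIAK", "MKTAYLAK")
assert identity == 87.5
assert highest_threshold_met(identity) == 85
```

In practice the alignment itself would come from a dedicated alignment tool, and different tools normalize identity differently, so the numbers here only illustrate how the tiers nest.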

In certain embodiments, the engineered nucleic acid construct comprises: 1) an msr locus, which encodes the msr RNA portion of the msDNA; 2) an msd locus, which encodes the msd RNA portion of the msDNA; 3) a sequence encoding a retron reverse transcriptase (RT), wherein the msd RNA is capable of being reverse transcribed by the retron RT to form the msDNA; and 4) a heterologous nucleic acid inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus; wherein the engineered nucleic acid construct is engineered based on and/or to resemble the secondary structure of a wild-type or consensus retron encoding a wild-type or consensus retron ncRNA encompassed by: a) any one of the SEQ ID NOs of Table B and/or any one of the sequences and/or structures depicted in Figures 2-27; or b) a variant of a) having: i) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes for every 10 red-letter nucleotides; ii) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes for every 10 black-letter nucleotides; and/or iii) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes for every 10 grey-letter nucleotides; and/or optionally further comprising: i) the presence of 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides for every 10 red-circle nucleotides; ii) the presence of 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides for every 10 black-circle nucleotides; iii) the presence of 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides for every 10 grey-circle nucleotides; and/or iv) the presence of 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides for every 10 white-circle nucleotides; wherein the ncRNA does not include an ncRNA related to a sequence of Table X.

The engineered nucleic acid construct (e.g., engineered retron) can comprise one or more sequence modifications (e.g., insertion, deletion, and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus that: a) modulate (e.g., enhance) the reverse transcription, processivity, accuracy/fidelity, and/or production of msDNA (e.g., in mammalian cells); b) modulate (e.g., reduce) the immunogenicity, in a host (e.g., a host comprising mammalian cells), of the ncRNA encoded by the engineered retron (e.g., by the msr locus and/or the msd locus); c) modulate (e.g., permanently or transiently inhibit) the function of the msDNA; and/or d) modulate (e.g., improve) the efficiency of targeted genome editing/engineering.

In some embodiments, the engineered nucleic acid construct (e.g., engineered retron) is engineered based on and/or to resemble the secondary structure of a wild-type or consensus retron encoding a wild-type or consensus retron ncRNA encompassed by: a) any one of the Table B ncRNA sequences and/or the structures depicted in any of Figures 2-27; or b) a variant of a) having: i) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes for every 10 red-letter nucleotides; ii) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes for every 10 black-letter nucleotides; and/or iii) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes for every 10 grey-letter nucleotides; and/or optionally further comprising: i) the presence of 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides for every 10 red-circle nucleotides; ii) the presence of 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides for every 10 black-circle nucleotides; iii) the presence of 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides for every 10 grey-circle nucleotides; and/or iv) the presence of 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides for every 10 white-circle nucleotides.
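
The per-class change limits above lend themselves to a simple check. The following Python sketch is one possible reading, not the method of the disclosure: it assumes the color class of each position in the figures is supplied as a label string ("R" red, "B" black, "G" grey), that the variant differs from the reference by substitutions only, and that "changes for every 10 nucleotides" is applied as a proportion over all positions of a class. The default limits use the loosest value of each listed range (3, 6, 9); the name `within_tolerance` is hypothetical.

```python
from collections import Counter

# Loosest limit from each range in the text: changes allowed per 10 positions.
MAX_CHANGES_PER_10 = {"R": 3, "B": 6, "G": 9}


def within_tolerance(reference: str, variant: str, classes: str,
                     limits=MAX_CHANGES_PER_10) -> bool:
    """Check substitution counts per color class against per-10 limits.

    `classes` carries one label per position; every label must have an
    entry in `limits`.
    """
    if not (len(reference) == len(variant) == len(classes)):
        raise ValueError("reference, variant and classes must have equal length")
    totals = Counter(classes)
    changes = Counter(c for r, v, c in zip(reference, variant, classes) if r != v)
    # changes / total <= limit / 10, written without division.
    return all(changes[c] * 10 <= limits[c] * totals[c] for c in totals)


reference = "GGCAUCGAUC"   # hypothetical 10-nt ncRNA fragment
classes = "RRRRRBBBGG"     # hypothetical color labels per position

assert within_tolerance(reference, "GGCAACGAUC", classes)       # 1 change in 5 red
assert not within_tolerance(reference, "GCCAACGAUC", classes)   # 2 changes in 5 red
```

Passing a stricter `limits` mapping, for example `{"R": 1, "B": 2, "G": 4}`, corresponds to the narrower "e.g." values given in the text.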

Another aspect of the present disclosure provides a vector system comprising a vector that comprises an engineered retron described herein.

Another aspect of the present disclosure provides an isolated host cell comprising an engineered retron described herein or a vector system described herein.

Another aspect of the present disclosure provides a pharmaceutical composition comprising an engineered retron described herein or a vector system described herein.

Another aspect of the present disclosure provides a delivery vehicle comprising an engineered retron described herein or an ncRNA encoded by an engineered retron described herein, a vector or vector system described herein, a host cell described herein, or a pharmaceutical composition described herein.

Another aspect of the present disclosure provides a kit comprising an engineered retron described herein or an ncRNA encoded by an engineered retron described herein and, optionally, instructions for genetically modifying a cell using the engineered retron or the ncRNA.

Another aspect of the present disclosure provides a method of modifying a target DNA sequence in a host cell (e.g., a mammalian cell), the method comprising introducing an engineered retron of the present invention, an ncRNA encoded by an engineered retron of the present invention, or a vector/vector system described herein into the host cell (e.g., mammalian cell) to allow production of msDNA in the host cell, wherein at least a portion of the heterologous nucleic acid in the msDNA is integrated at the target DNA sequence in the genome of the host (e.g., mammalian) cell. Optionally, the target sequence is recognized by a suitable nuclease, such as a CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB, or restriction endonuclease (RE), and the nuclease generates a double-strand break (DSB) to promote/facilitate insertion of the heterologous nucleic acid portion into the target sequence. Further optionally, the target sequence, once modified by insertion of the heterologous nucleic acid portion, can no longer be recognized by the nuclease to generate another DSB.
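
The last point, that an edited target is no longer recognized by the nuclease, can be illustrated with a string-level check. The following Python sketch assumes recognition is modeled as an exact match of a recognition sequence on either strand, which is a deliberate simplification of how real nucleases tolerate mismatches; the locus, the insert, and the function names `is_recognized` and `insert_at` are hypothetical.

```python
# Watson-Crick complement table for an unambiguous DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_recognized(locus: str, recognition_seq: str) -> bool:
    """True if the recognition sequence occurs on either strand of the locus
    (assumed exact-match model of nuclease recognition)."""
    return (recognition_seq in locus
            or reverse_complement(recognition_seq) in locus)


def insert_at(locus: str, offset: int, insert: str) -> str:
    """Return the locus with `insert` placed at `offset` (0-based)."""
    return locus[:offset] + insert + locus[offset:]


recognition_seq = "GTCAATGGACC"          # hypothetical recognition sequence
locus = "AAA" + recognition_seq + "TTT"  # hypothetical genomic context

# Before editing, the nuclease can recognize (and cut) the locus.
assert is_recognized(locus, recognition_seq)

# Inserting a heterologous segment inside the site disrupts it,
# so the edited locus is not recognized again.
edited = insert_at(locus, 9, "TTAGG")
assert not is_recognized(edited, recognition_seq)
```

Placing the insert so that it interrupts the recognition sequence is one way to satisfy the optional feature described above; a donor that changes key bases of the site would serve the same purpose.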

Another aspect of the present disclosure provides use of the engineered retrons in the various methods described herein.

Another aspect of the present disclosure provides a genome editing system comprising: a) a nuclease capable of acting on a target site in a genome (e.g., a human genome), such as a CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB, or restriction endonuclease (RE); and b) an engineered retron described herein, or an ncRNA encoded thereby, or a vector or vector system comprising or encoding the engineered retron or ncRNA. Optionally, the nuclease may be linked to one or more elements of the engineered retron or the encoded ncRNA. For example, in one embodiment, the nuclease may be linked (e.g., fused or bound) to the reverse transcriptase of an engineered retron described herein. In another embodiment, the nuclease may be linked/bound so as to form a complex with a guide nucleic acid sequence (such as a single guide RNA for a Cas enzyme), wherein the guide sequence is linked to the ncRNA and/or msDNA of an engineered retron described herein.

Another aspect of the present disclosure provides an enhanced genome editing system comprising a genome editing system of the present disclosure linked to a biomolecule that modulates host DNA repair so as to, for example, modulate (e.g., enhance) the incorporation of a heterologous nucleic acid sequence into a genome (e.g., a human genome).

With the general aspects of the present disclosure described above, specific aspects and embodiments of the disclosure are further described in the following sections. It should be understood that any embodiment of the present disclosure (including those described only in the examples or claims, or only in one of the sections below) can be combined with any one or more additional embodiments of the invention, unless such a combination is expressly disclaimed or improper.

A. Recombinant Retrons

The present disclosure provides engineered retrons, as well as compositions, systems, and methods that include or use engineered retrons for genome modification (such as genome editing, cell recording, and recombineering).

Retrons were first discovered in 1984 in the bacterium Myxococcus xanthus, when a short, multicopy single-stranded DNA (msDNA) was identified as being abundant in bacterial cells. Since then, a variety of naturally occurring retrons have been found in many prokaryotes, such as bacteria.

As depicted in FIG. 1A, a retron is encoded and transcribed as a single RNA comprising a non-coding RNA (ncRNA) portion and a portion encoding a specialized reverse transcriptase (RT). The retron ncRNA (msr-msd) is the precursor of the final hybrid molecule and initially folds into a typical RNA secondary structure that is recognized by the accompanying RT. The translated RT generally recognizes certain secondary structures in the ncRNA and binds the RNA template downstream of the msd region. The RT initiates reverse transcription of the RNA toward its 5' end, starting from the 2' end of a conserved guanosine (G) residue found immediately after a double-stranded RNA structure (the a1/a2 region) within the ncRNA. A portion of the ncRNA serves as the template for reverse transcription, and reverse transcription terminates before reaching the msr locus. During reverse transcription, cellular RNase H degrades the segment of the ncRNA that serves as the template, but not other parts of the ncRNA. The resulting msDNA remains covalently linked to the RNA template via a 2'-5' phosphodiester bond and is base-paired with the RNA template through its 3' end. See FIG. 1A for the general or typical organization of retron coding sequences (including the RT coding sequence and the msr and msd loci) and for the synthesis of msDNA by reverse transcription of the initial ncRNA transcript.

Many retrons also contain an accessory protein (not depicted in FIG. 1A), which may have variable functions that may not be fully understood. In certain embodiments, the engineered retrons described herein do not comprise the accessory protein naturally associated with the wild-type or template retron.

Applicant has discovered and analyzed 7,257 previously unknown retrons from nature, has classified them phylogenetically based on multiple criteria (including sequence homology and conserved predicted secondary structure), and has grouped them into distinct phylogenetic clades based on sequence homology and/or conserved predicted secondary structure. These clades include Type IA_IIA1 (FIG. 2), Type 1B1 (FIG. 3), Type IB2 (FIG. 4), Type 1C (FIG. 5), Other Type IIA1 (FIG. 6), Type IIA2 (FIG. 7), Type IIA3 (FIG. 8), Type IIA4 (FIG. 9), Type IIA5 (FIG. 10), Type IIIA1 (FIG. 11), Type IIIA2 (FIG. 12), Type IIIA3 (FIG. 13), Type IIIA4 (FIG. 14), Type IIIA5 (FIG. 15), Type IIIunk (FIG. 16), Type IV (FIG. IX), Type V (FIG. 19), Type VI (FIG. 20), Type XI group 1 (FIG. 21), Type XI group 2 (FIG. 22), Type XII (FIG. 23), Type XIII (FIG. 24), Type XIV (FIG. 25), Eco107-like (FIG. 26), and Outgroup A (FIG. 27). The present disclosure further describes the engineering and/or modification of these newly discovered retron sequences as a starting point for obtaining useful recombinant retrons, such as those depicted in FIG. 1B.

FIG. 1B.1 depicts one embodiment of a recombinant retron construct (e.g., a nucleotide sequence cloned into an expression vector) contemplated by the present disclosure. In the upper-left schematic, the single thin black line represents a double-stranded nucleotide sequence (e.g., as cloned into an expression vector such as a plasmid). The recombinant retron is constructed by modifying a starting-point retron DNA sequence encoding the ncRNA (msr/msd region), such as any of the 7,257 newly discovered retron sequences disclosed herein and, in particular, any of the 7,257 ncRNA sequences of Table B. The starting-point retron DNA sequence encoding the ncRNA may be modified in a variety of ways and may include one modification or more than one modification. For example, the retron DNA may be modified to contain at least one nucleotide modification, including a single nucleotide substitution, insertion, or deletion, or a substitution, insertion, or deletion of more than one nucleotide, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or up to 100, or up to 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or up to 2000 nucleotides of the starting-point retron (e.g., a wild-type retron) are substituted, inserted, or deleted. When more than one nucleotide of the starting-point retron (e.g., a wild-type retron) is substituted, deleted, or inserted, the nucleotides may be contiguous or non-contiguous. Although the engineered retron as a whole is not naturally occurring, it may include components that do occur in nature, such as nucleotide sequences. For example, an engineered retron may have nucleotide sequences from different organisms (e.g., from different bacterial species) or from fully synthetic/artificial/recombinant nucleic acid sequences. Thus, an engineered retron may have bacterial nucleotide sequences, human nucleotide sequences, viral nucleotide sequences, and/or synthetic/artificial/recombinant nucleotide sequences, and/or combinations of such sequences. Examples of modifications of the recombinant retrons disclosed herein include insertion of a heterologous nucleic acid sequence into the retron, for example into the ncRNA locus, such as the msr or msd locus. Linking a guide RNA molecule to the 5' and/or 3' end (i.e., linking one molecule to the 5' end of the ncRNA and/or linking one molecule to the 3' end of the ncRNA) represents another modification contemplated for the recombinant retrons disclosed herein. In such embodiments, the guide RNA molecule may also be classified as, or more generally referred to as, a type of heterologous nucleic acid sequence used to modify the starting-point retron. These modifications are depicted in FIG. 1B.

In addition to the DNA encoding the ncRNA, the DNA encoding the RT may also be modified to obtain a recombinant RT. For example, the DNA encoding the RT may be modified to contain at least one nucleotide modification, including a single nucleotide substitution, insertion, or deletion, or a substitution, insertion, or deletion of more than one nucleotide, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or up to 100, or up to 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or up to 2000 nucleotides within the RT gene of the starting-point retron (e.g., a wild-type retron) are substituted, inserted, or deleted.

Such modifications to the DNA encoding the ncRNA and/or the RT can modulate the function of the ncRNA and/or the RT in a variety of ways, including: i) modulating (e.g., enhancing) the reverse transcription, processivity, accuracy/fidelity, and/or production of msDNA (e.g., in mammalian cells); ii) modulating (e.g., reducing) the immunogenicity, in a host (e.g., a host comprising mammalian cells), of the ncRNA (msr locus and msd locus) encoded by the engineered retron; iii) modulating (e.g., permanently or transiently inhibiting) the function of the msDNA; and/or iv) modulating (e.g., improving) the efficiency of targeted genome editing/engineering.

In one embodiment, the present disclosure provides a recombinant retron having the following general structure: a) an msr locus; b) an msd locus encoding the msd RNA portion of the msDNA; c) a sequence encoding a retron reverse transcriptase (RT) (optionally in trans relative to the ncRNA), wherein the msd RNA is capable of being reverse transcribed by the retron RT (e.g., in a host cell, such as a mammalian cell) to form msDNA; and d) a heterologous nucleic acid (e.g., heterologous DNA) capable of being transcribed through the msr locus and/or the msd locus, wherein the heterologous nucleic acid is optionally inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus.

The engineered retrons of the invention are optionally further modified structurally to include one or more heterologous nucleic acids. The engineered retrons may be further modified to provide various functional improvements, such as (but not limited to) enhanced msDNA production in cells (e.g., mammalian cells, including human cells).

In certain embodiments, the present disclosure provides engineered retrons that are based on conserved predicted secondary structures, such as those in FIGs. 2-27.

In other embodiments, the present disclosure provides engineered retrons that are based on sequence identity. Table A provides exemplary RT amino acid sequences and ret gene nucleic acid sequences. Table C provides exemplary RT consensus amino acid sequences and/or ret gene nucleic acid sequences. Table B provides exemplary ncRNA sequences.

Table X provides retron sequences that, in some embodiments, may be excluded from the scope of the invention.

In certain embodiments, exemplary engineered retrons of the invention (1) are engineered based on, or to resemble, a secondary structure as depicted in any one of FIGs. 2-27, and/or (2) are provided in Table B. Sequences having a significant percentage of sequence identity (e.g., at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity) are also within the scope of the invention.
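Applying the identity thresholds above requires some way of scoring a candidate sequence against a Table B sequence. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with `-` marking gaps and that identity is counted only over columns where neither sequence has a gap. The disclosure does not specify the alignment tool or the denominator, so both are assumptions, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the chosen 'at least N%' identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) would give different percentages for the same pair, so the convention should be fixed before comparing against a threshold.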

In certain embodiments, the engineered nucleic acid construct comprises: 1) an msr locus (which encodes the msr RNA portion of the msDNA); 2) an msd locus, which encodes the msd RNA portion of the msDNA; 3) a sequence encoding a retron reverse transcriptase (RT), wherein the msd RNA is capable of being reverse transcribed by the retron RT to form msDNA; and 4) a heterologous nucleic acid inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus; wherein the engineered nucleic acid construct is engineered based on, and/or to resemble, the secondary structure of a wild-type or consensus retron encoding a wild-type or consensus retron ncRNA encompassed by: a) any sequence and/or structure depicted by any SEQ ID NO of Table B and/or FIGs. 2-27; or b) a variant of a) having: i) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; ii) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or iii) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes per 10 gray-letter nucleotides; and/or optionally further comprising: i) the presence of 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides per 10 red-circle nucleotides; ii) the presence of 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides per 10 black-circle nucleotides; iii) the presence of 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides per 10 gray-circle nucleotides; and/or iv) the presence of 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides per 10 white-circle nucleotides; wherein the ncRNA does not comprise an ncRNA related to a sequence of Table X.

In certain embodiments, insertion of the heterologous nucleic acid includes deletion of retron nucleic acid. In certain embodiments, the inserted heterologous nucleic acid replaces the deleted retron nucleic acid. The inserted heterologous nucleic acid and the deleted retron nucleic acid are not required to be of the same or similar size. In certain embodiments, the replacement comprises replacing a portion of a hairpin loop region of the retron. In certain embodiments, the replacement comprises replacing a portion of a stem-loop region of the retron. In certain embodiments, the replacement comprises replacing a region of the retron that does not comprise a stem or a loop. In certain cases, identification of retron structure involves comparison with the model ncRNA structures provided herein, for example with one or more of FIGs. 2-27. In certain cases, identification of retron structure involves modeling with a nucleic acid folding algorithm, such as but not limited to RNAfold (Gruber AR et al., The Vienna RNA Websuite. Nucleic Acids Research, Vol. 36, No. suppl_2, July 1, 2008, pp. W70-W74), UNAFold (Markham NR et al., UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol., Vol. 453, 2008, pp. 3-31), and SPOT-RNA (Singh J et al., RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nature Communications, Vol. 10, 2019, pp. 1-13).
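Folding tools such as RNAfold commonly report a predicted secondary structure in dot-bracket notation, where matched parentheses mark paired bases and dots mark unpaired bases. The sketch below shows one way hairpin loops could be located in such a string when looking for a candidate replacement site. It is illustrative only: the helper names are hypothetical, the structure string in the usage note is a toy example rather than a retron ncRNA, and pseudoknot notation is not handled.

```python
def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner for a dot-bracket string."""
    stack = []
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans of hairpin loops.

    A hairpin loop is a run of unpaired bases closed by a single base pair.
    """
    pairs = pair_table(structure)
    loops = []
    for i, j in pairs.items():
        # Visit each pair once (i < j) and require only dots in between.
        if i < j and j - i > 1 and all(structure[k] == "." for k in range(i + 1, j)):
            loops.append((i + 1, j - 1))
    return sorted(loops)
```

For the toy structure `((((....))))..(((...)))`, `hairpin_loops` reports two loops, one enclosed by each stem; positions outside those spans are either stem bases or unpaired linker bases.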

In certain embodiments, the engineered retron of the invention encodes a reverse transcriptase (RT), or a functional domain thereof, comprising: i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A. In some embodiments, the RT does not comprise a polypeptide identified in Table X.

In certain embodiments, the engineered retron of the invention encodes a reverse transcriptase (RT), or a functional domain thereof, comprising: i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polynucleotide of Table A; and/or ii) a consensus polynucleotide sequence listed in Table A. In some embodiments, the polynucleotide encoding the RT does not comprise a polynucleotide identified in Table X.

In certain embodiments, the engineered retron of the invention encodes an ncRNA comprising: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA of Table B.

An engineered ncRNA of the invention may differ in size from the ncRNA on which it is based, and the proportion of that ncRNA retained in the engineered ncRNA may vary. In certain embodiments, the amount of the ncRNA retained is about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 88%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, or 50% to 80%, or 60% to 85%, or 70% to 90%, or 80% to 95%, or 85% to 98%, or 90% to 99%, or all of the ncRNA.

In certain embodiments, the engineered retron of the invention encodes an ncRNA and a reverse transcriptase (RT) or a functional domain thereof, wherein the ncRNA and the RT or functional domain thereof are as described above.

Specifically, in such embodiments, the ncRNA may comprise: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA listed in Table B; and wherein the ncRNA optionally excludes ncRNAs related to the sequences identified in Table X.

Also in such embodiments, the reverse transcriptase (RT) or functional domain thereof comprises: (A) i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or ii) a polypeptide listed in Table C; optionally, the RT does not comprise a polypeptide identified in Table X; or (B) i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polynucleotide listed in Table A; optionally, the polynucleotide encoding the RT does not comprise a polynucleotide related to a sequence identified in Table X.

In certain embodiments, the heterologous nucleic acid is between more than 20 nucleotides and about 10,000 nucleotides in length.

The engineered retron may further comprise a sequence modification (e.g., an insertion, deletion, and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus that: i) modulates (e.g., enhances) the reverse transcription, processivity, accuracy/fidelity, and/or production of msDNA (e.g., in mammalian cells); ii) modulates (e.g., reduces) the immunogenicity, in a host (e.g., a host comprising mammalian cells), of the ncRNA (msr locus and msd locus) encoded by the engineered retron; iii) comprises a nucleotide sequence that modulates (e.g., permanently or transiently inhibits) msDNA function; and/or iv) modulates (e.g., improves) the efficiency of targeted genome editing/engineering.

Retron msr gene, msd gene, and RT nucleic acid sequences (e.g., ret genes), as well as the encoded retron reverse transcriptase protein sequences, that can serve as templates for the engineered retrons described herein may be derived from any source, such as those in Table A, optionally excluding those related to the sequences of Table X.

In some embodiments, the template or wild-type (wt) sequences of the msr gene, msd gene, and RT coding sequence (i.e., the ret gene) used in the engineered retron are derived from a bacterial retron.

In some embodiments, the representative template/wild-type retron is from a Gram-negative bacterium. In some embodiments, the retron is from a bacterium listed in Table X.

In some embodiments, the engineered retron is engineered based on a clade defined for retrons/retron RTs, wherein the retron is associated with a tripartite system consisting of an ncRNA, an RT, and an additional protein or RT-fused domain with diverse enzymatic functions. See, e.g., Mestre et al., Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems, Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647 (incorporated herein by reference). Although these clades are based primarily on naturally occurring ncRNAs and retrons/retron RTs, as well as additional proteins or RT-fused domains, they are not limited to naturally occurring sequences for the purpose of serving as templates for the engineered retrons of the invention. Rather, the clades may also encompass non-naturally occurring ncRNAs and RTs, including but not limited to recombinant, modified or altered, chimeric, hybrid, synthetic, artificial, etc.

Thus, according to the present disclosure, retrons may be considered phylogenetically related based on at least 75% support (over at least 1000 replicates) under a neighbor-joining algorithm and a Poisson-corrected distance of no more than 0.05 (based on an alignment of the retron RTs). Alternatively or additionally, retrons may be considered phylogenetically related when/if the same RT, or a closely related RT, can recognize the secondary structure of the retron ncRNA and retains the ability to reverse transcribe the retron to produce msDNA.
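The distance criterion above can be made concrete with the usual Poisson correction for amino acid sequences, d = -ln(1 - p), where p is the fraction of aligned sites that differ. The sketch below is illustrative only: it assumes the two RT sequences are already aligned with `-` for gaps and that gap columns are ignored, it does not address the neighbor-joining support criterion, and the function names are hypothetical.

```python
import math


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Fraction of differing sites over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


def poisson_corrected_distance(aligned_a: str, aligned_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(aligned_a, aligned_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


def within_distance_cutoff(aligned_a: str, aligned_b: str, cutoff: float = 0.05) -> bool:
    """True if the pair satisfies the 'no more than cutoff' distance criterion."""
    return poisson_corrected_distance(aligned_a, aligned_b) <= cutoff
```

Because the correction inflates p slightly, a pair differing at 4 of 100 sites falls under the 0.05 cutoff while a pair differing at 5 of 100 sites does not.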

In certain embodiments, sequence alignment or secondary structure generation between different retron sequences (e.g., ncRNA and/or RT (protein and/or nucleic acid) sequences) is based on software known to those of ordinary skill in the art.

Retron ncRNA sequences (including msr and msd sequences) within the same clade may be highly conserved at certain positions and less conserved at others.

Exemplary consensus sequences based on the members of each clade were generated (see the corresponding FIGs. 2-27, respectively) to show these conserved sequences and/or secondary structures, including highly conserved nucleotides having at least 97% sequence conservation (red-letter nucleotides), those having between 90% and 97% sequence conservation (black-letter nucleotides), and those having 75-90% nucleotide sequence identity (gray-letter nucleotides). Further structural constraints of the clade consensus sequences are provided as colored circles indicating the probability that a base is present at that particular position: a red circle indicates a base in 97% of cases, a black circle a base in 90-97% of cases, and a gray circle a base in 75-90% of cases.
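The letter-color tiers described above can be illustrated by scoring each column of a clade alignment. In this sketch, conservation is taken as the share of clade members carrying the most common base in a column, and the tier boundaries are treated as inclusive at the lower end (at least 97% red, at least 90% black, at least 75% gray). Both choices are assumptions, since the disclosure does not state how the percentages were computed or how boundary values are assigned; the function names are hypothetical.

```python
from collections import Counter


def conservation_class(fraction: float) -> str:
    """Map a conservation fraction (0..1) to a letter-color tier."""
    if fraction >= 0.97:
        return "red"
    if fraction >= 0.90:
        return "black"
    if fraction >= 0.75:
        return "gray"
    return "unclassified"


def classify_columns(alignment: list[str]) -> list[tuple[str, str]]:
    """For each alignment column, return (most common base, tier).

    Gaps ('-') are never reported as the consensus base, but gapped members
    still count in the denominator (an assumption of this sketch).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    result = []
    for col in range(width):
        counts = Counter(row[col].upper() for row in alignment if row[col] != "-")
        if not counts:
            result.append(("-", "unclassified"))
            continue
        base, n = counts.most_common(1)[0]
        result.append((base, conservation_class(n / len(alignment))))
    return result
```

With only a handful of sequences the tiers are coarse (four members can only yield 100%, 75%, 50%, or 25%), so the 97% and 90% tiers become meaningful only for clades with many members.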

在一些實施例中,作為修飾本發明之經工程改造之逆轉錄子的基礎之模板ncRNA (包括 msrmsd區域序列)係各種逆轉錄子ncRNA (包括 msrmsd核酸序列)進化枝之共有序列,如表B及相應圖2-27之任一SEQ ID NO:分別所提供,包括高度保守且由特定顏色的字母或圓圈所描繪之鹼基,且視情況進一步包括可存在於特定位置處之由特定顏色的圓圈表示之鹼基。 In some embodiments, the template ncRNA (including msr and msd region sequences) as the basis for modifying the engineered retrotranscript of the present invention is a consensus sequence of various retrotranscript ncRNA (including msr and msd nucleic acid sequences) evolutionary branches, as provided in Table B and any SEQ ID NO: of the corresponding Figures 2-27, respectively, including highly conserved bases depicted by letters or circles of specific colors, and optionally further including bases represented by circles of specific colors that may be present at specific positions.

In some embodiments, the engineered retron is based on, and/or engineered to resemble the secondary structure of, a wild-type or consensus retron encoding a wild-type or consensus retron ncRNA encompassed by the following: 1) a sequence and/or structure as depicted in any one of the SEQ ID NOs in Table B and the corresponding Figures 2-27, respectively; or 2) a variant of 1) having: A) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; B) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or C) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes per 10 gray-letter nucleotides. Optionally, the variant of 1) further comprises: a) 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides present per 10 red-circle nucleotides; b) 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides present per 10 black-circle nucleotides; c) 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides present per 10 gray-circle nucleotides; and/or d) 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides present per 10 white-circle nucleotides.

經工程改造之逆轉錄子可藉由將序列修飾( 例如,缺失、添加或取代)引入編碼野生型逆轉錄子ncRNA之野生型逆轉錄子中或引入編碼共有逆轉錄子ncRNA之逆轉錄子中來進行工程改造。 An engineered retrotranscript can be engineered by introducing a sequence modification ( eg , a deletion, addition, or substitution) into a wild-type retrotranscript encoding a wild-type retrotranscript ncRNA or into a retrotranscript encoding a consensus retrotranscript ncRNA.

For example, a variant retron may not meet the sequence and/or structural requirements of any one of the SEQ ID NOs in Table B and the corresponding Figures 2-27, respectively, but may still be a suitable template for the engineered retrons described herein, provided that one or more of the conditions set forth in A)-C) and/or a)-d) are met.
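The letter-based conditions A) to C) can be checked mechanically once the number of changed positions in each color class is known. The sketch below is a hedged illustration: it assumes "per 10 nucleotides" is applied as an overall rate across all positions of a color rather than over a sliding window, it uses the broadest limits stated in the text (3, 6, and 9) as defaults, and all names are hypothetical. The circle-based conditions a) to d) could be handled the same way with presence counts.

```python
# Broadest stated limits: maximum changes per 10 letters of each color.
DEFAULT_LIMITS_PER_10 = {"red": 3, "black": 6, "gray": 9}


def within_limit(n_positions: int, n_changes: int, max_per_10: int) -> bool:
    """True if n_changes does not exceed max_per_10 changes per 10 positions."""
    if n_positions < 0 or n_changes < 0 or n_changes > n_positions:
        raise ValueError("counts must satisfy 0 <= changes <= positions")
    # Integer arithmetic avoids rounding issues: changes/positions <= max/10.
    return n_changes * 10 <= max_per_10 * n_positions


def conditions_met(positions: dict, changes: dict, limits: dict = None) -> dict:
    """Evaluate each color condition separately."""
    limits = DEFAULT_LIMITS_PER_10 if limits is None else limits
    return {
        color: within_limit(positions[color], changes.get(color, 0), limit)
        for color, limit in limits.items()
    }


def is_candidate_template(positions: dict, changes: dict, limits: dict = None) -> bool:
    """The text requires one or more conditions to hold, hence any()."""
    return any(conditions_met(positions, changes, limits).values())
```

Passing a stricter `limits` mapping, such as `{"red": 1, "black": 2, "gray": 4}`, corresponds to the narrower "e.g." values given in the text.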

在某些實施例中,模板逆轉錄子中之高度保守序列在本文所述的經工程改造之逆轉錄子中係保留/保守的或實質上保留/保守的。In certain embodiments, highly conserved sequences in the template retrotranscripts are retained/conserved or substantially retained/conserved in the engineered retrotranscripts described herein.

在某些實施例中,所有或實質上所有紅色字母核苷酸( 亦即,在同一進化枝中之約97%或更多逆轉錄子中保守之彼等)在本文所述的經工程改造之逆轉錄子中係保留/保守的。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個紅色字母核苷酸出現不超過1、2或3個( 例如,多達1個)核苷酸變化( 例如,缺失或取代)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約0.3%、0.5%、1%、2%、3%、4%或5%之紅色字母核苷酸發生變化( 例如,缺失或取代)。 In certain embodiments, all or substantially all red-letter nucleotides ( i.e. , those conserved in about 97% or more retrotranscripts in the same evolutionary branch) are retained/conserved in the engineered retrotranscripts described herein. In certain embodiments, no more than 1, 2, or 3 ( e.g. , up to 1) nucleotide changes ( e.g. , deletions or substitutions) occur for every 10 red-letter nucleotides in the engineered retrotranscripts described herein. In certain embodiments, no more than about 0.3%, 0.5%, 1%, 2%, 3%, 4%, or 5% of the red-letter nucleotides are changed ( e.g. , deleted or substituted) in the engineered retrotranscripts described herein.

在某些實施例中,所有或實質上所有黑色字母核苷酸( 亦即,在同一進化枝中之約90-97%逆轉錄子中保守之彼等)在本文所述的經工程改造之逆轉錄子中係保留/保守的。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個黑色字母核苷酸出現不超過1、2、3、4、5、6、7、8、9或10個( 例如,多達1或2個)核苷酸變化( 例如,缺失或取代)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約3%、4%、5%或10%之黑色字母核苷酸發生變化( 例如,缺失或取代)。 In certain embodiments, all or substantially all black letter nucleotides ( i.e. , those conserved in about 90-97% of retrotransposons in the same evolutionary branch) are retained/conserved in the engineered retrotransposons described herein. In certain embodiments, no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 ( e.g. , up to 1 or 2) nucleotide changes ( e.g. , deletions or substitutions) occur for every 10 black letter nucleotides in the engineered retrotransposons described herein. In certain embodiments, no more than about 3%, 4%, 5%, or 10% of the black letter nucleotides are changed ( e.g. , deleted or substituted) in the engineered retrotransposons described herein.

在某些實施例中,所有或實質上所有灰色字母核苷酸( 亦即,在同一進化枝中之約75-90%逆轉錄子中保守之彼等)在本文所述的經工程改造之逆轉錄子中係保留/保守的。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個灰色字母核苷酸出現不超過1、2、3、4、5、6、7、8、9、10、11、12、13、14或15個( 例如,多達3或4個,或多達7、8或9個)核苷酸變化( 例如,缺失或取代)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約5%、10%、15%、20%或25%之灰色字母核苷酸發生變化( 例如,缺失或取代)。 In certain embodiments, all or substantially all grey lettered nucleotides ( i.e. , those conserved in about 75-90% of retrotransposons in the same evolutionary branch) are retained/conserved in the engineered retrotransposons described herein. In certain embodiments, no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 ( e.g. , up to 3 or 4, or up to 7, 8, or 9) nucleotide changes ( e.g. , deletions or substitutions) occur for every 10 grey lettered nucleotides in the engineered retrotransposons described herein. In certain embodiments, no more than about 5%, 10%, 15%, 20%, or 25% of the grey lettered nucleotides are changed ( e.g. , deleted or substituted) in the engineered retrotransposons described herein.

在某些實施例中,所有或實質上所有紅色圓圈核苷酸( 亦即,在同一進化枝中之約97%或更多逆轉錄子中具有核苷酸之彼等)存在於本文所述的經工程改造之逆轉錄子中。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個紅色圓圈核苷酸有不超過1、2或3個( 例如,0.3、0.5個或多達1個)核苷酸不存在( 例如,缺失)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個紅色圓圈核苷酸存在7、8、9或10個( 例如,9或10個)核苷酸。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約0.3%、0.5%、1%、2%、3%、4%或5%之紅色圓圈核苷酸不存在( 例如,缺失)。 In certain embodiments, all or substantially all red circle nucleotides ( i.e. , those having nucleotides in about 97% or more retrotranscripts in the same evolutionary branch) are present in the engineered retrotranscripts described herein. In certain embodiments, no more than 1, 2, or 3 ( e.g. , 0.3, 0.5, or up to 1) nucleotides are absent ( e.g. , deleted) for every 10 red circle nucleotides in the engineered retrotranscripts described herein. In certain embodiments, 7, 8, 9, or 10 ( e.g., 9 or 10) nucleotides are present for every 10 red circle nucleotides in the engineered retrotranscripts described herein. In certain embodiments, no more than about 0.3%, 0.5%, 1%, 2%, 3%, 4%, or 5% of the red circle nucleotides are absent (e.g. , deleted) in the engineered retrotranscripts described herein.

在某些實施例中,所有或實質上所有黑色圓圈核苷酸( 亦即,在同一進化枝中之約90-97%逆轉錄子中具有核苷酸之彼等)存在於本文所述的經工程改造之逆轉錄子中。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個黑色圓圈核苷酸有不超過1、2、3或4個( 例如,多達1或2個)核苷酸不存在( 例如,缺失)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個黑色圓圈核苷酸存在6、7、8、9或10個( 例如,8、9或10個)核苷酸。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約1%、2%、3%、5%或10%之黑色圓圈核苷酸不存在( 例如,缺失)。 In certain embodiments, all or substantially all black circled nucleotides ( i.e. , those having nucleotides in about 90-97% of retrotransposons in the same clade) are present in the engineered retrotransposons described herein. In certain embodiments, no more than 1, 2, 3, or 4 ( e.g. , up to 1 or 2) nucleotides are absent ( e.g. , deleted) for every 10 black circled nucleotides in the engineered retrotransposons described herein. In certain embodiments, 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides are present for every 10 black circled nucleotides in the engineered retrotransposons described herein. In certain embodiments, no more than about 1%, 2%, 3%, 5%, or 10% of the black circled nucleotides are absent ( e.g. , deleted) in the engineered retrotransposons described herein .

在某些實施例中,所有或實質上所有灰色圓圈核苷酸( 亦即,在同一進化枝中之約75-90%逆轉錄子中具有核苷酸之彼等)存在於本文所述的經工程改造之逆轉錄子中。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個灰色圓圈核苷酸有不超過1、2、3、4或5個( 例如,多達2、3或4個)核苷酸不存在( 例如,缺失)。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,每10個灰色圓圈核苷酸存在4、5、6、7、8、9或10個( 例如,6、7、8、9或10個)核苷酸。在某些實施例中,在本文所述的經工程改造之逆轉錄子中,不超過約5%、10%、15%、20%或25%之灰色圓圈核苷酸不存在( 例如,缺失)。 In certain embodiments, all or substantially all of the grey circled nucleotides ( i.e. , those having nucleotides in about 75-90% of retrotransposons in the same clade) are present in the engineered retrotransposons described herein. In certain embodiments, no more than 1, 2, 3, 4, or 5 ( e.g. , up to 2, 3, or 4) nucleotides are absent ( e.g. , deleted) for every 10 grey circled nucleotides in the engineered retrotransposons described herein. In certain embodiments, 4, 5, 6, 7, 8, 9, or 10 ( e.g., 6, 7, 8, 9, or 10) nucleotides are present for every 10 grey circled nucleotides in the engineered retrotransposons described herein. In certain embodiments, no more than about 5%, 10%, 15%, 20%, or 25% of the grey circled nucleotides are absent (e.g. , deleted) in the engineered retrotransposons described herein.

In certain embodiments, all or substantially all white-circle nucleotides (i.e., those positions occupied by a nucleotide in about 50-75% of retrons in the same clade) are present in the engineered retrons described herein. In certain embodiments, no more than 1, 2, 3, 4, 5, or 6 (e.g., up to 2, 3, 4, 5, or 6) of every 10 white-circle nucleotides are absent (e.g., deleted) in the engineered retrons described herein. In certain embodiments, 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) of every 10 white-circle nucleotides are present in the engineered retrons described herein. In certain embodiments, no more than about 5%, 10%, 15%, 20%, 30%, 40%, or 50% of the white-circle nucleotides are absent (e.g., deleted) in the engineered retrons described herein.

在一些實施例中,經工程改造之逆轉錄子係合成產生的。在其他實施例中,合成產生之經工程改造之逆轉錄子包含如表B及相應圖2-27之任一SEQ ID NO:分別所描繪的序列及/或二級結構,以及至少根據其各自之序列一致性水準的保守彩色字母核苷酸( 例如,紅色、黑色及灰色字母),及/或至少根據其各自之序列存在概率水準的保守彩色圓圈核苷酸( 例如,紅色、黑色及灰色圓圈)。 In some embodiments, the engineered retrotranscript is synthetically produced. In other embodiments, the synthetically produced engineered retrotranscript comprises a sequence and/or secondary structure as depicted in any one of SEQ ID NOs: in Table B and corresponding Figures 2-27, respectively, and conserved colored letter nucleotides ( e.g. , red, black, and gray letters) at least according to their respective sequence identity levels, and/or conserved colored circle nucleotides ( e.g. , red, black, and gray circles) at least according to their respective sequence occurrence probability levels.

在一些實施例中,經工程改造之逆轉錄子中的序列修飾導致經編碼之逆轉錄子ncRNA具有所需功能改良。In some embodiments, sequence modifications in the engineered retrotran result in desired improvements in the function of the encoded retrotran ncRNA.

In certain embodiments, the one or more sequence modifications in the ncRNA comprise one or more of the following:
(i) a modified (e.g., mutated, reduced, or eliminated) bulge in a1, a2, or both a1 and a2;
(ii) an extension or shortening of a1, a2, or both a1 and a2;
(iii) an extension or shortening of a spacer sequence between hairpin loops (e.g., S1, S2, S3, and/or S4 in Figure 2, or any S region of any of Figures 2-27);
(iv) an additional or modified (e.g., mutated or eliminated) bulge in a hairpin loop (e.g., L2 and/or L3 in Figure 2, or any L region of any of Figures 2-27), e.g., by removing unpaired bases in the bulge or by replacing the unpaired bases with an equal number of base pairs;
(v) a modified (e.g., extended or shortened) hairpin loop length (e.g., L1, L2, L3, and/or L4 in Figure 2, or any L region of any of Figures 2-27);
(vi) a replacement L1 and/or L2 (in Figure 2, or any L region of any of Figures 2-27) having a complement, reverse, or reverse-complement sequence;
(vii) a modified (e.g., increased) number of unpaired bases at the tip of a hairpin loop (e.g., L1, L2, L3, and/or L4 in Figure 2, or any L region of any of Figures 2-27);
(viii) a modified (e.g., increased or decreased) GC content in a hairpin loop (e.g., L1, L2, L3, and/or L4 in Figure 2, or any L region of any of Figures 2-27);
(ix) insertion of the heterologous nucleic acid into a spacer sequence between hairpin loops (e.g., S1, S2, S3, and/or S4 in Figure 2, or any S region of any of Figures 2-27) or at the tip of a hairpin loop (e.g., L1, L2, L3, and/or L4 in Figure 2, or any L region of any of Figures 2-27);
(x) deletion of one or more hairpin loops (e.g., L1, L2, L3, and/or L4 in Figure 2, or any L region of any of Figures 2-27);
(xi) addition of a new loop in a spacer sequence between hairpin loops (e.g., S1, S2, S3, and/or S4 in Figure 2, or any S region of any of Figures 2-27);
(xii) circularization of the ncRNA, wherein the 5' end and the 3' end of the ncRNA are joined directly or via a spacer sequence;
(xiii) a relocated branching guanosine capable of priming reverse transcription;
(xiv) a staggered end sequence that reduces the immunogenicity of the retron ncRNA, produced, for example, by adding or removing 5' a1 nucleotides and/or 3' a2 nucleotides; and/or
(xv) an antisense sequence complementary to a CRISPR/Cas guide RNA (gRNA) sequence encoded by the heterologous nucleic acid, wherein the antisense sequence hybridizes to and inhibits the gRNA in the encoded retron ncRNA, and wherein the antisense sequence is removed after reverse transcription of the msDNA.

Unless specifically indicated otherwise, the a1 and a2 regions are each single-stranded and substantially reverse-complementary to each other, so that they form a stem optionally interrupted by symmetric or asymmetric bulges, with one or more optional 5' and/or 3' overhanging/unpaired nucleotides. The a1 region generally ends before (e.g., immediately 5' to) the conserved branching guanosine (G) that provides the 2'-OH for priming reverse transcription.

在一些實施例中,序列變化包含a1/a2莖區中之突變、減少或消除之突出,包括一( 亦即,a1或a2)股或a1及a2股兩者中之序列變化。 In some embodiments, the sequence variation comprises a mutation in the a1/a2 stem region, a reduction or elimination of an overhang, including sequence variation in one ( ie , a1 or a2) strand or both a1 and a2 strands.

例如,在一些實施例中,序列變化包含自a1、a2或a1及a2兩者中缺失核苷酸,使得凸起之大小減小,或對稱凸起變為不對稱,或反之亦然,或消除凸起。For example, in some embodiments, the sequence change comprises deleting nucleotides from a1, a2, or both a1 and a2, such that the size of the bulge is reduced, or a symmetric bulge becomes asymmetric, or vice versa, or the bulge is eliminated.

在一些實施例中,序列變化包含置換/取代a1、a2或a1及a2兩者中之核苷酸,使得凸起中之先前未配對鹼基變成鹼基配對的。In some embodiments, the sequence change comprises a replacement/substitution of a nucleotide in a1, a2, or both a1 and a2 such that a previously unpaired base in the bulge becomes base-paired.

在一些實施例中,序列變化包含用一或多個未配對嘧啶鹼基置換未配對嘌呤鹼基。In some embodiments, the sequence changes comprise replacing an unpaired purine base with one or more unpaired pyrimidine bases.

在一些實施例中,序列變化包含用一或多個未配對嘌呤鹼基置換未配對嘧啶鹼基。In some embodiments, the sequence changes comprise replacing an unpaired pyrimidine base with one or more unpaired purine bases.

在一些實施例中,序列變化包含用另一未配對嘌呤鹼基( 例如,分別為G或A)置換一個未配對嘌呤鹼基( 例如,A或G)。 In some embodiments, the sequence change comprises replacing one unpaired purine base ( e.g. , A or G) with another unpaired purine base ( e.g. , G or A, respectively).

在一些實施例中,序列變化包含用另一未配對嘧啶鹼基( 例如,分別為C或T/U)置換一個未配對嘧啶鹼基( 例如,T/U或C)。 In some embodiments, the sequence change comprises replacing one unpaired pyrimidine base ( e.g. , T/U or C) with another unpaired pyrimidine base ( e.g. , C or T/U, respectively).

在一些實施例中,序列變化包含a1、a2或a1及a2兩者之延長或縮短。In some embodiments, the sequence change comprises an extension or shortening of a1, a2, or both a1 and a2.

例如,可藉由缺失5'懸垂、缺失任何上游凸起核苷酸、缺失參與鹼基配對之鹼基來縮短a1之長度。同樣,可藉由添加5'懸垂、添加任何上游凸起核苷酸、添加參與鹼基配對之鹼基來延長a1之長度。For example, the length of a1 can be shortened by deleting a 5' overhang, deleting any upstream protruding nucleotides, or deleting bases that participate in base pairing. Similarly, the length of a1 can be extended by adding a 5' overhang, adding any upstream protruding nucleotides, or adding bases that participate in base pairing.

在一些實施例中,可藉由缺失5'懸垂、缺失任何下游凸起核苷酸、缺失參與鹼基配對之鹼基來縮短a2之長度。同樣,可藉由添加5'懸垂、添加任何下游凸起核苷酸、添加參與鹼基配對之鹼基來延長a2之長度。In some embodiments, the length of a2 can be shortened by deleting a 5' overhang, deleting any downstream protruding nucleotides, or deleting bases that participate in base pairing. Similarly, the length of a2 can be extended by adding a 5' overhang, adding any downstream protruding nucleotides, or adding bases that participate in base pairing.

在一些實施例中,可延長或縮短如圖2-27中任一者所描繪之髮夾環之間之間隔序列( 例如,圖2中之S1、S2、S3及/或S4)。在一些實施例中,可藉由將異源核酸序列插入髮夾環之間之間隔序列( 例如,圖2中之S1、S2、S3及/或S4)中來進行修飾。在某些實施例中,將一或多個異源核酸序列插入 msd區域中之間隔序列中。在一些實施例中,可藉由用額外凸起或髮夾環中斷間隔基來進行間隔區之修飾。 In some embodiments, the spacer sequence between the hairpin loops as depicted in any of Figures 2-27 ( e.g. , S1, S2, S3, and/or S4 in Figure 2) can be lengthened or shortened. In some embodiments, the modification can be performed by inserting a heterologous nucleic acid sequence into the spacer sequence between the hairpin loops ( e.g. , S1, S2, S3, and/or S4 in Figure 2). In certain embodiments, one or more heterologous nucleic acid sequences are inserted into the spacer sequence in the msd region. In some embodiments, the modification of the spacer region can be performed by interrupting the spacer base with an additional protrusion or hairpin loop.

在其他實施例中,髮夾環中之凸起在該凸起中發生突變或消除( 例如,藉由移除未配對鹼基),使得例如對稱凸起變成不對稱凸起,或不對稱凸起變成對稱凸起或甚至更不對稱凸起。在某些實施例中,凸起中之未配對鹼基經相等數目之鹼基對置換。額外鹼基對可在前一凸起之一端或兩端合併至莖中,或可將前一凸起二等分以產生兩個凸起。 In other embodiments, a protrusion in a hairpin is mutated or eliminated in the protrusion ( e.g. , by removing unpaired bases), such that, for example, a symmetric protrusion becomes an asymmetric protrusion, or an asymmetric protrusion becomes a symmetric protrusion or even more asymmetric protrusion. In certain embodiments, unpaired bases in a protrusion are replaced with an equal number of base pairs. The additional base pairs may be incorporated into the stem at one or both ends of the previous protrusion, or the previous protrusion may be bisected to create two protrusions.

在一些實施例中,可延長或縮短如圖2-27中任一者所描繪之一或多個髮夾環( 例如,圖2之L1、L2、L3及/或L4)之長度。例如,可增加或減少尖端或環內之未配對鹼基之數目。此外,可將相關異源核酸序列插入尖端或髮夾環內。在某些實施例中,將相關異源核酸序列插入 msd基因座中之尖端或髮夾環內。 In some embodiments, the length of one or more hairpin loops ( e.g. , L1, L2, L3, and/or L4 of FIG. 2 ) as depicted in any of FIGS. 2-27 can be extended or shortened. For example, the number of unpaired bases in the tip or loop can be increased or decreased. In addition, a related heterologous nucleic acid sequence can be inserted into the tip or hairpin loop. In certain embodiments, a related heterologous nucleic acid sequence is inserted into the tip or hairpin loop in the msd locus.

在其他實施例中,增加或減少尖端或髮夾環中之GC含量。In other embodiments, the GC content in the tip or hair clip is increased or decreased.
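A change in GC content is straightforward to quantify. The following minimal sketch assumes the tip or hairpin loop is given as a plain RNA or DNA string; the function names and the example sequences in the usage note are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(base) for base in "GC") / len(seq)


def gc_shift(original: str, modified: str) -> float:
    """Positive if the modification raised GC content, negative if it lowered it."""
    return gc_content(modified) - gc_content(original)
```

For instance, replacing a hypothetical tip `GAAAC` with `GCGCC` raises the GC fraction from 0.4 to 1.0.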

在其他實施例中,可缺失髮夾環。In other embodiments, the hair clip ring may be absent.

在一些實施例中,可藉由直接地或經由間隔序列連接ncRNA之5'端及3'端來環化ncRNA。In some embodiments, the ncRNA can be circularized by joining the 5' and 3' ends of the ncRNA directly or through a spacer sequence.

在一些實施例中,修飾一或多個髮夾環( 例如,圖2之L1、L2、L3及/或L4)以具有補體、反向或反向補體序列。 In some embodiments, one or more hairpin loops ( e.g. , L1, L2, L3, and/or L4 of FIG. 2) are modified to have a complement, reverse, or reverse complement sequence.
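The three sequence transformations named above differ only in whether the bases are complemented, the order is reversed, or both. A minimal sketch for RNA sequences follows; the function names are illustrative, and only the four standard bases are handled.

```python
# Watson-Crick complement table for RNA, upper and lower case.
_RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def complement(seq: str) -> str:
    """Complement each base, keeping the original order."""
    return seq.translate(_RNA_COMPLEMENT)


def reverse(seq: str) -> str:
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the order."""
    return complement(seq)[::-1]
```

Applied to the hypothetical loop sequence `GGAUC`, these give `CCUAG`, `CUAGG`, and `GAUCC`, respectively.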

在某些實施例中,對能夠起始逆轉錄啟動之分支鏈鳥苷(G)進行再定位。例如,G可進一步位於a1序列末端之下游,例如1、2、3、4或5個額外核苷酸處。In certain embodiments, a branched guanosine (G) capable of initiating reverse transcription is relocated. For example, G can be further downstream of the end of the a1 sequence, such as 1, 2, 3, 4 or 5 additional nucleotides.

在某些實施例中,藉由 例如添加或移除5' a1核苷酸及/或3' a2核苷酸來降低逆轉錄子ncRNA之免疫原性。 In certain embodiments, the immunogenicity of a retrotran ncRNA is reduced by, for example, adding or removing 5' a1 nucleotides and/or 3' a2 nucleotides.

在某些實施例中,一或多個異源核酸序列(插入本發明之經工程改造之逆轉錄子中)包含:a)插入 msr基因座或 msd基因座中(諸如S區(例如,圖2中之S1、S2、S3及/或S4,或圖2-27中任一者之任何S區)中,或L區(例如,圖2中之L1、L2、L3及/或L4,或圖2-27中任一者之任何L區)之尖端,或 msr基因座或 msd基因座上游或下游中的異源核酸(諸如RNA適體或核酶之編碼序列);或b)插入 msd基因座中之第一異源核酸,及插入 msr基因座上游或 msd基因座下游之第二異源核酸,其中該第二異源核酸編碼CRISPR/Cas引導RNA (gRNA)。 In certain embodiments, the one or more heterologous nucleic acid sequences (inserted into the engineered retrotransposons of the present invention) comprise: a) a heterologous nucleic acid (such as an RNA aptamer or a ribozyme encoding sequence) inserted into the msr locus or the msd locus (such as an S region (e.g., S1, S2, S3 and/or S4 in FIG. 2 , or any S region of any one of FIGS. 2-27 ), or at the tip of the L region (e.g., L1, L2, L3 and/or L4 in FIG. 2 , or any L region of any one of FIGS. 2-27 ), or upstream or downstream of the msr locus or the msd locus; or b) a first heterologous nucleic acid inserted into the msd locus, and a second heterologous nucleic acid inserted upstream of the msr locus or downstream of the msd locus, wherein the second heterologous nucleic acid encodes a CRISPR/Cas guide RNA (gRNA).

在某些實施例中,可包括與由該異源核酸編碼之CRISPR/Cas引導RNA (gRNA)序列互補的反義序列,其中該反義序列與經編碼之逆轉錄子ncRNA中的gRNA雜交且抑制該gRNA,且其中該反義序列在msDNA之逆轉錄後經移除。In certain embodiments, an antisense sequence may be included that is complementary to a CRISPR/Cas guide RNA (gRNA) sequence encoded by the heterologous nucleic acid, wherein the antisense sequence hybridizes with and inhibits the gRNA in the encoded retrotranscript ncRNA, and wherein the antisense sequence is removed after retrotranscription of the msDNA.

在某些實施例中,該異源核酸編碼相關蛋白質或肽,或其中該異源核酸包含或編碼供體/模板序列(例如,校正/修復/移除標靶基因體位點處之突變(諸如疾病基因中之突變外顯子)的供體;功能性DNA元件(諸如啟動子、增強子、蛋白質結合序列、甲基化位點、用於輔助基因編輯之同源區等);或功能性RNA元件之編碼序列(ncRNA等))。In certain embodiments, the heterologous nucleic acid encodes a protein or peptide of interest, or wherein the heterologous nucleic acid comprises or encodes a donor/template sequence (e.g., a donor that corrects/repairs/removes a mutation at a target genome site (e.g., a mutant exon in a disease gene); a functional DNA element (e.g., a promoter, an enhancer, a protein binding sequence, a methylation site, a homologous region for assisting gene editing, etc.); or a coding sequence for a functional RNA element (ncRNA, etc.)).

在某些實施例中,相關蛋白質或肽包含可用於治療疾病之治療蛋白(諸如疾病細胞中具有缺陷之野生型蛋白質,或治療抗體或其抗原結合片段)。In certain embodiments, the protein or peptide of interest comprises a therapeutic protein that can be used to treat a disease (eg, a wild-type protein that is defective in a diseased cell, or a therapeutic antibody or antigen-binding fragment thereof).

本發明之其他異源核酸描述於本說明書之其他章節中,均以引用之方式併入本文中。Other heterologous nucleic acids of the present invention are described in other sections of this specification, which are incorporated herein by reference.

In some embodiments, the template/wild-type retron for the engineered retron encodes a wild-type or consensus retron ncRNA polynucleotide having the consensus secondary structure shown in any of Figures 2-27, each of which is described individually below. Variants of this template that can also be used for the engineered retrons of the invention include variants having: A) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; B) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or C) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes per 10 gray-letter nucleotides; and/or optionally further comprising: a) 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides present per 10 red-circle nucleotides; b) 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides present per 10 black-circle nucleotides; c) 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides present per 10 gray-circle nucleotides; and/or d) 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides present per 10 white-circle nucleotides.

在一些實施例中,經工程改造之逆轉錄子完全係合成產生的且具有如SEQ ID NO. XX及圖2-27所示之彩色字母所表示的保守核苷酸。In some embodiments, the engineered retrotranscript is completely synthetically produced and has conserved nucleotides represented by colored letters as shown in SEQ ID NO. XX and in Figures 2-27.

在一些實施例中,經工程改造之逆轉錄子的非編碼RNA (ncRNA)部分包含編碼以下之多核苷酸(例如,DNA分子):表B中列出之ncRNA,或與表B中列出之ncRNA具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之ncRNA。在一些實施例中,該ncRNA不包含與表X之序列相關的ncRNA。In some embodiments, the non-coding RNA (ncRNA) portion of the engineered retrotransposons comprises a polynucleotide (e.g., a DNA molecule) encoding an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA listed in Table B. In some embodiments, the ncRNA does not comprise an ncRNA related to a sequence of Table X.

For example, the engineered retrons described herein can be amplified before being transfected into cells or ligated into a vector. Any method of amplifying the engineered retrons can be used, including but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the engineered retrons comprise common 5' and 3' priming sites, so that retron sequences can be amplified in parallel with one set of universal primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of retron sequences from a pooled mixture.
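The difference between universal and selective priming can be illustrated in silico. The sketch below is a simplified model, not a primer-design tool: it assumes each pooled construct is written 5' to 3' on one strand, that the 3' site is given as it appears on that same strand, and that a primer "matches" only by exact string identity. All sequences and names in the usage note are hypothetical.

```python
def has_priming_sites(seq: str, site5: str, site3: str) -> bool:
    """True if the construct starts with site5 and ends with site3."""
    seq, site5, site3 = seq.upper(), site5.upper(), site3.upper()
    return (len(seq) >= len(site5) + len(site3)
            and seq.startswith(site5)
            and seq.endswith(site3))


def select_amplifiable(pool: dict, site5: str, site3: str) -> list:
    """Names of pooled constructs that carry both priming sites, in pool order."""
    return [name for name, seq in pool.items()
            if has_priming_sites(seq, site5, site3)]
```

With a pool in which every construct shares the flanks `ACGTAC` and `TTGCAA`, a universal primer pair selects all of them, whereas a 5' primer extended by one base into the variable region (for example `ACGTACG`) selects only the constructs whose insert begins with that base.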

In some embodiments, the template/wild-type retron for the engineered retron encodes a wild-type or consensus retron ncRNA polynucleotide having the consensus secondary structure shown in Figures 2-27, each of which is described separately below:

Type IA/IIA1 retrons (Figure 2)

In some embodiments, the template/wt retron for the engineered retrons of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-L2-S2-L3-S3-a2-S4-L4,
wherein:
a1/a2 is a stem 8 bp in length;
L1 is a 5-bp stem with a 10-nt tip;
L2 is a 7-bp stem with a 5-nt tip and a 1/1 bulge 3 nt from the tip;
L3 is a 23-bp stem with a 22-nt tip and a 2/2 bulge 21 bp from the tip;
L4 is an 11-bp stem with a 5-nt tip;
S1 is the single-stranded spacer between the a1/a2 stem and L1, with no spacer between L1 and L2;
S2 is the single-stranded spacer between L2 and L3;
S3 is the single-stranded spacer between L3 and the a1/a2 stem;
S4 is the single-stranded spacer between the a1/a2 stem and L4; and
the conserved nucleotides are as shown in SEQ ID NO: 1 and Figure 2, and the colored-circle nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circle nucleotides, at least about 90-97% of the black-circle nucleotides, at least about 75-90% of the gray-circle nucleotides, and at least about 50% of the white-circle nucleotides are present).
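A structural description of this kind maps naturally onto a small data structure. The sketch below encodes the hairpin dimensions stated above for the Figure 2 consensus. It assumes that an "m/n" bulge means m unpaired nucleotides on one strand and n on the other, and that the stated stem length counts base pairs only; spacer lengths are not given in the text and are therefore left out. The class and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hairpin:
    name: str
    stem_bp: int                                  # base pairs in the stem
    tip_nt: int                                   # unpaired nucleotides at the tip
    bulges: Tuple[Tuple[int, int], ...] = ()      # (strand 1 nt, strand 2 nt)

    def length_nt(self) -> int:
        """Nucleotides in the hairpin: both stem strands, tip, and bulges."""
        return 2 * self.stem_bp + self.tip_nt + sum(a + b for a, b in self.bulges)


# Element order as written in the text.
ARCHITECTURE = "a1-S1-L1-L2-S2-L3-S3-a2-S4-L4".split("-")
A1_A2_STEM_BP = 8

HAIRPINS = {
    "L1": Hairpin("L1", stem_bp=5, tip_nt=10),
    "L2": Hairpin("L2", stem_bp=7, tip_nt=5, bulges=((1, 1),)),
    "L3": Hairpin("L3", stem_bp=23, tip_nt=22, bulges=((2, 2),)),
    "L4": Hairpin("L4", stem_bp=11, tip_nt=5),
}


def structured_length_nt() -> int:
    """Nucleotides in the a1/a2 stem and all hairpins, excluding spacers."""
    return 2 * A1_A2_STEM_BP + sum(h.length_nt() for h in HAIRPINS.values())
```

Under these assumptions the structured elements alone account for 156 nt; the actual ncRNA is longer by the combined length of the spacers S1 to S4.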

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 2.

Type IB1 retrons (Figure 3)

In some embodiments, the template/wt retron for the engineered retrons of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-a2,
wherein:
a1/a2 is a stem 6 bp in length with a 2/2 bulge 3 bp from the tip, wherein a1 has a 2-nt overhang and a2 has a 6-nt overhang;
L1 is a 14-bp stem with a 3-nt tip, a 1/0 bulge 4 bp from the tip, and a 0/6 bulge 10 bp from the tip;
L2 is a 23-bp stem with a 5-nt tip, a 1/1 bulge 4 bp from the tip, and a 0/1 bulge 18 bp from the tip;
S1 is the single-stranded spacer between a1/a2 and L1;
S2 is the single-stranded spacer between L1 and L2;
S3 is the single-stranded spacer between L2 and a1/a2; and
the conserved nucleotides are as shown in Figure 3, and the colored-circle nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circle nucleotides, at least about 90-97% of the black-circle nucleotides, at least about 75-90% of the gray-circle nucleotides, and at least about 50% of the white-circle nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 3.

Type IB2 retrons (Figure 4)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-L2-S2-L3-S3-L4-S4-a2,
wherein:
the a1/a2 stem is 16 bp long, with a 17-base 5' overhang and a 16-base 3' overhang;
L1 is a 6-bp stem with a 4-nt tip;
L2 is a 4-bp stem with a 4-nt tip and a 2/2 bulge 2 nt from the tip;
L3 is a 3-bp stem with a 5-nt tip;
L4 is a 9-bp stem with a 5-nt tip and a 1/1 bulge 4 nt from the tip; and
S1, S2, S3, and S4 are single-stranded spacers between the a1/a2 stem and L1, between L2 and L3, between L3 and L4, and between L4 and the a1/a2 stem, respectively, with no spacer between L1 and L2, wherein the last 5 nt of S1 and nt 5-9 of S2 form a 5-bp stem.
The conserved nucleotides are as shown in Figure 4, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).
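
The element-order strings used throughout this section (for example, a1-S1-L1-L2-S2-L3-S3-L4-S4-a2) can be parsed mechanically, which makes it easy to spot loops that abut without an intervening spacer. A minimal sketch follows; `parse_architecture` and `adjacent_loops` are hypothetical helper names introduced only for illustration.

```python
import re


def parse_architecture(arch: str):
    """Split an 'a1-S1-L1-...' string into (kind, index) tuples."""
    elements = []
    for token in arch.split("-"):
        match = re.fullmatch(r"(a|S|L)(\d+)", token)
        if match is None:
            raise ValueError(f"unrecognized element: {token!r}")
        elements.append((match.group(1), int(match.group(2))))
    return elements


def adjacent_loops(arch: str):
    """Return pairs of loop elements that follow one another with no spacer."""
    elements = parse_architecture(arch)
    return [
        (f"L{left[1]}", f"L{right[1]}")
        for left, right in zip(elements, elements[1:])
        if left[0] == "L" and right[0] == "L"
    ]


print(adjacent_loops("a1-S1-L1-L2-S2-L3-S3-L4-S4-a2"))  # [('L1', 'L2')]
```

For the structure above, the only adjacent pair is L1 and L2, matching the statement that there is no spacer between them.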

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 4.

Type IC retrons (Figure 5)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-a2,
wherein:
a1/a2 is a stem 13 bp in length;
L1 is a 9-bp stem with a 3-nt tip;
L2 is a 10-bp stem with a 5-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2; and
S3 is a single-stranded spacer between L2 and a1/a2.
The conserved nucleotides are as shown in Figure 5, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).
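
Because this structure has no bulges or overhangs, it can be written out in dot-bracket notation, where matched parentheses are base pairs and dots are unpaired nucleotides. The sketch below uses the stem and tip sizes given above. The spacer lengths (2 nt each) are placeholders chosen purely for illustration, since the text does not state them, and the function names are hypothetical.

```python
def hairpin(stem_bp: int, tip_nt: int) -> str:
    """Dot-bracket string for a simple hairpin without bulges."""
    return "(" * stem_bp + "." * tip_nt + ")" * stem_bp


def two_loop_structure(a_bp, l1, l2, spacers=(2, 2, 2)):
    """Dot-bracket string for the layout a1-S1-L1-S2-L2-S3-a2.

    l1 and l2 are (stem_bp, tip_nt) tuples; spacers are assumed lengths in nt.
    """
    s1, s2, s3 = spacers
    return (
        "(" * a_bp
        + "." * s1 + hairpin(*l1)
        + "." * s2 + hairpin(*l2)
        + "." * s3
        + ")" * a_bp
    )


structure = two_loop_structure(13, (9, 3), (10, 5))
print(structure)
print(len(structure))
```

A string in this form can be compared against the output of a structure-prediction tool when checking whether an engineered sequence is expected to keep the consensus fold.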

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 5.

Type IIA1 retrons (Figure 6)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-a2,
wherein:
a1/a2 is a stem 10 bp in length, with a 1-nt overhang on a2;
L1 is a 10-bp stem with a 3-nt tip;
L2 is a 7-bp stem with a 5-nt tip;
L3 is a 27-bp stem with an 8-nt tip and a 0/2 bulge 26 bp from the tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3; and
S4 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in Figure 6, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 6.

Type IIA2 retrons (Figure 7)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-L3-S3-a2,
wherein:
a1/a2 is a stem 7 bp in length, with no overhang;
L1 is an 8-bp stem with a 3-nt tip;
L2 is a 30-bp stem with an 8-nt tip, a 1/1 bulge 2 bp from the tip, and a 1/1 bulge 27 bp from the tip;
L3 is an 8-bp stem with a 5-nt tip and a 0/1 bulge 3 nt from the tip;
S1 is a single-stranded spacer between the a1/a2 stem and L1;
S2 is a single-stranded spacer between L1 and L2; and
S3 is a single-stranded spacer between L3 and the a1/a2 stem.
The conserved nucleotides are as shown in Figure 7, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 7.

Type IIA3 retrons (Figure 8)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-L2-S2-L3-S3-a2,
wherein:
a1/a2 is a stem 6 bp in length;
L1 is an 8-bp stem with a 9-nt tip;
L2 is an 8-bp stem with a 3-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L2 and L3; and
S3 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in Figure 8, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 8.

Type IIA4 retrons (Figure 9)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-a2-L1-S1-L2-L3-S2-L4-S3,
wherein:
a1/a2 is a stem 3 bp in length, with no overhang and a 7-nt tip;
L1 is a 7-bp stem with a 3-nt tip;
L2 is a 6-bp stem with a 4-nt tip;
L3 is a 40-bp stem with a 5-nt tip, a 2/2 bulge 3 bp from the tip, a 5/4 bulge 10 bp from the tip, and a 12/15 bulge 30 bp from the tip;
L4 is a 4-bp stem with a 9-nt tip;
S1 is a single-stranded spacer between L1 and L2/L3;
S2 is a single-stranded spacer between L2/L3 and L4; and
S3 is a single-stranded spacer between L4 and the 3' end of the ncRNA.
The conserved nucleotides are as shown in Figure 9, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 9.

Novel type IIA5 retrons (Figure 10)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-a2,
wherein:
a1/a2 is a stem 15 bp in length, with a 1-nt overhang on a1, a 13-nt overhang on a2, and a 7/5 bulge 13 nt from the tip;
L1 is a 10-bp stem with a 3-nt tip;
L2 is a 35-bp stem with a 3-nt tip;
L3 is a 6-bp stem with a 5-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3; and
S4 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in Figure 10, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 10.

Type IIIA1 retrons (Figure 11)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-L2-S2-L3-S3-a2,
wherein:
a1/a2 is a stem 2 bp in length, with a 1-nt overhang on a2;
L1 is an 8-bp stem with a 4-nt tip;
L2 is a 9-bp stem with a 3-nt tip and a 1/1 bulge 3 bp from the tip;
L3 is a 20-bp stem with a 3-nt tip and a 1/2 bulge 3 bp from the tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L2 and L3; and
S3 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in Figure 11, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 11.

Type IIIA2 retrons (Figure 12)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-L4-S5-a2,
wherein:
a1/a2 is a stem 15 bp in length;
L1 is a 6-bp stem with a 4-nt tip;
L2 is a 13-bp stem with a 5-nt tip;
L3 is a 4-bp stem with an 8-nt tip;
L4 is a 20-bp stem with a 4-nt tip and a 2/2 bulge 6 bp from the tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3;
S4 is a single-stranded spacer between L3 and L4; and
S5 is a single-stranded spacer between L4 and a1/a2.
The conserved nucleotides are as shown in Figure 12, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 12.

Type IIIA3 retrons (Figure 13)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-L4-L5-L6-S5-a2,
wherein:
a1/a2 is a stem 24 bp in length, with a 1/0 bulge 15 bp from the tip and a 1/1 bulge 19 bp from the tip;
L1 is a 7-bp stem with a 4-nt tip;
L2 is a 9-bp stem with an 8-nt tip;
L3 is an 8-bp stem with a 4-nt tip;
L4 is a 4-bp stem with a 9-nt tip and a 2/2 bulge 3 bp from the tip;
L5 is a 19-bp stem with an 18-nt tip;
L6 is a 5-bp stem with a 3-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3;
S4 is a single-stranded spacer between L3 and L4; and
S5 is a single-stranded spacer between L6 and a1/a2.
The conserved nucleotides are as shown in Figure 13, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 13.

Type IIIA4 retrons (Figure 14)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-a2,
wherein:
a1/a2 is a stem 5 bp in length, with a 1/2 bulge 2 bp from the tip;
L1 is an 8-bp stem with a 6-nt tip;
L2 is an 8-bp stem with a 5-nt tip;
L3 is a 13-bp stem with a 14-nt tip and a 1/0 bulge 2 bp from the tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3; and
S4 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in SEQ ID NO: 19 and Figure 20, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 14.

Type IIIA5 retrons (Figure 15)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-L3-L4-S3-a2,
wherein:
the a1/a2 stem is 11 bp long, with no overhang;
L1 is a 9-bp stem with a 3-nt tip;
L2 is a 14-bp stem with a 5-nt tip;
L3 is a 9-bp stem with a 7-nt tip;
L4 is a 15-bp stem with a 7-nt tip; and
S1, S2, and S3 are single-stranded spacers between the a1/a2 stem and L1, between L2 and L3, and between L4 and the a1/a2 stem, respectively, with no spacer between L1 and L2 and no spacer between L2 and L3, wherein the fifth-to-last through second-to-last nt of S2 and nt 3-6 of S3 form a 4-bp stem.
The conserved nucleotides are as shown in Figure 15, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 15.

Type IIIunk retrons (Figure 16)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-S4-a2,
wherein:
a1/a2 is a stem 11 bp in length;
L1 is a 12-bp stem with a 2-nt tip;
L2 is a 21-bp stem with a 1-nt tip;
L3 is a 20-bp stem with a 4-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3; and
S4 is a single-stranded spacer between L3 and a1/a2.
The conserved nucleotides are as shown in Figure 16, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 16.

Type IV retrons (Figure 17)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-L2-S2-L3-S3-a2,
wherein:
a1/a2 is a stem 9 bp in length, with no overhang;
L1 is a 5-bp stem with a 6-nt tip;
L2 is a 9-bp stem with a 4-nt tip;
L3 is a 26-bp stem with a 5-nt tip, a 0/1 bulge 7 bp from the tip, and a 0/1 bulge 9 bp from the tip;
S1 is a single-stranded spacer between the a1/a2 stem and L1, with no spacer between L1 and L2;
S2 is a single-stranded spacer between L2 and L3; and
S3 is a single-stranded spacer between L3 and the a1/a2 stem.
The conserved nucleotides are as shown in Figure 17, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 17.

Type IX retrons (Figure 18)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-L1-S1-L2-S2-a2,
wherein:
a1/a2 is a stem 12 bp in length, wherein a1 has a 14-nt overhang and a2 has a 2-nt overhang;
L1 is an 11-bp stem with a 3-nt tip and a 1/3 bulge 7 bp from the tip;
L2 is a 25-bp stem with a 7-nt tip;
S1 is a single-stranded spacer between L1 and L2; and
S2 is a single-stranded spacer between L2 and a1/a2.
The conserved nucleotides are as shown in Figure 18, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 18.

Type V retrons (Figure 19)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-a2,
wherein:
a1/a2 is a stem 13 bp in length;
L1 is a 20-bp stem with a 4-nt tip and a 6/4 bulge 6 bp from the tip;
L2 is a 14-bp stem with a 4-nt tip and a 1/0 bulge 5 bp from the tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2; and
S3 is a single-stranded spacer between L2 and a1/a2.
The conserved nucleotides are as shown in Figure 19, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 19.

Type VI retrons (Figure 20)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-L3-L4-S4-a2,
wherein:
a1/a2 is a stem 4 bp in length, with a 1 bp 5' overhang;
L1 is a 7-bp stem with a 4-nt tip;
L2 is an 8-bp stem with a 4-nt tip;
L3 is a 16-bp stem with a 6-nt tip, a 3/4 bulge 3 bp from the tip, a 2/3 bulge 5 bp from the tip, and a 3/1 bulge 8 bp from the tip;
S1 is a single-stranded spacer between the a1/a2 stem and L1;
S2 is a single-stranded spacer between L1 and L2;
S3 is a single-stranded spacer between L2 and L3, with no spacer between L3 and L4; and
S4 is a single-stranded spacer between L4 and the a1/a2 stem.
The conserved nucleotides are as shown in Figure 20, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 20.

Type XI group 1 retrons (Figure 21)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-a2,
wherein:
a1/a2 is a stem 16 bp in length, with a 5-nt overhang on a1 and a 3-nt overhang on a2;
L1 is a 9-bp stem with a 3-nt tip;
L2 is a 7-bp stem with a 13-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2; and
S3 is a single-stranded spacer between L2 and a1/a2.
The conserved nucleotides are as shown in Figure 21, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is entirely synthetically produced and has the conserved nucleotides indicated by the colored letters shown in Figure 21.

Type XI group 2 retrons (Figure 22)

In some embodiments, the template/wt retron for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as follows:
a1-S1-L1-S2-L2-S3-a2,
wherein:
a1/a2 is a stem 13 bp in length, with a 1-nt overhang on a2;
L1 is a 7-bp stem with a 3-nt tip and a 2/2 bulge 1 bp from the tip;
L2 is an 8-bp stem with a 20-nt tip;
S1 is a single-stranded spacer between a1/a2 and L1;
S2 is a single-stranded spacer between L1 and L2; and
S3 is a single-stranded spacer between L2 and a1/a2.
The conserved nucleotides are as shown in Figure 22, wherein the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 22.

Type XII Retrons (Figure 23)

In some embodiments, the template/wt retron used for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as a1-S1-L1-S2-L2-S3-a2, wherein: a1/a2 is a stem 13 bp long, with a 1-nt overhang on a2; L1 is a 7-bp stem with a 3-nt tip and a 2/2 bulge 1 bp from the tip; L2 is an 8-bp stem with a 19-nt tip; S1 is the single-stranded spacer between a1/a2 and L1; S2 is the single-stranded spacer between L1 and L2; and S3 is the single-stranded spacer between L2 and a1/a2; the conserved nucleotides are as shown in Figure 23; and the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 23.

Type XIII Retrons (Figure 24)

In some embodiments, the template/wt retron used for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as a1-S1-L1-S2-L2-L3-S3-a2, wherein: a1/a2 is a stem 7 bp long, with no overhang; L1 is an 8-bp stem with a 3-nt tip; L2 is a 30-bp stem with an 8-nt tip, a 1/1 bulge 2 bp from the tip, and a 1/1 bulge 27 bp from the tip; L3 is an 8-bp stem with a 5-nt tip and a 0/1 bulge 3 nt from the tip; S1 is the single-stranded spacer between the a1/a2 stem and L1; S2 is the single-stranded spacer between L1 and L2; and S3 is the single-stranded spacer between L3 and the a1/a2 stem; the conserved nucleotides are as shown in Figure 24; and the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 24.

Type XIV Retrons (Figure 25)

In some embodiments, the template/wt retron used for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as a1-S1-L1-L2-S2-L3-S3-a2, wherein: the a1/a2 stem is 15 bp long, has no overhang, and has a 4/2 bulge 7 bp from the 5' end of a1; L1 is an 8-bp stem with a 5-nt tip; L2 is a 7-bp stem with a 5-nt tip; L3 is a 13-bp stem with a 2-nt tip, a 5/9 bulge 5 bp from the tip, and a 5/5 bulge 8 bp from the tip; S1, S2 and S3 are the single-stranded spacers between the a1/a2 stem and L1, between L2 and L3, and between L3 and the a1/a2 stem, respectively, with no spacer between L1 and L2; the 5th to 3rd nt from the end of S1 and nt 2-5 (4 nt) of S2 form a 3-bp stem; the conserved nucleotides are as shown in Figure 25; and the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 25.

Ec107-like Retrons (Figure 26)

In some embodiments, the template/wt retron used for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as a1-S1-L1-S2-L2-S3-L3-S4-a2, wherein: a1/a2 is a stem 12 bp long; L1 is a 4-bp stem with an 8-nt tip; L2 is an 8-bp stem with a 3-nt tip; L3 is a 22-bp stem with a 3-nt tip, a 4/6 bulge 6 bp from the tip, a 3/3 bulge 13 bp from the tip, and a 1/1 bulge 18 bp from the tip; S1 is the single-stranded spacer between a1/a2 and L1; S2 is the single-stranded spacer between L1 and L2; S3 is the single-stranded spacer between L2 and L3; and S4 is the single-stranded spacer between L3 and a1/a2; the conserved nucleotides are as shown in Figure 26; and the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 26.

Outgroup A Retrons (Figure 27)

In some embodiments, the template/wt retron used for the engineered retron of the invention encodes a wild-type retron ncRNA polynucleotide having a consensus secondary structure that can be described as a1-S1-L1-S2-L2-S3-L3-S4-a2, wherein: a1/a2 is a stem 5 bp long, with a 2-nt overhang on a2; L1 is an 11-bp stem with a 4-nt tip; L2 is an 8-bp stem with a 3-nt tip; L3 is an 18-bp stem with a 3-nt tip; S1 is the single-stranded spacer between a1/a2 and L1; S2 is the single-stranded spacer between L1 and L2; S3 is the single-stranded spacer between L2 and L3; and S4 is the single-stranded spacer between L3 and a1/a2; the conserved nucleotides are as shown in Figure 27; and the color-circled nucleotides are present at their respective certainty levels (e.g., at least about 97% of the red-circled nucleotides, at least about 90-97% of the black-circled nucleotides, at least about 75-90% of the gray-circled nucleotides, and at least about 50% of the white-circled nucleotides are present).

In some embodiments, the engineered retron is produced entirely synthetically and has the conserved nucleotides represented by the colored letters shown in Figure 27.
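As an illustration only, the Outgroup A layout a1-S1-L1-S2-L2-S3-L3-S4-a2 can be written out in dot-bracket notation, where matched parentheses are base pairs and dots are unpaired nucleotides. The sketch below is a minimal Python example and not part of the disclosure: the function names are hypothetical, the spacer length is a placeholder (the text gives no spacer lengths), and the 2-nt overhang is assumed to sit at the outer end of a2.

```python
def hairpin(stem_bp, tip_nt):
    """Dot-bracket string for a plain hairpin: stem_bp pairs closing a tip_nt tip."""
    return "(" * stem_bp + "." * tip_nt + ")" * stem_bp


def outgroup_a_skeleton(spacer_nt=2):
    """Dot-bracket skeleton for a1-S1-L1-S2-L2-S3-L3-S4-a2.

    Stem and tip sizes follow the Outgroup A description in the text.
    spacer_nt is a placeholder: spacer lengths are not stated, so all
    four spacers are given the same assumed length.
    """
    spacer = "." * spacer_nt
    # L1 (11 bp, 4-nt tip), L2 (8 bp, 3-nt tip), L3 (18 bp, 3-nt tip),
    # separated by S2 and S3.
    inner = spacer.join([hairpin(11, 4), hairpin(8, 3), hairpin(18, 3)])
    a1 = "(" * 5            # 5' strand of the 5-bp a1/a2 stem
    a2 = ")" * 5 + ".."     # 3' strand plus the assumed 2-nt overhang on a2
    return a1 + spacer + inner + spacer + a2


if __name__ == "__main__":
    skeleton = outgroup_a_skeleton()
    print(skeleton)
    print(len(skeleton), "nt in total, including the assumed spacers")
```

The same two helpers could describe the other bulge-free consensus structures by changing the stem and tip sizes; bulges (for example the 2/2 bulge in L1 of the Type XII structure) would need an extended hairpin helper.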

For example, amplification of the engineered retrons described herein (e.g., in Figures 2-27) can be performed before transfecting cells or ligating into a vector. Any method for amplifying the engineered retron can be used, including but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the engineered retrons comprise common 5' and 3' priming sites to allow amplification of the retron sequences in parallel with a set of universal primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of retron sequences from a pooled mixture.

Variants of these templates that can also be used in the engineered retrons of the invention include variants having: A) up to 1, 2 or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; B) up to 4, 5 or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or C) up to 7, 8 or 9 (e.g., up to 3 or 4) nucleotide changes per 10 gray-letter nucleotides; and/or optionally further comprising: a) the presence of 7, 8, 9 or 10 (e.g., 9 or 10) nucleotides per 10 red-circled nucleotides; b) the presence of 6, 7, 8, 9 or 10 (e.g., 8, 9 or 10) nucleotides per 10 black-circled nucleotides; c) the presence of 4, 5, 6, 7, 8, 9 or 10 (e.g., 6, 7, 8, 9 or 10) nucleotides per 10 gray-circled nucleotides; and/or d) the presence of 2, 3, 4, 5, 6, 7, 8, 9 or 10 (e.g., 4, 5, 6, 7, 8, 9 or 10) nucleotides per 10 white-circled nucleotides.
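The per-10 limits in items A) to C) can be checked mechanically. The following Python sketch is illustrative only: the function and dictionary names are hypothetical, the limits use the largest value listed for each letter color, and "per 10 nucleotides" is treated as a simple ratio over all positions of that color, which is one possible reading of the text rather than a stated rule.

```python
# Largest number of changes listed per 10 letter-colored nucleotides
# in items A) (red), B) (black) and C) (gray).
MAX_CHANGES_PER_10 = {"red": 3, "black": 6, "gray": 9}


def within_variant_limits(changes, positions, limits=MAX_CHANGES_PER_10):
    """Return True if every color class stays within its per-10 change limit.

    changes   -- dict: color -> number of changed nucleotides in that class
    positions -- dict: color -> total number of nucleotides in that class
    limits    -- dict: color -> allowed changes per 10 nucleotides
    """
    for color, limit in limits.items():
        total = positions.get(color, 0)
        if total == 0:
            continue  # no positions of this color in the template
        # changes/total <= limit/10, compared with integers to avoid rounding
        if changes.get(color, 0) * 10 > limit * total:
            return False
    return True


if __name__ == "__main__":
    template_positions = {"red": 20, "black": 30, "gray": 10}  # made-up counts
    candidate_changes = {"red": 2, "black": 5, "gray": 4}      # made-up counts
    print(within_variant_limits(candidate_changes, template_positions))
```

Passing a stricter `limits` dictionary, such as `{"red": 1, "black": 2, "gray": 4}`, applies the narrower "e.g." values from the same items.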

In some embodiments, the non-coding RNA (ncRNA) portion of the engineered retron comprises a polynucleotide (e.g., a DNA molecule) encoding an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA listed in Table B. In some embodiments, the ncRNA does not include an ncRNA associated with an RT sequence of Table X.

Forms of ncRNA and Guide RNA

In certain embodiments, the ncRNA and the guide RNA can be delivered as a single molecule, i.e., with the guide RNA fused to the 5' and/or 3' end of the ncRNA. In some embodiments, the ncRNA can have a guide RNA at both ends.

In other embodiments, the guide RNA and the ncRNA can be provided and/or delivered as separate components. As shown in Example 4, separating the guide RNA from the ncRNA can increase editing efficiency.

In other embodiments, an ncRNA-gRNA fusion can be co-delivered with a separate guide RNA.

Modified ncRNA

In other embodiments, the ncRNAs disclosed herein can be modified by introducing additional RNA motifs into the ncRNA, for example at the 5' and 3' termini of the ncRNA, or even at positions in between (e.g., in the msr or msd regions), to improve transcript production and/or stability and/or function (e.g., RT-DNA production). Such structures can include, but are not limited to, RNA hairpins, RNA stem-loops, RNA quadruplexes, cap structures, poly(A) tails, ribozyme functions, and the like. In addition, the ncRNA may also be modified to include one or more nuclear localization sequences.

Additional RNA motifs may also improve RT processivity on the ncRNA or enhance ncRNA activity by strengthening RT binding. Adding dimerization motifs (such as kissing loops or a GNRA tetraloop/tetraloop receptor pair) at the 5' and 3' termini of the ncRNA may also result in effective circularization of the ncRNA, improving stability. It is further expected that adding these motifs may enable physical separation of ncRNA components, e.g., separation of the msr and msd regions. Short 5' or 3' extensions of the ncRNA that form small toehold hairpins at either or both ends of the ncRNA may also compete favorably against the annealing of intra-complementary regions along the length of the ncRNA. Finally, kissing loops may also be used to recruit other RNAs or proteins to the genomic site and to enable swapping of RT activity from one RNA to another.

The ncRNA may be further improved through directed evolution, in a manner analogous to how protein function can be improved. Directed evolution may enhance recognition of the ncRNA by the RT, and/or reduce off-target activity and/or indels, and/or improve precise editing efficiency.

The present disclosure contemplates any such means of further improving the stability and/or functionality of the ncRNAs disclosed herein.

In some embodiments, the RNA (including guide RNA and ncRNA) used in the compositions of the present disclosure has undergone a chemical or biological modification to render it more stable. Exemplary modifications to an RNA include the depletion of a base (e.g., by deletion or by substitution of one nucleotide for another) or the modification of a base (e.g., a chemical modification of a base). As used herein, the phrase "chemical modification" includes modifications that introduce chemistries differing from those seen in naturally occurring RNA, for example covalent modifications such as the introduction of modified nucleotides (e.g., nucleotide analogs, or the inclusion of pendant groups that are not naturally found in such mRNA molecules).

Other suitable polynucleotide modifications that can be incorporated into the RNA used in the compositions of the present disclosure include, but are not limited to, 4'-thio-modified bases: 4'-thio-adenosine, 4'-thio-guanosine, 4'-thio-cytidine, 4'-thio-uridine, 4'-thio-5-methyl-cytidine, 4'-thio-pseudouridine, and 4'-thio-2-thiouridine, pyridin-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thio-uridine, 1-taurinomethyl-4-thio-uridine, 5-methyl-uridine, 1-methyl-pseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine, dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-dihydropseudouridine, 2-methoxyuridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, 4-methoxy-2-thio-pseudouridine, 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, 4-methoxy-1-methyl-pseudoisocytidine, 2-aminopurine, 2,6-diaminopurine, 7-deaza-adenine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 2-methylthio-N6-threonylcarbamoyladenosine, N6,N6-dimethyladenosine, 7-methyladenine, 2-methylthio-adenine, and 2-methoxy-adenine, inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-guanosine, 6-thio-7-deaza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methylinosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, N2,N2-dimethylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, and N2,N2-dimethyl-6-thio-guanosine, and combinations thereof. The term modification also includes, for example, the incorporation of non-nucleotide linkages or modified nucleotides into the mRNA sequences of the invention (e.g., modifications to one or both of the 3' and 5' ends of an mRNA molecule encoding a functional protein or enzyme). Such modifications include the addition of bases to an mRNA sequence (e.g., the inclusion of a poly A tail or a longer poly A tail), alteration of the 3' UTR or the 5' UTR, complexing of the mRNA with an agent (e.g., a protein or a complementary nucleic acid molecule), and inclusion of elements that change the structure of the RNA molecule (e.g., elements that form secondary structures).

In some embodiments, the RNA (e.g., ncRNA) includes a 5' cap structure. A 5' cap is typically added as follows: first, an RNA terminal phosphatase removes one of the terminal phosphate groups from the 5' nucleotide, leaving two terminal phosphates; next, guanosine triphosphate (GTP) is added to the terminal phosphates by a guanylyl transferase, producing a 5'-5' triphosphate linkage; and then the 7-nitrogen of guanine is methylated by a methyltransferase. Examples of cap structures include, but are not limited to, m7G(5')ppp(5')A, G(5')ppp(5')A and G(5')ppp(5')G. Naturally occurring cap structures comprise a 7-methylguanosine linked via a triphosphate bridge to the 5' end of the first transcribed nucleotide, resulting in a dinucleotide cap of m7G(5')ppp(5')N, where N is any nucleoside. In vivo, the cap is added enzymatically. The cap is added in the nucleus, in a reaction catalyzed by the enzyme guanylyl transferase. The cap is added to the 5' terminus of the RNA immediately after initiation of transcription. The terminal nucleoside is typically a guanosine and is in the reverse orientation to all the other nucleotides, i.e., G(5')ppp(5')GpNpNp.

Additional cap analogs include, but are not limited to, chemical structures selected from the group consisting of: m7GpppG, m7GpppA, m7GpppC; unmethylated cap analogs (e.g., GpppG); dimethylated cap analogs (e.g., m2,7GpppG), trimethylated cap analogs (e.g., m2,2,7GpppG), dimethylated symmetrical cap analogs (e.g., m7Gpppm7G), or anti-reverse cap analogs (e.g., ARCA; m7,2'OmeGpppG, m7,2'dGpppG, m7,3'OmeGpppG, m7,3'dGpppG and their tetraphosphate derivatives) (see, e.g., Jemielity, J. et al., "Novel 'anti-reverse' cap analogs with superior translational properties", RNA, 9: 1108-1122 (2003)).

Typically, the presence of a "tail" serves to protect the RNA (e.g., ncRNA) from exonuclease degradation. A poly A or poly U tail is thought to stabilize natural messengers and synthetic sense RNA. Therefore, in certain embodiments, a long poly A or poly U tail can be added to an RNA molecule, rendering the RNA more stable. Poly A or poly U tails can be added using a variety of art-recognized techniques. For example, long poly A tails can be added to synthetic or in vitro transcribed RNA using poly A polymerase (Yokoe et al., Nature Biotechnology. 1996; 14: 1252-1256). A transcription vector can also encode long poly A tails. In addition, poly A tails can be added by transcription directly from PCR products. Poly A may also be ligated to the 3' end of a sense RNA with RNA ligase (see, e.g., Molecular Cloning A Laboratory Manual, 2nd ed., Sambrook, Fritsch and Maniatis, eds. (Cold Spring Harbor Laboratory Press: 1991 edition)).

Typically, the length of a poly A or poly U tail can be at least about 10, 50, 100, 200, 300, 400, or at least 500 nucleotides. In some embodiments, a poly A tail on the 3' terminus of an mRNA typically includes about 10 to 300 adenosine nucleotides (e.g., about 10 to 200 adenosine nucleotides, about 10 to 150 adenosine nucleotides, about 10 to 100 adenosine nucleotides, about 20 to 70 adenosine nucleotides, or about 20 to 60 adenosine nucleotides). In some embodiments, the mRNA includes a 3' poly(C) tail structure. A suitable poly C tail on the 3' terminus of an mRNA typically includes about 10 to 200 cytosine nucleotides (e.g., about 10 to 150 cytosine nucleotides, about 10 to 100 cytosine nucleotides, about 20 to 70 cytosine nucleotides, about 20 to 60 cytosine nucleotides, or about 10 to 40 cytosine nucleotides). The poly C tail may be added to the poly A or poly U tail, or may substitute for the poly A or poly U tail.
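The tail lengths above can be applied at the sequence-design stage. The short Python sketch below is only an illustration under stated assumptions: the helper name and the example RNA are hypothetical, and the default range of 10 to 300 nucleotides is borrowed from the poly A range in the preceding paragraph and applied to any homopolymer tail for simplicity.

```python
def add_tail(rna, base="A", length=100, allowed=(10, 300)):
    """Append a homopolymer 3' tail (poly A, poly U or poly C) to an RNA string.

    allowed is an assumed (min, max) tail length; the default mirrors the
    "about 10 to 300" poly A range quoted in the text.
    """
    if base not in {"A", "U", "C"}:
        raise ValueError("tail base must be A, U or C")
    low, high = allowed
    if not low <= length <= high:
        raise ValueError(f"tail length {length} outside {low}-{high} nt")
    return rna + base * length


if __name__ == "__main__":
    example_rna = "GGAUCCGAUAGC"  # hypothetical sequence, not from the disclosure
    tailed = add_tail(example_rna, "A", 60)
    # A poly C tail may be added after the poly A tail, per the text.
    tailed = add_tail(tailed, "C", 20, allowed=(10, 200))
    print(len(tailed))
```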

RNA (e.g., ncRNA) according to the present disclosure can be synthesized by any of a variety of known methods. For example, RNA according to the invention can be synthesized by in vitro transcription (IVT). Briefly, IVT is typically performed with a linear or circular DNA template containing a promoter, a pool of ribonucleotide triphosphates, a buffer system that may include DTT and magnesium ions, and an appropriate RNA polymerase (e.g., T3, T7 or SP6 RNA polymerase), DNase I, pyrophosphatase, and/or RNase inhibitor. The exact conditions will vary with the specific application. An improved ncRNA IVT method is disclosed in Example 5 herein.

In a particular embodiment (as exemplified in Example 6 herein), the ncRNA can comprise an MS2 modification, which is a specific RNA hairpin structure recognized in nature by a certain MS2-binding protein. This domain can help stabilize the ncRNA and improve editing efficiency. The present disclosure contemplates other similar modifications. Reviews of other such MS2-like domains are available in the art, for example, in Johansson et al., "RNA recognition by the MS2 phage coat protein," Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., "Organization of intracellular reactions with rationally designed RNA assemblies," Science, 2011, Vol. 333: 470-474; Mali et al., "Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering," Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., "Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds," Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the "com" hairpin, which specifically recruits the Com protein. See Zalatan et al. The nucleotide sequence of the MS2 hairpin (equivalently referred to as the "MS2 aptamer") is: GCCAACATGAGGATCACCCATGTCTGCAGGGCC (SEQ ID NO: 19935).

B. Heterologous Nucleotide Sequence (HNS)

The engineered retron can comprise or encode a heterologous nucleic acid (e.g., DNA or RNA) within the msr locus or msd locus (such as in an S region, or in the tip of an L region, of the consensus structure in any of Figures 2-27 and variants thereof), or upstream or downstream of the msr locus or msd locus. In some embodiments, the heterologous nucleic acid is inserted within the msd locus. In some embodiments, the heterologous nucleic acid is inserted upstream of the msr locus. In some embodiments, the heterologous nucleic acid is inserted upstream or downstream of the msd locus. In other embodiments, the heterologous nucleic acid is inserted into a spacer sequence (e.g., S1, S2, S3 and/or S4 as depicted in Figure 2) between hairpin loops and/or within a hairpin loop (e.g., L1, L2, L3 and/or L4 as depicted in Figure 2) (e.g., the tip or loop region), or within a bulge. In some embodiments, the heterologous nucleic acid sequence can be inserted into the tip of an L region or hairpin loop (e.g., L1, L2, L3 and/or L4 as depicted in Figure 2). In some embodiments, one or more heterologous nucleic acids are inserted into the msd locus. In some embodiments, a first heterologous nucleic acid is inserted into the msd locus, and a second heterologous nucleic acid is inserted upstream of the msr locus or upstream or downstream of the msd locus.

In some embodiments, the heterologous nucleic acid comprises flanking contiguous nucleotides (also referred to as homology arms) that are substantially complementary to the genomic sequence at a target site of the cell, to facilitate insertion of at least a portion of the heterologous nucleic acid at the target site in the genome of the cell via homology directed repair (HDR). In some embodiments, the heterologous nucleic acid is between >20 nucleotides and 10,000 nucleotides in length (e.g., including the flanking homology arms).

In some embodiments, one or both homology arms on the heterologous nucleic acid are 100% identical to the target genomic sequence. In some embodiments, one or both homology arms on the heterologous nucleic acid are less than 100% complementary to the target genomic sequence, for example 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91% or 90% identical to the target genomic sequence.
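The length and identity figures in the two preceding paragraphs can be checked for a candidate design. The Python sketch below is a minimal illustration and not the method of the disclosure: the function names and toy sequences are hypothetical, identity is computed position by position on equal-length, pre-aligned, same-strand sequences (a real comparison would use an alignment tool), and requiring both arms to pass is an assumption, since the text speaks of "one or both" arms.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def check_heterologous_design(left_arm, insert, right_arm,
                              left_target, right_target, min_identity=90.0):
    """Assemble left arm + insert + right arm and check two properties.

    Returns (assembled, length_ok, arms_ok), where length_ok means the
    assembled sequence is longer than 20 nt and at most 10,000 nt, and
    arms_ok means both arms reach min_identity against their target flank.
    """
    assembled = left_arm + insert + right_arm
    length_ok = 20 < len(assembled) <= 10_000
    arms_ok = (percent_identity(left_arm, left_target) >= min_identity
               and percent_identity(right_arm, right_target) >= min_identity)
    return assembled, length_ok, arms_ok


if __name__ == "__main__":
    # Toy sequences for illustration only.
    left_target, right_target = "ACGTACGTAC", "TTGACCGGTA"
    left_arm, right_arm = "ACGTACGTAA", "TTGACCGGTA"  # one mismatch in the left arm
    _, length_ok, arms_ok = check_heterologous_design(
        left_arm, "GGGCCC", right_arm, left_target, right_target)
    print(length_ok, arms_ok)
```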

The portion of the heterologous nucleic acid to be inserted into the target genomic sequence is sometimes referred to as the donor sequence. The donor sequence may be partially identical to the full length of the target genomic sequence or to a portion thereof, or may be unrelated to the target genomic sequence. The donor sequence can be used, for example, to introduce into its target sequence a modification (e.g., a substitution, deletion, insertion or combination thereof), such as a mutation or other genetic change (e.g., a change to a genetic element on the target polynucleotide, such as a stop codon, or a shift of the open reading frame).

In some embodiments, the heterologous nucleic acid sequence is or encodes a biologically active molecule, such as but not limited to a therapeutic protein.

Any therapeutic protein can be encoded by the heterologous nucleic acid.

In some embodiments, the heterologous nucleic acid sequence encodes one or more prophylactically or therapeutically active proteins, polypeptides or other factors.

As a non-limiting example, the heterologous sequence may be or encode an agent that enhances tumor-killing activity in cancer, such as, but not limited to, TRAIL or tumor necrosis factor (TNF). As another non-limiting example, the heterologous sequence may be or encode an agent suitable for treating conditions such as muscular dystrophy (e.g., the heterologous sequence is or encodes dystrophin or a functional fragment or variant thereof, such as the various dystrophin minigenes or mini-dystrophin coding sequences known in the art), cardiovascular disease (e.g., the heterologous sequence is or encodes SERCA2a, GATA4, Tbx5, Mef2C, Hand2, Myocd, etc.) or neurodegenerative disease (e.g., the heterologous sequence is or encodes NGF, BDNF, GDNF, NT-3, etc.).

As an additional non-limiting example, the heterologous nucleic acid sequence may be or encode an agent that enhances tumor-killing activity in cancer, such as, but not limited to, TRAIL or tumor necrosis factor (TNF). As another non-limiting example, the heterologous nucleic acid sequence may be or encode an agent suitable for treating conditions such as the following, where each parenthetical lists examples of what the heterologous nucleic acid sequence may be or encode:
muscular dystrophy (e.g., dystrophin);
cardiovascular disease (e.g., SERCA2a, GATA4, Tbx5, Mef2C, Hand2, Myocd, etc.);
neurodegenerative disease (e.g., NGF, BDNF, GDNF, NT-3, etc.);
chronic pain (e.g., GlyRa1, an enkephalin, or a glutamic acid decarboxylase (e.g., GAD65, GAD67 or another isoform));
lung disease (e.g., CFTR);
hemophilia (e.g., factor VIII or factor IX);
neoplasia (e.g., PTEN, ATM, ATR, EGFR, ERBB2, ERBB3, ERBB4, Notch1, Notch2, Notch3, Notch4, AKT, AKT2, AKT3, HIF, HIF1a, HIF3a, Met, HRG, Bcl2, PPARα, PPARγ, WT1 (Wilms tumor), FGF receptor family members (5 members: 1, 2, 3, 4, 5), CDKN2a, APC, RB (retinoblastoma), MEN1, VHL, BRCA1, BRCA2, AR (androgen receptor), TSG101, IGF, IGF receptor, Igf1 (4 variants), Igf2 (3 variants), Igf1 receptor, Igf2 receptor, Bax, Bcl2, caspase family (9 members: 1, 2, 3, 4, 6, 7, 8, 9, 12), Kras, Apc);
age-related macular degeneration (e.g., Abcr, Ccl2, Cc2, cp (ceruloplasmin), Timp3, cathepsin D, Vldlr);
schizophrenia (e.g., neuregulin (Nrg1), Erb4 (receptor for neuregulin), Complexin-1 (Cplx1), Tph1 tryptophan hydroxylase, Tph2 tryptophan hydroxylase 2, Neurexin 1, GSK3, GSK3a, GSK3b, 5-HTT (Slc6a4), COMT, DRD (Drd1a), SLC6A3, DAOA, DTNBP1, Dao (Dao1));
trinucleotide repeat disorders (e.g., HTT (Huntington's Dx), SBMA/SMAX1/AR (Kennedy's Dx), FXN/X25 (Friedreich's ataxia), ATX3 (Machado-Joseph's Dx), ATXN1 and ATXN2 (spinocerebellar ataxias), DMPK (myotonic dystrophy), Atrophin-1 and Atn1 (DRPLA Dx), CBP (Creb-BP, global instability), VLDLR (Alzheimer's), Atxn7, Atxn10);
fragile X syndrome (e.g., FMR2, FXR1, FXR2, mGLUR5);
secretase-related disorders (e.g., APH-1 (α and β), presenilin (Psen1), nicastrin (Ncstn), PEN-2);
ALS (e.g., SOD1, ALS2, STEX, FUS, TARDBP, VEGF (VEGF-a, VEGF-b, VEGF-c));
autism (e.g., Mecp2, BZRAP1, MDGA2, Sema5A, Neurexin 1);
Alzheimer's disease (e.g., E1, CHIP, UCH, UBB, Tau, LRP, PICALM, Clusterin, PS1, SORL1, CR1, Vldlr, Uba1, Uba3, CHIP28 (Aqp1, aquaporin 1), Uchl1, Uchl3, APP);
inflammation (e.g., IL-10, IL-1 (IL-1a, IL-1b), IL-13, IL-17 (IL-17a (CTLA8), IL-17b, IL-17c, IL-17d, IL-17f), IL-23, Cx3cr1, ptpn22, TNFa, NOD2/CARD15 for IBD, IL-6, IL-12 (IL-12a, IL-12b), CTLA4, Cx3cl1);
Parkinson's disease (e.g., α-synuclein, DJ-1, LRRK2, Parkin, PINK1);
blood and coagulation diseases and disorders, such as anemia, bare lymphocyte syndrome, bleeding disorders, hemophagocytic lymphohistiocytosis disorders, hemophilia A, hemophilia B, hemorrhagic disorders, leukocyte deficiencies and disorders, sickle cell anemia and thalassemia (e.g., CDAN1, CDA1, RPS19, DBA, PKLR, PK1, NT5C3, UMPH1, PSN1, RHAG, RH50A, NRAMP2, SPTB, ALAS2, ANH1, ASB, ABCB7, ABC7, ASAT, TAPBP, TPSN, TAP2, ABCB3, PSF2, RING11, MHC2TA, C2TA, RFX5, RFXAP, RFX5, TBXA2R, P2RX1, P2X1, HF1, CFH, HUS, MCFD2, FANCA, FACA, FA1, FA, FAA, FAAP95, FAAP90, FLJ34064, FANCB, FANCC, FACC, BRCA2, FANCD1, FANCD2, FANCD, FACD, FAD, FANCE, FACE, FANCF, XRCC9, FANCG, BRIP1, BACH1, FANCJ, PHF9, FANCL, FANCM, KIAA1596, PRF1, HPLH2, UNC13D, MUNC13-4, HPLH3, HLH3, FHL3, F8, F8C, PI, ATT, F5, ITGB2, CD18, LCAMB, LAD, EIF2B1, EIF2BA, EIF2B2, EIF2B3, EIF2B5, LVWM, CACH, CLE, EIF2B4, HBB, HBA2, HBB, HBD, LCRB, HBA1);
B-cell non-Hodgkin lymphoma or leukemia (e.g., BCL7A, BCL7, TAL1, TCL5, SCL, TAL2, FLT3, NBS1, NBS, ZNFN1A1, IK1, LYF1, HOXD4, HOX4B, BCR, CML, PHL, ALL, ARNT, KRAS2, RASK2, GMPS, AF10, ARHGEF12, LARG, KIAA0382, CALM, CLTH, CEBPA, CEBP, CHIC2, BTL, FLT3, KIT, PBT, LPP, NPM1, NUP214, D9S46E, CAN, CAIN, RUNX1, CBFA2, AML1, WHSC1L1, NSD3, FLT3, AF1Q, NPM1, NUMA1, ZNF145, PLZF, PML, MYL, STAT5B, AF1Q, CALM, CLTH, ARL11, ARLTS1, P2RX7, P2X7, BCR, CML, PHL, ALL, GRAF, NF1, VRNF, WSS, NFNS, PTPN11, PTP2C, SHP2, NS1, BCL2, CCND1, PRAD1, BCL1, TCRA, GATA1, GF1, ERYF1, NFE1, ABL1, NQO1, DIA4, NMOR1, NUP214, D9S46E, CAN, CAIN);
inflammation and immune-related diseases and disorders (e.g., KIR3DL1, NKAT3, NKB1, AMB11, KIR3DS1, IFNG, CXCL12, TNFRSF6, APT1, FAS, CD95, ALPS1A, IL2RG, SCIDX1, SCIDX, IMD4, CCL5, SCYA5, D17S136E, TCP228, IL10, CSIF, CMKBR2, CCR2, CMKBR5, CCCKR5 (CCR5), CD3E, CD3G, AICDA, AID, HIGM2, TNFRSF5, CD40, UNG, DGU, HIGM4, TNFSF5, CD40LG, HIGM1, IGM, FOXP3, IPEX, AIID, XPID, PIDX, TNFRSF14B, TACI; inflammation (e.g., IL-10, IL-1 (IL-1a, IL-1b), IL-13, IL-17 (IL-17a (CTLA8), IL-17b, IL-17c, IL-17d, IL-17f), IL-23, Cx3cr1, ptpn22, TNFa, NOD2/CARD15 for IBD, IL-6, IL-12 (IL-12a, IL-12b), CTLA4, Cx3cl1); JAK3, JAKL, DCLRE1C, ARTEMIS, SCIDA, RAG1, RAG2, ADA, PTPRC, CD45, LCA, IL7R, CD3D, T3D, IL2RG, SCIDX1, SCIDX, IMD4);
metabolic, liver, kidney and protein diseases and disorders (e.g., TTR, PALB, APOA1, APP, AAA, CVAP, AD1, GSN, FGA, LYZ, TTR, PALB, KRT18, KRT8, CIRH1A, NAIC, TEX292, KIAA1988, CFTR, ABCC7, CF, MRP7, SLC2A2, GLUT2, G6PC, G6PT, G6PT1, GAA, LAMP2, LAMPB, AGL, GDE, GBE1, GYS2, PYGL, PFKM, TCF1, HNF1A, MODY3, SCOD1, SCO1, CTNNB1, PDGFRL, PDGRL, PRLTS, AXIN1, AXIN, CTNNB1, TP53, P53, LFS1, IGF2R, MPRI, MET, CASP8, MCH5, UMOD, HNFJ, FJHN, MCKD2, ADMCKD2, PAH, PKU1, QDPR, DHPR, PTS, FCYT, PKHD1, ARPKD, PKD1, PKD2, PKD4, PKDTS, PRKCSH, G19P1, PCLD, SEC63);
muscular/skeletal diseases and disorders (e.g., DMD, BMD, MYF6, LMNA, LMN1, EMD2, FPLD, CMD1A, HGPS, LGMD1B, LMNA, LMN1, EMD2, FPLD, CMD1A, FSHMD1A, FSHD1A, FKRP, MDC1C, LGMD2I, LAMA2, LAMM, LARGE, KIAA0609, MDC1D, FCMD, TTID, MYOT, CAPN3, CANP3, DYSF, LGMD2B, SGCG, LGMD2C, DMDA1, SCG3, SGCA, ADL, DAG2, LGMD2D, DMDA2, SGCB, LGMD2E, SGCD, SGD, LGMD2F, CMD1L, TCAP, LGMD2G, CMD1N, TRIM32, HT2A, LGMD2H, FKRP, MDC1C, LGMD2I, TTN, CMD1G, TMD, LGMD2J, POMT1, CAV3, LGMD1C, SEPN1, SELN, RSMD1, PLEC1, PLTN, EBS1, LRP5, BMND1, LRP7, LR3, OPPG, VBCH2, CLCN7, CLC7, OPTA2, OSTM1, GL, TCIRG1, TIRC7, OC116, OPTB1, VAPB, VAPC, ALS8, SMN1, SMA1, SMA2, SMA3, SMA4, BSCL2, SPG17, GARS, SMAD1, CMT2D, HEXB, IGHMBP2, SMUBP2, CATF1, SMARD1);
neurological and neuronal diseases and disorders (e.g., SOD1, ALS2, STEX, FUS, TARDBP, VEGF (VEGF-a, VEGF-b, VEGF-c), APP, AAA, CVAP, AD1, APOE, AD2, PSEN2, AD4, STM2, APBB2, FE65L1, NOS3, PLAU, URK, ACE, DCP1, ACE1, MPO, PACIP1, PAXIP1L, PTIP, A2M, BLMH, BMH, PSEN1, AD3, Mecp2, BZRAP1, MDGA2, Sema5A, Neurexin 1, GLO1, MECP2, RTT, PPMX, MRX16, MRX79, NLGN3, NLGN4, KIAA1260, AUTSX2, FMR2, FXR1, FXR2, mGLUR5, HD, IT15, PRNP, PRIP, JPH3, JP3, HDL2, TBP, SCA17, NR4A2, NURR1, NOT, TINUR, SNCAIP, TBP, SCA17, SNCA, NACP, PARK1, PARK4, DJ1, PARK7, LRRK2, PARK8, PINK1, PARK6, UCHL1, PARK5, SNCA, NACP, PARK1, PARK4, PRKN, PARK2, PDJ, DBH, NDUFV2, MECP2, RTT, PPMX, MRX16, MRX79, CDKL5, STK9, MECP2, RTT, PPMX, MRX16, MRX79, α-synuclein, DJ-1, neuregulin-1 (Nrg1), Erb4 (receptor for neuregulin), Complexin-1 (Cplx1), Tph1 tryptophan hydroxylase, Tph2 tryptophan hydroxylase 2, Neurexin 1, GSK3, GSK3a, GSK3b, 5-HTT (Slc6a4), COMT, DRD (Drd1a), SLC6A, DAOA, DTNBP1, Dao (Dao1), APH-1 (α and β), presenilin (Psen1), nicastrin (Ncstn), PEN-2, Nos1, Parp1, Nat1, Nat2, HTT, SBMA/SMAX1/AR, FXN/X25, ATX3, TXN, ATXN2, DMPK, Atrophin-1, Atn1, CBP, VLDLR, Atxn7 and Atxn10); and
ocular diseases and disorders (e.g., Abcr, Ccl2, Cc2, cp (ceruloplasmin), Timp3, cathepsin D, Vldlr, Ccr2, CRYAA, CRYA1, CRYBB2, CRYB2, PITX3, BFSP2, CP49, CP47, CRYAA, CRYA1, PAX6, AN2, MGDA, CRYBA1, CRYB1, CRYGC, CRYG3, CCL, LIM2, MP19, CRYGD, CRYG4, BFSP2, CP49, CP47, HSF4, CTM, HSF4, CTM, MIP, AQP0, CRYAB, CRYA2, CTPP2, CRYBB1, CRYGD, CRYG4, CRYBB2, CRYB2, CRYGC, CRYG3, CCL, CRYAA, CRYA1, GJA8, CX50, CAE1, GJA3, CX46, CZP3, CAE3, CCM1, CAM, KRIT1, APOA1, TGFBI, CSD2, CDGG1, CSD, BIGH3, CDG2, TACSTD2, TROP2, M1S1, VSX1, RINX, PPCD, PPD, KTCN, COL8A2, FECD, PPCD2, PIP5K3, CFD, KERA, CNA2, MYOC, TIGR, GLC1A, JOAG, GPOA, OPTN, GLC1E, FIP2, HYPL, NRP, CYP1B1, GLC3A, OPA1, NTG, NPG, CYP1B1, GLC3A, CRB1, RP12, CRX, CORD2, CRD, RPGRIP1, LCA6, CORD9, RPE65, RP20, AIPL1, LCA4, GUCY2D, GUC2D, LCA1, CORD6, RDH12, LCA3, ELOVL4, ADMD, STGD2, STGD3, RDS, RP7, PRPH2, PRPH, AVMD, AOFMD and VMD2).

In some embodiments, the heterologous nucleic acid sequence is or encodes a factor that can affect cell differentiation. As a non-limiting example, expression of one or more of Oct4, Klf4, Sox2, c-Myc, L-Myc, dominant-negative p53, Nanog, Glis1, Lin28, TFIID, mir-302/367 or other miRNAs can cause a cell to become an induced pluripotent stem (iPS) cell.

In some embodiments, the heterologous nucleic acid sequence is or encodes a factor for transdifferentiating cells. Non-limiting examples of such factors include one or more of GATA4, Tbx5, Mef2C, Myocd, Hand2, SRF, Mesp1 and SMARCD3 for cardiomyocytes; Ascl1, Nurr1, Lmx1A, Brn2, Myt1l, NeuroD1 and FoxA2 for neural cells; and Hnf4a, Foxa1, Foxa2 or Foxa3 for hepatocytes.

In certain embodiments, the heterologous nucleic acid encodes a therapeutic antibody or an antigen-binding fragment thereof.

In certain embodiments, the heterologous nucleic acid encodes a protein for replacement therapy. The protein may be defective in a diseased cell or a diseased organism/individual, and when the wild-type protein or a functional fragment or variant thereof is delivered to the diseased cell/tissue/organism/individual by the heterologous nucleic acid, it at least partially or completely restores the function of the protein that was lost in the diseased cell/tissue/organism/individual.

HNS = donor DNA template

In some embodiments, the heterologous nucleic acid sequence is a donor DNA template that can be integrated into the host genome via HDR.

In some embodiments, the heterologous nucleic acid sequence is a donor DNA that can serve as a template or primer for recombineering during replication.

In certain embodiments, the heterologous nucleic acid comprises or encodes a donor/template sequence, wherein the donor/template corrects, repairs or removes a mutation at a target genomic site. For example, the mutation may be a mutated exon in a disease gene.

In certain embodiments, the donor/template may encode or comprise a functional DNA element, such as a promoter, an enhancer, a protein-binding sequence, a methylation site, or a homology region that assists gene editing.

"Donor DNA" or "donor DNA template" means a single-stranded DNA to be inserted at a site cleaved by a gene-editing nuclease (e.g., a CRISPR/Cas effector protein, a TALEN or a ZFN) (e.g., after dsDNA cleavage, after nicking of the target DNA, after double nicking of the target DNA, and the like). The donor DNA template can contain sufficient homology to the genomic sequence at the target site, e.g., 70%, 80%, 85%, 90%, 95% or 100% homology with the nucleotide sequences flanking the target site, e.g., within about 50 bases or fewer of the target site, such as within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between the donor DNA template and the genomic sequence with which it shares homology.

Sequence homology of approximately 25, 50, 100 or 200 nucleotides, or more than 200 nucleotides (or any integer value between 10 and 200 nucleotides, or more), between the donor DNA template and the genomic sequence can support homology-directed repair. The donor DNA template can be of any length, e.g., 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc. A suitable donor DNA template can be 50 to 100 nucleotides, 100 to 500 nucleotides, 500 to 1000 nucleotides, 1000 to 5000 nucleotides, 5000 to 10,000 nucleotides, or more than 10,000 nucleotides in length.

As described above, the donor DNA template comprises a first homology arm and a second homology arm. The first homology arm is located at or near the 5' end of the donor DNA and comprises a nucleotide sequence that is at least partially complementary to a first nucleotide sequence in the target nucleic acid. The second homology arm is located at or near the 3' end of the donor DNA and comprises a nucleotide sequence that is at least partially complementary to a second nucleotide sequence in the target nucleic acid. The first and second homology arms can each independently have a length of about 10 nucleotides to 400 nucleotides; for example, 10 nucleotides (nt) to 15 nt, 15 nt to 20 nt, 20 nt to 25 nt, 25 nt to 30 nt, 30 nt to 35 nt, 35 nt to 40 nt, 40 nt to 45 nt, 45 nt to 50 nt, 50 nt to 75 nt, 75 nt to 100 nt, 100 nt to 125 nt, 125 nt to 150 nt, 150 nt to 175 nt, 175 nt to 200 nt, 200 nt to 225 nt, 225 nt to 250 nt, 250 nt to 275 nt, 275 nt to 300 nt, 325 nt to 350 nt, 350 nt to 375 nt or 375 nt to 400 nt.
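The arm layout described above (5' arm, sequence to be inserted, 3' arm) can be sketched in a few lines of code. This is an illustrative sketch only: `DonorTemplate`, `design_donor`, the `payload` field and the 0-based `site` index are hypothetical names introduced here, the arms are simply copied from one strand of the genomic sequence on either side of the insertion site, and the 10 to 400 nt bounds mirror the range stated in the text.

```python
from dataclasses import dataclass

MIN_ARM_NT = 10   # lower bound on arm length stated in the text
MAX_ARM_NT = 400  # upper bound on arm length stated in the text


@dataclass(frozen=True)
class DonorTemplate:
    """Hypothetical container: 5' homology arm + payload + 3' homology arm."""
    left_arm: str
    payload: str
    right_arm: str

    def __post_init__(self) -> None:
        # Reject arms outside the 10-400 nt window.
        for name, arm in (("left_arm", self.left_arm), ("right_arm", self.right_arm)):
            if not MIN_ARM_NT <= len(arm) <= MAX_ARM_NT:
                raise ValueError(f"{name} must be {MIN_ARM_NT}-{MAX_ARM_NT} nt, got {len(arm)}")

    @property
    def sequence(self) -> str:
        """Full donor sequence, written 5' to 3'."""
        return self.left_arm + self.payload + self.right_arm


def design_donor(genome: str, site: int, payload: str, arm_len: int = 40) -> DonorTemplate:
    """Build a donor whose arms are the `arm_len` bases on each side of `site`."""
    if site - arm_len < 0 or site + arm_len > len(genome):
        raise ValueError("not enough flanking sequence around the insertion site")
    return DonorTemplate(
        left_arm=genome[site - arm_len:site],
        payload=payload,
        right_arm=genome[site:site + arm_len],
    )


# Toy genome: insert "GGG" between a run of A's and a run of C's.
donor = design_donor("A" * 50 + "C" * 50, site=50, payload="GGG", arm_len=20)
print(len(donor.sequence))  # 43
```

The two arms may have different lengths, since the text lets each be chosen independently; `design_donor` uses a single `arm_len` only to keep the sketch short.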

In certain embodiments, the donor DNA template is used to edit a target nucleotide sequence. In certain embodiments, the donor DNA template comprises one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions or combinations thereof. In certain embodiments, the mutation causes a shift of the open reading frame on the target polynucleotide. In certain embodiments, the donor polynucleotide alters a stop codon in the target polynucleotide. In certain embodiments, the donor polynucleotide corrects a premature stop codon. The correction can be achieved by deleting the stop codon or by introducing one or more sequence changes that convert the stop codon into a sense codon. In certain embodiments, the donor polynucleotide addresses loss-of-function mutations, deletions or translocations that may occur, for example, in certain disease contexts, by inserting or restoring a functional copy of a gene, a functional fragment thereof, a functional regulatory sequence, or a functional fragment of a regulatory sequence. A functional fragment is a fragment that contains less than the complete copy of a gene but still provides a nucleotide sequence sufficient to restore the functionality of the wild-type gene or of a non-coding regulatory sequence (e.g., a sequence encoding a long non-coding RNA).
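Correcting a premature stop codon by a sequence change can be illustrated with a small example. The sketch below is an assumption-laden illustration, not the disclosed method: `find_premature_stops` and `replace_codon` are hypothetical helpers, the coding sequence is assumed to start in frame at position 0, and only the three standard DNA stop codons (TAA, TAG, TGA) are considered.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons, written as DNA


def find_premature_stops(cds: str) -> List[int]:
    """Return 0-based codon indices of in-frame stop codons before the last codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def replace_codon(cds: str, codon_index: int, new_codon: str) -> str:
    """Return a copy of `cds` with the codon at `codon_index` replaced."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    start = codon_index * 3
    if start + 3 > len(cds):
        raise IndexError("codon index is outside the coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


# ATG TGA CAA TAA: the second codon (index 1) is a premature stop.
mutant = "ATGTGACAATAA"
print(find_premature_stops(mutant))       # [1]
# A single-base change TGA -> TGG (tryptophan) removes the premature stop.
corrected = replace_codon(mutant, 1, "TGG")
print(corrected)                          # ATGTGGCAATAA
print(find_premature_stops(corrected))    # []
```

In a donor design, the corrected codon would sit in the sequence between the homology arms so that HDR copies it into the target site.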

In certain embodiments, the donor DNA template can be used to replace a single allele of a defective gene or of a defective fragment thereof. In another embodiment, the donor DNA template is used to replace both alleles of a defective gene or defective gene fragment. A "defective gene" or "defective gene fragment" is a gene or portion of a gene that, when expressed, fails to produce a functional protein or non-coding RNA having the functionality of the corresponding wild-type gene.

In certain exemplary embodiments, such defective genes may be associated with one or more disease phenotypes. In certain exemplary embodiments, the defective gene or gene fragment is not replaced; instead, the heterologous nucleic acid is used to insert a donor polynucleotide encoding a gene or gene fragment that compensates for or overrides the expression of the defective gene, such that the cellular phenotype associated with expression of the defective gene is eliminated or changed to a different or desired cellular phenotype. This can be achieved by including the coding sequence of a therapeutic protein (such as a therapeutic antibody or a functional fragment thereof, or the wild-type form of a defective protein associated with one or more disease phenotypes).

In certain embodiments, the donor may include, but is not limited to, genes or gene fragments encoding proteins or RNA transcripts to be expressed, regulatory elements, repair templates, and the like. According to the present invention, the donor polynucleotide may comprise left-end and right-end sequence elements that function together with a transposition component that mediates insertion.

In certain embodiments, the donor DNA template manipulates a splice site on the target polynucleotide. In certain embodiments, the donor DNA template disrupts a splice site. The disruption can be achieved by inserting a polynucleotide into the splice site and/or by introducing one or more mutations into the splice site. In certain embodiments, the donor polynucleotide can restore a splice site. For example, the polynucleotide can comprise a splice site sequence.

In certain embodiments, the donor DNA template to be inserted is 10 bp to 50 kb in length, e.g., 50 bp to about 40 kb, 100 bp to about 30 kb, 100 bp to about 10 kb, 100 bp to 300 bp, 200 bp to 400 bp, 300 bp to 500 bp, 400 bp to 600 bp, 500 bp to 700 bp, 600 bp to 800 bp, 700 bp to 900 bp, 800 bp to 1000 bp, 900 bp to 1100 bp, 1000 bp to 1200 bp, 1100 bp to 1300 bp, 1200 bp to 1400 bp, 1300 bp to 1500 bp, 1400 bp to 1600 bp, 1500 bp to 1700 bp, 1600 bp to 1800 bp, 1700 bp to 1900 bp or 1800 bp to 2000 bp in length.

In certain embodiments, the homology arms at one or both ends of the sequence to be inserted are independently about 20 bp, 40 bp, 60 bp, 80 bp, 100 bp, 120 bp or 150 bp long.

The first homology arm and the second homology arm of the donor DNA flank the nucleotide sequence to be introduced into the target nucleic acid (the "nucleotide sequence of interest" or "intervening nucleotide sequence"). The nucleotide sequence of interest may comprise: i) a nucleotide sequence encoding a polypeptide of interest; ii) a nucleotide sequence encoding an exon of a gene; iii) a promoter sequence; iv) an enhancer sequence; v) a nucleotide sequence encoding a non-coding RNA; or vi) any combination of the foregoing.

The donor DNA can provide gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc. For example, the donor DNA can be used to add (e.g., insert or replace) nucleic acid material to a target DNA (e.g., to "knock in" a nucleic acid encoding a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., green fluorescent protein, yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., a promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, enhancer, etc.), to modify a nucleic acid sequence (e.g., to introduce a mutation), and the like. For example, the donor DNA can be used to modify DNA in a site-specific (i.e., "targeted") manner, e.g., for gene knockout, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy (e.g., to treat a disease); as an antiviral, antipathogenic or anticancer therapeutic; to produce genetically modified organisms in agriculture; for the large-scale production of proteins by cells for therapeutic, diagnostic or research purposes; for the induction of pluripotent stem cells; for biological research; or to target genes of pathogens for deletion or replacement.

In some cases, the donor DNA comprises a nucleotide sequence encoding a polypeptide of interest. Polypeptides of interest include, e.g., a) functional versions of a polypeptide that comprises one or more amino acid substitutions, insertions, and/or deletions and exhibits reduced function, e.g., where the reduced function is associated with or causes a pathological condition; b) fluorescent polypeptides; c) hormones; d) receptors for ligands; e) ion channels; f) neurotransmitters; and g) the like.

In some cases, the donor DNA comprises a nucleotide sequence encoding a wild-type protein that is lacking in the recipient cell. In some cases, the donor DNA encodes a wild-type factor involved in blood coagulation (e.g., factor VII, factor VIII, factor IX, and the like). In some cases, the donor DNA comprises a nucleotide sequence encoding a therapeutic antibody. In some cases, the donor DNA comprises a nucleotide sequence encoding an engineered protein or receptor. In some cases, the engineered receptor is a T cell receptor (TCR), a natural killer (NK) cell receptor (NKR), or a B cell receptor (BCR). In some cases, the engineered TCR or NKR targets a cancer marker (e.g., a polypeptide that is expressed (e.g., overexpressed) on the surface of a cancer cell). In some cases, the donor DNA comprises a nucleotide sequence encoding a chimeric antigen receptor (CAR). In some cases, the CAR targets a cancer marker. Donor DNA encoding a CAR, TCR, and/or NCR protein can be folded into a DNA origami structure (a DNA nanostructure) and delivered into T cells or NK cells in vitro or in vivo.

Non-limiting examples of polypeptides that can be encoded by the donor DNA include, e.g., IL1B (interleukin 1, beta), XDH (xanthine dehydrogenase), TP53 (tumor protein p53), PTGIS (prostaglandin I2 (prostacyclin) synthase), MB (myoglobin), IL4 (interleukin 4), ANGPT1 (angiopoietin 1), ABCG8 (ATP-binding cassette, subfamily G (WHITE), member 8), CTSK (cathepsin K), PTGIR (prostaglandin I2 (prostacyclin) receptor (IP)), KCNJ11 (potassium inwardly-rectifying channel, subfamily J, member 11), INS (insulin), CRP (C-reactive protein, pentraxin-related), PDGFRB (platelet-derived growth factor receptor, beta polypeptide), CCNA2 (cyclin A2), PDGFB (platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis) oncogene homolog)), KCNJ5 (potassium inwardly-rectifying channel, subfamily J, member 5), KCNN3 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3), CAPN10 (calpain 10), PTGES (prostaglandin E synthase), ADRA2B (adrenergic, alpha-2B-, receptor), ABCG5 (ATP-binding cassette, subfamily G (WHITE), member 5), PRDX2 (peroxiredoxin 2), CAPN5 (calpain 5), PARP14 (poly (ADP-ribose) polymerase family, member 14), 
MEX3C (mex-3 homolog C (C. elegans)), ACE (angiotensin I converting enzyme (peptidyl-dipeptidase A) 1), TNF (tumor necrosis factor (TNF superfamily, member 2)), IL6 (interleukin 6 (interferon, beta 2)), STN (statin), SERPINE1 (serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1), ALB (albumin), ADIPOQ (adiponectin, C1Q and collagen domain containing), APOB (apolipoprotein B (including Ag(x) antigen)), APOE (apolipoprotein E), LEP (leptin), MTHFR (5,10-methylenetetrahydrofolate reductase (NADPH)), APOA1 (apolipoprotein A-I), EDN1 (endothelin 1), NPPB (natriuretic peptide precursor B), NOS3 (nitric oxide synthase 3 (endothelial cell)), PPARG (peroxisome proliferator-activated receptor gamma), PLAT (plasminogen activator, tissue), PTGS2 (prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)), CETP (cholesteryl ester transfer protein, plasma), AGTR1 (angiotensin II receptor, type 1), HMGCR (3-hydroxy-3-methylglutaryl-coenzyme A reductase), IGF1 (insulin-like growth factor 1 (somatomedin C)), SELE (selectin E), REN (renin), PPARA (peroxisome proliferator-activated receptor alpha), PON1 (paraoxonase 1), KNG1 (kininogen 1), CCL2 (chemokine (C-C motif) ligand 2), LPL (lipoprotein lipase), VWF (von Willebrand factor), F2 (coagulation factor II (thrombin)), ICAM1 (intercellular adhesion molecule 1), TGFB1 (transforming growth factor, beta 1), NPPA (natriuretic peptide precursor A), IL10 (interleukin 10), EPO (erythropoietin), SOD1 (superoxide dismutase 1, soluble), VCAM1 (vascular cell adhesion molecule 1), IFNG (interferon, gamma), LPA (lipoprotein, Lp(a)), MPO (myeloperoxidase), ESR1 (estrogen receptor 1), MAPK1 (mitogen-activated protein kinase 1), HP (haptoglobin), F3 (coagulation factor III (thromboplastin, tissue factor)), CST3 (cystatin C), COG2 (component of oligomeric Golgi complex 2), MMP9 (matrix metalloproteinase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV 
collagenase)), SERPINC1 (serpin peptidase inhibitor, clade C (antithrombin), member 1), F8 (coagulation factor VIII, procoagulant component), HMOX1 (heme oxygenase (decycling) 1), APOC3 (apolipoprotein C-III), IL8 (interleukin 8), PROK1 (prokineticin 1), CBS (cystathionine-beta-synthase), NOS2 (nitric oxide synthase 2, inducible), TLR4 (toll-like receptor 4), SELP (selectin P (granule membrane protein 140 kDa, antigen CD62)), ABCA1 (ATP-binding cassette, subfamily A (ABC1), member 1), AGT (angiotensinogen (serpin peptidase inhibitor, clade A, member 8)), LDLR (low-density lipoprotein receptor), GPT (glutamic-pyruvate transaminase (alanine aminotransferase)), VEGFA (vascular endothelial growth factor A), NR3C2 (nuclear receptor subfamily 3, group C, member 2), IL18 (interleukin 18 (interferon-gamma-inducing factor)), NOS1 (nitric oxide synthase 1 (neuronal)), NR3C1 (nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)), FGB (fibrinogen beta chain), HGF (hepatocyte growth factor (hepapoietin A; scatter factor)), IL1A (interleukin 1, alpha), RETN (resistin), AKT1 (v-akt murine thymoma viral oncogene homolog 1), LIPC (lipase, hepatic), HSPD1 (heat shock 60 kDa protein 1 (chaperonin)), MAPK14 (mitogen-activated protein kinase 14), SPP1 (secreted phosphoprotein 1), ITGB3 (integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)), CAT (catalase), UTS2 (urotensin 2), THBD (thrombomodulin), F10 (coagulation factor X), CP (ceruloplasmin (ferroxidase)), TNFRSF11B (tumor necrosis factor receptor superfamily, member 11b), EDNRA (endothelin receptor type A), EGFR (epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)), MMP2 (matrix metalloproteinase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase)), PLG (plasminogen), NPY (neuropeptide Y), RHOD (ras homolog gene family, member D), MAPK8 (mitogen-activated protein kinase 8), MYC (v-myc myelocytomatosis viral oncogene homolog (avian)), FN1 (fibronectin 
1), CMA1 (chymase 1, mast cell), PLAU (plasminogen activator, urokinase), GNB3 (guanine nucleotide-binding protein (G protein), beta polypeptide 3), ADRB2 (adrenergic, beta-2-, receptor, surface), APOA5 (apolipoprotein A-V), SOD2 (superoxide dismutase 2, mitochondrial), F5 (coagulation factor V (proaccelerin, labile factor)), VDR (vitamin D (1,25-dihydroxyvitamin D3) receptor), ALOX5 (arachidonate 5-lipoxygenase), HLA-DRB1 (major histocompatibility complex, class II, DR beta 1), PARP1 (poly (ADP-ribose) polymerase 1), CD40LG (CD40 ligand), PON2 (paraoxonase 2), AGER (advanced glycosylation end product-specific receptor), IRS1 (insulin receptor substrate 1), PTGS1 (prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)), ECE1 (endothelin converting enzyme 1), F7 (coagulation factor VII (serum prothrombin conversion accelerator)), IL1RN (interleukin 1 receptor antagonist), EPHX2 (epoxide hydrolase 2, cytoplasmic), IGFBP1 (insulin-like growth factor binding protein 1), MAPK10 (mitogen-activated protein kinase 10), FAS (Fas (TNF receptor superfamily, member 6)), ABCB1 (ATP-binding cassette, subfamily B (MDR/TAP), member 1), JUN (jun oncogene), IGFBP3 (insulin-like growth factor binding protein 3), CD14 (CD14 molecule), PDE5A (phosphodiesterase 5A, cGMP-specific), AGTR2 (angiotensin II receptor, type 2), CD40 (CD40 molecule, TNF receptor superfamily member 5), LCAT (lecithin-cholesterol acyltransferase), CCR5 (chemokine (C-C motif) receptor 5), MMP1 (matrix metalloproteinase 1 (interstitial collagenase)), TIMP1 (TIMP metallopeptidase inhibitor 1), ADM (adrenomedullin), DYT10 (dystonia 10), STAT3 (signal transducer and activator of transcription 3 (acute-phase response factor)), MMP3 (matrix metalloproteinase 3 (stromelysin 1, progelatinase)), ELN (elastin), USF1 (upstream transcription factor 1), CFH (complement factor H), HSPA4 (heat shock 70 kDa protein 4), MMP12 (matrix metalloproteinase 12 (macrophage elastase)), MME (membrane 
metalloendopeptidase), F2R (coagulation factor II (thrombin) receptor), SELL (selectin L), CTSB (cathepsin B), ANXA5 (annexin A5), ADRB1 (adrenergic, beta-1-, receptor), CYBA (cytochrome b-245, alpha polypeptide), FGA (fibrinogen alpha chain), GGT1 (gamma-glutamyltransferase 1), LIPG (lipase, endothelial), HIF1A (hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)), CXCR4 (chemokine (C-X-C motif) receptor 4), PROC (protein C (inactivator of coagulation factors Va and VIIIa)), SCARB1 (scavenger receptor class B, member 1), CD79A (CD79a molecule, immunoglobulin-associated alpha), PLTP (phospholipid transfer protein), ADD1 (adducin 1 (alpha)), FGG (fibrinogen gamma chain), SAA1 (serum amyloid A1), KCNH2 (potassium voltage-gated channel, subfamily H (eag-related), member 2), DPP4 (dipeptidyl-peptidase 4), G6PD (glucose-6-phosphate dehydrogenase), NPR1 (natriuretic peptide receptor A/guanylate cyclase A (atrial natriuretic peptide receptor A)), VTN (vitronectin), KIAA0101 (KIAA0101), FOS (FBJ murine osteosarcoma viral oncogene homolog), TLR2 (toll-like receptor 2), PPIG (peptidylprolyl isomerase G (cyclophilin G)), IL1R1 (interleukin 1 receptor, type I), AR (androgen receptor), CYP1A1 (cytochrome P450, family 1, subfamily A, polypeptide 1), SERPINA1 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1), MTR (5-methyltetrahydrofolate-homocysteine methyltransferase), RBP4 (retinol binding protein 4, plasma), APOA4 (apolipoprotein A-IV), CDKN2A (cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)), FGF2 (fibroblast growth factor 2 (basic)), EDNRB (endothelin receptor type B), ITGA2 (integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor)), CABIN1 (calcineurin binding protein 1), SHBG (sex hormone-binding globulin), HMGB1 (high-mobility group box 1), HSP90B2P (heat shock protein 90 kDa beta (Grp94), member 2 (pseudogene)), CYP3A4 (cytochrome P450, family 3, 
subfamily A, polypeptide 4), GJA1 (gap junction protein, alpha 1, 43 kDa), CAV1 (caveolin 1, caveolae protein, 22 kDa), ESR2 (estrogen receptor 2 (ERβ)), LTA (lymphotoxin alpha (TNF superfamily, member 1)), GDF15 (growth differentiation factor 15), BDNF (brain-derived neurotrophic factor), CYP2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6), NGF (nerve growth factor (beta polypeptide)), SP1 (Sp1 transcription factor), TGIF1 (TGFB-induced factor homeobox 1), SRC (v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian)), EGF (epidermal growth factor (beta-urogastrone)), PIK3CG (phosphoinositide-3-kinase, catalytic, gamma polypeptide), HLA-A (major histocompatibility complex, class I, A), KCNQ1 (potassium voltage-gated channel, KQT-like subfamily, member 1), CNR1 (cannabinoid receptor 1 (brain)), FBN1 (fibrillin 1), CHKA (choline kinase alpha), BEST1 (bestrophin 1), APP (amyloid beta (A4) precursor protein), CTNNB1 (catenin (cadherin-associated protein), beta 1, 88 kDa), IL2 (interleukin 2), CD36 (CD36 molecule (thrombospondin receptor)), PRKAB1 (protein kinase, AMP-activated, beta 1 non-catalytic subunit), TPO (thyroid peroxidase), ALDH7A1 (aldehyde dehydrogenase 7 family, member A1), CX3CR1 (chemokine (C-X3-C motif) receptor 1), TH (tyrosine hydroxylase), F9 (coagulation factor IX), GH1 (growth hormone 1), TF (transferrin), HFE (hemochromatosis), IL17A (interleukin 17A), PTEN (phosphatase and tensin homolog), GSTM1 (glutathione S-transferase mu 1), DMD (dystrophin), GATA4 (GATA binding protein 4), F13A1 (coagulation factor XIII, A1 polypeptide), TTR (transthyretin), FABP4 (fatty acid binding protein 4, adipocyte), PON3 (paraoxonase 3), APOC1 (apolipoprotein C-I), INSR (insulin receptor), TNFRSF1B (tumor necrosis factor receptor superfamily, member 1B), HTR2A (5-hydroxytryptamine (serotonin) receptor 2A), CSF3 (colony stimulating factor 3 (granulocyte)), CYP2C9 (cytochrome P450, family 2, subfamily C, polypeptide 9), TXN (thioredoxin), CYP11B2 (cytochrome 
P450, family 11, subfamily B, polypeptide 2), PTH (parathyroid hormone), CSF2 (colony stimulating factor 2 (granulocyte-macrophage)), KDR (kinase insert domain receptor (a type III receptor tyrosine kinase)), PLA2G2A (phospholipase A2, group IIA (platelets, synovial fluid)), B2M (beta-2-microglobulin), THBS1 (thrombospondin 1), GCG (glucagon), RHOA (ras homolog gene family, member A), ALDH2 (aldehyde dehydrogenase 2 family (mitochondrial)), TCF7L2 (transcription factor 7-like 2 (T-cell specific, HMG-box)), BDKRB2 (bradykinin receptor B2), NFE2L2 (nuclear factor (erythroid-derived 2)-like 2), NOTCH1 (Notch homolog 1, translocation-associated (Drosophila)), UGT1A1 (UDP glucuronosyltransferase 1 family, polypeptide A1), IFNA1 (interferon, alpha 1), PPARD (peroxisome proliferator-activated receptor delta), SIRT1 (sirtuin (silent mating type information regulation 2 homolog) 1 (S. cerevisiae)), GNRH1 (gonadotropin-releasing hormone 1 (luteinizing-releasing hormone)), PAPPA (pregnancy-associated plasma protein A, pappalysin 1), ARR3 (arrestin 3, retinal (X-arrestin)), NPPC (natriuretic peptide precursor C), AHSP (alpha hemoglobin stabilizing protein), PTK2 (PTK2 protein tyrosine kinase 2), IL13 (interleukin 13), MTOR (mechanistic target of rapamycin (serine/threonine kinase)), ITGB2 (integrin, beta 2 (complement component 3 receptor 3 and 4 subunit)), GSTT1 (glutathione S-transferase theta 1), IL6ST (interleukin 6 signal transducer (gp130, oncostatin M receptor)), CPB2 (carboxypeptidase B2 (plasma)), CYP1A2 (cytochrome P450, family 1, subfamily A, polypeptide 2), HNF4A (hepatocyte nuclear factor 4, alpha), SLC6A4 (solute carrier family 6 (neurotransmitter transporter, serotonin), member 4), PLA2G6 (phospholipase A2, group VI (cytosolic, calcium-independent)), TNFSF11 (tumor necrosis factor (ligand) superfamily, member 11), SLC8A1 (solute carrier family 8 (sodium/calcium exchanger), member 1), F2RL1 (coagulation factor II (thrombin) receptor-like 1), AKR1A1 (aldo-keto reductase family 1, member A1 (aldehyde reductase)), 
ALDH9A1 (aldehyde dehydrogenase 9 family, member A1), BGLAP (bone gamma-carboxyglutamate (gla) protein), MTTP (microsomal triglyceride transfer protein), MTRR (5-methyltetrahydrofolate-homocysteine methyltransferase reductase), SULT1A3 (sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3), RAGE (renal tumor antigen), C4B (complement component 4B (Chido blood group)), P2RY12 (purinergic receptor P2Y, G-protein coupled, 12), RNLS (renalase, FAD-dependent amine oxidase), CREB1 (cAMP responsive element binding protein 1), POMC (proopiomelanocortin), RAC1 (ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1)), LMNA (lamin A/C), CD59 (CD59 molecule, complement regulatory protein), SCN5A (sodium channel, voltage-gated, type V, alpha subunit), CYP1B1 (cytochrome P450, family 1, subfamily B, polypeptide 1), MIF (macrophage migration inhibitory factor (glycosylation-inhibiting factor)), MMP13 (matrix metalloproteinase 13 (collagenase 3)), TIMP2 (TIMP metallopeptidase inhibitor 2), CYP19A1 (cytochrome P450, family 19, subfamily A, polypeptide 1), CYP21A2 (cytochrome P450, family 21, subfamily A, polypeptide 2), PTPN22 (protein tyrosine phosphatase, non-receptor type 22 (lymphoid)), MYH14 (myosin, heavy chain 14, non-muscle), MBL2 (mannose-binding lectin (protein C) 2, soluble (opsonic defect)), SELPLG (selectin P ligand), AOC3 (amine oxidase, copper containing 3 (vascular adhesion protein 1)), CTSL1 (cathepsin L1), PCNA (proliferating cell nuclear antigen), IGF2 (insulin-like growth factor 2 (somatomedin A)), ITGB1 (integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12)), CAST (calpastatin), CXCL12 (chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1)), IGHE (immunoglobulin heavy constant epsilon), KCNE1 (potassium voltage-gated channel, Isk-related family, member 1), TFRC (transferrin receptor (p90, CD71)), COL1A1 (collagen, type I, alpha 1), COL1A2 
(collagen, type I, α2), IL2RB (interleukin 2 receptor, beta), PLA2G10 (phospholipase A2, group X), ANGPT2 (angiopoietin 2), PROCR (protein C receptor, endothelial (EPCR)), NOX4 (NADPH oxidase 4), HAMP (hepcidin antimicrobial peptide), PTPN11 (protein tyrosine phosphatase, non-receptor type 1), SLC2A1 (solute carrier family 2 (glucose transporter), member 1), IL2RA (interleukin 2 receptor, alpha), CCL5 (Cellulin (C-C motif) ligand 5), IRF1 (interferon regulatory factor 1), CFLAR (CASP8 and FADD-like apoptosis regulator), CALC A (calcitonin-related polypeptide alpha), EIF4E (eukaryotic translation initiation factor 4E), GSTP1 (glutathione S-transferase pi 1), JAK2 (Janus kinase 2), CYP3A5 (cytochrome P450, family 3, subgroup A, polypeptide 5), HSPG2 (heparan sulfate proteoglycan 2), CCL3 (interleukin (C-C motif) ligand 3), MYD88 (myeloid differentiation primary response gene (88)), VIP (vasoactive intestinal peptide), SOAT1 (sterol O-acyltransferase 1), ADRBK1 (adrenaline-stimulated beta receptor kinase 1), NR4A2 (nuclear receptor subfamily 4, group A, member 2), MMP8 (matrix metalloproteinase 8 (neutrophil globulinogenase)), NPR2 (natriuretic peptide receptor B/guanylate cyclase B (atrial natriuretic peptide receptor B)), GCH1 (GTP cyclohydrolase 1), EPRS (glutamicin-prolyl-tRNA synthetase), PPARGC1A (peroxisome proliferator-activated receptor gamma, coactivator 1 alpha), F12 (coagulation factor XII (Hageman factor)), PEC AMI (platelet/endothelial cell adhesion molecule), CCL4 (chemointerferon (C-C motif) ligand 4), SERPINA3 (serpin peptidase inhibitor, clade A (α-1 antiprotease, antitrypsin), member 3), CASR (calcium-sensitive receptor), GJA5 (gap junction protein, α 5, 40 kDa), FABP2 (fatty acid binding protein 2, intestinal), TTF2 (transcriptional termination factor, RNA polymerase II), PROS1 (protein S (α)), CTF1 (cardiotrophin 1), SGCB (sarcoglycan, β (43 kDa dystrophin-related glycoprotein)), YME1L1 (YME1-like 1 (brew yeast)), CAMP (antibacterial peptide 
antimicrobial peptide), ZC3H12A (zinc finger CCCH type 12A), AKR1B1 (aldose reductase family 1, member B1 (aldose reductase)), DES (desmin), MMP7 (matrix metalloproteinase 7 (matrilysin, uterine)), AHR (aryl hydrocarbon receptor), CSF1 (colony stimulating factor 1 (macrophage)), HDAC9 (histone deacetylase 9), CTGF (conjunctive tissue growth factor), KCNMA1 (potassium large conductance calcium-activated channel, subfamily M, alpha member 1), UGT1A (UDP glucuronosyltransferase 1 family, polypeptide A complex locus), PRKCA (protein kinase C, alpha), COMT (catechol-b-methyltransferase), S100B (S100 calcium-binding protein B), EGR1 (early growth response 1), PRL (prolactin), IL15 (interleukin 15), DRD4 (dopamine receptor D4), CAMK2G (calcium/calcimodulin-dependent protein kinase II gamma), SLC22A2 (solute carrier family 22 (organic cation transporter), member 2), CCL11 (chemointerleukin (C-C motif) ligand 11), PGF (placental growth factor), THPO (thrombopoietin), GP6 (glycoprotein VI (platelets)), TACR1 (tachykinin receptor 1), NTS (neurotensin), HNF1A (HNF1 homeobox A), SST (somatostatin), KCND1 (potassium voltage-gated channel, Shal-related subfamily, member 1), LOC646627 (phospholipase inhibitor), TBXAS1 (thromboxane A synthase 1 (platelets)), CYP2J2 (cytochrome P450, family 2, subgroup J, polypeptide 2), TBXA2R (thromboxane A2 receptor), ADH1C (alcohol dehydrogenase 1C (class I), gamma polypeptide), ALOX12 (arachidonic acid 12-lipoxygenase), AHSG (alpha-2-HS-glycoprotein), BHMT (betaine-homocysteine methyltransferase), GJA4 (gap junction protein, alpha 4, 37 kDa), SLC25A4 (solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocation factor), member 4), ACLY (ATP citrate lyase), ALOX5AP (arachidonic acid 5-lipoxygenase activating protein), NUMA1 (nuclear mitotron protein 1), CYP27B1 (cytochrome P450, family 27, subgroup B, polypeptide 1), CYSLTR2 (cysteamine leukotriene receptor 2), SOD3 (superoxide dismutase 3, extracellular), LTC4S (leukotriene 
C4 synthase), UCN (urocortin), GHRL (ghrelin/obesitystatin prepropeptide), APOC2 (apolipoprotein C-II), CLEC4A (C-type lectin domain family 4, member A), KBTBD10 (kelch repeat and BTB (POZ) domain 10), TNC (tenascin C), TYMS (thymidylate synthase), SHC1 (SHC (Src homology 2 domain) conversion protein 1), LRP1 (low-density lipoprotein receptor-related protein 1), SOCS3 (suppressor of interleukin signaling 3), ADH1B (alcohol dehydrogenase IB (class I), beta polypeptide), KLK3 (pancreatic vasodilator-related peptidase 3), HSD11B1 (hydroxysteroid (11-beta) dehydrogenase 1), VKORC1 (vitamin K epoxide reductase complex, subunit 1), SERPINB2 (serpin peptidase inhibitor, evolutionary clade B (ovalbumin), member 2), TNS1 (tensin 1), RNF19A (ring finger protein 19A), EPOR (erythropoietin receptor), ITGAM (integrin, α M (complement component 3 receptor 3 subunit)), PITX2 (paired homology domain-like 2), MAPK7 (mitogen-activated protein kinase 7), FCGR3A (IgG Fc fragment, low affinity 111a receptor (CD16a)), LEPR (leptin receptor), ENG (endoglin), GPX1 (glutathione peroxidase 1), GOT2 (glutamine-oxalyltransferase 2, mitochondrial (aspartate aminotransferase 2)), HRH1 (histaminic receptor HI), NR112 (nuclear receptor subfamily 1, group I, member 2), CRH (epinephrine-releasing hormone), HTR1A (5-hydroxytryptamine (serotonin) receptor 1A), VDAC1 (voltage-dependent anion channel 1), HPSE (acetylheparinase), SFTPD (surfactant protein D), TAP2 (transporter 2, ATP-binding cassette, subfamily B (MDR/TAP)), RNF123 (RING finger protein 123), PTK2B (PTK2B protein tyrosine kinase 2 beta), NTRK2 (neurotrophic tyrosine kinase receptor, type 2), IL6R (interleukin 6 receptor), ACHE (acetylcholinesterase (Yt blood group)), GLP1R (glucagon-like peptide 1 receptor), GHR (growth hormone receptor), GSR (glutathione reductase), NQO1 (NAD(P)H dehydrogenase, quinone 1), NR5A1 (nuclear receptor subfamily 5, group A, member 1), GJB2 (gap junction protein, β2, 26 kDa), SLC9A1 (solute carrier family 9 
(sodium/hydrogen exchanger), member 1), MAOA (monoamine oxidase A), PCSK9 (proprotein convertase subtilisin/kexin type 9), FCGR2A (IgG Fc fragment, low affinity Ila receptor (CD32)), SERPINF1 (serpin peptidase inhibitor, evolutionary clade F (alpha-2 antifibrotic enzyme, pigment epithelium-derived factor), member 1), EDN3 (endothelin 3), DHFR (dihydrofolate reductase), GAS6 (growth arrest-specific 6), SMPD1 (sphingomyelin phosphodiesterase 1, acid lysosome), UCP2 (uncoupling protein 2 (mitochondrial, proton carrier)), TFAP2A (transcription factor AP-2 α (activating enhancer binding protein 2 α)), C4BPA (complement component 4 binding protein, α), SERPINF2 (serpin peptidase inhibitor, evolutionary clade F (α-2 antifibrinolytic enzyme, pigment epithelium-derived factor), member 2), TYMP (thymidine phosphorylase), ALPP (alkaline phosphatase, placental (Regan isozyme)), CXCR2 (chemoattractant (C-X-C motif) receptor 2), SLC39A3 (solute carrier family 39 (zinc transporter), member 3), ABCG2 (ATP-binding cassette, subfamily G (WHITE), member 2), ADA (adenosine deaminase), JAK3 (Janus kinase 3), HSPA1A (heat shock 70 kDa protein 1A), FASN (fatty acid synthase), FGF1 (fibroblast growth factor 1 (acidic)), Fll (coagulation factor XI), ATP7A (ATPase, Cu++ transporter, alpha polypeptide), CR1 (complement component (3b/4b) receptor 1 (Knops blood type)), GFAP (fibroblast acidic protein), ROCK1 (Rho-associated coiled coil containing protein kinase 1), MECP2 (methyl CpG binding protein 2 (Rett syndrome)), MYLK (myosin light chain kinase), BCF1E (butyryl cholinesterase), LIPE (lipase, hormone sensitive), PRDX5 (peroxiredoxin 5), ADORA1 (adenosine A1 receptor), WRN (Werner syndrome, RecQ helicase-like), CXCR3 (C-X-C motif) receptor 3), CD81 (CD81 molecule), SMAD7 (SMAD family member 7), LAMC2 (laminin, gamma 2), MAP3K5 (mitogen-activated protein kinase kinase kinase 5), CF1GA (chromogranin A (parathyroid secretory protein 1)), IAPP (islet amyloliquefacial polypeptide), RFIO 
(rhodopsin), ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1), PTF1LF1 (parathyroid hormone-like hormone), NRG1 (neuromodulin 1), VEGFC (vascular endothelial growth factor C), ENPEP (glutamicin (aminopeptidase A)), CEBPB (CCAAT/enhancer binding protein (C/EBP), beta), NAGLU (N-acetylglucosidase, α), F2RL3 (coagulation factor II (thrombin) receptor-like 3), CX3CL1 (interleukin (C-X3-C motif) ligand 1), BDKRB1 (bradykinin receptor Bl), ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif, 13), ELANE (elastase, neutrophil expressed), ENPP2 (ectonucleotide pyrophosphatase/phosphodiesterase 2), CISF1 (interleukin-induced SF12-containing protein), GAST (gastrin), MYOC (myosin, trabecular meshwork-induced glucocorticoid response), ATP1A2 (ATPase, Na+/K+ transporter, α2 polypeptide), NF1 (neurofibroma protein 1), GJB1 (gap junction protein, β1, 32 kDa), MEF2A (myocyte enhancing factor 2A), VCL (focal adhesion protein), BMPR2 (bone morphogenetic protein receptor, type II (serine/thiocyanine kinase)), TUBB (tubulin, beta), CDC42 (mitotic cycle 42 (GTP-binding protein, 25 kDa)), KRT18 (keratin 18), F1SF1 (heat shock transcription factor 1), MYB (v-myb myeloblastosis viral oncogene homolog (avian)), PRKAA2 (protein kinase, AMP-activated, alpha 2 catalytic subunit), ROCK2 (Rho-associated coiled-coil containing protein kinase 2), TFPI (tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)), PRKG1 (protein kinase, cGMP-dependent, type I), BMP2 (bone morphogenetic protein 2), CTNND1 (calcificin (calcificin-related protein), delta 1), CTF1 (cystathionase (cystathionine gamma-lyase)), CTSS (histatenase S), VAV2 (vav 2 guanine nucleotide exchange factor), NPY2R (neuropeptide Y receptor Y2), IGFBP2 (insulin-like growth factor binding protein 2, 36 kDa), CD28 (CD28 molecule), GSTA1 (glutathione S-transferase alpha 1), PPIA (peptidylprolyl isomerase A (cyclophilin A)), APOF1 (apolipoprotein FI (beta-2-glycoprotein I)), S100A8 (S100 
calcium-binding protein A8), IL11 (interleukin 11), ALOX15 (arachidonic acid 15-lipoxygenase), FBLN1 (fibroin 1), NR1F13 (nuclear receptor subfamily 1, FI group, member 3), SCD (stearoyl-CoA desaturase (delta-9-desaturase)), GIP (gastric inhibitory polypeptide), CF1GB (chromogranin B (secretogranin 1)), PRKCB (protein kinase C, β), SRD5A1 (steroid-5-α-reductase, α polypeptide 1 (3-hydroxy-5α-steroid delta 4-dehydrogenase α1)), F1SD11B2 (hydroxysteroid (11-β) dehydrogenase 2), CALCRL (calcitonin receptor-like), GALNT2 (UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2)), ANGPTL4 (angiopoietin-like 4), KCNN4 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4), PIK3C2A (phosphoinositide-3-kinase, class 2, alpha polypeptide), HBEGF (heparin-binding EGF-like growth factor), CYP7A1 (cytochrome P450, family 7, subfamily A, polypeptide 1), HLA-DRB5 (major histocompatibility complex, class II, DRβ5), BNIP3 (BCL2/adenovirus E1B 19 kDa interacting protein 3), GCKR (glucokinase (hexokinase 4) regulator), S100A12 (S100 calcium-binding protein A 12), PADI4 (peptidylarginine deaminase, type IV), HSPA14 (heat shock 70 kDa protein 14), CXCR1 (chemolysin (C-X-C motif) receptor 1), H19 (H19, imprinted maternal transcript (non-protein coding)), KRTAP19-3 (keratin-associated protein 19-3), insulin, RAC2 (ras-related C3 botulinum toxin substrate 2 (rho family, small GTP-binding protein Rac2)), RYR1 (ryanodine receptor 1 (skeletal)), CLOCK (clock homolog (mouse)), NGFR (neural growth factor receptor (TNFR superfamily, member 16)), DBH (dopamine beta-hydroxylase (dopamine beta-monooxygenase)), CHRNA4 (choleline stimulatory receptor, niacin, alpha 4), CACNA1C (calcium channel, voltage-dependent, L-type, alpha 1C subunit), PRKAG2 (protein kinase, AMP-activated, gamma 2 non-catalytic subunit), CHAT (choleline acetyltransferase), PTGDS (prostaglandin D2 synthase 21 kDa (brain)), NR1H2 (nuclear receptor subfamily 
1, group H, member 2), TEK (TEK tyrosine kinase, endothelial), VEGFB (vascular endothelial growth factor B), MEF2C (myocyte enhancer factor 2C), MAPKAPK2 (mitogen-activated protein kinase-activated protein kinase 2), TNFRSF11 A (tumor necrosis factor receptor subfamily, member 11a, NFKB activator), HSPA9 (heat shock 70 kDa protein 9 (life span protein)), CYSLTR1 (cysteine leukotriene receptor 1), MAT1A (methionine adenosine transferase I, alpha), OPRL1 (opium receptor-like 1), IMPA1 (inositol (muscle)-1(or 4)-monophosphatase 1), CLCN2 (chloride channel 2), DLD (dihydrolipoic acid amide dehydrogenase), PSMA6 (proteasome (precursor, megalin factor) subunit, alpha type, 6), PSMB8 (proteasome (precursor, megalin factor) subunit, beta type, 8 (large multifunctional peptidase 7)), CHI3L1 (chitinase 3-like 1 (chondroitin-39)), ALDH1B1 (aldehyde dehydrogenase 1 family, member Bl), PARP2 (poly (ADP-ribose) polymerase 2), STAR (steroidogenic acute regulatory protein), LBP (lipopolysaccharide binding protein), ABCC6 (ATP-binding cassette, subfamily C (CFTR/MRP), member 6), RGS2 (regulator of G protein signaling 2, 24 kDa), EFNB2 (adrenaline-B2), cystic fibrosis transmembrane conductance regulator (CFTR), GJB6 (gap junction protein, β6, 30 kDa), APOA2 (apolipoprotein A-II), AMPD1 (adenosine monophosphate deaminase 1), DYSF (dysferlin, limb-girdle muscular atrophy 2B (somatic recessive)), FDFT1 (farnesyl-diphosphate farnesyltransferase 1), EDN2 (endothelin 2), CCR6 (interleukin (C-C motif) receptor 6), GJB3 (gap junction protein, β3, 31 kDa), IL1RL1 (interleukin 1 receptor-like 1), ENTPD1 (ectonucleoside triphosphate diphosphohydrolase 1), BBS4 (Bardet-Biedl syndrome 4), CELSR2 (calcified mucin, EGF LAG seven-channel G-type receptor 2 (red rooster homolog, fruit fly)), F11R (Fll receptor), RAPGEF3 (Rap guanine nucleotide exchange factor (GEF) 3), HYAL1 (hyaluronosidase 1), ZNF259 (zinc finger protein 259), ATOX1 (ATX1 antioxidant protein 1 homolog (yeast)), ATF6 (activating 
transcription factor 6), KHK (ketohexokinase (fructokinase)), SAT1 (spermidine/spermine N1-acetyltransferase 1), GGH (gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase)), TIMP4 (TIMP metallopeptidase inhibitor 4), SLC4A4 (solute carrier family 4, sodium bicarbonate cotransporter, member 4), PDE2A (phosphodiesterase 2A, cGMP-stimulated), PDE3B (phosphodiesterase 3B, cGMP-inhibited), FADS1 (fatty acid desaturase 1), FADS2 (fatty acid desaturase 2), TMSB4X (thymosin beta 4, X-linked), TXNIP (thioredoxin interacting protein), LIMS1 (LIM and senescent cell antigen-like domains 1), RHOB (ras homolog gene family, member B), LY96 (lymphocyte antigen 96), FOXO1 (forkhead box O1), PNPLA2 (patatin-like phospholipase domain-containing 2), TRH (thyrotropin-releasing hormone), GJC1 (gap junction protein, gamma 1, 45 kDa), SLC17A5 (solute carrier family 17 (anion/sugar transporter), member 5), FTO (fat mass and obesity associated), GJD2 (gap junction protein, delta 2, 36 kDa), PSRC1 (proline/serine-rich coiled-coil 1), CASP12 (caspase 12 (gene/pseudogene)), GPBAR1 (G protein-coupled bile acid receptor 1), PXK (PX domain containing serine/threonine kinase), IL33 (interleukin 33), TRIB1 (tribbles homolog 1 (Drosophila)), PBX4 (pre-B-cell leukemia homeobox 4), NUPR1 (nuclear protein, transcriptional regulator, 1), SEP15 (15 kDa selenoprotein), CILP2 (cartilage intermediate layer protein 2), TERC (telomerase RNA component), GGT2 (gamma-glutamyltransferase 2), MT-CO1 (mitochondrially encoded cytochrome c oxidase I), UOX (urate oxidase, pseudogene), a CRISPR/Cas effector polypeptide, an enzymatically active CRISPR/Cas effector polypeptide (e.g., one capable of cleaving a target nucleic acid), and an enzymatically inactive CRISPR/Cas effector polypeptide (e.g., one that does not cleave a target nucleic acid but retains binding to it).
In some cases, the donor DNA encodes a wild-type form of any of the foregoing polypeptides; that is, the donor DNA may encode a "normal" form that does not include mutations that result in reduced function, lack of function, or pathogenicity.

In some cases, the donor DNA comprises a nucleotide sequence encoding a fluorescent polypeptide. Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, a blue fluorescent variant of GFP (BFP), a cyan fluorescent variant of GFP (CFP), a yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, and phycobiliproteins and phycobiliprotein conjugates (including B-phycoerythrin, R-phycoerythrin and allophycocyanin). Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species may be encoded, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973.

In some cases, the donor DNA encodes an RNA, e.g., an siRNA, a microRNA, a short hairpin RNA (shRNA), an antisense RNA, a riboswitch, a ribozyme, an aptamer, a ribosomal RNA, a transfer RNA, and the like.

In addition to a nucleotide sequence encoding one or more gene products (e.g., an RNA and/or a polypeptide), the donor DNA may also include one or more transcriptional control elements, e.g., promoters, enhancers, and the like. In some cases, the transcriptional control element is inducible. In some cases, the promoter is reversible. In some cases, the transcriptional control element is constitutive. In some cases, the promoter is functional in a eukaryotic cell. In some cases, the promoter is a cell type-specific promoter. In some cases, the promoter is a tissue-specific promoter.

The nucleotide sequence of the donor DNA is typically not identical to the target nucleic acid (e.g., genomic sequence) that it replaces. Rather, the donor DNA may contain one or more single-base changes, insertions, deletions, inversions, or rearrangements relative to the target nucleic acid (e.g., genomic sequence), so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, such as converting a pathogenic base pair to a non-pathogenic base pair). In some cases, the donor DNA comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. The donor DNA may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest (the target nucleic acid) and that are not intended for insertion into it. Generally, the homologous regions of the donor sequence will have at least 50% sequence identity to the target nucleic acid (e.g., genomic sequence) with which recombination is desired. In some cases, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity may be present, depending on the length of the donor polynucleotide.
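
As an illustration of the sequence identity thresholds discussed above, the following is a minimal sketch of an ungapped percent-identity check between a donor homology region and the corresponding target sequence. The function names and the equal-length, ungapped comparison are assumptions made for illustration; the disclosure does not specify how identity is computed, and a gapped alignment would normally be used for real sequences.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(arm) != len(target):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    if not arm:
        raise ValueError("sequences must be non-empty")
    # Count positions where the two sequences carry the same base.
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def arm_is_acceptable(arm: str, target: str, min_identity: float = 50.0) -> bool:
    """Check a homology region against a minimum identity (50% by default,
    the general lower bound stated in the text)."""
    return percent_identity(arm, target) >= min_identity
```

For example, a 10-nucleotide homology region that differs from the target at one position has 90% identity and passes the default 50% threshold.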

The donor DNA may comprise certain nucleotide sequence differences as compared to the target nucleic acid (e.g., genomic sequence), where such differences include, e.g., restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), and the like, which can be used to assess successful insertion of the donor DNA at the cleavage site or, in some cases, for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will produce silent amino acid changes (i.e., changes that do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences, such as FLP sequences, loxP sequences, or the like, which can be activated at a later time to remove the marker sequence. In some cases, the donor DNA will include one or more nucleotide sequences that aid in localizing the donor to the nucleus of the recipient cell or that aid in integrating the donor DNA into the target nucleic acid. For example, in some cases, the donor DNA may comprise one or more nucleotide sequences encoding one or more nuclear localization signals (e.g., PKKKRKV (SEQ ID NO: 19399), VSRKRPRP (SEQ ID NO: 19400), QRKRKQ (SEQ ID NO: 19401), and the like (Freitas et al. (2009) Curr. Genomics 10:550-7)). In some cases, the donor DNA will include nucleotide sequences that recruit DNA repair enzymes to increase insertion efficiency. Human enzymes involved in homology-directed repair include MRN-CtIP, BLM-DNA2, Exo1, ERCC1, Rad51, Rad52, Ligase 1, PolQ, PARP1, Ligase 3, BRCA2, RecQ/BLM-TopoIIIα, RTEL, Polδ, and Polη (Verma and Greenberg (2016) Genes Dev. 30(10):1138-1154). In some cases, the donor DNA is delivered as reconstituted chromatin (Cruz-Becerra and Kadonaga (2020) eLife 9:e55780, DOI: 10.7554/eLife.55780).

In some cases, the ends of the donor DNA are protected (e.g., from exonucleolytic degradation) by any convenient method, and such methods are known to those skilled in the art. For example, one or more dideoxynucleotide residues can be added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides can be ligated to one or both ends. See, e.g., Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino groups and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the ends of a linear donor DNA, additional lengths of sequence may be included outside the regions of homology; these sequences can be degraded without affecting recombination.

HNS = Functional RNA Element

In certain embodiments, the donor/template comprises a coding sequence for a functional RNA element (ncRNA, siRNA, shRNA, sgRNA, etc.).

In some embodiments, the heterologous nucleic acid sequence encodes a functional non-translated RNA. In some embodiments, the functional non-translated RNA is an RNA aptamer or a ribozyme.

In some embodiments, the heterologous nucleic acid of the engineered retron further comprises a unique barcode to facilitate multiplexing. A barcode may comprise one or more nucleotide sequences used to identify the nucleic acid or cell with which the barcode is associated. Such a barcode may be inserted, for example, into the tip/loop region of the msd-encoded DNA. Barcodes can be 3-1000 or more nucleotides in length, 10-250 nucleotides in length, or 10-30 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides in length.

In some embodiments, barcodes are also used to identify the position (i.e., positional barcodes) of the cell, colony, or sample from which a retron originated, such as the position of a colony in a cellular array, the position of a well in a multi-well plate, the position of a tube in a rack, or the location of a sample in a laboratory. In particular, a barcode may be used to identify the position of a genetically modified cell containing a retron. The use of barcodes allows retrons from different cells to be pooled in a single reaction mixture for sequencing while still making it possible to trace a particular retron back to the colony from which it originated.
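
The pooling-and-tracing idea described above can be sketched as a simple demultiplexing step. This is only a minimal illustration: the barcode sequences, the well assignments, the fixed barcode offset within each read, and the exact-match lookup are all hypothetical choices and are not taken from the disclosure.

```python
from collections import Counter

# Hypothetical positional barcodes, each assigned to one well of a multi-well plate.
BARCODE_TO_WELL = {
    "ACGTACGTAC": "A1",
    "TGCATGCATG": "A2",
    "GGATCCGGAT": "B1",
}
BARCODE_LENGTH = 10  # within the 10-30 nucleotide range mentioned in the text


def assign_well(read, barcode_start):
    """Return the well a pooled read came from, or None if the barcode is unknown."""
    barcode = read[barcode_start:barcode_start + BARCODE_LENGTH]
    return BARCODE_TO_WELL.get(barcode)


def count_reads_per_well(reads, barcode_start):
    """Tally pooled sequencing reads by their well of origin."""
    counts = Counter()
    for read in reads:
        well = assign_well(read, barcode_start)
        counts[well if well is not None else "unassigned"] += 1
    return counts
```

A real pipeline would typically tolerate sequencing errors in the barcode (for example, by allowing a small number of mismatches); exact matching is used here only to keep the sketch short.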

In addition, adapter sequences can be added to the engineered retron to facilitate high-throughput amplification or sequencing. For example, a pair of adapter sequences can be added at the 5' and 3' ends of a retron construct to allow multiple engineered retrons to be amplified or sequenced simultaneously with the same set of primers.

HNS = Guide RNA

In some embodiments, the functional non-translated RNA is a CRISPR/Cas guide RNA (gRNA) specific for a target sequence in a mammalian cell. FIG. 1G depicts various configurations of a recombinant retron ncRNA that has been modified by inserting a guide RNA at the 5' end or the 3' end of the ncRNA. The guide RNA can also be provided in trans as a separate construct. In addition, a guide RNA can be placed at both ends of the recombinant retron ncRNA.

The skilled artisan will appreciate that the choice of an appropriate guide RNA is informed by which RNA-guided nuclease is used.

A guide RNA provides target specificity to the complex (the RNP complex) by including a targeting segment, which includes a guide sequence (also referred to herein as a targeting sequence), which is a nucleotide sequence complementary to a sequence of the target nucleic acid. As used herein, the term "guide RNA" refers to an RNA that comprises: i) an "activator" nucleotide sequence that binds to a CRISPR/Cas effector polypeptide (e.g., a class 2 CRISPR/Cas effector polypeptide, such as a type II, type V, or type VI CRISPR/Cas endonuclease) and activates the CRISPR/Cas effector polypeptide; and ii) a "targeter" nucleotide sequence that comprises a nucleotide sequence that hybridizes with a target nucleic acid. The "activator" nucleotide sequence and the "targeter" nucleotide sequence can be on separate RNA molecules (a "dual guide RNA") or on the same RNA molecule (a "single guide RNA"). In some cases, the guide nucleic acid includes only ribonucleotides. In some cases, the guide nucleic acid includes both ribonucleotides and deoxyribonucleotides.
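
The two-part definition above (a "targeter" plus an "activator", carried on one or two molecules) can be modeled with a small data structure. The sketch below is an assumption-laden illustration, not a design tool: the class and function names are hypothetical, the activator is an arbitrary placeholder rather than a real scaffold sequence, and the targeter is simply placed 5' of the activator in the single-guide form, which is only one possible arrangement.

```python
from dataclasses import dataclass

# RNA base that pairs with each DNA base of the strand to be hybridized.
_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def targeter_for(target_dna: str) -> str:
    """RNA sequence (written 5'->3') that base-pairs with the given DNA strand."""
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(target_dna.upper()))


@dataclass(frozen=True)
class GuideRNA:
    targeter: str   # hybridizes with the target nucleic acid
    activator: str  # binds and activates the CRISPR/Cas effector polypeptide
    single_molecule: bool = True  # True: single guide RNA; False: dual guide RNA

    def molecules(self) -> list:
        """Return the RNA molecule(s) that make up this guide."""
        if self.single_molecule:
            # Assumed layout: targeter 5' of activator on one molecule.
            return [self.targeter + self.activator]
        return [self.targeter, self.activator]
```

For instance, `GuideRNA(targeter_for("ATGC"), "GUUU")` yields one molecule in the single-guide form, while passing `single_molecule=False` yields the targeter and activator as two separate molecules.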

In some cases, a CRISPR/Cas guide RNA comprises one or more modifications, e.g., a base modification, a backbone modification, a sugar modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability, such as improved in vivo stability). Suitable nucleic acid modifications include, but are not limited to: 2'-O-methyl modified nucleotides, 2'-fluoro modified nucleotides, locked nucleic acid (LNA) modified nucleotides, peptide nucleic acid (PNA) modified nucleotides, nucleotides with phosphorothioate linkages, and a 5' cap (e.g., a 7-methylguanylate cap (m7G)).

Suitable modified nucleic acid backbones containing a phosphorus atom include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates (including 3'-alkylene phosphonates, 5'-alkylene phosphonates, and chiral phosphonates), phosphinates, phosphoramidates (including 3'-amino phosphoramidate and aminoalkylphosphoramidates), phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity, wherein one or more internucleotide linkages is a 3' to 3', 5' to 5', or 2' to 2' linkage. Suitable oligonucleotides having inverted polarity comprise a single 3' to 3' linkage at the 3'-most internucleotide linkage, i.e., a single inverted nucleoside residue that may be abasic (the nucleobase is missing or has a hydroxyl group in its place). Various salts (e.g., potassium or sodium), mixed salts, and free acid forms are also included.

A CRISPR/Cas guide RNA can also include one or more substituted sugar moieties. Suitable polynucleotides comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl, and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable polynucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. A suitable modification includes 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504), i.e., an alkoxyalkoxy group. Further suitable modifications include 2'-dimethylaminooxyethoxy, i.e., an O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE, as described in the examples below; and 2'-dimethylaminoethoxyethoxy (also known in the art as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), i.e., 2'-O-CH2-O-CH2-N(CH3)2.

Examples of various CRISPR/Cas effector proteins and CRISPR/Cas guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in target nucleic acids) can be found in the art, e.g., see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.

Examples and guidance related to type V or type VI CRISPR/Cas endonucleases and guide RNAs (as well as information regarding requirements related to protospacer adjacent motif (PAM) sequences present in target nucleic acids) can be found in the art, e.g., see Zetsche et al., Cell. 2015 Oct 22;163(3):759-71; Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36; and Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97.

C. Reverse transcriptase (RT)

Reverse transcriptase (RT, also known as RNA-directed DNA polymerase) is an enzyme found in all three domains of life; it is a DNA polymerase that uses RNA as a template. The reverse transcriptase of the present disclosure is used to reverse transcribe the template msd RNA into single-stranded msDNA.

Reverse transcriptases, or functional domains thereof, useful in the present invention include prokaryotic and eukaryotic RTs, provided that the RT functions within the host to generate a donor polynucleotide sequence from an RNA template (e.g., an RNA template from a retron transcript ncRNA).

In certain embodiments, suitable RT sequences (including amino acid sequences and encoding polynucleotide sequences) are provided in Table A.

In certain embodiments, the nucleotide sequence of a native or wild-type RT is modified, for example, using known codon optimization techniques, so that expression within the desired host is optimized.

In certain embodiments, the RT domain of a reverse transcriptase is used in the present invention, as long as it is compatible with the engineered retron of the present invention. The domain may include only the RNA-dependent DNA polymerase activity. In certain embodiments, the RT domain is non-mutagenic, i.e., it does not introduce mutations into the donor polynucleotide (e.g., during the reverse transcription process). In certain embodiments, the RT domain may originate from a non-retron RT, e.g., a viral RT or a human endogenous RT. In certain embodiments, the RT domain is a retron RT or a DGR RT. In certain embodiments, the RT may be less mutagenic than a counterpart wild-type RT. In certain embodiments, the RT is not mutagenic.

In some embodiments, the reverse transcriptase is encoded by a retron ret gene, which can accompany the cognate msr and msd loci, and the reverse transcriptase specifically recognizes the secondary structure of the cognate ncRNA transcript.

In some embodiments, the RT can be obtained from prokaryotic or eukaryotic cells. Most reverse transcriptases (80%) can be phylogenetically divided into three major lineages: group II introns, diversity-generating retroelements (DGRs), and retrons. Other clades of RTs include abortive infection (Abi) RTs, CRISPR-Cas-associated RTs, group II-like (G2L) RTs, the unknown groups (UG), and rvt elements.

In some embodiments, the RT gene is a cognate RT, a retron RT from the same species as the cognate RT or from a species within the same clade, or a retron RT that is not within the same clade as the cognate RT (such as an unrelated RT or an engineered RT). In some embodiments, the non-retron-associated RT is an RT from a group II intron, a diversity-generating retroelement (DGR), an abortive infection (Abi) RT, a CRISPR-Cas-associated RT, a group II-like (G2L) RT, an unknown group (UG) RT, or an rvt element. See Mestre et al., Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647; and Mestre et al., "UG/Abi: A Highly Diverse Family of Prokaryotic Reverse Transcriptases Associated With Defense Functions," doi.org/10.1101/2021.12.02.470933 (incorporated by reference).

In some embodiments, the RT is from a clade associated with retron/retron-like sequences. In some embodiments, the RT is selected from the RTs provided in Table A. In some embodiments, the RT is not an RT associated with a sequence identified in Table X.

In prokaryotic retron systems, the RT gene is usually located downstream of the ncRNA (msr and msd) loci. In an engineered retron, the position of the RT may differ from that in the native or wild-type retron. In some embodiments, the RT gene can be provided in cis, such as upstream or downstream of the msr locus or the msd locus. In certain embodiments, the RT gene is provided in trans, such as separately on one vector of a vector system described herein, while the ncRNA encoding the msr and msd sequences is provided on a different vector of that vector system.

In some embodiments, the RT is modified (e.g., by insertion, deletion, and/or substitution of one or more nucleotides) or codon-optimized to enhance activity or processivity.

In certain embodiments, cryptic stop signals are removed from the RT, thereby allowing generation of longer ssDNA.

In certain embodiments, the RT is from a retron encoding msDNA, as described in US 6,017,737; US 5,849,563; US 5,780,269; US 5,436,141; US 5,405,775; US 5,320,958; and CA 2,075,515; all of which are incorporated herein by reference in their entirety.

In some embodiments, the engineered retron further comprises a polynucleotide (e.g., a DNA molecule) encoding a reverse transcriptase (RT) or a portion thereof. In some embodiments, the encoded RT or portion thereof is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA. In some embodiments, the polynucleotide (e.g., DNA molecule) encoding the RT comprises a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polynucleotide listed in Table A.

In some embodiments, the polynucleotide encoding the RT encodes a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or a polypeptide of Table C.
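
The sequence identity thresholds recited above can be checked computationally once two sequences have been aligned. The following is a minimal sketch only: the disclosure does not state which alignment method or identity convention applies, so the choice here (identity counted over aligned columns in which neither sequence has a gap) is an assumption, and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    so the denominator is the number of ungapped aligned columns.
    Other conventions (e.g., dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True if the pair reaches an 'at least X%' identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    # Toy sequences, not taken from Table A.
    print(percent_identity("ACGT", "ACGA"))
    print(meets_threshold("ACGT", "ACGA", 70))
```

In practice the alignment itself would come from an established alignment tool; this sketch covers only the final counting step.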

In some embodiments, the polynucleotide encoding the RT does not comprise a polynucleotide listed in Table X.

Once translated, the RT binds the ncRNA template downstream of the msd locus, forming an RT-RNA complex, and initiates reverse transcription of the RNA toward its 5' end. Accordingly, in certain aspects, the present disclosure relates to an engineered nucleic acid-enzyme construct comprising: a. a non-coding RNA (ncRNA) comprising: i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and ii) an msd locus encoding the msd RNA portion of the msDNA; b. a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and c. a reverse transcriptase (RT) or a domain thereof, comprising: i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or ii) a polypeptide listed in Table C. In some embodiments, the RT does not comprise a polypeptide listed in Table X.

In certain aspects, the present disclosure relates to an engineered nucleic acid-enzyme construct comprising: a) a non-coding RNA (ncRNA) comprising: i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and ii) an msd locus encoding the msd RNA portion of the msDNA; b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus; upstream of the msr locus; upstream of the msd locus; and downstream of the msd locus; and c) a reverse transcriptase (RT) or a portion thereof, wherein the RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA, and wherein the ncRNA and/or the RT is any one of the inventions described herein.

In certain aspects, the present disclosure relates to an engineered nucleic acid-enzyme construct comprising: a) a non-coding RNA (ncRNA) comprising: i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and ii) an msd locus encoding the msd RNA portion of the msDNA; b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and c) a reverse transcriptase (RT) or a domain thereof, wherein the RT comprises: i) an RT listed in Table A, or an RT having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an RT listed in Table A; and/or ii) a consensus sequence listed in Table C; and wherein the RT does not comprise a sequence from Table X.

In some embodiments of the nucleic acid-enzyme constructs described herein, the ncRNA comprises: i) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or, optionally, wherein the ncRNA is not an ncRNA from a retron of Table X.

In some embodiments, the RT is linked to a component such as an RNA-guided or non-RNA-guided nuclease. The linkage can be through a peptide bond in a fusion protein or through a short linker peptide. Suitable linker peptides include flexible linkers, such as those comprising G or S repeats, e.g., G4S repeat units or GS repeat units, with 1-20 repeats, e.g., 1, 2, 3, 4, 5, 6, 7, or 8 repeats.
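
The flexible linkers described above can be written out as amino acid strings. The following is a minimal sketch, assuming that "G4S" denotes the five-residue unit GGGGS and that linkers consist only of glycine and serine; the function name `flexible_linker` and its default arguments are hypothetical and are not part of the disclosure.

```python
def flexible_linker(unit: str = "GGGGS", repeats: int = 3) -> str:
    """Build a Gly/Ser flexible linker from a repeat unit.

    Assumptions: 'unit' contains only G and S residues (e.g., "GGGGS" for
    a G4S unit, or "GS"), and 'repeats' lies in the 1-20 range recited
    in the text above.
    """
    if not 1 <= repeats <= 20:
        raise ValueError("repeats must be between 1 and 20")
    if not unit or set(unit.upper()) - {"G", "S"}:
        raise ValueError("unit must consist of G and/or S residues")
    return unit.upper() * repeats


if __name__ == "__main__":
    print(flexible_linker("GGGGS", 3))
    print(flexible_linker("GS", 4))
```

A fusion sequence would then be assembled by concatenating the RT sequence, the linker, and the nuclease sequence in the desired order.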

In certain embodiments, the RT is chemically linked or conjugated to an RNA-guided or non-RNA-guided nuclease via a non-peptide bond. Such protein conjugates can be delivered directly to the host cell, either together with or separately from the nucleic acid components of the engineered retron described herein.

In some embodiments, the RT is linked to a DNA repair modulating biomolecule (e.g., an NHEJ peptide inhibitor).

D. Programmable nucleases (RNA-guided nucleases)

In certain embodiments, the engineered retron (e.g., an engineered nucleic acid construct or an engineered nucleic acid-enzyme construct as described herein) can comprise or encode, as a heterologous nucleic acid, a guide RNA (gRNA) suitable for directing a nuclease to a specific genomic sequence to be modified. The gRNA includes a sequence complementary to the genomic sequence, and can therefore mediate binding of the nuclease-gRNA complex to the genomic target site through hybridization between the guide sequence and the target site sequence.

In certain embodiments, the gRNA can be linked to the ncRNA and/or msDNA at the 5' end of the ncRNA and/or msDNA encoded by an engineered retron described herein. In certain embodiments, the gRNA can be linked to the ncRNA and/or msDNA at the 3' end of the RNA of the ncRNA and/or of the msDNA, after reverse transcription, encoded by an engineered retron described herein.

In some embodiments, the nuclease that can form a complex with the gRNA can be any of the art-recognized Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system Cas effector enzymes, which can be used, e.g., for genome editing, including genome editing in mammalian cells or human cells. For example, a gRNA that can be loaded into Cas9 or a variant thereof can be encoded by the engineered retron, such that the gRNA is transcribed as part of the msDNA. In some embodiments, the gRNA can be linked to the 5' end of the a1 region of the msr coding sequence in the retron ncRNA, as well as to the msDNA after reverse transcription. In some embodiments, the gRNA can be part of a modified msr region that is present in the ncRNA and in the msDNA produced after reverse transcription (i.e., the encoded gRNA is not degraded as a result of msDNA synthesis by reverse transcription of the ncRNA).

Any art-recognized CRISPR/Cas effector enzyme, or variant thereof, known to be useful for CRISPR-based genome editing ("Cas enzyme") can be used with the engineered retron, although such a Cas enzyme need not be part of the engineered retron and can (but is not required to) be provided separately. For example, the Cas enzyme can be provided as part of a vector system described herein. The coding sequence for a suitable Cas enzyme can be present on the same vector that provides the engineered retron, or on a different vector. When the engineered retron and the Cas enzyme are present on the same vector, they can be under the transcriptional control of the same promoter, enhancer, or other transcriptional regulatory element, or can be regulated separately by different promoters, enhancers, and/or other transcriptional regulatory sequences in the vector system.

In some embodiments, the Cas enzyme is a Class 2, type II CRISPR/Cas effector enzyme, such as Cas9. In some embodiments, the Cas9 is from Streptococcus pyogenes (SpCas9), and the gRNA encoded by the engineered retron comprises both a CRISPR RNA (crRNA) and a tracrRNA linked together as a single guide RNA (gRNA).

In some embodiments, the Cas9 is Staphylococcus aureus Cas9 (SaCas9) or an engineered variant thereof, such as SaCas9-HF (a high-fidelity variant with genome-wide activity), KKH SaCas9 (which recognizes a 5'-NNGRRT-3' PAM and has a 2- to 4-fold broader range of target sites in the human genome than wild-type SaCas9), and microABE1744 (an engineered SaCas9 variant adapted for adenine base editing (ABE), with significantly improved on-target editing compared with other nucleases and a reduced RNA off-target footprint).

In some embodiments, the Cas9 is from Streptococcus thermophilus (StCas9), Neisseria meningitidis (NmCas9), Francisella novicida (FnCas9), or Campylobacter jejuni (CjCas9).

In some embodiments, the Cas9 is from Streptococcus canis (ScCas9), which has a less stringent PAM sequence requirement of 5'-NNG-3' (rather than the more stringent 5'-NGG-3' of SpCas9).

In some embodiments, the Cas9 is from Staphylococcus auricularis (SauriCas9), which recognizes a 5'-NNGG-3' PAM sequence and has high editing activity.
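
The PAM requirements given above for SpCas9 (5'-NGG-3'), ScCas9 (5'-NNG-3'), and SauriCas9 (5'-NNGG-3') can be compared by scanning a DNA sequence for matches. The following is a minimal sketch under stated assumptions: only the strand supplied is scanned (a real search would also scan the reverse complement), the IUPAC codes are expanded by a small hand-written table, and the names `PAMS` and `find_pam_sites` are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the PAMs below (subset only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]",
}

# PAM sequences as stated in the text above.
PAMS = {
    "SpCas9": "NGG",
    "ScCas9": "NNG",
    "SauriCas9": "NNGG",
}


def find_pam_sites(sequence: str, pam: str) -> list:
    """Return 0-based start positions of every PAM match on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    body = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    seq = "ATGGTCCGGA"  # toy sequence, for illustration only
    for name, pam in PAMS.items():
        print(name, pam, find_pam_sites(seq, pam))
```

On the toy sequence, the less stringent 5'-NNG-3' pattern matches at more positions than 5'-NGG-3', which illustrates why a relaxed PAM requirement widens the set of addressable target sites.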

In some embodiments, the Cas enzyme is a Cas9 variant with a mutated catalytic domain that retains cleavage specificity but only nicks a single DNA strand at the desired target sequence, rather than generating a double-strand break (DSB). Two such Cas9 nickase variants, targeting different strands of the same target sequence, can be used together to increase the fidelity of DSB generation, with each variant using a different gRNA that can be provided by the engineered retron.

In some embodiments, the Cas enzyme is a high-fidelity Cas9 variant with weakened DNA phosphate backbone interactions (such as SpCas9-HF1), which exhibits genome-wide specificity and undetectable off-target effects.

In some embodiments, the Cas enzyme is a Cas9 variant known as eSpCas9, in which the interaction between eSpCas9 and its gRNA is weakened when complementarity to the target DNA sequence is inexact, thereby providing improved specificity and a lower off-target editing rate.

In some embodiments, the Cas enzyme is a hyper-accurate Cas9 variant (HypaCas9), which improves proofreading before cleavage and thus greatly reduces off-target cleavage.

In some embodiments, the Cas enzyme is a Cas9 variant (FokI-fused dCas9) that combines the DNA recognition capability of dCas9 with the specificity of the active nuclease FokI. The resulting nuclease cleaves the target sequence only upon dimerization, which is less likely to occur at off-target sites, thereby enhancing specificity.

In some embodiments, the Cas enzyme is the Cas9 variant xCas9, which recognizes multiple PAM sequences, thereby increasing the target sites in the genome to one quarter. In addition, the xCas9 variant also exhibits a lower off-target rate than the commonly used SpCas9.

In some embodiments, the Cas enzyme is a Cas9 variant with altered PAM sequence specificity, including the SpG variant, which has an expanded target range of PAM sequences, and the SpRY variant, which can target nearly all PAM sequences.

In some embodiments, the Cas enzyme is dCas9, which has inactivated catalytic nuclease domains while retaining the recognition domain that allows guide RNA-mediated targeting of a specific DNA sequence. The dCas9 can further be linked to a functional domain with a distinct biological function, such as transcriptional activation/repression, DNA methylation, demethylation, an endonuclease (such as FokI), or a fluorescent dye. Representative (non-limiting) dCas9-linked functional domains include dCas9-SAM, dCas9-SunTag, dCas9-VPR, and dCas9-KRAB.

在一些實施例中,dCas9與具有胞苷去胺酶活性之催化酶融合,這將GC鹼基對轉化為AT鹼基對。In some embodiments, dCas9 is fused to a catalytic enzyme with cytidine deaminase activity, which converts GC base pairs to AT base pairs.

在一些實施例中,dCas9與經工程改造之RNA腺苷去胺酶融合,這將AT鹼基對轉化為GC鹼基對。In some embodiments, dCas9 is fused to an engineered RNA adenosine deaminase, which converts AT base pairs to GC base pairs.
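
The two base-pair conversions described above can be illustrated with a minimal Python sketch. It models only the net sequence change on the edited strand (C to T for the cytidine deaminase fusion, A to G for the adenosine deaminase fusion). The function names are hypothetical, and editing windows, strand choice, and efficiency are deliberately ignored.

```python
# Illustrative only: models the net base change, not enzyme kinetics or editing windows.

def cytosine_base_edit(seq: str, pos: int) -> str:
    """Replace the C at 0-based position pos with T (a G:C to A:T change)."""
    if seq[pos] != "C":
        raise ValueError("a cytidine deaminase fusion acts on C")
    return seq[:pos] + "T" + seq[pos + 1:]


def adenine_base_edit(seq: str, pos: int) -> str:
    """Replace the A at 0-based position pos with G (an A:T to G:C change)."""
    if seq[pos] != "A":
        raise ValueError("an adenosine deaminase fusion acts on A")
    return seq[:pos] + "G" + seq[pos + 1:]
```

For example, `cytosine_base_edit("GACT", 2)` gives `"GATT"`, and `adenine_base_edit("GACT", 1)` gives `"GGCT"`.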

在一些實施例中,Cas酶為2類、V型CRISPR/Cas效應酶,諸如Cas12a (Cpf1) (V-A型)、Cas12b (C2c1) (V-B型)、Cas12c (C2c3) (V-C型)、Cas12d (CasY) (V-D型)、Cas12e (CasX) (V-E型)、Cas12f (Cas14、C2c10) (V-F型)、Cas12g (V-G型)、Cas12h (V-H型)、Cas12i (V-I型)、Cas12k (C2c5) (V-K型)或C2c4 / C2c8 / C2c9 (V-U型)。In some embodiments, the Cas enzyme is a Class 2, V-type CRISPR/Cas effector enzyme, such as Cas12a (Cpf1) (V-A type), Cas12b (C2c1) (V-B type), Cas12c (C2c3) (V-C type), Cas12d (CasY) (V-D type), Cas12e (CasX) (V-E type), Cas12f (Cas14, C2c10) (V-F type), Cas12g (V-G type), Cas12h (V-H type), Cas12i (V-I type), Cas12k (C2c5) (V-K type) or C2c4/C2c8/C2c9 (V-U type).

在一些實施例中,Cas酶為Cas12a或Cpf1 (來自普雷沃菌弗朗西斯菌1之CRISPR)。與Cas9不同,Cas12a由於其富AT之PAM序列而非常適合靶向富AT之DNA序列。在一些實施例中,Cas12a為FnCas12a (其識別PAM序列5'‐TTN‐3'),或AsCas12a或LbCas12a (其識別5'‐TTTV‐3' PAM序列),其中V為A、G或C核苷酸。此外,Cas12a在標靶DNA中產生交錯之雙股斷裂,而非由SpCas9生成之鈍端,因此使其更適用於HDR修復。在此實施例中,本發明之經工程改造之逆轉錄子編碼適合用於Cas12a之gRNA,因為該gRNA不需要tracrRNA,且僅需要crRNA。In some embodiments, the Cas enzyme is Cas12a or Cpf1 (CRISPR from Prevotella and Francisella 1). Unlike Cas9, Cas12a is well suited for targeting AT-rich DNA sequences because of its AT-rich PAM sequence. In some embodiments, Cas12a is FnCas12a (which recognizes the PAM sequence 5'-TTN-3'), or AsCas12a or LbCas12a (which recognize the PAM sequence 5'-TTTV-3'), where V is an A, G, or C nucleotide. In addition, Cas12a produces staggered double-strand breaks in the target DNA, rather than the blunt ends generated by SpCas9, making it more suitable for HDR repair. In this embodiment, the engineered retron of the present invention encodes a gRNA suitable for use with Cas12a, because that gRNA requires only a crRNA and no tracrRNA.
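
As a rough illustration of the PAM motifs named above, the following Python sketch scans one strand of a DNA sequence for 5'-TTN-3' or 5'-TTTV-3' matches. The function name and the example sequence are hypothetical. A practical design tool would also scan the reverse complement and check the protospacer that follows each PAM.

```python
import re

# IUPAC codes needed for the PAMs named in the text (N = any base; V = A, C or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def find_pam_sites(seq: str, pam: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) PAM match."""
    pattern = "".join(IUPAC[base] for base in pam.upper())
    # A zero-width lookahead lets overlapping matches all be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]
```

With the made-up sequence `"ATTTACGTTTTG"`, `find_pam_sites(seq, "TTTV")` returns `[1, 8]`. The stricter TTTV motif rejects the `TTTT` at position 7 because V excludes T.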

在一些實施例中,Cas酶係來自胺基酸球菌屬之Cas12a變異體(enAsCas12a),與野生型Cas12a相比,其具有擴展目標範圍之PAM序列及顯著較高之編輯活性。In some embodiments, the Cas enzyme is a Cas12a variant (enAsCas12a) from Acidaminococcus, which has an expanded range of targetable PAM sequences and significantly higher editing activity compared to wild-type Cas12a.

在一些實施例中,Cas酶係減少脫靶編輯之高保真度Cas12a變異體(enAsCas12a-HF1)。In some embodiments, the Cas enzyme is a high-fidelity Cas12a variant with reduced off-target editing (enAsCas12a-HF1).

在一些實施例中,Cas酶為Cas12b或C2c1。在一些實施例中,Cas12b來自酸土脂環酸芽孢桿菌(AacCas12b),或來自嗜酸脂環酸芽孢桿菌Cas12b (AapCas12b)。In some embodiments, the Cas enzyme is Cas12b or C2c1. In some embodiments, the Cas12b is from Alicyclobacillus acidoterrestris (AacCas12b) or from Alicyclobacillus acidiphilus (AapCas12b).

在一些實施例中,Cas酶係在37℃下起作用之Cas12b變異體,諸如一種形式之 外村尚芽孢桿菌(Bacillus hisashii)(BhCas12b)。 In some embodiments, the Cas enzyme is a Cas12b variant that functions at 37°C, such as a form of Bacillus hisashii (BhCas12b).

在一些實施例中,Cas酶係比SpCas9具有更高特異性之BhCas12b變異體。In some embodiments, the Cas enzyme is a BhCas12b variant with higher specificity than SpCas9.

在一些實施例中,Cas酶為CasX或Cas12e。In some embodiments, the Cas enzyme is CasX or Cas12e.

在一些實施例中,Cas酶為CasY或Cas12d,且本發明之經工程改造之逆轉錄子編碼短互補性非轉譯RNA (scoutRNA)以及crRNA (而非其他CRISPR-Cas系統中使用之tracrRNA)。In some embodiments, the Cas enzyme is CasY or Cas12d, and the engineered retron of the present invention encodes a short-complementarity untranslated RNA (scoutRNA) together with the crRNA (rather than the tracrRNA used in other CRISPR-Cas systems).

在一些實施例中,Cas酶係來自古細菌之Cas14。Cas14靶向單股(ss) DNA標靶序列,不需要PAM序列來進行活化,且具有附帶活性(亦即,在結合標靶序列後非特異性地切割其他非標靶ssDNA股)。與Cas12a不同,Cas14a需要與標靶ssDNA之高保真度互補性,且對ssDNA標靶受質之內部種子區域錯配極其敏感。In some embodiments, the Cas enzyme is Cas14 from archaea. Cas14 targets single-stranded (ss) DNA target sequences, does not require a PAM sequence for activation, and has collateral activity (i.e., it non-specifically cleaves other, non-target ssDNA strands after binding to the target sequence). Unlike Cas12a, Cas14a requires high-fidelity complementarity with the target ssDNA and is extremely sensitive to mismatches in the internal seed region of the ssDNA target substrate.

在一些實施例中,gRNA核酸酶係經工程改造之RNA引導之FokI核酸酶。RNA引導之FokI核酸酶包含無活性Cas9 (dCas9)及FokI核酸內切酶之融合(FokI-dCas9),其中dCas9部分對FokI賦予引導RNA依賴性靶向。對於經工程改造之RNA引導之FokI核酸酶的描述,參見例如Havlicek等人 (2017) Mol. Ther. 25(2):342-355,Pan等人 (2016) Sci Rep. 6:35794,Tsai等人 (2014) Nat Biotechnol. 32(6):569-576;以引用之方式併入本文中。In some embodiments, the gRNA nuclease is an engineered RNA-guided FokI nuclease. The RNA-guided FokI nuclease comprises a fusion of an inactive Cas9 (dCas9) and a FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355; Pan et al. (2016) Sci Rep. 6:35794; Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; incorporated herein by reference.

在其他實施例中,RNA引導之核酸酶可為非CRISPR/Cas相關核酸酶,諸如轉位子編碼之核酸酶、IscB、IscR或TnpB。In other embodiments, the RNA-guided nuclease may be a non-CRISPR/Cas-related nuclease, such as a transposon-encoded nuclease, IscB, IscR, or TnpB.

在一些實施例中,Cas酶為Cas9。In some embodiments, the Cas enzyme is Cas9.

在一些實施例中,本文所述之基於逆轉錄子之編輯系統可包括任何Cas9等效物。Cas9等效物係提供與Cas9相同或實質上相同功能之Cas9樣蛋白。例如,若Cas9係指CRISPR-Cas系統之II型酶,則Cas9等效物可指CRISPR-Cas系統之V型或VI型酶。In some embodiments, the retron-based editing systems described herein may include any Cas9 equivalent. A Cas9 equivalent is a Cas9-like protein that provides the same or substantially the same function as Cas9. For example, if Cas9 refers to a type II enzyme of a CRISPR-Cas system, a Cas9 equivalent may refer to a type V or type VI enzyme of a CRISPR-Cas system.

例如,Cas12e (CasX)為Cas9等效物,據報告具有與Cas9相同之功能,但藉由趨同演化而演化。因此,預期Cas12e (CasX)蛋白與本文所述之基於逆轉錄子之編輯系統一起使用。另外,Cas12e (CasX)之任何變異體或修飾均為可設想的且在本揭示案之範圍內。For example, Cas12e (CasX) is a Cas9 equivalent that is reported to have the same function as Cas9 but arose through convergent evolution. The Cas12e (CasX) protein is therefore contemplated for use with the retron-based editing systems described herein. In addition, any variant or modification of Cas12e (CasX) is conceivable and within the scope of the present disclosure.

在一些實施例中,Cas9等效物可指Cas12e (CasX)或Cas12d (CasY),其已描述於例如Burstein等人,「New CRISPR–Cas systems from uncultivated microbes.」Cell Res.2017年2月21日. doi: 10.1038/cr.2017.21中,其全部內容藉此以引用之方式併入。使用基因體解析之宏基因體學,鑑定出多種CRISPR–Cas系統,包括在古菌生命域中首次報告之Cas9。此趨異Cas9蛋白在很少研究之奈米古菌中發現,作為活性CRISPR–Cas系統之部分。在細菌中,發現了兩種先前未知之系統:CRISPR–Cas12e及CRISPR–Cas12d,其為迄今發現之最緊湊系統之一。在一些實施例中,Cas9係指Cas12e或Cas12e之變異體。在一些實施例中,Cas9係指Cas12d或Cas12d之變異體。應理解,其他RNA引導之DNA結合蛋白可用作核酸可程式化DNA結合蛋白(napDNAbp),且在本揭示案之範圍內。In some embodiments, a Cas9 equivalent may refer to Cas12e (CasX) or Cas12d (CasY), which have been described, for example, in Burstein et al., "New CRISPR-Cas systems from uncultivated microbes." Cell Res. Feb. 21, 2017. doi: 10.1038/cr.2017.21, the entire contents of which are hereby incorporated by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first Cas9 reported in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-Cas12e and CRISPR-Cas12d, which are among the most compact systems discovered to date. In some embodiments, Cas9 refers to Cas12e or a variant of Cas12e. In some embodiments, Cas9 refers to Cas12d or a variant of Cas12d. It should be understood that other RNA-guided DNA binding proteins can be used as nucleic acid programmable DNA binding proteins (napDNAbp) and are within the scope of the present disclosure.

在一些實施例中,Cas9等效物包含與天然存在之Cas12e (CasX)或Cas12d (CasY)蛋白至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%或至少99.5%一致之胺基酸序列。在一些實施例中,napDNAbp為天然存在之Cas12e (CasX)或Cas12d (CasY)蛋白。在一些實施例中,napDNAbp包含與野生型Cas蛋白或本文所提供之任何Cas部分至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%或至少99.5%一致之胺基酸序列。In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp is a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas protein or any Cas portion provided herein.
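
The percent-identity thresholds above can be checked with a short Python sketch. It assumes the two amino acid sequences have already been aligned to equal length, with `-` marking gaps, and it divides matches by the alignment length. Other identity conventions exist, and the text does not specify one, so both the convention and the function name are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be pre-aligned, non-empty, and equal in length")
    # A column counts as a match only if both residues agree and neither is a gap.
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

For instance, two aligned 10-residue fragments that differ at one position score 90.0, which satisfies the "at least 85%" and "at least 90%" thresholds but not "at least 91%".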

在各個實施例中,核酸可程式化DNA結合蛋白包括但不限於Cas9 (例如,dCas9及nCas9)、Cas12e (CasX)、Cas12d (CasY)、Cas12a (Cpf1)、Cas12b1 (C2c1)、Cas13a (C2c2)、Cas12c (C2c3)及Argonaute。具有與Cas9不同之PAM特異性的核酸可程式化DNA結合蛋白之一實例係來自普雷沃菌及弗朗西斯菌1之規律成簇間隔短回文重複(亦即,Cas12a (Cpf1))。與Cas9類似,Cas12a (Cpf1)亦為2類CRISPR效應子,但其為V型酶子組之成員,而非II型子組之成員。已顯示,Cas12a (Cpf1)介導強大DNA干擾,其特徵與Cas9不同。Cas12a (Cpf1)為一種缺乏tracrRNA之單一RNA引導之核酸內切酶,且其利用富T之原間隔基相鄰模體(TTN、TTTN或YTN)。此外,Cpf1經由交錯之DNA雙股斷裂來裂解DNA。在16種Cpf1-家族蛋白中,來自胺基酸球菌及毛螺菌科之兩種酶顯示出在人類細胞中具有有效基因體編輯活性。Cpf1蛋白為此項技術中已知的且先前已有描述,例如Yamano等人,「Crystal structure of Cpf1 in complex with guide RNA and target DNA.」Cell (165) 2016, 第949-962頁;其全部內容藉此以引用之方式併入。In various embodiments, nucleic acid programmable DNA binding proteins include, but are not limited to, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), and Argonaute. One example of a nucleic acid programmable DNA binding protein with a PAM specificity different from that of Cas9 is the clustered regularly interspaced short palindromic repeats from Prevotella and Francisella 1 (i.e., Cas12a (Cpf1)). Like Cas9, Cas12a (Cpf1) is a Class 2 CRISPR effector, but it belongs to the type V subgroup of enzymes rather than the type II subgroup. Cas12a (Cpf1) has been shown to mediate robust DNA interference with features distinct from those of Cas9. Cas12a (Cpf1) is a single RNA-guided endonuclease that lacks a tracrRNA, and it utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). In addition, Cpf1 cleaves DNA via a staggered DNA double-strand break. Of 16 Cpf1-family proteins, two enzymes, from Acidaminococcus and Lachnospiraceae, were shown to have efficient genome editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example, in Yamano et al., "Crystal structure of Cpf1 in complex with guide RNA and target DNA." Cell (165) 2016, pp. 949-962; the entire contents of which are hereby incorporated by reference.

在其他實施例中,Cas蛋白可包括任何CRISPR相關蛋白,包括但不限於Cas12a、Cas12b1、Cas1、Cas1B、Cas2、Cas3、Cas4、Cas5、Cas6、Cas7、Cas8、Cas9 (亦稱為Csn1及Csx12)、Cas10、Csy1、Csy2、Csy3、Cse1、Cse2、Csc1、Csc2、Csa5、Csn2、Csm2、Csm3、Csm4、Csm5、Csm6、Cmr1、Cmr3、Cmr4、Cmr5、Cmr6、Csb1、Csb2、Csb3、Csx17、Csx14、Csx10、Csx16、CsaX、Csx3、Csx1、Csx15、Csf1、Csf2、Csf3、Csf4、其同源物或其經修飾形式。In other embodiments, the Cas protein may include any CRISPR-related protein, including but not limited to Cas12a, Cas12b1, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified forms thereof.

在各個其他實施例中,RNA引導之核酸酶可為以下蛋白質中之任一者:Cas9、Cas12a (Cpf1)、Cas12e (CasX)、Cas12d (CasY)、Cas12b1 (C2c1)、Cas13a (C2c2)、Cas12c (C2c3)、GeoCas9、CjCas9、Cas12g、Cas12h、Cas12i、Cas13b、Cas13c、Cas13d、Cas14、Csn2、xCas9、SpCas9-NG或Argonaute (Ago)結構域或其變異體。In various other embodiments, the RNA-guided nuclease can be any of the following proteins: Cas9, Cas12a (Cpf1), Cas12e (CasX), Cas12d (CasY), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), GeoCas9, CjCas9, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG or an Argonaute (Ago) domain or a variant thereof.

RNA引導之核酸酶之胺基酸序列為此項技術中容易獲得且已知的。例示性RNA引導之核酸酶及其胺基酸序列可見於例如WO 2017/070633、US 2020/0010835、US 2022/0204975、US 11071790、WO 2020/191233、US 11447770、US 10858639及US 10947530中,其中每一者以引用之方式整體併入本文中。The amino acid sequences of RNA-guided nucleases are readily available and known in the art. Exemplary RNA-guided nucleases and their amino acid sequences can be found, for example, in WO 2017/070633, US 2020/0010835, US 2022/0204975, US 11071790, WO 2020/191233, US 11447770, US 10858639, and US 10947530, each of which is incorporated herein by reference in its entirety.

E. 可程式化核酸酶 (其他) E. Programmable Nucleases (Others)

除了CRISPR/Cas系統以外,本發明之經工程改造之逆轉錄子亦可與不使用引導RNA來識別標靶序列之序列特異性核酸酶組合使用,諸如非CRISPR/Cas序列特異性核酸酶,包括TALEN、ZFN、大範圍核酸酶及限制酶,以及使用其他RNA指導之其他序列特異性核酸酶,諸如轉位子編碼之IscB、IscR或TnpB。In addition to CRISPR/Cas systems, the engineered retrons of the present invention can also be used in combination with sequence-specific nucleases that do not use a guide RNA to recognize the target sequence, such as non-CRISPR/Cas sequence-specific nucleases, including TALENs, ZFNs, meganucleases, and restriction enzymes, as well as other sequence-specific nucleases that are guided by other RNAs, such as the transposon-encoded IscB, IscR, or TnpB.

例如,本發明之經工程改造之逆轉錄子可編碼或提供msDNA,msDNA可充當HDR介導之基因體編輯的供體或模板序列。視情況,經工程改造之逆轉錄子的RT與此類序列特異性核酸酶融合,使得藉由接近HDR介導之基因體編輯位點之RT所產生的msDNA可更有效地參與HDR介導之基因體編輯。For example, the engineered retron of the present invention can encode or provide msDNA, which can serve as a donor or template sequence for HDR-mediated genome editing. Optionally, the RT of the engineered retron is fused to such a sequence-specific nuclease, so that msDNA produced by the RT in proximity to the HDR-mediated genome editing site can participate more efficiently in HDR-mediated genome editing.

在一些實施例中,非CRISPR/Cas序列特異性核酸酶係或包含TALE核酸酶、TALE切口酶、鋅指(ZF)核酸酶、ZF切口酶、大範圍核酸酶或其組合。在一些實施例中,非CRISPR/Cas序列特異性核酸酶係或包括獨立選擇之TALE核酸酶、TALE切口酶、鋅指(ZF)核酸酶、ZF切口酶、大範圍核酸酶、限制酶或其組合中的兩者、三者、四者或更多者。在一些實施例中,該組合係或包含TALE核酸酶/ZF核酸酶;TALE切口酶/ZF切口酶。In some embodiments, the non-CRISPR/Cas sequence-specific nuclease is or comprises a TALE nuclease, a TALE nickase, a zinc finger (ZF) nuclease, a ZF nickase, a meganuclease, or a combination thereof. In some embodiments, the non-CRISPR/Cas sequence-specific nuclease is or comprises two, three, four or more of independently selected TALE nucleases, TALE nickases, zinc finger (ZF) nucleases, ZF nickases, meganucleases, restriction enzymes, or a combination thereof. In some embodiments, the combination is or comprises a TALE nuclease/ZF nuclease; a TALE nickase/ZF nickase.

在一些實施例中,非CRISPR/Cas序列特異性核酸酶係或包含TALE核酸酶(轉錄活化子樣效應子核酸酶(TALEN))。TALEN係經工程改造以切割特定標靶DNA序列之限制酶。TALEN包含TAL效應子(TALE) DNA結合結構域(其在標靶DNA處或附近結合),與切割標靶DNA之DNA裂解結構域融合。TALE係經工程改造以結合於幾乎任何所需DNA序列。因此,在一些實施例中,TALEN包含N末端加帽區、DNA結合結構域及C末端加帽區,該DNA結合結構域可包含至少一或多種特定地經排序以靶向相關基因體基因座之TALE單體或半單體,其中此三個部分以預定之N末端至C末端取向經排列。視情況,TALEN包括至少一或多個調節或功能性蛋白結構域。In some embodiments, the non-CRISPR/Cas sequence-specific nuclease is or comprises a TALE nuclease (transcription activator-like effector nuclease (TALEN)). TALEN is a restriction enzyme engineered to cleave a specific target DNA sequence. TALEN comprises a TAL effector (TALE) DNA binding domain (which binds at or near the target DNA), fused to a DNA cleavage domain that cleaves the target DNA. TALE is engineered to bind to almost any desired DNA sequence. Therefore, in some embodiments, TALEN comprises an N-terminal capping region, a DNA binding domain, and a C-terminal capping region, and the DNA binding domain may comprise at least one or more TALE monomers or half-monomers specifically ordered to target a relevant genomic locus, wherein these three parts are arranged in a predetermined N-terminal to C-terminal orientation. Optionally, TALEN includes at least one or more regulatory or functional protein domains.

在一些實施例中,TALE單體或半單體可為源自天然或野生型TALE單體但在自然界中通常高度保守之位置處具有改變之胺基酸的變異型TALE單體,且尤其具有作為RVD之胺基酸之組合,該等RVD在自然界中未出現,且其可識別具有比天然存在之RVD更高的活性、特異性及/或親和力之核苷酸。該等變異體可包括胺基酸層面上之缺失、插入及取代,以及核酸層面上在一或多個位置處之顛換、轉變及倒置。該等變異體亦可包括截短。In some embodiments, the TALE monomer or half-monomer may be a variant TALE monomer that is derived from a natural or wild-type TALE monomer but has altered amino acids at positions that are usually highly conserved in nature, and in particular has, as its RVD, a combination of amino acids that does not occur in nature and that can recognize nucleotides with higher activity, specificity, and/or affinity than naturally occurring RVDs. Such variants may include deletions, insertions, and substitutions at the amino acid level, as well as transversions, transitions, and inversions at one or more positions at the nucleic acid level. Such variants may also include truncations.

在一些實施例中,TALE單體/半單體變異體包括母分子之同源及功能性衍生物。在一些實施例中,該等變異體由能夠在高嚴格條件下與編碼母分子之野生型核苷酸序列雜交的多核苷酸編碼。In some embodiments, TALE monomer/half-monomer variants include homologous and functional derivatives of the parent molecule. In some embodiments, the variants are encoded by polynucleotides capable of hybridizing with the wild-type nucleotide sequence encoding the parent molecule under highly stringent conditions.

在一些實施例中,TALE之DNA結合結構域具有至少5種或5種以上TALE單體及至少一或多種半單體,其特定地經排序或排列以靶向相關基因體基因座。本發明之TALE或多肽之構建及生成可涉及此項技術中已知之任何方法。In some embodiments, the DNA binding domain of a TALE has at least 5 or more TALE monomers and at least one or more half-monomers that are specifically ordered or arranged to target a relevant genomic locus. The construction and production of the TALE or polypeptide of the present invention may involve any method known in the art.

天然存在之TALE或「野生型TALE」係由多個變形菌種分泌之核酸結合蛋白。TALE含有由高度保守單體多肽之串聯重複構成的核酸結合結構域,該等多肽主要為33、34或35個胺基酸長,且彼此不同之處主要在於胺基酸位置12及13。DNA結合結構域內所包含之TALE單體之一般表示為X1-11-(X12X13)-X14-33或34或35,其中下標指示胺基酸位置且X表示任何胺基酸。X12X13指示RVD。在一些多肽單體中,位置13處之可變胺基酸缺失或不存在,且在此類單體中,RVD由單一胺基酸組成。在此類情況下,RVD可替代地表示為X*,其中X表示X12且(*)指示X13不存在。DNA結合結構域可包含TALE單體之數個重複,且這可表示為(X1-11-(X12X13)-X14-33或34或35)z,其中z視情況為至少5-40,諸如10-26。Naturally occurring TALEs, or "wild-type TALEs", are nucleic acid binding proteins secreted by numerous species of proteobacteria. TALEs contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides, which are predominantly 33, 34, or 35 amino acids long and differ from each other mainly at amino acid positions 12 and 13. A TALE monomer contained within the DNA binding domain is generally represented as X1-11-(X12X13)-X14-33 or 34 or 35, where the subscript indicates the amino acid position and X represents any amino acid. X12X13 indicates the RVD. In some polypeptide monomers, the variable amino acid at position 13 is missing or absent, and in such monomers the RVD consists of a single amino acid. In such cases, the RVD may alternatively be represented as X*, where X represents X12 and (*) indicates that X13 is absent. The DNA binding domain may comprise several repeats of the TALE monomer, which may be represented as (X1-11-(X12X13)-X14-33 or 34 or 35)z, where z is optionally at least 5-40, such as 10-26.

TALE單體具有由其RVD中之胺基酸一致性決定的核苷酸結合親和力。具有NI RVD之多肽單體優先結合於腺嘌呤(A),具有NG RVD之單體優先結合於胸腺嘧啶(T),具有HD RVD之單體優先結合於胞嘧啶(C),具有NN RVD之單體優先結合於腺嘌呤(A)及鳥嘌呤(G)兩者,具有IG RVD之單體優先結合於T,具有NS RVD之單體識別所有四個鹼基對且可結合於A、T、G或C。因此,TALE之核酸結合結構域中的多肽單體重複之數目及次序決定其核酸標靶特異性。TALE之結構及功能進一步描述於例如Moscou等人, Science 326:1501 (2009);Boch等人, Science 326:1509-1512 (2009);及Zhang等人, Nature Biotechnology 29:149-153 (2011)中,其中每一者均以引用之方式整體併入。TALE monomers have a nucleotide binding affinity that is determined by the identity of the amino acids in their RVD. Polypeptide monomers with an NI RVD preferentially bind to adenine (A), monomers with an NG RVD preferentially bind to thymine (T), monomers with an HD RVD preferentially bind to cytosine (C), monomers with an NN RVD preferentially bind to both adenine (A) and guanine (G), monomers with an IG RVD preferentially bind to T, and monomers with an NS RVD recognize all four base pairs and can bind to A, T, G, or C. The number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE therefore determine its nucleic acid target specificity. The structure and function of TALEs are further described in, e.g., Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011), each of which is incorporated by reference in its entirety.

在一些實施例中,TALE為dTALE (或設計者TALE),參見Zhang等人, Nature Biotechnology 29:149-153 (2011),以引用之方式併入本文中。In some embodiments, the TALE is a dTALE (or designer TALE); see Zhang et al., Nature Biotechnology 29:149-153 (2011), incorporated herein by reference.
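
The RVD-to-nucleotide preferences listed in this paragraph can be written as a small lookup table. The Python sketch below covers only the six RVDs named in the paragraph and uses IUPAC ambiguity codes (R for A or G, N for any base) where an RVD accepts more than one base. The function name is hypothetical, and the mapping is a simplification that ignores relative affinities.

```python
# RVD preferences as stated in the paragraph above; R = A or G, N = any base (IUPAC).
RVD_TO_BASE = {
    "NI": "A",
    "NG": "T",
    "HD": "C",
    "NN": "R",
    "IG": "T",
    "NS": "N",
}


def predicted_target(rvds: list[str]) -> str:
    """Return the nucleotide pattern recognized by an ordered (N- to C-terminal) list of RVDs."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)
```

For example, `predicted_target(["NI", "NG", "HD", "NN", "NS"])` yields `"ATCRN"`. An RVD outside the table raises `KeyError` rather than guessing a specificity.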

在一些實施例中,TALE單體包含優先結合於鳥嘌呤之HN或NH RVD,且TALE對含有鳥嘌呤之標靶核酸序列具有高結合特異性。在以下實施例中,具有RVD RN、NN、NK、SN、NH、KN、HN、NQ、HH、RG、KH、RH及SS之多肽單體優先結合於鳥嘌呤。在一些實施例中,具有RVD RN、NK、NQ、HH、KH、RH、SS及SN之多肽單體優先結合於鳥嘌呤。在一些實施例中,具有RVD HH、KH、NH、NK、NQ、RH、RN及SS之多肽單體優先結合於鳥嘌呤。在一些實施例中,對鳥嘌呤具有高結合特異性之RVD為RN、NH、RH及KH。在一些實施例中,具有NV RVD之多肽單體優先結合於腺嘌呤及鳥嘌呤,與具有RVD HN之單體相同。具有NC RVD之單體優先結合於腺嘌呤、鳥嘌呤及胞嘧啶,且具有S(或S*) RVD之單體以可相當親和力結合於腺嘌呤、鳥嘌呤、胞嘧啶及胸腺嘧啶。在更多實施例中,具有H*、HA、KA、N*、NA、NC、NS、RA及S* RVD之單體以可相當親和力結合於腺嘌呤、鳥嘌呤、胞嘧啶及胸腺嘧啶。此類多肽單體允許生成能夠結合於相關但不一致之標靶核酸序列之譜系的退化性TALE。In some embodiments, the TALE monomer comprises an HN or NH RVD that preferentially binds to guanine, and the TALE has high binding specificity for target nucleic acid sequences containing guanine. In the following embodiments, polypeptide monomers having the RVDs RN, NN, NK, SN, NH, KN, HN, NQ, HH, RG, KH, RH, and SS preferentially bind to guanine. In some embodiments, polypeptide monomers having the RVDs RN, NK, NQ, HH, KH, RH, SS, and SN preferentially bind to guanine. In some embodiments, polypeptide monomers having the RVDs HH, KH, NH, NK, NQ, RH, RN, and SS preferentially bind to guanine. In some embodiments, the RVDs with high binding specificity for guanine are RN, NH, RH, and KH. In some embodiments, polypeptide monomers with an NV RVD preferentially bind to adenine and guanine, as do monomers with the RVD HN. Monomers with an NC RVD preferentially bind to adenine, guanine, and cytosine, and monomers with an S (or S*) RVD bind to adenine, guanine, cytosine, and thymine with comparable affinity. In further embodiments, monomers with H*, HA, KA, N*, NA, NC, NS, RA, and S* RVDs bind to adenine, guanine, cytosine, and thymine with comparable affinity. Such polypeptide monomers allow the generation of degenerate TALEs capable of binding to a repertoire of related but non-identical target nucleic acid sequences.

在某些實施例中,TALE多肽具有含有以預定之N末端至C末端次序排列的多肽單體之核酸結合結構域,使得每種多肽單體結合於預定標靶核酸序列之核苷酸,且其中至少一種多肽單體具有HN或NH RVD且優先結合於鳥嘌呤,具有NV RVD且優先結合於腺嘌呤及鳥嘌呤,具有NC RVD且優先結合於腺嘌呤、鳥嘌呤及胞嘧啶,或具有S RVD且結合於腺嘌呤、鳥嘌呤、胞嘧啶及胸腺嘧啶。In certain embodiments, the TALE polypeptide has a nucleic acid binding domain comprising polypeptide monomers arranged in a predetermined N-terminal to C-terminal order, such that each polypeptide monomer binds to a nucleotide of a predetermined target nucleic acid sequence, and wherein at least one polypeptide monomer has an HN or NH RVD and preferentially binds to guanine, has an NV RVD and preferentially binds to adenine and guanine, has an NC RVD and preferentially binds to adenine, guanine and cytosine, or has an S RVD and binds to adenine, guanine, cytosine and thymine.

在一些實施例中,核酸結合結構域中結合於腺嘌呤之每種多肽單體具有NI、NN、NV、NC或S RVD。In some embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to adenine has a NI, NN, NV, NC, or S RVD.

在某些實施例中,核酸結合結構域中結合於鳥嘌呤之每種多肽單體具有HN、NH、NN、NV、NC或S RVD。In certain embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to guanine has an HN, NH, NN, NV, NC, or S RVD.

在某些實施例中,核酸結合結構域中結合於胞嘧啶之每種多肽單體具有HD、NC或S RVD。In certain embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to cytosine has a HD, NC, or S RVD.

在一些實施例中,結合於胸腺嘧啶之每種多肽單體具有NG或S RVD。In some embodiments, each polypeptide monomer that binds to thymine has a NG or S RVD.

在一些實施例中,核酸結合結構域中結合於腺嘌呤之每種多肽單體具有NI RVD。In some embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to adenine has a NI RVD.

在某些實施例中,核酸結合結構域中結合於鳥嘌呤之每種多肽單體具有HN或NH RVD。In certain embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to guanine has an HN or NH RVD.

在某些實施例中,核酸結合結構域中結合於胞嘧啶之每種多肽單體具有HD RVD。In certain embodiments, each polypeptide monomer in the nucleic acid binding domain that binds to cytosine has an HD RVD.

在一些實施例中,結合於胸腺嘧啶之每種多肽單體具有NG RVD。In some embodiments, each polypeptide monomer that binds to thymine has a NG RVD.

在某些實施例中,對腺嘌呤具有特異性之RVD為NI、RI、KI、HI及SI。In certain embodiments, the RVDs specific for adenine are NI, RI, KI, HI, and SI.

在某些實施例中,對腺嘌呤具有特異性之RVD為HN、SI及RI,用於腺嘌呤特異性之RVD最佳為SI。In certain embodiments, the RVDs specific for adenine are HN, SI, and RI, and the most preferred RVD for adenine specificity is SI.

在某些實施例中,對胸腺嘧啶具有特異性之RVD為NG、HG、RG及KG。In certain embodiments, the RVDs specific for thymine are NG, HG, RG, and KG.

在某些實施例中,對胸腺嘧啶具有特異性之RVD為KG、HG及RG,用於胸腺嘧啶特異性之RVD最佳為KG或RG。In certain embodiments, the RVDs specific for thymine are KG, HG, and RG, and the most preferred RVD for thymine specificity is KG or RG.

在某些實施例中,對胞嘧啶具有特異性之RVD為HD、ND、KD、RD、HH、YG及SD。In certain embodiments, the RVDs specific for cytosine are HD, ND, KD, RD, HH, YG, and SD.

在某些實施例中,對胞嘧啶具有特異性之RVD為SD及RD。 WO 2012/067428之圖4B提供代表性RVD及其靶向之核苷酸,其全部內容藉此以引用之方式併入本文中。 In certain embodiments, the RVDs specific for cytosine are SD and RD. Representative RVDs and their targeted nucleotides are provided in Figure 4B of WO 2012/067428, the entire contents of which are hereby incorporated herein by reference.

在某些實施例中,變異型TALE單體可包含對如WO2012/067428之圖4A所描繪之核苷酸展現特異性的任何RVD。所有此類TALE單體均允許生成能夠結合於相關但不一致之標靶核酸序列之譜系的退化性TALE。In certain embodiments, the variant TALE monomer may comprise any RVD that exhibits specificity for a nucleotide as depicted in Figure 4A of WO2012/067428. All such TALE monomers allow for the generation of degenerate TALEs that are capable of binding to a repertoire of related but non-identical target nucleic acid sequences.

在某些實施例中,RVD SH可對G具有特異性,RVD IS可對A具有特異性,且RVD IG可對T具有特異性。In certain embodiments, RVD SH may be specific for G, RVD IS may be specific for A, and RVD IG may be specific for T.

在某些實施例中,RVD NT可結合於G及A。在某些實施例中,RVD NP可結合於A、T及C。在某些實施例中,至少一種所選擇之RVD可為NI、HD、NG、NN、KN、RN、NH、NQ、SS、SN、NK、KH、RH、HH、KI、HI、RI、SI、KG、HG、RG、SD、ND、KD、RD、YG、HN、NV、NS、HA、S*、N*、KA、H*、RA、NA或NC。In certain embodiments, RVD NT can bind to G and A. In certain embodiments, RVD NP can bind to A, T, and C. In certain embodiments, at least one selected RVD can be NI, HD, NG, NN, KN, RN, NH, NQ, SS, SN, NK, KH, RH, HH, KI, HI, RI, SI, KG, HG, RG, SD, ND, KD, RD, YG, HN, NV, NS, HA, S*, N*, KA, H*, RA, NA, or NC.

核酸或DNA結合結構域之一或多種多肽單體的預定之N末端至C末端次序決定了本發明之TALE或多肽可結合之相應的預定標靶核酸序列。The predetermined N-terminal to C-terminal sequence of one or more polypeptide monomers of a nucleic acid or DNA binding domain determines the corresponding predetermined target nucleic acid sequence to which the TALE or polypeptide of the present invention can bind.

如本文所用,單體及至少一或多種半單體「特定地經排序以靶向」相關基因體基因座或基因。在植物基因體中,天然TALE結合位點始終以胸腺嘧啶(T)開始,胸腺嘧啶可由TALE多肽之非重複N末端內的隱秘信號來指定;在一些情況下,此區域可稱為重複0。在動物基因體中,TALE結合位點不一定必須以胸腺嘧啶(T)開始,且本發明之多肽可靶向以T、A、G或C開始之DNA序列。TALE單體之串聯重複始終以半長重複或序列延伸段結束,該序列延伸段可能僅與重複全長TALE單體之前20個胺基酸共享一致性,且此半重複可稱為半單體(WO 2012/067428之圖8)。因此,由此可見所靶向之核酸或DNA之長度等於完整單體之數目加2 (參見WO 2012/067428之圖44)。As used herein, monomers and at least one or more half-monomers are "specifically ordered to target" a genomic locus or gene of interest. In plant genomes, the natural TALE binding site always begins with thymine (T), which can be specified by a cryptic signal within the non-repetitive N-terminus of the TALE polypeptide; in some cases, this region may be referred to as repeat 0. In animal genomes, the TALE binding site does not necessarily have to begin with thymine (T), and the polypeptides of the present invention can target DNA sequences that begin with T, A, G, or C. The tandem repeats of a TALE monomer always end with a half-length repeat or sequence stretch that may only share identity with the first 20 amino acids of the repeating full-length TALE monomer, and this half-repeat may be referred to as a half-monomer (Figure 8 of WO 2012/067428). Therefore, it can be seen that the length of the targeted nucleic acid or DNA is equal to the number of complete monomers plus 2 (see Figure 44 of WO 2012/067428).

在某些實施例中,核酸結合結構域係經工程改造以含有5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25個或更多個多肽單體,該等多肽單體以N末端至C末端方向排列以結合於預定5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25個核苷酸長度之核酸序列。In certain embodiments, the nucleic acid binding domain is engineered to contain 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more polypeptide monomers arranged in an N-terminal to C-terminal direction to bind to a nucleic acid sequence of a predetermined length of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 nucleotides.

在某些實施例中,核酸結合結構域係經工程改造以含有5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26個或更多個全長多肽單體,該等多肽單體特定地經排序或排列以靶向分別為7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27及28個核苷酸長度之核酸序列。在某些實施例中,該等多肽單體為連續的。在一些實施例中,半單體可用於替代一或多種單體,尤其若其存在於TALE之C末端時。In certain embodiments, the nucleic acid binding domain is engineered to contain 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or more full-length polypeptide monomers specifically ordered or arranged to target nucleic acid sequences of 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 and 28 nucleotides in length, respectively. In certain embodiments, the polypeptide monomers are contiguous. In some embodiments, half-monomers can be used to replace one or more monomers, especially if they are present at the C-terminus of the TALE.
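
The correspondence in the preceding paragraph between the number of full-length monomers (5 to 26) and the targeted sequence length (7 to 28 nucleotides) follows the rule stated earlier that the targeted length equals the number of complete monomers plus 2. A minimal Python sketch of that arithmetic is given below. The function name and the lower bound of 5 monomers are illustrative choices taken from the ranges listed in the text.

```python
def targeted_length(full_monomers: int) -> int:
    """Targeted nucleic acid length for a given number of full-length TALE monomers."""
    if full_monomers < 5:
        raise ValueError("the embodiments above start at 5 full-length monomers")
    # Rule stated in the text: number of complete monomers plus 2.
    return full_monomers + 2
```

Applying it to 5 through 26 monomers reproduces the listed lengths of 7 through 28 nucleotides.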

多肽單體通常為33、34或35個胺基酸長。除RVD外,多肽單體之胺基酸序列係高度保守的或如本文所述,多肽單體中之胺基酸(除RVD外)展現影響TALE活性之模式,其鑑定可用於本發明之較佳實施例中。The polypeptide monomer is usually 33, 34 or 35 amino acids long. Except for the RVD, the amino acid sequence of the polypeptide monomer is highly conserved or as described herein, the amino acids in the polypeptide monomer (except for the RVD) show a pattern that affects TALE activity, and its identification can be used in preferred embodiments of the present invention.

在某些實施例中,當DNA結合結構域可包含(X1-11-X12X13-X14-33或34或35)z時,其中X1-11為11個連續胺基酸之鏈,其中X12X13為重複可變二殘基(RVD),其中X14-33或34或35為21、22或23個連續胺基酸之鏈,其中z為至少5至26,則較佳胺基酸組合為X1-4處之LTLD (SEQ ID NO:19936)或LTLA (SEQ ID NO:19937)或LTQV (SEQ ID NO:19938),或位置X30-33或X31-34或X32-35處之EQHG (SEQ ID NO:19939)或RDHG (SEQ ID NO:19940)。此外,當單體為34個胺基酸長時,單體中之其他相關胺基酸組合為X1-4處之LTPD (SEQ ID NO:19941)及X16-20處之NQALE (SEQ ID NO:19942)以及X32-34處之DHG。當單體為33或35個胺基酸長時,則相應移位發生於連續胺基酸NQALE (SEQ ID NO:19942)及DHG之位置中。在某些實施例中,NQALE (SEQ ID NO:19942)在X15-19或X17-21處且DHG在X31-33或X33-35處。In certain embodiments, when the DNA binding domain may comprise (X1-11-X12X13-X14-33 or 34 or 35)z, wherein X1-11 is a chain of 11 contiguous amino acids, wherein X12X13 is a repeat variable di-residue (RVD), wherein X14-33 or 34 or 35 is a chain of 21, 22, or 23 contiguous amino acids, and wherein z is at least 5 to 26, the preferred amino acid combinations are LTLD (SEQ ID NO:19936) or LTLA (SEQ ID NO:19937) or LTQV (SEQ ID NO:19938) at X1-4, or EQHG (SEQ ID NO:19939) or RDHG (SEQ ID NO:19940) at positions X30-33 or X31-34 or X32-35. In addition, when the monomer is 34 amino acids long, other relevant amino acid combinations in the monomer are LTPD (SEQ ID NO:19941) at X1-4, NQALE (SEQ ID NO:19942) at X16-20, and DHG at X32-34. When the monomer is 33 or 35 amino acids long, a corresponding shift occurs in the positions of the contiguous amino acids NQALE (SEQ ID NO:19942) and DHG. In certain embodiments, NQALE (SEQ ID NO:19942) is at X15-19 or X17-21 and DHG is at X31-33 or X33-35.

在某些實施例中,當單體為34個胺基酸長時,單體中之相關胺基酸組合為X1-4處之LTPD及X16-20處之KRALE (SEQ ID NO:19943)以及X32-34處之AHG,或X1-4處之LTPE (SEQ ID NO:19944)及X16-20處之KRALE (SEQ ID NO:19943)以及X32-34處之DHG。當單體為33或35個胺基酸長時,相應移位發生於連續胺基酸KRALE (SEQ ID NO:19943)、AHG及DHG之位置中。在某些實施例中,連續胺基酸之位置可為(X1-4處之LTPD及X15-19處之KRALE (SEQ ID NO:19943)以及X31-33處之AHG)或(X1-4處之LTPE及X15-19處之KRALE (SEQ ID NO:19943)以及X31-33處之DHG)或(X1-4處之LTPD及X17-21處之KRALE以及X33-35處之AHG)或(X1-4處之LTPE及X17-21處之KRALE (SEQ ID NO:19943)以及X33-35處之DHG)。In certain embodiments, when the monomer is 34 amino acids long, the relevant amino acid combinations in the monomer are LTPD at X1-4, KRALE (SEQ ID NO:19943) at X16-20, and AHG at X32-34, or LTPE (SEQ ID NO:19944) at X1-4, KRALE (SEQ ID NO:19943) at X16-20, and DHG at X32-34. When the monomer is 33 or 35 amino acids long, a corresponding shift occurs in the positions of the contiguous amino acids KRALE (SEQ ID NO:19943), AHG, and DHG. In certain embodiments, the positions of the contiguous amino acids may be (LTPD at X1-4, KRALE (SEQ ID NO:19943) at X15-19, and AHG at X31-33) or (LTPE at X1-4, KRALE (SEQ ID NO:19943) at X15-19, and DHG at X31-33) or (LTPD at X1-4, KRALE at X17-21, and AHG at X33-35) or (LTPE at X1-4, KRALE (SEQ ID NO:19943) at X17-21, and DHG at X33-35).

In certain embodiments, the contiguous amino acids [NGKQALE] (SEQ ID NO: 19945) are present at positions X14-20, X13-19 or X15-21. These representative positions set out various embodiments of the invention and provide guidance for identifying additional relevant amino acids, or combinations of relevant amino acids, in all TALE monomers (see Figures 24A-24F and Figure 25 of WO 2012/067428).

Exemplary amino acid sequences of the conserved portions of polypeptide monomers are provided below. The position of the RVD in each sequence is represented by XX or by X* (where (*) indicates that the RVD is a single amino acid and residue 13 (X13) is absent).
LTPAQVVAIASXXGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 19402)
LTPAQVVAIASX*GGKQALETVQRLLPVLCQDHG (SEQ ID NO: 19403)
LTPDQVVAIANXXGGKQALATVQRLLPVLCQDHG (SEQ ID NO: 19404)
LTPDQVVAIANXXGGKQALETLQRLLPVLCQDHG (SEQ ID NO: 19405)
LTPDQVVAIANXXGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 19406)
LTPDQVVAIASXXGGKQALATVQRLLPVLCQDHG (SEQ ID NO: 19407)
LTPDQVVAIASXXGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 19408)
LTPDQVVAIASXXGGKQALETVQRVLPVLCQDHG (SEQ ID NO: 19409)
LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG (SEQ ID NO: 19410)
LTPYQVVAIASXXGSKQALETVQRLLPVLCQDHG (SEQ ID NO: 19411)
LTREQVVAIASXXGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 19412)
LSTAQVVAIASXXGGKQALEGIGEQLLKLRTAPYG (SEQ ID NO: 19413)
LSTAQVVAVASXXGGKPALEAVRAQLLALRAAPYG (SEQ ID NO: 19414)
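The position notation used above (X1-4, X12X13, X32-34 and so on) is 1-based and inclusive. As a minimal sketch of how such positions can be read out of a monomer, the following Python snippet uses a hypothetical helper, `segment`, which is introduced here for illustration only and is not part of the disclosure. It is applied to the 34-residue conserved portion listed as SEQ ID NO: 19408, in which "XX" stands for the RVD.

```python
def segment(monomer: str, start: int, end: int) -> str:
    """Return residues X_start..X_end (1-based, inclusive) of a monomer."""
    if not 1 <= start <= end <= len(monomer):
        raise ValueError("positions fall outside the monomer")
    return monomer[start - 1:end]


# Conserved portion of a 34-residue monomer (SEQ ID NO: 19408);
# "XX" is the placeholder for the RVD at X12X13.
MONOMER = "LTPDQVVAIASXXGGKQALETVQRLLPVLCQDHG"

print(len(MONOMER))              # 34
print(segment(MONOMER, 1, 4))    # LTPD
print(segment(MONOMER, 12, 13))  # XX (the RVD placeholder)
print(segment(MONOMER, 14, 20))  # GGKQALE
print(segment(MONOMER, 32, 34))  # DHG
```

For a 33- or 35-residue monomer the same helper can be called with the correspondingly shifted positions (for example X31-33 or X33-35 for the final tripeptide).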

Figures 24A-F of WO 2012/067428 (which is incorporated herein by reference) provide a further list of TALE monomers, excluding the RVDs, which can be represented by the sequence (X1-11-X14-34 or X1-11-X14-35), wherein X is any amino acid and the subscript is the amino acid position.

In certain embodiments, TALE polypeptide binding efficiency is increased by including amino acid sequences from the "capping regions" that are directly N-terminal or C-terminal of the DNA binding region of a naturally occurring TALE in the engineered TALE, at positions N-terminal or C-terminal of the engineered TALE DNA binding region. Thus, in certain embodiments, the TALE polypeptides described herein further comprise an N-terminal capping region and/or a C-terminal capping region.

An exemplary amino acid sequence of an N-terminal capping region is:
MDPIRSRTPSPARELLSGPQPDGVQPTADRGVSPPAGGPLDGLPARRTMSRTRLPSPPAPSPAFSADSFSDLLRQFDPSLFNTSLFDSLPPFGAHHTEAATGEWDEVQSGLRAADAPPPTMRVAVTAARPPRAKPAPRRRAAQPSDASPAAQVDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN (SEQ ID NO: 19415)

An exemplary amino acid sequence of a C-terminal capping region is:
RPALESIVAQLSRPDPALAALTNDHLVALACLGGRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVADHAQVVRVLGFFQCHSHPAQAFDDAMTQFGMSRHGLLQLFRRVGVTELEARSGTLPPASQRWDRILQASGMKRAKPSPTSTQTPDQASLHAFADSLERDLDAPSPMHEGDQTRAS (SEQ ID NO: 19416)

As used herein, the predetermined "N-terminus" to "C-terminus" orientation of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomers, and the C-terminal capping region provides the structural basis for the organization of the different domains in the d-TALEs or polypeptides of the invention.

The entire N-terminal and/or C-terminal capping region is not necessary to enhance the binding activity of the DNA binding region. Therefore, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.

In certain embodiments, the TALE polypeptides (including multiple TALEs) described herein contain an N-terminal capping region fragment that includes at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260 or 270 amino acids of an N-terminal capping region. In certain embodiments, the amino acids of the N-terminal capping region fragment are from the C-terminus (the DNA binding region proximal end) of the N-terminal capping region. An N-terminal capping region fragment that includes the C-terminal 240 amino acids enhances binding activity to the same extent as the full-length capping region, a fragment that includes the C-terminal 147 amino acids retains greater than 80% of the efficacy of the full-length capping region, and a fragment that includes the C-terminal 117 amino acids retains greater than 50% of the activity of the full-length capping region.

In some embodiments, the TALE polypeptides described herein contain a C-terminal capping region fragment that includes at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170 or 180 amino acids of a C-terminal capping region. In certain embodiments, the amino acids of the C-terminal capping region fragment are from the N-terminus (the DNA binding region proximal end) of the C-terminal capping region. In certain embodiments, a C-terminal capping region fragment that includes the C-terminal 68 amino acids enhances binding activity to the same extent as the full-length capping region, while a fragment that includes the C-terminal 20 amino acids retains greater than 50% of the efficacy of the full-length capping region.

In certain embodiments, the capping regions of the TALE polypeptides described herein do not need to have sequences identical to the capping region sequences provided herein. Thus, in some embodiments, the capping region of a TALE polypeptide described herein has a sequence that is at least 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to, or shares identity with, the capping region amino acid sequences provided herein. Sequence identity is related to sequence homology. Homology comparisons may be conducted by eye or, more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate the percent (%) homology between two or more sequences, and can also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. In some preferred embodiments, the capping region of a TALE polypeptide described herein has a sequence that is at least 95% identical to, or shares identity with, the capping region amino acid sequences provided herein.

Sequence homologies can be generated by any of a number of computer programs known in the art, including but not limited to BLAST or FASTA. Other suitable computer programs for carrying out alignments, such as the GCG Wisconsin Bestfit package, may also be used. Once the software has produced an optimal alignment, it is possible to calculate the % homology, preferably the % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result. The % homology may be calculated over contiguous sequences, i.e., one sequence is aligned with the other sequence and each amino acid or nucleotide in one sequence is directly compared with the corresponding amino acid or nucleotide in the other sequence, one residue at a time. This is called an "ungapped" alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
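The residue-by-residue comparison of an ungapped alignment can be sketched in a few lines of Python. This is an illustrative calculation only, not the algorithm of BLAST, FASTA or Bestfit. The function name `ungapped_identity` is hypothetical, the two sequences are assumed to be pre-aligned and of equal length, and the identity is assumed to be expressed relative to the full aligned length.

```python
def ungapped_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length sequences compared
    position by position, with no gaps introduced."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Conserved monomer portions SEQ ID NO: 19408 and SEQ ID NO: 19410.
# The "XX" RVD placeholders are compared like any other character,
# which is a simplification made for this sketch.
a = "LTPDQVVAIASXXGGKQALETVQRLLPVLCQDHG"
b = "LTPEQVVAIASXXGGKQALETVQRLLPVLCQAHG"

print(round(ungapped_identity(a, b), 2))  # 94.12 (32 of 34 positions match)
```

Sequences of different lengths, or sequences related by insertions and deletions, require a gapped alignment from a dedicated alignment program before a percentage can be computed in this way.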

Additional sequences of the conserved portions of polypeptide monomers and of the N-terminal and C-terminal capping regions are included in the sequences with the following GenBank accession numbers: AAW59491.1, AAQ79773.2, YP_450163.1, YP_001912778.1, ZP_02242672.1, AAW59493.1, AAY54170.1, ZP_02245314.1, ZP_02243372.1, AAT46123.1, AAW59492.1, YP_451030.1, YP_001915105.1, ZP_02242534.1, AAW77510.1, ACD11364.1, ZP_02245056.1, ZP_02245055.1, ZP_02242539.1, ZP_02241531.1, ZP_02243779.1, AAN01357.1, ZP_02245177.1, ZP_02243366.1, ZP_02241530.1, AAS58130.3, ZP_02242537.1, YP_200918.1, YP_200770.1, YP_451187.1, YP_451156.1, AAS58127.2, YP_451027.1, UR_451025.1, AAA92974.1, UR_001913755.1, ABB70183.1, UR_451893.1, UR_450167.1, ABY60855.1, UR_200767.1, ZR_02245186.1, ZR_02242931.1, ZR_02242535.1, AAU54169.1, UR_450165.1, UR_001913452.1, AAS58129.3, ACM44927.1, ZR_02244836.1, AAT46125.1, UR_450161.1, ZR_02242546.1, AAT46122.1, UR_451897.1, AAF98343.1, UR_001913484.1, AAY54166.1, UR_001915093.1, UR_001913457.1, ZR_02242538.1, UR_200766.1, UR_453043.1, UR_001915089.1, UR_001912981.1, ZR_02242929.1, UR_001911730.1, UR_201654.1, UR_199877.1, ABB70129.1, UR_451696.1, UR_199876.1, AAS75145.1, AAT46124.1, UR_200914.1, UR001915101.1, ZR_02242540.1, AAG02079.2, UR_451895.1, YP451189.1, UR_200915.1, AAS46027.1, UR_001913759.1, UR_001912987.1, AAS58128.2, AAS46026.1, UR_201653.1, UR_202894.1, UR_001913480.1, ZR_02242666.1, R_001912775.1, ZR_02242662.1, AAS46025.1, AAC43587.1, BAA37119.1, NPJ544725.1, AB077779.1, BAA37120.1, ACZ62652.1, BAF46271.1, ACZ62653.1, NPJ544793.1, ABO77780.1, ZR_02243740.1, ZR_02242930.1, AAB69865.1, AAY54168.1, ZR_02245191.1, UR_001915097.1, ZR_02241539.1, UR_451158.1, BAA37121.1, UR_001913182.1, UR_200903.1, ZR_02242528.1, ZR_06705357.1, ZR_06706392.1, ADI48328.1, ZR_06731493.1, ADI48327.1, AB077782.1, ZR06731656.1, NR_942641.1, AAY43360.1, ZR_06730254.1, ACN39605.1, UR_451894.1, UR_201652.1, UR_001965982.1, BAF46269.1, NPJ544708.1, ACN82432.1, AB077781.1, P14727.2, BAF46272.1, AAY43359.1, BAF46270.1, NR_644743.1, ABG37631.1, AAB00675.1, YP199878.1, ZR_02242536.1, CAA48680.1, ADM80412.1, AAA27592.1, ABG37632.1, ABP97430.1, ZR_06733167.1, AAY43358.1, 2KQ5_A, BAD42396.1, ABO27075.1, UR_002253357.1, UR_002252977.1, ABO27074.1, ABO27067.1, ABO27072.1, ABO27068.1, UR_003750492.1, ABO27073.1, NR_519936.1, ABO27071.1, AB027070.1 and ABO27069.1, each of which is hereby incorporated by reference.

In some embodiments, the TALEs described herein also include a nuclear localization signal and/or a cellular uptake signal. Such signals are known in the art and can target a TALE to the nucleus and/or intracellular compartments of a cell. Such cellular uptake signals include, but are not limited to, the minimal Tat protein transduction domain spanning residues 47-57 of the human immunodeficiency virus Tat protein: YGRKKRRQRRR (SEQ ID NO: 19952).

In some embodiments, the TALEs described herein include a nucleic acid or DNA binding domain that is a non-TALE nucleic acid or a non-TALE DNA binding domain. As used herein, the term "non-TALE DNA binding domain" refers to a DNA binding domain having a nucleic acid sequence corresponding to a nucleic acid sequence that is not substantially homologous to a nucleic acid encoding a TALE protein or fragment thereof, e.g., a nucleic acid sequence that is different from a nucleic acid encoding a TALE protein and that is derived from the same or a different organism.

In certain embodiments, the TALEs described herein include a nucleic acid or DNA binding domain linked to a non-TALE polypeptide.

A "non-TALE polypeptide" refers to a polypeptide having an amino acid sequence corresponding to a protein that is not substantially homologous to a TALE protein or fragment thereof, e.g., a protein that is different from a TALE protein and that is derived from the same or a different organism. As used herein, the term "linked" is intended to include any manner by which the nucleic acid binding domain and the non-TALE polypeptide can be connected to each other, including, for example, through peptide bonds as part of the same polypeptide chain or through other covalent interactions, such as a chemical linker. The non-TALE polypeptide can be linked, for example, to the N-terminus and/or the C-terminus of the nucleic acid binding domain, can be linked to a C-terminal or N-terminal capping region, or can be linked indirectly to the nucleic acid binding domain.

In certain embodiments, the TALEs or polypeptides of the invention comprise chimeric DNA binding domains. Chimeric DNA binding domains may be generated by fusing a full TALE (including the N-terminal and C-terminal capping regions) with another TALE or with a non-TALE DNA binding domain, such as a zinc finger (ZF), a helix-loop-helix, or a catalytically inactivated DNA endonuclease (e.g., EcoRI, a meganuclease, etc.), or parts of a TALE may be fused to other DNA binding domains. The chimeric domain may have a novel DNA binding specificity that combines the specificities of both domains.

In certain embodiments, the TALE polypeptides of the invention include a nucleic acid binding domain linked to one or more effector domains. In certain embodiments, the effector domain is a nickase or a nuclease.

In certain embodiments, the sequence-specific nuclease is a zinc finger nuclease (ZFN), such as an artificial zinc finger nuclease having an array of zinc finger (ZF) modules that targets a new DNA binding site in a target sequence (e.g., a target sequence or target site in a genome). Each zinc finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP). The resulting ZFP can be linked to a functional domain, such as a nuclease.
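Because each zinc finger module targets three DNA bases, a target site for an array of n fingers spans 3n bases. The following Python sketch simply splits a candidate site into the triplets that the individual modules would have to recognize. The function name `split_into_triplets` and the nine-base example site are hypothetical, and the sketch makes no assumption about which finger in the array reads which triplet or in which orientation.

```python
def split_into_triplets(target: str) -> list[str]:
    """Split a DNA target site into consecutive 3-base subsites,
    one per zinc finger module."""
    target = target.upper()
    if not target or len(target) % 3 != 0:
        raise ValueError("target length must be a positive multiple of 3")
    if set(target) - set("ACGT"):
        raise ValueError("target may contain only A, C, G and T")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


# Illustrative 9-bp site for a three-finger array (example value only).
site = "GCGTGGGCG"
triplets = split_into_triplets(site)
print(triplets)       # ['GCG', 'TGG', 'GCG']
print(len(triplets))  # 3 fingers would be needed
```

Under the same arithmetic, arrays of 4, 5 or 6 fingers would address sites of 12, 15 or 18 bases.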

ZF nucleases (ZFNs) can be used as an alternative programmable nuclease, in place of an RNA-guided nuclease, for retron-based editing. ZFN proteins have been extensively described in the art, for example, in Carroll et al., "Genome Engineering with Zinc-Finger Nucleases," Genetics, August 2011, Vol. 188: 773-782; Durai et al., "Zinc finger nucleases: custom-designed molecular scissors for genome engineering of plant and mammalian cells," Nucleic Acids Res, 2005, Vol. 33: 5978-90; and Gaj et al., "ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering," Trends Biotechnol. 2013, Vol. 31: 397-405, each of which is incorporated herein by reference in its entirety.

In certain embodiments, the nuclease linked to the ZF is the catalytic domain of the Type IIS restriction enzyme FokI (see Kim et al., PNAS U.S.A. 91:883-887, 1994; Kim et al., PNAS U.S.A. 93:1156-1160, 1996, both incorporated herein by reference).

In certain embodiments, the ZFN comprises a paired ZFN heterodimer, resulting in increased cleavage specificity and/or decreased off-target activity. In this embodiment, each ZFN in the heterodimer targets a different nucleotide sequence, and the two sequences are separated by a short spacer (see Doyon et al., Nat. Methods 8:74-79, 2011, incorporated herein by reference).

In certain embodiments, the ZFN comprises a polynucleotide binding domain (comprising multiple sequence-specific ZF modules) and a polynucleotide-cleaving nickase domain.

In certain embodiments, the ZFs are engineered using libraries of two-finger modules.

In certain embodiments, strings of two-finger units are used in the ZFN to improve the DNA binding specificity of polyzinc finger peptides (see PNAS USA 98: 1437-1441, incorporated herein by reference).

In certain embodiments, the ZFN has more than 3 fingers. In certain embodiments, the ZFN has 4, 5 or 6 fingers. In certain embodiments, the ZF modules in the ZFN are separated by one or more linkers to improve specificity.

In certain embodiments, the ZFs of the ZFN include substitutions in the dimer interface of the cleavage domain that prevent homodimerization between the ZFs but allow heterodimers to form.

In certain embodiments, the ZFs of the ZFN have a design that retains activity while suppressing homodimerization.

In certain embodiments, the ZFN is any one of the ZF nucleases in Table 1 of Carroll et al., Genetics 188(4):773-782, 2011 (incorporated herein by reference).

General principles and guidance for generating ZFs, ZF arrays and ZFNs can be found in the art, such as: modular design of the ZFs or ZF arrays in a ZFN (in which different modules can be rearranged and assembled into new combinations for new targets), as taught in Carroll et al., Nat. Protoc. 1: 1329-1341, 2006 (incorporated herein by reference); new three-finger sets of engineered ZFs generated using partially randomized libraries; and a rapid cell-based method for profiling the DNA binding specificity of engineered Cys2His2 zinc finger domains (see Nucleic Acids Res. 35: e81, incorporated herein by reference). ZFs for certain DNA triplets that perform well in neighboring combinations are described in Sander et al., 2011. Selection-free zinc finger nuclease engineering by context-dependent assembly (CoDA) is taught in Nat. Methods 8: 67-69. ToolGen has described the individual fingers in its collection that perform best in modular assembly (Kim et al., 2011). Preassembled zinc finger arrays for rapid construction of ZFNs are taught in Nat. Methods 8: 7.

Additional non-limiting ZFs and ZFNs that can be adapted for use in the present invention include those described in WO2010/065123, WO2000/041566, WO2003/080809, WO2015/143046, WO2016/183298, WO2013/044008, WO2015/031619, WO2017/136049, WO2016/014794, WO2017/091512, WO1995/009233, WO2000/023464, WO2000/042219, WO2002/026960, WO2001/083793; US9428756, US9145565, US8846578, US8524874, US6777185, US6599692, US7235354, US6503717, US7491531, US7943553, US7262054, US8680021, US7705139, US7273923, US6780590, US6785613, US7788044, US7177766, US6453242, US6794136, US7358085, US8383766, US7030215, US7013219, US7361635, US7939327, US8772453, US9163245, US7045304, US8313925, US9260726, US6689558, US8466267, US7253273, US7947873, US9388426, US8153399, US8569253, US8524221, US7951925, US9115409, US8772008, US9121072, US9624498, US6979539, US9491934, US6933113, US9567609, US7070934, US9624509, US8735153, US9567573, US6919204, US2002-0081614, US2004-0203064, US2006-0166263, US2006-0292621, US2003-0134318, US2006-0294617, US2007-0287189, US2007-0065931, US2003-0105593, US2003-0108880, US2009-0305402, US2008-0209587, US2013-0123484, US2004-0091991, US2009-0305977, US2008-0233641, US2014-0287500, US2011-0287512, US2009-0258363, US2013-0244332, US2007-0134796, US2010-0256221, US2005-0267061, US2012-0204282, US2012-0252122, US2010-0311124, US2016-0215298, US2008-0031109, US2014-0017214, US2015-0267205, US2004-0235002, US2004-0204345, US2015-0064789, US2006-0063231, US2011-0265198 and US2017-0218349, all of which are incorporated herein by reference.

Also provided herein are polynucleotides and vectors capable of expressing one or more ZFNs, which may be part of a vector system of the invention. The polynucleotides and vectors can be expressed in a cell, such as a eukaryotic cell, a mammalian cell or a human cell. Suitable vectors, cells and expression systems are described in more detail elsewhere herein and can be adapted for use with TALEs, meganucleases and CRISPR-Cas nucleases.

In certain embodiments, the sequence-specific nuclease is a meganuclease.

Meganucleases are a class of sequence-specific endonucleases that recognize large DNA target sites (>12 bp). These proteins can cleave a unique chromosomal sequence without affecting overall genome integrity. Meganucleases create site-specific DNA double-strand breaks (DSBs) and, in the presence of donor DNA (such as that present in a heterologous nucleic acid encompassed or encoded by an engineered retron of the invention), promote integration of the donor DNA at the cleavage site by homologous recombination (HR).
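A rough calculation illustrates why a longer recognition site is more likely to be unique in a genome. The Python sketch below estimates how often one specific site of a given length would be expected to occur by chance on one strand. It assumes a random sequence with equal, independent base frequencies, which real genomes do not have, and the genome length of 3,000,000,000 bp is a hypothetical round figure used only for illustration. The function name `expected_random_sites` is likewise hypothetical.

```python
def expected_random_sites(site_length: int, genome_length: int) -> float:
    """Expected number of chance occurrences of one specific site on one
    strand of a random sequence with equal base frequencies."""
    if site_length < 1 or genome_length < site_length:
        raise ValueError("site must be at least 1 bp and fit in the genome")
    positions = genome_length - site_length + 1  # possible start positions
    return positions / 4 ** site_length


GENOME_BP = 3_000_000_000  # hypothetical genome size, for illustration only

for length in (6, 12, 18, 22):
    print(length, expected_random_sites(length, GENOME_BP))
```

Under these assumptions a 12-bp site is expected on the order of a hundred times or more by chance, whereas a site of 18 bp or longer is expected less than once, which is consistent with the ability of meganucleases to address a unique chromosomal sequence.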

In certain embodiments, the meganuclease is a homing endonuclease, a widespread class of proteins found in eukaryotes, bacteria and archaea. In certain embodiments, the meganuclease belongs to the LAGLIDADG (SEQ ID NO: 19953) family of homing endonucleases.

In certain embodiments, the meganuclease is I-SceI, I-CreI, I-DmoI, or an engineered or naturally occurring variant thereof. The hallmark of these proteins is a well-conserved LAGLIDADG (SEQ ID NO: 19953) peptide motif, termed a (do)decapeptide, found in one or two copies. Homing endonucleases with only one such motif (such as I-CreI or I-CeuI) function as homodimers. In contrast, larger proteins bearing two (do)decapeptide motifs (such as I-SceI, PI-SceI and I-DmoI) are single-chain proteins.

Additional homing nucleases can be found at the website homingendonuclease.net, which provides a database listing the basic properties of known LAGLIDADG (SEQ ID NO: 19953) homing endonucleases. See also Taylor et al., Nucleic Acids Research 40 (W1): W110-W116, 2012 (both incorporated herein by reference).

In certain embodiments, the specificity (or polynucleotide recognition) of the meganuclease is modified by altering amino acids within the meganuclease and/or by fusing other effector domains to the meganuclease.

In certain embodiments, the meganuclease is a megaTAL, which includes a DNA binding domain from a TALE.

In certain embodiments, the meganuclease is engineered to have nickase activity.

Additional suitable natural and engineered meganucleases and megaTALs are described in WO2006/097853, WO2004/067736, WO2012/030747, WO2007/123636, WO2010/001189, WO2018/071565, WO2007/049095, WO2009/068937, WO2005/105989, WO2008/102198, WO2007/057781, WO2019/126558, WO2010/046786, US2010-0151556, US2014-0121115, US2011-0207199, US2012-0301456, US2013-0189759, US2011-0158974, US2010-0144012, US2014-0112904, US2013-0196320, US2010-0203031, US2010-0167357, US2012-0272348, US2012-0258537, US2011-0072527, US2013-0183282, US2014-0178942, US2012-0260356, US2013-0236946, US2010-0325745, US2011-0041194, US2014-0004608, US2011-0263028, US2011-0225664, US2013-0145487, US2013-0045539, US2012-0171191, US2015-0315557, US2014-0017731, US2011-0091441, US2014-0038239, US2010-0229252, US2009-0222937, US2010-0146651, US2013-0059387, US2011-0179507, US2013-0326644, US2006-0078552, US2004-0002092, US2012-0052582, US2009-0162937, US2010-0086533, US2009-0220476, US8802437, US7842489, US8715992, US8426177, US8476072, US9365864, US9540623, US9273296, US9290748, US8163514, US8148098, US8143016, US8143015, US8133697, US8129134, US8124369, US8119361, US7897372, US9683257, US10287626, US10273524, US10000746, US10006052, US7919605, US9018364, US10407672, US8211685, US9365864 and US7476500, all of which are incorporated herein by reference.

In certain embodiments, the sequence-specific nuclease is TnpB, a programmable RNA-guided DNA endonuclease. TnpB is believed to be a functional progenitor of CRISPR-Cas nucleases.

Transposons are mobile genetic elements that contain only the genes required for their transposition and its regulation. These elements encode the tnpA transposase, which is essential for mobilization, and often carry an accessory tnpB gene, which is dispensable for transposition. TnpB has been shown to be a nuclease that is guided by an RNA derived from the right-end element of the transposon to cleave DNA next to the 5'-TTGAT transposon-associated motif, and TnpB can be reprogrammed to cleave DNA target sites in human cells.

In certain embodiments, the TnpB is from Deinococcus radiodurans (D. radiodurans) ISDra2, a member of the IS200/IS605 family.

在某些實施例中,TnpB來自轉位子PsiTn554。In certain embodiments, TnpB is from transposon PsiTn554.

在某些實施例中,序列特異性核酸酶為TnpB樣蛋白,諸如Fanzor1或Fanzor2。此等蛋白質廣泛存在於多種真核轉座元件(TE)及感染真核生物之大雙股DNA (dsDNA)病毒中。Fanzor及TnpB蛋白在其C末端半區共享相同保守胺基酸模體:D-X(125, 275)-[TS]-[TS]-X-X-[C4鋅指]-X(5,50)-RD,但其N末端區域為高變的。Fanzor1蛋白經常由來自不同超家族之DNA轉位子捕獲,包括Helitron、Mariner、IS4樣、Sola及MuDr。相比之下,Fanzor2蛋白僅出現於一些IS607型元件中。In certain embodiments, the sequence-specific nuclease is a TnpB-like protein, such as Fanzor1 or Fanzor2. These proteins are widely present in a variety of eukaryotic transposable elements (TEs) and large double-stranded DNA (dsDNA) viruses that infect eukaryotes. Fanzor and TnpB proteins share the same conserved amino acid motif in their C-terminal half: D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD, but their N-terminal regions are highly variable. Fanzor1 proteins are frequently captured by DNA transposons from different superfamilies, including Helitron, Mariner, IS4-like, Sola, and MuDr. In contrast, Fanzor2 proteins are only found in some IS607-type elements.
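The conserved C-terminal motif described above can be written as a pattern search. The following is a minimal sketch only: the motif text does not specify the spacing of the "[C4 zinc finger]" element, so the `ZINC_FINGER_C4` pattern below (two CxxC pairs separated by 5-40 residues) is an assumption made purely for illustration, and the toy sequence is synthetic rather than a real Fanzor or TnpB protein.

```python
import re

# ASSUMPTION: the source gives no spacing for the "[C4 zinc finger]" element.
# CxxC ... CxxC with a 5-40 residue gap is a hypothetical stand-in.
ZINC_FINGER_C4 = r"C..C.{5,40}C..C"

# D-X(125,275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD
MOTIF = re.compile(r"D.{125,275}[TS][TS].." + ZINC_FINGER_C4 + r".{5,50}RD")


def find_motif(protein):
    """Return (start, end) of the first motif match in a protein string, or None."""
    match = MOTIF.search(protein.upper())
    return (match.start(), match.end()) if match else None


# Synthetic toy sequence built to satisfy the pattern (not a real protein).
toy = ("M" + "D" + "A" * 130 + "TS" + "GG"
       + "CAAC" + "A" * 10 + "CAAC" + "A" * 8 + "RD" + "K")
print(find_motif(toy))
```

Because the N-terminal regions are described as hypervariable, a search of this kind would be anchored only on the C-terminal half, as here.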

在某些實施例中,序列特異性核酸酶為IscB。In certain embodiments, the sequence-specific nuclease is IscB.

ISCs (insertion sequences Cas9-like) are a group of novel bacterial and archaeal DNA transposons that encode Cas9 homologs. The protein encoded by ISC transposons, which contains two nuclease domains, may be the ancestor of CRISPR-associated Cas9. The homologous region includes an arginine-rich helix and an HNH nuclease domain inserted into a RuvC-like nuclease domain. However, ISC genes are not associated with cas genes or CRISPR. ISCs represent a distinct group of non-autonomous transposons, with many different families of full-length ISC transposons. Their terminal sequences (in particular, the 3' terminus) are similar to those of IS605 superfamily transposons, which are mobilized by the Y1 tyrosine transposase encoded by the TnpA gene and frequently also encode a TnpB protein containing a RuvC-like endonuclease domain. The terminal regions of both ISC and IS605 transposons contain palindromic structures that are likely recognized by the Y1 transposase. Transposons of both groups insert precisely in the middle of, or upstream of, specific 4-bp target sites, without target site duplication.

在某些實施例中,序列特異性核酸酶為限制性核酸內切酶(RE),諸如具有至少8 nt之嚴格/長識別序列之RE。In certain embodiments, the sequence-specific nuclease is a restriction endonuclease (RE), such as an RE having a strict/long recognition sequence of at least 8 nt.

在某些實施例中,RE係具有七個及八個鹼基對識別序列之稀切RE。例示性稀切RE酶包括NotI,其在5'-GCGGCCGC-3'序列(SEQ ID NO: 19417)之第一GC之後進行切割。In certain embodiments, the RE is a rare-cutting RE with seven and eight base pair recognition sequences. Exemplary rare-cutting RE enzymes include NotI, which cuts after the first GC of the 5'-GCGGCCGC-3' sequence (SEQ ID NO: 19417).
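As an illustration of the NotI example above, the recognition site can be located and cut in a sequence string. This is a minimal sketch, assuming a single top-strand sequence and modeling only the top-strand cut after the first GC (GC^GGCCGC); the function names are hypothetical and the staggered bottom-strand cut is not modeled.

```python
NOTI_SITE = "GCGGCCGC"
CUT_OFFSET = 2  # cut after the first "GC" of the site: GC^GGCCGC


def noti_cut_positions(dna):
    """Return 0-based top-strand cut positions for every NotI site in dna."""
    dna = dna.upper()
    cuts = []
    i = dna.find(NOTI_SITE)
    while i != -1:
        cuts.append(i + CUT_OFFSET)
        i = dna.find(NOTI_SITE, i + 1)  # step by one so overlapping sites are found
    return cuts


def digest(dna):
    """Split dna into top-strand fragments at each NotI cut position."""
    bounds = [0] + noti_cut_positions(dna) + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AAAGCGGCCGCTTT"))
```

In a random sequence with equal base frequencies, a specific 8-nt site would be expected about once every 4^8 = 65,536 positions, which is the sense in which such enzymes are rare-cutting.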

In certain embodiments, the components of the system (e.g., retron-encoded ncRNA or msDNA complexed with the RT, the sequence-specific nuclease, and the DNA repair regulatory biomolecule) can form multiple complexes in a so-called split-complex configuration. The multiple complexes can come together to form a functional complex.

For example, in some embodiments, a first component of the system can be a split protein or domain. One fragment of the split protein or domain can be associated with a second component of the system, and another fragment of the split protein or domain can be associated with a third component of the system. The two fragments of the split protein or domain can come together (e.g., together with the other components of the system) to form a functional complex.

在某些實施例中,該分裂蛋白或結構域為序列特異性核酸酶, 例如CRISPR/Cas效應酶( 例如Cas蛋白,諸如Cas9或Cas12)、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE)。 In certain embodiments, the split protein or domain is a sequence-specific nuclease, such as a CRISPR/Cas effector enzyme ( e.g., a Cas protein such as Cas9 or Cas12), a ZFN, a TALEN, a meganuclease, TnpB, IscB, or a restriction endonuclease (RE).

在某些實施例中,該分裂蛋白或結構域為逆轉錄酶結構域。In certain embodiments, the split protein or domain is a reverse transcriptase domain.

In certain embodiments, the split protein or domain is a DNA repair regulatory biomolecule. For example, a first fragment of a sequence-specific nuclease can be associated with a reverse transcriptase domain, and a second fragment of the sequence-specific nuclease can be associated with a DNA repair regulatory biomolecule. The two fragments of the split protein or domain can come together (e.g., together with the reverse transcriptase domain and the DNA repair regulatory biomolecule) to form a functional complex. The association between the parts of the split protein or domain can be mediated by an adaptor protein or linker described herein (e.g., those used to associate a Cas protein with a functional domain).

在某些實施例中,在該分裂蛋白或結構域之兩個部分實質上包含功能性分裂蛋白或結構域之意義上,該分裂蛋白或結構域為分裂的。在理想情況下,分裂應始終使催化結構域不受影響。彼分裂蛋白或結構域可充當序列特異性核酸酶或其可為死Cas,由於其催化結構域中之典型突變,該死Cas基本上為具有極少或不具有催化活性之RNA結合蛋白。In certain embodiments, the split protein or domain is split in the sense that the two parts of the split protein or domain substantially comprise a functional split protein or domain. Ideally, the split should always leave the catalytic domain unaffected. The split protein or domain may act as a sequence-specific nuclease or it may be a dead Cas, which is essentially an RNA binding protein with little or no catalytic activity due to typical mutations in its catalytic domain.

Each fragment of the split protein or domain can be fused to a dimerization partner. For example, rapamycin-sensitive dimerization domains allow a chemically inducible split protein or domain to be generated, providing temporal control over its activity. Thus, the split protein or domain can be made chemically inducible by splitting it into two fragments, and rapamycin-sensitive dimerization domains can be used for controlled reassembly of the split protein or domain. The two parts of the split protein or domain can be regarded as the N'-terminal portion and the C'-terminal portion of the split protein or domain. The fusion is typically at the split point of the split protein or domain. In other words, the C' terminus of the N'-terminal portion of the split protein or domain is fused to one half of the dimer, and the N' terminus of the C'-terminal portion is fused to the other half of the dimer.

The split protein or domain need not be split in the sense that a break is newly created. The split point is typically designed in silico and cloned into the constructs. Together, the two parts of the split protein or domain (the N'-terminal and C'-terminal portions) form the complete split protein or domain, which preferably comprises at least 70% or more of the wild-type amino acids (or nucleotides encoding them), at least 80% or more, at least 90% or more, at least 95% or more, or at least 99% or more of the wild-type amino acids (or nucleotides encoding them). When the two parts are brought together, the desired function of the split protein or domain is restored or reconstituted. The dimer can be a homodimer or a heterodimer.

在某些實施例中,該系統之蛋白質組分( 例如,RT、序列特異性核酸酶(CRISPR/Cas效應酶、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE))及DNA修復調節生物分子)可進一步包含一或多個額外功能結構域。 In certain embodiments, the protein components of the system ( e.g. , RT, sequence-specific nuclease (CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB or restriction endonuclease (RE)), and DNA repair regulatory biomolecule) may further comprise one or more additional functional domains.

在某些實施例中,該功能結構域包含核定位信號(NLS)。在某些實施例中,一或多個C末端或N末端NLS經連接。在某些實施例中,連接C末端NLS以用於真核細胞( 例如,人類細胞)中之表現及核靶向。在某些實施例中,NLS可在未處於C末端或N末端之位置處,例如,NLS可在兩個多肽之間。 In some embodiments, the functional domain comprises a nuclear localization signal (NLS). In some embodiments, one or more C-terminal or N-terminal NLSs are linked. In some embodiments, a C-terminal NLS is linked for expression and nuclear targeting in eukaryotic cells ( e.g. , human cells). In some embodiments, the NLS may be at a position that is not at the C-terminus or N-terminus, for example, the NLS may be between two polypeptides.

Non-limiting examples of NLSs include NLS sequences derived from: the NLS of the SV40 virus large T antigen; the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS); the c-myc NLS; the hRNPA1 M9 NLS; the NLS from the IBB domain of importin-α; the NLS of the myoma T protein; the NLS of human p53; the NLS of mouse c-abl IV; the NLS of influenza virus NS1; the NLS of the hepatitis virus delta antigen; the NLS of the mouse Mx1 protein; the NLS of human poly(ADP-ribose) polymerase; and the NLS of the (human) glucocorticoid steroid hormone receptor. Exemplary NLS sequences include those described in paragraph [00106] of Feng Zhang et al. (WO2016/106236), all of which are incorporated herein by reference.

In certain embodiments, the functional domain comprises at least two NLS domains. One or more NLS domains can be located at or near a terminus of the polypeptide, and if two or more NLSs are present, each of them can be located at or near a terminus of the polypeptide.

在任何融合蛋白中,兩個結構域(諸如RT及Cas酶,或Cas及DNA修復調節生物分子)之間之融合可藉由連接體連接。 若本文所用,「連接體」包括接合兩個蛋白質或結構域以形成融合蛋白之肽。通常,此類分子除了接合或保存蛋白質/結構域之間之一些最小距離或其他空間關係之外不具有特定生物活性。然而,在某些實施例中,可選擇連接體以影響連接體及/或融合蛋白之一些特性,諸如連接體之折疊、淨電荷或疏水性。用於本揭示案之合適連接體係熟習此項技術者熟知的,且包括但不限於直鏈或分支鏈碳連接體、雜環碳連接體或肽連接體。然而,如本文所用,該連接體亦可為共價鍵(碳-碳鍵或碳-雜原子鍵)。 In any fusion protein, the fusion between two domains (such as RT and Cas enzymes, or Cas and DNA repair regulatory biomolecules) can be connected by a linker. As used herein, "linkers" include peptides that join two proteins or domains to form a fusion protein. Typically, such molecules do not have specific biological activity other than joining or preserving some minimum distance or other spatial relationship between proteins/domains. However, in some embodiments, the linker can be selected to affect some properties of the linker and/or fusion protein, such as the folding, net charge or hydrophobicity of the linker. Suitable linkers for use in the present disclosure are well known to those skilled in the art and include, but are not limited to, linear or branched chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. However, as used herein, the linker may also be a covalent bond (carbon-carbon bond or carbon-heteroatom bond).

在特定實施例中,使用該連接體來分離序列特異性核酸酶(CRISPR/Cas效應酶、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE))及RT及/或DNA修復調節生物分子,其距離足以確保每個蛋白質結構域保留其所需功能特性。In certain embodiments, the linker is used to separate a sequence-specific nuclease (CRISPR/Cas effector, ZFN, TALEN, meganuclease, TnpB, IscB, or restriction endonuclease (RE)) and a RT and/or DNA repair regulatory biomolecule at a distance sufficient to ensure that each protein domain retains its desired functional properties.

較佳肽連接體序列採用柔性延伸構形,且不展現發展有序二級結構之傾向。在某些實施例中,該連接體可為化學部分,該化學部分可為單體、二聚體、多聚體或聚合物的。在某些實施例中,該連接體包含胺基酸。柔性連接體中之典型胺基酸包括Gly、Asn及Ser。因此,在特定實施例中,該連接體包含Gly、Asn及Ser胺基酸中之一或多者的組合。其他近中性胺基酸(諸如Thr及Ala)亦可用於連接體序列中。Preferred peptide linker sequences adopt a flexible extended conformation and do not show a tendency to develop an ordered secondary structure. In certain embodiments, the linker can be a chemical moiety, which can be a monomer, dimer, multimer or polymer. In certain embodiments, the linker comprises amino acids. Typical amino acids in flexible linkers include Gly, Asn and Ser. Therefore, in specific embodiments, the linker comprises a combination of one or more of Gly, Asn and Ser amino acids. Other near-neutral amino acids (such as Thr and Ala) can also be used in linker sequences.

In certain embodiments, the linker comprises a GlySer-rich sequence, such as a GnS linker (n = 1, 2, 3, 4, or 5, such as GS or G4S) (SEQ ID NO: 19946) or repeats thereof (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, optionally with a total length of about 4-30 residues, 4-20 residues, or 4-10 residues).

In certain embodiments, the linker comprises a G4S linker having 3, 6, 9, or 12 repeats.

在某些實施例中,該連接體為以下所揭示之連接體:Maratea等人, Gene40: 39-46, 1985;Murphy等人, PNAS USA 83: 8258-62, 1986;US4,935,233;或US4,751,180,均以引用之方式併入。 In certain embodiments, the linker is a linker disclosed in Maratea et al., Gene 40: 39-46, 1985; Murphy et al., PNAS USA 83: 8258-62, 1986; US Pat. No. 4,935,233; or US Pat. No. 4,751,180, all of which are incorporated by reference.

In certain embodiments, the linker comprises a GlySer linker, such as GGS, GGGS (SEQ ID NO: 19947), GSG, or GGGGS (SEQ ID NO: 19948), optionally with 3 (such as (GGS)3 (SEQ ID NO: 19949) or (GGGGS)3 (SEQ ID NO: 19950)), 4, 5, 6, 7, 8, 9, 10, 11, or 12 or more repeats to provide a suitable length.

In certain embodiments, the linker comprises (GGGGS)3-15 (SEQ ID NO: 19951), such as (GGGGS)3-11 (SEQ ID NO: 19951), for example GGGGS (SEQ ID NO: 19951) having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 repeats.
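The repeat-based GlySer linkers described above can be generated programmatically. The sketch below is illustrative only; the function names are hypothetical, and the restriction to G and S residues is an assumption that reflects just the GnS and (GGGGS)n examples, not the other linker types disclosed.

```python
def gns_unit(n):
    """Return a single GnS unit, e.g. n=4 -> 'GGGGS' (G4S)."""
    if not 1 <= n <= 5:
        raise ValueError("n is expected to be 1-5 for a GnS unit")
    return "G" * n + "S"


def gly_ser_linker(unit="GGGGS", repeats=3):
    """Return a linker made of `repeats` tandem copies of a Gly/Ser unit."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if set(unit) - set("GS"):
        raise ValueError("unit must contain only G and S residues")
    return unit * repeats


print(gly_ser_linker(gns_unit(4), 3))  # (GGGGS)3
print(gly_ser_linker("GGS", 3))        # (GGS)3
```

A helper of this kind makes it straightforward to enumerate candidate linker lengths (for example, 3 to 15 repeats) when screening fusion constructs.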

在某些實施例中,該連接體包含LEPGEKPYKCPECGKSFSQSGALTRHQRTHTR (SEQ ID NO: 19418)。In certain embodiments, the linker comprises LEPGEKPYKCPECGKSFSQSGALTRHQRTHTR (SEQ ID NO: 19418).

在另一實施例中,該連接體為XTEN連接體。In another embodiment, the linker is an XTEN linker.

在某些實施例中,N末端及/或C末端NLS亦充當連接體( 例如,PKKKRKVEASSPKKRKVEAS (SEQ ID NO: 19420))。 In certain embodiments, the N-terminal and/or C-terminal NLS also serves as a linker ( e.g. , PKKKRKVEASSPKKRKVEAS (SEQ ID NO: 19420)).

The gRNA and the various nucleases and fusions thereof can be provided in the form of a protein, optionally with the nuclease complexed with the gRNA, or can be provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or a DNA (expression vector). In some embodiments, the RNA-guided nuclease and the gRNA are both provided by vectors. Both can be expressed from a single vector, or separately from different vectors. The vectors encoding the RNA-guided nuclease and the gRNA can be included in a vector system comprising the engineered retron msr, msd, and ret gene sequences.

Codon usage can be optimized to improve production of the engineered retron components, such as the retron reverse transcriptase, ncRNA, and/or RNA-guided nuclease, in a particular cell or organism. For example, a nucleic acid encoding the ncRNA, RNA-guided nuclease, or reverse transcriptase can be modified, relative to the naturally occurring polynucleotide sequence, to substitute codons that have a higher frequency of usage in a particular cell, such as a eukaryotic cell (e.g., a yeast cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, or a rat cell) or any other relevant host cell. When a nucleic acid encoding the reverse transcriptase or ncRNA is introduced into a cell, the protein can be expressed transiently, conditionally, or constitutively in the cell.

F. RT-PN fusion protein

The recombinant retron-based editing systems described herein contemplate fusion proteins comprising a programmable nuclease (PN) and an RT, optionally joined by a linker. The present application contemplates combining any suitable programmable nuclease and RT (e.g., a retron RT of Table A) in a single fusion protein. In one embodiment, the RT is joined to the N-terminus of the PN. In another embodiment, the RT is joined to the C-terminus of the PN. Examples of PNs and RTs are each defined herein.

In various embodiments, the fusion proteins can comprise any suitable structural configuration. For example, the fusion protein can comprise, in the N-terminal to C-terminal direction, a PN fused to an RT. In other embodiments, the fusion protein can comprise, in the N-terminal to C-terminal direction, an RT fused to a PN. The fused domains can optionally be joined by a linker (e.g., an amino acid sequence).

G. Retron-based gene editing system

The present disclosure relates to novel genome editing systems comprising retrons (including a retron RT and ncRNA). In exemplary embodiments, the editing systems comprise:
(a) one or more retron RT polypeptide sequences having at least 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, or 65% sequence identity to any one of the amino acid sequences of Table A, or to a polynucleotide sequence encoding the amino acid sequence (as also provided in Table A);
(b) one or more retron ncRNA polynucleotide sequences having at least 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, or 65% sequence identity to any one of the sequences of Table B, wherein in one embodiment the retron RT and the ncRNA are a cognate pair;
(c) one or more programmable nucleases (e.g., TnpB, Cas12a, or Cas9 as disclosed herein); and
(d) one or more polynucleotide sequences comprising a guide RNA, wherein the guide RNA comprises a sequence complementary to the targeted polynucleotide sequence and binds to the programmable nuclease.
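The sequence identity thresholds recited in (a) and (b) can be checked computationally once two sequences have been aligned. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap character, and assuming identity is computed as identical columns divided by aligned columns; the disclosure does not fix a particular alignment tool or identity definition here, so both choices and all names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    ASSUMPTION: identity = identical columns / columns that are not gap-gap.
    Other definitions (e.g., dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=45.0):
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```

The same function applies to amino acid sequences (for a retron RT) and to nucleotide sequences (for an ncRNA), since it only compares aligned characters.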

In other aspects, the retron-based gene editing system can comprise one or more additional accessory proteins having a genome modification function, including a recombinase, invertase, nuclease, polymerase, ligase, deaminase, reverse transcriptase, or epigenetic modification function. In various embodiments, the accessory protein can be provided separately. In other embodiments, the accessory protein can be fused, optionally via a linker, to a retron component (e.g., fused to the retron RT).

在各個實施例中,該基因體編輯系統可包含引導RNA,該引導RNA與一或多個靶向多核苷酸序列雜交。在較佳實施例中,該基因體編輯系統之引導RNA包含12-40個核苷酸。In various embodiments, the genome editing system may include a guide RNA that hybridizes to one or more targeting polynucleotide sequences. In a preferred embodiment, the guide RNA of the genome editing system comprises 12-40 nucleotides.

In various embodiments, the genome editing system comprises a targeted polynucleotide sequence comprising one or more protospacer adjacent motif (PAM) recognition domains selected from 5'-TTTN-3', 5'-TTN-3', 5'-TNN-3', 5'-TTV-3', or 5'-TTTV-3', wherein N = A, T, C, or G and V = A, C, or G. In additional embodiments, the targeted polynucleotide sequence comprises one or more relaxed PAM recognition domains. Jacobsen, Thomas et al. "Characterization of Cas12a nucleases reveals diverse PAM profiles between closely-related orthologs." Nucleic Acids Research 48(10) (2020): 5624-5638. doi:10.1093/nar/gkaa272. Previous work has demonstrated that the limitation imposed by the requirement for an extended TTTV PAM can be addressed by expanding the targeting range to non-canonical PAMs such as ATTA, CTTA, GTTA, and TCTA. Kleinstiver, Benjamin P. et al. "Engineered CRISPR-Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing." Nature Biotechnology 37(3) (2019): 276-282. doi:10.1038/s41587-018-0011-0. Most Cpf1 nucleases require a thymine-rich PAM. Different studies have demonstrated an increased Cpf1 targeting range using in vitro and in vivo (E. coli) PAM identification assays. Zhang, Xiaochun et al. "Multiplex gene regulation by CRISPR-ddCpf1." Cell Discovery 3.1 (2017): 1-9. Two Cpf1 endonucleases, AsCpf1 and LbCpf1, require TTTV as the PAM sequence, where V can be an A, C, or G nucleotide. Mutations at positions S542R/K607R and S542R/K548V/N552R produce AsCpf1 variants that recognize TYCV and TATV PAMs, respectively, where Y can be C or T. Gao, Linyi et al. "Engineered Cpf1 variants with altered PAM specificities." Nature Biotechnology 35.8 (2017): 789-792. AsCpf1 shows higher activity at TTTV PAMs and lower activity at TTTT PAMs. Kim, Hui K. et al. "In vivo high-throughput profiling of CRISPR–Cpf1 activity." Nature Methods 14.2 (2017): 153-159.
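The degenerate PAM notation used above (N, V, Y) can be expanded into a pattern and used to scan a sequence for candidate target sites. This is a minimal sketch under stated assumptions: only the strand supplied is scanned, the PAM is taken to lie immediately 5' of the protospacer (as for the Cas12a/Cpf1 PAMs discussed above), and the default protospacer length of 20 nt is an illustrative choice. The function names are hypothetical.

```python
import re

# IUPAC codes used in the text: N = A/C/G/T, V = A/C/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "V": "[ACG]", "Y": "[CT]"}


def pam_regex(pam):
    """Compile a degenerate PAM into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in pam.upper()) + "))")


def find_targets(dna, pam="TTTV", spacer_len=20):
    """Return (pam_start, pam_seq, protospacer) for each 5'-PAM site on this strand.

    ASSUMPTION: the protospacer is the spacer_len nucleotides immediately 3' of the PAM.
    Sites too close to the end of the sequence for a full protospacer are skipped.
    """
    dna = dna.upper()
    hits = []
    for match in pam_regex(pam).finditer(dna):
        start = match.start()
        ps_start = start + len(pam)
        protospacer = dna[ps_start:ps_start + spacer_len]
        if len(protospacer) == spacer_len:
            hits.append((start, match.group(1), protospacer))
    return hits


example = "GG" + "TTTA" + "CCGGAATTCCGGAATTCCGG" + "GG"
print(find_targets(example, pam="TTTV"))
```

Passing a different PAM string (for example "TYCV" or "TATV" for the engineered AsCpf1 variants mentioned above) changes the search without any other code change.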

Therefore, it is within the scope of the present disclosure to design the editing system to recognize altered PAM recognition domains for genome editing. In preferred embodiments, the programmable polypeptide component recognizes one or more non-canonical PAM sequences in the targeted polynucleotide sequence, the PAM being located upstream of the crRNA-complementary DNA sequence on the non-target strand. In related embodiments, the gRNA has an eight-nucleotide seed sequence located at the 5' end of the spacer and adjacent to the PAM sequence on the targeted polynucleotide sequence.

在進一步實施例中,一或多個多肽序列及一或多個包含該基因體編輯系統之同源引導RNA之多核苷酸序列形成核糖核蛋白複合物。In further embodiments, one or more polypeptide sequences and one or more polynucleotide sequences comprising a cognate guide RNA of the genome editing system form a ribonucleoprotein complex.

In various embodiments in which the programmable nuclease is a type V enzyme (e.g., Cas12a), one or more polypeptide sequences of the genome editing system comprise:
●   one or more α-helical recognition lobes (REC) and nuclease lobes (NUC);
●   wedge (WED), α-helical recognition lobe (REC), PAM-interacting (PI), RuvC nuclease, bridge helix (BH), and NUC domains; or
●   one or more domains selected from the RuvC, REC, WED, BH, PI, and NUC domains.

In various embodiments in which the programmable nuclease is a type II enzyme (e.g., Cas9), one or more polypeptide sequences of the genome editing system comprise:
●   an α-helical lobe; and
●   a nuclease lobe comprising two nuclease domains (a RuvC domain that cleaves the non-target DNA strand and an HNH domain that cleaves the target strand).

In various embodiments, the programmable nuclease component is characterized by a molecular weight of about 50 kDa - 100 kDa, 100 kDa - 200 kDa, or 200 kDa - 500 kDa.

在額外實施例中,多肽序列包含選自以下之至少一種活性:核酸內切酶活性;核糖核酸內切酶活性,或RNA引導之DNase活性。在此類實施例中,同源引導RNA及Cas12a蛋白修飾宿主細胞基因體之靶向多核苷酸序列。在某些情況下,藉由在宿主細胞基因體中之靶向多核苷酸序列處插入、缺失或改變一或多個鹼基對來修飾靶向多核苷酸序列。In additional embodiments, the polypeptide sequence comprises at least one activity selected from the following: endonuclease activity; endoribonuclease activity, or RNA-guided DNase activity. In such embodiments, the homologous guide RNA and Cas12a protein modify the target polynucleotide sequence of the host cell genome. In some cases, the target polynucleotide sequence is modified by inserting, deleting or changing one or more base pairs at the target polynucleotide sequence in the host cell genome.

在相關實施例中,該基因體編輯系統之特徵在於增強之定點整合之效率及精確度。較佳地,藉由供體核酸序列上之交錯懸垂來增強由基因體編輯系統實現之定點整合之效率及精確度。在某些實施例中,靶向多核苷酸序列為雙股的且含有5'懸垂,其中該懸垂較佳地包含五個核苷酸。In related embodiments, the genome editing system is characterized by enhanced efficiency and accuracy of site-directed integration. Preferably, the efficiency and accuracy of site-directed integration achieved by the genome editing system is enhanced by staggered overhangs on the donor nucleic acid sequence. In certain embodiments, the targeting polynucleotide sequence is double-stranded and contains a 5' overhang, wherein the overhang preferably comprises five nucleotides.

In various other embodiments, the polypeptide of the genome editing system comprises one or more mutations. More preferably, the mutations encode a nuclease-deficient polypeptide. In various embodiments, the genome editing system comprises a fusion of one or more deaminases with the nuclease-deficient polypeptide. Preferably, the one or more deaminases of the genome editing system are selected from adenine deaminases or cytosine deaminases. The use of cytidine deaminases and adenosine deaminases for base editing is disclosed in U.S. Patent No. 9,840,699. One approach is to generate a fusion protein of Cas12a (preferably an inactive or nickase variant) and a base editing enzyme or the active domain of a base editing enzyme. In various embodiments, the compositions include contacting the targeted polynucleotide sequence with a fusion protein comprising Cas12a and one or more base editing polypeptides (such as a deaminase), and with a gRNA that targets the fusion protein to the targeted polynucleotide sequence of a DNA strand. Thus, fusion of one or more deaminases with the nuclease-deficient polypeptide of the Cas12a genome editing system enables base editing of DNA and/or RNA. In selected embodiments, the system modifies one or more nucleobases on DNA and RNA. In related embodiments, the system enables multiplex gene editing. Preferably, the genome editing system comprises a single crRNA. More preferably, the system enables simultaneous targeting of multiple genes.

在其他實施例中,可程式化多肽可操作地連接至核定位信號(NLS)。較佳地,可程式化多肽包含N末端或C末端或兩者上之NLS或Cas12a多肽上之多個NLS。在一些實施例中,連接至NLS之多肽進一步包含crRNA以形成核糖核蛋白複合物。在一些實施例中,多肽包含該多肽之N末端或C末端之一或多個NLS重複。In other embodiments, the programmable polypeptide is operably linked to a nuclear localization signal (NLS). Preferably, the programmable polypeptide comprises an NLS at the N-terminus or C-terminus or both or multiple NLSs on the Cas12a polypeptide. In some embodiments, the polypeptide connected to the NLS further comprises crRNA to form a ribonucleoprotein complex. In some embodiments, the polypeptide comprises one or more NLS repeats at the N-terminus or C-terminus of the polypeptide.

In selected embodiments, one or more polypeptide sequences of the genome editing system comprise a modification, wherein the modification comprises a nuclease-deficient polypeptide (dCas). In related embodiments, the guide RNA of the genome editing system comprises a prime editing guide RNA (pegRNA). Preferably, the pegRNA of the genome editing system hybridizes with the target polynucleotide sequence and serves as a primer for one or more reverse transcriptases. More preferably, the pegRNA of the genome editing system binds to the nicked strand to initiate repair by the reverse transcriptase using a repair template.

In various additional embodiments, the nuclease-deficient polypeptide of the genome editing system comprises nickase activity. Preferably, the genome editing system comprises a fusion of one or more reverse transcriptases and a nuclease-deficient Cas (dCas). In certain examples, the fusion of one or more reverse transcriptases is selected from

Optional Components/Modifications

Donor Template

在一實施例中,本文中之組合物及系統可進一步包含一或多種用於編輯之供體模板。在一些情況下,供體模板可包含一或多種多核苷酸。在某些情況下,供體模板可包含一或多種多核苷酸之編碼序列。供體模板可為DNA模板。其可為單股或雙股的。其亦可為環狀單股或雙股的。其亦可為線性單股或雙股的。不受理論束縛,供體模板可在由本文所述之可程式化核酸酶切割器藉由包括HDR及NHEJ之細胞修復機器進行靶向切割後整合至基因體中。在各個實施例中,HDR供體由ncRNA之逆轉錄形成且形成其RT-DNA部分。In one embodiment, the compositions and systems herein may further include one or more donor templates for editing. In some cases, the donor template may include one or more polynucleotides. In some cases, the donor template may include the coding sequence of one or more polynucleotides. The donor template may be a DNA template. It may be single-stranded or double-stranded. It may also be circular single-stranded or double-stranded. It may also be linear single-stranded or double-stranded. Without being bound by theory, the donor template may be integrated into the genome after targeted cleavage by the programmable nuclease cutter described herein through the cell repair machinery including HDR and NHEJ. In various embodiments, the HDR donor is formed by reverse transcription of ncRNA and forms its RT-DNA portion.

A donor template that can be incorporated into the ncRNA and expressed as an RT-DNA product can be used to edit a target polynucleotide. In some cases, the donor polynucleotide comprises one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions, or combinations thereof. The mutations may cause a shift in an open reading frame on the target polynucleotide. In some cases, the donor template alters a stop codon in the target polynucleotide. For example, the donor template may correct a premature stop codon. The correction may be achieved by deleting the stop codon or by introducing one or more mutations into the stop codon. In other exemplary embodiments, the donor template addresses loss-of-function mutations, deletions, or translocations that may occur, for example, in certain disease contexts, by inserting or restoring a functional copy of a gene, or a functional fragment thereof, or a functional regulatory sequence or a functional fragment of a regulatory sequence. A functional fragment refers to less than a complete copy of a gene that nonetheless provides a nucleotide sequence sufficient to restore the functionality of a wild-type gene or a non-coding regulatory sequence (e.g., a sequence encoding a long non-coding RNA). In certain exemplary embodiments, the system disclosed herein can be used to replace a single allele of a defective gene or a defective fragment thereof. In another exemplary embodiment, the system disclosed herein can be used to replace both alleles of a defective gene or a defective gene fragment. A "defective gene" or "defective gene fragment" is a gene or a portion of a gene that, when expressed, fails to generate a functional protein or non-coding RNA having the functionality of the corresponding wild-type gene. In certain exemplary embodiments, these defective genes may be associated with one or more disease phenotypes. In certain exemplary embodiments, the defective gene or gene fragment is not replaced; instead, the system described herein is used to insert a donor template encoding a gene or gene fragment that compensates for or overrides the expression of the defective gene, such that the cell phenotype associated with defective gene expression is eliminated or changed to a different or desired cell phenotype.

In one embodiment of the present invention, the donor template may include, but is not limited to, a gene or gene fragment encoding a protein or an RNA transcript to be expressed, a regulatory element, a repair template, and the like. According to the present invention, the donor template may comprise left-end and right-end sequence elements that function with the transposition components that mediate insertion.

在某些情況下,供體模板操縱標靶多核苷酸上之剪接位點。在一些實例中,供體模板破壞剪接位點。可藉由將多核苷酸插入剪接位點中及/或將一或多種突變引入剪接位點中來實現破壞。在某些實例中,供體模板可恢復剪接位點。例如,多核苷酸可包含剪接位點序列。In some cases, the donor template manipulates a splice site on the target polynucleotide. In some instances, the donor template disrupts the splice site. Disruption can be achieved by inserting a polynucleotide into the splice site and/or introducing one or more mutations into the splice site. In some instances, the donor template can restore the splice site. For example, the polynucleotide can include a splice site sequence.

The donor template to be inserted can be from 10 base pairs or nucleotides to 50 kb in length, for example, 50 to 40 k, 100 to 30 k, 100 to 10000, 100 to 300, 200 to 400, 300 to 500, 400 to 600, 500 to 700, 600 to 800, 700 to 900, 800 to 1000, 900 to 1100, 1000 to 1200, 1100 to 1300, 1200 to 1400, 1300 to 1500, 1400 to 1600, 1500 to 1700, 1600 to 1800, 1700 to 1900, or 1800 to 2000 base pairs (bp) or nucleotides in length.

在一些實施例中,異源核酸序列係可經由HDR整合至宿主基因體中之供體DNA模板。在其他實施例中,異源核酸序列係可經由NHEJ整合至宿主基因體中之供體DNA模板。In some embodiments, the heterologous nucleic acid sequence is a donor DNA template that can be integrated into the host genome via HDR. In other embodiments, the heterologous nucleic acid sequence is a donor DNA template that can be integrated into the host genome via NHEJ.

在某些實施例中,異源核酸包含或編碼供體/模板序列,其中供體/模板校正/修復/移除標靶基因體位點處之突變。例如,突變可為疾病基因中之突變型外顯子。In certain embodiments, the heterologous nucleic acid comprises or encodes a donor/template sequence, wherein the donor/template corrects/repairs/removes a mutation at a target genome site. For example, the mutation may be a mutant exon in a disease gene.

在某些實施例中,供體/模板可編碼或包含功能性DNA元件,諸如啟動子、增強子、蛋白質結合序列、甲基化位點或用於輔助基因編輯之同源區等。In certain embodiments, the donor/template may encode or contain functional DNA elements, such as promoters, enhancers, protein binding sequences, methylation sites, or homology regions for assisting gene editing.

"Donor DNA" or "donor DNA template" means a DNA segment (which can be single-stranded or double-stranded DNA) to be inserted at a site cleaved by a gene-editing nuclease (e.g., a Cas9 or Cas12a nuclease) (e.g., after dsDNA cleavage, after nicking of the target DNA, after dual nicking of the target DNA, and the like). The donor DNA template may contain sufficient homology to the genomic sequence at the target site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology to the nucleotide sequences flanking the target site, e.g., within about 50 bases or fewer of the target site, e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between the donor DNA template and the genomic sequence with which it shares homology. In the case of repair by NHEJ, the donor DNA template does not require homology to the site it targets for editing. The donor template may be incorporated into the ncRNA of the retron and expressed as an RT product in the form of RT-DNA.
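
As a rough illustration of the homology percentages listed above, the following minimal Python sketch scores a donor homology region against the genomic sequence flanking the target site. It uses ungapped, position-by-position identity as a simplified stand-in for homology, which is an assumption made here and not a method stated in this disclosure. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(donor_region: str, genomic_flank: str) -> float:
    """Return ungapped percent identity between two equal-length sequences.

    Simplifying assumption: homology is approximated by counting matching
    positions; no alignment gaps are considered.
    """
    if not donor_region or len(donor_region) != len(genomic_flank):
        raise ValueError("sequences must be non-empty and of equal length")
    donor = donor_region.upper()
    flank = genomic_flank.upper()
    matches = sum(1 for a, b in zip(donor, flank) if a == b)
    return 100.0 * matches / len(donor)


# Hypothetical 10-nt flank with a single mismatch at the last position.
example_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

In this sketch a single mismatch over 10 positions gives 90% identity, which falls within the range of homology values listed in the definition above.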

供體DNA模板與基因體序列之間大約25、50、100或200個核苷酸或超過200個核苷酸之序列同源性(或10與200個核苷酸或更多核苷酸之間之任何整數值)可支持同源定向修復。供體DNA模板可具有任何長度,例如50個核苷酸或更多、100個核苷酸或更多、250個核苷酸或更多、500個核苷酸或更多、1000個核苷酸或更多、5000個核苷酸或更多等。合適供體DNA模板可為50個核苷酸至100個核苷酸、100個核苷酸至500個核苷酸、500個核苷酸至1000個核苷酸、1000個核苷酸至5000個核苷酸或5000個核苷酸至10,000個核苷酸或超過10,000個核苷酸長。Sequence homology of about 25, 50, 100 or 200 nucleotides or more (or any integer value between 10 and 200 nucleotides or more) between the donor DNA template and the genome sequence can support homology-directed repair. The donor DNA template can have any length, such as 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc. Suitable donor DNA templates can be 50 nucleotides to 100 nucleotides, 100 nucleotides to 500 nucleotides, 500 nucleotides to 1000 nucleotides, 1000 nucleotides to 5000 nucleotides, or 5000 nucleotides to 10,000 nucleotides or more than 10,000 nucleotides in length.

As described above, in some embodiments, the donor DNA template comprises a first homology arm and a second homology arm. The first homology arm is located at or near the 5' end of the donor DNA and comprises a nucleotide sequence that is at least partially complementary to a first nucleotide sequence in the target nucleic acid. The second homology arm is located at or near the 3' end of the donor DNA and comprises a nucleotide sequence that is at least partially complementary to a second nucleotide sequence in the target nucleic acid. The first and second homology arms can each independently have a length of about 10 nucleotides to 400 nucleotides; for example, 10 nucleotides (nt) to 15 nt, 15 nt to 20 nt, 20 nt to 25 nt, 25 nt to 30 nt, 30 nt to 35 nt, 35 nt to 40 nt, 40 nt to 45 nt, 45 nt to 50 nt, 50 nt to 75 nt, 75 nt to 100 nt, 100 nt to 125 nt, 125 nt to 150 nt, 150 nt to 175 nt, 175 nt to 200 nt, 200 nt to 225 nt, 225 nt to 250 nt, 250 nt to 275 nt, 275 nt to 300 nt, 325 nt to 350 nt, 350 nt to 375 nt, or 375 nt to 400 nt.
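
The donor layout described above (first homology arm at or near the 5' end, the sequence to be introduced, second homology arm at or near the 3' end) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation from this disclosure: the function name `build_donor`, the constant names, and the example sequences are hypothetical, and the only constraint enforced is the 10 nt to 400 nt arm length range given in the text.

```python
MIN_ARM_NT = 10   # lower bound on homology arm length given in the text
MAX_ARM_NT = 400  # upper bound on homology arm length given in the text


def build_donor(first_arm: str, insert: str, second_arm: str) -> str:
    """Assemble a 5'-to-3' donor: first arm, insert, second arm.

    Each arm is checked independently, since the text allows the two
    arms to have different lengths.
    """
    for label, arm in (("first", first_arm), ("second", second_arm)):
        if not MIN_ARM_NT <= len(arm) <= MAX_ARM_NT:
            raise ValueError(
                f"{label} homology arm is {len(arm)} nt; "
                f"expected {MIN_ARM_NT} to {MAX_ARM_NT} nt"
            )
    donor = (first_arm + insert + second_arm).upper()
    if set(donor) - set("ACGT"):
        raise ValueError("donor contains characters other than A, C, G, T")
    return donor


# Hypothetical example: two 10-nt arms flanking a 3-nt insert.
example_donor = build_donor("ACGTACGTAC", "GGG", "TTGCATTGCA")
```

Checking the arms separately mirrors the statement that the first and second homology arms can "each independently" fall anywhere in the stated range.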

In certain embodiments, the donor DNA template is used to edit the target nucleotide sequence. In certain embodiments, the donor DNA template comprises one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions, or combinations thereof. In certain embodiments, the mutation causes a shift in an open reading frame on the target polynucleotide. In certain embodiments, the donor polynucleotide alters a stop codon in the target polynucleotide. In certain embodiments, the donor polynucleotide corrects a premature stop codon. Correction can be achieved by deleting the stop codon or by introducing one or more sequence changes to change the stop codon to a codon. In certain embodiments, the donor polynucleotide addresses loss-of-function mutations, deletions, or translocations that may occur, for example, in certain disease contexts, by inserting or restoring a functional copy of a gene, or a functional fragment thereof, or a functional regulatory sequence or a functional fragment of a regulatory sequence. Functional fragments include fragments that are less than a complete copy of a gene but that otherwise provide a nucleotide sequence sufficient to restore the functionality of a wild-type gene or a non-coding regulatory sequence (e.g., a sequence encoding a long non-coding RNA).
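
The premature stop codon correction described above can be illustrated with a short Python sketch. It assumes the standard genetic code (stop codons TAA, TAG, and TGA) and a coding sequence given in frame from its first base; the function names and the example sequence are hypothetical and are not taken from this disclosure.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def replace_codon(cds: str, codon_index: int, new_codon: str) -> str:
    """Return the sequence with one codon replaced, modeling a donor
    that changes a stop codon into a sense codon."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly 3 nucleotides")
    start = codon_index * 3
    if codon_index < 0 or start + 3 > len(cds):
        raise ValueError("codon_index is outside the sequence")
    seq = cds.upper()
    return seq[:start] + new_codon.upper() + seq[start + 3:]


# Hypothetical mutant: ATG AAA TAG GGC TAA has a premature TAG at codon 2.
mutant = "ATGAAATAGGGCTAA"
stop_index = first_premature_stop(mutant)
# A single A-to-G change turns TAG into TGG (tryptophan).
corrected = replace_codon(mutant, stop_index, "TGG")
```

The sketch covers only the substitution route; deleting the stop codon, the other route named in the text, would remove the three nucleotides instead of replacing them.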

在某些實施例中,供體DNA模板可用於置換缺陷基因或其缺陷片段之單一等位基因。在另一實施例中,供體DNA模板用於置換缺陷基因或缺陷基因片段之兩個等位基因。「缺陷基因」或「缺陷基因片段」為基因或基因之一部分,其在表現時無法生成具有相應野生型基因之功能性的功能蛋白或非編碼RNA。In some embodiments, the donor DNA template can be used to replace a single allele of a defective gene or a defective fragment thereof. In another embodiment, the donor DNA template is used to replace both alleles of a defective gene or a defective gene fragment. A "defective gene" or "defective gene fragment" is a gene or a portion of a gene that, when expressed, fails to generate a functional protein or non-coding RNA having the functionality of the corresponding wild-type gene.

In certain exemplary embodiments, these defective genes may be associated with one or more disease phenotypes. In certain exemplary embodiments, the defective gene or gene fragment is not replaced; instead, the heterologous nucleic acid is used to insert a donor polynucleotide encoding a gene or gene fragment that compensates for or overrides the expression of the defective gene, such that the cell phenotype associated with defective gene expression is eliminated or changed to a different or desired cell phenotype. This can be achieved by including a coding sequence for a therapeutic protein (such as a therapeutic antibody or a functional fragment thereof, or a wild-type form of a defective protein associated with one or more disease phenotypes).

In certain embodiments, the donor may include, but is not limited to, a gene or gene fragment encoding a protein or an RNA transcript to be expressed, a regulatory element, a repair template, and the like. According to the present invention, the donor polynucleotide may comprise left-end and right-end sequence elements that function with the transposition components that mediate insertion.

在某些實施例中,供體DNA模板操縱標靶多核苷酸上之剪接位點。在某些實施例中,供體DNA模板破壞剪接位點。可藉由將多核苷酸插入剪接位點中及/或將一或多種突變引入剪接位點中來實現破壞。在某些實施例中,供體多核苷酸可恢復剪接位點。例如,多核苷酸可包含剪接位點序列。In some embodiments, the donor DNA template manipulates the splice site on the target polynucleotide. In some embodiments, the donor DNA template disrupts the splice site. Disruption can be achieved by inserting a polynucleotide into the splice site and/or introducing one or more mutations into the splice site. In some embodiments, the donor polynucleotide can restore the splice site. For example, the polynucleotide can include a splice site sequence.

In certain embodiments, the donor DNA template to be inserted is 10 bp to 50 kb in length, e.g., 50 bp to about 40 kb, 100 bp to about 30 kb, 100 bp to about 10 kb, 100 bp to 300 bp, 200 bp to 400 bp, 300 bp to 500 bp, 400 bp to 600 bp, 500 bp to 700 bp, 600 bp to 800 bp, 700 bp to 900 bp, 800 bp to 1000 bp, 900 bp to 1100 bp, 1000 bp to 1200 bp, 1100 bp to 1300 bp, 1200 bp to 1400 bp, 1300 bp to 1500 bp, 1400 bp to 1600 bp, 1500 bp to 1700 bp, 1600 bp to 1800 bp, 1700 bp to 1900 bp, or 1800 bp to 2000 bp in length.

在某些實施例中,欲插入序列一端或兩端之同源臂獨立地為約20 bp、40 bp、60 bp、80 bp、100 bp、120 bp或150 bp。In certain embodiments, the homology arms at one or both ends of the sequence to be inserted are independently about 20 bp, 40 bp, 60 bp, 80 bp, 100 bp, 120 bp or 150 bp.

The first homology arm and the second homology arm of the donor DNA flank the nucleotide sequence to be introduced into the target nucleic acid (the "nucleotide sequence of interest" or "intervening nucleotide sequence"). The nucleotide sequence of interest may comprise: i) a nucleotide sequence encoding a polypeptide of interest; ii) a nucleotide sequence encoding an exon of a gene; iii) a promoter sequence; iv) an enhancer sequence; v) a nucleotide sequence encoding a non-coding RNA; or vi) any combination of the foregoing.

Donor DNA can provide gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc. For example, donor DNA can be used to add (e.g., insert or replace) nucleic acid material to target DNA (e.g., to "knock in" a nucleic acid encoding a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein, a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., a promoter, a polyadenylation signal, an internal ribosome entry sequence (IRES), a 2A peptide, a start codon, a stop codon, a splicing signal, a localization signal, an enhancer, etc.), to modify a nucleic acid sequence (e.g., introduce a mutation), and the like. For example, donor DNA can be used to modify DNA in a site-specific (i.e., "targeted") manner, e.g., gene knockout, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g., to treat a disease; or as an antiviral, antipathogenic, or anticancer therapeutic; to produce genetically modified organisms in agriculture; for large-scale production of proteins by cells for therapeutic, diagnostic, or research purposes; to induce pluripotent stem cells; for biological research; to target pathogen genes for deletion or replacement; etc.

在一些情況下,供體DNA包含編碼相關多肽之核苷酸序列。相關多肽包括例如a)包含一或多種胺基酸取代、插入及/或缺失且展現功能降低之多肽功能形式,例如,其中功能降低與病理疾患相關或引起病理疾患;b)螢光多肽;c)激素;d)配位體之受體;e)離子通道;f)神經遞質;g)及其類似物。In some cases, the donor DNA comprises a nucleotide sequence encoding a polypeptide of interest. The polypeptide of interest includes, for example, a) a functional form of a polypeptide comprising one or more amino acid substitutions, insertions and/or deletions and exhibiting reduced function, for example, where the reduced function is associated with or causes a pathological disorder; b) a fluorescent polypeptide; c) a hormone; d) a receptor for a ligand; e) an ion channel; f) a neurotransmitter; g) and the like.

在一些情況下,供體DNA包含編碼受體細胞中缺乏之野生型蛋白質之核苷酸序列。在一些情況下,供體DNA編碼參與凝血之野生型因子(例如,因子VII、因子VIII、因子IX及其類似因子)。在一些情況下,供體DNA包含編碼治療抗體之核苷酸序列。在一些情況下,供體DNA包含編碼經工程改造之蛋白質或受體之核苷酸序列。在一些情況下,經工程改造之受體為T細胞受體(TCR)、自然殺手(NK)受體(NKR)或B細胞受體(BCR)。在一些情況下,經工程改造之TCR或NKR靶向癌症標記物(例如,在癌細胞之表面上表現(例如,過表現)之多肽)。在一些情況下,供體DNA包含編碼嵌合抗原受體(CAR)之核苷酸序列。在一些情況下,CAR靶向癌症標記物。編碼CAR、TCR及/或NCR蛋白之供體DNA可折疊成DNA摺紙結構(DNA奈米結構)且在活體外或活體內遞送至T細胞或NK細胞中。In some cases, the donor DNA comprises a nucleotide sequence encoding a wild-type protein lacking in the recipient cell. In some cases, the donor DNA encodes a wild-type factor involved in coagulation (e.g., factor VII, factor VIII, factor IX and the like). In some cases, the donor DNA comprises a nucleotide sequence encoding a therapeutic antibody. In some cases, the donor DNA comprises a nucleotide sequence encoding an engineered protein or receptor. In some cases, the engineered receptor is a T cell receptor (TCR), a natural killer (NK) receptor (NKR), or a B cell receptor (BCR). In some cases, the engineered TCR or NKR targets a cancer marker (e.g., a polypeptide expressed (e.g., overexpressed) on the surface of a cancer cell). In some cases, the donor DNA comprises a nucleotide sequence encoding a chimeric antigen receptor (CAR). In some cases, the CAR targets a cancer marker. Donor DNA encoding the CAR, TCR and/or NCR proteins can be folded into a DNA origami structure (DNA nanostructure) and delivered to T cells or NK cells in vitro or in vivo.

可由供體DNA編碼之多肽的非限制性實例包括例如IL1B (介白素1, β)、XDH (黃嘌呤去氫酶)、TP53 (腫瘤蛋白p53)、PTGIS (前列腺素12 (前列腺環素)合成酶)、MB (肌紅蛋白)、IL4 (介白素4)、ANGPT1 (血管生成素1)、ABCG8 (ATP結合卡匣,子族G (WHITE),成員8)、CTSK (組織蛋白酶K)、PTGIR (前列腺素12 (前列腺環素)受體(IP))、KCNJ11 (鉀內向整流通道,子族J,成員11)、INS (胰島素)、CRP (C反應蛋白,五聚環蛋白相關)、PDGFRB (血小板源性生長因子受體,β多肽)、CCNA2 (細胞週期蛋白A2)、PDGFB (血小板源性生長因子β多肽(猿肉瘤病毒性(v-sis)致癌基因同源物))、KCNJ5 (鉀內向整流通道,子族J,成員5)、KCNN3 (鉀中間/小電導鈣激活通道,子族N,成員3)、CAPN10 (鈣蛋白酶10)、PTGES (前列腺素E合成酶)、ADRA2B (腎上腺素激導性α-2B-受體)、ABCG5 (ATP結合卡匣,子族G (WHITE),成員5)、PRDX2 (過氧化物還原蛋白2)、CAPN5 (鈣蛋白酶5)、PARP14 (聚(ADP-核糖)聚合酶家族,成員14)、MEX3C (mex-3同源物C (秀麗隱桿線蟲))、ACE血管緊張素I轉化酶(肽基-二肽酶A) 1)、TNF (腫瘤壞死因子(TNF超家族,成員2))、IL6 (介白素6 (干擾素β2))、STN (斯他汀)、SERPINE1 (serpin肽酶抑制劑,進化枝E (連接蛋白,纖維蛋白溶酶原活化劑抑制劑1型),成員1)、ALB (白蛋白)、ADIPOQ (脂聯素,含有C1Q及膠原蛋白結構域)、APOB (載脂蛋白B (包括Ag(x)抗原))、APOE (載脂蛋白E)、LEP (瘦素)、MTHFR (5,10-亞甲基四氫葉酸還原酶(NADPH))、APOA1 (載脂蛋白A-I)、EDN1 (內皮素1)、NPPB (利鈉肽前驅體B)、NOS3 (一氧化氮合成酶3 (內皮細胞))、PPARG (過氧化物酶體增生物活化受體γ)、PLAT (纖維蛋白溶酶原活化劑,組織)、PTGS2 (前列腺素-內過氧化物合成酶2 (前列腺素G/H合成酶及環氧合酶))、CETP (膽固醇酯轉移蛋白,血漿)、AGTR1 (血管緊張素II受體,1型)、HMGCR (3-羥基-3-甲基戊二醯基-輔酶A還原酶)、IGF1 (胰島素樣生長因子1 (生長介素C))、SELE (選擇素E)、REN (腎素)、PPARA (過氧化物酶體增生物活化受體α)、PON1 (過氧磷酶1)、KNG1 (激肽原1)、CCL2 (趨化介素(C-C模體)配位體2)、LPL (脂蛋白脂肪酶)、vWF (馮威里氏因子)、F2 (凝血因子II (凝血酶))、ICAM1 (細胞間黏附分子1)、TGFB1 (轉化生長因子,β1)、NPPA (利鈉肽前驅體A)、IL10 (介白素10)、EPO (紅血球生成素)、SOD1 (超氧化物歧化酶1,可溶性)、VCAM1 (血管細胞黏附分子1)、IFNG (干擾素,γ)、LPA (脂蛋白,Lp(a))、MPO (髓過氧化物酶)、ESR1 (雌激素受體1)、MAPK1 (促分裂原活化蛋白激酶1)、HP (血紅素結合素)、F3 (凝血因子III (凝血質、組織因子))、CST3 (胱抑素C)、COG2 (寡聚高爾基體複合物組分2)、MMP9 (基質金屬蛋白酶9 (明膠酶B,92 kDa明膠酶,92 kDa IV型膠原酶))、SERPINC1 (serpin肽酶抑制劑,進化枝C (抗凝血酶),成員1)、F8 (凝血因子VIII,促凝血組分)、HMOX1 (血紅素加氧酶(開環) 1)、APOC3 (載脂蛋白C-III)、IL8 (介白素8)、PROK1 (前動力蛋白1)、CBS (胱硫醚-β-合成酶)、NOS2 (一氧化氮合成酶2,誘導性)、TLR4 (toll樣受體4)、SELP (選擇素P (顆粒膜蛋白140 kDa,抗原CD62))、ABCA1 (ATP結合卡匣,子族A (ABC1),成員1)、AGT (血管緊張素原(serpin肽酶抑制劑,進化枝A,成員8))、LDLR (低密度脂蛋白受體)、GPT (麩胺酸-丙酮酸轉胺酶(丙胺酸胺基轉移酶))、VEGFA (血管內皮生長因子A)、NR3C2 (核受體子族3,C組,成員2)、IL18 (介白素18 (干擾素-γ-誘導因子))、NOS1 (一氧化氮合成酶1 (神經元))、NR3C1 (核受體子族3,C組,成員1 (糖皮質激素受體))、FGB (纖維蛋白原β鏈)、HGF (肝細胞生長因子(肝細胞生成素A;擴散因子))、ILIA 
(介白素1,α)、RETN (抵抗素)、AKT1 (v-akt鼠科動物胸腺瘤病毒致癌基因同源物1)、LIPC (脂肪酶,肝)、HSPD1 (熱休克60 kDa蛋白1 (伴侶蛋白))、MAPK14 (促分裂原活化蛋白激酶14)、SPP1 (分泌型磷蛋白1)、ITGB3 (整合素,β3 (血小板醣蛋白111a,抗原CD61))、CAT (過氧化氫酶)、UTS2 (尾加壓素2)、THBD (血栓調節蛋白)、F10 (凝血因子X)、CP (血漿銅藍蛋白(鐵氧化酶))、TNFRSF11B (腫瘤壞死因子受體子族,成員lib)、EDNRA (內皮素受體A型)、EGFR (表皮生長因子受體(成紅細胞性白血病病毒(v-erb-b)致癌基因同源物,禽))、MMP2 (基質金屬蛋白酶2 (明膠酶A,72 kDa明膠酶,72 kDa IV型膠原酶))、PLG (纖維蛋白溶酶原)、NPY (神經肽Y)、RHOD (ras同源物基因家族,成員D)、MAPK8 (促分裂原活化蛋白激酶8)、MYC (v-myc髓細胞瘤病毒致癌基因同源物(禽))、FN1 (纖維連接蛋白1)、CMA1 (糜蛋白酶1,肥大細胞)、PLAU (纖維蛋白溶酶原活化劑,尿激酶)、GNB3 (鳥嘌呤核苷酸結合蛋白(G蛋白),β多肽3)、ADRB2 (腎上腺素激導性β-2-受體,表面)、APOA5 (載脂蛋白A-V)、SOD2 (超氧化物歧化酶2,粒線體)、F5 (凝血因子V (促凝血球蛋白原,不穩定因子))、VDR (維他命D (1,25-二羥基維他命D3)受體)、ALOX5 (花生四烯酸5 -脂肪加氧酶)、HLA-DRB1 (主要組織相容性複合物,II類,DRβ1)、PARP1 (聚(ADP-核糖)聚合酶1)、CD40LG (CD40配位體)、PON2 (過氧磷酶2)、AGER (晚期糖基化終產物特異性受體)、IRS1 (胰島素受體受質1)、PTGS1 (前列腺素-內過氧化物合成酶1 (前列腺素G/H合成酶及環氧合酶))、ECE1 (內皮素轉化酶1)、F7 (凝血因子VII (血漿凝血酶原轉化促進劑))、URN (介白素1受體拮抗劑)、EPHX2 (環氧化物水解酶2,細胞質)、IGFBP1 (胰島素樣生長因子結合蛋白1)、MAPK10 (促分裂原活化蛋白激酶10)、FAS (Fas (TNF受體子族,成員6))、ABCB1 (ATP結合卡匣,子族B (MDR/TAP),成員1)、JUN (jun致癌基因)、IGFBP3 (胰島素樣生長因子結合蛋白3)、CD14 (CD14分子)、PDE5A (磷酸二酯酶5A,cGMP特異性)、AGTR2 (血管緊張素II受體,2型)、CD40 (CD40分子、TNF受體子族成員5)、LCAT (卵磷脂-膽固醇醯基轉移酶)、CCR5 (趨化介素(C-C模體)受體5)、MMP1 (基質金屬蛋白酶1 (間質膠原酶))、TIMP1 (TIMP金屬肽酶抑制劑1)、ADM (腎上腺髓素)、DYT10 (肌肉緊張不足10)、STAT3 (信號轉導及轉錄活化劑3 (急性期反應因子))、MMP3 (基質金屬蛋白酶3 (基質溶素1,明膠酶原))、ELN (彈性蛋白)、USF1 (上游轉錄因子1)、CFH (補體因子H)、HSPA4 (熱休克70 kDa蛋白4)、MMP12 (基質金屬蛋白酶12 (巨噬細胞彈性酶))、MME (膜金屬內肽酶)、F2R (凝血因子II (凝血酶)受體)、SELL (選擇素L)、CTSB (組織蛋白酶B)、ANXA5 (膜聯蛋白A5)、ADRB1 (腎上腺素激導性β-1-受體)、CYBA (細胞色素b-245,α多肽)、FGA (纖維蛋白原α鏈)、GGT1 (γ-麩胺醯基轉移酶1)、LIPG (脂肪酶,內皮)、HIF1A (缺氧誘導因子1,α次單元(基礎螺旋-環-螺旋轉錄因子))、CXCR4 (趨化介素(C-X-C模體)受體4)、PROC (蛋白C (凝血因子Va及Villa之去活劑))、SCARB1 (清道夫受體B類,成員1)、CD79A (CD79a分子,免疫球蛋白相關α)、PLTP (磷脂轉移蛋白)、ADD1 (內收蛋白1 (α))、FGG (纖維蛋白原γ鏈)、SAA1 (血清澱粉樣Al)、KCNH2 (鉀電壓閘控通道,子族H (eag相關),成員2)、DPP4 (二肽基-肽酶4)、G6PD (葡萄糖-6-磷酸去氫酶)、NPR1 (利鈉肽受體A/鳥苷酸環化酶A (心房利鈉肽受體A))、VTN (玻連蛋白)、KIAA0101 (KIAA0101)、FOS (FBJ鼠科動物骨肉瘤病毒致癌基因同源物)、TLR2 (toll樣受體2)、PPIG (肽基脯胺醯基異構酶G (親環素G))、IL1R1 (介白素1受體,I型)、AR 
(雄激素受體)、CYP1A1 (細胞色素P450,家族1,子族A,多肽1)、SERPINA1 (serpin肽酶抑制劑,進化枝A (α-1抗蛋白酶,抗胰蛋白酶),成員1)、MTR (5-甲基四氫葉酸-高半胱胺酸甲基轉移酶)、RBP4 (視黃醇結合蛋白4,血漿)、APOA4 (載脂蛋白A-IV)、CDKN2A (細胞週期蛋白依賴性激酶抑制劑2A (黑色素瘤,pl6,抑制CDK4))、FGF2 (成纖維細胞生長因子2 (基礎))、EDNRB (內皮素受體B型)、ITGA2 (整合素,α2 (CD49B,VLA-2受體之α2次單元))、CAB INI (鈣調磷酸酶結合蛋白1)、SHBG (性激素結合球蛋白)、HMGB1 (高遷移率族盒1)、HSP90B2P (熱休克蛋白90 kDaβ(Grp94),成員2 (偽基因))、CYP3A4 (細胞色素P450,家族3,子族A,多肽4)、GJA1 (間隙連接蛋白,α1,43 kDa)、CAV1 (小窩蛋白1,小窩蛋白,22 kDa)、ESR2 (雌激素受體2 (ERβ))、LTA (淋巴毒素α (TNF超家族,成員1))、GDF15 (生長分化因子15)、BDNF (腦源性神經營養因子)、CYP2D6 (細胞色素P450,家族2,子族D,多肽6)、NGF (神經生長因子(β多肽))、SP1 (Sp 1轉錄因子)、TGIF1 (TGFB誘導因子同源盒1)、SRC (v-src肉瘤(Schmidt-Ruppin A-2)病毒致癌基因同源物(禽))、EGF (表皮生長因子(β-尿抑胃素))、PIK3CG (磷酸肌醇-3-激酶,催化性,γ多肽)、HLA-A (主要組織相容性複合物,I類,A)、KCNQ1 (鉀電壓閘控通道、KQT樣子族,成員1)、CNR1 (大麻素受體1 (腦))、FBN1 (原纖維蛋白1)、CHKA (膽鹼激酶α)、BEST1 (bestrophin 1)、APP (澱粉樣β(A4)前驅體蛋白)、CTNNB1 (鏈蛋白(鈣黏蛋白相關蛋白),β1,88 kDa)、IL2 (介白素2)、CD36 (CD36分子(血小板反應蛋白受體))、PRKAB1 (蛋白激酶,AMP活化,β1非催化性次單元)、TPO (甲狀腺過氧化物酶)、ALDH7A1 (醛去氫酶7家族,成員Al)、CX3CR1 (趨化介素(C-X3-C模體)受體1)、TH (酪胺酸羥化酶)、F9 (凝血因子IX)、GH1 (生長激素1)、TF (轉鐵蛋白)、HFE (血色病)、IE17A (介白素17A)、PTEN (磷酸酶及張力蛋白同源物)、GSTM1 (麩胱甘肽S -轉移酶μ1)、DMD (抗肌萎縮蛋白)、GATA4 (GATA結合蛋白4)、F13A1 (凝血因子XIII,Al多肽)、TTR (轉甲狀腺素蛋白)、FABP4 (脂肪酸結合蛋白4,脂肪細胞)、PON3 (過氧磷酶3)、APOC1 (載脂蛋白C-I)、INSR (胰島素受體)、TNFRSF1B (腫瘤壞死因子受體子族,成員IB)、HTR2A (5-羥基色胺(血清素)受體2A)、CSF3 (群落刺激因子3 (粒細胞))、CYP2C9 (細胞色素P450,家族2,子族C,多肽9)、TXN (硫氧還蛋白)、CYP11B2 (細胞色素P450,家族11,子族B,多肽2)、PTH (副甲狀腺激素)、CSF2 (群落刺激因子2 (粒細胞-巨噬細胞))、KDR (激酶插入結構域受體(III型體酪胺酸激酶))、PLA2G2A (磷脂酶A2,IIA族(血小板、滑液))、B2M (β-2-微球蛋白)、THBS1 (血小板反應蛋白1)、GCG (升糖素)、RHOA (ras同源物基因家族,成員A)、ALDH2 (醛去氫酶2家族(粒線體))、TCF7L2 (轉錄因子7樣2 (T細胞特異性,HMG-盒))、BDKRB2 (緩激肽受體B2)、NFE2L2 (核因子(紅系源性2)樣2)、NOTCH1 (Notch同源物1,易位相關(果蠅))、UGT1A1 (UDP葡萄糖醛酸基轉移酶1家族,多肽Al)、IFNA1 (干擾素,α1)、PPARD (過氧化物酶體增生物活化受體δ)、SIRT1 (sirtuin (沈默配型資訊調節2同源物) 1 (釀酒酵母))、GNRH1 (促性腺激素釋放激素1 (促黃體激素釋放激素))、PAPPA (妊娠相關血漿蛋白A,冠毛素1)、ARR3 (抑制蛋白3,視網膜(X-(抑制蛋白))、NPPC (利鈉肽前驅體C)、AHSP (α血紅素穩定蛋白)、PTK2 (PTK2蛋白酪胺酸激酶2)、IL13 (介白素13)、MTOR (雷帕黴素機械標靶(絲胺酸/酥胺酸激酶))、ITGB2 (整合素、β2 (補體組分3受體3及4次單元))、GSTT1 
(麩胱甘肽S-轉硫酶θ1)、IL6ST (介白素6信號轉導子(gpl30,制瘤素M受體))、CPB2 (羧基肽酶B2 (血漿))、CYP1A2 (細胞色素P450,家族1,子族A,多肽2)、HNF4A (肝細胞核因子4,α)、SLC6A4 (溶質載體家族6 (神經遞質轉運蛋白,血清素),成員4)、PLA2G6 (磷脂酶A2,VI族(細胞溶質、非鈣依賴性))、TNFSF11 (腫瘤壞死因子(配位體)超家族,成員11)、SLC8A1 (溶質載體家族8 (鈉/鈣交換劑),成員1)、F2RL1 (凝血因子II (凝血酶)受體樣1)、AKR1A1 (醛-酮還原酶家族1,成員A1 (醛還原酶))、ALDH9A1 (醛去氫酶9家族,成員Al)、BGLAP (骨γ-羧基麩胺酸(gla)蛋白)、MTTP (微粒體三酸甘油酯轉移蛋白)、MTRR (5-甲基四氫葉酸-高半胱胺酸甲基轉移酶還原酶)、SULT1A3 (磺基轉移酶家族,細胞溶質,1A,苯酚優先,成員3)、RAGE (腎腫瘤抗原)、C4B (補體組分4B (Chido血型)、P2RY12 (嘌呤能受體P2Y,G-蛋白偶合,12)、RNLS (腎酶,FAD依賴性胺氧化酶)、CREB1 (cAMP反應性元件結合蛋白1)、POMC (原嗎啡黑皮質素)、RAC1 (ras相關C3肉毒桿菌毒素受質1 (rho家族,小TP結合蛋白Racl))、LMNA (核纖層蛋白NC)、CD59 (CD59分子,補體調節蛋白)、SCN5A (鈉通道,電壓閘控,V型α次單元)、CYP1B1 (細胞色素P450,家族1,子族B,多肽1)、MIF (巨噬細胞遷移抑制因子(糖基化抑制因子))、MMP13 (基質金屬蛋白酶13 (膠原酶3))、TIMP2 (TIMP金屬肽酶抑制劑2)、CYP19A1 (細胞色素P450,家族19,子族A,多肽1)、CYP21A2 (細胞色素P450,家族21,子族A,多肽2)、PTPN22 (蛋白酪胺酸磷酸酶、非受體2型2 (淋巴樣))、MYH14 (肌凝蛋白,重鏈14,非肌肉)、MBL2 (甘露糖結合凝集素(蛋白C) 2,可溶性(調理素缺陷))、SELPLG (選擇素P配位體)、AOC3 (含銅胺氧化酶3 (血管黏附蛋白1))、CTSL1 (組織蛋白酶LI)、PCNA (增殖細胞核抗原)、IGF2 (胰島素樣生長因子2 (生長介素A))、ITGB1 (整合素,β1 (纖維連接蛋白受體,β多肽,抗原CD29包括MDF2、MSK12))、CAST (鈣蛋白酶抑制蛋白)、CXCL12 (趨化介素(C-X-C模體)配位體12 (基質細胞源性因子1))、IGHE (免疫球蛋白重鏈恆定ε)、KCNE1 (鉀電壓閘控通道,Isk相關家族,成員1)、TFRC (轉鐵蛋白受體(p90、CD71))、COL1A1 (膠原蛋白,I型,α1)、COL1A2 (膠原蛋白,I型,α2)、IL2RB (介白素2受體,β)、PLA2G10 (磷脂酶A2,X組)、ANGPT2 (血管生成素2)、PROCR (蛋白C受體,內皮(EPCR))、NOX4 (NADPH氧化酶4)、HAMP (鐵調素抗微生物肽)、PTPN11 (蛋白酪胺酸磷酸酶,非受體1型1)、SLC2A1 (溶質載體家族2 (促進葡萄糖轉運蛋白),成員1)、IL2RA (介白素2受體,α)、CCL5 (趨化介素(C-C模體)配位體5)、IRF1 (干擾素調節因子1)、CFLAR (CASP8及FADD樣凋亡調節因子)、CALC A (降鈣素相關多肽α)、EIF4E (真核轉譯起始因子4E)、GSTP1 (麩胱甘肽S-轉移酶pi 1)、JAK2 (Janus激酶2)、CYP3A5 (細胞色素P450,家族3,子族A,多肽5)、HSPG2 (硫酸乙醯肝素蛋白聚醣2)、CCL3 (趨化介素(C-C模體)配位體3)、MYD88 (髓系分化原發反應基因(88))、VIP (血管活性腸肽)、SOAT1 (甾醇O-醯基轉移酶1)、ADRBK1 (腎上腺素激導性β受體激酶1)、NR4A2 (核受體子族4,A組,成員2)、MMP8 (基質金屬蛋白酶8 (嗜中性球膠原酶))、NPR2 (利鈉肽受體B/鳥苷酸環化酶B (心房利鈉肽受體B))、GCH1 (GTP環化水解酶1)、EPRS (麩胺醯基-脯胺醯基-tRNA合成酶)、PPARGC1A (過氧化物酶體增生物活化受體γ,共活化劑1 α)、F12 (凝血因子XII (Hageman因子))、PEC AMI (血小板/內皮細胞黏附分子)、CCL4 (趨化介素(C-C模體)配位體4)、SERPINA3 (serpin肽酶抑制劑,進化枝A (α- 1抗蛋白酶,抗胰蛋白酶),成員3)、CASR (鈣敏感受體)、GJA5 (間隙連接蛋白,α 
5,40 kDa)、FABP2 (脂肪酸結合蛋白2,腸)、TTF2 (轉錄終止因子,RNA聚合酶II)、PROS1 (蛋白S (α))、CTF1 (心肌營養素1)、SGCB (肌聚醣,β (43 kDa抗肌萎縮蛋白相關醣蛋白))、YME1L1 (YMEl樣1 (釀酒酵母))、CAMP (抗菌肽抗微生物肽)、ZC3H12A (含鋅指CCCH型12A)、AKR1B1 (醛-酮還原酶家族1,成員B1 (醛糖還原酶))、DES (結蛋白)、MMP7 (基質金屬蛋白酶7 (基質溶素、子宮))、AHR (芳基烴受體)、CSF1 (群落刺激因子1 (巨噬細胞))、HDAC9 (組蛋白去乙醯酶9)、CTGF (結締組織生長因子)、KCNMA1 (鉀大電導鈣激活通道,子族M,α成員1)、UGT1A (UDP葡萄糖醛酸基轉移酶1家族,多肽A複合物基因座)、PRKCA (蛋白激酶C,α)、COMT (兒茶酚-b-甲基轉移酶)、S100B (S100鈣結合蛋白B)、EGR1 (早期生長反應1)、PRL (泌乳素)、IL15 (介白素15)、DRD4 (多巴胺受體D4)、CAMK2G (鈣/鈣調蛋白依賴性蛋白激酶II γ)、SLC22A2 (溶質載體家族22 (有機陽離子轉運蛋白),成員2)、CCL11 (趨化介素(C-C模體)配位體11)、PGF (胎盤生長因子)、THPO (血小板生成素)、GP6 (醣蛋白VI (血小板))、TACR1 (速激肽受體1)、NTS (神經張力蛋白)、HNF1A (HNF1同源盒A)、SST (生長抑素)、KCND1 (鉀電壓閘控通道,Shal相關子族,成員1)、LOC646627 (磷脂酶抑制劑)、TBXAS1 (血栓烷A合成酶1 (血小板))、CYP2J2 (細胞色素P450,家族2,子族J,多肽2)、TBXA2R (血栓烷A2受體)、ADH1C (醇去氫酶1C (I類),γ多肽)、ALOX12 (花生四烯酸12-脂肪加氧酶)、AHSG (α-2-HS-醣蛋白)、BHMT (甜菜鹼-高半胱胺酸甲基轉移酶)、GJA4 (間隙連接蛋白,α4,37 kDa)、SLC25A4 (溶質載體家族25 (粒線體載體;腺嘌呤核苷酸轉位因子),成員4)、ACLY (ATP檸檬酸裂解酶)、ALOX5AP (花生四烯酸5-脂肪加氧酶活化蛋白)、NUMA1 (核有絲分裂器蛋白1)、CYP27B1 (細胞色素P450,家族27,子族B,多肽1)、CYSLTR2 (半胱胺醯基白三烯受體2)、SOD3 (超氧化物歧化酶3,細胞外)、LTC4S (白三烯C4合成酶)、UCN (尿皮素)、GHRL (腦腸肽/肥胖抑素前肽原)、APOC2 (載脂蛋白C-II)、CLEC4A (C型凝集素結構域家族4,成員A)、KBTBD10 (含kelch重複及BTB (POZ)結構域10)、TNC (肌腱蛋白C)、TYMS (胸苷酸合成酶)、SHC1 (SHC (含Src同源2結構域)轉化蛋白1)、LRP1 (低密度脂蛋白受體相關蛋白1)、SOCS3 (細胞介素信號傳導抑制因子3)、ADH1B (醇去氫酶IB (I類),β多肽)、KLK3 (胰舒血管素相關肽酶3)、HSD11B1 (羥基類固醇(11 -β)去氫酶1)、VKORC1 (維他命K環氧化物還原酶複合物,次單元1)、SERPINB2 (serpin肽酶抑制劑,進化枝B (卵白蛋白),成員2)、TNS1 (張力蛋白1)、RNF19A (環指蛋白19A)、EPOR (紅血球生成素受體)、ITGAM (整合素,α M (補體組分3受體3次單元))、PITX2 (類似配對同源結構域2)、MAPK7 (促分裂原活化蛋白激酶7)、FCGR3A (IgG Fc片段,低親和力111a受體(CD16a))、LEPR (瘦素受體)、ENG (內皮糖蛋白)、GPX1 (麩胱甘肽過氧化物酶1)、GOT2 (麩胺酸-草醯乙酸轉胺酶2、粒線體 (天冬胺酸胺基轉移酶2))、HRH1 (組織胺受體HI)、NR112 (核受體子族1,I組,成員2)、CRH (促腎上腺素釋放激素)、HTR1A (5-羥基色胺(血清素)受體1A)、VDAC1 (電壓依賴性陰離子通道1)、HPSE (乙醯肝素酶)、SFTPD (界面活性劑蛋白D)、TAP2 (轉運蛋白2,ATP結合卡匣,子族B (MDR/TAP))、RNF123 (環指蛋白123)、PTK2B (PTK2B蛋白酪胺酸激酶2 β)、NTRK2 (神經營養酪胺酸激酶受體,2型)、IL6R (介白素6受體)、ACHE (乙醯膽鹼酯酶(Yt血型))、GLP1R (升糖素樣肽1受體)、GHR (生長激素受體)、GSR (麩胱甘肽還原酶)、NQOl (NAD(P)H去氫酶,醌1)、NR5A1 
(核受體子族5,A組,成員1)、GJB2 (間隙連接蛋白,β2,26 kDa)、SLC9A1 (溶質載體家族9 (鈉/氫交換劑),成員1)、MAOA (單胺氧化酶A)、PCSK9 (前蛋白轉化酶枯草桿菌蛋白酶/kexin型9)、FCGR2A (IgG Fc片段,低親和力Ila受體(CD32))、SERPINF1 (serpin肽酶抑制劑,進化枝F (α-2抗纖溶酶,色素上皮源性因子),成員1)、EDN3 (內皮素3)、DHFR (二氫葉酸還原酶)、GAS6 (生長停滯特異性6)、SMPD1 (鞘磷脂磷酸二酯酶1,酸溶酶體)、UCP2 (解偶蛋白2 (粒線體,質子載體))、TFAP2A (轉錄因子AP-2 α (活化增強子結合蛋白2 α))、C4BPA (補體組分4結合蛋白,α)、SERPINF2 (serpin肽酶抑制劑,進化枝F (α-2抗纖溶酶,色素上皮源性因子),成員2)、TYMP (胸苷磷酸化酶)、ALPP (鹼性磷酸酶、胎盤(Regan同功酶))、CXCR2 (趨化介素(C-X-C模體)受體2)、SLC39A3 (溶質載體家族39 (鋅轉運蛋白),成員3)、ABCG2 (ATP-結合卡匣,子族G (WHITE),成員2)、ADA (腺苷去胺酶)、JAK3 (Janus激酶3)、HSPA1A (熱休克70 kDa蛋白1A)、FASN (脂肪酸合成酶)、FGF1 (成纖維細胞生長因子1 (酸性))、Fll (凝血因子XI)、ATP7A (ATPase、Cu++轉運,α多肽)、CR1 (補體組分(3b/4b)受體1 (Knops血型))、GFAP (神經膠質纖維酸性蛋白)、ROCK1 (含Rho相關捲曲螺旋之蛋白激酶1)、MECP2 (甲基CpG結合蛋白2 (Rett症候群))、MYLK (肌凝蛋白輕鏈激酶)、BCF1E (丁醯膽鹼酯酶)、LIPE (脂肪酶,激素敏感性)、PRDX5 (過氧化物還原蛋白5)、ADORA1 (腺苷A1受體)、WRN (Werner症候群,RecQ解螺旋酶樣)、CXCR3 (趨化介素(C-X-C模體)受體3)、CD81 (CD81分子)、SMAD7 (SMAD家族成員7)、LAMC2 (層連結蛋白,γ2)、MAP3K5 (促分裂原活化蛋白激酶激酶激酶5)、CF1GA (嗜鉻粒蛋白A (副甲狀腺分泌蛋白1))、IAPP (胰島澱粉樣多肽)、RFIO (視紫質)、ENPP1 (外核苷酸焦磷酸酶/磷酸二酯酶1)、PTF1LF1 (副甲狀腺激素樣激素)、NRG1 (神經調節蛋白1)、VEGFC (血管內皮生長因子C)、ENPEP (麩胺醯基胺基肽酶(胺基肽酶A))、CEBPB (CCAAT/增強子結合蛋白(C/EBP)、β)、NAGLU (N-乙醯基葡萄糖苷酶,α)、F2RL3 (凝血因子II (凝血酶)受體樣3)、CX3CL1 (趨化介素(C-X3-C模體)配位體1)、BDKRB1 (緩激肽受體Bl)、ADAMTS13 (具有血小板反應蛋白1型模體之ADAM金屬肽酶,13)、ELANE (彈性酶,嗜中性球表現)、ENPP2 (外核苷酸焦磷酸酶/磷酸二酯酶2)、CISFl (細胞介素誘導型含SF12蛋白)、GAST (胃泌素)、MYOC (肌纖蛋白,小梁網誘導型糖皮質激素反應)、ATP1A2 (ATPase,Na+/K+轉運,α2多肽)、NF1 (神經纖維瘤蛋白1)、GJB1 (間隙連接蛋白,β1,32 kDa)、MEF2A (肌細胞增強因子2A)、VCL (黏著斑蛋白)、BMPR2 (骨形態發生蛋白受體,II型(絲胺酸/酥胺酸激酶))、TUBB (微管蛋白,β)、CDC42 (細胞分裂週期42 (GTP結合蛋白,25 kDa))、KRT18 (角蛋白18)、F1SF1 (熱休克轉錄因子1)、MYB (v-myb成髓細胞瘤病毒致癌基因同源物(禽))、PRKAA2 (蛋白激酶,AMP活化,α2催化性次單元)、ROCK2 (含Rho相關捲曲螺旋之蛋白激酶2)、TFPI (組織因子路徑抑制劑(脂蛋白相關凝血抑制劑))、PRKG1 (蛋白激酶,cGMP依賴性、I型)、BMP2 (骨形態發生蛋白2)、CTNND1 (鏈蛋白(鈣黏蛋白相關蛋白)、δ 1)、CTF1 (胱硫醚酶(胱硫醚γ-裂解酶))、CTSS (組織蛋白酶S)、VAV2 (vav 2鳥嘌呤核苷酸交換因子)、NPY2R (神經肽Y受體Y2)、IGFBP2 (胰島素樣生長因子結合蛋白2,36 kDa)、CD28 (CD28分子)、GSTA1 (麩胱甘肽S-轉移酶α1)、PPIA (肽基脯胺醯基異構酶A (親環素A))、APOF1 (載脂蛋白FI (β-2- 醣蛋白I))、S100A8 (S100鈣結合蛋白A8)、IL11 (介白素11)、ALOX15 
(花生四烯酸15 -脂肪加氧酶)、FBLN1 (纖蛋白1)、NR1F13 (核受體子族1,FI組,成員3)、SCD (硬脂醯基-CoA去飽和酶(δ-9-去飽和酶))、GIP (胃抑制多肽)、CF1GB (嗜鉻粒蛋白B (分泌粒蛋白1))、PRKCB (蛋白激酶C,β)、SRD5A1 (類固醇-5-α-還原酶,α多肽1 (3-側氧基-5α-類固醇δ 4-去氫酶α1))、F1SD11B2 (羥基類固醇(11-β)去氫酶2)、CALCRL (降鈣素受體樣)、GALNT2 (UDP-N-乙醯基-α-D-半乳糖胺:多肽N-乙醯基半乳糖胺基轉移酶2 (GalNAc-T2))、ANGPTL4 (血管生成素樣4)、KCNN4 (鉀中間/小電導鈣激活通道,子族N,成員4)、PIK3C2A (磷酸肌醇-3-激酶,2類,α多肽)、HBEGF (肝素結合EGF樣生長因子)、CYP7A1 (細胞色素P450,家族7,子族A,多肽1)、HLA-DRB5 (主要組織相容性複合物,II類,DRβ5)、BNIP3 (BCL2/腺病毒E1B 19 kDa相互作用蛋白3)、GCKR (葡萄糖激酶(己糖激酶4)調節因子)、S100A12 (S100鈣結合蛋白A 12)、PADI4 (肽基精胺酸去胺酶,IV型)、HSPA14 (熱休克70 kDa蛋白14)、CXCR1 (趨化介素(C-X-C模體)受體1)、H19 (H19,印跡母體表現轉錄本(非蛋白編碼))、KRTAP19-3 (角蛋白相關蛋白19-3)、胰島素、RAC2 (ras相關C3肉毒桿菌毒素受質2 (rho家族,小GTP結合蛋白Rac2))、RYR1 (蘭尼鹼受體1 (骨骼))、CLOCK (clock同源物(小鼠))、NGFR (神經生長因子受體(TNFR超家族,成員16))、DBH (多巴胺β-羥化酶(多巴胺β-單加氧酶))、CHRNA4 (膽鹼激導性受體,菸鹼,α4)、CACNA1C (鈣通道,電壓依賴性,L型,α1C次單元)、PRKAG2 (蛋白激酶,AMP活化,γ2非催化性次單元)、CHAT (膽鹼乙醯基轉移酶)、PTGDS (前列腺素D2合成酶21 kDa (腦))、NR1H2 (核受體子族1,H組,成員2)、TEK (TEK酪胺酸激酶,內皮)、VEGFB (血管內皮生長因子B)、MEF2C (肌細胞增強因子2C)、MAPKAPK2 (促分裂原活化蛋白激酶活化蛋白激酶2)、TNFRSF11 A (腫瘤壞死因子受體子族,成員11a,NFKB活化劑)、HSPA9 (熱休克70 kDa蛋白9 (壽命蛋白))、CYSLTR1 (半胱胺醯基白三烯受體1)、MAT1A (甲硫胺酸腺苷轉移酶I,α)、OPRL1 (鴉片受體樣1)、IMPA1 (肌醇(肌肉)-l(或4) -單磷酸酶1)、CLCN2 (氯離子通道2)、DLD (二氫硫辛醯胺去氫酶)、PSMA6 (蛋白酶體(前體、巨蛋白因子)次單元,α型,6)、PSMB8 (蛋白酶體(前體、巨蛋白因子)次單元,β型,8 (大多功能肽酶7))、CHI3L1 (幾丁質酶3樣1 (軟骨醣蛋白-39))、ALDH1B1 (醛去氫酶1家族,成員Bl)、PARP2 (聚(ADP-核糖)聚合酶2)、STAR (類固醇合成急性調節蛋白)、LBP (脂多醣結合蛋白)、ABCC6 (ATP-結合卡匣,子族C (CFTR/MRP),成員6)、RGS2 (G蛋白信號傳導調節因子2,24 kDa)、EFNB2 (腎上腺素-B2)、囊性纖維化跨膜傳導調節因子(CFTR)、GJB6 (間隙連接蛋白,β6,30 kDa)、APOA2 (載脂蛋白A-II)、AMPD1 (腺苷單磷酸去胺酶1)、DYSF (dysferlin,肢帶肌肉萎縮2B (體染色體隱性))、FDFT1 (法尼基-二磷酸法尼基轉移酶1)、EDN2 (內皮素2)、CCR6 (趨化介素(C-C模體)受體6)、GJB3 (間隙連接蛋白,β3,31 kDa)、IL1RL1 (介白素1受體樣1)、ENTPD1 (外核苷三磷酸二磷酸水解酶1)、BBS4 (Bardet-Biedl症候群4)、CELSR2 (鈣黏蛋白,EGF LAG七經G型受體2 (紅鸛同源物,果蠅))、F11R (Fll受體)、RAPGEF3 (Rap鳥嘌呤核苷酸交換因子(GEF) 3)、HYAL1 (玻尿酸葡萄糖苷酶1)、ZNF259 (鋅指蛋白259)、ATOX1 (ATX1抗氧化劑蛋白1同源物(酵母))、ATF6 (活化轉錄因子6)、KΉK (酮己糖激酶(果糖激酶))、SAT1 (亞精胺/精胺Nl-乙醯基轉移酶1)、GGFI (γ-麩胺醯水解酶(結合酶,吡咯基聚γ麩胺醯水解酶))、TIMP4 (TIMP金屬肽酶抑制劑4)、SLC4A4 
(溶質載體家族4,碳酸氫鈉共轉運蛋白,成員4)、PDE2A (磷酸二酯酶2 A,cGMP刺激)、PDE3B (磷酸二酯酶3B,cGMP抑制)、FADS1 (脂肪酸去飽和酶1)、FADS2 (脂肪酸去飽和酶2)、TMSB4X (胸腺素β4,X連鎖)、TXNIP (硫氧還蛋白相互作用蛋白)、LIMS1 (LIM及衰老細胞抗原樣結構域1)、RHOB (ras同源物基因家族,成員B)、LY96 (淋巴球抗原96)、FOXO1 (叉頭盒O1)、PNPLA2 (含patatin樣磷脂酶結構域2)、TRH (促甲狀腺激素釋放激素)、GJC1 (間隙連接蛋白,γ1,45 kDa)、SLC17A5 (溶質載體家族17 (陰離子/糖轉運蛋白),成員5)、FTO (脂肪質量及肥胖相關)、GJD2 (間隙連接蛋白,δ 2,36 kDa)、PSRC1 (富脯胺酸/絲胺酸之捲曲螺旋1)、CASP12 (半胱天冬酶12 (基因/偽基因))、GPBAR1 (G蛋白-偶合膽汁酸受體1)、PXK (含絲胺酸/酥胺酸激酶之PX結構域)、IL33 (介白素33)、TRIB1 (毛球族同源物1 (果蠅))、PBX4 (前B細胞白血病同源盒4)、NUPR1 (核蛋白,轉錄al調節因子、1)、SEP15 (15 kDa硒蛋白)、CILP2 (軟骨中間層蛋白2)、TERC (端粒酶RNA組分)、GGT2 (γ-麩胺醯基轉移酶2)、MT-CO1 (粒線體編碼之細胞色素c氧化酶I)、UOX (尿酸氧化酶,偽基因)、CRISPR/Cas效應多肽、酶活性CRISPR/Cas效應多肽(例如,能夠裂解標靶核酸)及無酶活性CRISPR/Cas效應多肽(例如,不裂解標靶核酸,但保留結合於標靶核酸)。在一些情況下,供體DNA編碼任何前述多肽之野生型形式;亦即,供體DNA可編碼不包括導致功能降低、功能缺乏或發病機制之突變的「正常」形式。Non-limiting examples of polypeptides that may be encoded by the donor DNA include, for example, IL1B (interleukin 1, beta), XDH (xanthine dehydrogenase), TP53 (tumor protein p53), PTGIS (prostaglandin I2 (prostacyclin) synthase), MB (myoglobin), IL4 (interleukin 4), ANGPT1 (angiopoietin 1), ABCG8 (ATP-binding cassette, subfamily G (WHITE), member 8), CTSK (cathepsin K), PTGIR (prostaglandin I2 (prostacyclin) receptor (IP)), KCNJ11 (potassium inwardly rectifying channel, subfamily J, member 11), INS (insulin), CRP (C-reactive protein, pentraxin-related), PDGFRB (platelet-derived growth factor receptor, beta polypeptide), CCNA2 (cyclin A2), PDGFB (platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis) oncogene homolog)), KCNJ5 (potassium inwardly rectifying channel, subfamily J, member 5), KCNN3 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3), CAPN10 (calpain 10), PTGES (prostaglandin E synthase), ADRA2B (adrenergic, alpha-2B, receptor), ABCG5 (ATP-binding cassette, subfamily G (WHITE), member 5), PRDX2 (peroxiredoxin 2), CAPN5 (calpain 5), PARP14 (poly (ADP-ribose) polymerase family, member 14), 
MEX3C (mex-3 homolog C (Caenorhabditis elegans)), ACE (angiotensin I converting enzyme (peptidyl-dipeptidase A) 1), TNF (tumor necrosis factor (TNF superfamily, member 2)), IL6 (interleukin 6 (interferon, beta 2)), STN (statin), SERPINE1 (serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1), ALB (albumin), ADIPOQ (adiponectin, C1Q and collagen domain containing), APOB (apolipoprotein B (including Ag(x) antigen)), APOE (apolipoprotein E), LEP (leptin), MTHFR (5,10-methylenetetrahydrofolate reductase (NADPH)), APOA1 (apolipoprotein A-I), EDN1 (endothelin 1), NPPB (natriuretic peptide precursor B), NOS3 (nitric oxide synthase 3 (endothelial cell)), PPARG (peroxisome proliferator-activated receptor gamma), PLAT (plasminogen activator, tissue), PTGS2 (prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)), CETP (cholesteryl ester transfer protein, plasma), AGTR1 (angiotensin II receptor, type 1), HMGCR (3-hydroxy-3-methylglutaryl-coenzyme A reductase), IGF1 (insulin-like growth factor 1 (somatomedin C)), SELE (selectin E), REN (renin), PPARA (peroxisome proliferator-activated receptor alpha), PON1 (paraoxonase 1), KNG1 (kininogen 1), CCL2 (chemokine (C-C motif) ligand 2), LPL (lipoprotein lipase), VWF (von Willebrand factor), F2 (coagulation factor II (thrombin)), ICAM1 (intercellular adhesion molecule 1), TGFB1 (transforming growth factor, beta 1), NPPA (natriuretic peptide precursor A), IL10 (interleukin 10), EPO (erythropoietin), SOD1 (superoxide dismutase 1, soluble), VCAM1 (vascular cell adhesion molecule 1), IFNG (interferon, gamma), LPA (lipoprotein, Lp(a)), MPO (myeloperoxidase), ESR1 (estrogen receptor 1), MAPK1 (mitogen-activated protein kinase 1), HP (haptoglobin), F3 (coagulation factor III (thromboplastin, tissue factor)), CST3 (cystatin C), COG2 (component of oligomeric Golgi complex 2), MMP9 (matrix metalloproteinase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV 
collagenase)), SERPINC1 (serpin peptidase inhibitor, clade C (antithrombin), member 1), F8 (coagulation factor VIII, procoagulant component), HMOX1 (heme oxygenase (decycling) 1), APOC3 (apolipoprotein C-III), IL8 (interleukin 8), PROK1 (prokineticin 1), CBS (cystathionine-beta-synthase), NOS2 (nitric oxide synthase 2, inducible), TLR4 (toll-like receptor 4), SELP (selectin P (granule membrane protein 140 kDa, antigen CD62)), ABCA1 (ATP-binding cassette, subfamily A (ABC1), member 1), AGT (angiotensinogen (serpin peptidase inhibitor, clade A, member 8)), LDLR (low-density lipoprotein receptor), GPT (glutamic-pyruvate transaminase (alanine aminotransferase)), VEGFA (vascular endothelial growth factor A), NR3C2 (nuclear receptor subfamily 3, group C, member 2), IL18 (interleukin 18 (interferon-gamma-inducing factor)), NOS1 (nitric oxide synthase 1 (neuronal)), NR3C1 (nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)), FGB (fibrinogen beta chain), HGF (hepatocyte growth factor (hepapoietin A; scatter factor)), IL1A (interleukin 1, alpha), RETN (resistin), AKT1 (v-akt murine thymoma viral oncogene homolog 1), LIPC (lipase, hepatic), HSPD1 (heat shock 60 kDa protein 1 (chaperonin)), MAPK14 (mitogen-activated protein kinase 14), SPP1 (secreted phosphoprotein 1), ITGB3 (integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)), CAT (catalase), UTS2 (urotensin 2), THBD (thrombomodulin), F10 (coagulation factor X), CP (ceruloplasmin (ferroxidase)), TNFRSF11B (tumor necrosis factor receptor superfamily, member 11b), EDNRA (endothelin receptor type A), EGFR (epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)), MMP2 (matrix metalloproteinase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase)), PLG (plasminogen), NPY (neuropeptide Y), RHOD (ras homolog gene family, member D), MAPK8 (mitogen-activated protein kinase 8), MYC (v-myc myelocytomatosis viral oncogene homolog (avian)), FN1 (fibronectin 
1), CMA1 (chymase 1, mast cell), PLAU (plasminogen activator, urokinase), GNB3 (guanine nucleotide-binding protein (G protein), beta polypeptide 3), ADRB2 (adrenergic, beta-2, receptor, surface), APOA5 (apolipoprotein A-V), SOD2 (superoxide dismutase 2, mitochondrial), F5 (coagulation factor V (proaccelerin, labile factor)), VDR (vitamin D (1,25-dihydroxyvitamin D3) receptor), ALOX5 (arachidonate 5-lipoxygenase), HLA-DRB1 (major histocompatibility complex, class II, DR beta 1), PARP1 (poly (ADP-ribose) polymerase 1), CD40LG (CD40 ligand), PON2 (paraoxonase 2), AGER (advanced glycosylation end product-specific receptor), IRS1 (insulin receptor substrate 1), PTGS1 (prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)), ECE1 (endothelin converting enzyme 1), F7 (coagulation factor VII (serum prothrombin conversion accelerator)), IL1RN (interleukin 1 receptor antagonist), EPHX2 (epoxide hydrolase 2, cytoplasmic), IGFBP1 (insulin-like growth factor binding protein 1), MAPK10 (mitogen-activated protein kinase 10), FAS (Fas (TNF receptor superfamily, member 6)), ABCB1 (ATP-binding cassette, subfamily B (MDR/TAP), member 1), JUN (jun oncogene), IGFBP3 (insulin-like growth factor binding protein 3), CD14 (CD14 molecule), PDE5A (phosphodiesterase 5A, cGMP-specific), AGTR2 (angiotensin II receptor, type 2), CD40 (CD40 molecule, TNF receptor superfamily member 5), LCAT (lecithin-cholesterol acyltransferase), CCR5 (chemokine (C-C motif) receptor 5), MMP1 (matrix metalloproteinase 1 (interstitial collagenase)), TIMP1 (TIMP metallopeptidase inhibitor 1), ADM (adrenomedullin), DYT10 (dystonia 10), STAT3 (signal transducer and activator of transcription 3 (acute-phase response factor)), MMP3 (matrix metalloproteinase 3 (stromelysin 1, progelatinase)), ELN (elastin), USF1 (upstream transcription factor 1), CFH (complement factor H), HSPA4 (heat shock 70 kDa protein 4), MMP12 (matrix metalloproteinase 12 (macrophage elastase)), MME (membrane 
metalloendopeptidase), F2R (coagulation factor II (thrombin) receptor), SELL (selectin L), CTSB (cathepsin B), ANXA5 (annexin A5), ADRB1 (adrenergic, beta-1, receptor), CYBA (cytochrome b-245, alpha polypeptide), FGA (fibrinogen alpha chain), GGT1 (gamma-glutamyltransferase 1), LIPG (lipase, endothelial), HIF1A (hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)), CXCR4 (chemokine (C-X-C motif) receptor 4), PROC (protein C (inactivator of coagulation factors Va and VIIIa)), SCARB1 (scavenger receptor class B, member 1), CD79A (CD79a molecule, immunoglobulin-associated alpha), PLTP (phospholipid transfer protein), ADD1 (adducin 1 (alpha)), FGG (fibrinogen gamma chain), SAA1 (serum amyloid A1), KCNH2 (potassium voltage-gated channel, subfamily H (eag-related), member 2), DPP4 (dipeptidyl-peptidase 4), G6PD (glucose-6-phosphate dehydrogenase), NPR1 (natriuretic peptide receptor A/guanylate cyclase A (atrial natriuretic peptide receptor A)), VTN (vitronectin), KIAA0101 (KIAA0101), FOS (FBJ murine osteosarcoma viral oncogene homolog), TLR2 (toll-like receptor 2), PPIG (peptidylprolyl isomerase G (cyclophilin G)), IL1R1 (interleukin 1 receptor, type I), AR (androgen receptor), CYP1A1 (cytochrome P450, family 1, subfamily A, polypeptide 1), SERPINA1 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1), MTR (5-methyltetrahydrofolate-homocysteine methyltransferase), RBP4 (retinol binding protein 4, plasma), APOA4 (apolipoprotein A-IV), CDKN2A (cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4)), FGF2 (fibroblast growth factor 2 (basic)), EDNRB (endothelin receptor type B), ITGA2 (integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor)), CABIN1 (calcineurin binding protein 1), SHBG (sex hormone-binding globulin), HMGB1 (high-mobility group box 1), HSP90B2P (heat shock protein 90 kDa beta (Grp94), member 2 (pseudogene)), CYP3A4 (cytochrome P450, family 3, 
subfamily A, polypeptide 4), GJA1 (gap junction protein, alpha 1, 43 kDa), CAV1 (caveolin 1, caveolae protein, 22 kDa), ESR2 (estrogen receptor 2 (ER beta)), LTA (lymphotoxin alpha (TNF superfamily, member 1)), GDF15 (growth differentiation factor 15), BDNF (brain-derived neurotrophic factor), CYP2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6), NGF (nerve growth factor (beta polypeptide)), SP1 (Sp1 transcription factor), TGIF1 (TGFB-induced factor homeobox 1), SRC (v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian)), EGF (epidermal growth factor (beta-urogastrone)), PIK3CG (phosphoinositide-3-kinase, catalytic, gamma polypeptide), HLA-A (major histocompatibility complex, class I, A), KCNQ1 (potassium voltage-gated channel, KQT-like subfamily, member 1), CNR1 (cannabinoid receptor 1 (brain)), FBN1 (fibrillin 1), CHKA (choline kinase alpha), BEST1 (bestrophin 1), APP (amyloid beta (A4) precursor protein), CTNNB1 (catenin (cadherin-associated protein), beta 1, 88 kDa), IL2 (interleukin 2), CD36 (CD36 molecule (thrombospondin receptor)), PRKAB1 (protein kinase, AMP-activated, beta 1 non-catalytic subunit), TPO (thyroid peroxidase), ALDH7A1 (aldehyde dehydrogenase 7 family, member A1), CX3CR1 (chemokine (C-X3-C motif) receptor 1), TH (tyrosine hydroxylase), F9 (coagulation factor IX), GH1 (growth hormone 1), TF (transferrin), HFE (hemochromatosis), IL17A (interleukin 17A), PTEN (phosphatase and tensin homolog), GSTM1 (glutathione S-transferase mu 1), DMD (dystrophin), GATA4 (GATA binding protein 4), F13A1 (coagulation factor XIII, A1 polypeptide), TTR (transthyretin), FABP4 (fatty acid binding protein 4, adipocyte), PON3 (paraoxonase 3), APOC1 (apolipoprotein C-I), INSR (insulin receptor), TNFRSF1B (tumor necrosis factor receptor superfamily, member 1B), HTR2A (5-hydroxytryptamine (serotonin) receptor 2A), CSF3 (colony stimulating factor 3 (granulocyte)), CYP2C9 (cytochrome P450, family 2, subfamily C, polypeptide 9), TXN (thioredoxin), CYP11B2 (cytochrome 
P450, family 11, subfamily B, polypeptide 2), PTH (parathyroid hormone), CSF2 (colony stimulating factor 2 (granulocyte-macrophage)), KDR (kinase insert domain receptor (a type III receptor tyrosine kinase)), PLA2G2A (phospholipase A2, group IIA (platelets, synovial fluid)), B2M (beta-2-microglobulin), THBS1 (thrombospondin 1), GCG (glucagon), RHOA (ras homolog gene family, member A), ALDH2 (aldehyde dehydrogenase 2 family (mitochondrial)), TCF7L2 (transcription factor 7-like 2 (T cell-specific, HMG-box)), BDKRB2 (bradykinin receptor B2), NFE2L2 (nuclear factor (erythroid-derived 2)-like 2), NOTCH1 (Notch homolog 1, translocation-associated (Drosophila)), UGT1A1 (UDP glucuronosyltransferase 1 family, polypeptide A1), IFNA1 (interferon, alpha 1), PPARD (peroxisome proliferator-activated receptor delta), SIRT1 (sirtuin (silent mating type information regulation 2 homolog) 1 (Saccharomyces cerevisiae)), GNRH1 (gonadotropin-releasing hormone 1 (luteinizing hormone-releasing hormone)), PAPPA (pregnancy-associated plasma protein A, pappalysin 1), ARR3 (arrestin 3, retinal (X-arrestin)), NPPC (natriuretic peptide precursor C), AHSP (alpha hemoglobin stabilizing protein), PTK2 (PTK2 protein tyrosine kinase 2), IL13 (interleukin 13), MTOR (mechanistic target of rapamycin (serine/threonine kinase)), ITGB2 (integrin, beta 2 (complement component 3 receptor 3 and 4 subunit)), GSTT1 (glutathione S-transferase theta 1), IL6ST (interleukin 6 signal transducer (gp130, oncostatin M receptor)), CPB2 (carboxypeptidase B2 (plasma)), CYP1A2 (cytochrome P450, family 1, subfamily A, polypeptide 2), HNF4A (hepatocyte nuclear factor 4, alpha), SLC6A4 (solute carrier family 6 (neurotransmitter transporter, serotonin), member 4), PLA2G6 (phospholipase A2, group VI (cytosolic, calcium-independent)), TNFSF11 (tumor necrosis factor (ligand) superfamily, member 11), SLC8A1 (solute carrier family 8 (sodium/calcium exchanger), member 1), F2RL1 (coagulation factor II (thrombin) receptor-like 1), AKR1A1 (aldo-keto reductase family 1, member A1 (aldehyde reductase)), 
ALDH9A1 (aldehyde dehydrogenase 9 family, member A1), BGLAP (bone gamma-carboxyglutamate (gla) protein), MTTP (microsomal triglyceride transfer protein), MTRR (5-methyltetrahydrofolate-homocysteine methyltransferase reductase), SULT1A3 (sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3), RAGE (renal tumor antigen), C4B (complement component 4B (Chido blood group)), P2RY12 (purinergic receptor P2Y, G-protein coupled, 12), RNLS (renalase, FAD-dependent amine oxidase), CREB1 (cAMP responsive element binding protein 1), POMC (proopiomelanocortin), RAC1 (ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1)), LMNA (lamin A/C), CD59 (CD59 molecule, complement regulatory protein), SCN5A (sodium channel, voltage-gated, type V, alpha subunit), CYP1B1 (cytochrome P450, family 1, subfamily B, polypeptide 1), MIF (macrophage migration inhibitory factor (glycosylation-inhibiting factor)), MMP13 (matrix metalloproteinase 13 (collagenase 3)), TIMP2 (TIMP metallopeptidase inhibitor 2), CYP19A1 (cytochrome P450, family 19, subfamily A, polypeptide 1), CYP21A2 (cytochrome P450, family 21, subfamily A, polypeptide 2), PTPN22 (protein tyrosine phosphatase, non-receptor type 22 (lymphoid)), MYH14 (myosin, heavy chain 14, non-muscle), MBL2 (mannose-binding lectin (protein C) 2, soluble (opsonic defect)), SELPLG (selectin P ligand), AOC3 (amine oxidase, copper containing 3 (vascular adhesion protein 1)), CTSL1 (cathepsin L1), PCNA (proliferating cell nuclear antigen), IGF2 (insulin-like growth factor 2 (somatomedin A)), ITGB1 (integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12)), CAST (calpastatin), CXCL12 (chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1)), IGHE (immunoglobulin heavy constant epsilon), KCNE1 (potassium voltage-gated channel, Isk-related family, member 1), TFRC (transferrin receptor (p90, CD71)), COL1A1 (collagen, type I, alpha 1), COL1A2 
(collagen, type I, alpha 2), IL2RB (interleukin 2 receptor, beta), PLA2G10 (phospholipase A2, group X), ANGPT2 (angiopoietin 2), PROCR (protein C receptor, endothelial (EPCR)), NOX4 (NADPH oxidase 4), HAMP (hepcidin antimicrobial peptide), PTPN11 (protein tyrosine phosphatase, non-receptor type 11), SLC2A1 (solute carrier family 2 (facilitated glucose transporter), member 1), IL2RA (interleukin 2 receptor, alpha), CCL5 (chemokine (C-C motif) ligand 5), IRF1 (interferon regulatory factor 1), CFLAR (CASP8 and FADD-like apoptosis regulator), CALCA (calcitonin-related polypeptide alpha), EIF4E (eukaryotic translation initiation factor 4E), GSTP1 (glutathione S-transferase pi 1), JAK2 (Janus kinase 2), CYP3A5 (cytochrome P450, family 3, subfamily A, polypeptide 5), HSPG2 (heparan sulfate proteoglycan 2), CCL3 (chemokine (C-C motif) ligand 3), MYD88 (myeloid differentiation primary response gene (88)), VIP (vasoactive intestinal peptide), SOAT1 (sterol O-acyltransferase 1), ADRBK1 (adrenergic, beta, receptor kinase 1), NR4A2 (nuclear receptor subfamily 4, group A, member 2), MMP8 (matrix metalloproteinase 8 (neutrophil collagenase)), NPR2 (natriuretic peptide receptor B/guanylate cyclase B (atrial natriuretic peptide receptor B)), GCH1 (GTP cyclohydrolase 1), EPRS (glutamyl-prolyl-tRNA synthetase), PPARGC1A (peroxisome proliferator-activated receptor gamma, coactivator 1 alpha), F12 (coagulation factor XII (Hageman factor)), PECAM1 (platelet/endothelial cell adhesion molecule), CCL4 (chemokine (C-C motif) ligand 4), SERPINA3 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3), CASR (calcium-sensing receptor), GJA5 (gap junction protein, alpha 5, 40 kDa), FABP2 (fatty acid binding protein 2, intestinal), TTF2 (transcription termination factor, RNA polymerase II), PROS1 (protein S (alpha)), CTF1 (cardiotrophin 1), SGCB (sarcoglycan, beta (43 kDa dystrophin-associated glycoprotein)), YME1L1 (YME1-like 1 (Saccharomyces cerevisiae)), CAMP (cathelicidin 
antimicrobial peptide), ZC3H12A (zinc finger CCCH-type containing 12A), AKR1B1 (aldo-keto reductase family 1, member B1 (aldose reductase)), DES (desmin), MMP7 (matrix metalloproteinase 7 (matrilysin, uterine)), AHR (aryl hydrocarbon receptor), CSF1 (colony stimulating factor 1 (macrophage)), HDAC9 (histone deacetylase 9), CTGF (connective tissue growth factor), KCNMA1 (potassium large conductance calcium-activated channel, subfamily M, alpha member 1), UGT1A (UDP glucuronosyltransferase 1 family, polypeptide A complex locus), PRKCA (protein kinase C, alpha), COMT (catechol-O-methyltransferase), S100B (S100 calcium-binding protein B), EGR1 (early growth response 1), PRL (prolactin), IL15 (interleukin 15), DRD4 (dopamine receptor D4), CAMK2G (calcium/calmodulin-dependent protein kinase II gamma), SLC22A2 (solute carrier family 22 (organic cation transporter), member 2), CCL11 (chemokine (C-C motif) ligand 11), PGF (placental growth factor), THPO (thrombopoietin), GP6 (glycoprotein VI (platelet)), TACR1 (tachykinin receptor 1), NTS (neurotensin), HNF1A (HNF1 homeobox A), SST (somatostatin), KCND1 (potassium voltage-gated channel, Shal-related subfamily, member 1), LOC646627 (phospholipase inhibitor), TBXAS1 (thromboxane A synthase 1 (platelet)), CYP2J2 (cytochrome P450, family 2, subfamily J, polypeptide 2), TBXA2R (thromboxane A2 receptor), ADH1C (alcohol dehydrogenase 1C (class I), gamma polypeptide), ALOX12 (arachidonate 12-lipoxygenase), AHSG (alpha-2-HS-glycoprotein), BHMT (betaine-homocysteine methyltransferase), GJA4 (gap junction protein, alpha 4, 37 kDa), SLC25A4 (solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 4), ACLY (ATP citrate lyase), ALOX5AP (arachidonate 5-lipoxygenase-activating protein), NUMA1 (nuclear mitotic apparatus protein 1), CYP27B1 (cytochrome P450, family 27, subfamily B, polypeptide 1), CYSLTR2 (cysteinyl leukotriene receptor 2), SOD3 (superoxide dismutase 3, extracellular), LTC4S (leukotriene 
C4 synthase), UCN (urocortin), GHRL (ghrelin/obestatin prepropeptide), APOC2 (apolipoprotein C-II), CLEC4A (C-type lectin domain family 4, member A), KBTBD10 (kelch repeat and BTB (POZ) domain containing 10), TNC (tenascin C), TYMS (thymidylate synthetase), SHC1 (SHC (Src homology 2 domain containing) transforming protein 1), LRP1 (low-density lipoprotein receptor-related protein 1), SOCS3 (suppressor of cytokine signaling 3), ADH1B (alcohol dehydrogenase 1B (class I), beta polypeptide), KLK3 (kallikrein-related peptidase 3), HSD11B1 (hydroxysteroid (11-beta) dehydrogenase 1), VKORC1 (vitamin K epoxide reductase complex, subunit 1), SERPINB2 (serpin peptidase inhibitor, clade B (ovalbumin), member 2), TNS1 (tensin 1), RNF19A (ring finger protein 19A), EPOR (erythropoietin receptor), ITGAM (integrin, alpha M (complement component 3 receptor 3 subunit)), PITX2 (paired-like homeodomain 2), MAPK7 (mitogen-activated protein kinase 7), FCGR3A (Fc fragment of IgG, low affinity IIIa, receptor (CD16a)), LEPR (leptin receptor), ENG (endoglin), GPX1 (glutathione peroxidase 1), GOT2 (glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2)), HRH1 (histamine receptor H1), NR1I2 (nuclear receptor subfamily 1, group I, member 2), CRH (corticotropin-releasing hormone), HTR1A (5-hydroxytryptamine (serotonin) receptor 1A), VDAC1 (voltage-dependent anion channel 1), HPSE (heparanase), SFTPD (surfactant protein D), TAP2 (transporter 2, ATP-binding cassette, subfamily B (MDR/TAP)), RNF123 (ring finger protein 123), PTK2B (PTK2B protein tyrosine kinase 2 beta), NTRK2 (neurotrophic tyrosine kinase, receptor, type 2), IL6R (interleukin 6 receptor), ACHE (acetylcholinesterase (Yt blood group)), GLP1R (glucagon-like peptide 1 receptor), GHR (growth hormone receptor), GSR (glutathione reductase), NQO1 (NAD(P)H dehydrogenase, quinone 1), NR5A1 (nuclear receptor subfamily 5, group A, member 1), GJB2 (gap junction protein, beta 2, 26 kDa), SLC9A1 (solute carrier family 9 
(sodium/hydrogen exchanger), member 1), MAOA (monoamine oxidase A), PCSK9 (proprotein convertase subtilisin/kexin type 9), FCGR2A (Fc fragment of IgG, low affinity IIa, receptor (CD32)), SERPINF1 (serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium-derived factor), member 1), EDN3 (endothelin 3), DHFR (dihydrofolate reductase), GAS6 (growth arrest-specific 6), SMPD1 (sphingomyelin phosphodiesterase 1, acid lysosomal), UCP2 (uncoupling protein 2 (mitochondrial, proton carrier)), TFAP2A (transcription factor AP-2 alpha (activating enhancer binding protein 2 alpha)), C4BPA (complement component 4 binding protein, alpha), SERPINF2 (serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium-derived factor), member 2), TYMP (thymidine phosphorylase), ALPP (alkaline phosphatase, placental (Regan isozyme)), CXCR2 (chemokine (C-X-C motif) receptor 2), SLC39A3 (solute carrier family 39 (zinc transporter), member 3), ABCG2 (ATP-binding cassette, subfamily G (WHITE), member 2), ADA (adenosine deaminase), JAK3 (Janus kinase 3), HSPA1A (heat shock 70 kDa protein 1A), FASN (fatty acid synthase), FGF1 (fibroblast growth factor 1 (acidic)), F11 (coagulation factor XI), ATP7A (ATPase, Cu++ transporting, alpha polypeptide), CR1 (complement component (3b/4b) receptor 1 (Knops blood group)), GFAP (glial fibrillary acidic protein), ROCK1 (Rho-associated, coiled-coil containing protein kinase 1), MECP2 (methyl CpG binding protein 2 (Rett syndrome)), MYLK (myosin light chain kinase), BCHE (butyrylcholinesterase), LIPE (lipase, hormone-sensitive), PRDX5 (peroxiredoxin 5), ADORA1 (adenosine A1 receptor), WRN (Werner syndrome, RecQ helicase-like), CXCR3 (chemokine (C-X-C motif) receptor 3), CD81 (CD81 molecule), SMAD7 (SMAD family member 7), LAMC2 (laminin, gamma 2), MAP3K5 (mitogen-activated protein kinase kinase kinase 5), CHGA (chromogranin A (parathyroid secretory protein 1)), IAPP (islet amyloid polypeptide), RHO 
(rhodopsin), ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1), PTHLH (parathyroid hormone-like hormone), NRG1 (neuregulin 1), VEGFC (vascular endothelial growth factor C), ENPEP (glutamyl aminopeptidase (aminopeptidase A)), CEBPB (CCAAT/enhancer binding protein (C/EBP), beta), NAGLU (N-acetylglucosaminidase, alpha), F2RL3 (coagulation factor II (thrombin) receptor-like 3), CX3CL1 (chemokine (C-X3-C motif) ligand 1), BDKRB1 (bradykinin receptor B1), ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif, 13), ELANE (elastase, neutrophil expressed), ENPP2 (ectonucleotide pyrophosphatase/phosphodiesterase 2), CISH (cytokine-inducible SH2-containing protein), GAST (gastrin), MYOC (myocilin, trabecular meshwork inducible glucocorticoid response), ATP1A2 (ATPase, Na+/K+ transporting, alpha 2 polypeptide), NF1 (neurofibromin 1), GJB1 (gap junction protein, beta 1, 32 kDa), MEF2A (myocyte enhancer factor 2A), VCL (vinculin), BMPR2 (bone morphogenetic protein receptor, type II (serine/threonine kinase)), TUBB (tubulin, beta), CDC42 (cell division cycle 42 (GTP-binding protein, 25 kDa)), KRT18 (keratin 18), HSF1 (heat shock transcription factor 1), MYB (v-myb myeloblastosis viral oncogene homolog (avian)), PRKAA2 (protein kinase, AMP-activated, alpha 2 catalytic subunit), ROCK2 (Rho-associated, coiled-coil containing protein kinase 2), TFPI (tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)), PRKG1 (protein kinase, cGMP-dependent, type I), BMP2 (bone morphogenetic protein 2), CTNND1 (catenin (cadherin-associated protein), delta 1), CTH (cystathionase (cystathionine gamma-lyase)), CTSS (cathepsin S), VAV2 (vav 2 guanine nucleotide exchange factor), NPY2R (neuropeptide Y receptor Y2), IGFBP2 (insulin-like growth factor binding protein 2, 36 kDa), CD28 (CD28 molecule), GSTA1 (glutathione S-transferase alpha 1), PPIA (peptidylprolyl isomerase A (cyclophilin A)), APOH (apolipoprotein H (beta-2-glycoprotein I)), S100A8 (S100 
calcium-binding protein A8), IL11 (interleukin 11), ALOX15 (arachidonate 15-lipoxygenase), FBLN1 (fibulin 1), NR1H3 (nuclear receptor subfamily 1, group H, member 3), SCD (stearoyl-CoA desaturase (delta-9-desaturase)), GIP (gastric inhibitory polypeptide), CHGB (chromogranin B (secretogranin 1)), PRKCB (protein kinase C, beta), SRD5A1 (steroid-5-alpha-reductase, alpha polypeptide 1 (3-oxo-5 alpha-steroid delta 4-dehydrogenase alpha 1)), HSD11B2 (hydroxysteroid (11-beta) dehydrogenase 2), CALCRL (calcitonin receptor-like), GALNT2 (UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2)), ANGPTL4 (angiopoietin-like 4), KCNN4 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4), PIK3C2A (phosphoinositide-3-kinase, class 2, alpha polypeptide), HBEGF (heparin-binding EGF-like growth factor), CYP7A1 (cytochrome P450, family 7, subfamily A, polypeptide 1), HLA-DRB5 (major histocompatibility complex, class II, DR beta 5), BNIP3 (BCL2/adenovirus E1B 19 kDa interacting protein 3), GCKR (glucokinase (hexokinase 4) regulator), S100A12 (S100 calcium-binding protein A12), PADI4 (peptidyl arginine deiminase, type IV), HSPA14 (heat shock 70 kDa protein 14), CXCR1 (chemokine (C-X-C motif) receptor 1), H19 (H19, imprinted maternally expressed transcript (non-protein coding)), KRTAP19-3 (keratin-associated protein 19-3), insulin, RAC2 (ras-related C3 botulinum toxin substrate 2 (rho family, small GTP-binding protein Rac2)), RYR1 (ryanodine receptor 1 (skeletal)), CLOCK (clock homolog (mouse)), NGFR (nerve growth factor receptor (TNFR superfamily, member 16)), DBH (dopamine beta-hydroxylase (dopamine beta-monooxygenase)), CHRNA4 (cholinergic receptor, nicotinic, alpha 4), CACNA1C (calcium channel, voltage-dependent, L type, alpha 1C subunit), PRKAG2 (protein kinase, AMP-activated, gamma 2 non-catalytic subunit), CHAT (choline acetyltransferase), PTGDS (prostaglandin D2 synthase 21 kDa (brain)), NR1H2 (nuclear receptor subfamily 
1, group H, member 2), TEK (TEK tyrosine kinase, endothelial), VEGFB (vascular endothelial growth factor B), MEF2C (myocyte enhancer factor 2C), MAPKAPK2 (mitogen-activated protein kinase-activated protein kinase 2), TNFRSF11A (tumor necrosis factor receptor superfamily, member 11a, NFKB activator), HSPA9 (heat shock 70 kDa protein 9 (mortalin)), CYSLTR1 (cysteinyl leukotriene receptor 1), MAT1A (methionine adenosyltransferase I, alpha), OPRL1 (opiate receptor-like 1), IMPA1 (inositol(myo)-1(or 4)-monophosphatase 1), CLCN2 (chloride channel 2), DLD (dihydrolipoamide dehydrogenase), PSMA6 (proteasome (prosome, macropain) subunit, alpha type, 6), PSMB8 (proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7)), CHI3L1 (chitinase 3-like 1 (cartilage glycoprotein-39)), ALDH1B1 (aldehyde dehydrogenase 1 family, member B1), PARP2 (poly (ADP-ribose) polymerase 2), STAR (steroidogenic acute regulatory protein), LBP (lipopolysaccharide binding protein), ABCC6 (ATP-binding cassette, subfamily C (CFTR/MRP), member 6), RGS2 (regulator of G protein signaling 2, 24 kDa), EFNB2 (ephrin-B2), cystic fibrosis transmembrane conductance regulator (CFTR), GJB6 (gap junction protein, beta 6, 30 kDa), APOA2 (apolipoprotein A-II), AMPD1 (adenosine monophosphate deaminase 1), DYSF (dysferlin, limb-girdle muscular dystrophy 2B (autosomal recessive)), FDFT1 (farnesyl-diphosphate farnesyltransferase 1), EDN2 (endothelin 2), CCR6 (chemokine (C-C motif) receptor 6), GJB3 (gap junction protein, beta 3, 31 kDa), IL1RL1 (interleukin 1 receptor-like 1), ENTPD1 (ectonucleoside triphosphate diphosphohydrolase 1), BBS4 (Bardet-Biedl syndrome 4), CELSR2 (cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila)), F11R (F11 receptor), RAPGEF3 (Rap guanine nucleotide exchange factor (GEF) 3), HYAL1 (hyaluronoglucosaminidase 1), ZNF259 (zinc finger protein 259), ATOX1 (ATX1 antioxidant protein 1 homolog (yeast)), ATF6 (activating 
transcription factor 6), KHK (ketohexokinase (fructokinase)), SAT1 (spermidine/spermine N1-acetyltransferase 1), GGH (γ-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase)), TIMP4 (TIMP metallopeptidase inhibitor 4), SLC4A4 (solute carrier family 4, sodium bicarbonate co-transporter, member 4), PDE2A (phosphodiesterase 2A, cGMP-stimulated), PDE3B (phosphodiesterase 3B, cGMP-inhibited), FADS1 (fatty acid desaturase 1), FADS2 (fatty acid desaturase 2), TMSB4X (thymosin beta 4, X-linked), TXNIP (thioredoxin interacting protein), LIMS1 (LIM and senescent cell antigen-like domains 1), RHOB (ras homolog gene family, member B), LY96 (lymphocyte antigen 96), FOXO1 (forkhead box O1), PNPLA2 (patatin-like phospholipase domain-containing 2), TRH (thyrotropin-releasing hormone), GJC1 (gap junction protein, gamma 1, 45 kDa), SLC17A5 (solute carrier family 17 (anion/sugar transporter), member 5), FTO (fat mass and obesity associated), GJD2 (gap junction protein, delta 2, 36 kDa), PSRC1 (proline/serine-rich coiled coil 1), CASP12 (caspase 12 (gene/pseudogene)), GPBAR1 (G protein-coupled bile acid receptor 1), PXK (PX domain containing serine/threonine kinase), IL33 (interleukin 33), TRIB1 (tribbles homolog 1 (Drosophila)), PBX4 (pre-B-cell leukemia homeobox 4), NUPR1 (nuclear protein, transcriptional regulator, 1), SEP15 (15 kDa selenoprotein), CILP2 (cartilage intermediate layer protein 2), TERC (telomerase RNA component), GGT2 (gamma-glutamyl transferase 2), MT-CO1 (mitochondrially encoded cytochrome c oxidase I), UOX (urate oxidase, pseudogene), a CRISPR/Cas effector polypeptide, an enzymatically active CRISPR/Cas effector polypeptide (e.g., one capable of cleaving a target nucleic acid), and an enzymatically inactive CRISPR/Cas effector polypeptide (e.g., one that does not cleave a target nucleic acid but retains binding to it).
In some cases, the donor DNA encodes a wild-type form of any of the foregoing polypeptides; that is, the donor DNA may encode a "normal" form that does not include mutations that result in reduced function, lack of function, or pathogenicity.

In some cases, the donor DNA comprises a nucleotide sequence encoding a fluorescent polypeptide. Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, phycobiliproteins and phycobiliprotein conjugates (including B-phycoerythrin, R-phycoerythrin and allophycocyanin). Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species may be encoded, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973.

In some cases, the donor DNA encodes an RNA, such as an siRNA, a microRNA, a short hairpin RNA (shRNA), an antisense RNA, a riboswitch, a ribozyme, an aptamer, a ribosomal RNA, a transfer RNA, and the like.

In addition to nucleotide sequences encoding one or more gene products (e.g., RNA and/or polypeptides), the donor DNA may also include one or more transcriptional control elements, such as promoters, enhancers, and the like. In some cases, the transcriptional control element is inducible. In some cases, the promoter is reversible. In some cases, the transcriptional control element is constitutive. In some cases, the promoter is functional in eukaryotic cells. In some cases, the promoter is a cell type-specific promoter. In some cases, the promoter is a tissue-specific promoter.

The nucleotide sequence of the donor DNA is typically not identical to the target nucleic acid (e.g., genomic sequence) that it replaces. Rather, the donor DNA may contain one or more single-base changes, insertions, deletions, inversions, or rearrangements relative to the target nucleic acid (e.g., genomic sequence), so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, such as converting a disease-causing base pair to a non-disease-causing base pair). In some cases, the donor DNA comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. The donor DNA may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest (the target nucleic acid) and that are not intended for insertion into it. Generally, the homologous region(s) of the donor sequence will have at least 50% sequence identity to the target nucleic acid (e.g., genomic sequence) with which recombination is desired. In some cases, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity may be present, depending on the length of the donor polynucleotide.
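The identity thresholds above can be illustrated with a minimal Python sketch. It assumes the homology region and the target have already been aligned to equal length with no gaps; the function names and the default 50% cutoff parameter are illustrative and are not part of the disclosure. A real comparison would normally start from a proper pairwise alignment.

```python
def percent_identity(homology_region: str, target: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (hypothetical helper)."""
    if not homology_region or len(homology_region) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the two sequences carry the same base.
    matches = sum(a == b for a, b in zip(homology_region.upper(), target.upper()))
    return 100.0 * matches / len(homology_region)


def meets_identity_threshold(homology_region: str, target: str,
                             threshold: float = 50.0) -> bool:
    """True if identity is at least `threshold` percent (50% is the minimum cited above)."""
    return percent_identity(homology_region, target) >= threshold
```

For instance, two 10-nucleotide sequences that differ at one position give 90% identity, which would satisfy the 50%, 60%, 70%, 80%, and 90% levels listed above but not the 95% level.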

The donor DNA may comprise certain nucleotide sequence differences as compared to the target nucleic acid (e.g., genomic sequence), where such differences include, for example, restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), and the like, which can be used to assess successful insertion of the donor DNA at the cleavage site or, in some cases, for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes that do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences, such as FLP, loxP sequences, or the like, which can be activated at a later time to remove the marker sequence. In some cases, the donor DNA will include one or more nucleotide sequences that help localize the donor to the nucleus of the recipient cell or help integrate the donor DNA into the target nucleic acid. For example, in some cases, the donor DNA may comprise one or more nucleotide sequences encoding one or more nuclear localization signals, and the like (Freitas et al. (2009) Curr. Genomics 10:550-7). In some cases, the donor DNA will include nucleotide sequences that recruit DNA repair enzymes to increase insertion efficiency. Human enzymes involved in homology-directed repair include MRN-CtIP, BLM-DNA2, Exo1, ERCC1, Rad51, Rad52, Ligase 1, PolQ, PARP1, Ligase 3, BRCA2, RecQ/BLM-TopoIIIα, RTEL, Polδ, and Polη (Verma and Greenberg (2016) Genes Dev. 30(10):1138-1154). In some cases, the donor DNA is delivered as reconstituted chromatin (Cruz-Becerra and Kadonaga (2020) eLife 9:e55780, DOI: 10.7554/eLife.55780).
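As a hedged illustration of a coding-region difference that leaves the amino acid sequence unchanged, the following Python sketch translates a reference and an edited coding sequence with the standard genetic code and compares the results. The function names are hypothetical, the inputs are assumed to be in-frame DNA of equal length, and the check covers only synonymous changes; it says nothing about conservative substitutions that might also leave protein function intact.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(reference: str, edited: str) -> bool:
    """True if the edited sequence encodes the same amino acids as the reference."""
    return len(reference) == len(edited) and translate(reference) == translate(edited)
```

Such a check could be used, for example, when a restriction site or other marker difference is designed into a donor within a coding region.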

In some cases, the ends of the donor DNA are protected (e.g., from exonucleolytic degradation) by any convenient method, and such methods are known to those skilled in the art. For example, one or more dideoxynucleotide residues can be added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides can be ligated to one or both ends. See, e.g., Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino groups and the use of modified internucleotide linkages, such as phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor DNA, additional lengths of sequence may be included outside the regions of homology; these can be degraded without affecting recombination.

Linkers

In some embodiments, the polypeptides of the retron-based gene editing system are coupled to one another, or to one or more accessory functions, by linkers. Such accessory functions may include deaminases, nucleases, reverse transcriptases, and recombinases. For example, a retron RT can be linked to a programmable nuclease, such as a type II or type V CRISPR nuclease. The term "linker" as used in reference to a fusion protein refers to a molecule that joins the proteins to form the fusion protein. Generally, such molecules have no specific biological activity other than to join the proteins or to preserve some minimum distance or other spatial relationship between them. However, in one embodiment, the linker may be selected to influence some property of the linker and/or the fusion protein, such as the folding, net charge, or hydrophobicity of the linker.

Suitable linkers for use in the methods of the present invention are well known to those skilled in the art and include, but are not limited to, straight or branched-chain carbon linkers, heterocyclic carbon linkers, and peptide linkers. Preferred peptide linker sequences adopt a flexible extended conformation and do not exhibit a propensity for developing an ordered secondary structure. In one embodiment, the linker can be a chemical moiety that can be monomeric, dimeric, multimeric, or polymeric. Preferably, the linker comprises amino acids. Typical amino acids in flexible linkers include Gly, Asn, and Ser.

Thus, in particular embodiments, the linker comprises a combination of one or more of the amino acids Gly, Asn, and Ser. Other near-neutral amino acids, such as Thr and Ala, may also be used in the linker sequence. Exemplary linkers are disclosed in Maratea et al. (1985) Gene 40:39-46; Murphy et al. (1986) Proc. Nat'l. Acad. Sci. USA 83:8258-62; U.S. Patent No. 4,935,233; and U.S. Patent No. 4,751,180. For example, a GlySer linker may be based on repeat units of GGS, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or even 12 or more repeat units, including but not limited to the following (each entry is described as a GlySer linker based on the GGS repeat unit):

SEQ ID NO    Sequence
-            GGS
19469        GGS GGS
19470        GGS GGS GGS
19471        GGS GGS GGS GGS
19472        GGS GGS GGS GGS GGS
19473        GGS GGS GGS GGS GGS GGS
19474        GGS GGS GGS GGS GGS GGS GGS
19475        GGS GGS GGS GGS GGS GGS GGS GGS
19476        GGS GGS GGS GGS GGS GGS GGS GGS GGS
19477        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS
19478        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS
19479        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS
19480        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS
19481        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS
19482        GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS GGS

In another example, a GlySer linker may be based on repeat units of GSG, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or even 12 or more repeat units, including but not limited to the following (each entry is described as a GlySer linker based on the GSG repeat unit):

SEQ ID NO    Sequence
-            GSG
19483        GSG GSG
19484        GSG GSG GSG
19485        GSG GSG GSG GSG
19486        GSG GSG GSG GSG GSG
19487        GSG GSG GSG GSG GSG GSG
19488        GSG GSG GSG GSG GSG GSG GSG GSG
19489        GSG GSG GSG GSG GSG GSG GSG GSG GSG
19490        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG
19491        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG
19492        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG
19493        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG
19494        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG
19495        GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG GSG

In another example, a GlySer linker may be based on repeat units of GGGS, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or even 12 or more repeat units, including but not limited to the following (each entry is described as a GlySer linker based on the GGGS repeat unit):

SEQ ID NO    Sequence
19496        GGGS
19497        GGGS GGGS
19498        GGGS GGGS GGGS
19499        GGGS GGGS GGGS GGGS
19500        GGGS GGGS GGGS GGGS GGGS
19501        GGGS GGGS GGGS GGGS GGGS GGGS
19502        GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19503        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19504        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19505        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19506        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19507        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19508        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19509        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS
19510        GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS GGGS

In another example, a GlySer linker may be based on repeat units of GGGGS, i.e., up to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or even 12 or more repeat units, including but not limited to the following (each entry is described as a GlySer linker based on the GGGGS repeat unit):

SEQ ID NO    Sequence
19511        GGGGS
19512        GGGGS GGGGS
19513        GGGGS GGGGS GGGGS
19514        GGGGS GGGGS GGGGS GGGGS
19515        GGGGS GGGGS GGGGS GGGGS GGGGS
19516        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19517        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19518        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19519        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19520        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19521        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19522        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19523        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19524        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
19525        GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS GGGGS
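The four GlySer families listed above all follow one pattern, a short repeat unit concatenated a chosen number of times. A minimal Python sketch of that pattern is shown below; the function name and the validation rules are assumptions made for illustration, and the sketch does not reproduce the SEQ ID NO assignments.

```python
# Repeat units named in the text above.
REPEAT_UNITS = ("GGS", "GSG", "GGGS", "GGGGS")


def glyser_linker(unit: str, repeats: int) -> str:
    """Return a GlySer linker made of `repeats` copies of `unit` (hypothetical helper)."""
    if unit not in REPEAT_UNITS:
        raise ValueError(f"unsupported repeat unit: {unit!r}")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # String repetition yields the contiguous amino acid sequence.
    return unit * repeats
```

With this helper, three GGGGS units give the 15-residue sequence GGGGSGGGGSGGGGS, and fifteen GGS units give a 45-residue linker, matching the longest GGS entry listed.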

In another embodiment, LEPGEKPYKCPECGKSFSQSGALTRHQRTHTR (SEQ ID NO: 19526) is used as the linker.

In yet another embodiment, the linker is an XTEN linker, which is TCGGGATCTGAGACGCCTGGGACCTCGGAATCGGCTACGCCCGAAAGT (SEQ ID NO: 19527). In other particular embodiments, the Cas12a polypeptide is linked at its C-terminus to the N-terminus of a deaminase protein or its catalytic domain by means of a LEPGEKPYKCPECGKSFSQSGALTRHQRTHTRLEPGEKPYKCPECGKSFSQSGALTRHQRTHTRLEPGEKPYKCPECGKSFSQSGALTRHQRTHTR (SEQ ID NO: 19528) linker. In addition, N-terminal and C-terminal NLSs can also function as linkers (e.g., PKKKRKVEASSPKKRKVEAS (SEQ ID NO: 19420)).

The above description of linkers is intended to be non-limiting and includes any combination of the above linkers, as well as heterologous combinations of the repeating GlySer linkers.

The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or is based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, a disulfide bond, a carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of an aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, β-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol (PEG) moiety. In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.

The linker can be, for example, a cleavable linker or a protease-sensitive linker. In some embodiments, the linker is selected from the group consisting of an F2A linker, a P2A linker, a T2A linker, an E2A linker, and combinations thereof. This family of self-cleaving peptide linkers, referred to as 2A peptides, has been described in the art (see, e.g., Kim, J. H. et al. (2011) PLoS ONE 6:e18556). In some embodiments, the linker is an F2A linker. In some embodiments, the linker is a GGGS linker. In some embodiments, the fusion protein contains three domains with intervening linkers, having the structure: domain-linker-domain-linker-domain.
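The domain-linker-domain-linker-domain layout can be sketched as simple sequence concatenation. The Python example below is illustrative only: the function name is hypothetical, and the short domain strings in the usage note are placeholders rather than real protein domains from the disclosure.

```python
from typing import Sequence


def assemble_fusion(domains: Sequence[str], linkers: Sequence[str]) -> str:
    """Interleave domain and linker sequences: domain-linker-domain-...-domain."""
    if not domains:
        raise ValueError("at least one domain is required")
    if len(linkers) != len(domains) - 1:
        raise ValueError("need exactly one linker between each pair of domains")
    parts = [domains[0]]
    # Each linker is followed by the next domain in order.
    for linker, domain in zip(linkers, domains[1:]):
        parts.extend([linker, domain])
    return "".join(parts)
```

Calling `assemble_fusion(["MAAA", "KKK", "DDD"], ["GGGS", "GGGS"])` with these placeholder domains produces a single sequence in which each pair of adjacent domains is separated by one GGGS linker.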

Cleavable linkers known in the art may be used in connection with the present disclosure. Exemplary such linkers include F2A linkers, T2A linkers, P2A linkers, and E2A linkers (see, e.g., WO2017127750). The skilled artisan will appreciate that other art-recognized linkers may be suitable for use in the constructs of the present disclosure (e.g., as encoded by the nucleic acids of the present disclosure). The skilled artisan will likewise appreciate that other polycistronic constructs (mRNAs that separately encode more than one nucleobase editing system component/polypeptide within the same molecule) may be suitable for use as provided herein.

Nuclear Localization Domains

In various embodiments, the gene editing systems or any of their components may be fused to one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In one embodiment, a gene editor component (e.g., a nucleic acid programmable DNA binding protein or an editing accessory protein) comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy terminus, or a combination of these (e.g., zero or at least one or more NLSs at the amino terminus and zero or one or more NLSs at the carboxy terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In one embodiment of the invention, the editor component polypeptide comprises at most 6 NLSs. In one embodiment, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 19399); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 19529)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 19530) or RQRRNELKRSP (SEQ ID NO: 19531); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 19532); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 19533) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 19400) and PPKKARED (SEQ ID NO: 19534) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 19535) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 19536) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 19537) and PKQKKRK (SEQ ID NO: 19538) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 19539) of the hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 19540) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 19541) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 19542) of the steroid hormone receptor (human) glucocorticoid.
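The "near the N- or C-terminus" criterion can be sketched as a distance check on the polypeptide sequence. The Python example below is a minimal illustration under stated assumptions: the function names are hypothetical, only the first exact occurrence of the NLS is considered, distance is counted as the number of residues lying between the NLS and the terminus, and the 50-residue default window is just one of the values listed above.

```python
from typing import Optional, Tuple


def nls_terminal_distances(polypeptide: str, nls: str) -> Optional[Tuple[int, int]]:
    """Residue counts between the first NLS occurrence and the N- and C-termini.

    Returns None when the NLS is absent from the polypeptide.
    """
    start = polypeptide.find(nls)
    if start == -1:
        return None
    end = start + len(nls)
    # Residues before the NLS, and residues after it.
    return start, len(polypeptide) - end


def is_near_terminus(polypeptide: str, nls: str, window: int = 50) -> bool:
    """True if the NLS lies within `window` residues of either terminus."""
    distances = nls_terminal_distances(polypeptide, nls)
    return distances is not None and min(distances) <= window
```

For example, the SV40 NLS PKKKRKV placed one residue after an initiating methionine would count as near the N-terminus for any window of 1 or more, whereas the same NLS flanked by 60 residues on each side would not meet a 50-residue window.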

In general, the one or more NLSs are of sufficient strength to drive accumulation of the polypeptide of interest (e.g., a retron RT) in a detectable amount in the nucleus of a eukaryotic cell. The strength of nuclear localization activity may derive from the number of NLSs in the polypeptide of interest (e.g., the retron RT), the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique.

For example, a detectable marker may be fused to the polypeptide of interest (e.g., a retron RT) so that its location within a cell can be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus, such as DAPI). Cell nuclei may also be isolated from cells, and their contents may then be analyzed by any suitable method for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of complex formation (e.g., an assay for DNA cleavage or mutation at the target sequence, or an assay for altered gene expression activity affected by complex formation and/or polypeptide activity), as compared to a control not exposed to the polypeptide or complex, or exposed to a polypeptide lacking the one or more NLSs. In one embodiment of the polypeptide complexes and systems described herein, the codon-optimized polypeptide comprises an NLS attached to its C-terminus. In one embodiment, other localization tags may be fused to the polypeptide, such as, without limitation, tags that localize the polypeptide to a particular site in a cell, such as an organelle, for example mitochondria, plastids, chloroplasts, vesicles, the Golgi apparatus, (nuclear or cellular) membranes, ribosomes, nucleoli, the ER, the cytoskeleton, vacuoles, centrosomes, nucleosomes, granules, centrioles, etc.

In one embodiment of the invention, at least one nuclear localization signal (NLS) is attached to the nucleic acid sequence encoding the polypeptide of interest (e.g., a retron RT). In preferred embodiments, at least one or more C-terminal or N-terminal NLSs are attached (and hence the nucleic acid molecule encoding the polypeptide may include a sequence encoding the NLS, so that the expressed product has the NLS attached). In a preferred embodiment, a C-terminal NLS is attached for optimal expression and nuclear targeting in eukaryotic cells, preferably human cells. The invention also encompasses methods for delivering multiple nucleic acid components, wherein each nucleic acid component is specific for a different target locus of interest, thereby modifying multiple target loci of interest. The nucleic acid component of the complex may comprise one or more protein-binding RNA aptamers. The one or more aptamers may be capable of binding a bacteriophage coat protein.

In other examples, a fusion protein comprising the polypeptide of interest (e.g., a retron RT) and another accessory protein (e.g., a nuclease) contains one or more nuclear localization signals selected from or derived from SV40, c-Myc, or NLP-1.

上述NLS實例為非限制性的。本文所考慮之蛋白質及/或融合可包含任何已知NLS序列,包括以下所述之彼等序列中的任一者:Cokol等人,「Finding nuclear localization signals,」 EMBO Rep., 2000, 1(5): 411-415及Freitas等人,「Mechanisms and Signals for the Nuclear Import of Proteins,」 Current Genomics, 2009, 10(8): 550-7,其中每一者均以引用之方式併入本文中。The above NLS examples are non-limiting. The proteins and/or fusions contemplated herein may comprise any known NLS sequence, including any of those described in Cokol et al., "Finding nuclear localization signals," EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., "Mechanisms and Signals for the Nuclear Import of Proteins," Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.

In various embodiments, any polypeptide component of the retron-based editing system may be engineered with one or more nuclear localization signals, which help promote translocation of the protein into the cell nucleus. The polypeptides of the retron-based editing system may comprise any known NLS sequence, including any of those described in Cokol et al., "Finding nuclear localization signals," EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., "Mechanisms and Signals for the Nuclear Import of Proteins," Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.

在各個實施例中,本文所揭示之多肽可包括一或多個、較佳地至少兩個核定位信號。NLS可為此項技術中之任何已知NLS序列。NLS亦可為任何未來發現之用於核定位之NLS。NLS亦可為任何天然存在之NLS,或任何非天然存在之NLS (例如,具有一或多種所需突變之NLS)。術語「核定位序列」或「NLS」係指促進蛋白質例如藉由核轉運輸入細胞核中之胺基酸序列。核定位序列係此項技術中已知的且對於熟練技術人員而言將為顯而易見的。例如,NLS序列描述於Plank等人之國際PCT申請案PCT/EP2000/011690中,該申請案在2000年11月23日申請,在2001年5月31日作為WO/2001/038547公開,其內容以引用之方式併入本文中。在一些實施例中,NLS包含胺基酸序列PKKKRKV (SEQ ID NO: 19399)。In various embodiments, the polypeptide disclosed herein may include one or more, preferably at least two, nuclear localization signals. The NLS may be any known NLS sequence in the art. The NLS may also be any NLS discovered in the future for nuclear localization. The NLS may also be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations). The term "nuclear localization sequence" or "NLS" refers to an amino acid sequence that promotes protein import into the cell nucleus, such as by nuclear transport. Nuclear localization sequences are known in the art and will be apparent to a skilled artisan. For example, NLS sequences are described in Plank et al., International PCT Application PCT/EP2000/011690, filed on November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 19399).

Another representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. Nuclear localization signals are predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprise a short sequence of four amino acids (Autieri and Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and are typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89: 7442-46; Moede et al., (1999) FEBS Lett. 461: 229-34, which are incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.

Most NLSs can be classified into three general groups: (i) monopartite NLSs, exemplified by the SV40 large T antigen NLS (PKKKRKV) (SEQ ID NO: 19399); (ii) bipartite motifs, consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL) (SEQ ID NO: 19419); and (iii) non-canonical sequences, such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991). Nuclear localization signals occur at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, at the C-terminus, and in the central region of proteins. Thus, the present disclosure provides polypeptides that may be modified with one or more NLSs at the C-terminus, at the N-terminus, and in internal regions of the polypeptide (including fusion proteins).
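The three-group classification above can be sketched as a simple sequence heuristic. This is only an illustration under stated assumptions: the monopartite pattern K-(K/R)-X-(K/R) and the bipartite rule (two adjacent basic residues, a spacer of 10 to 12 residues, then at least three basic residues among the next five) are commonly cited consensus forms that are assumed here rather than taken from the disclosure, and the function names are hypothetical. A real NLS predictor would need to be considerably more careful.

```python
import re

BASIC = frozenset("KR")  # lysine and arginine

# Assumed monopartite consensus K-(K/R)-X-(K/R); not stated in the text.
MONOPARTITE = re.compile(r"K[KR].[KR]")


def looks_bipartite(seq: str) -> bool:
    """Two adjacent basic residues, a 10-12 residue spacer, then a basic cluster."""
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for spacer in (10, 11, 12):
                first = i + 2 + spacer
                tail = seq[first:first + 5]
                if sum(res in BASIC for res in tail) >= 3:
                    return True
    return False


def classify_nls(seq: str) -> str:
    """Assign a candidate NLS to one of the three general groups (heuristic)."""
    seq = seq.upper()
    if looks_bipartite(seq):
        return "bipartite"
    if MONOPARTITE.search(seq):
        return "monopartite"
    return "non-canonical"


for name, seq in [
    ("SV40 large T antigen", "PKKKRKV"),
    ("nucleoplasmin", "KRPAATKKAGQAKKKK"),
    ("hnRNP A1 M9", "NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY"),
]:
    print(name, "->", classify_nls(seq))
```

The bipartite check runs first because a bipartite signal may also contain a short basic stretch that would satisfy the monopartite pattern on its own.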

本揭示案涵蓋用於修飾多肽以包括一或多各NLS之任何合適手段。在一態樣中,多肽(例如,可程式化核酸酶)可經工程改造以在其N末端或其C末端(或兩者)處表現經轉譯融合之NLS,亦即,形成多肽-NLS融合構築體。另外,NLS可包括在多肽與例如N末端、C末端或內部連接之NLS胺基酸序列之間以及在蛋白質之中心區域中編碼的各種胺基酸連接體或間隔區。The present disclosure encompasses any suitable means for modifying a polypeptide to include one or more NLSs. In one aspect, a polypeptide (e.g., a programmable nuclease) can be engineered to express a translationally fused NLS at its N-terminus or its C-terminus (or both), i.e., to form a polypeptide-NLS fusion construct. Additionally, the NLS can include various amino acid linkers or spacers encoded between the polypeptide and, for example, the N-terminus, C-terminus, or internally linked NLS amino acid sequences, as well as in the central region of the protein.
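As a rough illustration of a polypeptide-NLS fusion construct, the sketch below assembles an amino acid sequence with N-terminal and/or C-terminal NLSs joined through a linker. The function name, the `GGS` linker, and the placeholder polypeptide are hypothetical; the limit of six NLSs reflects only the one embodiment mentioned earlier and is exposed as a parameter.

```python
def build_nls_fusion(polypeptide, n_term_nls=(), c_term_nls=(),
                     linker="GGS", max_nls=6):
    """Return an amino acid sequence: [NLS-linker-]*polypeptide[-linker-NLS]*.

    `linker` is an illustrative spacer; pass "" for a direct fusion.
    """
    total = len(n_term_nls) + len(c_term_nls)
    if total > max_nls:
        raise ValueError(f"{total} NLSs requested, limit is {max_nls}")
    parts = [*n_term_nls, polypeptide, *c_term_nls]
    return linker.join(parts)


SV40_NLS = "PKKKRKV"  # SV40 large T antigen NLS quoted in the text

# Placeholder polypeptide with one NLS at each terminus.
fusion = build_nls_fusion("ACDEFGH", n_term_nls=[SV40_NLS], c_term_nls=[SV40_NLS])
print(fusion)
```

Because each NLS is passed independently, the same NLS can be supplied in several copies or combined with different NLSs, mirroring the independent selection described above.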

Accordingly, the present disclosure also provides nucleotide constructs, vectors, and host cells for expressing fusion proteins comprising a polypeptide and one or more NLSs.

Tag domains

In some embodiments, the retron-based editing system or a component thereof may comprise a polypeptide tag, such as an affinity tag (chitin binding protein (CBP), maltose binding protein (MBP), glutathione-S-transferase (GST), SBP tag, Strep tag, AviTag, calmodulin tag); a solubilization tag; a chromatography tag (a polyanionic amino acid tag, such as the FLAG tag); an epitope tag (a short peptide sequence that binds a high-affinity antibody, such as the V5 tag, Myc tag, VSV tag, Xpress tag, E tag, S tag, and HA tag); or a fluorescent tag (e.g., GFP). In some embodiments, a polypeptide of the retron-based editing system may comprise an amino acid tag, such as one or more lysines, histidines, or glutamates, which can be added to the polypeptide sequence (e.g., at the N-terminus or C-terminus). Lysines can be used to increase peptide solubility or to allow biotinylation. Protein and amino acid tags are peptide sequences genetically grafted onto a recombinant protein. Sequence tags are attached to proteins for various purposes, such as peptide purification, identification, or localization, for use in various applications including, for example, affinity purification, protein arrays, Western blotting, immunofluorescence, and immunoprecipitation. Such tags can subsequently be removed by chemical agents or by enzymatic means, such as by specific proteolysis or intein splicing.

Alternatively, amino acid residues located at the carboxyl and amino terminal regions of the amino acid sequence of a peptide or protein may optionally be deleted, providing a truncated sequence. Certain amino acids (e.g., C-terminal or N-terminal residues) may alternatively be deleted depending on the use of the sequence, for example, expression of the sequence as part of a larger sequence that is soluble, or linkage to a solid support.

Aptamers

In particular embodiments, a nucleic acid component of the retron-based editing system (e.g., a guide RNA or a retron ncRNA) may further comprise functional structures designed to improve the molecular structure, architecture, stability, or genetic expression of the nucleic acid component, or any combination thereof. Such structures may include an aptamer.

Aptamers are biomolecules that can be designed or selected to bind tightly to other ligands, for example using a technique called systematic evolution of ligands by exponential enrichment (SELEX; Tuerk C, Gold L: "Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase." Science 1990, 249: 505-510). For example, nucleic acid aptamers can be selected from pools of random-sequence oligonucleotides with high binding affinity and specificity for a wide range of biomedically relevant targets, suggesting a wide range of therapeutic utilities for aptamers (Keefe, Anthony D., Supriya Pai and Andrew Ellington. "Aptamers as therapeutics." Nature Reviews Drug Discovery 9.7 (2010): 537-550). These characteristics also suggest a wide range of uses for aptamers as drug delivery vehicles (Levy-Nissenbaum, Etgar et al. "Nanotechnology and aptamers: applications in drug delivery." Trends in Biotechnology 26.8 (2008): 442-449; and Hicke BJ, Stephens AW. "Escort aptamers: a delivery service for diagnosis and therapy." J Clin Invest 2000, 106: 923-928). Aptamers may also be constructed to function as molecular switches that respond to a cue by changing properties, such as RNA aptamers that bind fluorophores to mimic the activity of green fluorescent protein (Paige, Jeremy S., Karen Y. Wu and Samie R. Jaffrey. "RNA mimics of green fluorescent protein." Science 333.6042 (2011): 642-646). It has also been suggested that aptamers may be used as components of targeted siRNA therapeutic delivery systems, for example by targeting cell surface proteins (Zhou, Jiehua and John J. Rossi. "Aptamer-targeted cell-specific RNA interference." Silence 1.1 (2010): 4).

Accordingly, in particular embodiments, a nucleic acid component of the retron-based gene editing system is modified, for example, by one or more aptamers designed to improve delivery of the RNA or DNA component molecule, including delivery across the cell membrane, to intracellular compartments, or into the nucleus. Such a structure can include, either in addition to the one or more aptamers or without such aptamers, one or more moieties that render the nucleic acid component molecule deliverable, inducible, or responsive to a selected effector. The invention accordingly encompasses reRNA component molecules that respond to normal or pathological physiological conditions, including without limitation pH, hypoxia, oxygen concentration, temperature, protein concentration, enzyme concentration, lipid structure, light exposure, mechanical disruption (e.g., ultrasound), magnetic fields, electric fields, or electromagnetic radiation.

Agents that modulate DNA repair

In certain embodiments, the retron-based gene editing system described herein further comprises or encodes a DNA repair modulating biomolecule, which can further enhance the integration efficiency of the repair outcome.

在某些實施例中,該DNA修復調節生物分子包含非同源末端接合(NHEJ)抑制劑。In certain embodiments, the DNA repair modulating biomolecule comprises a non-homologous end joining (NHEJ) inhibitor.

在某些實施例中,該DNA修復調節生物分子包含同源定向修復(HDR)啟動子。In certain embodiments, the DNA repair regulatory biomolecule comprises a homology-directed repair (HDR) promoter.

在某些實施例中,該DNA修復調節生物分子包含NHEJ抑制劑及HDR啟動子。In certain embodiments, the DNA repair modulating biomolecule comprises an NHEJ inhibitor and an HDR promoter.

在某些實施例中,與無DNA修復調節生物分子之其他方面一致的實施例相比,DNA修復調節生物分子增強或改良更精確之基因體編輯及/或同源重組效率。In certain embodiments, a DNA repair modulating biomolecule enhances or improves more precise genome editing and/or homologous recombination efficiency compared to an otherwise consistent embodiment without the DNA repair modulating biomolecule.

In some embodiments, the HDR promoter and/or NHEJ inhibitor may comprise one or more small molecules. Systems carrying recombination enhancers (such as small molecules that locally activate HDR and inhibit NHEJ at the genomic site of DNA damage) can be adapted to further enhance their efficiency when placed on an engineered system. In general, small molecule recombination enhancers can be synthesized to carry a linker and a functional group (such as a maleimide for reaction with the thiol group on a Cys residue of a protein) for chemical conjugation to the engineered system. Commercially available functionalized PEG linkers (alkynes, azides, cyclooctynes, etc.) can also be used for conjugation, and orthogonal conjugation chemistries can be used for multivalent display.

在修飾不影響所選重組增強子之效能的情況下,可容易地鑑定結合位點。In cases where the modification does not affect the potency of the selected recombinant enhancer, the binding site can be readily identified.

在某些實施例中,可實現一或多種DNA修復調節生物分子之多價呈現,包括NHEJ抑制劑、HDR啟動子或其組合之多種部分。參見例如「Genomic targeting of epigenetic probes using a chemically tailored Cas9 system」 Liszczak 等人, Proc Natl Acad Sci U.S.A.114: 681-686, 2017 (以引用之方式併入本文中)。在某些實施例中,小分子化合物之多價呈現可藉由用作其呈現支架之分選酶環蛋白來實現。 In certain embodiments, multivalent presentation of one or more DNA repair modulating biomolecules can be achieved, including multiple portions of NHEJ inhibitors, HDR promoters, or combinations thereof. See, e.g., "Genomic targeting of epigenetic probes using a chemically tailored Cas9 system" Liszczak et al. , Proc Natl Acad Sci USA 114: 681-686, 2017 (incorporated herein by reference). In certain embodiments, multivalent presentation of small molecule compounds can be achieved by using a sortase cycloprotein as a scaffold for presentation thereof.

In some embodiments, the DNA repair modulating biomolecule may comprise an HDR promoter. The HDR promoter may comprise a small molecule, such as RS-1 or an analog thereof. In certain embodiments, the HDR promoter stimulates RAD51 activity or RAD52 motif protein 1 (RDM1) activity. In certain embodiments, the HDR promoter comprises nocodazole, which can result in increased selection of the HDR pathway.

In certain embodiments, the HDR promoter may be administered prior to delivery of the retron-based editing system described herein.

In certain embodiments, the HDR promoter locally enhances HDR without inhibiting NHEJ. For example, RAD51 is a protein involved in strand exchange and in the search for homology regions during HDR repair. In certain embodiments, the HDR promoter is the phenylbenzamide RS-1, identified as a small molecule RAD51 stimulator (see [0200]-[0204] of WO2019/135816, specifically incorporated herein by reference).

在某些實施例中,該DNA修復調節生物分子包含C末端結合蛋白相互作用蛋白(CtIP)或其功能片段或同源物。CtIP為同源重組之早期步驟中之關鍵蛋白。根據此實施例,CtIP或其功能片段或同源物可連接( 例如,融合)至RT或序列特異性核酸酶( 例如,CRISPR/Cas效應酶、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE)),且藉由HDR刺激轉殖基因整合。 In certain embodiments, the DNA repair regulatory biomolecule comprises a C-terminal binding protein interacting protein (CtIP) or a functional fragment or homolog thereof. CtIP is a key protein in the early steps of homologous recombination. According to this embodiment, CtIP or a functional fragment or homolog thereof can be linked ( e.g. , fused) to RT or sequence-specific nucleases ( e.g. , CRISPR/Cas effectors, ZFNs, TALENs, meganucleases, TnpB, IscB, or restriction endonucleases (RE)), and transgene integration is stimulated by HDR.

在某些實施例中,CtIP片段為野生型CtIP之最小N末端片段,諸如包含全長CtIP之殘基1-296的N末端片段(HDR增強子之HE),如Charpentier 等人(Nature Comm., DOI: 10.1038/s41467-018-03475-7,以引用之方式併入本文中)中所述,顯示足以刺激HDR。該片段之活性取決於CDK磷酸化位點( 例如,S233、T245及S276)及同源重組中之CtIP活性所必需之多聚化結構域。因此,包含CDK磷酸化位點及CtIP活性所必需之多聚化結構域的替代片段亦在本發明之範圍內。 In certain embodiments, the CtIP fragment is a minimal N-terminal fragment of wild-type CtIP, such as an N-terminal fragment comprising residues 1-296 of full-length CtIP (HE of the HDR enhancer), as described in Charpentier et al . (Nature Comm., DOI: 10.1038/s41467-018-03475-7, incorporated herein by reference), which was shown to be sufficient to stimulate HDR. The activity of this fragment depends on the CDK phosphorylation sites ( e.g. , S233, T245 and S276) and the multimerization domain required for CtIP activity in homologous recombination. Therefore, alternative fragments comprising CDK phosphorylation sites and the multimerization domain required for CtIP activity are also within the scope of the present invention.

在某些實施例中,該DNA修復調節生物分子包含優勢陰性53BP1。In certain embodiments, the DNA repair modulating biomolecule comprises dominant negative 53BP1.

在某些實施例中,該DNA修復調節生物分子包含細胞週期特異性降解標籤,諸如(人類) Geminin及(鼠科動物) CyclinB2之降解結構域。In certain embodiments, the DNA repair regulatory biomolecule comprises a cell cycle specific degradation tag, such as the degradation domain of (human) Geminin and (murine) CyclinB2.

In certain embodiments, the DNA repair modulating biomolecule comprises CyclinB2, a member of the B-type cyclins that associate with p34cdc2 and an essential component of the cell cycle regulatory machinery. The efficiency of CRISPR-mediated knock-in can be increased by promoting a relative increase in Cas9 activity in the G2 phase of the cell cycle, when HDR is more active. In certain embodiments, the degradation domains of (human) Geminin and (murine) CyclinB2 can be used as N-terminal or C-terminal fusions to serve as the DNA repair modulating biomolecule. These domains are known to determine the cell cycle-specific profile of the chimeric proteins, i.e., their relative concentrations are increased in S and G2 compared to G1, by hijacking the conventional CyclinB2 and Geminin degradation pathways. This yields active Geminin-Cas9 and CyclinB2-Cas9 chimeric proteins that are degraded in a cell cycle-dependent manner. Compared to commonly used Cas9, such chimeras shift the repair of DSBs toward the HDR repair pathway.

雖然不希望受特定理論束縛,但據信此類細胞週期特異性降解標籤之應用允許/促進更有效/安全之基因編輯。While not wishing to be bound by a particular theory, it is believed that the application of such cell cycle-specific degradation tags allows/facilitates more efficient/safe gene editing.

In certain embodiments, the DNA repair modulating biomolecule comprises a Rad family member protein, such as Rad50, Rad51, or Rad52, which serves to promote integration of exogenous DNA into the host chromosome. In particular, Rad52 is an important homologous recombination protein, and its complex with Rad51 plays a key role in HDR, participating mainly in the regulation of exogenous DNA in eukaryotes. Key steps in the HR process include Rad51-mediated repair and strand exchange. Co-expression of Rad52 as the DNA repair modulating biomolecule significantly enhances the likelihood of HDR, for example by up to three-fold.

在某些實施例中,該DNA修復調節生物分子包含呈 例如N末端或C末端融合形式之RAD52蛋白。 In certain embodiments, the DNA repair modulating biomolecule comprises a RAD52 protein, for example, in the form of an N-terminal or C-terminal fusion.

In certain embodiments, the DNA repair modulating biomolecule comprises RAD52 motif protein 1 (RDM1), which functions similarly to RAD52. RDM1 has been shown to repair DSBs caused by DNA replication, to prevent G2 or M cell cycle arrest, and to improve HDR selection.

在某些實施例中,該DNA修復調節生物分子包含腫瘤抑制因子p53結合蛋白1 (53BP1)之優勢陰性形式。野生型蛋白53BP1係在NHEJ與HDR之間進行選擇之關鍵調節因子–它為一種促NHEJ因子,該因子藉由阻斷DNA末端切除且亦藉由抑制BRCA1募集至DSB位點來限制HDR。已顯示,泛素變異體對53BP1之整體抑制顯著地改良AAV中使用單股寡核苷酸遞送或雙股供體之非造血及造血細胞中的Cas9介導之HDR頻率。In certain embodiments, the DNA repair regulatory biomolecule comprises a dominant negative form of the tumor suppressor p53 binding protein 1 (53BP1). The wild-type protein 53BP1 is a key regulator in the choice between NHEJ and HDR - it is a pro-NHEJ factor that limits HDR by blocking DNA end resection and also by inhibiting BRCA1 recruitment to DSB sites. It has been shown that global inhibition of 53BP1 by ubiquitin variants significantly improves the frequency of Cas9-mediated HDR in non-hematopoietic and hematopoietic cells using single-stranded oligonucleotide delivery or double-stranded donors in AAV.

In certain embodiments, the dominant negative (DN) form of 53BP1 comprises the minimal focus-forming region but lacks the domains outside this region (e.g., toward the N-terminus and the tandem C-terminal BRCT repeats) that recruit key effectors involved in NHEJ, such as RIF1-PTIP and EXPAND, respectively. The 53BP1 adapter protein is recruited to specific histone marks at DSB sites through this minimal focus-forming region, which comprises several conserved domains, including an oligomerization domain (OD), a glycine-arginine rich (GAR) motif, a Tudor domain, and an adjacent ubiquitin-dependent recruitment (UDR) motif. The Tudor domain mediates the interaction with histone H4 dimethylated at K20.

在某些實施例中,53BP1之優勢陰性形式(DN1S)抑制內源性53BP1及下游NHEJ蛋白在DNA損傷位點處之積聚,同時上調BRCA1 HDR蛋白之募集。53BP1之此類DN形式可用作DNA修復調節生物分子,呈N末端或C端融合形式(諸如Cas9融合,以在由其gRNA定義之Cas9標靶位點處局部抑制NHEJ ,同時促進HDR之增加,且不會整體影響NHEJ,由此改良細胞活力)。In certain embodiments, the dominant negative form (DN1S) of 53BP1 inhibits the accumulation of endogenous 53BP1 and downstream NHEJ proteins at DNA damage sites, while upregulating the recruitment of BRCA1 HDR proteins. Such DN forms of 53BP1 can be used as DNA repair regulatory biomolecules in the form of N-terminal or C-terminal fusions (such as Cas9 fusions to locally inhibit NHEJ at the Cas9 target site defined by its gRNA, while promoting the increase of HDR, and will not affect NHEJ globally, thereby improving cell viability).

在某些實施例中,該DNA修復調節生物分子包含NHEJ抑制劑,諸如DNA連接酶IV抑制劑、KU抑制劑( 例如,KU70或KU80)、DNA-PKc抑制劑或artemis抑制劑。 In certain embodiments, the DNA repair modulating biomolecule comprises an NHEJ inhibitor, such as a DNA ligase IV inhibitor, a KU inhibitor ( e.g. , KU70 or KU80), a DNA-PKc inhibitor, or an artemis inhibitor.

在某些實施例中,NHEJ抑制劑抑制NHEJ路徑,增強HDR,或調節兩者。在某些實施例中,NHEJ抑制劑為小分子抑制劑。In certain embodiments, the NHEJ inhibitor inhibits the NHEJ pathway, enhances HDR, or modulates both. In certain embodiments, the NHEJ inhibitor is a small molecule inhibitor.

在某些實施例中,NHEJ路徑之小分子抑制劑包含SCR7類似物,例如PK66、PK76、PK409。In certain embodiments, the small molecule inhibitor of the NHEJ pathway comprises an SCR7 analog, such as PK66, PK76, PK409.

在某些實施例中,NHEJ抑制劑包含KU抑制劑,例如KU5788及KU0060648。In certain embodiments, the NHEJ inhibitor comprises a KU inhibitor, such as KU5788 and KU0060648.

在某些實施例中,小分子NHEJ抑制劑藉由用於分選酶介導之接合之PEG連接至聚甘胺酸三肽,如WO2019/135816,Guimaraes 等人, Nat Protoc8:1787-99, 2013;Theile 等人, Nat Protoc8:1800-7, 2013;及Schmohl 等人, Curr Opin Chem Biol22:122-8, 2014 (均以引用之方式併入本文中)所述。相同手段亦可用於將小分子HDR增強子連接至蛋白質。 In certain embodiments, small molecule NHEJ inhibitors are linked to polyglycine tripeptides via PEG for sortase-mediated conjugation, as described in WO2019/135816, Guimaraes et al. , Nat Protoc 8:1787-99, 2013; Theile et al. , Nat Protoc 8:1800-7, 2013; and Schmohl et al. , Curr Opin Chem Biol 22:122-8, 2014 (all incorporated herein by reference). The same approach can also be used to link small molecule HDR enhancers to proteins.

An exemplary method for conjugating a small molecule DNA repair modulating biomolecule without loss of activity is described in WO2019135816, in which conjugation of a polyglycine peptide to SCR-7 at the para-carboxyl moiety on ring 4 retained inhibitor activity, with rings 1, 2, and 3 of the molecule being involved in target engagement. This provides a simple and effective strategy for attaching small molecule NHEJ inhibitors to the systems described herein (e.g., to a sequence-specific nuclease, including a Cas enzyme, or to an RT) in order to precisely enhance the HDR pathway in the vicinity of the nucleic acid target site.

在某些實施例中,可利用基於DNA依賴性蛋白激酶(DNA-PK)或異二聚體Ku (KU70/KU80)之小分子抑制劑之核酸靶向部分結合物。KU-0060648為一種有效KU抑制劑,其亦可經聚甘胺酸官能化且用於增強重組。In certain embodiments, nucleic acid targeting moiety conjugates based on small molecule inhibitors of DNA-dependent protein kinase (DNA-PK) or heterodimeric Ku (KU70/KU80) can be utilized. KU-0060648 is a potent KU inhibitor that can also be functionalized with polyglycine and used to enhance recombination.

In certain embodiments, the DNA repair modulating biomolecule comprises the tumor suppressor p53. p53 plays a direct role in DNA repair, including the regulation of HR, where it affects the extension of new DNA and thereby the choice of HDR. In vivo, p53 binds to the nuclear matrix and is a rate-limiting factor in repairing DNA structure. p53 regulates DNA repair processes in almost all eukaryotes via transactivation-dependent and transactivation-independent pathways, but only the transactivation-independent function of p53 is involved in HR regulation. Wild-type p53 protein can rejoin double-strand breaks to form intact DNA, and it also plays a role in suppressing NHEJ. p53 interacts with HR-related proteins, including Rad51, and controls HR through its direct interaction with Rad51.

Accessory domains

In other aspects, the retron-based gene editing system may comprise one or more additional accessory proteins having a genome-modifying function, including a recombinase, invertase, nuclease, polymerase, ligase, deaminase, reverse transcriptase, or epigenetic modification function. In various embodiments, the accessory proteins may be provided separately. In other embodiments, the accessory proteins may be fused, optionally through a linker, to a polypeptide component of the retron-based editing system.

The retron-based gene editing system may further comprise additional polypeptides, proteins, and/or peptides known in the art. Non-limiting categories of polypeptides include antigens, antibodies, antibody fragments, cytokines, peptides, hormones, enzymes, oxidants, antioxidants, synthetic polypeptides, chimeric polypeptides, receptors, transcription factors, ligands, membrane transporters, structural proteins, nucleases, and components, variants, or fragments (e.g., biologically active fragments) thereof.

As used herein, the term "peptide" generally refers to a shorter polypeptide of about 50 amino acids or fewer. A peptide with only two amino acids may be referred to as a "dipeptide." A peptide with only three amino acids may be referred to as a "tripeptide." Polypeptides generally refer to polypeptides having about 4 to about 50 amino acids. Peptides may be obtained by any method known to those skilled in the art. In some embodiments, peptides may be expressed in culture. In some embodiments, peptides may be obtained by chemical synthesis (e.g., solid phase peptide synthesis).
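
By way of illustration only, the length-based naming above can be sketched in Python. The function name is hypothetical, and treating "about 50" as a hard cutoff of exactly 50 residues is an assumption made for the sketch, not a limit stated in the disclosure.

```python
def classify_by_length(n_residues: int) -> str:
    """Name an amino acid chain by residue count.

    Assumption: "about 50" is treated as a hard cutoff of 50 residues.
    """
    if n_residues < 2:
        raise ValueError("a peptide needs at least two amino acids")
    if n_residues == 2:
        return "dipeptide"
    if n_residues == 3:
        return "tripeptide"
    if n_residues <= 50:
        return "peptide"
    return "polypeptide"
```

In practice the boundary near 50 residues is approximate, so a chain of 51 or 52 amino acids might still reasonably be called a peptide.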

In some embodiments, the RNA payload (e.g., a linear and/or circular mRNA payload encoding one or more coding products of interest, or a non-coding RNA such as a guide RNA) may encode a user-programmable DNA binding protein or a gene editor accessory protein, such as, but not limited to, a deaminase, nuclease, transposase, polymerase, or reverse transcriptase.

In some embodiments, the RNA payload (e.g., a linear and/or circular mRNA payload encoding one or more coding products of interest, such as the starting constructs and baseline constructs described herein) may encode a simple protein associated with a non-protein. Non-limiting examples of conjugated proteins include glycoproteins, hemoglobins, lecithoproteins, nucleoproteins, and phosphoproteins.

In some embodiments, the RNA payload (e.g., a linear and/or circular mRNA payload encoding one or more coding products of interest, such as the starting constructs and baseline constructs described herein) may encode a protein derived from a simple or conjugated protein by chemical or physical means. Non-limiting examples of derived proteins include denatured proteins and peptides.

In some embodiments, the polypeptide, protein, or peptide may be unmodified.

In some embodiments, the polypeptide, protein, or peptide may be modified. Types of modifications include, but are not limited to, phosphorylation, glycosylation, acetylation, ubiquitination/sumoylation, methylation, palmitoylation, quinone, amidation, myristoylation, pyrrolidone carboxylic acid, hydroxylation, phosphopantetheine, prenylation, GPI anchoring, oxidation, ADP-ribosylation, sulfation, S-nitrosylation, citrullination, nitration, gamma-carboxyglutamic acid, formylation, hypusine, topaquinone (TPQ), bromination, lysine topaquinone (LTQ), tryptophan tryptophylquinone (TTQ), iodination, and cysteine tryptophylquinone (CTQ). In some aspects, the polypeptide, protein, or peptide may be modified by a post-transcriptional modification that can affect its structure, subcellular localization, and/or function.

In some embodiments, the polypeptide, protein, or peptide may be modified by phosphorylation. Phosphorylation, the addition of a phosphate group to a serine, threonine, or tyrosine residue, is one of the most common forms of protein modification. Protein phosphorylation plays an important role in fine-tuning signals in intracellular signaling cascades.

In some embodiments, the polypeptide, protein, or peptide may be modified by ubiquitination, which is the covalent attachment of ubiquitin to a target protein. Ubiquitination-mediated protein turnover has been shown to play a role in driving the cell cycle as well as in protein degradation-independent intracellular signaling pathways.

In some embodiments, the polypeptide, protein, or peptide may be modified by acetylation and methylation, which can play a role in regulating gene expression. As a non-limiting example, acetylation and methylation may mediate the formation of chromatin domains (e.g., euchromatin and heterochromatin), which may in turn affect gene silencing.

In some embodiments, the polypeptide, protein, or peptide may be modified by glycosylation. Glycosylation is the attachment of one of a large number of glycan groups. It occurs in about half of all proteins and plays a role in biological processes including, but not limited to, embryonic development, cell division, and regulation of protein structure. The two main types of protein glycosylation are N-glycosylation and O-glycosylation. In N-glycosylation the glycan is attached to an asparagine, and in O-glycosylation the glycan is attached to a serine or threonine.
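
As a rough illustration of where N-glycosylation might occur, the Python sketch below scans a protein sequence for the commonly used consensus sequon N-X-S/T (where X is any residue except proline). The sequon rule comes from general biochemistry rather than from this disclosure, the function name is hypothetical, and a sequon match only marks a candidate asparagine; it does not show that the site is actually glycosylated.

```python
import re


def find_n_glyc_sequons(protein: str) -> list[int]:
    """Return 0-based positions of asparagines in N-X-[S/T] sequons (X != P).

    A lookahead is used so that overlapping sequons are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]


# Hypothetical toy sequence used only to exercise the function.
candidate_sites = find_n_glyc_sequons("MNGSANPTNKT")
```

For the toy sequence, the asparagine followed by proline is skipped and the other two asparagines are reported as candidates.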

In some embodiments, the polypeptide, protein, or peptide may be modified by sumoylation. Sumoylation is the addition of SUMO (small ubiquitin-like modifier) to a protein and is a post-translational modification similar to ubiquitination.

In other embodiments, the RNA payload (e.g., a linear and/or circular mRNA payload encoding one or more coding products of interest, such as the starting constructs and baseline constructs described herein) may encode a therapeutic protein, such as those exemplified below.

In other embodiments, the RNA payload (e.g., a linear and/or circular mRNA payload encoding one or more products of interest, such as the starting constructs and baseline constructs described herein) may encode a gene editing system, such as those exemplified below. As used herein, a "nucleobase editing system" is a protein, DNA, or RNA composition capable of editing, modifying, or altering one or more targeted genes of interest. According to the present invention, one or more nucleobase editing systems currently commercially available or in development may be encoded by the RNA payloads described herein (e.g., linear and/or circular mRNA payloads encoding one or more coding products of interest).

Inducible modification

In one embodiment, the retron-based editing system may be configured as an inducible gene editing system. The inducible nature of the system would allow spatiotemporal control of gene editing or gene expression using a form of energy. The form of energy may include, but is not limited to, electromagnetic radiation, sound energy, chemical energy, and thermal energy. Examples of inducible systems include tetracycline-inducible promoters (Tet-On or Tet-Off), small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), and light-inducible systems (phytochrome, LOV domains, or cryptochrome). In one embodiment, the retron-based editing system or a component thereof may include a light-inducible transcriptional effector (LITE) to direct changes in transcriptional activity in a sequence-specific manner. The components of the light-inducible system may include a retron editing system component, a light-responsive cytochrome heterodimer (e.g., from Arabidopsis thaliana), and a transcriptional activation/repression domain. Further examples of inducible DNA binding proteins and methods for their use are provided in U.S. Provisional Application Nos. 61/736,465 and 61/721,283 and International Patent Publication No. WO 2014/018423 A2, which is hereby incorporated by reference in its entirety.

Once all copies of a gene in the genome of a cell have been edited, continued expression of the system in that cell is no longer necessary. Indeed, sustained expression would be undesirable if it produced off-target effects at unintended genomic sites. Time-limited expression would therefore be useful. Inducible expression offers one approach, but in addition, Applicants have engineered a self-inactivating system that relies on the use of a non-coding nucleic acid component molecule target sequence within the vector itself. Thus, after expression begins, the system will lead to its own destruction, but before destruction is complete it will have time to edit the genomic copies of the target gene (which, with a normal point mutation in a diploid cell, requires at most two edits). In brief, the self-inactivating system includes an additional RNA (e.g., a nucleic acid component molecule) that targets the coding sequence of the Cas12a polypeptide itself, or that targets one or more non-coding nucleic acid component molecule target sequences complementary to unique sequences present in one or more of the following: (a) within the promoter driving expression of the non-coding RNA elements, (b) within the promoter driving expression of the Cas12a polypeptide gene, (c) within 100 bp of the ATG translational start codon in the Cas12a polypeptide coding sequence, and (d) within the inverted terminal repeat (ITR) of a viral delivery vector, e.g., in the AAV genome.
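
For illustration only, condition (c) above can be written as a simple coordinate check. This Python sketch is not part of the disclosed system: the function name, the 0-based half-open coordinates, and the reading of "within 100 bp" as a window on either side of the ATG are all assumptions.

```python
def near_start_codon(atg_index: int, site_start: int, site_length: int,
                     window: int = 100) -> bool:
    """Return True if a candidate target site lies wholly within `window` bp
    of the ATG start codon.

    Assumptions: coordinates are 0-based positions on the construct,
    `atg_index` is the position of the A of the ATG, and the window
    extends on both sides of the 3-nt start codon.
    """
    site_end = site_start + site_length          # exclusive end of the site
    lower_bound = atg_index - window             # furthest upstream position allowed
    upper_bound = atg_index + 3 + window         # furthest downstream end allowed
    return site_start >= lower_bound and site_end <= upper_bound
```

For example, with the ATG at position 500, a 23-nt site starting at position 520 passes the check, whereas one starting at position 700 does not.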

In some aspects, a single nucleic acid component molecule is provided that is capable of hybridizing to a sequence downstream of the Cas12a polypeptide start codon, whereby expression of the Cas12a polypeptide is lost after a period of time. In some aspects, one or more nucleic acid component molecules are provided that are capable of hybridizing to one or more coding or non-coding regions of the polynucleotide encoding the system, whereby after a period of time one or more, or in some cases all, of the components of the system are inactivated. In some aspects of the system, and without wishing to be bound by theory, the cell may comprise a plurality of complexes, wherein a first subset of the complexes comprises a first nucleic acid component molecule capable of targeting the one or more genomic loci to be edited, and a second subset of the complexes comprises at least one second nucleic acid component molecule capable of targeting the polynucleotide encoding the system. The first subset of complexes mediates editing of the one or more targeted genomic loci, and the second subset of complexes eventually inactivates the system, thereby preventing further expression in the cell.

The various coding sequences (for the Cas12a polypeptide and the nucleic acid component molecules) can be included on a single vector or on multiple vectors. For example, it is possible to encode the enzyme on one vector and the various RNA sequences on another vector, or to encode the enzyme and one nucleic acid component molecule on one vector and the remaining nucleic acid component molecules on another vector, or any other permutation. In general, a system using a total of one or two different vectors is preferred.

H. Expression vectors for engineered retrons

Delivery of an engineered retron to a cell can generally be accomplished with or without a vector. Delivery of the ncRNA encoded by an engineered retron generally does not require a vector for producing the ncRNA from the engineered retron. For example, the ncRNA can be packaged directly into a delivery vehicle (such as a lipid nanoparticle) and delivered into a host cell, as described in other sections.

The engineered retrons (or vectors containing them) can be introduced into any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeal organism, including bacteria, archaea, fungi, protists, plants (e.g., monocotyledonous and dicotyledonous plants), and animals (e.g., vertebrates and invertebrates). Examples of animals that can be transfected with an engineered retron include, but are not limited to, vertebrates such as fish, birds, mammals (e.g., human and non-human primates, farm animals, pets, and laboratory animals), reptiles, and amphibians.

The engineered retrons can be introduced into a single cell or a population of cells. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids), can all be transfected with the engineered retrons.

The engineered retrons can be introduced into cellular fragments, cell components, or organelles (e.g., mitochondria in animal and plant cells, and plastids (e.g., chloroplasts) in plant cells and algae).

Cells may be cultured or expanded after transfection with the engineered retron.

Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods include chemically induced transformation, typically using divalent cations (e.g., CaCl2), dextran-mediated transfection, polybrene-mediated transfection, Lipofectamine- and LT-1-mediated transfection, electroporation, protoplast fusion, encapsulation of nucleic acids in liposomes, and direct microinjection of the nucleic acids comprising the engineered retrons into nuclei. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197; incorporated herein by reference in their entireties.

Methods for genetic transformation of plant cells are known in the art and include those set forth in US2022/0145296 and U.S. Patent Nos. 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is incorporated herein by reference in its entirety. See also Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7:849-858; Jones et al. (2005) Plant Methods 1:5; Rivera et al. (2012) Physics of Life Reviews 9:308-345; Bartlett et al. (2008) Plant Methods 4:1-12; Bates, G. W. (1999) Methods in Molecular Biology 111:359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42:575-606; Christou, P. (1992) The Plant Journal 2:275-281; Christou, P. (1995) Euphytica 85:13-27; Tzfira et al. (2004) TRENDS in Genetics 20:375-383; Yao et al. (2006) Journal of Experimental Botany 57:3737-3746; and Zupan and Zambryski (1995) Plant Physiology 107:1041-1047.

Transformed plant cells can be grown into a transgenic organism, such as a plant, according to conventional methods. See, e.g., McCormick et al. (1986) Plant Cell Reports 5:81-84.

Plant material that can be transformed with the engineered retrons described herein includes plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants (such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruits, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like). Progeny, variants, and mutants of the regenerated plants are also included within the scope of the present disclosure, provided that these parts comprise the genetic modification introduced by the engineered retron. Further provided is a processed plant product or byproduct that retains the genetic modification introduced by the engineered retron.

The engineered retrons described herein can be used to produce transgenic plants with desired phenotypes, including, but not limited to, increased disease resistance (e.g., increased viral, bacterial, or fungal resistance), increased insect resistance, increased drought resistance, increased yield, and altered fruit ripening characteristics, sugar and oil composition, and color.

In some embodiments, the retron msr gene, msd gene, and/or ret gene are expressed in vitro from a vector, such as in an in vitro transcription system. The resulting ncRNA or msDNA can be isolated and then packaged and/or formulated for direct delivery into a host cell. For example, the isolated ncRNA or msDNA can be packaged/formulated in a delivery vehicle such as a lipid nanoparticle, as described in other sections.

In some embodiments, the retron msr gene, msd gene, and/or ret gene are expressed in vivo from a vector within a cell. The retron msr gene, msd gene, and/or ret gene can be introduced into a cell with a single vector or with multiple separate vectors to produce msDNA in a host subject.

In other embodiments, the retron msr gene, msd gene, and/or ret gene, and any other components of the retron-based genome editing system described herein (e.g., a guide RNA in trans, a programmable nuclease (e.g., in trans)), can be expressed in vivo from RNA delivered to the cell. The retron msr gene, msd gene, and/or ret gene can be introduced into a cell with a single vector or with multiple separate vectors to produce msDNA in a host subject.

Vectors and/or nucleic acid molecules encoding the recombinant retron-based genome editing system or components thereof can include control elements operably linked to the retron sequences, which allow production of msDNA either in vitro or in vivo in the subject species. For example, the retron msr gene, msd gene, and/or ret gene can be operably linked to a promoter to allow expression of the retron reverse transcriptase and/or the msDNA product. In some embodiments, a heterologous sequence encoding a desired product of interest (e.g., a polynucleotide encoding a polypeptide or regulatory RNA, a donor polynucleotide for gene editing, or a protospacer DNA for molecular recording) may be inserted into the msr gene and/or the msd gene.

Any eukaryotic, archaeal, or prokaryotic cell capable of being transfected with a vector or retron delivery system comprising the engineered retron sequences may be used to produce msDNA in vivo. The ability of a construct to produce msDNA and other retron-encoded products can be determined empirically. For example, transfected cells can be analyzed for phenotypic changes caused by the introduced sequence or by direct DNA sequencing.

In some embodiments, the engineered retron is produced by a vector system comprising one or more vectors. In the vector system, the msr gene, msd gene, and/or ret gene may be provided by the same vector (i.e., a cis arrangement of all such retron elements), wherein the vector comprises a promoter operably linked to the msr gene and/or the msd gene. In some embodiments, the promoter is further operably linked to the ret gene. In other embodiments, the vector further comprises a second promoter operably linked to the ret gene. Alternatively, the ret gene may be provided by a second vector that does not include the msr gene and/or the msd gene (i.e., a trans arrangement of msr-msd and ret). In yet other embodiments, the msr gene, the msd gene, and the ret gene are each provided by a different vector (i.e., a trans arrangement of all retron elements).
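
The cis and trans arrangements described above can be summarized with a small bookkeeping sketch in Python. It is illustrative only: the function name, the dictionary layout, and the plasmid names in the usage note are hypothetical, and the "other" category simply covers splits that the paragraph above does not name.

```python
def classify_arrangement(vector_of: dict[str, str]) -> str:
    """Classify a retron vector system from a gene -> vector-name mapping.

    `vector_of` must contain the keys "msr", "msd", and "ret".
    """
    msr, msd, ret = vector_of["msr"], vector_of["msd"], vector_of["ret"]
    if msr == msd == ret:
        # All retron elements on one vector.
        return "cis: msr, msd and ret on one vector"
    if msr == msd:
        # msr-msd together, ret supplied from a second vector.
        return "trans: msr-msd on one vector, ret on another"
    if len({msr, msd, ret}) == 3:
        # Every element on its own vector.
        return "trans: msr, msd and ret each on a different vector"
    return "other"
```

For example, `classify_arrangement({"msr": "pA", "msd": "pA", "ret": "pB"})` reports the msr-msd/ret trans arrangement, in which the ret gene would need its own promoter on the second vector.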

A variety of vectors can be used in the vector or vector system, including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.

Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus (AAV) vectors, retroviral vectors, lentiviral vectors, and the like. An expression construct can be replicated in a living cell, or it can be made synthetically.

In some embodiments, the nucleic acid comprising the engineered retron sequence is under the transcriptional control of a promoter. In some embodiments, the promoter is competent for initiating transcription of an operably linked coding sequence by RNA polymerase I, II, or III.

Exemplary promoters for mammalian cell expression include the SV40 early promoter, a CMV promoter such as the CMV immediate early promoter (see U.S. Patent Nos. 5,168,062 and 5,385,839, incorporated herein by reference in their entireties), the mouse mammary tumor virus LTR promoter, the adenovirus major late promoter (Ad MLP), and the herpes simplex virus promoter, among others. Other non-viral promoters, such as a promoter derived from the murine metallothionein gene, can also be used for mammalian expression.

Exemplary promoters for plant cell expression include the CaMV 35S promoter (Odell et al., 1985, Nature 313:810-812); the rice actin promoter (McElroy et al., 1990, Plant Cell 2:163-171); the ubiquitin promoter (Christensen et al., 1989, Plant Mol. Biol. 12:619-632; and Christensen et al., 1992, Plant Mol. Biol. 18:675-689); the pEMU promoter (Last et al., 1991, Theor. Appl. Genet. 81:581-588); and the MAS promoter (Velten et al., 1984, EMBO J. 3:2723-2730).

In additional embodiments, the retron-based vectors may also comprise tissue-specific promoters, so that expression begins only after the vector has been delivered into a specific tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-b promoter, Mb promoter, Nphsl promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

These and other promoters can be obtained from, or incorporated into, commercially available plasmids using techniques well known in the art. See, e.g., Sambrook et al., supra.

In some embodiments, one or more enhancer elements are used in association with the promoter to increase expression levels of the construct. Examples include the SV40 early gene enhancer, as described in Dijkema et al., EMBO J. (1985) 4:761; the enhancer/promoter derived from the long terminal repeat (LTR) of the Rous sarcoma virus, as described in Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79:6777; and elements derived from human CMV, as described in Boshart et al., Cell (1985) 41:521, such as elements included in the CMV intron A sequence. All such sequences are incorporated herein by reference.

In one embodiment, an expression vector for expressing an engineered retron (including the msr gene, msd gene, and/or ret gene) comprises a promoter operably linked to a polynucleotide encoding the msr gene, msd gene, and/or ret gene.

In some embodiments, the vector or vector system also comprises a transcription terminator/polyadenylation signal. Examples of such sequences include, but are not limited to, those derived from SV40, as described in Sambrook et al., supra, as well as a bovine growth hormone terminator sequence (see, e.g., U.S. Patent No. 5,122,458).

Additionally, 5'-UTR sequences can be placed adjacent to the coding sequence to further enhance expression. Such sequences may include UTRs comprising an internal ribosome entry site (IRES). Inclusion of an IRES permits the translation of one or more open reading frames from a vector. The IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161. A variety of IRES sequences are known and include sequences derived from a wide variety of viruses, such as the leader sequences of picornaviruses, for example the encephalomyocarditis virus (EMCV) UTR (Jang et al., J. Virol. (1989) 63:1651-1660), the polio leader sequence, the hepatitis A virus leader, the hepatitis C virus IRES, the human rhinovirus type 2 IRES (Dobrikova et al., Proc. Natl. Acad. Sci. (2003) 100(25):15125-15130), an IRES element from the foot-and-mouth disease virus (Ramesh et al., Nucl. Acid Res. (1996) 24:2697-2700), a giardiavirus IRES (Garlapati et al., J. Biol. Chem. (2004) 279(5):3389-3397), and the like. A variety of non-viral IRES sequences can also be used herein, including, but not limited to, IRES sequences from yeast, as well as the human angiotensin II type 1 receptor IRES (Martin et al., Mol. Cell Endocrinol. (2003) 212:51-61), fibroblast growth factor IRESs (FGF-1 IRES and FGF-2 IRES, Martineau et al. (2004) Mol. Cell. Biol. 24(17):7622-7635), the vascular endothelial growth factor IRES (Baranick et al. (2008) Proc. Natl. Acad. Sci. U.S.A. 105(12):4733-4738; Stein et al. (1998) Mol. Cell. Biol. 18(6):3112-3119; Bert et al. (2006) RNA 12(6):1074-1083), and the insulin-like growth factor 2 IRES (Pedersen et al. (2002) Biochem. J. 363(Pt 1):37-44).

These elements are commercially available in plasmids sold, e.g., by Clontech (Mountain View, CA), Invivogen (San Diego, CA), Addgene (Cambridge, MA), and GeneCopoeia (Rockville, MD). See also IRESite: The database of experimentally verified IRES structures (iresite.org). An IRES sequence may be included in a vector, for example, to express multiple bacteriophage recombination proteins for recombineering, or an RNA-guided nuclease (e.g., Cas9) for HDR, in combination with a retron reverse transcriptase from an expression cassette.

In some embodiments, a polynucleotide encoding a viral self-cleaving 2A peptide, such as a T2A peptide, can be used to allow production of multiple protein products (e.g., Cas9, bacteriophage recombination proteins, retron reverse transcriptase) from a single vector or a single transcription unit under one promoter. One or more 2A linker peptides can be inserted between the coding sequences in the polycistronic construct. The 2A peptide is self-cleaving, which allows the proteins co-expressed from the polycistronic construct to be produced at equimolar levels. 2A peptides from various viruses may be used, including but not limited to 2A peptides derived from foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus, and porcine teschovirus-1. See, e.g., Kim et al. (2011) PLoS One 6(4):e18556; Trichas et al. (2008) BMC Biol. 6:40; Provost et al. (2007) Genesis 45(10):625-629; Furler et al. (2001) Gene Ther. 8(11):864-873; each of which is incorporated herein by reference in its entirety.
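
As an informal illustration of the polycistronic arrangement described above, the following Python sketch joins several open reading frames with a 2A linker and then splits the resulting polyprotein at each 2A site to show the separate products. It is a toy model under stated assumptions and not part of the disclosed constructs: the three short protein strings are placeholders, the function names are hypothetical, and the T2A amino acid sequence shown is the commonly cited one and should be verified before any real use. The split is placed between the final glycine and proline of the 2A motif, which is where separation is generally understood to occur.

```python
# Toy model of a polycistronic construct joined by 2A peptides.
# Assumption: commonly cited T2A peptide sequence (verify before real use).
T2A = "EGRGSLLTCGDVEENPGP"


def build_polyprotein(orfs, linker=T2A):
    """Join ORF protein sequences into one reading frame.

    Stop symbols ('*') are removed from every ORF so that translation
    runs through the linkers; a single stop is added at the very end.
    """
    cleaned = [protein.rstrip("*") for protein in orfs]
    return linker.join(cleaned) + "*"


def split_at_2a_sites(polyprotein, linker=T2A):
    """Return the separate products released at each 2A site.

    Each upstream product keeps the 2A tail up to its final glycine;
    each downstream product starts with the linker's last proline.
    """
    body = polyprotein.rstrip("*")
    products = []
    start = 0
    while True:
        idx = body.find(linker, start)
        if idx == -1:
            products.append(body[start:])
            break
        cut = idx + len(linker) - 1  # index of the linker's final proline
        products.append(body[start:cut])
        start = cut
    return products


# Placeholder ORFs standing in for, e.g., a nuclease, a recombination
# protein, and a reverse transcriptase (hypothetical toy sequences).
orfs = ["MKT*", "MSV*", "MRQ*"]
poly = build_polyprotein(orfs)
products = split_at_2a_sites(poly)
print(poly)
print(products)
```

Because every pass of a ribosome along the single transcript yields one copy of each product, this arrangement is what underlies the roughly equimolar output noted in the text; the sketch does not model incomplete separation or differences in protein stability.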

In some embodiments, the expression construct comprises a plasmid suitable for transforming a bacterial host. Numerous bacterial expression vectors are known to those of skill in the art, and the selection of an appropriate vector is a matter of choice. Bacterial expression vectors include, but are not limited to, pACYC177, pASK75, pBAD, pBADM, pBAT, pCal, pET, pETM, pGAT, pGEX, pHAT, pKK223, pMal, pProEx, pQE, and pZA31. Bacterial plasmids may contain antibiotic selection markers (e.g., ampicillin, kanamycin, erythromycin, carbenicillin, streptomycin, or tetracycline resistance), a lacZ gene (β-galactosidase produces a blue pigment from the X-gal substrate), fluorescent markers (e.g., GFP, mCherry), or other markers for selection of transformed bacteria. See, e.g., Sambrook et al., supra.

In other embodiments, the expression construct comprises a plasmid suitable for transforming a yeast cell. Yeast expression plasmids typically contain a yeast-specific origin of replication (ORI) and nutritional selection markers (e.g., HIS3, URA3, LYS2, LEU2, TRP1, MET15, ura4+, leu1+, ade6+), antibiotic selection markers (e.g., kanamycin resistance), fluorescent markers (e.g., mCherry), or other markers for selection of transformed yeast cells. The yeast plasmid may further contain components that allow shuttling between a bacterial host (e.g., E. coli) and yeast cells. A number of different types of yeast plasmids are available, including yeast integrating plasmids (YIp), which lack an ORI and are integrated into host chromosomes by homologous recombination; yeast replicating plasmids (YRp), which contain an autonomously replicating sequence (ARS) and can replicate independently; yeast centromere plasmids (YCp), which are low-copy vectors containing a part of an ARS and a part of a centromere sequence (CEN); and yeast episomal plasmids (YEp), which are high-copy-number plasmids comprising a fragment from the 2-micron circle (a natural yeast plasmid) that allows 50 or more copies to be stably propagated per cell.
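
The four plasmid classes listed above differ only in which replication and segregation elements they carry, so they can be summarized as a small decision rule. The Python sketch below is offered purely as a reading aid; the function name and its boolean parameters are hypothetical labels for the elements named in the text (ARS, CEN, 2-micron fragment), not an established tool or nomenclature standard.

```python
def classify_yeast_plasmid(has_ars=False, has_cen=False, has_2micron=False):
    """Map the elements named in the text onto a yeast plasmid class.

    YEp: carries a 2-micron circle fragment (high copy number).
    YCp: carries ARS and CEN elements (low copy number).
    YRp: carries an ARS only (replicates independently).
    YIp: carries none of these and must integrate into a chromosome.
    """
    if has_2micron:
        return "YEp"
    if has_cen and not has_ars:
        # A centromere without a replication element is not one of the
        # four classes described; flag it rather than guess.
        raise ValueError("CEN without ARS does not match a described class")
    if has_ars and has_cen:
        return "YCp"
    if has_ars:
        return "YRp"
    return "YIp"


for features in (
    {},
    {"has_ars": True},
    {"has_ars": True, "has_cen": True},
    {"has_2micron": True},
):
    print(features, "->", classify_yeast_plasmid(**features))
```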

In other embodiments, the expression construct does not comprise a plasmid suitable for transforming a yeast cell.

In other embodiments, the expression construct comprises a virus or an engineered construct derived from a viral genome. A number of viral-based systems have been developed for gene transfer into mammalian cells. These include adenoviruses, retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses, baculoviruses, and herpes simplex viruses (see, e.g., Warnock et al. (2011) Methods Mol. Biol. 737:1-25; Walther et al. (2000) Drugs 60(2):249-271; and Lundstrom (2003) Trends Biotechnol. 21(3):117-122; incorporated herein by reference in their entireties). The ability of certain viruses to enter cells via receptor-mediated endocytosis, to integrate into the host cell genome, and to express viral genes stably and efficiently has made them attractive candidates for the transfer of foreign genes into mammalian cells.

For example, retroviruses provide a convenient platform for gene delivery systems. Selected sequences can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to cells of a subject either in vivo or ex vivo. A number of retroviral systems have been described (U.S. Patent No. 5,219,740; Miller and Rosman (1989) BioTechniques 7:980-990; Miller, A. D. (1990) Human Gene Therapy 1:5-14; Scarpa et al. (1991) Virology 180:849-852; Burns et al. (1993) Proc. Natl. Acad. Sci. USA 90:8033-8037; Boris-Lawrie and Temin (1993) Cur. Opin. Genet. Develop. 3:102-109; and Ferry et al. (2011) Curr. Pharm. Des. 17(24):2516-2527). Lentiviruses are a class of retroviruses that are particularly useful for delivering polynucleotides to mammalian cells because they are able to infect both dividing and non-dividing cells (see, e.g., Lois et al. (2002) Science 295:868-872; Durand et al. (2011) Viruses 3(2):132-159; incorporated herein by reference).

A number of adenovirus vectors have also been described. Unlike retroviruses, which integrate into the host genome, adenoviruses persist extrachromosomally, thus minimizing the risks associated with insertional mutagenesis.

Additionally, various adeno-associated virus (AAV) vector systems have been developed for gene delivery. AAV vectors can be readily constructed using techniques well known in the art. See, e.g., U.S. Patent Nos. 5,173,414 and 5,139,941; International Publication Nos. WO 92/01070 (published January 23, 1992) and WO 93/03769 (published March 4, 1993); Lebkowski et al., Molec. Cell. Biol. (1988) 8:3988-3996; Vincent et al., Vaccines 90 (1990) (Cold Spring Harbor Laboratory Press); Carter, B. J. Current Opinion in Biotechnology (1992) 3:533-539; Muzyczka, N. Current Topics in Microbiol. and Immunol. (1992) 158:97-129; Kotin, R. M. Human Gene Therapy (1994) 5:793-801; Shelling and Smith, Gene Therapy (1994) 1:165-169; and Zhou et al., J. Exp. Med. (1994) 179:1867-1875.

Another vector system useful for delivering nucleic acids encoding the engineered retrons is the enterically administered recombinant poxvirus vaccine described by Small, Jr., P. A. et al. (U.S. Patent No. 5,676,950, issued October 14, 1997, incorporated herein by reference).

Additional viral vectors include those derived from the pox family of viruses, including vaccinia virus and avian poxvirus. By way of example, a vaccinia virus recombinant expressing a nucleic acid molecule of interest (e.g., an engineered retron) can be constructed as follows. The DNA encoding the particular nucleic acid sequence is first inserted into an appropriate vector so that it is adjacent to a vaccinia promoter and flanking vaccinia DNA sequences, such as the sequence encoding thymidine kinase (TK). This vector is then used to transfect cells that are simultaneously infected with vaccinia. Homologous recombination serves to insert the vaccinia promoter plus the gene encoding the sequence of interest into the viral genome. Because the insertion interrupts the TK gene, the resulting TK-negative recombinants can be selected by culturing the cells in the presence of 5-bromodeoxyuridine and picking viral plaques that are resistant to it.

In some embodiments, avipoxviruses, such as the fowlpox and canarypox viruses, can also be used to deliver the nucleic acid molecules of interest. The use of an avipox vector is particularly desirable in human and other mammalian species, since members of the avipox genus can replicate productively only in susceptible avian species and are therefore not infective in mammalian cells. Methods for producing recombinant avipoxviruses are known in the art and employ genetic recombination, as described above with respect to the production of vaccinia viruses. See, e.g., WO 91/12882; WO 89/03429; and WO 92/03545.

Molecular conjugate vectors, such as the adenovirus chimeric vectors described in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.

Members of the alphavirus genus, such as, but not limited to, vectors derived from the Sindbis virus (SIN), Semliki Forest virus (SFV), and Venezuelan equine encephalitis virus (VEE), will also find use as viral vectors for delivering the polynucleotides of the present invention. For a description of Sindbis virus-derived vectors useful for the practice of the instant methods, see Dubensky et al. (1996) J. Virol. 70:508-519; International Publication Nos. WO 95/07995 and WO 96/17072; as well as Dubensky, Jr., T. W. et al., U.S. Patent No. 5,843,723, issued December 1, 1998, and Dubensky, Jr., T. W., U.S. Patent No. 5,789,245, issued August 4, 1998, both incorporated herein by reference. Particularly preferred are chimeric alphavirus vectors comprising sequences derived from Sindbis virus and Venezuelan equine encephalitis virus. See, e.g., Perri et al. (2003) J. Virol. 77:10394-10403 and International Publication Nos. WO 02/099035, WO 02/080982, WO 01/81609, and WO 00/61772; incorporated herein by reference in their entireties.

A vaccinia-based infection/transfection system can be conveniently used to provide inducible, transient expression of the nucleic acid of interest (e.g., an engineered retron) in a host cell. In this system, cells are first infected in vitro with a vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase. This polymerase displays exquisite specificity in that it transcribes only templates bearing T7 promoters. Following infection, cells are transfected with the nucleic acid of interest, driven by a T7 promoter. The polymerase expressed in the cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA. The method provides for high-level, transient, cytoplasmic production of large quantities of RNA. See, e.g., Elroy-Stein and Moss, Proc. Natl. Acad. Sci. USA (1990) 87:6743-6747; Fuerst et al., Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126.

As an alternative to infection with vaccinia or avipox virus recombinants, or to the delivery of nucleic acids using other viral vectors, an amplification system can be used that will lead to high-level expression following introduction into host cells. Specifically, a T7 RNA polymerase promoter preceding the coding region for T7 RNA polymerase can be engineered. Translation of RNA derived from this template will generate T7 RNA polymerase, which in turn will transcribe more templates. Concomitantly, there will be a cDNA whose expression is under the control of the T7 promoter. Thus, some of the T7 RNA polymerase generated from translation of the amplification template RNA will lead to transcription of the desired gene. Because some T7 RNA polymerase is required to initiate the amplification, T7 RNA polymerase can be introduced into cells along with the template(s) to prime the transcription reaction. The polymerase can be introduced as a protein or on a plasmid encoding the RNA polymerase. For a further discussion of T7 systems and their use for transforming cells, see, e.g., International Publication No. WO 94/26911; Studier and Moffatt, J. Mol. Biol. (1986) 189:113-130; Deng and Wolff, Gene (1994) 143:245-249; Gao et al., Biochem. Biophys. Res. Commun. (1994) 200:1201-1206; Gao and Huang, Nuc. Acids Res. (1993) 21:2867-2872; Chen et al., Nuc. Acids Res. (1994) 22:2114-2120; and U.S. Patent No. 5,135,855.
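
The self-amplifying logic of the T7 system, and the reason a priming dose of polymerase is needed, can be illustrated with a deliberately crude discrete-time model. The Python sketch below is an assumption-laden toy, not a kinetic description of the system: the function name, the parameters `gain`, `decay`, and `share_to_gene`, and all of their values are hypothetical and chosen only to show the qualitative behavior.

```python
def simulate_t7_amplification(p0, steps=10, gain=0.5, decay=0.1, share_to_gene=0.3):
    """Toy positive-feedback model of a T7 amplification system.

    p0            -- priming amount of T7 RNA polymerase (arbitrary units)
    gain          -- output produced per unit polymerase per step (hypothetical)
    decay         -- fraction of polymerase lost per step (hypothetical)
    share_to_gene -- fraction of output that is transcript of the desired
                     gene rather than new polymerase (hypothetical)

    Transcription and translation are collapsed into one step.
    Returns (final polymerase level, accumulated gene-of-interest RNA).
    """
    polymerase = p0
    gene_rna = 0.0
    for _ in range(steps):
        output = gain * polymerase                   # proportional to polymerase present
        gene_rna += share_to_gene * output           # transcripts of the desired gene
        polymerase += (1 - share_to_gene) * output   # feedback: more polymerase
        polymerase -= decay * polymerase             # simple first-order loss
    return polymerase, gene_rna


unprimed = simulate_t7_amplification(0.0)
primed = simulate_t7_amplification(1.0)
print("no priming polymerase:", unprimed)
print("with priming polymerase:", primed)
```

With no starting polymerase the loop never leaves zero, whereas any positive priming amount grows from step to step under these particular parameter values, which is the qualitative point made in the paragraph above.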

Insect cell expression systems, such as baculovirus systems, can also be used; they are known to those of skill in the art and are described in, e.g., Baculovirus and Insect Cell Expression Protocols (Methods in Molecular Biology, D. W. Murhammer, ed., Humana Press, 2nd edition, 2007) and L. King, The Baculovirus Expression System: A laboratory guide (Springer, 1992). Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia, Thermo Fisher Scientific (Waltham, MA) and Clontech (Mountain View, CA).

Plant expression systems can also be used for transforming plant cells. Generally, such systems use virus-based vectors to transfect plant cells with heterologous genes. For a description of such systems, see, e.g., Porta et al., Mol. Biotech. (1996) 5:209-221; and Hackland et al., Arch. Virol. (1994) 139:1-22.

In order to effect expression of the engineered retron, or of the ncRNA encoded thereby, the expression construct or the ncRNA must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states. One mechanism for delivery is via viral infection, in which the expression construct is encapsulated in an infectious viral particle.

Several non-viral methods for the transfer of expression constructs into cultured cells are also contemplated. These include the use of calcium phosphate precipitation, DEAE-dextran, electroporation, direct microinjection, DNA-loaded liposomes, Lipofectamine-DNA complexes, cell sonication, gene bombardment using high-velocity microprojectiles, and receptor-mediated transfection (see, e.g., Graham and Van Der Eb (1973) Virology 52:456-467; Chen and Okayama (1987) Mol. Cell Biol. 7:2745-2752; Rippe et al. (1990) Mol. Cell Biol. 10:689-695; Gopal (1985) Mol. Cell Biol. 5:1188-1190; Tur-Kaspa et al. (1986) Mol. Cell. Biol. 6:716-718; Potter et al. (1984) Proc. Natl. Acad. Sci. USA 81:7161-7165; Harland and Weintraub (1985) J. Cell Biol. 101:1094-1099; Nicolau and Sene (1982) Biochim. Biophys. Acta 721:185-190; Fraley et al. (1979) Proc. Natl. Acad. Sci. USA 76:3348-3352; Fechheimer et al. (1987) Proc. Natl. Acad. Sci. USA 84:8463-8467; Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572; Wu and Wu (1987) J. Biol. Chem. 262:4429-4432; Wu and Wu (1988) Biochemistry 27:887-892; incorporated herein by reference). Some of these techniques may be successfully adapted for in vivo or ex vivo use.

Once the expression construct has been delivered into the cell, the nucleic acid comprising the engineered retron sequence may be positioned and expressed at different sites. In some embodiments, the nucleic acid comprising the engineered retron sequence may be stably integrated into the genome of the cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement), or it may occur at a random, non-specific location (gene augmentation). In other embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments, or episomes, encode sequences sufficient to permit maintenance and replication independent of, or in synchronization with, the host cell cycle. How the expression construct is delivered to a cell and where in the cell the nucleic acid remains depend on the type of expression construct employed.

In some embodiments, the expression construct may simply consist of naked recombinant DNA or plasmids comprising the engineered retron. Transfer of the construct may be performed by any of the methods mentioned above that physically or chemically permeabilize the cell membrane. This is particularly applicable for transfer in vitro, but it may be applied to in vivo use as well. Dubensky et al. (Proc. Natl. Acad. Sci. USA (1984) 81:7529-7533) successfully injected polyomavirus DNA in the form of calcium phosphate precipitates into the liver and spleen of adult and newborn mice, demonstrating active viral replication and acute infection. Benvenisty and Reshef (Proc. Natl. Acad. Sci. USA (1986) 83:9551-9555) also demonstrated that direct intraperitoneal injection of calcium phosphate-precipitated plasmids results in expression of the transfected genes. It is envisioned that DNA encoding an engineered retron of interest may also be transferred in a similar manner in vivo and express retron products.

In another embodiment, a naked DNA expression construct may be transferred into cells by particle bombardment. This method depends on the ability to accelerate DNA-coated microprojectiles to a high velocity, allowing them to pierce cell membranes and enter cells without killing them (Klein et al. (1987) Nature 327:70-73). Several devices for accelerating small particles have been developed. One such device relies on a high-voltage discharge to generate an electrical current, which in turn provides the motive force (Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572). The microprojectiles may consist of biologically inert substances, such as tungsten or gold beads.

In another embodiment, liposomes may be used to deliver the expression construct. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh and Bachhawat (1991) Liver Diseases, Targeted Diagnosis and Therapy Using Specific Receptors and Ligands, Wu et al. (eds.), Marcel Dekker, NY, 87-104). The use of Lipofectamine-DNA complexes is also contemplated.

In some embodiments, the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and to promote cell entry of liposome-encapsulated DNA (Kaneda et al. (1989) Science 243:375-378). In other embodiments, the liposome may be complexed with, or employed in conjunction with, nuclear non-histone chromosomal proteins (HMG-I) (Kato et al. (1991) J. Biol. Chem. 266(6):3361-3364).

In other embodiments, the liposome may be complexed with, or employed in conjunction with, both HVJ and HMG-I. Where a bacterial promoter is employed in the DNA construct, it will also be desirable to include an appropriate bacterial polymerase within the liposome.

Other expression constructs that can be employed to deliver a nucleic acid into cells are receptor-mediated delivery vehicles. These take advantage of the selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells. Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and Wu (1993) Adv. Drug Delivery Rev. 12:159-167). Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor-specific ligand and a DNA-binding agent. Several ligands have been used for receptor-mediated gene transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) and transferrin (see, e.g., Wu and Wu (1987), supra; Wagner et al. (1990) Proc. Natl. Acad. Sci. USA 87(9):3410-3414). A synthetic neoglycoprotein that recognizes the same receptor as ASOR has been used as a gene delivery vehicle (Ferkol et al. (1993) FASEB J. 7:1081-1091; Perales et al. (1994) Proc. Natl. Acad. Sci. USA 91(9):4086-4090), and epidermal growth factor (EGF) has also been used to deliver genes to squamous carcinoma cells (Myers, EPO 0273085).

In other embodiments, the delivery vehicle may comprise a ligand and a liposome. For example, Nicolau et al. (Methods Enzymol. (1987) 149:157-176) employed lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and observed an increase in the uptake of the insulin gene by hepatocytes. Thus, it is feasible that a nucleic acid encoding a particular gene may also be specifically delivered into a cell by any number of receptor-ligand systems, with or without liposomes. In addition, antibodies to cell-surface antigens may similarly be used as targeting moieties.

In some embodiments, the promoters useful in the retron delivery systems described herein may be constitutive, inducible, or tissue-specific. In some embodiments, the promoter may be a constitutive promoter. Non-limiting exemplary constitutive promoters include the cytomegalovirus immediate early promoter (CMV), simian virus 40 (SV40) promoter, adenovirus major late (MLP) promoter, Rous sarcoma virus (RSV) promoter, mouse mammary tumor virus (MMTV) promoter, phosphoglycerate kinase (PGK) promoter, elongation factor-alpha (EF1α) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, a functional fragment thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1α promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as the Tet-On® promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is exclusively or predominantly expressed in liver tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, IFN-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

I. Delivery Systems and Delivery Methods

Overview

In another aspect, the present disclosure provides vectors for transferring and/or expressing the retron-based gene editing systems, e.g., under in vitro, ex vivo, and in vivo conditions. In another aspect, the present disclosure provides cell delivery compositions and methods, including compositions for passive and/or active transport of the retron-based gene editing systems described herein into cells (e.g., plasmids), delivery by virus-based recombinant vectors (e.g., AAV and/or lentiviral vectors), delivery by non-virus-based systems (e.g., liposomes and LNPs), and delivery by virus-like particles. Depending on the delivery system employed, the retron-based gene editing systems described herein can be delivered in the form of DNA (e.g., plasmids or DNA-based viral vectors), RNA (e.g., guide RNA and mRNA delivered by LNPs), mixtures of DNA and RNA, protein (e.g., virus-like particles), and ribonucleoprotein (RNP) complexes. Any suitable combination of methods for delivering the components of the retron-based gene editing systems disclosed herein can be employed.

The retron-based gene editing systems and/or their components can be delivered by any known delivery system, such as those described above, including (a) vector-free delivery (e.g., electroporation), (b) viral delivery systems, and (c) non-viral delivery systems. Viral delivery systems include expression vectors, adeno-associated virus (AAV) vectors, retroviral vectors, lentiviral vectors, and the like. Expression constructs can be replicated in living cells, or they can be made synthetically. Non-viral delivery systems include, but are not limited to, lipid particles (e.g., lipid nanoparticles (LNPs)), non-lipid nanoparticles, exosomes, liposomes, micelles, virosomes, stable nucleic acid-lipid particles (SNALPs), lipoplexes/polyplexes, DNA nanoclews, gold nanoparticles, iTOP, streptolysin O (SLO), multifunctional envelope-type nanodevices (MENDs), lipid-coated mesoporous silica particles, inorganic nanoparticles, and polymeric delivery technologies (e.g., polymer-based particles).

Delivery of nucleic acid formats, including RNA therapeutics, is further described in Paunovska K, Loughrey D, Dahlman JE. Drug delivery systems for RNA therapeutics. Nat Rev Genet. 2022 May;23(5):265-280. doi: 10.1038/s41576-021-00439-4. Epub 2022 Jan 4. PMID: 34983972; PMCID: PMC8724758; Hong CA, Nam YS. Functional nanostructures for effective delivery of small interfering RNA therapeutics. Theranostics. 2014 Sep 19;4(12):1211-32. doi: 10.7150/thno.8491. PMID: 25285170; PMCID: PMC4183999; Liu F, Wang C, Gao Y, Li X, Tian F, Zhang Y, Fu M, Li P, Wang Y, Wang F. Current Transport Systems and Clinical Applications for Small Interfering RNA (siRNA) Drugs. Mol Diagn Ther. 2018 Oct;22(5):551-569. doi: 10.1007/s40291-018-0338-8. PMID: 29926308; Zhang Y, Almazi JG, Ong HX, Johansen MD, Ledger S, Traini D, Hansbro PM, Kelleher AD, Ahlenstiel CL. Nanoparticle Delivery Platforms for RNAi Therapeutics Targeting COVID-19 Disease in the Respiratory Tract. Int J Mol Sci. 2022 Feb 22;23(5):2408. doi: 10.3390/ijms23052408. PMID: 35269550; PMCID: PMC8909959; Zhang M, Hu S, Liu L, Dang P, Liu Y, Sun Z, Qiao B, Wang C. Engineered exosomes from different sources for cancer-targeted therapy. Signal Transduct Target Ther. 2023 Mar 15;8(1):124. doi: 10.1038/s41392-023-01382-y. PMID: 36922504; PMCID: PMC10017761; Hastings ML, Krainer AR. RNA therapeutics. RNA. 2023 Apr;29(4):393-395. doi: 10.1261/rna.079626.123. PMID: 36928165; PMCID: PMC10019368; and Miele E, Spinelli GP, Miele E, Di Fabrizio E, Ferretti E, Tomao S, Gulino A. Nanoparticle-based delivery of small interfering RNA: challenges for cancer therapy. Int J Nanomedicine. 2012;7:3637-57. doi: 10.2147/IJN.S23696. Epub 2012 Jul 20. PMID: 22915840; PMCID: PMC3418108, each of which is incorporated by reference in its entirety.

The engineered retron-based gene editing systems (or vectors containing such retrons) can be introduced into any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeal organism, including bacteria, archaea, fungi, protists, plants (e.g., monocots and dicots), and animals (e.g., vertebrates and invertebrates). Examples of animals that can be transfected with an engineered retron-based gene editing system include, but are not limited to, vertebrates such as fish, birds, mammals (e.g., humans and non-human primates, farm animals, pets, and laboratory animals), reptiles, and amphibians.

The engineered retron-based gene editing system (or its components) can be introduced into a single cell or a population of cells. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) can all be transfected with the engineered retron-based gene editing system.

The engineered retron-based gene editing system (or its components) can be introduced into cellular fragments, cell components, or organelles (e.g., mitochondria in animal and plant cells, and plastids (e.g., chloroplasts) in plant cells and algae).

Following transfection with the engineered retron-based gene editing system, the cells can be cultured or expanded.

Methods for introducing nucleic acids into host cells are well known in the art. Commonly used methods include chemically induced transformation, typically using divalent cations (e.g., CaCl2), dextran-mediated transfection, polybrene-mediated transfection, lipofectamine- and LT-1-mediated transfection, electroporation, protoplast fusion, encapsulation of nucleic acids in liposomes, and direct microinjection of nucleic acids comprising the Cas12a editing system into the nucleus. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197; each incorporated herein by reference in its entirety.

The retron-based gene editing systems (or components thereof) disclosed herein can also be targeted to plant cells. Methods for the genetic transformation of plant cells are known in the art and include those set forth in US2022/0145296 and U.S. Patent Nos. 8,575,425; 7,692,068; 8,802,934; and 7,541,517, each of which is incorporated herein by reference in its entirety. See also Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7:849-858; Jones et al. (2005) Plant Methods 1:5; Rivera et al. (2012) Physics of Life Reviews 9:308-345; Bartlett et al. (2008) Plant Methods 4:1-12; Bates, G. W. (1999) Methods in Molecular Biology 111:359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42:575-606; Christou, P. (1992) The Plant Journal 2:275-281; Christou, P. (1995) Euphytica 85:13-27; Tzfira et al. (2004) TRENDS in Genetics 20:375-383; Yao et al. (2006) Journal of Experimental Botany 57:3737-3746; and Zupan and Zambryski (1995) Plant Physiology 107:1041-1047.

Transformed plant cells can be grown into a transgenic organism, such as a plant, according to conventional methods. See, e.g., McCormick et al. (1986) Plant Cell Reports 5:81-84.

Plant material that can be transformed with the retron-based gene editing systems (or components thereof) described herein includes plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or plant parts (such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like). Progeny, variants, and mutants of the regenerated plants are also included within the scope of the present disclosure, provided that these parts comprise the genetic modification introduced by the retron-based gene editing system. Further provided are processed plant products or byproducts that retain the genetic modification introduced by the retron-based gene editing system.

The retron-based gene editing systems described herein can be used to produce transgenic plants with desired phenotypes, including but not limited to increased disease resistance (e.g., increased viral, bacterial, or fungal resistance), increased insect resistance, increased drought resistance, increased yield, and altered fruit ripening characteristics, sugar and oil composition, and color.

In some embodiments involving a retron-based gene editing system, the retron msr gene, msd gene, and/or ret gene can be expressed in vitro from a vector, such as in an in vitro transcription system. The resulting ncRNA or msDNA can be isolated and then packaged and/or formulated for direct delivery into a host cell. For example, the isolated ncRNA or msDNA can be packaged/formulated in a delivery vehicle such as a lipid nanoparticle, as described in other sections.

In some embodiments involving a retron-based gene editing system, the retron msr gene, msd gene, and/or ret gene are expressed in vivo from a vector within a cell. The retron msr gene, msd gene, and/or ret gene can be introduced into a cell with a single vector or with multiple separate vectors to produce msDNA in a host subject.

In other embodiments, the retron msr gene, msd gene, and/or ret gene, and any other components of the retron-based genome editing systems described herein (e.g., a guide RNA in trans, a programmable nuclease (e.g., in trans)), can be expressed in vivo from RNA delivered to the cell. The retron msr gene, msd gene, and/or ret gene can be introduced into a cell with a single vector or with multiple separate vectors to produce msDNA in a host subject.

Vectors and/or nucleic acid molecules encoding a recombinant retron-based genome editing system or components thereof can include control elements operably linked to the retron sequences that allow for the production of msDNA in vitro or in vivo in the subject species. For example, in embodiments involving a retron-based gene editing system, the retron msr gene, msd gene, and/or ret gene can be operably linked to a promoter to allow expression of the retron reverse transcriptase and/or the msDNA product. In some embodiments, a heterologous sequence encoding a desired product of interest (e.g., a polynucleotide encoding a polypeptide or regulatory RNA, a donor polynucleotide for gene editing, or a protospacer DNA for molecular recording) can be inserted into the msr gene and/or the msd gene.

In some embodiments, the retron-based gene editing system is produced from a vector system comprising one or more vectors.

A variety of vectors can be used for the vector or vector system, including but not limited to linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.

All-RNA Format

In various embodiments, the retron-based gene editing systems (or components thereof) disclosed herein can be delivered in an "all-RNA" format. As used herein, the term "all-RNA" format refers to the fact that each component of the retron editing system (e.g., retron RT, programmable nuclease, sgRNA, and ncRNA) is delivered and/or administered as RNA (e.g., coding RNA or non-coding RNA). In some embodiments, the RNA components can be delivered to cells and/or tissues by direct means, such as electroporation or transfection. In other embodiments, the RNA components can be delivered to cells and/or tissues with the aid of a delivery vehicle, such as an LNP or liposome.

In various embodiments, the retron editing systems described herein can comprise a coding RNA (e.g., a linear or circular mRNA) encoding a retron reverse transcriptase (e.g., any RT from Table X or Table A), a coding RNA (e.g., a linear or circular mRNA) encoding a programmable nuclease (e.g., a Cas9, Cas12a, or TnpB nuclease), a retron ncRNA (e.g., an ncRNA from Table B), and a guide RNA for targeting the programmable nuclease to a specific desired target sequence.

In some embodiments, the RT and nuclease components can be encoded on the same coding RNA molecule. The proteins can also be expressed from separate coding RNA molecules. In other embodiments, the RT and nuclease components can be fused together as a single fusion polypeptide having an RT domain and a nuclease domain, optionally joined by a linker.

In addition, in some embodiments, the ncRNA and the guide RNA can be fused together as a single RNA molecule. For example, the guide RNA can be located at the 5' end of the ncRNA. In other embodiments, the guide RNA can be located at the 3' end of the ncRNA. In some embodiments, the ncRNA can comprise a guide RNA at both its 3' and 5' ends.

In other embodiments, the ncRNA and the guide RNA can be separate molecules, i.e., delivered separately.

In other embodiments, the retron editing system can include an ncRNA-guide RNA fusion together with an additional guide RNA provided as a separate molecule.
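The fusion arrangements described above (guide RNA at the 5' end, at the 3' end, or at both ends of the ncRNA) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the sequences are arbitrary placeholders rather than real retron ncRNA or guide sequences, the layout names and the `fuse_ncrna_guide` helper are hypothetical, and no linker or scaffold design is implied.

```python
from typing import Optional

# Placeholder sequences for illustration only (NOT real retron or guide sequences).
PLACEHOLDER_NCRNA = "GGCACGCAUGUAGCUU"
PLACEHOLDER_GUIDE = "ACGUACGUACGUACGUACGU"


def fuse_ncrna_guide(ncrna: str, guide: str, layout: str,
                     second_guide: Optional[str] = None) -> str:
    """Return a single RNA string written 5'->3' for a hypothetical layout name.

    layout is one of:
      "guide_5p"   - guide RNA placed at the 5' end of the ncRNA
      "guide_3p"   - guide RNA placed at the 3' end of the ncRNA
      "guide_both" - a guide RNA at both ends (second_guide defaults to guide)
    """
    if layout == "guide_5p":
        return guide + ncrna
    if layout == "guide_3p":
        return ncrna + guide
    if layout == "guide_both":
        return guide + ncrna + (second_guide or guide)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for name in ("guide_5p", "guide_3p", "guide_both"):
        fused = fuse_ncrna_guide(PLACEHOLDER_NCRNA, PLACEHOLDER_GUIDE, name)
        print(name, len(fused), fused)
```

Simple concatenation is used here only to make the three orientations concrete; an actual fusion would depend on the specific ncRNA, guide scaffold, and any linker chosen.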

In various embodiments, the different RNA components of the all-RNA retron editing system can be combined and administered (e.g., directly or within a delivery vehicle) at different ratios. In some embodiments, the ratios of such RNA components or species can be expressed as molar ratios.

For example, the molar ratio of RT-encoding RNA to nuclease-encoding RNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of nuclease-encoding RNA to RT-encoding RNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of ncRNA or ncRNA-guide RNA fusion to separate guide RNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of separate guide RNA to ncRNA or ncRNA-guide RNA fusion can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of ncRNA to separate guide RNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of separate guide RNA to ncRNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of coding RNA (e.g., encoding the RT and/or the nuclease) to ncRNA or ncRNA-guide RNA fusion, as the case may be, can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of coding RNA encoding the retron RT to ncRNA or ncRNA-guide RNA fusion, as the case may be, can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of coding RNA encoding the programmable nuclease to ncRNA or ncRNA-guide RNA fusion, as the case may be, can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of coding RNA encoding the retron RT or the nuclease to separate guide RNA can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.

In another example, the molar ratio of separate guide RNA to coding RNA encoding the retron RT or the nuclease can be about 1:1, about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40.
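Because the ratios above are molar rather than mass ratios, RNA species of different lengths contribute very different masses at the same ratio. The following Python sketch shows one way the conversion might be done. It rests on assumptions that are not part of this disclosure: the commonly used approximation for single-stranded RNA molar mass (length × 320.5 + 159.0 g/mol, which ignores caps, tails, and modified nucleotides), hypothetical RNA lengths, and hypothetical component names and helper functions.

```python
from typing import Dict


def rna_molar_mass(length_nt: int) -> float:
    """Approximate molar mass (g/mol) of single-stranded RNA of a given length.

    Uses the common approximation length * 320.5 + 159.0; chemical
    modifications, caps, and poly(A) composition are ignored.
    """
    return length_nt * 320.5 + 159.0


def masses_for_molar_ratio(lengths_nt: Dict[str, int],
                           ratio_parts: Dict[str, float],
                           total_pmol: float) -> Dict[str, float]:
    """Split total_pmol across components by molar ratio; return ng per component."""
    total_parts = sum(ratio_parts.values())
    masses_ng = {}
    for name, parts in ratio_parts.items():
        pmol = total_pmol * parts / total_parts
        # pmol * (g/mol) gives picograms; divide by 1000 to obtain nanograms.
        masses_ng[name] = pmol * rna_molar_mass(lengths_nt[name]) / 1000.0
    return masses_ng


if __name__ == "__main__":
    # Hypothetical lengths: a 1500-nt RT mRNA and a 300-nt ncRNA-sgRNA fusion,
    # combined at an RT mRNA : ncRNA-sgRNA molar ratio of 1:4.
    lengths = {"rt_mrna": 1500, "ncrna_sgrna": 300}
    ratio = {"rt_mrna": 1.0, "ncrna_sgrna": 4.0}
    for name, ng in masses_for_molar_ratio(lengths, ratio, total_pmol=5.0).items():
        print(f"{name}: {ng:.1f} ng")
```

In this hypothetical case the shorter ncRNA-sgRNA is in fourfold molar excess yet accounts for less mass than the RT mRNA, which is why the ratios are stated on a molar basis.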

In certain embodiments, the amount of ncRNA-sgRNA is increased relative to the RT mRNA. In certain embodiments, the RT mRNA:ncRNA-sgRNA ratio is about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40. In certain embodiments, an RT-Cas9 (or Cas9-RT) fusion is encoded by the mRNA. In certain embodiments, the RT-Cas9 mRNA:ncRNA-sgRNA ratio is about 1:1.5, about 1:2, about 1:2.5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:7, about 1:8, about 1:9, about 1:10, about 1:12, about 1:15, or about 1:20. Useful ranges include 1:1 to 1:2, 1:1.5 to 1:4, 1:2 to 1:4, 1:2 to 1:8, 1:2 to 1:10, 1:3 to 1:9, 1:3 to 1:12, 1:3 to 1:15, 1:4 to 1:8, 1:4 to 1:12, 1:4 to 1:20, 1:5 to 1:10, 1:5 to 1:15, 1:5 to 1:20, 1:10 to 1:20, or 1:10 to 1:40. In certain embodiments, multiple genetic loci are targeted, so that the ncRNA-sgRNA comprises a mixture of ncRNA-sgRNA species, and the same ratios and ranges apply.

Viral Vector Delivery

In various embodiments, the retron-based gene editing systems described herein can be delivered in a viral vector.

Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus (AAV) vectors, retroviral vectors, lentiviral vectors, and the like. Expression constructs can be replicated in living cells, or they can be made synthetically.

In some embodiments, the nucleic acid comprising the retron-based gene editing system (or components thereof) is under the transcriptional control of a promoter. In some embodiments, the promoter is competent to initiate transcription of an operably linked coding sequence by RNA polymerase I, II, or III.

Exemplary promoters for mammalian cell expression include the SV40 early promoter, a CMV promoter such as the CMV immediate early promoter (see U.S. Patent Nos. 5,168,062 and 5,385,839, incorporated herein by reference in their entireties), the mouse mammary tumor virus LTR promoter, the adenovirus major late promoter (Ad MLP), and the herpes simplex virus promoter, among others. Other non-viral promoters, such as a promoter derived from the murine metallothionein gene, will also find use for mammalian expression.

Exemplary promoters for plant cell expression include the CaMV 35S promoter (Odell et al., 1985, Nature 313:810-812); the rice actin promoter (McElroy et al., 1990, Plant Cell 2:163-171); the ubiquitin promoter (Christensen et al., 1989, Plant Mol. Biol. 12:619-632; and Christensen et al., 1992, Plant Mol. Biol. 18:675-689); the pEMU promoter (Last et al., 1991, Theor. Appl. Genet. 81:581-588); and the MAS promoter (Velten et al., 1984, EMBO J. 3:2723-2730).

In additional embodiments, the retron-based vectors may also comprise tissue-specific promoters, so that expression begins only after delivery into a specific tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, IFN-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

These and other promoters can be obtained from, or incorporated into, commercially available plasmids using techniques well known in the art. See, e.g., Sambrook et al., supra.

In some embodiments, one or more enhancer elements are used in association with the promoter to increase expression levels of the construct. Examples include the SV40 early gene enhancer, as described in Dijkema et al., EMBO J. (1985) 4:761; the enhancer/promoter derived from the long terminal repeat (LTR) of the Rous sarcoma virus, as described in Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79:6777; and elements derived from human CMV, as described in Boshart et al., Cell (1985) 41:521, such as elements included in the CMV intron A sequence. All such sequences are incorporated herein by reference.

In one embodiment, an expression vector for expressing a retron-based gene editing system (or components thereof) comprises a promoter operably linked to a polynucleotide encoding the components. The components of the retron-based gene editing system can be configured as individual gene transcripts or as fusion constructs. For example, the nuclease component can be fused to the reverse transcriptase component. In another example, the ncRNA component can be fused to the guide RNA component. In another example, the nuclease component can be fused to the reverse transcriptase component while the guide RNA and the ncRNA are separate. In other embodiments, the guide RNA and ncRNA components can be fused while the reverse transcriptase and nuclease components are provided separately. Any functional combination of fused and unfused components is contemplated.
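The fused and unfused arrangements described above can be listed mechanically. The following is a minimal illustrative sketch, not part of the disclosed system: it assumes only two independent choices (whether the nuclease and reverse transcriptase are fused, and whether the ncRNA and guide RNA are fused) and ignores fusion order (e.g., RT-Cas9 versus Cas9-RT). The function name and component labels are hypothetical.

```python
from itertools import product


def component_layouts():
    """Enumerate fused/separate layouts for the protein pair and the RNA pair.

    Hypothetical helper for illustration only; fusion order and any other
    components are deliberately left out.
    """
    layouts = []
    for proteins_fused, rnas_fused in product((True, False), repeat=2):
        proteins = (
            ["nuclease-RT fusion"]
            if proteins_fused
            else ["nuclease", "reverse transcriptase"]
        )
        rnas = ["ncRNA-gRNA fusion"] if rnas_fused else ["ncRNA", "guide RNA"]
        layouts.append(proteins + rnas)
    return layouts


if __name__ == "__main__":
    for layout in component_layouts():
        print(" + ".join(layout))
```

Each printed line corresponds to one arrangement; whether a given arrangement is functional would still need to be determined experimentally.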

In some embodiments, the vector or vector system also comprises a transcription terminator/polyadenylation signal. Examples of such sequences include, but are not limited to, those derived from SV40, as described in Sambrook et al., supra, as well as the bovine growth hormone terminator sequence (see, e.g., U.S. Patent No. 5,122,458).

Additionally, 5'-UTR sequences can be placed adjacent to the coding sequence to further enhance expression. Such sequences may include UTRs comprising an internal ribosome entry site (IRES). Inclusion of an IRES permits the translation of one or more open reading frames from a vector. The IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161.

A variety of IRES sequences are known and include sequences derived from a variety of viruses, such as the leader sequences of picornaviruses, such as the encephalomyocarditis virus (EMCV) UTR (Jang et al., J. Virol. (1989) 63:1651-1660), the polio leader sequence, the hepatitis A virus leader, the hepatitis C virus IRES, the human rhinovirus type 2 IRES (Dobrikova et al., Proc. Natl. Acad. Sci. (2003) 100(25):15125-15130), an IRES element from the foot-and-mouth disease virus (Ramesh et al., Nucl. Acids Res. (1996) 24:2697-2700), a giardiavirus IRES (Garlapati et al., J. Biol. Chem. (2004) 279(5):3389-3397), and the like. A variety of non-viral IRES sequences can also be used herein, including, but not limited to, IRES sequences from yeast, as well as the human angiotensin II type 1 receptor IRES (Martin et al., Mol. Cell Endocrinol. (2003) 212:51-61), fibroblast growth factor IRESs (FGF-1 IRES and FGF-2 IRES, Martineau et al. (2004) Mol. Cell. Biol. 24(17):7622-7635), vascular endothelial growth factor IRES (Baranick et al. (2008) Proc. Natl. Acad. Sci. U.S.A. 105(12):4733-4738; Stein et al. (1998) Mol. Cell. Biol. 18(6):3112-3119; Bert et al. (2006) RNA 12(6):1074-1083), and insulin-like growth factor 2 IRES (Pedersen et al. (2002) Biochem. J. 363(Pt 1):37-44).

These elements are commercially available in plasmids sold, e.g., by Clontech (Mountain View, CA), Invivogen (San Diego, CA), Addgene (Cambridge, MA), and GeneCopoeia (Rockville, MD). See also IRESite: The database of experimentally verified IRES structures (iresite.org). An IRES sequence may be included in a vector, for example, to express multiple bacteriophage recombination proteins for recombineering, or an RNA-guided nuclease (e.g., Cas9) for HDR, in combination with a retron reverse transcriptase from an expression cassette.

In some embodiments, a polynucleotide encoding a viral self-cleaving 2A peptide, such as a T2A peptide, can be used to allow production of multiple protein products (e.g., Cas9, bacteriophage recombination proteins, retron reverse transcriptase) from a single vector or a single transcription unit under one promoter. One or more 2A linker peptides can be inserted between the coding sequences in the multicistronic construct. The 2A peptide, which is self-cleaving, allows co-expressed proteins from the multicistronic construct to be produced at equimolar levels. 2A peptides from various viruses may be used, including, but not limited to, 2A peptides derived from the foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus, and porcine teschovirus-1. See, e.g., Kim et al. (2011) PLoS One 6(4):e18556; Trichas et al. (2008) BMC Biol. 6:40; Provost et al. (2007) Genesis 45(10):625-629; Furler et al. (2001) Gene Ther. 8(11):864-873; incorporated herein by reference in their entireties.
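The layout of a 2A-linked multicistronic construct can be sketched at the protein level as follows. This is an illustrative toy model only: the three-residue open reading frames are placeholders rather than real Cas9, recombination protein, or retron reverse transcriptase sequences; the T2A sequence shown is the commonly cited one and should be verified against the chosen vector; and the model assumes complete separation between the final glycine and proline of each 2A peptide, which real constructs only approximate. Function names are hypothetical.

```python
# Commonly cited T2A peptide (Thosea asigna virus); an assumption to verify
# against the actual construct before use.
T2A = "EGRGSLLTCGDVEENPGP"


def build_polyprotein(orfs):
    """Join protein sequences with a T2A peptide between each adjacent pair."""
    return T2A.join(orfs)


def split_products(polyprotein):
    """Split a polyprotein at every T2A site, between the last Gly and Pro.

    Idealized model: each upstream product keeps the 2A residues up to
    '...NPG', and each downstream product starts with the 2A proline.
    """
    products = []
    rest = polyprotein
    while T2A in rest:
        cut = rest.index(T2A) + len(T2A) - 1  # index of the terminal Pro
        products.append(rest[:cut])
        rest = rest[cut:]
    products.append(rest)
    return products


if __name__ == "__main__":
    # Placeholder ORFs standing in for three co-expressed proteins.
    poly = build_polyprotein(["MKR", "MST", "MLV"])
    for product_seq in split_products(poly):
        print(product_seq)
```

In this model every product except the last carries a C-terminal 2A remnant and every product except the first begins with an extra proline, which is worth keeping in mind when deciding the order of the coding sequences.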

In some embodiments, the expression construct comprises a plasmid suitable for transforming a bacterial host. Numerous bacterial expression vectors are known to those of skill in the art, and the selection of an appropriate vector is a matter of choice. Bacterial expression vectors include, but are not limited to, pACYC177, pASK75, pBAD, pBADM, pBAT, pCal, pET, pETM, pGAT, pGEX, pHAT, pKK223, pMal, pProEx, pQE, and pZA31. Bacterial plasmids may contain antibiotic selection markers (e.g., ampicillin, kanamycin, erythromycin, carbenicillin, streptomycin, or tetracycline resistance), a lacZ gene (β-galactosidase produces a blue pigment from the X-gal substrate), fluorescent markers (e.g., GFP, mCherry), or other markers for selection of transformed bacteria. See, e.g., Sambrook et al., supra.

In other embodiments, the expression construct comprises a plasmid suitable for transforming a yeast cell. Yeast expression plasmids typically contain a yeast-specific origin of replication (ORI) and nutritional selection markers (e.g., HIS3, URA3, LYS2, LEU2, TRP1, MET15, ura4+, leu1+, ade6+), antibiotic selection markers (e.g., kanamycin resistance), fluorescent markers (e.g., mCherry), or other markers for selection of transformed yeast cells. The yeast plasmid may further contain components that allow shuttling between a bacterial host (e.g., E. coli) and yeast cells. A number of different types of yeast plasmids are available, including yeast integrating plasmids (YIp), which lack an ORI and are integrated into host chromosomes by homologous recombination; yeast replicating plasmids (YRp), which contain an autonomously replicating sequence (ARS) and can replicate independently; yeast centromere plasmids (YCp), which are low-copy vectors containing a part of an ARS and a part of a centromere sequence (CEN); and yeast episomal plasmids (YEp), which are high-copy-number plasmids comprising a fragment from the 2-micron circle (a natural yeast plasmid) that allows 50 or more copies to be stably propagated per cell.

In other embodiments, the expression construct does not comprise a plasmid suitable for transforming a yeast cell.

In other embodiments, the expression construct comprises a virus or an engineered construct derived from a viral genome. A number of viral-based systems have been developed for gene transfer into mammalian cells. These include adenoviruses, retroviruses (γ-retroviruses and lentiviruses), poxviruses, adeno-associated viruses, baculoviruses, and herpes simplex viruses (see, e.g., Warnock et al. (2011) Methods Mol. Biol. 737:1-25; Walther et al. (2000) Drugs 60(2):249-271; and Lundstrom (2003) Trends Biotechnol. 21(3):117-122; incorporated herein by reference in their entireties). The ability of certain viruses to enter cells via receptor-mediated endocytosis, to integrate into the host cell genome, and to express viral genes stably and efficiently has made them attractive candidates for the transfer of foreign genes into mammalian cells.

For example, retroviruses provide a convenient platform for gene delivery systems. Selected sequences can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to cells of the subject either in vivo or ex vivo. A number of retroviral systems have been described (U.S. Patent No. 5,219,740; Miller and Rosman (1989) BioTechniques 7:980-990; Miller, A. D. (1990) Human Gene Therapy 1:5-14; Scarpa et al. (1991) Virology 180:849-852; Burns et al. (1993) Proc. Natl. Acad. Sci. USA 90:8033-8037; Boris-Lawrie and Temin (1993) Curr. Opin. Genet. Develop. 3:102-109; and Ferry et al. (2011) Curr. Pharm. Des. 17(24):2516-2527). Lentiviruses are a class of retroviruses that are particularly useful for delivering polynucleotides to mammalian cells because they are able to infect both dividing and non-dividing cells (see, e.g., Lois et al. (2002) Science 295:868-872; Durand et al. (2011) Viruses 3(2):132-159; incorporated herein by reference).

A number of adenovirus vectors have also been described. Unlike retroviruses, which integrate into the host genome, adenoviruses persist extrachromosomally, thus minimizing the risks associated with insertional mutagenesis.

Additionally, various adeno-associated virus (AAV) vector systems have been developed for gene delivery. AAV vectors can be readily constructed using techniques well known in the art. See, e.g., U.S. Patent Nos. 5,173,414 and 5,139,941; International Publication Nos. WO 92/01070 (published January 23, 1992) and WO 93/03769 (published March 4, 1993); Lebkowski et al., Molec. Cell. Biol. (1988) 8:3988-3996; Vincent et al., Vaccines 90 (1990) (Cold Spring Harbor Laboratory Press); Carter, B. J., Current Opinion in Biotechnology (1992) 3:533-539; Muzyczka, N., Current Topics in Microbiol. and Immunol. (1992) 158:97-129; Kotin, R. M., Human Gene Therapy (1994) 5:793-801; Shelling and Smith, Gene Therapy (1994) 1:165-169; and Zhou et al., J. Exp. Med. (1994) 179:1867-1875.

Another vector system useful for delivering nucleic acids encoding components of the Cas12a editing system is the enterically administered recombinant poxvirus vaccine described by Small, Jr., P. A., et al. (U.S. Patent No. 5,676,950, issued October 14, 1997, incorporated herein by reference).

Additional viral vectors include those derived from the pox family of viruses, including vaccinia virus and avian poxvirus. By way of example, vaccinia virus recombinants expressing a nucleic acid molecule of interest (e.g., a Cas12a editing system) can be constructed as follows. The DNA encoding the particular nucleic acid sequence is first inserted into an appropriate vector so that it is adjacent to a vaccinia promoter and flanking vaccinia DNA sequences, such as the sequence encoding thymidine kinase (TK). This vector is then used to transfect cells that are simultaneously infected with vaccinia. Homologous recombination serves to insert the vaccinia promoter plus the gene encoding the sequences of interest into the viral genome. The resulting TK⁻ recombinant can be selected by culturing the cells in the presence of 5-bromodeoxyuridine and picking viral plaques resistant to it.

In some embodiments, avipoxviruses, such as the fowlpox and canarypox viruses, can also be used to deliver the nucleic acid molecules of interest. The use of an avipox vector is particularly desirable in human and other mammalian species, since members of the avipox genus can replicate productively only in susceptible avian species and therefore are not infective in mammalian cells. Methods for producing recombinant avipoxviruses are known in the art and employ genetic recombination, as described above with respect to the production of vaccinia viruses. See, e.g., WO 91/12882; WO 89/03429; and WO 92/03545.

Molecular conjugate vectors, such as the adenovirus chimeric vectors described in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.

Members of the alphavirus genus, such as, but not limited to, vectors derived from the Sindbis virus (SIN), Semliki Forest virus (SFV), and Venezuelan equine encephalitis virus (VEE), can also be used as viral vectors for delivering the polynucleotides of the present invention. For a description of Sindbis virus-derived vectors useful for practicing the present methods, see Dubensky et al. (1996) J. Virol. 70:508-519; International Publication Nos. WO 95/07995 and WO 96/17072; Dubensky, Jr., T. W., et al., U.S. Patent No. 5,843,723, issued December 1, 1998; and Dubensky, Jr., T. W., U.S. Patent No. 5,789,245, issued August 4, 1998, both of which are incorporated herein by reference. Particularly preferred are chimeric alphavirus vectors comprising sequences derived from Sindbis virus and Venezuelan equine encephalitis virus. See, e.g., Perri et al. (2003) J. Virol. 77:10394-10403 and International Publication Nos. WO 02/099035, WO 02/080982, WO 01/81609, and WO 00/61772; incorporated herein by reference in their entireties.

A vaccinia-based infection/transfection system can be conveniently used to provide for inducible, transient expression of the nucleic acids of interest (e.g., a Cas12a editing system) in a host cell. In this system, cells are first infected in vitro with a vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase. This polymerase displays exquisite specificity in that it transcribes only templates bearing T7 promoters. Following infection, cells are transfected with the nucleic acid of interest, driven by a T7 promoter. The polymerase expressed in the cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA. The method provides for high-level, transient, cytoplasmic production of large quantities of RNA. See, e.g., Elroy-Stein and Moss, Proc. Natl. Acad. Sci. USA (1990) 87:6743-6747; Fuerst et al., Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126.

As an alternative approach to infection with vaccinia or avipox virus recombinants, or to the delivery of nucleic acids using other viral vectors, an amplification system can be used that will lead to high-level expression following introduction into host cells. Specifically, a T7 RNA polymerase promoter preceding the coding region for T7 RNA polymerase can be engineered. Translation of RNA derived from this template will generate T7 RNA polymerase, which in turn will transcribe more templates. Concomitantly, there will be a cDNA whose expression is under the control of the T7 promoter. Thus, some of the T7 RNA polymerase generated from translation of the amplification template RNA will lead to transcription of the desired gene. Because some T7 RNA polymerase is required to initiate the amplification, T7 RNA polymerase can be introduced into cells along with the template(s) to prime the transcription reaction. The polymerase can be introduced as a protein or on a plasmid encoding the RNA polymerase. For a further discussion of T7 systems and their use for transforming cells, see, e.g., International Publication No. WO 94/26911; Studier and Moffatt, J. Mol. Biol. (1986) 189:113-130; Deng and Wolff, Gene (1994) 143:245-249; Gao et al., Biochem. Biophys. Res. Commun. (1994) 200:1201-1206; Gao and Huang, Nuc. Acids Res. (1993) 21:2867-2872; Chen et al., Nuc. Acids Res. (1994) 22:2114-2120; and U.S. Patent No. 5,135,855.

Insect cell expression systems, such as baculovirus systems, can also be used and are known to those of skill in the art. They are described in, e.g., Baculovirus and Insect Cell Expression Protocols (Methods in Molecular Biology, D. W. Murhammer, ed., Humana Press, 2nd edition, 2007) and L. King, The Baculovirus Expression System: A Laboratory Guide (Springer, 1992). Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia, Thermo Fisher Scientific (Waltham, MA) and Clontech (Mountain View, CA).

Plant expression systems can also be used for transforming plant cells. Generally, such systems use virus-based vectors to transfect plant cells with heterologous genes. For a description of such systems, see, e.g., Porta et al., Mol. Biotech. (1996) 5:209-221; and Hackland et al., Arch. Virol. (1994) 139:1-22.

In order to effect expression of a retron-based editing system (or components thereof), the expression construct and/or RNA components must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states. One mechanism for delivery is via viral infection, where the expression construct is encapsulated in an infectious viral particle.

Non-Viral Delivery Methods

Several non-viral methods for the transfer of expression constructs are also contemplated for delivering retron-based editing systems, or components thereof, into cells. These include the use of calcium phosphate precipitation, DEAE-dextran, electroporation, direct microinjection, DNA-loaded liposomes, lipofectamine-DNA complexes, cell sonication, gene bombardment using high-velocity microprojectiles, and receptor-mediated transfection (see, e.g., Graham and Van Der Eb (1973) Virology 52:456-467; Chen and Okayama (1987) Mol. Cell Biol. 7:2745-2752; Rippe et al. (1990) Mol. Cell Biol. 10:689-695; Gopal (1985) Mol. Cell Biol. 5:1188-1190; Tur-Kaspa et al. (1986) Mol. Cell. Biol. 6:716-718; Potter et al. (1984) Proc. Natl. Acad. Sci. USA 81:7161-7165; Harland and Weintraub (1985) J. Cell Biol. 101:1094-1099; Nicolau and Sene (1982) Biochim. Biophys. Acta 721:185-190; Fraley et al. (1979) Proc. Natl. Acad. Sci. USA 76:3348-3352; Fechheimer et al. (1987) Proc. Natl. Acad. Sci. USA 84:8463-8467; Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572; Wu and Wu (1987) J. Biol. Chem. 262:4429-4432; Wu and Wu (1988) Biochemistry 27:887-892; incorporated herein by reference). Some of these techniques may be successfully adapted for in vivo or ex vivo use.

In some embodiments, the nucleic acid molecule encoding the retron-based editing system, or components thereof, may be stably integrated into the genome of the cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement), or it may be at a random, non-specific location (gene augmentation). In other embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments, or episomes, encode sequences sufficient to permit maintenance and replication independent of, or in synchronization with, the host cell cycle. How the expression construct is delivered to a cell, and where in the cell the nucleic acid remains, depends on the type of expression construct employed.

In some embodiments, the expression construct encoding the retron-based editing system, or components thereof, may simply consist of naked recombinant DNA or plasmids comprising nucleotide sequences encoding the retron-based editing system or components thereof. Transfer of the construct may be performed by any of the methods mentioned above that physically or chemically permeabilize the cell membrane. This is particularly applicable for transfer in vitro, but it may be applied to in vivo use as well. Dubensky et al. (Proc. Natl. Acad. Sci. USA (1984) 81:7529-7533) successfully injected polyomavirus DNA in the form of calcium phosphate precipitates into the liver and spleen of adult and newborn mice, demonstrating active viral replication and acute infection. Benvenisty and Reshef (Proc. Natl. Acad. Sci. USA (1986) 83:9551-9555) also demonstrated that direct intraperitoneal injection of calcium phosphate-precipitated plasmids results in expression of the transfected genes. It is envisioned that DNA encoding a retron-based editing system of interest, or components thereof, may also be transferred in a similar manner in vivo and express the retron products.

In another embodiment, a DNA expression construct encoding the retron-based editing system, or components thereof, may be transferred into cells by particle bombardment. This method depends on the ability to accelerate DNA-coated microprojectiles to a high velocity, allowing them to pierce cell membranes and enter cells without killing them (Klein et al. (1987) Nature 327:70-73). Several devices for accelerating small particles have been developed. One such device relies on a high-voltage discharge to generate an electrical current, which in turn provides the motive force (Yang et al. (1990) Proc. Natl. Acad. Sci. USA 87:9568-9572). The microprojectiles may consist of biologically inert substances, such as tungsten or gold beads.

Liposomes

In another embodiment, constructs encoding the retron-based editing system, or components thereof, may be delivered using liposomes. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh and Bachhawat (1991) Liver Diseases, Targeted Diagnosis and Therapy Using Specific Receptors and Ligands, Wu et al. (eds.), Marcel Dekker, NY, 87-104). The use of lipofectamine-DNA complexes is also contemplated.

In some embodiments, the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and to promote cell entry of liposome-encapsulated DNA (Kaneda et al. (1989) Science 243:375-378). In other embodiments, the liposome may be complexed or used in combination with nuclear non-histone chromosomal proteins (HMG-I) (Kato et al. (1991) J. Biol. Chem. 266(6):3361-3364).

In other embodiments, the liposome may be complexed or used in combination with both HVJ and HMG-I. Where a bacterial promoter is used in the DNA construct, it will also be necessary to include an appropriate bacterial polymerase within the liposome.

Liposomes can be made from several different types of lipids (e.g., phospholipids). A liposome may comprise natural phospholipids and lipids such as 1,2-distearoyl-sn-glycero-3-phosphatidylcholine (DSPC), sphingomyelin, egg phosphatidylcholine, monosialoganglioside, or any combination thereof.

Several other additives may be added to liposomes in order to modify their structure and properties. For example, liposomes may further comprise cholesterol, sphingomyelin, and/or 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), e.g., to increase stability and/or to prevent leakage of the cargo held inside the liposome.

In one embodiment, the liposome comprises a transport polymer, which may optionally be branched, comprising at least 10 amino acids and having a ratio of histidine to non-histidine amino acids greater than 1.5 and less than 10. The branched transport polymer may comprise one or more backbones, one or more terminal branches, and optionally one or more non-terminal branches. See U.S. Patent No. 7,070,807, incorporated herein by reference in its entirety. In one embodiment, the transport polymer is a histidine-lysine copolymer (HKP), which is used to package and deliver mRNA and other cargoes. See U.S. Patent Nos. 7,163,695 and 7,772,201, incorporated herein by reference in their entirety.
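The two numeric criteria recited above for the transport polymer (at least 10 amino acids; histidine to non-histidine ratio greater than 1.5 and less than 10) can be illustrated with a minimal sketch. The function names, the one-letter sequence representation, and the example sequences are assumptions made for illustration only; they are not taken from the cited patents, and the sketch does not model branching.

```python
def histidine_ratio(sequence: str) -> float:
    """Return (# histidine residues) / (# non-histidine residues).

    `sequence` is a one-letter amino acid string (hypothetical input format).
    Returns float('inf') when every residue is histidine.
    """
    seq = sequence.upper()
    his = seq.count("H")
    non_his = len(seq) - his
    return float("inf") if non_his == 0 else his / non_his


def meets_transport_polymer_criteria(sequence: str) -> bool:
    """Check the two numeric criteria: length >= 10 and 1.5 < ratio < 10."""
    if len(sequence) < 10:
        return False
    return 1.5 < histidine_ratio(sequence) < 10


# Illustrative sequences only (not from the source):
print(meets_transport_polymer_criteria("KHHHKHHHKHHHKHHHK"))  # 12 H / 5 K = 2.4
print(meets_transport_polymer_criteria("KHKHKHKHKHKH"))       # 6 H / 6 K = 1.0
```

A sequence such as the first one satisfies both criteria, whereas an alternating His-Lys sequence has a ratio of 1.0 and falls outside the recited range.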

In one embodiment, the lipid particle may be a stable nucleic acid lipid particle (SNALP). A SNALP may comprise an ionizable lipid (DLinDMA) (e.g., cationic at low pH), a neutral helper lipid, cholesterol, a diffusible polyethylene glycol (PEG)-lipid, or any combination thereof. In some examples, a SNALP may comprise synthetic cholesterol, dipalmitoylphosphatidylcholine, 3-N-[(ω-methoxy poly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxypropylamine, and cationic 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane. In some examples, a SNALP may comprise synthetic cholesterol, 1,2-distearoyl-sn-glycero-3-phosphocholine, PEG-eDMA, and 1,2-dilinoleyloxy-3-(N,N-dimethyl)aminopropane (DLinDMA).

Polymer-Based Vehicles

In one embodiment, the retron-based editing system or components thereof may be encapsulated by a delivery vehicle comprising polymer-based particles (e.g., nanoparticles). In one embodiment, the polymer-based particles may mimic a viral mechanism of membrane fusion. The polymer-based particles may be a synthetic copy of the influenza virus machinery and form transfection complexes with various types of nucleic acids (siRNA, miRNA, plasmid DNA or nucleic acid components, mRNA) that cells take up via the endocytic pathway, a process that involves the formation of an acidic compartment. The low pH in late endosomes acts as a chemical switch that renders the particle surface hydrophobic and facilitates membrane crossing. Once in the cytosol, the particle releases its payload for cellular action. This active endosome escape technology is described as safe and maximizes transfection efficiency because it uses a natural uptake pathway. In one embodiment, the polymer-based particles may comprise alkylated and carboxyalkylated branched polyethylenimine. In some examples, the polymer-based particles are VIROMER, e.g., VIROMER RNAi, VIROMER RED, VIROMER mRNA. Exemplary methods of delivering the systems and compositions herein include those described in: Bawage SS et al., Synthetic mRNA expressed Cas13a mitigates RNA virus infections, biorxiv.org/content/10.1101/370460v1, full doi: doi.org/10.1101/370460; Viromer® RED, a powerful tool for transfection of keratinocytes, doi: 10.13140/RG.2.2.16993.61281; and Viromer® Transfection - Factbook 2018: technology, product overview, users' data, doi: 10.13140/RG.2.2.23912.16642.

Exosomes

The delivery vehicle may comprise exosomes. Exosomes include membrane-bound extracellular vesicles, which can be used to contain and deliver various types of biomolecules, such as proteins, carbohydrates, lipids, and nucleic acids, and complexes thereof (e.g., RNPs). Examples of exosomes include those described in Schroeder A et al., J Intern Med. 2010 Jan;267(1):9-21; El-Andaloussi S et al., Nat Protoc. 2012 Dec;7(12):2112-26; Uno Y et al., Hum Gene Ther. 2011 Jun;22(6):711-9; and Zou W et al., Hum Gene Ther. 2011 Apr;22(4):465-75. Exemplary exosomes can be generated from 293F cells, where, in some cases, mRNA-loaded exosomes drive higher mRNA expression than mRNA-loaded LNPs. See, e.g., J. Biol. Chem. (2021) 297(5) 101266.

In some examples, the exosome may form a complex (e.g., by binding directly or indirectly) with one or more components of the cargo. In certain examples, a molecule of the exosome may be fused with a first adapter protein and a component of the cargo may be fused with a second adapter protein. The first and second adapter proteins may specifically bind each other, thus associating the cargo with the exosome. Examples of such exosomes include those described in Ye Y et al., Biomater Sci. 2020 Apr 28. doi: 10.1039/d0bm00427h.

Receptor-Mediated Delivery

Other vehicles for delivering expression constructs encoding retron-based editing systems or components thereof are receptor-mediated delivery vehicles. These take advantage of the selective uptake of macromolecules by receptor-mediated endocytosis, which occurs in almost all eukaryotic cells. Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and Wu (1993) Adv. Drug Delivery Rev. 12:159-167). Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor-specific ligand and a DNA-binding agent. Several ligands have been used for receptor-mediated gene transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) and transferrin (see, e.g., Wu and Wu (1987), supra; Wagner et al. (1990) Proc. Natl. Acad. Sci. USA 87(9):3410-3414). A synthetic neoglycoprotein that recognizes the same receptor as ASOR has been used as a gene delivery vehicle (Ferkol et al. (1993) FASEB J. 7:1081-1091; Perales et al. (1994) Proc. Natl. Acad. Sci. USA 91(9):4086-4090), and epidermal growth factor (EGF) has also been used to deliver genes to squamous carcinoma cells (Myers, EPO 0273085).

In other embodiments, a delivery vehicle comprising one or more expression constructs encoding retron-based editing systems or components thereof may comprise a ligand and a liposome. For example, Nicolau et al. (Methods Enzymol. (1987) 149:157-176) used lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and observed increased uptake of the insulin gene by hepatocytes. Thus, it is feasible that a nucleic acid encoding a particular gene may also be specifically delivered into a cell by any number of receptor-ligand systems, with or without liposomes. In addition, antibodies to cell surface antigens can similarly be used as targeting moieties.

In some embodiments, a promoter that may be used with the retron-based editing systems described herein, or components thereof, may be constitutive, inducible, or tissue-specific. In some embodiments, the promoter may be a constitutive promoter. Non-limiting exemplary constitutive promoters include the cytomegalovirus immediate early promoter (CMV), simian virus (SV40) promoter, adenovirus major late (MLP) promoter, Rous sarcoma virus (RSV) promoter, mouse mammary tumor virus (MMTV) promoter, phosphoglycerate kinase (PGK) promoter, elongation factor-alpha (EF1a) promoter, ubiquitin promoter, actin promoter, tubulin promoter, immunoglobulin promoter, a functional fragment thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1a promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as the Tet-On® promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is exclusively or predominantly expressed in liver tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-b promoter, Mb promoter, Nphsl promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

Lipid Nanoparticles (LNP)

The retron-based editing systems described herein, or components thereof (e.g., linear and circular mRNA; nucleobase editing systems and/or components thereof), may be packaged and delivered by lipid nanoparticles (LNPs) and by compositions and/or formulations comprising RNA-encapsulating LNPs.

Described below are LNPs that can be used as the payload delivery vehicles contemplated herein, together with the various ionizable lipids, structural lipids, PEGylated lipids, and phospholipids that can be used to prepare the LNPs used herein to deliver payloads to cells. Additional contemplated LNP components, such as targeting moieties and other lipid components, are also described below.

In one aspect, the present disclosure further provides delivery systems for delivering the therapeutic payloads disclosed herein (e.g., an RNA payload described herein, which may encode a polypeptide of interest, such as a nucleobase editing system or a therapeutic protein). In some embodiments, a delivery system suitable for delivering the therapeutic payloads disclosed herein comprises a lipid nanoparticle (LNP) formulation.

In some embodiments, an LNP of the present disclosure comprises an ionizable lipid, a structural lipid, a PEGylated lipid (also known as a PEG lipid), and a phospholipid. In alternative embodiments, an LNP comprises an ionizable lipid, a structural lipid, a PEGylated lipid, and a zwitterionic amino acid lipid. In some embodiments, the LNP further comprises a fifth lipid in addition to any of the aforementioned lipid components. In some embodiments, the LNP encapsulates one or more elements of the active agent of the present disclosure. In some embodiments, the LNP further comprises a targeting moiety covalently or non-covalently bound to the outer surface of the LNP. In some embodiments, the targeting moiety binds to, or otherwise facilitates uptake by, cells of a particular organ system.

In some embodiments, the LNP has a diameter of at least about 20 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, or 90 nm. In some embodiments, the LNP has a diameter of less than about 100 nm, 110 nm, 120 nm, 130 nm, 140 nm, 150 nm, or 160 nm. In some embodiments, the LNP has a diameter of less than about 100 nm. In some embodiments, the LNP has a diameter of less than about 90 nm. In some embodiments, the LNP has a diameter of less than about 80 nm. In some embodiments, the LNP has a diameter of about 60-100 nm. In some embodiments, the LNP has a diameter of about 75-80 nm.

In some embodiments, the lipid nanoparticle compositions of the present disclosure are described according to the respective molar ratios of the component lipids in the formulation. As non-limiting examples, the mol-% of the ionizable lipid may be from about 10 mol-% to about 80 mol-%, from about 20 mol-% to about 70 mol-%, from about 30 mol-% to about 60 mol-%, from about 35 mol-% to about 55 mol-%, or from about 40 mol-% to about 50 mol-%.

In some embodiments, the mol-% of the phospholipid may be from about 1 mol-% to about 50 mol-%, from about 2 mol-% to about 45 mol-%, from about 3 mol-% to about 40 mol-%, from about 4 mol-% to about 35 mol-%, from about 5 mol-% to about 30 mol-%, from about 10 mol-% to about 20 mol-%, or from about 5 mol-% to about 20 mol-%.

In some embodiments, the mol-% of the structural lipid may be from about 10 mol-% to about 80 mol-%, from about 20 mol-% to about 70 mol-%, from about 30 mol-% to about 60 mol-%, from about 35 mol-% to about 55 mol-%, or from about 40 mol-% to about 50 mol-%.
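As a minimal sketch, a candidate four-component formulation can be checked against the broadest molar-percentage ranges recited in this section (ionizable lipid about 10 to 80 mol-%, phospholipid about 1 to 50 mol-%, structural lipid about 10 to 80 mol-%, and PEG lipid about 0.1 to 10 mol-%, the last of which is stated in the paragraph that follows). The function name, the dictionary keys, the treatment of "about" as hard bounds, the requirement that the four components sum to 100 mol-%, and the example values are all assumptions made for illustration; formulations with a fifth lipid or a zwitterionic amino acid lipid are not modeled.

```python
# Broadest ranges recited in this section, in mol-%.
# Assumption: "about" is treated as a hard bound for simplicity.
BROAD_RANGES = {
    "ionizable": (10.0, 80.0),
    "phospholipid": (1.0, 50.0),
    "structural": (10.0, 80.0),
    "peg": (0.1, 10.0),
}


def check_composition(mol_percent: dict, tol: float = 1e-6) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for name, (low, high) in BROAD_RANGES.items():
        if name not in mol_percent:
            problems.append(f"missing component: {name}")
            continue
        value = mol_percent[name]
        if not (low <= value <= high):
            problems.append(f"{name} = {value} mol-% is outside {low}-{high} mol-%")
    # Assumption: the four listed components make up the whole lipid mixture.
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"components sum to {total} mol-%, not 100 mol-%")
    return problems


# Illustrative values only (not taken from the source):
example = {"ionizable": 50.0, "phospholipid": 10.0, "structural": 38.5, "peg": 1.5}
print(check_composition(example))
```

Narrower embodiments (for example, ionizable lipid at about 40 to 50 mol-%) could be checked the same way by substituting the corresponding bounds.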

In some embodiments, the mol-% of the PEG lipid may be from about 0.1 mol-% to about 10 mol-%, from about 0.2 mol-% to about 5 mol-%, from about 0.5 mol-% to about 3 mol-%, or from about 1 mol-% to about 2 mol-%. In some embodiments, the mol-% of the PEG lipid may be about 1.5 mol-%. In some embodiments, the mol-% of the PEG lipid may be about 2.5 mol-%.

i. Ionizable Lipids

In some embodiments, an LNP disclosed herein comprises an ionizable lipid. In some embodiments, an LNP comprises two or more ionizable lipids.

Various exemplary ionizable lipids of the present disclosure are described below.

In some embodiments, an LNP of the present disclosure comprises an ionizable lipid disclosed in one of the following: US 2023/0053437; US 2019/0240354; US 2010/0130588; US 2021/0087135; WO 2021/204179; US 2021/0128488; US 2020/0121809; US 2017/0119904; US 2013/0108685; US 2013/0195920; US 2015/0005363; US 2014/0308304; US 2013/0053572; WO 2019/232095A1; WO 2021/077067; WO 2019/152557; US 2017/0210697; or WO 2019/089828A1, each of which is incorporated herein by reference in its entirety.

In some embodiments, an LNP described herein comprises a lipid, e.g., an ionizable lipid, disclosed in U.S. Application Publication US2017/0119904, which is incorporated herein by reference in its entirety.

In some embodiments, an LNP described herein comprises a lipid, e.g., an ionizable lipid, disclosed in PCT Application Publication WO2021/204179, which is incorporated herein by reference in its entirety.

In some embodiments, an LNP described herein comprises a lipid, e.g., an ionizable lipid, disclosed in PCT Application WO2022/251665A1, which is incorporated herein by reference in its entirety. In some embodiments, an LNP described herein comprises an ionizable lipid of Table Z.

Table Z. Exemplary ionizable lipids: compounds L-1, L-2, L-3, L-4, L-5, L-6, L-7, L-8, L-9, and L-10 [structure drawings not reproduced].

In some embodiments, the ionizable lipid is MC3.

In some embodiments, an LNP of the present disclosure comprises an ionizable lipid disclosed in PCT Application Publication WO2023044343A1, which is incorporated herein by reference in its entirety.

Formula (VII-A)

In some embodiments, a lipid of the present disclosure has the structure of Formula (VII-A): [structure not reproduced] (VII-A), or a pharmaceutically acceptable salt thereof, wherein:
A is -N(-X1R1)-, -C(R')(-L1-N(R")R6)-, -C(R')(-OR7a)-, -C(R')(-N(R")R8a)-, -C(R')(-C(=O)OR9a)-, -C(R')(-C(=O)N(R")R10a)-, or -C(=N-R11a)-;
T is -X2a-Y1a-Q1a or -X3-C(=O)OR4;
X1 is optionally substituted C2-C6 alkylene;
R1 is -OH, -R1a, or [structures not reproduced];
Z1 is optionally substituted C1-C6 alkyl;
Z1a is hydrogen or optionally substituted C1-C6 alkyl;
X2 and X2a are independently optionally substituted C2-C14 alkylene or optionally substituted C2-C14 alkenylene;
X3 is optionally substituted C2-C14 alkylene or optionally substituted C2-C14 alkenylene;
(i) Y1 is [structures not reproduced], wherein the bond marked with "*" is attached to X2;
Y1a is [structures not reproduced], wherein the bond marked with "*" is attached to X2a;
each Z2 is independently H or optionally substituted C1-C8 alkyl;
each Z3 is independently optionally substituted C1-C6 alkylene;
Q1 is -NR2R3, -CH(OR2)(OR3), -CR2=C(R3)(R12), or -C(R2)(R3)(R12);
Q1a is -NR2'R3', -CH(OR2')(OR3'), -CR2=C(R3)(R12), or -C(R2')(R3')(R12'); or
(ii) Y1 is [structures not reproduced], wherein the bond marked with "*" is attached to X2;
Y1a is [structures not reproduced], wherein the bond marked with "*" is attached to X2a;
each Z2 is independently H or optionally substituted C1-C8 alkyl;
each Z3 is independently optionally substituted C1-C6 alkylene;
Q1 is -NR2R3;
Q1a is -NR2'R3';
R2, R3, and R12 are independently hydrogen, optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenylene, or -(CH2)m-G-(CH2)nH;
R2', R3', and R12' are independently hydrogen, optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenylene, or -(CH2)m-G-(CH2)nH;
G is C3-C8 cycloalkylene;
each m is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12;
each n is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12;
X3 is optionally substituted C2-C14 alkylene;
R4 is optionally substituted C4-C14 alkyl;
L1 is C1-C8 alkylene;
R6 is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R7a is -C(=O)N(R'")R7b, -C(=S)N(R'")R7b, -N=C(R7b)(R7c), or [structure not reproduced];
R7b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R7c is hydrogen or C1-C6 alkyl;
R8a is -C(=O)N(R'")R8b, -C(=S)N(R'")R8b, -N=C(R8b)(R8c), or [structure not reproduced];
R8b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R8c is hydrogen or C1-C6 alkyl;
R9a is -N=C(R9b)(R9c);
R9b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R9c is hydrogen or C1-C6 alkyl;
R10a is -N=C(R10b)(R10c);
R10b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R10c is hydrogen or C1-C6 alkyl;
R11a is -OR11b, -N(R")R11b, -OC(=O)R11b, or -N(R")C(=O)R11b;
R11b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R' is hydrogen or C1-C6 alkyl;
R" is hydrogen or C1-C6 alkyl; and
R'" is hydrogen or C1-C6 alkyl.

Formula (VIII-A)

In some embodiments, a lipid of the present disclosure having the structure of Formula (VII-A) has the structure of Formula (VIII-A): [structure not reproduced] (VIII-A), or a pharmaceutically acceptable salt thereof.

Formula (VII-B)

In some embodiments, a lipid of the present disclosure has the structure of Formula (VII-B): [structure not reproduced] (VII-B), or a pharmaceutically acceptable salt thereof, wherein:
A is -C(R')(-L1-N(R")R6)-, -C(R')(-OR7a)-, -C(R')(-N(R")R8a)-, -C(R')(-C(=O)OR9a)-, -C(R')(-C(=O)N(R")R10a)-, or -C(=N-R11a)-;
T is -X2a-Y1a-Q1a or -X3-C(=O)OR4;
X2 and X2a are independently optionally substituted C2-C14 alkylene or optionally substituted C2-C14 alkenylene;
X3 is optionally substituted C1-C14 alkylene or optionally substituted C2-C14 alkenylene;
Y1 is [structures not reproduced], wherein the bond marked with "*" is attached to X2;
Y1a is [structures not reproduced], wherein the bond marked with "*" is attached to X2a;
each Z3 is independently optionally substituted C1-C6 alkylene or optionally substituted C2-C14 alkenylene;
Q1 is -NR2R3, -CH(OR2)(OR3), -CR2=C(R3)(R12), or -C(R2)(R3)(R12);
Q1a is -NR2'R3', -CH(OR2')(OR3'), -CR2=C(R3)(R12), or -C(R2')(R3')(R12');
R2, R3, and R12 are independently hydrogen, optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenylene, or -(CH2)m-G-(CH2)nH;
R2', R3', and R12' are independently hydrogen, optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenylene, or -(CH2)m-G-(CH2)nH;
G is C3-C8 cycloalkylene;
each m is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12;
each n is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12;
X3 is optionally substituted C2-C14 alkylene;
R4 is optionally substituted C4-C14 alkyl;
L1 is C1-C8 alkylene;
R6 is (hydroxy)C1-C6 alkyl or (amino)C1-C6 alkyl;
R7a is -C(=O)N(R'")R7b, -C(=S)N(R'")R7b, -N=C(R7b)(R7c), or [structures not reproduced];
Z1 is optionally substituted C1-C6 alkyl;
R10 is C1-C6 alkylene;
R7b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R7c is hydrogen or C1-C6 alkyl;
R8a is -C(=O)N(R'")R8b, -C(=S)N(R'")R8b, -N=C(R8b)(R8c), or [structures not reproduced];
R8b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R8c is hydrogen or C1-C6 alkyl;
R9a is -N=C(R9b)(R9c);
R9b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R9c is hydrogen or C1-C6 alkyl;
R10a is -N=C(R10b)(R10c);
R10b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R10c is hydrogen or C1-C6 alkyl;
R11a is -OR11b, -N(R")R11b, -OC(=O)R11b, or -N(R")C(=O)R11b;
R11b is C1-C6 alkyl, (hydroxy)C1-C6 alkyl, or (amino)C1-C6 alkyl;
R' is hydrogen or C1-C6 alkyl;
R" is hydrogen or C1-C6 alkyl; and
R'" is hydrogen or C1-C6 alkyl.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(R ')(-L 1-N(R")R 6)-。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein A is -C(R ' )(- L1 -N(R") R6 )-.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(R')(-OR 7a)-。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein A is -C(R')(-OR 7a )-.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(R')(-N(R")R 8a)。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein A is -C(R')(-N(R")R 8a ).

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(R')(-C(=O)OR 9a)。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein A is -C(R')(-C(=O)OR 9a ).

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(R')(-C(=O)N(R")R 10a)-。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein A is -C(R')(-C(=O)N(R")R 10a )-.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中A為-C(=N-R 11a)-。 In some embodiments, the lipids of the present disclosure have a structure of Formula (VII-B), wherein A is -C(=NR 11a )-.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中T為-X 2a-Y 1a-Q 1a。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein T is -X 2a -Y 1a -Q 1a .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中T為-X 3-C(=O)OR 4。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein T is -X 3 -C(=O)OR 4 .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中X 2及/或X 2a為視情況經取代之C 2-C 14伸烷基(例如,C 2-C 10伸烷基、C 2-C 8伸烷基、C 2、C 3、C 4、C 5、C 6、C 7或C 8伸烷基)。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中X 2為C 2-C 14伸烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中X 2a為C 2-C 14伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein X 2 and/or X 2a is an optionally substituted C 2 -C 14 alkylene (e.g., C 2 -C 10 alkylene, C 2 -C 8 alkylene, C 2 , C 3 , C 4 , C 5 , C 6 , C 7 or C 8 alkylene). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein X 2 is a C 2 -C 14 alkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein X 2a is a C 2 -C 14 alkylene.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1及/或Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 and/or Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1及/或Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 and/or Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1及/或Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 and/or Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1及/或Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 and/or Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1 is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Y 1a為 。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein Y 1a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Q 1及/或Q 1a為-C(R 2')(R 3')(R 12')。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Q 1為-C(R 2')(R 3')(R 12')。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中Q 1a為-C(R 2')(R 3')(R 12')。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein Q 1 and/or Q 1a is -C(R 2' )(R 3' )(R 12' ). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein Q 1 is -C(R 2' )(R 3' )(R 12' ). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein Q 1a is -C(R 2' )(R 3' )(R 12' ).

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中X 3為視情況經取代之C 1-C 14伸烷基(例如,C 1-C 6、C 1-C 4伸烷基)。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中X 3為C 1-C 14伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein X 3 is an optionally substituted C 1 -C 14 alkylene group (e.g., C 1 -C 6 , C 1 -C 4 alkylene group). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein X 3 is a C 1 -C 14 alkylene group.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2、R 3、R 12、R 2'、R 3'及/或R 12’為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 3為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 12為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2’為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 3’為氫。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 12’為氫。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2 , R 3 , R 12 , R 2' , R 3' and/or R 12' are hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2 is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 3 is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 12 is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2' is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 3' is hydrogen. In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein R 12′ is hydrogen.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2、R 3、R 12、R 2'、R 3'及/或R 12'為視情況經取代之C 1-C 14烷基(例如,C 4-C 10烷基、C 5、C 6、C 7、C 8、C 9烷基)。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2為C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 3為C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 12為C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 2’為C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 3’為C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 12’為C 4-C 10烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2 , R 3 , R 12 , R 2' , R 3' and/or R 12' are optionally substituted C 1 -C 14 alkyl groups (e.g., C 4 -C 10 alkyl groups, C 5 , C 6 , C 7 , C 8 , C 9 alkyl groups). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2 is a C 4 -C 10 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 3 is a C 4 -C 10 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 12 is a C 4 -C 10 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 2' is a C 4 -C 10 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 3' is a C 4 -C 10 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 12' is a C 4 -C 10 alkyl group.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 4為視情況經取代之C 4-C 14烷基(例如,C 8-C 14烷基、直鏈C 8-C 14烷基、C 8、C 9、C 10、C 11、C 12、C 13或C 14烷基)。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 4為直鏈C 8-C 14烷基。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 4為直鏈C 11烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 4 is an optionally substituted C 4 -C 14 alkyl group (e.g., C 8 -C 14 alkyl group, linear C 8 -C 14 alkyl group, C 8 , C 9 , C 10 , C 11 , C 12 , C 13 or C 14 alkyl group). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 4 is a linear C 8 -C 14 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 4 is a linear C 11 alkyl group.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中L 1為C 1-C 3伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein L 1 is a C 1 -C 3 alkylene group.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 6為(羥基)C 1-C 6烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 6 is (hydroxy) C 1 -C 6 alkyl.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為 或 。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為 。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為 。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 7a is or . In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein R 7a is . In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein R 7a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a選自由-C(=O)N(R'")R 7b、-C(=S)N(R'")R 7b及-N=C(R 7b)(R 7c)組成之群。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為-C(=O)N(R'")R 7b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為-C(=S)N(R'")R 7b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 7a為-N=C(R 7b)(R 7c)。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 7a is selected from the group consisting of -C(=O)N(R'")R 7b , -C(=S)N(R'")R 7b , and -N=C(R 7b )(R 7c ). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 7a is -C(=O)N(R'")R 7b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 7a is -C(=S)N(R'")R 7b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 7a is -N=C(R 7b )(R 7c ).

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 8a選自由-C(=O)N(R'")R 8b、-C(=S)N(R'")R 8b及-N=C(R 8b)(R 8c)組成之群。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 8a為-C(=O)N(R'")R 8b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 8a為-C(=S)N(R'")R 8b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 8a為-N=C(R 8b)(R 8c)。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 8a is selected from the group consisting of -C(=O)N(R'")R 8b , -C(=S)N(R'")R 8b , and -N=C(R 8b )(R 8c ). In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 8a is -C(=O)N(R'")R 8b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 8a is -C(=S)N(R'")R 8b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 8a is -N=C(R 8b )(R 8c ).

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 8a為 。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 8a is .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 9b為(羥基)C 1-C 6烷基。 In some embodiments, the lipids of the present disclosure have the structure of Formula (VII-B), wherein R 9b is (hydroxy)C 1 -C 6 alkyl.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 10b為(胺基)C 1-C 6烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 10b is (amino) C 1 -C 6 alkyl.

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-OR 11b或-OC(=O)R 11b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-OR 11b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-OC(=O)R 11b。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -OR 11b or -OC(=O)R 11b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -OR 11b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -OC(=O)R 11b .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-N(R")R 11b或-N(R")C(=O)R 11b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-N(R")R 11b。在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11a為-N(R")C(=O)R 11b。 In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -N(R")R 11b or -N(R")C(=O)R 11b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -N(R")R 11b . In some embodiments, the lipids of the present disclosure have a structure of formula (VII-B), wherein R 11a is -N(R")C(=O)R 11b .

在一些實施例中,本揭示案之脂質具有式(VII-B)之結構,其中R 11b為(胺基)C 1-C 6烷基。 In some embodiments, the lipid of the present disclosure has a structure of formula (VII-B), wherein R 11b is (amino) C 1 -C 6 alkyl.

式(III-C) Formula (III-C)

在一些實施例中,本揭示案之脂質具有式(III-C)之結構: (III-C), 或其醫藥學上可接受之鹽,其中 R 20為C 1-C 6伸烷基-NR 20'C(O)OR 20''; R 20'為氫或視情況經取代之C 1-C 6烷基; R 20''為視情況經取代之C 1-C 6烷基、苯基或苄基; Z 1為視情況經取代之C 1-C 6烷基; X 2及X 2a獨立地為視情況經取代之C 2-C 14伸烷基; Y 1及Y 1a獨立地為 ; 其中標有「*」之鍵連接至X 2或X 2a; Z 3獨立地為視情況經取代之C 2-C 6伸烷基; R 2及R 3獨立地為視情況經取代之C 4-C 14烷基;且 R 2'及R 3'獨立地為視情況經取代之C 4-C 14烷基。 In some embodiments, the lipids of the present disclosure have the structure of formula (III-C): (III-C), or a pharmaceutically acceptable salt thereof, wherein R 20 is C 1 -C 6 alkylene-NR 20' C(O)OR 20'' ; R 20' is hydrogen or an optionally substituted C 1 -C 6 alkyl group; R 20'' is an optionally substituted C 1 -C 6 alkyl group, phenyl or benzyl; Z 1 is an optionally substituted C 1 -C 6 alkyl group; X 2 and X 2a are independently an optionally substituted C 2 -C 14 alkylene group; Y 1 and Y 1a are independently or ; wherein the bond marked with "*" is connected to X 2 or X 2a ; Z 3 is independently an optionally substituted C 2 -C 6 alkylene group; R 2 and R 3 are independently an optionally substituted C 4 -C 14 alkyl group; and R 2' and R 3' are independently an optionally substituted C 4 -C 14 alkyl group.

在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R 20為-CH 2CH 2CH 2NHC(O)O-第三丁基或-CH 2CH 2CH 2NHC(O)O-苄基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R 20為-CH 2CH 2CH 2NHC(O)O-第三丁基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R 20為-CH 2CH 2CH 2NHC(O)O-苄基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R 20 is -CH 2 CH 2 CH 2 NHC(O)O-tert-butyl or -CH 2 CH 2 CH 2 NHC(O)O-benzyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R 20 is -CH 2 CH 2 CH 2 NHC(O)O-tert-butyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R 20 is -CH 2 CH 2 CH 2 NHC(O)O-benzyl.

在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中X 2及X 2a獨立地為C 4-C 8伸烷基(例如,C 5、C 6、C 7伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中X 2為C 6烷基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中X 2a為C 6烷基 In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein X 2 and X 2a are independently C 4 -C 8 alkylene (e.g., C 5 , C 6 , C 7 alkylene). In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein X 2 is a C 6 alkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein X 2a is a C 6 alkylene.

在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中Y 1及Y 1a,其中Z 3為C 2-C 4伸烷基(例如,C 2伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中Y 1,其中Z 3為C 2-C 4伸烷基(例如,C 2伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中Y 1a,其中Z 3為C 2-C 4伸烷基(例如,C 2伸烷基)。 In some embodiments, the lipid of the present disclosure has a structure of formula (III-C), wherein Y 1 and Y 1a are , wherein Z 3 is C 2 -C 4 alkylene (e.g., C 2 alkylene). In some embodiments, the lipid of the present disclosure has a structure of formula (III-C), wherein Y 1 is , wherein Z 3 is C 2 -C 4 alkylene (e.g., C 2 alkylene). In some embodiments, the lipid of the present disclosure has a structure of formula (III-C), wherein Y 1a is , wherein Z 3 is C 2 -C 4 alkylene (eg, C 2 alkylene).

在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R2、R3、R2'及R3’獨立地為視情況經取代之C4-C10烷基(例如,C6-C9烷基、C6、C7、C8、C9烷基)。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R2為C6-C9烷基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R3為C6-C9烷基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R 2’為C 6-C 9烷基。在一些實施例中,本揭示案之脂質具有式(III-C)之結構,其中R 3’為C 6-C 9烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R2, R3, R2' and R3' are independently optionally substituted C4-C10 alkyl (e.g., C6-C9 alkyl, C6, C7, C8, C9 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R2 is a C6-C9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R3 is a C6-C9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R 2' is a C 6 -C 9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-C), wherein R 3' is a C 6 -C 9 alkyl.

式(III-D) Formula (III-D)

在一些實施例中,本揭示案之脂質具有式(III-D)之結構: (III-D), 或其醫藥學上可接受之鹽,其中 R 1為-OH; X 1為視情況經取代之C 4伸烷基; X 2及X 2a獨立地為視情況經取代之C 2-C 14伸烷基; Y 1及Y 1a獨立地為 ; Z 3獨立地為視情況經取代之C 2-C 6伸烷基; R 2及R 3獨立地為視情況經取代之C 4-C 14烷基或經視情況經取代之環丙基取代的C 1-C 2烷基;或 R 2及R 3獨立地為視情況經取代之C 4-C 14烷基或經視情況經取代之環丙基取代的C 1-C 2烷基。 In some embodiments, the lipids of the present disclosure have the structure of formula (III-D): (III-D), or a pharmaceutically acceptable salt thereof, wherein R 1 is -OH; X 1 is an optionally substituted C 4 alkylene group; X 2 and X 2a are independently an optionally substituted C 2 -C 14 alkylene group; Y 1 and Y 1a are independently or ; Z 3 is independently an optionally substituted C 2 -C 6 alkylene group; R 2 and R 3 are independently an optionally substituted C 4 -C 14 alkyl group or a C 1 -C 2 alkyl group substituted with an optionally substituted cyclopropyl group; or R 2 and R 3 are independently an optionally substituted C 4 -C 14 alkyl group or a C 1 -C 2 alkyl group substituted with an optionally substituted cyclopropyl group.

在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中X 1為C 4伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein X 1 is a C 4 alkylene group.

在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中X 2及X 2a獨立地為視情況經取代之C 4-C 10伸烷基(例如,C 5、C 6、C 7、C 8、C 9或C 10伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中X 2為C 4-C 10伸烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中X 2a為C 4-C 10伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein X 2 and X 2a are independently optionally substituted C 4 -C 10 alkylene (e.g., C 5 , C 6 , C 7 , C 8 , C 9 or C 10 alkylene). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein X 2 is C 4 -C 10 alkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein X 2a is C 4 -C 10 alkylene.

在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中Y 1及Y 1a獨立地為 ,其中Z 3獨立地為C 2-C 4伸烷基(例如,C 2、C 4伸烷基)。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein Y 1 and Y 1a are independently , wherein Z 3 is independently C 2 -C 4 alkylene (e.g., C 2 , C 4 alkylene).

在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2、R 3、R 2'及R 3’獨立地為C 6-C 14烷基(例如,C 6、C 7、C 8、C 9、C 10、C 11、C 12、C 13或C 14烷基)或經視情況經取代之環丙基取代的C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2、R 3、R 2'及R 3’獨立地為C 6-C 14烷基(例如,C 6、C 7、C 8、C 9、C 10、C 11、C 12、C 13或C 14烷基)。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2為C 6-C 14烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3為C 6-C 14烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2’為C 6-C 14烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3’為C 6-C 14烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2為由經取代之環丙基取代的C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3為由經取代之環丙基取代的C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2'為由經取代之環丙基取代的C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3'為由經取代之環丙基取代的C 1-C 2烷基 In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 , R 3 , R 2′ and R 3′ are independently C 6 -C 14 alkyl (e.g., C 6 , C 7 , C 8 , C 9 , C 10 , C 11 , C 12 , C 13 or C 14 alkyl) or C 1 -C 2 alkyl substituted with an optionally substituted cyclopropyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 , R 3 , R 2′ and R 3′ are independently C 6 -C 14 alkyl (e.g., C 6 , C 7 , C 8 , C 9 , C 10 , C 11 , C 12 , C 13 or C 14 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 is a C 6 -C 14 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3 is a C 6 -C 14 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2' is a C 6 -C 14 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3' is a C 6 -C 14 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 is a C 1 -C 2 alkyl group substituted with a substituted cyclopropyl group. 
In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3 is a C 1 -C 2 alkyl substituted by a substituted cyclopropyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2' is a C 1 -C 2 alkyl substituted by a substituted cyclopropyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3' is a C 1 -C 2 alkyl substituted by a substituted cyclopropyl.

在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2、R 3、R 2'及R 3’獨立地為經伸環丙基-(C 1-C 6伸烷基,視情況由經C 1-C 6烷基取代之伸環丙基取代)取代之C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2為經伸環丙基-(C 1-C 6伸烷基,視情況由經C 1-C 6烷基取代之伸環丙基取代)取代之C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3為經伸環丙基-(C 1-C 6伸烷基,視情況由經C 1-C 6烷基取代之伸環丙基取代)取代之C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 2'為經伸環丙基-(C 1-C 6伸烷基,視情況由經C 1-C 6烷基取代之伸環丙基取代)取代之C 1-C 2烷基。在一些實施例中,本揭示案之脂質具有式(III-D)之結構,其中R 3'為經伸環丙基-(C 1-C 6伸烷基,視情況由經C 1-C 6烷基取代之伸環丙基取代)取代之C 1-C 2烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 , R 3 , R 2' and R 3' are independently C 1 -C 2 alkyl substituted with cyclopropylene-(C 1 -C 6 alkylene, optionally substituted with cyclopropylene substituted with C 1 -C 6 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2 is C 1 -C 2 alkyl substituted with cyclopropylene-(C 1 -C 6 alkylene, optionally substituted with cyclopropylene substituted with C 1 -C 6 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3 is C 1 -C 2 alkyl substituted with cyclopropylene-(C 1 -C 6 alkylene, optionally substituted with cyclopropylene substituted with C 1 -C 6 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 2' is C 1 -C 2 alkyl substituted with cyclopropylene-(C 1 -C 6 alkylene, optionally substituted with cyclopropylene substituted with C 1 -C 6 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-D), wherein R 3' is C 1 -C 2 alkyl substituted with cyclopropylene-(C 1 -C 6 alkylene, optionally substituted with cyclopropylene substituted with C 1 -C 6 alkyl).

式(III-E) Formula (III-E)

在一些實施例中,本揭示案之脂質具有式(III-E)之結構: (III-E), 或其醫藥學上可接受之鹽,其中 R 1為-OH; X 1為分支鏈C 2-C 8伸烷基 X 2及X 2a獨立地為視情況經取代之C 2-C 14伸烷基; Y 1及Y 1a獨立地為 ; Z 3獨立地為視情況經取代之C 2-C 6伸烷基; R 2及R 3獨立地為視情況經取代之C 4-C 14烷基; R 2'及R 3'獨立地為視情況經取代之C 4-C 14烷基。 In some embodiments, the lipids of the present disclosure have the structure of formula (III-E): (III-E), or a pharmaceutically acceptable salt thereof, wherein R 1 is -OH; X 1 is a branched C 2 -C 8 alkylene group; X 2 and X 2a are independently optionally substituted C 2 -C 14 alkylene groups; Y 1 and Y 1a are independently or ; Z 3 is independently an optionally substituted C 2 -C 6 alkylene group; R 2 and R 3 are independently an optionally substituted C 4 -C 14 alkyl group; R 2' and R 3' are independently an optionally substituted C 4 -C 14 alkyl group.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 1為分支鏈C 6伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 1 is a branched C 6 alkylene group.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2及X 2a獨立地為C 4-C 10伸烷基(例如,C 6、C 7、C 8伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2為C 4-C 10伸烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2a為C 4-C 10伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2 and X 2a are independently C 4 -C 10 alkylene (e.g., C 6 , C 7 , C 8 alkylene). In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2 is C 4 -C 10 alkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2a is C 4 -C 10 alkylene.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中Y 1及Y 1a,其中Z 3獨立地為視情況經取代之C 2伸烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中Y 1,其中Z 3獨立地為視情況經取代之C 2伸烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中Y 1a,其中Z 3獨立地為視情況經取代之C 2伸烷基。 In some embodiments, the lipid of the present disclosure has a structure of formula (III-E), wherein Y 1 and Y 1a are , wherein Z 3 is independently an optionally substituted C 2 alkylene group. In some embodiments, the lipid of the present disclosure has a structure of formula (III-E), wherein Y 1 is , wherein Z 3 is independently an optionally substituted C 2 alkylene group. In some embodiments, the lipid of the present disclosure has a structure of formula (III-E), wherein Y 1a is , wherein Z 3 is independently an optionally substituted C 2 alkylene group.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2、R 3、R 2'及R 3’獨立地為C 6-C 12烷基(例如,C 9烷基)或視情況經C 2-C 8伸烯基(例如,C 4、C 6伸烯基)取代之C 4-C 10烷基(例如,C 4 C 6烷基)。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2為C 6-C 12烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3為C 6-C 12烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2’為C 6-C 12烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3’為C 6-C 12烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2為視情況經C 2-C 8伸烯基取代之C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3為視情況經C 2-C 8伸烯基取代之C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2’為視情況經C 2-C 8伸烯基取代之C 4-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3’為視情況經C 2-C 8伸烯基取代之C 4-C 10烷基。 式(III-F) In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2 , R 3 , R 2' and R 3' are independently C 6 -C 12 alkyl (e.g., C 9 alkyl) or C 4 -C 10 alkyl (e.g., C 4 , C 6 alkyl) optionally substituted with C 2 -C 8 alkenyl (e.g., C 4 , C 6 alkenyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2 is C 6 -C 12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3 is C 6 -C 12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2' is C 6 -C 12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3' is a C 6 -C 12 alkyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2 is a C 4 -C 10 alkyl group optionally substituted with a C 2 -C 8 alkenyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3 is a C 4 -C 10 alkyl group optionally substituted with a C 2 -C 8 alkenyl group. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2' is a C 4 -C 10 alkyl group optionally substituted with a C 2 -C 8 alkenyl group. 
In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3′ is a C 4 -C 10 alkyl group optionally substituted with a C 2 -C 8 alkenyl group. Formula (III-F)

在一些實施例中,本揭示案之脂質具有式(III-F)之結構: (III-F), 或其醫藥學上可接受之鹽,其中 R 1為-OH; X 1為視情況經取代之C 2-C 6伸烷基; X 2及X 2a獨立地為視情況經取代之C 2-C 14伸烷基; Y 1及Y 1a各自為一鍵; R 2及R 3獨立地為視情況經取代之C 4-C 14烷基;且 R 2'及R 3'獨立地為視情況經取代之C 4-C 14烷基。 In some embodiments, the lipids of the present disclosure have the structure of formula (III-F): (III-F), or a pharmaceutically acceptable salt thereof, wherein R 1 is -OH; X 1 is an optionally substituted C 2 -C 6 alkylene group; X 2 and X 2a are independently an optionally substituted C 2 -C 14 alkylene group; Y 1 and Y 1a are each a bond; R 2 and R 3 are independently an optionally substituted C 4 -C 14 alkyl group; and R 2' and R 3' are independently an optionally substituted C 4 -C 14 alkyl group.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 1為C 4伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 1 is a C 4 alkylene group.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2及X 2a獨立地為C 4-C 10伸烷基(例如,C 6-C 8伸烷基、C 6、C 7、C 8伸烷基)。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2為C 4-C 10伸烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中X 2a為C 4-C 10伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2 and X 2a are independently C 4 -C 10 alkylene (e.g., C 6 -C 8 alkylene, C 6 , C 7 , C 8 alkylene). In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2 is C 4 -C 10 alkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein X 2a is C 4 -C 10 alkylene.

在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2、R 3、R 2'及R 3’獨立地為C 6-C 10烷基(例如,C 7 C 8烷基)。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2為C 6-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3為C 6-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 2’為C 6-C 10烷基。在一些實施例中,本揭示案之脂質具有式(III-E)之結構,其中R 3’為C 6-C 10烷基。 式(VIII-B) In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2 , R 3 , R 2' and R 3' are independently C 6 -C 10 alkyl (e.g., C 7 , C 8 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2 is C 6 -C 10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3 is C 6 -C 10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 2' is C 6 -C 10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (III-E), wherein R 3' is C 6 -C 10 alkyl. Formula (VIII-B)

在一些實施例中,本揭示案之脂質具有式(VIII-B)之結構: (VIII-B), 或其醫藥學上可接受之鹽,其中: X 1為一鍵, R 1為C 1-C 6烷基, X 2為C 2-C 6伸烷基, X 2a為C 2-C 14伸烷基, 其中X 2或X 2a經OH或C 1-4伸烷基-OH取代, Y 1, 其中標有「*」之鍵連接至X 2; Y 1a, 其中標有「*」之鍵連接至X 2a; 每個Z 3獨立地為視情況經取代之C 1-C 6伸烷基或視情況經取代之C 2-C 14伸烯基; Q 1為-C(R 2)(R 3)(R 12); Q 1a為-C(R 2')(R 3')(R 12'); R 2、R 3及R 12獨立地為氫、視情況經取代之C 1-C 14烷基或視情況經取代之C 2-C 14伸烯基,且 R 2'、R 3'及R 12'獨立地為氫、視情況經取代之C 1-C 14烷基或視情況經取代之C 2-C 14伸烯基。 In some embodiments, the lipids of the present disclosure have the structure of formula (VIII-B): (VIII-B), or a pharmaceutically acceptable salt thereof, wherein: X1 is a bond, R1 is a C1 - C6 alkyl group, X2 is a C2 - C6 alkylene group, X2a is a C2 - C14 alkylene group, wherein X2 or X2a is substituted with OH or C1-4 alkylene-OH, and Y1 is , , or , wherein the bond marked with "*" is connected to X 2 ; Y 1a is , , or , wherein the bond marked with "*" is connected to X2a ; each Z3 is independently an optionally substituted C1 - C6 alkylene group or an optionally substituted C2 - C14 alkenylene group; Q1 is -C( R2 )( R3 )( R12 ); Q1a is -C( R2' )(R3 ' )(R12 ' ); R2 , R3 and R12 are independently hydrogen, an optionally substituted C1 - C14 alkyl group or an optionally substituted C2 - C14 alkenylene group, and R2 ' , R3' and R12 ' are independently hydrogen, an optionally substituted C1 - C14 alkyl group or an optionally substituted C2 - C14 alkenylene group.

在一些實施例中,本揭示案之脂質具有式(VIII-B)之結構,其中R 1為甲基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R 1 is methyl.

在一些實施例中,本揭示案之脂質具有式(VIII-B)之結構,其中X 2為C 4、C 5或C 6伸烷基。 In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein X 2 is C 4 , C 5 or C 6 alkylene.

在一些實施例中,本揭示案之脂質具有式(VIII-B)之結構,其中X 2a為C 4-C 8伸烷基(例如,C 5、C 6或C 7伸烷基)。 In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein X 2a is C 4 -C 8 alkylene (e.g., C 5 , C 6 or C 7 alkylene).

In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein Y1 is [structure not reproduced] or [structure not reproduced], and Y1a is [structure not reproduced] or [structure not reproduced]. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein Y1 is [structure not reproduced]. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein Y1 is [structure not reproduced]. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein Y1a is [structure not reproduced]. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein Y1a is [structure not reproduced].

In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R2, R3, R12, R2', R3' and R12' are independently hydrogen or C5-C12 alkyl (e.g., C6, C7, C8, C9, C10 or C11 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R2 is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R3 is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R2' is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R3' is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R2 is C5-C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R3 is C5-C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R2' is C5-C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (VIII-B), wherein R3' is C5-C12 alkyl.

Formula (X)

In some embodiments, the lipids of the present disclosure have a structure of formula (X): [structure not reproduced] (X), or a pharmaceutically acceptable salt thereof, wherein each cc is independently selected from 3 to 9; Rxx is selected from hydrogen and optionally substituted C1-C6 alkyl; and either (i) ee is 1, each dd is independently selected from 1 to 4, and each Rww is independently selected from the group consisting of C4-C14 alkyl, branched C4-C12 alkenyl, C4-C12 alkenyl comprising at least two double bonds, and C9-C12 alkenyl, wherein any -(CH2)2- of the C4-C14 alkyl may be optionally replaced by C2-C6 cycloalkylene; or (ii) ee is 0, each dd is 1, and each Rww is straight-chain C4-C12 alkyl.
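The two alternative index sets (i) and (ii) for formula (X) can be read as a simple rule check. The following Python sketch is illustrative only and is not part of the disclosure: the function name `check_formula_x_indices` and the representation of cc and dd as lists (one entry per occurrence in the structure) are assumptions, and the Rww substituent rules are deliberately not modeled.

```python
def check_formula_x_indices(cc, dd, ee):
    """Return True if the integer indices fit option (i) or (ii) of formula (X).

    cc, dd: non-empty lists of ints, one entry per occurrence (assumed layout).
    ee: int, either 0 or 1.
    The Rww substituent constraints are NOT checked here.
    """
    if not cc or not dd:
        return False
    # Every cc must lie in the range 3 to 9 under both options.
    if not all(3 <= c <= 9 for c in cc):
        return False
    if ee == 1:
        # Option (i): each dd is independently 1 to 4.
        return all(1 <= d <= 4 for d in dd)
    if ee == 0:
        # Option (ii): each dd must be exactly 1.
        return all(d == 1 for d in dd)
    return False


print(check_formula_x_indices([5, 5], [1, 3], 1))  # True
print(check_formula_x_indices([5, 5], [1, 3], 0))  # False
```

The second call fails because option (ii) fixes every dd at 1, whereas option (i) allows any value from 1 to 4.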

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is H. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is optionally substituted C1-C6 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C1 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C2 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C3 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C4 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C5 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein Rxx is C6 alkyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently selected from the group consisting of C4-C14 alkyl, branched C4-C12 alkenyl, C4-C12 alkenyl comprising at least two double bonds, and C9-C12 alkenyl, wherein any -(CH2)2- of the C4-C14 alkyl may be optionally replaced by C2-C6 cycloalkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C4-C14 alkyl, wherein any -(CH2)2- of the C4-C14 alkyl may be optionally replaced by C2-C6 cycloalkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C4-C14 alkyl, wherein any -(CH2)2- of the C4-C14 alkyl may be optionally replaced by cyclopropylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C4-C12 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C4-C12 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9-C12 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C4-C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently selected from the group consisting of C6-C14 alkyl, branched C8-C12 alkenyl, C8-C12 alkenyl comprising at least two double bonds, and C9-C12 alkenyl, wherein any -(CH2)2- of the C6-C14 alkyl may be optionally replaced by cyclopropylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C6-C14 alkyl, wherein any -(CH2)2- of the C6-C14 alkyl may be optionally replaced by cyclopropylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C8-C12 alkenyl, such as (straight-chain or branched C3-C5 alkylene)-(branched C5-C7 alkenyl), such as (branched C5 alkylene)-(branched C5 alkenyl), such as [structure not reproduced].

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C8-C12 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9-C12 alkenyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently C6-C14 alkyl (e.g., C6, C8, C9, C10, C11 or C13 alkyl), wherein any -(CH2)2- of the C6-C14 alkyl may be optionally replaced by cyclopropylene.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently branched C8-C12 alkenyl (e.g., branched C10 alkenyl).

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently C8-C12 alkenyl comprising at least two double bonds (e.g., C9 or C10 alkenyl comprising two double bonds).

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently (C1 alkylene)-(cyclopropylene-C6 alkyl) or (C2 alkylene)-(cyclopropylene-C2 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently (C1 alkylene)-(cyclopropylene-C6 alkyl). In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is independently (C2 alkylene)-(cyclopropylene-C2 alkyl).

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C4 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C5 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C6 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C7 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C8 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C11 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C13 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C14 alkyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C10 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C11 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C12 alkenyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C8 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C10 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C11 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C12 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C13 alkenyl comprising at least two double bonds. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C14 alkenyl comprising at least two double bonds.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkyl, wherein one -(CH2)2- of the C9 alkyl is replaced by C2-C6 cycloalkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkyl, wherein one -(CH2)2- of the C9 alkyl is replaced by cyclopropylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkyl, wherein two -(CH2)2- of the C9 alkyl are replaced by C2-C6 cycloalkylene. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is C9 alkyl, wherein two -(CH2)2- of the C9 alkyl are replaced by cyclopropylene.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C4 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C5 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C6 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C7 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C8 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C11 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C13 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is straight-chain C14 alkyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C8 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C9 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C10 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C11 alkenyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each Rww is branched C12 alkenyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is independently selected from 3 to 7. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 3. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 4. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 5. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 6. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 7. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 8. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each cc is 9.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each dd is independently selected from 1 to 4. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each dd is 1. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each dd is 2. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each dd is 3. In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein each dd is 4.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein ee is 1.

In some embodiments, the lipids of the present disclosure have a structure of formula (X), wherein ee is 0.

Formula (X-A)

In some embodiments, the lipid of formula (X) has a structure of formula (X-A): [structure not reproduced] (X-A), or a pharmaceutically acceptable salt thereof, wherein each cc is independently selected from 3 to 7; each dd is independently selected from 1 to 4; Rxx is selected from hydrogen and optionally substituted C1-C6 alkyl; and each Rww is independently selected from the group consisting of C4-C14 alkyl and (straight-chain or branched C3-C5 alkylene)-(branched C5-C7 alkenyl).
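As a rough illustration of how formula (X-A) narrows the index ranges, the following Python sketch checks cc, dd and the size of an unsubstituted Rxx. It is a sketch under stated assumptions rather than part of the disclosure: the function name `check_formula_xa`, the list layout for cc and dd, and the encoding of Rxx as a carbon count (0 for hydrogen, 1 to 6 for the alkyl backbone, ignoring any optional substituents) are all hypothetical, and Rww is not modeled.

```python
def check_formula_xa(cc, dd, rxx_carbons):
    """Return True if the indices fit the ranges recited for formula (X-A).

    cc, dd: non-empty lists of ints, one entry per occurrence (assumed layout).
    rxx_carbons: 0 for hydrogen, 1-6 for a C1-C6 alkyl backbone (assumed encoding).
    The Rww substituent constraints are NOT checked here.
    """
    return (
        bool(cc)
        and bool(dd)
        and all(3 <= c <= 7 for c in cc)   # each cc is 3 to 7
        and all(1 <= d <= 4 for d in dd)   # each dd is 1 to 4
        and 0 <= rxx_carbons <= 6          # hydrogen or C1-C6 alkyl
    )


print(check_formula_xa([4, 7], [1, 3], 0))  # True
print(check_formula_xa([8], [1], 0))        # False
```

The second call fails because cc values of 8 and 9 fall outside the 3 to 7 range recited for formula (X-A).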

In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is hydrogen. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C1 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C2 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C3 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C4 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C5 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein Rxx is C6 alkyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 4, 5, 6 or 7. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 3. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 4. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 5. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 6. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each cc is 7.

In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each dd is 1 or 3. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each dd is 1. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each dd is 2. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each dd is 3. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each dd is 4.

In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C4-C14 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C4 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C5 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C6 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C7 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C8 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C9 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C10 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C11 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C12 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C13 alkyl. In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is C14 alkyl.

In some embodiments, the lipids of the present disclosure have a structure of formula (X-A), wherein each Rww is (straight-chain or branched C3-C5 alkylene)-(branched C5-C7 alkenyl), for example (branched C5 alkylene)-(branched C5 alkenyl), for example [structure not reproduced].

In some embodiments, the lipids of the present disclosure comprise an acyclic core. In some embodiments, the lipids of the present disclosure are selected from any lipid in Table (I) below or a pharmaceutically acceptable salt thereof.

Table (I). Non-limiting examples of ionizable lipids having an acyclic core
Columns: Structure; Compound No.
[Structure images not reproduced. Compound numbers listed: 1-11, 13-62 and 64-89.]

In some embodiments, the LNP of the present disclosure comprises an ionizable lipid disclosed in PCT application publication WO2023044333A1, which is incorporated herein by reference in its entirety.

Formula (CY)

In some embodiments, the LNP disclosed herein comprises an ionizable lipid of formula (CY): [structure not reproduced] (CY), or a pharmaceutically acceptable salt thereof, wherein: R1 is selected from the group consisting of -OH, -OAc, R1a, and [structure not reproduced]; Z1 is optionally substituted C1-C6 alkyl; X1 is optionally substituted C2-C6 alkylene; X2 is selected from the group consisting of a bond, -CH2- and -CH2CH2-; X2' is selected from the group consisting of a bond, -CH2- and -CH2CH2-; X3 is selected from the group consisting of a bond, -CH2- and -CH2CH2-; X3' is selected from the group consisting of a bond, -CH2- and -CH2CH2-; X4 and X5 are independently optionally substituted C2-C14 alkylene or optionally substituted C2-C14 alkenylene; Y1 and Y2 are independently selected from the group consisting of [several structures, images not reproduced], wherein the bond marked with "*" is connected to X4 or X5; each Z2 is independently H or optionally substituted C1-C8 alkyl; each Z3 is independently optionally substituted C1-C6 alkylene; R2 is selected from the group consisting of optionally substituted C4-C20 alkyl, optionally substituted C2-C14 alkenyl, and -(CH2)pCH(OR6)(OR7); R3 is selected from the group consisting of optionally substituted C4-C20 alkyl, optionally substituted C2-C14 alkenyl, and -(CH2)qCH(OR8)(OR9); R1a is [one of several structures, images not reproduced]; R2a, R2b and R2c are independently selected from hydrogen and C1-C6 alkyl; R3a, R3b and R3c are independently selected from hydrogen and C1-C6 alkyl; R4a, R4b and R4c are independently selected from hydrogen and C1-C6 alkyl; R5a, R5b and R5c are independently selected from hydrogen and C1-C6 alkyl; R6, R7, R8 and R9 are independently optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH; each A is independently C3-C8 cycloalkylene; each m is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12; each n is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12; p is selected from the group consisting of 0, 1, 2, 3, 4, 5, 6 and 7; and q is selected from the group consisting of 0, 1, 2, 3, 4, 5, 6 and 7.

Formulas (CY-I), (CY-II), (CY-III), (CY-IV) and (CY-V)

In some embodiments, the present disclosure includes compounds of formula (CY-I), (CY-II), (CY-III), (CY-IV) or (CY-V): [structure images for (CY-I) through (CY-V) not reproduced], or a pharmaceutically acceptable salt thereof, wherein X1, X2, X2', X3, X3', X4, X5, Y1, Y2, R1, R2 and R3 are as defined herein.

Formulas (CY-VI) and (CY-VII)

在一些實施例中,本揭示案包括式(CY-VI)或(CY-VII)化合物: (CY-VI)              (CY-VII) 或其醫藥學上可接受之鹽, 其中X 1、X 4、X 5、R 1、R 2及R 3在本文中定義。 式(CY-VIII)及(CY-IX) In some embodiments, the present disclosure includes compounds of formula (CY-VI) or (CY-VII): (CY-VI) (CY-VII) or a pharmaceutically acceptable salt thereof, wherein X 1 , X 4 , X 5 , R 1 , R 2 and R 3 are as defined herein. Formula (CY-VIII) and (CY-IX)

在一些實施例中,本揭示案包括式(CY-VIII)或(CY-IX)化合物: (CY- VIII)                  (CY- IX), 或其醫藥學上可接受之鹽。 其中X 1、X 4、X 5、R 1、R 2及R 3在本文中定義。 式(CY-IV-a)、(CY-IV-b)及(CY-IV-c) In some embodiments, the present disclosure includes compounds of formula (CY-VIII) or (CY-IX): (CY-VIII) (CY-IX), or a pharmaceutically acceptable salt thereof. wherein X1 , X4 , X5 , R1 , R2 and R3 are as defined herein. Formula (CY-IV-a), (CY-IV-b) and (CY-IV-c)

在一些實施例中,本揭示案包括式(CY-IV-a)、(CY-IV-b)或(CY-IV-c)化合物 (CY-IV-a)            (CY-IV-b)          (CY-IV-c), 或其醫藥學上可接受之鹽。 其中X 1、X 4、X 5、R 2及R 3在本文中定義。 式(CY-IV-d)、(CY-IV-e)及(CY-IV-f) In some embodiments, the present disclosure includes compounds of formula (CY-IV-a), (CY-IV-b) or (CY-IV-c) (CY-IV-a) (CY-IV-b) (CY-IV-c), or a pharmaceutically acceptable salt thereof. wherein X 1 , X 4 , X 5 , R 2 and R 3 are as defined herein. Formula (CY-IV-d), (CY-IV-e) and (CY-IV-f)

在一些實施例中,本揭示案包括式(CY-IV-d)、(CY-IV-e)或(CY-IV-f)化合物 (CY-IV-d)            (CY-IV-e)          (CY-IV-f), 或其醫藥學上可接受之鹽。 其中X 1、X 4、X 5、R 2及R 3在本文中定義。 式(CY-IV) In some embodiments, the present disclosure includes compounds of formula (CY-IV-d), (CY-IV-e) or (CY-IV-f) (CY-IV-d) (CY-IV-e) (CY-IV-f), or a pharmaceutically acceptable salt thereof. wherein X 1 , X 4 , X 5 , R 2 and R 3 are as defined herein. Formula (CY-IV)

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'):

[structure image (CY-IV') not reproduced]

or a pharmaceutically acceptable salt thereof, wherein R1, R2, R3, X1, X2, X3, X4, X5, Y1 and Y2 are as defined in connection with formula (CY-I').

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'), wherein:
R1 is -OH, R1a, or [structure images not reproduced], wherein Z1 is an optionally substituted C1-C6 alkyl group;
X1 is an optionally substituted C2-C6 alkylene group;
X2 and X3 are independently a bond, -CH2- or -CH2CH2-;
X4 and X5 are independently an optionally substituted C2-C14 alkylene group;
Y1 and Y2 are independently [structure images not reproduced];
R2 and R3 are independently optionally substituted C4-C20 alkyl;
R1a is [structure images not reproduced];
R2a, R2b and R2c are independently hydrogen or C1-C6 alkyl;
R3a, R3b and R3c are independently hydrogen or C1-C6 alkyl;
R4a, R4b and R4c are independently hydrogen or C1-C6 alkyl; and
R5a, R5b and R5c are independently hydrogen or C1-C6 alkyl.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'), wherein R1 is -OH or [structure images not reproduced].

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'), wherein Y1 and Y2 are independently [structure image not reproduced].

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'), wherein R2 is -CH(OR6)(OR7).

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-IV'), wherein R3 is -CH(OR8)(OR9).

Non-limiting examples of lipids having a structure of formula (CY-IV') include compounds CY7, CY8, CY19, CY20, CY21, CY28, CY29, CY40, CY41, CY42, CY48, CY49, CY58, CY59, and CY60.

Formula (CY-VI')

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'):

[structure image (CY-VI') not reproduced]

or a pharmaceutically acceptable salt thereof, wherein R1, R6, R7, R8, R9, X1, X2, X3, X4, X5, Y1 and Y2 are as defined in connection with formula (CY-I').

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R1 is -OH.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein X1 is a C2-C6 alkylene group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein X2 is -CH2CH2-.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein X4 is a C2-C6 alkylene group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein X5 is a C2-C6 alkylene group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein Y1 is [structure image not reproduced].

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein Y2 is [structure image not reproduced].

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein each Z3 is independently an optionally substituted C1-C6 alkylene group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein each Z3 is -CH2CH2-.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R6 is a C5-C14 alkyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R7 is a C5-C14 alkyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R6 is a C6-C14 alkenyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R7 is a C6-C14 alkenyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R8 is a C5-C16 alkyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R9 is a C5-C14 alkyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R8 is a C6-C14 alkenyl group.

In some embodiments, the lipid of the present disclosure has a structure of formula (CY-VI'), or a pharmaceutically acceptable salt thereof, wherein R9 is a C6-C14 alkenyl group.

In some embodiments, the lipid of the present disclosure comprises a heterocyclic core, wherein the heteroatom is nitrogen. In some embodiments, the heterocyclic core comprises pyrrolidine or a derivative thereof. In some embodiments, the heterocyclic core comprises piperidine or a derivative thereof. In some embodiments, the lipid of the present disclosure is selected from any lipid in Table (II) below, or a pharmaceutically acceptable salt thereof.

R1

In some embodiments, R1 is selected from the group consisting of -OH, -OAc, R1a, and [structure images not reproduced].

In some embodiments, R1 is -OH or -OAc. In some embodiments, R1 is -OH. In some embodiments, R1 is -OAc. In some embodiments, R1 is R1a. In some embodiments, R1 is imidazolyl.

In some embodiments, R1 is [structure image not reproduced].

R2

In some embodiments, R2 is selected from the group consisting of optionally substituted C4-C20 alkyl, optionally substituted C2-C14 alkenyl, and -(CH2)pCH(OR6)(OR7).

In some embodiments, R2 is an optionally substituted C4-C20 alkyl. In some embodiments, R2 is an optionally substituted C8-C17 alkyl. In some embodiments, R2 is an optionally substituted C9-C16 alkyl. In some embodiments, R2 is an optionally substituted C8-C10 alkyl. In some embodiments, R2 is an optionally substituted C11-C13 alkyl. In some embodiments, R2 is an optionally substituted C14-C16 alkyl. In some embodiments, R2 is an optionally substituted C9 alkyl. In some embodiments, R2 is an optionally substituted C10 alkyl. In some embodiments, R2 is an optionally substituted C11 alkyl. In some embodiments, R2 is an optionally substituted C12 alkyl. In some embodiments, R2 is an optionally substituted C13 alkyl. In some embodiments, R2 is an optionally substituted C14 alkyl. In some embodiments, R2 is an optionally substituted C15 alkyl. In some embodiments, R2 is an optionally substituted C16 alkyl.

In some embodiments, R2 is an optionally substituted C2-C14 alkenyl. In some embodiments, R2 is an optionally substituted C5-C14 alkenyl. In some embodiments, R2 is an optionally substituted C7-C14 alkenyl. In some embodiments, R2 is an optionally substituted C9-C14 alkenyl. In some embodiments, R2 is an optionally substituted C10-C14 alkenyl. In some embodiments, R2 is an optionally substituted C12-C14 alkenyl.

In some embodiments, R2 is -(CH2)pCH(OR6)(OR7). In some embodiments, R2 is -CH(OR6)(OR7). In some embodiments, R2 is -CH2CH(OR6)(OR7). In some embodiments, R2 is -(CH2)2CH(OR6)(OR7). In some embodiments, R2 is -(CH2)3CH(OR6)(OR7). In some embodiments, R2 is -(CH2)4CH(OR6)(OR7).

In some embodiments, R2 is selected from the group consisting of [structure images not reproduced].

R3

In some embodiments, R3 is selected from the group consisting of optionally substituted C4-C20 alkyl, optionally substituted C2-C14 alkenyl, and -(CH2)qCH(OR6)(OR7).

In some embodiments, R3 is an optionally substituted C4-C20 alkyl. In some embodiments, R3 is an optionally substituted C8-C17 alkyl. In some embodiments, R3 is an optionally substituted C9-C16 alkyl. In some embodiments, R3 is an optionally substituted C8-C10 alkyl. In some embodiments, R3 is an optionally substituted C11-C13 alkyl. In some embodiments, R3 is an optionally substituted C14-C16 alkyl. In some embodiments, R3 is an optionally substituted C9 alkyl. In some embodiments, R3 is an optionally substituted C10 alkyl. In some embodiments, R3 is an optionally substituted C11 alkyl. In some embodiments, R3 is an optionally substituted C12 alkyl. In some embodiments, R3 is an optionally substituted C13 alkyl. In some embodiments, R3 is an optionally substituted C14 alkyl. In some embodiments, R3 is an optionally substituted C15 alkyl. In some embodiments, R3 is an optionally substituted C16 alkyl.

In some embodiments, R3 is an optionally substituted C2-C14 alkenyl. In some embodiments, R3 is an optionally substituted C5-C14 alkenyl. In some embodiments, R3 is an optionally substituted C7-C14 alkenyl. In some embodiments, R3 is an optionally substituted C9-C14 alkenyl. In some embodiments, R3 is an optionally substituted C10-C14 alkenyl. In some embodiments, R3 is an optionally substituted C12-C14 alkenyl.

In some embodiments, R3 is -(CH2)qCH(OR8)(OR9). In some embodiments, R3 is -CH(OR8)(OR9). In some embodiments, R3 is -CH2CH(OR8)(OR9). In some embodiments, R3 is -(CH2)2CH(OR8)(OR9). In some embodiments, R3 is -(CH2)3CH(OR8)(OR9). In some embodiments, R3 is -(CH2)4CH(OR8)(OR9).
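
The acetal-type options recited for R2 and R3 share one pattern, -(CH2)kCH(ORa)(ORb), with k = p and (R6, R7) for R2, and k = q and (R8, R9) for R3. The following is a minimal Python sketch, assuming plain-text condensed formulas are sufficient, that writes out the k = 0 to 4 cases listed above. The function name acetal_substituent and its parameters are hypothetical and serve only as an illustration; they are not part of the disclosure.

```python
def acetal_substituent(k: int, alkoxy_a: str, alkoxy_b: str) -> str:
    """Return the condensed formula -(CH2)k-CH(ORa)(ORb) as plain text.

    k is the number of methylene units (p for R2, q for R3). The general
    definition of formula (CY) allows k = 0 through 7.
    """
    if not 0 <= k <= 7:
        raise ValueError("k must be an integer from 0 to 7")
    if k == 0:
        spacer = ""            # acetal carbon attached directly
    elif k == 1:
        spacer = "CH2"         # single methylene written without a count
    else:
        spacer = f"(CH2){k}"   # repeated methylene units
    return f"-{spacer}CH(O{alkoxy_a})(O{alkoxy_b})"


# Illustrative enumeration of the k = 0..4 cases listed in the text.
r2_options = [acetal_substituent(p, "R6", "R7") for p in range(5)]
r3_options = [acetal_substituent(q, "R8", "R9") for q in range(5)]

if __name__ == "__main__":
    for formula in r2_options + r3_options:
        print(formula)
```

The helper only formats strings; it does not check chemical validity or the identity of R6 through R9, which remain as defined elsewhere in this section.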

In some embodiments, R3 is selected from the group consisting of [structure images not reproduced].

R6, R7, R8, R9

In some embodiments, R6, R7, R8 and R9 are independently optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH. In some embodiments, R6, R7, R8 and R9 are independently optionally substituted C1-C14 alkyl. In some embodiments, R6, R7, R8 and R9 are independently optionally substituted C2-C14 alkenyl. In some embodiments, R6, R7, R8 and R9 are independently -(CH2)m-A-(CH2)nH.

In some embodiments, R6 is optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH. In some embodiments, R6 is optionally substituted C3-C10 alkyl. In some embodiments, R6 is optionally substituted C4-C10 alkyl. In some embodiments, R6 is optionally substituted C5-C10 alkyl. In some embodiments, R6 is optionally substituted C9-C10 alkyl. In some embodiments, R6 is optionally substituted C1-C14 alkyl. In some embodiments, R6 is optionally substituted C2-C14 alkenyl. In some embodiments, R6 is -(CH2)m-A-(CH2)nH.

In some embodiments, R7 is optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH. In some embodiments, R7 is optionally substituted C3-C10 alkyl. In some embodiments, R7 is optionally substituted C4-C10 alkyl. In some embodiments, R7 is optionally substituted C5-C10 alkyl. In some embodiments, R7 is optionally substituted C9-C10 alkyl. In some embodiments, R7 is optionally substituted C1-C14 alkyl. In some embodiments, R7 is optionally substituted C2-C14 alkenyl. In some embodiments, R7 is -(CH2)m-A-(CH2)nH.

In some embodiments, R8 is optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH. In some embodiments, R8 is optionally substituted C3-C10 alkyl. In some embodiments, R8 is optionally substituted C4-C10 alkyl. In some embodiments, R8 is optionally substituted C5-C10 alkyl. In some embodiments, R8 is optionally substituted C9-C10 alkyl. In some embodiments, R8 is optionally substituted C1-C14 alkyl. In some embodiments, R8 is optionally substituted C2-C14 alkenyl. In some embodiments, R8 is -(CH2)m-A-(CH2)nH.

In some embodiments, R9 is optionally substituted C1-C14 alkyl, optionally substituted C2-C14 alkenyl, or -(CH2)m-A-(CH2)nH. In some embodiments, R9 is optionally substituted C3-C10 alkyl. In some embodiments, R9 is optionally substituted C4-C10 alkyl. In some embodiments, R9 is optionally substituted C5-C10 alkyl. In some embodiments, R9 is optionally substituted C9-C10 alkyl. In some embodiments, R9 is optionally substituted C1-C14 alkyl. In some embodiments, R9 is optionally substituted C2-C14 alkenyl. In some embodiments, R9 is -(CH2)m-A-(CH2)nH.

In some embodiments, each m is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12. In some embodiments, each m is 0. In some embodiments, each m is 1. In some embodiments, each m is 2. In some embodiments, each m is 3. In some embodiments, each m is 4. In some embodiments, each m is 5. In some embodiments, each m is 6. In some embodiments, each m is 7. In some embodiments, each m is 8. In some embodiments, each m is 9. In some embodiments, each m is 10. In some embodiments, each m is 11. In some embodiments, each m is 12.

In some embodiments, each n is independently 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12. In some embodiments, each n is 0. In some embodiments, each n is 1. In some embodiments, each n is 2. In some embodiments, each n is 3. In some embodiments, each n is 4. In some embodiments, each n is 5. In some embodiments, each n is 6. In some embodiments, each n is 7. In some embodiments, each n is 8. In some embodiments, each n is 9. In some embodiments, each n is 10. In some embodiments, each n is 11. In some embodiments, each n is 12.

In some embodiments, each A is independently C3-C8 cycloalkylene. In some embodiments, each A is cyclopropylene.

X1

In some embodiments, X1 is an optionally substituted C2-C6 alkylene. In some embodiments, X1 is an optionally substituted C2-C5 alkylene. In some embodiments, X1 is an optionally substituted C2-C4 alkylene. In some embodiments, X1 is an optionally substituted C2-C3 alkylene. In some embodiments, X1 is an optionally substituted C2 alkylene. In some embodiments, X1 is an optionally substituted C3 alkylene. In some embodiments, X1 is an optionally substituted C4 alkylene. In some embodiments, X1 is an optionally substituted C5 alkylene. In some embodiments, X1 is an optionally substituted C6 alkylene. In some embodiments, X1 is an optionally substituted -(CH2)2-. In some embodiments, X1 is an optionally substituted -(CH2)3-. In some embodiments, X1 is an optionally substituted -(CH2)4-. In some embodiments, X1 is an optionally substituted -(CH2)5-. In some embodiments, X1 is an optionally substituted -(CH2)6-.

X2

In some embodiments, X2 is selected from the group consisting of a bond, -CH2- and -CH2CH2-. In some embodiments, X2 is a bond. In some embodiments, X2 is -CH2-. In some embodiments, X2 is -CH2CH2-.

X2'

In some embodiments, X2' is selected from the group consisting of a bond, -CH2- and -CH2CH2-. In some embodiments, X2' is a bond. In some embodiments, X2' is -CH2-. In some embodiments, X2' is -CH2CH2-.

X3

In some embodiments, X3 is selected from the group consisting of a bond, -CH2- and -CH2CH2-. In some embodiments, X3 is a bond. In some embodiments, X3 is -CH2-. In some embodiments, X3 is -CH2CH2-.

X3'

In some embodiments, X3' is selected from the group consisting of a bond, -CH2- and -CH2CH2-. In some embodiments, X3' is a bond. In some embodiments, X3' is -CH2-. In some embodiments, X3' is -CH2CH2-.

X4

In some embodiments, X4 is selected from the group consisting of an optionally substituted C2-C14 alkylene and an optionally substituted C2-C14 alkenylene. In some embodiments, X4 is an optionally substituted C2-C14 alkylene. In some embodiments, X4 is an optionally substituted C2-C10 alkylene. In some embodiments, X4 is an optionally substituted C2-C8 alkylene. In some embodiments, X4 is an optionally substituted C2-C6 alkylene. In some embodiments, X4 is an optionally substituted C3-C6 alkylene. In some embodiments, X4 is an optionally substituted C3 alkylene. In some embodiments, X4 is an optionally substituted C4 alkylene. In some embodiments, X4 is an optionally substituted C5 alkylene. In some embodiments, X4 is an optionally substituted C6 alkylene. In some embodiments, X4 is an optionally substituted -(CH2)2-. In some embodiments, X4 is an optionally substituted -(CH2)3-. In some embodiments, X4 is an optionally substituted -(CH2)4-. In some embodiments, X4 is an optionally substituted -(CH2)5-. In some embodiments, X4 is an optionally substituted -(CH2)6-.

X5

In some embodiments, X5 is selected from the group consisting of an optionally substituted C2-C14 alkylene and an optionally substituted C2-C14 alkenylene. In some embodiments, X5 is an optionally substituted C2-C14 alkylene. In some embodiments, X5 is an optionally substituted C2-C10 alkylene. In some embodiments, X5 is an optionally substituted C2-C8 alkylene. In some embodiments, X5 is an optionally substituted C2-C6 alkylene. In some embodiments, X5 is an optionally substituted C3-C6 alkylene. In some embodiments, X5 is an optionally substituted C3 alkylene. In some embodiments, X5 is an optionally substituted C4 alkylene. In some embodiments, X5 is an optionally substituted C5 alkylene. In some embodiments, X5 is an optionally substituted C6 alkylene. In some embodiments, X5 is an optionally substituted -(CH2)2-. In some embodiments, X5 is an optionally substituted -(CH2)3-. In some embodiments, X5 is an optionally substituted -(CH2)4-. In some embodiments, X5 is an optionally substituted -(CH2)5-. In some embodiments, X5 is an optionally substituted -(CH2)6-.

Y1

In some embodiments, Y1 is selected from the group consisting of [structure images not reproduced].

In some embodiments, Y1 is [structure images not reproduced].

Y2

In some embodiments, Y2 is selected from the group consisting of [structure images not reproduced].

In some embodiments, Y2 is [structure images not reproduced].

Table (II). Non-limiting examples of ionizable lipids having a cyclic core

Columns: Structure; Compound No.
Compound Nos. CY1 through CY71 [structure images not reproduced]

In some embodiments, the LNP of the present disclosure comprises an ionizable lipid disclosed in PCT application PCT/US2022/082276 (WO2023122752), which is incorporated herein by reference in its entirety.

In one embodiment, the present disclosure provides a compound of formula IA:

[structure image IA not reproduced]

or a pharmaceutically acceptable salt or solvate thereof, wherein:
A is selected from the group consisting of -N(R1a)- and -C(R')-OC(=O)(R8a)-;
R1a is -L1-R1;
L1 is C2-C6 alkylene or -(CH2)2-6-OC(=O)-;
R1 is selected from the group consisting of -OH and [structure images not reproduced];
R2a, R2b and R2c are independently selected from the group consisting of hydrogen and C1-C6 alkyl;
R3a, R3b and R3c are independently selected from the group consisting of hydrogen and C1-C6 alkyl;
R4a, R4b and R4c are independently selected from the group consisting of hydrogen and C1-C6 alkyl;
R5a, R5b and R5c are independently selected from the group consisting of hydrogen and C1-C6 alkyl;
R6a, R6b and R6c are independently selected from the group consisting of hydrogen and C1-C6 alkyl; or R6a and R6b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R6c is selected from the group consisting of hydrogen and C1-C6 alkyl;
R7a, R7b and R7c are independently selected from the group consisting of hydrogen and C1-C6 alkyl; or R7a and R7b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R7c is selected from the group consisting of hydrogen and C1-C6 alkyl;
R' is selected from the group consisting of hydrogen and C1-C6 alkyl;
R8a is -L2-R8;
L2 is C2-C6 alkylene;
R8 is selected from the group consisting of -NR9aR9b and [structure images not reproduced];
R9a and R9b are independently selected from the group consisting of hydrogen and C1-C6 alkyl; or R9a and R9b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group;
Q1 is C1-C20 alkylene;
W1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R12a)-, -N(R12a)C(=O)-, -OC(=O)N(R12a)-, -N(R12a)C(=O)O- and -OC(=O)O-;
R12a is selected from the group consisting of hydrogen and C1-C6 alkyl;
X1 is an optionally substituted C1-C15 alkylene; or X1 is a bond;
Y1 is selected from the group consisting of -(CH2)m-, -O-, -S- and -S-S-;
m is 0, 1, 2, 3, 4, 5 or 6;
Z1 is selected from the group consisting of optionally substituted C4-C12 cycloalkylene and [structure images not reproduced];
R10 is selected from the group consisting of hydrogen, C1-C20 alkyl and C2-C20 alkenyl;
Q2 is C1-C20 alkylene;
W2 is selected from the group consisting of -C(=O)O-, -C(=O)N(R12b)-, -OC(=O)N(R12b)- and -OC(=O)O-;
R12b is selected from the group consisting of hydrogen and C1-C6 alkyl;
X2 is an optionally substituted C1-C15 alkylene; or X2 is a bond;
Y2 is selected from the group consisting of -(CH2)n-, -O-, -S- and -S-S-;
n is 0, 1, 2, 3, 4, 5 or 6;
Z2 is selected from the group consisting of -(CH2)p-, optionally substituted C4-C12 cycloalkylene, and [structure images not reproduced];
p is 0 or 1; and
R11 is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl;
wherein one or more methylene linkages of X1, X2, Y1, Y2, Z1, Z2, R10 and R11 are optionally and independently replaced by a group selected from -O-, -CH=CH-, -S- and C3-C6 cycloalkylene.

In one embodiment, the present disclosure provides a compound of Formula IB: IB, or a pharmaceutically acceptable salt or solvate thereof, wherein: A is selected from the group consisting of -N(R 1a)- and -C(R')-OC(=O)(R 8a)-; R 1a is -L 1-R 1; L 1 is C 2-C 6 alkylene or -(CH 2) 2-6-OC(=O)-; R 1 is selected from the group consisting of -OH, , , , , , , and ; R 2a, R 2b and R 2c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 3a, R 3b and R 3c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 4a, R 4b and R 4c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 5a, R 5b and R 5c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 6a, R 6b and R 6c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 6a and R 6b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 6c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 7a, R 7b and R 7c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 7a and R 7b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 7c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R' is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 8a is -L 2-R 8; L 2 is C 2-C 6 alkylene; R 8 is selected from the group consisting of -NR 9aR 9b, , , , , , , and ; R 9a and R 9b are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 9a and R 9b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group; Q 1 is C 1-C 20 alkylene; W 1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R 12a)-, -N(R 12a)C(=O)-, -OC(=O)N(R 12a)-, -N(R 12a)C(=O)O- and -OC(=O)O-; R 12a is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 1 is an optionally substituted C 1-C 15 alkylene; or X 1 is a bond; Y 1 is selected from the group consisting of -(CH 2) m-, -O-, -S- and -S-S-; m is 0, 1, 2, 3, 4, 5 or 6; Z 1 is selected from the group consisting of optionally substituted C 5-C 12 bridged cycloalkylene, and ; R 10 is selected from the group consisting of hydrogen, C 1-C 20 alkyl and C 2-C 20 alkenyl; Q 2 is C 1-C 20 alkylene; W 2 is selected from the group consisting of -C(=O)O-, -C(=O)N(R 12b)-, -OC(=O)N(R 12b)- and -OC(=O)O-; R 12b is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 2 is an optionally substituted C 1-C 15 alkylene; or X 2 is a bond; Y 2 is selected from the group consisting of -(CH 2) n-, -O-, -S- and -S-S-; n is 0, 1, 2, 3, 4, 5 or 6; Z 2 is selected from the group consisting of -(CH 2) p-, optionally substituted C 4-C 12 cycloalkylene, and ; p is 0 or 1; and R 11 is selected from the group consisting of hydrogen, C 1-C 10 alkyl and C 2-C 10 alkenyl; wherein one or more methylene linkages of X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are optionally and independently replaced by a group selected from -O-, -CH=CH-, -S- and C 3-C 6 cycloalkylene.

In one embodiment, the present disclosure provides a compound of Formula IC: IC, or a pharmaceutically acceptable salt or solvate thereof, wherein: A is selected from the group consisting of -N(R 1a)- and -C(R')-OC(=O)(R 8a)-; R 1a is -L 1-R 1; L 1 is C 2-C 6 alkylene or -(CH 2) 2-6-OC(=O)-; R 1 is selected from the group consisting of -OH, , , , , , , and ; R 2a, R 2b and R 2c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 3a, R 3b and R 3c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 4a, R 4b and R 4c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 5a, R 5b and R 5c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 6a, R 6b and R 6c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 6a and R 6b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 6c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 7a, R 7b and R 7c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 7a and R 7b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 7c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R' is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 8a is -L 2-R 8; L 2 is C 2-C 6 alkylene; R 8 is selected from the group consisting of -NR 9aR 9b, , , , , , , and ; R 9a and R 9b are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 9a and R 9b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group; Q 1 is C 1-C 20 alkylene; W 1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R 12a)-, -N(R 12a)C(=O)-, -OC(=O)N(R 12a)-, -N(R 12a)C(=O)O- and -OC(=O)O-; R 12a is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 1 is an optionally substituted branched C 1-C 15 alkylene; or X 1 is a bond; Y 1 is selected from the group consisting of -(CH 2) m-, -O-, -S- and -S-S-; m is 0, 1, 2, 3, 4, 5 or 6; Z 1 is selected from the group consisting of optionally substituted C 4-C 12 cycloalkylene, and ; R 10 is selected from the group consisting of hydrogen, C 1-C 20 alkyl and C 2-C 20 alkenyl; Q 2 is C 1-C 20 alkylene; W 2 is selected from the group consisting of -C(=O)O-, -C(=O)N(R 12b)-, -OC(=O)N(R 12b)- and -OC(=O)O-; R 12b is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 2 is an optionally substituted C 1-C 15 alkylene; or Y 2 is selected from the group consisting of -(CH 2) n-, -O-, -S- and -S-S-; n is 0, 1, 2, 3, 4, 5 or 6; Z 2 is -(CH 2) p-; p is 0 or 1; and R 11 is C 1-C 20 branched alkyl; wherein one or more methylene linkages of X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are optionally and independently replaced by a group selected from -O-, -CH=CH-, -S- and C 3-C 6 cycloalkylene.

In some embodiments, the present disclosure provides a compound of any one of Formulae IA, IB and IC, or a pharmaceutically acceptable salt or solvate thereof, wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In some embodiments, the present disclosure provides a compound of any one of Formulae IA, IB and IC, or a pharmaceutically acceptable salt or solvate thereof, wherein Z 1 is not an adamantyl group.

In one embodiment, the present disclosure provides a compound of Formula ID: ID, or a pharmaceutically acceptable salt or solvate thereof, wherein: A is selected from the group consisting of -N(R 1a)- and -C(R')-OC(=O)(R 8a)-; R 1a is -L 1-R 1; L 1 is C 2-C 6 alkylene or -(CH 2) 2-6-OC(=O)-; R 1 is selected from the group consisting of -OH, , , , , , , and ; R 2a, R 2b and R 2c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 3a, R 3b and R 3c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 4a, R 4b and R 4c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 5a, R 5b and R 5c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 6a, R 6b and R 6c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 6a and R 6b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 6c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 7a, R 7b and R 7c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 7a and R 7b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 7c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R' is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 8a is -L 2-R 8; L 2 is C 2-C 6 alkylene; R 8 is selected from the group consisting of -NR 9aR 9b, , , , , , , and ; R 9a and R 9b are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 9a and R 9b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group; Q 1 is C 1-C 20 alkylene; W 1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R 12a)-, -N(R 12a)C(=O)-, -OC(=O)N(R 12a)-, -N(R 12a)C(=O)O- and -OC(=O)O-; R 12a is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 1 is an optionally substituted branched C 1-C 15 alkylene; or X 1 is a bond; Y 1 is selected from the group consisting of -(CH 2) m-, -O-, -S- and -S-S-; m is 0, 1, 2, 3, 4, 5 or 6; Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene; R 10 is selected from the group consisting of hydrogen, C 1-C 20 alkyl and C 2-C 20 alkenyl; Q 2 is C 1-C 20 alkylene; W 2 is selected from the group consisting of -C(=O)O-, -C(=O)N(R 12b)-, -OC(=O)N(R 12b)- and -OC(=O)O-; R 12b is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 2 is an optionally substituted C 1-C 15 alkylene; or Y 2 is -(CH 2) n-; n is 0, 1, 2, 3, 4, 5 or 6; Z 2 is -(CH 2) p-; p is 0 or 1; and R 11 is C 1-C 20 branched alkyl.

In some embodiments, the present disclosure provides a compound of Formula ID, or a pharmaceutically acceptable salt or solvate thereof, wherein Z 1 is not an adamantyl group.

In one embodiment, the present disclosure provides a compound of Formula I: I, or a pharmaceutically acceptable salt or solvate thereof, wherein: A is selected from the group consisting of -N(R 1a)- and -C(R')-OC(=O)(R 8a)-; R 1a is -L 1-R 1; L 1 is C 2-C 6 alkylene; R 1 is selected from the group consisting of -OH, , , , , and ; R 2a, R 2b and R 2c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 3a, R 3b and R 3c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 4a, R 4b and R 4c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 5a, R 5b and R 5c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 6a, R 6b and R 6c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 6a and R 6b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 6c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 7a, R 7b and R 7c are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 7a and R 7b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group, and R 7c is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R' is selected from the group consisting of hydrogen and C 1-C 6 alkyl; R 8a is -L 2-R 8; L 2 is C 2-C 6 alkylene; R 8 is -NR 9aR 9b; R 9a and R 9b are independently selected from the group consisting of hydrogen and C 1-C 6 alkyl; or R 9a and R 9b together with the nitrogen atom to which they are attached form a 4- to 8-membered heterocyclic group; Q 1 is C 1-C 20 alkylene; W 1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R 12a)-, -N(R 12a)C(=O)-, -OC(=O)N(R 12a)-, -N(R 12a)C(=O)O- and -OC(=O)O-; R 12a is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 1 is C 1-C 15 alkylene; or X 1 is a bond; Y 1 is selected from the group consisting of -(CH 2) m-, -O-, -S- and -S-S-; m is 0, 1, 2, 3, 4, 5 or 6; Z 1 is selected from the group consisting of C 4-C 12 cycloalkylene, and ; R 10 is selected from the group consisting of hydrogen, C 1-C 20 alkyl and C 2-C 20 alkenyl; Q 2 is C 1-C 20 alkylene; W 2 is selected from the group consisting of -C(=O)O-, -C(=O)N(R 12b)-, -OC(=O)N(R 12b)- and -OC(=O)O-; R 12b is selected from the group consisting of hydrogen and C 1-C 6 alkyl; X 2 is C 1-C 15 alkylene; or X 2 is a bond; Y 2 is selected from the group consisting of -(CH 2) n-, -O-, -S- and -S-S-; n is 0, 1, 2, 3, 4, 5 or 6; Z 2 is selected from the group consisting of -(CH 2) p-, C 4-C 12 cycloalkylene, and ; p is 0 or 1; and R 11 is selected from the group consisting of hydrogen, C 1-C 10 alkyl and C 2-C 10 alkenyl.

In another embodiment, the present disclosure provides a compound of Formula II: II, or a pharmaceutically acceptable salt or solvate thereof, wherein R 1, R 10, R 11, Q 1, Q 2, W 1, W 2, X 1, X 2, Y 1, Y 2, Z 1 and Z 2 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula III: III, or a pharmaceutically acceptable salt or solvate thereof, wherein R', R 9a, R 9b, R 10, R 11, L 2, Q 1, Q 2, W 1, W 2, X 1, X 2, Y 1, Y 2, Z 1 and Z 2 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula IV: VI, or a pharmaceutically acceptable salt or solvate thereof, wherein R 9a, R 9b, L 2, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VI': VI', or a pharmaceutically acceptable salt or solvate thereof, wherein R 9a, R 9b, L 2, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VI'': VI'', or a pharmaceutically acceptable salt or solvate thereof, wherein R 9a, R 9b, L 2, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VI''': VI''', or a pharmaceutically acceptable salt or solvate thereof, wherein R 9a, R 9b, L 2, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VII: VII, or a pharmaceutically acceptable salt or solvate thereof, wherein R 1, L 1, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VII': VII', or a pharmaceutically acceptable salt or solvate thereof, wherein R 1, L 1, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VII'': VII'', or a pharmaceutically acceptable salt or solvate thereof, wherein R 1, L 1, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VII''': VII''', or a pharmaceutically acceptable salt or solvate thereof, wherein R 1, L 1, Q 1, Q 2, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula VIII: VIII, or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and A, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula VIII, wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula VIII': VIII', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and A, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula VIII', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula VIII'': VIII'', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and A, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula VIII'', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula VIII''': VIII''', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and A, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula VIII''', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula IX: IX, or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula IX, wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula IX': IX', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula IX', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula IX'': IX'', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula IX'', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula IX''': IX''', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula IX''', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula X: X, or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 9a, R 9b, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula X, wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula X': X', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 9a, R 9b, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula X', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula X'': X'', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 9a, R 9b, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula X'', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula X''': X''', or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; and L 1, X 1, X 2, Y 1, Y 2, Z 1, Z 2, R 9a, R 9b, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula X''', wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula XI: XI, or a pharmaceutically acceptable salt or solvate thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; r 2 is 0, 1 or 2; s 2 is 0, 1, 2, 3, 4, 5 or 6; and A, X 1, Y 1, Z 1, R 10 and R 11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XI, wherein Z 1 is an optionally substituted C 5-C 12 bridged cycloalkylene group.

在另一實施例中,本揭示案提供式 XI’化合物: XI’或其醫藥學上可接受之鹽或溶劑合物, 其中 q 1為0、1、2或3; q 2為0、1、2或3; r 2為0、1或2; s 2為0、1、2、3、4、5、6;且 A、X 1、Y 1、Z 1、R 10及R 11如本文中之式IA、式IB、式IC、式ID、式I或下文所定義。 In another embodiment, the present disclosure provides a compound of formula XI' : XI' or a pharmaceutically acceptable salt or solvent thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5, 6; and A, X1 , Y1 , Z1 , R10 and R11 are as defined herein in Formula IA, Formula IB, Formula IC, Formula ID, Formula I or as described below.

在某些實施例中,該化合物為式 XI’化合物,其中Z 1為視情況經取代之C 5-C 12橋接伸環烷基。 In certain embodiments, the compound is of formula XI' , wherein Z 1 is an optionally substituted C 5 -C 12 bridged cycloalkylene group.

在另一實施例中,本揭示案提供式 XI’’化合物: XI’’或其醫藥學上可接受之鹽或溶劑合物, 其中 q 1為0、1、2或3; q 2為0、1、2或3; r 2為0、1或2; s 2為0、1、2、3、4、5、6;且 A、X 1、Y 1、Z 1、R 10及R 11如本文中之式IA、式IB、式IC、式ID、式I或下文所定義。 In another embodiment, the present disclosure provides a compound of formula XI'' : XI'' or a pharmaceutically acceptable salt or solvent thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5, 6; and A, X1 , Y1 , Z1 , R10 and R11 are as defined herein in Formula IA, Formula IB, Formula IC, Formula ID, Formula I or as described below.

在某些實施例中,該化合物為式 XI’’化合物,其中Z 1為視情況經取代之C 5-C 12橋接伸環烷基。 In certain embodiments, the compound is of formula XI″ , wherein Z 1 is an optionally substituted C 5 -C 12 bridged cycloalkylene group.

在另一實施例中,本揭示案提供式 XI’’’化合物: XI’’’或其醫藥學上可接受之鹽或溶劑合物, 其中 q 1為0、1、2或3; q 2為0、1、2或3; r 2為0、1或2; s 2為0、1、2、3、4、5、6;且 A、X 1、Y 1、Z 1、R 10及R 11如本文中之式IA、式IB、式IC、式ID、式I或下文所定義。 In another embodiment, the present disclosure provides a compound of formula XI''' : XI''' or a pharmaceutically acceptable salt or solvent thereof, wherein q 1 is 0, 1, 2 or 3; q 2 is 0, 1, 2 or 3; r 2 is 0, 1 or 2; s 2 is 0, 1, 2, 3, 4, 5, 6; and A, X 1 , Y 1 , Z 1 , R 10 and R 11 are as defined herein in Formula IA, Formula IB, Formula IC, Formula ID, Formula I or as described below.

在某些實施例中,該化合物為式 XI’’’化合物,其中Z 1為視情況經取代之C 5-C 12橋接伸環烷基。 In certain embodiments, the compound is of formula XI''' , wherein Z 1 is an optionally substituted C 5 -C 12 bridged cycloalkylene group.

In another embodiment, the present disclosure provides a compound of Formula XII, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XII, wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XII', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XII', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XII'', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XII'', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XII''', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XII''', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XIII, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIII, wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XIII', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIII', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XIII'', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIII'', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XIII''', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIII''', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XIV, or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and A, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIV, wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene. In certain embodiments, Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XIV', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and A, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIV', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene. In certain embodiments, Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XIV'', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and A, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIV'', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene. In certain embodiments, Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XIV''', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and A, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XIV''', wherein Z1 is an optionally substituted C5-C12 bridged cycloalkylene. In certain embodiments, Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XV, or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC or Formula I, or as defined below; wherein Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XV', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC or Formula I, or as defined below; wherein Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XV'', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC or Formula I, or as defined below; wherein Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XV''', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC or Formula I, or as defined below; wherein Z1 is not adamantyl.

In another embodiment, the present disclosure provides a compound of Formula XVI, or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XVI', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XVI'', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XVI''', or a pharmaceutically acceptable salt or solvate thereof, wherein R11' is selected from the group consisting of hydrogen, C1-C10 alkyl and C2-C10 alkenyl; q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; r2 is 0, 1 or 2; s2 is 0, 1, 2, 3, 4, 5 or 6; and L1, X1, Y1, Z1, R9a, R9b, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XVII, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and A, X1, X2, Y1, Y2, Z1, Z2, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XVII, wherein one or more methylene linkages of X2, Y2, Z2 and R11 are not replaced by a group selected from -O-, -CH=CH-, -S- and C3-C6 cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XVIII, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and L1, X1, X2, Y1, Y2, Z2 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In certain embodiments, the compound is a compound of Formula XVIII, wherein one or more methylene linkages of X2, Y2, Z2 and R11 are not replaced by a group selected from -O-, -CH=CH-, -S- and C3-C6 cycloalkylene.

In another embodiment, the present disclosure provides a compound of Formula XVIII', or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and A, X1, X2, Y1, Y2, Z1, Z2, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XIX, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and A, X1, X2, Y1, Y2, Z1, Z2, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XX, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and L1, X1, X2, Y1, Y2, Z2 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

In another embodiment, the present disclosure provides a compound of Formula XXI, or a pharmaceutically acceptable salt or solvate thereof, wherein q1 is 0, 1, 2 or 3; q2 is 0, 1, 2 or 3; and A, X1, X2, Y1, Y2, Z1, Z2, R10 and R11 are as defined herein for Formula IA, Formula IB, Formula IC, Formula ID or Formula I, or as defined below.

L1

In another embodiment, L1 is selected from the group consisting of -CH2CH2-, -CH2CH2CH2- and -CH2CH2CH2CH2-. In another embodiment, L1 is -CH2CH2-. In another embodiment, L1 is -CH2CH2CH2-. In another embodiment, L1 is -CH2CH2CH2CH2-. In certain embodiments, L1 is -(CH2)2-6-OC(=O)-. In some embodiments, L1 is -(CH2)2-OC(=O)-.

R1

In some embodiments, R1 is [structure not reproduced]. In another embodiment, R1 is -OH. In some embodiments, R1 is -N(R9a)(R9b). In some embodiments, R1 is -NMe2. In some embodiments, R1 is -NEt2. In another embodiment, R1 is [structure not reproduced]. In another embodiment, R1 is [structure not reproduced].

L2

In another embodiment, L2 is selected from the group consisting of -CH2CH2-, -CH2CH2CH2- and -CH2CH2CH2CH2-. In another embodiment, L2 is -CH2CH2-. In another embodiment, L2 is -CH2CH2CH2-. In another embodiment, L2 is -CH2CH2CH2CH2-.

R8

In some embodiments, R8 is [structure not reproduced]. In another embodiment, R8 is -NR9aR9b. In some embodiments, R8 is -NMe2. In some embodiments, R8 is -NEt2. In another embodiment, R8 is -OH.

R9a, R9b

In another embodiment, R9a and R9b are independently selected from the group consisting of hydrogen and C1-C4 alkyl. In another embodiment, R9a and R9b are each methyl. In another embodiment, R9a and R9b are each ethyl.

R'

In another embodiment, R' is hydrogen. In some embodiments, R' is C1-C6 alkyl.

Q1

In another embodiment, Q1 is a linear C1-C20 alkylene. In another embodiment, Q1 is a linear C1-C10 alkylene. In another embodiment, Q1 is a C1-C10 alkylene. In another embodiment, Q1 is a C2-C5 alkylene. In another embodiment, Q1 is a C6-C9 alkylene. In another embodiment, Q1 is selected from the group consisting of -CH2CH2-, -CH2CH2CH2-, -CH2(CH2)2CH2-, -CH2(CH2)3CH2-, -CH2(CH2)4CH2-, -CH2(CH2)5CH2-, -CH2(CH2)6CH2-, -CH2(CH2)7CH2- and -CH2(CH2)8CH2-. In another embodiment, Q1 is -CH2CH2-. In another embodiment, Q1 is -CH2CH2CH2-. In another embodiment, Q1 is -CH2(CH2)2CH2-. In another embodiment, Q1 is -CH2(CH2)3CH2-. In another embodiment, Q1 is -CH2(CH2)4CH2-. In another embodiment, Q1 is -CH2(CH2)5CH2-. In another embodiment, Q1 is -CH2(CH2)6CH2-. In another embodiment, Q1 is -CH2(CH2)7CH2-. In another embodiment, Q1 is -CH2(CH2)8CH2-.

W1

In another embodiment, W1 is selected from the group consisting of -C(=O)O-, -OC(=O)-, -C(=O)N(R12a)-, -N(R12a)C(=O)-, -OC(=O)N(R12a)-, -N(R12a)C(=O)O- and -OC(=O)O-. In another embodiment, W1 is -C(=O)O-. In another embodiment, W1 is -OC(=O)-. In another embodiment, W1 is -C(=O)N(R12a)-. In another embodiment, W1 is -N(R12a)C(=O)-. In another embodiment, W1 is -OC(=O)N(R12a)-. In another embodiment, W1 is -N(R12a)C(=O)O-. In another embodiment, W1 is -OC(=O)O-.

X1

In another embodiment, X2 is an optionally substituted C1-C15 alkylene. In another embodiment, X2 is a branched C1-C15 alkylene. In another embodiment, X1 is a bond or C1-C15 alkylene. In another embodiment, X1 is a bond. In another embodiment, X1 is a C2-C5 alkylene. In another embodiment, X1 is a C6-C9 alkylene. In another embodiment, X1 is -CH2-. In another embodiment, X2 is -CH2CH2-. In another embodiment, X2 is -CH2CH2CH2-. In another embodiment, X2 is -CH2CH2CH2CH2-. In another embodiment, X2 is -CH2CH2CH2CH2CH2-.

Y1

In another embodiment, Y1 is selected from the group consisting of -(CH2)m-, -O-, -S- and -S-S-. In another embodiment, Y1 is -(CH2)m-. In some embodiments, Y1 is -O-. In some embodiments, Y1 is -S-. In another embodiment, Y1 is -CH2-. In another embodiment, Y2 is -CH2CH2-.

m

In another embodiment, m is 0. In another embodiment, m is 1. In another embodiment, m is 2. In another embodiment, m is 3. In another embodiment, m is 4. In another embodiment, m is 5. In another embodiment, m is 6.

n

In another embodiment, n is 0. In another embodiment, n is 1. In another embodiment, n is 2. In another embodiment, n is 3. In another embodiment, n is 4. In another embodiment, n is 5. In another embodiment, n is 6.

p

In another embodiment, p is 0. In another embodiment, p is 1.

Z1

In another embodiment, Z1 is selected from the group consisting of C4-C12 cycloalkylene and [structures not reproduced]. In certain embodiments, Z1 is optionally substituted.

In another embodiment, Z1 is [structure not reproduced] or [structure not reproduced].

In another embodiment, Z1 is a C4-C12 cycloalkylene. In another embodiment, Z1 is a monocyclic C4-C8 cycloalkylene. In another embodiment, Z1 is a monocyclic C4-C6 cycloalkylene. In another embodiment, Z1 is a monocyclic C4 cycloalkylene. In another embodiment, Z1 is a monocyclic C5 cycloalkylene. In another embodiment, Z1 is a monocyclic C6 cycloalkylene.

In another embodiment, Z1 is an optionally substituted bridged bicyclic or polycyclic cycloalkylene. In some embodiments, Z1 is an optionally substituted C5-C12 bridged cycloalkylene. In some embodiments, Z1 is an optionally substituted C6-C10 bridged cycloalkylene. In some embodiments, Z1 is an optionally substituted C5-C10 bridged cycloalkylene selected from the group consisting of adamantyl, cubanyl, bicyclo[2.2.1]heptyl, bicyclo[2.2.2]octyl, bicyclo[1.1.1]pentyl, bicyclo[3.2.1]octyl and bicyclo[3.1.1]heptyl.
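As an illustrative check that is not part of the disclosure, the ring-carbon counts of the named bridged groups can be compared against the stated C5-C10 range. The sketch below assumes the standard von Baeyer rule that a bicyclo[a.b.c] skeleton has a + b + c + 2 carbons, and uses the known carbon counts of adamantane (C10) and cubane (C8). The function and variable names are hypothetical.

```python
import re

# Known carbon counts for the two polycyclic cages named in the text
# (adamantane is C10H16, cubane is C8H8).
CAGE_CARBONS = {"adamantyl": 10, "cubanyl": 8}

def ring_carbons(name: str) -> int:
    """Return the number of ring carbons for a named bridged group.

    For bicyclo[a.b.c] names, the von Baeyer descriptor gives the carbon
    count as a + b + c bridge carbons plus the 2 bridgehead carbons.
    """
    if name in CAGE_CARBONS:
        return CAGE_CARBONS[name]
    match = re.search(r"bicyclo\[(\d+)\.(\d+)\.(\d+)\]", name)
    if match is None:
        raise ValueError(f"unrecognized bridged group name: {name!r}")
    return sum(int(g) for g in match.groups()) + 2

# Group names exactly as listed in the paragraph above.
GROUPS = [
    "adamantyl",
    "cubanyl",
    "bicyclo[2.2.1]heptyl",
    "bicyclo[2.2.2]octyl",
    "bicyclo[1.1.1]pentyl",
    "bicyclo[3.2.1]octyl",
    "bicyclo[3.1.1]heptyl",
]

counts = {name: ring_carbons(name) for name in GROUPS}
all_in_range = all(5 <= n <= 10 for n in counts.values())

for name, n in counts.items():
    print(f"{name}: C{n}")
print("all within C5-C10:", all_in_range)
```

Under these assumptions every listed group falls inside the C5-C10 range, with bicyclo[1.1.1]pentyl at the lower bound and adamantyl at the upper bound.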

In another embodiment, Z1 is selected from the group consisting of: [structures not reproduced].

In another embodiment, Z1 is selected from the group consisting of: [structures not reproduced].

R10

In another embodiment, R10 is hydrogen.

In another embodiment, R10 is C1-C10 alkyl. In another embodiment, R10 is C3-C7 alkyl. In another embodiment, R10 is C4-C6 alkyl. In another embodiment, R10 is C4 alkyl. In another embodiment, R10 is C5 alkyl. In another embodiment, R10 is C6 alkyl.

In another embodiment, R10 is C2-C12 alkenyl. In another embodiment, R10 is C6-C12 alkenyl. In another embodiment, R10 is C2-C8 alkenyl.

R11

In another embodiment, R11 is C1-C10 alkyl. In another embodiment, R11 is optionally substituted C1-C20 alkyl. In another embodiment, R11 is optionally substituted branched C1-C20 alkyl. In another embodiment, R11 is optionally substituted C1-C15 alkyl. In another embodiment, R11 is optionally substituted C1-C15 branched alkyl. In another embodiment, R11 is optionally substituted C10-C15 alkyl. In another embodiment, R11 is optionally substituted C10-C15 branched alkyl. In another embodiment, R11 is selected from the group consisting of -CH3, -CH2CH3, and -CH2CH2CH3. In another embodiment, R11 is selected from the group consisting of -CH2(CH2)2CH3, -CH2(CH2)3CH3, -CH2(CH2)4CH3, -CH2(CH2)5CH3, -CH2(CH2)6CH3, -CH2(CH2)7CH3, and -CH2(CH2)8CH3. In another embodiment, R11 is -CH3. In another embodiment, R11 is -CH2CH3. In another embodiment, R11 is -CH2CH2CH3. In another embodiment, R11 is -CH2(CH2)2CH3. In another embodiment, R11 is -CH2(CH2)3CH3. In another embodiment, R11 is -CH2(CH2)4CH3. In another embodiment, R11 is -CH2(CH2)5CH3. In another embodiment, R11 is -CH2(CH2)6CH3. In another embodiment, R11 is -CH2(CH2)7CH3. In another embodiment, R11 is -CH2(CH2)8CH3.
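The straight-chain R11 groups recited above follow the pattern -CH2(CH2)nCH3, in which the total number of carbons is n + 2. As a minimal sketch only (the helper name `linear_alkyl` is hypothetical and not part of the disclosure), the following Python snippet enumerates the groups for n = 2 through 8 and reports the carbon count of each:

```python
def linear_alkyl(n_inner: int) -> tuple[str, int]:
    """Return the condensed formula -CH2(CH2)nCH3 and its total carbon count.

    n_inner is the number of repeated internal CH2 units (the subscript n).
    This helper is illustrative and not part of the disclosure.
    """
    if n_inner < 1:
        raise ValueError("n_inner must be at least 1")
    formula = f"-CH2(CH2){n_inner}CH3"
    # One leading CH2, n_inner internal CH2 units, and one terminal CH3.
    carbons = 1 + n_inner + 1
    return formula, carbons


# The R11 embodiments recite n = 2 through n = 8.
R11_LINEAR = [linear_alkyl(n) for n in range(2, 9)]

for formula, carbons in R11_LINEAR:
    print(f"{formula}: C{carbons}")
```

Under this counting, -CH2(CH2)2CH3 is a four-carbon (butyl) chain and -CH2(CH2)8CH3 is a ten-carbon (decyl) chain, so every listed group falls within the C1-C10 alkyl range.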

In another embodiment, R11 is C2-C10 alkenyl. In another embodiment, R11 is C2-C12 alkenyl. In another embodiment, R11 is C6-C12 alkenyl. In another embodiment, R11 is C2-C8 alkenyl.

In another embodiment, the present disclosure provides a compound of any one of Formulas IA, IB, IC, or I-XXI, or a pharmaceutically acceptable salt or solvate thereof, wherein R11 is hydrogen.

Q2

In another embodiment, Q2 is straight-chain C1-C20 alkylene. In another embodiment, Q2 is straight-chain C1-C10 alkylene. In another embodiment, Q2 is C2-C10 alkylene. In another embodiment, Q2 is selected from the group consisting of -CH2CH2-, -CH2CH2CH2-, -CH2(CH2)2CH2-, -CH2(CH2)3CH2-, -CH2(CH2)4CH2-, -CH2(CH2)5CH2-, -CH2(CH2)6CH2-, -CH2(CH2)7CH2-, and -CH2(CH2)8CH2-. In another embodiment, Q2 is -CH2CH2-. In another embodiment, Q2 is -CH2CH2CH2-. In another embodiment, Q2 is -CH2(CH2)3CH2-. In another embodiment, Q2 is -CH2(CH2)4CH2-. In another embodiment, Q2 is -CH2(CH2)5CH2-. In another embodiment, Q2 is -CH2(CH2)6CH2-. In another embodiment, Q2 is -CH2(CH2)7CH2-. In another embodiment, Q2 is -CH2(CH2)8CH2-.

W2

In another embodiment, W2 is selected from the group consisting of -C(=O)O- and -OC(=O)-. In another embodiment, W2 is -C(=O)O-. In another embodiment, W2 is -OC(=O)-.

X2

In another embodiment, X2 is optionally substituted C1-C15 alkylene. In another embodiment, X2 is C1-C15 branched alkylene. In another embodiment, X2 is C1-C6 alkylene or a bond. In another embodiment, X2 is C2-C4 alkylene. In another embodiment, X2 is C3-C5 alkylene. In another embodiment, X2 is selected from the group consisting of -CH2CH2-, -CH2CH2CH2-, -CH2(CH2)2CH2-, -CH2(CH2)3CH2-, and -CH2(CH2)4CH2-. In another embodiment, X2 is -CH2-. In another embodiment, X2 is a bond. In another embodiment, X2 is branched C1-C15 alkylene, wherein one or more methylene linkages of X2 are optionally and independently replaced by a group selected from -O-, -CH=CH-, -S-, and C3-C6 cycloalkylene.

Y2

In another embodiment, Y2 is selected from the group consisting of -(CH2)m- and -S-. In another embodiment, Y2 is -(CH2)m-. In another embodiment, Y2 is -S-.

Z2

In another embodiment, Z2 is -(CH2)p-. In another embodiment, Z2 is -CH2-. In another embodiment, Z2 is -CH2CH2-. In another embodiment, Z2 is C4-C12 cycloalkylene. In another embodiment, Z2 is monocyclic C4-C8 cycloalkylene. In certain embodiments, Z2 is optionally substituted.

In another embodiment, Z2 is optionally substituted bridged bicyclic or polycyclic cycloalkylene. In some embodiments, Z2 is optionally substituted C5-C12 bridged cycloalkylene. In some embodiments, Z2 is optionally substituted C6-C10 bridged cycloalkylene. In some embodiments, Z2 is optionally substituted C5-C10 bridged cycloalkylene selected from the group consisting of adamantyl, cubanyl, bicyclo[2.2.1]heptyl, bicyclo[2.2.2]octyl, bicyclo[1.1.1]pentyl, bicyclo[3.2.1]octyl, and bicyclo[3.1.1]heptyl.

In another embodiment, Z2 is selected from the group consisting of: [structures not shown].

In another embodiment, Z2 is selected from the group consisting of: [structures not shown].

In another embodiment, the present disclosure provides a compound selected from any one or more of the compounds of Table (III), or a pharmaceutically acceptable salt or solvate thereof.

Table (III). Non-limiting examples of ionizable lipids with constrained arms
Compound No. | Structure
C1 to C81 | [structures not shown]

In some embodiments, the LNPs of the present disclosure comprise ionizable lipids disclosed in PCT application PCT/US2023/065477, which is incorporated herein by reference in its entirety.

In some embodiments, the lipids of the present disclosure comprise a heterocyclic core, wherein the heteroatom is nitrogen. In some embodiments, the heterocyclic core comprises pyrrolidine or a derivative thereof. In some embodiments, the heterocyclic core comprises piperidine or a derivative thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2 or -(CH2)1-6OH;
R2 is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
R2' is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-i) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2;
R2 is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I'), (CX-I''), or (CX-I''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-a), (CX-I-b), (CX-I-c), or (CX-I-d) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-a'), (CX-I-b'), (CX-I-c'), or (CX-I-d') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-a''), (CX-I-b''), (CX-I-c''), or (CX-I-d'') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-a'''), (CX-I-b'''), (CX-I-c'''), or (CX-I-d''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-e) or (CX-I-f) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-e') or (CX-I-f') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-e'') or (CX-I-f'') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-I-e''') or (CX-I-f''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2 or -(CH2)1-6OH;
R2 is optionally substituted C5-C36 alkyl or optionally substituted C5-C36 alkenyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
R2' is optionally substituted C1-C36 alkyl or optionally substituted C5-C36 alkenyl, wherein 1-6 methylene units of R2' are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-ii) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2;
R2 is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
R2' is optionally substituted C1-C36 alkyl, wherein 1-6 methylene units of R2' are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II'), (CX-II''), or (CX-II''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-a) [structure not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-a'), (CX-II-a''), or (CX-II-a''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-b), (CX-II-c), or (CX-II-d) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-b'), (CX-II-c'), or (CX-II-d') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-b''), (CX-II-c''), or (CX-II-d'') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-b'''), (CX-II-c'''), or (CX-II-d''') [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-II-e) [structure not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-III) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2 or -(CH2)1-6OH;
R2 is optionally substituted C5-C36 alkyl or optionally substituted C5-C36 alkenyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
R2' is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2; and
n is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-iii) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2;
each R2 is independently optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2; and
n is 1 or 2.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-III-a), (CX-III-b), or (CX-III-c) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-III-d) or (CX-III-e) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-IV) [structure not shown], or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of: a bond, [structures not shown];
each Y is independently selected from the group consisting of: [structures not shown];
R1 is -(CH2)1-6N(Ra)2 or -(CH2)1-6OH;
R2 is C3-C36 branched alkyl or optionally substituted C3-C36 branched alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene and -O-;
R2' is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently optionally substituted C1-C6 alkyl; or two Ra groups, together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2; and
n is 1 or 2.

In some embodiments, the compound is represented by formula (CX-IV-a), (CX-IV-b), or (CX-IV-c) [structures not shown], or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CX-IV-d) or (CX-IV-e) [structures not shown], or a pharmaceutically acceptable salt thereof.

Z

In some embodiments, Z is selected from the group consisting of: a bond, [structures not shown].

In some embodiments, Z is selected from the group consisting of: [structures not shown]. In some embodiments, Z is [structure not shown].

In some embodiments, Z is selected from the group consisting of: a bond, [structures not shown], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is selected from the group consisting of: [structures not shown], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is [structure not shown]. In some embodiments, Z is [structure not shown], wherein R1 is attached at the position indicated by *. In some embodiments, Z is [structure not shown], wherein R1 is attached at the position indicated by *. In some embodiments, Z is [structure not shown], wherein R1 is attached at the position indicated by *.

Y

In some embodiments, each Y is independently selected from the group consisting of: [structures not shown].

In some embodiments, Y is selected from the group consisting of: [structures not shown].

In some embodiments, Y is selected from the group consisting of: [structures not shown], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not shown], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not shown], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not shown], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not shown], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not shown].

In some embodiments, Y is [structure not shown] or [structure not shown].

In some embodiments, Y is [structure not shown].

In some embodiments, Y is [structure not shown].

R1

In some embodiments, R1 is -(CH2)1-6N(Ra)2 or -(CH2)1-6OH. In some embodiments, R1 is -(CH2)1-6OH. In some embodiments, R1 is -(CH2)1-6N(Ra)2. In some embodiments, R1 is -(CH2)2N(Ra)2. In some embodiments, R1 is -(CH2)3N(Ra)2. In some embodiments, R1 is -(CH2)4N(Ra)2. In some embodiments, R1 is -(CH2)1-6N(Me)2. In some embodiments, R1 is -(CH2)1-6N(Et)2. In some embodiments, R1 is -(CH2)1-6N(n-Pr)2. In some embodiments, R1 is -(CH2)1-6N(i-Pr)2. In some embodiments, R1 is -(CH2)2N(Me)2. In some embodiments, R1 is -(CH2)3N(Me)2. In some embodiments, R1 is -(CH2)4N(Me)2. In some embodiments, R1 is -(CH2)2N(Et)2. In some embodiments, R1 is -(CH2)3N(Et)2. In some embodiments, R1 is -(CH2)4N(Et)2.
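The amine embodiments of R1 above combine a linker of 1 to 6 methylene units with a choice of Ra. As an illustrative sketch only, the Python snippet below enumerates the -(CH2)kN(Ra)2 combinations for the four Ra groups named above (Me, Et, n-Pr, i-Pr). It assumes both Ra groups are identical, which is narrower than the disclosure (each Ra is independently selected, and two Ra groups may form a ring). The function name `r1_amines` is hypothetical.

```python
from itertools import product

# Ra groups named explicitly in the R1 embodiments.
RA_GROUPS = ("Me", "Et", "n-Pr", "i-Pr")


def r1_amines(min_k: int = 1, max_k: int = 6) -> list[str]:
    """List -(CH2)kN(Ra)2 groups for k from min_k to max_k inclusive.

    Assumption: both Ra groups are the same; mixed-Ra and cyclic amines
    are not enumerated here.
    """
    return [
        f"-(CH2){k}N({ra})2"
        for k, ra in product(range(min_k, max_k + 1), RA_GROUPS)
    ]


candidates = r1_amines()
print(len(candidates))  # 6 linker lengths x 4 Ra groups = 24
```

The specifically recited groups, such as -(CH2)3N(Me)2 and -(CH2)4N(Et)2, are members of this enumerated set.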

In some embodiments, R1 is selected from the group consisting of: [structures not shown].

In some embodiments, R1 is selected from the group consisting of: [structures not shown].

In some embodiments, R1 is selected from the group consisting of: [structures not shown].

R2 and R2'

In some embodiments, R2 is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C1-C32 alkyl or optionally substituted C2-C32 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C1-C30 alkyl or optionally substituted C2-C30 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C1-C24 alkyl or optionally substituted C2-C24 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C1-C24 alkyl or optionally substituted C2-C24 alkenyl, wherein 1-6 methylene units of R2 are replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C1-C24 alkyl or optionally substituted C2-C24 alkenyl. In some embodiments, R2 is optionally substituted C10-C24 alkyl or optionally substituted C10-C24 alkenyl, wherein 1-6 methylene units of R2 are replaced by -O-.

In some embodiments, R2 is optionally substituted C5-C36 alkyl or optionally substituted C5-C36 alkenyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-; and R2' is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-.

In some embodiments, R2 is optionally substituted C10-C24 alkyl or optionally substituted C10-C24 alkenyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-; and R2' is optionally substituted C10-C36 branched alkyl or optionally substituted C10-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-.

In some embodiments, R2 is C3-C36 branched alkyl or optionally substituted C3-C36 branched alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene and -O-; and R2' is optionally substituted C1-C36 alkyl or optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is optionally substituted C10-C24 branched alkyl or optionally substituted C10-C24 branched alkenyl, wherein 1-3 methylene units of R2 are optionally replaced by -O-; and R2' is optionally substituted C10-C36 alkyl or optionally substituted C10-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-.

In some embodiments, R2 and/or R2' is [structure not reproduced], wherein each q is independently selected from 0-12 and each R° is independently selected and is as described and defined herein.

In some embodiments, R2 and/or R2' is [structure not reproduced], wherein each q is independently selected from 0-12.

在一些實施例中,R 2為視情況經取代之C 5-C 36烷基或視情況經取代之C 5-C 36烯基,其中R 2之2個亞甲基單元經-O-置換以在R 2內形成縮醛,且其中R 2之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換;R 2’為視情況經取代之C 1-C 36烷基或視情況經取代之C 5-C 36烯基,其中R 2’之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換。 In some embodiments, R2 is an optionally substituted C5 - C36 alkyl group or an optionally substituted C5 - C36 alkenyl group, wherein 2 methylene units of R2 are replaced by -O- to form an acetal in R2 , and wherein 1-6 methylene units of R2 are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)- and -C(O)O-; R2 ' is an optionally substituted C1 - C36 alkyl group or an optionally substituted C5 - C36 alkenyl group, wherein 1-6 methylene units of R2 ' are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)- and -C(O)O-.

In some embodiments, R2 is an optionally substituted C10-C24 alkyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and R2' is an optionally substituted C10-C24 alkyl, wherein 2 methylene units of R2' are replaced by -O- to form an acetal within R2'.

在一些實施例中,每個q獨立地選自0-6。在一些實施例中,每個q獨立地選自0-8。在一些實施例中,每個q獨立地選自0-10。在一些實施例中,每個q獨立地選自0-12。In some embodiments, each q is independently selected from 0-6. In some embodiments, each q is independently selected from 0-8. In some embodiments, each q is independently selected from 0-10. In some embodiments, each q is independently selected from 0-12.

In some embodiments, R2 is an optionally substituted C10-C24 alkyl or an optionally substituted C10-C24 alkenyl, wherein 2 methylene units of R2 are replaced by -O- to form an acetal within R2, and wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-; and R2' is an optionally substituted C10-C24 alkenyl, wherein 1-3 methylene units of R2' are optionally replaced by -O-.

In some embodiments, R2 is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2 is [structure not reproduced] or [structure not reproduced].

In some embodiments, R2 is [structure not reproduced].

In some embodiments, R2 is [structure not reproduced].

In some embodiments, R2 is selected from the group consisting of: [structures not reproduced].

在一些實施例中,R 2’為視情況經取代之C 1-C 36烷基,其中R 2’之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換。在一些實施例中,R 2’為視情況經取代之C 1-C 32烷基,其中R 2’之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換。在一些實施例中,R 2’為視情況經取代之C 1-C 30烷基,其中R 2’之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換。在一些實施例中,R 2’為視情況經取代之C 1-C 24烷基,其中R 2’之1-6個亞甲基單元視情況經各自獨立地選自伸環丙基、-O-、-OC(O)-及-C(O)O-之基團置換。在一些實施例中,R 2’為視情況經取代之C 1-C 24烷基,其中R 2’之1-6個亞甲基單元經各自獨立地選自-O-、-OC(O)-及-C(O)O-之基團置換。在一些實施例中,R 2’為視情況經取代之C 1-C 24烷基。在一些實施例中,R 2’為視情況經取代之C 10-C 24烷基,其中R 2’之1-6個亞甲基單元經-O-置換。 In some embodiments, R 2' is an optionally substituted C 1 -C 36 alkyl group, wherein 1-6 methylene units of R 2' are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R 2' is an optionally substituted C 1 -C 32 alkyl group, wherein 1-6 methylene units of R 2' are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R 2' is an optionally substituted C 1 -C 30 alkyl group, wherein 1-6 methylene units of R 2' are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R 2' is an optionally substituted C 1 -C 24 alkyl group, wherein 1-6 methylene units of R 2' are optionally replaced by a group independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R 2' is an optionally substituted C 1 -C 24 alkyl group, wherein 1-6 methylene units of R 2' are replaced by a group independently selected from -O-, -OC(O)-, and -C(O)O-. In some embodiments, R 2' is an optionally substituted C 1 -C 24 alkyl group. In some embodiments, R 2' is an optionally substituted C 10 -C 24 alkyl group, wherein 1-6 methylene units of R 2' are replaced by -O-.

In some embodiments, R2' is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2 and R2' are each independently selected from the group consisting of: [structures not reproduced].

In some embodiments, R2' is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2' is [structure not reproduced].

In some embodiments, R2' is [structure not reproduced].

In some embodiments, R2' is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2 is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2' is selected from the group consisting of: [structures not reproduced].

In some embodiments, the present disclosure includes a compound selected from any of the lipids in Table (IV) below, or a pharmaceutically acceptable salt thereof:

Table (IV). Non-limiting examples of ionizable lipids

Compound No. (structures not reproduced): CX-1, CX-2, CX-3, CX-4, CX-5, CX-6, CX-7, CX-8, CX-8a, CX-8b, CX-8c, CX-9, CX-10, CX-11, CX-12, CX-13, CX-14, CX-15, CX-16, CX-17, CX-18, CX-19, CX-20, CX-21, CX-22, CX-23, CX-24, CX-25, CX-26, CX-27, CX-28, CX-29, CX-30, CX-30a, CX-30b, CX-30c

在一些實施例中,本揭示案之脂質包含雜環核心,其中雜原子為氮。在一些實施例中,該雜環核心包含吡咯啶或其衍生物。在一些實施例中,該雜環核心包含哌啶或其衍生物。In some embodiments, the lipids of the present disclosure comprise a heterocyclic core, wherein the heteroatom is nitrogen. In some embodiments, the heterocyclic core comprises pyrrolidine or a derivative thereof. In some embodiments, the heterocyclic core comprises piperidine or a derivative thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-I):
[structure not reproduced] (CZ-I)
or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of a bond and [structures not reproduced];
each Y is independently selected from the group consisting of [structures not reproduced];
R1 is -(CH2)1-6N(Ra)2;
each R2 is independently an optionally substituted C1-C36 alkyl or an optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently an optionally substituted C1-C6 alkyl; or two Ra, taken together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.
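The integer indices recited for formula (CZ-I) span a small, finite set of combinations. The following is a minimal illustrative sketch, not part of the disclosure, that enumerates them; the names `M_VALUES`, `N_VALUES`, `P_VALUES`, `index_combinations`, and `is_allowed` are hypothetical and chosen only for this example.

```python
from itertools import product

# Ranges recited for formula (CZ-I): m is 0, 1 or 2; n is 1 or 2; p is 1 or 2.
M_VALUES = (0, 1, 2)
N_VALUES = (1, 2)
P_VALUES = (1, 2)


def index_combinations():
    """Return every (m, n, p) tuple permitted by the recited ranges."""
    return list(product(M_VALUES, N_VALUES, P_VALUES))


def is_allowed(m, n, p):
    """Check whether one (m, n, p) choice falls inside the recited ranges."""
    return m in M_VALUES and n in N_VALUES and p in P_VALUES


if __name__ == "__main__":
    combos = index_combinations()
    print(len(combos))          # 3 * 2 * 2 = 12 combinations
    print(is_allowed(0, 1, 2))  # True
    print(is_allowed(3, 1, 1))  # False: m = 3 is outside the recited range
```

The sketch covers only the integer indices; the choices of Z, Y, R1, and R2 multiply the number of distinct compounds further and are not modeled here.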

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-I-a), (CZ-I-b), (CZ-I-c), or (CZ-I-d):
[structures not reproduced] (CZ-I-a), (CZ-I-b), (CZ-I-c), (CZ-I-d)
or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-I-e) or (CZ-I-f):
[structures not reproduced] (CZ-I-e), (CZ-I-f)
or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-I-g):
[structure not reproduced] (CZ-I-g)
or a pharmaceutically acceptable salt thereof.

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-II):
[structure not reproduced] (CZ-II)
or a pharmaceutically acceptable salt thereof, wherein:
Z is selected from the group consisting of a bond and [structures not reproduced];
each Y is independently selected from the group consisting of [structures not reproduced];
R1 is -(CH2)1-6N(Ra)2;
each R2 is independently an optionally substituted C1-C36 alkyl or an optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-;
each Ra is independently an optionally substituted C1-C6 alkyl; or two Ra, taken together with the nitrogen to which they are attached, form an optionally substituted 4-7 membered heterocyclyl ring;
m is 0, 1, or 2;
n is 1 or 2; and
p is 1 or 2.

在一些實施例中,本揭示案之化合物由式(CZ-II-a)、(CZ-II-b)、(CZ-II-c)或(CZ-II-d)表示: (CZ-II-a)                                               (CZ-II-b) (CZ-II-c)                           (CZ-II-d) 或其醫藥學上可接受之鹽。 In some embodiments, the compounds of the present disclosure are represented by formula (CZ-II-a), (CZ-II-b), (CZ-II-c) or (CZ-II-d): (CZ-II-a) (CZ-II-b) (CZ-II-c) (CZ-II-d) or their pharmaceutically acceptable salts.

In some embodiments, the compounds of the present disclosure are represented by formula (CZ-II-e):
[structure not reproduced] (CZ-II-e)
or a pharmaceutically acceptable salt thereof.

Z

In some embodiments, Z is selected from the group consisting of a bond and [structures not reproduced].

In some embodiments, Z is selected from the group consisting of: [structures not reproduced]. In some embodiments, Z is [structure not reproduced].

In some embodiments, Z is selected from the group consisting of a bond and [structures not reproduced], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is selected from the group consisting of: [structures not reproduced], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is [structure not reproduced].

In some embodiments, Z is [structure not reproduced], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is [structure not reproduced], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is [structure not reproduced], wherein R1 is attached at the position indicated by *.

In some embodiments, Z is [structure not reproduced], wherein R1 is attached at the position indicated by *.

Y

In some embodiments, Y is selected from the group consisting of: [structures not reproduced]. In some embodiments, Y is [structure not reproduced] or [structure not reproduced].

In some embodiments, Y is selected from the group consisting of: [structures not reproduced], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not reproduced], wherein R2 is attached at the position indicated by *. In some embodiments, Y is [structure not reproduced], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not reproduced], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not reproduced], wherein R2 is attached at the position indicated by *.

In some embodiments, Y is [structure not reproduced].

In some embodiments, Y is [structure not reproduced] or [structure not reproduced].

In some embodiments, Y is [structure not reproduced].

In some embodiments, Y is [structure not reproduced].

R1

In some embodiments, R1 is -(CH2)1-6N(Ra)2. In some embodiments, R1 is -(CH2)2N(Ra)2. In some embodiments, R1 is -(CH2)3N(Ra)2. In some embodiments, R1 is -(CH2)4N(Ra)2. In some embodiments, R1 is -(CH2)1-6N(Me)2. In some embodiments, R1 is -(CH2)1-6N(Et)2. In some embodiments, R1 is -(CH2)1-6N(n-Pr)2. In some embodiments, R1 is -(CH2)1-6N(i-Pr)2. In some embodiments, R1 is -(CH2)2N(Me)2. In some embodiments, R1 is -(CH2)3N(Me)2. In some embodiments, R1 is -(CH2)4N(Me)2. In some embodiments, R1 is -(CH2)2N(Et)2. In some embodiments, R1 is -(CH2)3N(Et)2. In some embodiments, R1 is -(CH2)4N(Et)2.
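The explicitly named R1 groups above follow one pattern: a methylene chain of length 1 to 6 capped by a dialkylamine. As a rough illustration only, the sketch below builds the plain-text labels for the chain lengths 2, 3, and 4 combined with the Me and Et substituents recited above; the function name `r1_label` and the label format are assumptions made for this example and are not a chemical-structure representation.

```python
def r1_label(n, ra):
    """Build a text label for R1 = -(CH2)n N(Ra)2.

    n  : number of methylene units; the recited general range is 1 to 6.
    ra : short name of the amine substituent, e.g. "Me" or "Et".
    """
    if not 1 <= n <= 6:
        raise ValueError("n must lie in the recited range 1-6")
    return f"-(CH2){n}N({ra})2"


# Chain lengths and substituents that are named explicitly in the text above.
EXPLICIT_CHAIN_LENGTHS = (2, 3, 4)
EXPLICIT_SUBSTITUENTS = ("Me", "Et")

explicit_r1 = [
    r1_label(n, ra)
    for ra in EXPLICIT_SUBSTITUENTS
    for n in EXPLICIT_CHAIN_LENGTHS
]

if __name__ == "__main__":
    for label in explicit_r1:
        print(label)
```

Extending `EXPLICIT_CHAIN_LENGTHS` to `range(1, 7)` would cover the full general range -(CH2)1-6N(Ra)2 for a given substituent.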

In some embodiments, R1 is selected from the group consisting of: [structures not reproduced].

In some embodiments, R1 is selected from the group consisting of: [structures not reproduced].

In some embodiments, R1 is selected from the group consisting of: [structures not reproduced].

R2

In some embodiments, R2 is an optionally substituted C1-C36 alkyl or an optionally substituted C2-C36 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is an optionally substituted C1-C32 alkyl or an optionally substituted C2-C32 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is an optionally substituted C1-C30 alkyl or an optionally substituted C2-C30 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is an optionally substituted C1-C24 alkyl or an optionally substituted C2-C24 alkenyl, wherein 1-6 methylene units of R2 are optionally replaced by groups each independently selected from cyclopropylene, -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is an optionally substituted C1-C24 alkyl or an optionally substituted C2-C24 alkenyl, wherein 1-6 methylene units of R2 are replaced by groups each independently selected from -O-, -OC(O)-, and -C(O)O-. In some embodiments, R2 is an optionally substituted C1-C24 alkyl or an optionally substituted C2-C24 alkenyl. In some embodiments, R2 is an optionally substituted C10-C24 alkyl or an optionally substituted C10-C24 alkenyl, wherein 1-6 methylene units of R2 are replaced by -O-.

In some embodiments, R2 is [structure not reproduced], wherein each q is independently selected from 0-12 and each R° is independently selected and is as defined herein.

In some embodiments, R2 is [structure not reproduced], wherein each q is independently selected from 0-12.

在一些實施例中,每個q獨立地選自0-6。在一些實施例中,每個q獨立地選自0-8。在一些實施例中,每個q獨立地選自0-10。在一些實施例中,每個q獨立地選自0-12。In some embodiments, each q is independently selected from 0-6. In some embodiments, each q is independently selected from 0-8. In some embodiments, each q is independently selected from 0-10. In some embodiments, each q is independently selected from 0-12.

In some embodiments, R2 is selected from the group consisting of: [structures not reproduced].

In some embodiments, R2 is selected from the group consisting of: [structures not reproduced].

In some embodiments, the present disclosure includes a compound selected from any of the lipids in Table (V) below, or a pharmaceutically acceptable salt thereof:

Table (V). Non-limiting examples of ionizable lipids

Compound No. (structures not reproduced): CZ-1, CZ-2, CZ-3, CZ-4, CZ-5, CZ-6, CZ-7, CZ-8, CZ-9, CZ-10, CZ-11, CZ-12, CZ-13, CZ-14, CZ-15, CZ-16, CZ-17, CZ-18

ii. Structural lipids

In some embodiments, the LNP comprises a structural lipid. The structural lipid may be selected from, but is not limited to, the group consisting of cholesterol, fecosterol, fucosterol, β-sitosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, cholic acid, sitostanol, lithocholic acid, tomatine, ursolic acid, α-tocopherol, vitamin D3, vitamin D2, calcipotriol, betulin, lupeol, oleanolic acid, β-sitosterol acetate, and mixtures thereof. In some embodiments, the structural lipid is cholesterol. In some embodiments, the structural lipid is a cholesterol analog disclosed in Patel et al., Nat Commun., 11, 983 (2020), which is incorporated herein by reference in its entirety. In some embodiments, the structural lipid includes cholesterol and a corticosteroid (such as prednisolone, dexamethasone, prednisone, and hydrocortisone), or any combination thereof. In some embodiments, the structural lipid is described in international patent application WO2019152557A1, which is incorporated herein by reference in its entirety.

在一些實施例中,結構脂質為膽固醇類似物。使用膽固醇類似物可增強內體逃逸,如Patel等人, Naturally-occurring cholesterol analogues in lipid nanoparticles induce polymorphic shape and enhance intracellular delivery of mRNA, Nature Communications (2020)中所述,該文獻以引用之方式併入本文中。In some embodiments, the structural lipid is a cholesterol analog. The use of cholesterol analogs can enhance endosomal escape as described in Patel et al., Naturally-occurring cholesterol analogues in lipid nanoparticles induce polymorphic shape and enhance intracellular delivery of mRNA, Nature Communications (2020), which is incorporated herein by reference.

In some embodiments, the structural lipid is a phytosterol. The use of phytosterols can enhance endosomal escape, as described in Herrera et al., Illuminating endosomal escape of polymorphic lipid nanoparticles that boost mRNA delivery, Biomaterials Science (2020), which is incorporated herein by reference.

In some embodiments, the structural lipid contains a phytosterol mimetic for enhanced endosomal release.

iii. PEGylated lipids

PEG化脂質係經聚乙二醇修飾之脂質。PEGylated lipids are lipids modified with polyethylene glycol.

In some embodiments, the LNP comprises one, two, or more PEGylated lipids or PEG-modified lipids. The PEGylated lipid may be selected from the non-limiting group consisting of PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof. For example, the PEG lipid may be a PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or PEG-DSPE lipid.

In some embodiments, the PEGylated lipid is selected from (R)-2,3-bis(octadecyloxy)propyl-1-(methoxy poly(ethylene glycol)2000)propylcarbamate, PEG-S-DSG, PEG-S-DMG, PEG-PE, PEG-PAA, PEG-OH DSPE C18, PEG-DSPE, PEG-DSG, PEG-DPG, PEG-DOMG, PEG-DMPE Na, PEG-DMPE, PEG-DMG2000, PEG-DMG C14, PEG-DMG 2000, PEG-DMG, PEG-DMA, PEG-ceramide C16, PEG-C-DOMG, PEG-c-DMOG, PEG-c-DMA, PEG-cDMA, PEGA, PEG750-C-DMA, PEG400, PEG2k-DMG, PEG2k-C11, PEG2000-PE, PEG2000P, PEG2000-DSPE, PEG2000-DOMG, PEG2000-DMG, PEG2000-C-DMA, PEG2000, PEG200, PEG(2k)-DMG, PEG DSPE C18, PEG DMPE C14, PEG DLPE C12, PEG Click DMG C14, PEG Click C12, PEG Click C10, N(carbonyl-methoxypolyethylene glycol-2000)-1,2-stearoyl-sn-glycero-3-phosphoethanolamine, Myrj52, mPEG-PLA, MPEG-DSPE, mPEG3000-DMPE, MPEG-2000-DSPE, MPEG2000-DSPE, mPEG2000-DPPE, mPEG2000-DMPE, mPEG2000-DMG, mDPPE-PEG2000, 1,2-stearoyl-sn-glycero-3-phosphoethanolamine-PEG2000, HPEG-2K-LIPD, folate PEG-DSPE, DSPE-PEGMA 500, DSPE-PEGMA, DSPE-PEG6000, DSPE-PEG5000, DSPE-PEG2K-NAG, DSPE-PEG2k, DSPE-PEG2000 maleimide, DSPE-PEG2000, DSPE-PEG, DSG-PEGMA, DSG-PEG5000, DPPE-PEG-2K, DPPE-PEG, DPPE-mPEG2000, DPPE-mPEG, DPG-PEGMA, DOPE-PEG2000, DMPE-PEGMA, DMPE-PEG2000, DMPE-Peg, DMPE-mPEG2000, DMG-PEGMA, DMG-PEG2000, DMG-PEG, distearoyl-glycerol-polyethylene glycol, Cl8PEG750, CI8PEG5000, CI8PEG3000, CI8PEG2000, CI6PEG2000, CI4PEG2000, C18-PEG5000, C18PEG, C16PEG, C16 mPEG (polyethylene glycol) 2000 ceramide, C14-PEG-DSPE200, C14-PEG2000, C14PEG2000, C14-PEG 2000, C14-PEG, C14PEG, 14:0-PEG2KPE, 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-PEG2000, (R)-2,3-bis(octadecyloxy)propyl-1-(methoxy poly(ethylene glycol)2000)propylcarbamate, (PEG)-C-DOMG, PEG-C-DMA, and DSPE-PEG-X.

在一些實施例中,LNP包含以下一者所揭示之PEG化脂質:US 2019/0240354;US 2010/0130588;US 2021/0087135;WO 2021/204179;US 2021/0128488;US 2020/0121809;US 2017/0119904;US 2013/0108685;US 2013/0195920;US 2015/0005363;US 2014/0308304;US 2013/0053572;WO 2019/232095A1;WO 2021/077067;WO 2019/152557;US 2015/0203446;US 2017/0210697;US 2014/0200257;或WO 2019/089828A1,其中每一個均以引用之方式整體併入本文中。In some embodiments, the LNP comprises a PEGylated lipid as disclosed in one of the following: US 2019/0240354; US 2010/0130588; US 2021/0087135; WO 2021/204179; US 2021/0128488; US 2020/0121809; US 2017/0119904; US 2013/0108685; US 2013/0195920; US 2015/0005363; US 2014/0308304; US 2013/0053572; WO 2019/232095A1; WO 2021/077067; WO 2019/152557; US 2015/0203446; US 2017/0210697; US 2014/0200257; or WO 2019/089828A1, each of which is incorporated herein by reference in its entirety.

In some embodiments, the LNP comprises a PEGylated lipid substitute in place of the PEGylated lipid. All embodiments disclosed herein that contemplate PEGylated lipids should be understood to apply equally to PEGylated lipid substitutes. In some embodiments, the LNP comprises a polysarcosine-lipid conjugate, such as those disclosed in US 2022/0001025 A1, which is incorporated herein by reference in its entirety.

iv. Phospholipids

In some embodiments, the LNPs of the present disclosure comprise a phospholipid. Phospholipids useful in the compositions and methods may be selected from the non-limiting group consisting of 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), sodium (S)-2-ammonio-3-((((R)-2-(oleoyloxy)-3-(stearoyloxy)propoxy)oxidophosphoryl)oxy)propanoate (L-α-phosphatidylserine; Brain PS), dimyristoylphosphatidylcholine (DMPC), dimyristoylphosphoethanolamine (DMPE), dimyristoylphosphatidylglycerol (DMPG), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dioleoylphosphatidylglycerol (DOPG), 1,2-dioleoyl-sn-glycero-3-(phospho-L-serine) (DOPS), a cell-fusogenic phospholipid (DPhPE), dipalmitoylphosphatidylethanolamine (DPPE), 1,2-dielaidoyl-sn-phosphatidylethanolamine (DEPE), dipalmitoylphosphatidylglycerol (DPPG), dipalmitoylphosphatidylserine (DPPS), distearoylphosphatidylcholine (DSPC), distearoyl-phosphatidyl-ethanolamine (DSPE), distearoyl phosphoethanolamine imidazole (DSPEI), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), egg phosphatidylcholine (EPC), 1,2-dioleoyl-sn-glycero-3-phosphate (18:1 PA; DOPA), ammonium bis((S)-2-hydroxy-3-(oleoyloxy)propyl) phosphate (18:1 DMP; LBPA), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol) (DOPI; 18:1 PI), 1,2-distearoyl-sn-glycero-3-phospho-L-serine (18:0 PS), 1,2-dilinoleoyl-sn-glycero-3-phospho-L-serine (18:2 PS), 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (16:0-18:1 PS; POPS), 1-stearoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (18:0-18:1 PS), 1-stearoyl-2-linoleoyl-sn-glycero-3-phospho-L-serine (18:0-18:2 PS), 1-oleoyl-2-hydroxy-sn-glycero-3-phospho-L-serine (18:1 Lyso PS), 1-stearoyl-2-hydroxy-sn-glycero-3-phospho-L-serine (18:0 Lyso PS), and sphingomyelin. In some embodiments, the LNP includes DSPC. In certain embodiments, the LNP includes DOPE. In some embodiments, the LNP includes both DSPC and DOPE.

In some embodiments, the LNP comprises a phospholipid selected from: 1-pentadecanoyl-2-oleoyl-sn-glycero-3-phosphocholine, 1-myristoyl-2-palmitoyl-sn-glycero-3-phosphocholine, 1-myristoyl-2-stearoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-myristoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-stearoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-oleoyl-glycero-3-phosphocholine, 1-palmitoyl-2-linoleoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-arachidonoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-docosahexaenoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-myristoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-palmitoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-oleoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-linoleoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-arachidonoyl-sn-glycero-3-phosphocholine, 1-stearoyl-2-docosahexaenoyl-sn-glycero-3-phosphocholine, 1-oleoyl-2-myristoyl-sn-glycero-3-phosphocholine, 1-oleoyl-2-palmitoyl-sn-glycero-3-phosphocholine, 1-oleoyl-2-stearoyl-sn-glycero-3-phosphocholine, 1-palmitoyl-2-acetyl-sn-glycero-3-phosphocholine, 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-3',4'-bisphosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-3',5'-bisphosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-4',5'-bisphosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-3',4',5'-trisphosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-3'-phosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-4'-phosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol-5'-phosphate), 1,2-dioleoyl-sn-glycero-3-phospho-(1'-inositol), 1,2-dioleoyl-sn-glycero-3-phospho-L-serine, and 1-(8Z-octadecenoyl)-2-palmitoyl-sn-glycero-3-phosphocholine.

在一些實施例中,可修飾磷脂尾以便促進內體逃逸,如美國申請公開案2021/0121411中所述,該案以引用之方式併入本文中。In some embodiments, the phospholipid tail can be modified to promote endosomal escape, as described in U.S. Application Publication No. 2021/0121411, which is incorporated herein by reference.

在一些實施例中,LNP包含以下一者所揭示之磷脂:US 2019/0240354;US 2010/0130588;US 2021/0087135;WO 2021/204179;US 2021/0128488;US 2020/0121809;US 2017/0119904;US 2013/0108685;US 2013/0195920;US 2015/0005363;US 2014/0308304;US 2013/0053572;WO 2019/232095A1;WO 2021/077067;WO 2019/152557;US 2017/0210697;或WO 2019/089828A1,其中每一個均以引用之方式整體併入本文中。In some embodiments, the LNP comprises a phospholipid disclosed in one of the following: US 2019/0240354; US 2010/0130588; US 2021/0087135; WO 2021/204179; US 2021/0128488; US 2020/0121809; US 2017/0119904; US 2013/0108685; US 2013/0195920; US 2015/0005363; US 2014/0308304; US 2013/0053572; WO 2019/232095A1; WO 2021/077067; WO 2019/152557; US 2017/0210697; or WO 2019/089828A1, each of which is incorporated herein by reference in its entirety.

在一些實施例中,US 2020/0121809中所揭示之磷脂具有以下結構: 其中R1及R2各自獨立地為分支鏈或直鏈、飽和或不飽和碳鏈(例如,烷基、烯基、炔基)。 In some embodiments, the phospholipid disclosed in US 2020/0121809 has the following structure: wherein R1 and R2 are each independently a branched or straight, saturated or unsaturated carbon chain (e.g., alkyl, alkenyl, alkynyl).

vi. 靶向部分 vi. Targeting moiety

在一些實施例中,脂質奈米顆粒進一步包含靶向部分。靶向部分可為抗體或其片段。靶向部分可能能夠結合於標靶抗原。In some embodiments, the lipid nanoparticles further comprise a targeting moiety. The targeting moiety may be an antibody or a fragment thereof. The targeting moiety may be capable of binding to a target antigen.

在一些實施例中,醫藥組合物包含可操作地連接至脂質奈米顆粒之靶向部分。在一些實施例中,靶向部分能夠結合於標靶抗原。在一些實施例中,標靶抗原在標靶器官中表現。在一些實施例中,標靶抗原在標靶器官中比其在肝臟中表現更多。In some embodiments, the pharmaceutical composition comprises a targeting moiety operably linked to a lipid nanoparticle. In some embodiments, the targeting moiety is capable of binding to a target antigen. In some embodiments, the target antigen is expressed in a target organ. In some embodiments, the target antigen is expressed more in the target organ than in the liver.

在一些實施例中,靶向部分為如WO2016189532A1中所述之抗體,該案以引用之方式併入本文中。例如,在一些實施例中,靶向顆粒與特異性抗CD38單株抗體(mAb)結合,這允許將囊封於該等顆粒內之siRNA以比遞送至其他白血球亞型更大之百分比特異性地遞送至B細胞淋巴球惡性腫瘤(諸如MCL)。In some embodiments, the targeting moiety is an antibody as described in WO2016189532A1, which is incorporated herein by reference. For example, in some embodiments, the targeting particles are conjugated to a specific anti-CD38 monoclonal antibody (mAb), which allows the siRNA encapsulated in the particles to be specifically delivered to B cell lymphocytic malignancies (such as MCL) at a greater percentage than to other white blood cell subtypes.

在一些實施例中,當與諸如抗體之靶向部分結合/連接/締合時,脂質奈米顆粒可經靶向。 In some embodiments, lipid nanoparticles can be targeted when conjugated to, linked to, or associated with a targeting moiety such as an antibody.

vii. 兩性離子胺基脂質 vii. Zwitterionic amino lipids

在一些實施例中,LNP包含兩性離子脂質。在一些實施例中,包含兩性離子脂質之LNP不包含磷脂。In some embodiments, the LNP comprises a zwitterionic lipid. In some embodiments, the LNP comprising a zwitterionic lipid does not comprise a phospholipid.

兩性離子胺基脂質已顯示出能夠在無磷脂之情況下自組裝成LNP以在細胞內進行mRNA負載、穩定及釋放,如美國專利申請案20210121411中所述,該案以引用之方式整體併入本文中。兩性離子、可離子化陽離子及永久陽離子輔助脂質使得能夠在脾臟、肝臟及肺中實現組織選擇性mRNA遞送及CRISPR-Cas9基因編輯,如Liu等人, Membrane-destabilizing ionizable phospholipids for organ-selective mRNA delivery and CRISPR-Cas gene editing, Nat Mater. (2021)中所述,該文獻以引用之方式整體併入本文中。Zwitterionic amino lipids have been shown to self-assemble into LNPs in the absence of phospholipids and to load, stabilize, and release mRNA intracellularly, as described in U.S. Patent Application No. 20210121411, which is incorporated herein by reference in its entirety. Zwitterionic, ionizable cationic, and permanently cationic helper lipids enable tissue-selective mRNA delivery and CRISPR-Cas9 gene editing in the spleen, liver, and lung, as described in Liu et al., Membrane-destabilizing ionizable phospholipids for organ-selective mRNA delivery and CRISPR-Cas gene editing, Nat Mater. (2021), which is incorporated herein by reference in its entirety.

兩性離子脂質可具有含有陽離子胺及陰離子羧酸酯基之頭基,如Walsh等人, Synthesis, Characterization and Evaluation of Ionizable Lysine-Based Lipids for siRNA Delivery, Bioconjug Chem. (2013)中所述,該文獻以引用之方式整體併入本文中。含有藉由離胺酸α-胺處之醯胺鍵聯與長鏈二烷基胺連接之離胺酸頭基的可離子化基於離胺酸之脂質可降低免疫原性,如Walsh等人, Synthesis, Characterization and Evaluation of Ionizable Lysine-Based Lipids for siRNA Delivery, Bioconjug Chem. (2013)中所述。 Zwitterionic lipids can have head groups containing cationic amines and anionic carboxylate groups, as described in Walsh et al., Synthesis, Characterization and Evaluation of Ionizable Lysine-Based Lipids for siRNA Delivery, Bioconjug Chem. (2013), which is incorporated herein by reference in its entirety. Ionizable lysine-based lipids containing a lysine head group linked to a long-chain dialkylamine via an amide linkage at the α-amine of the lysine can reduce immunogenicity, as described in Walsh et al., Synthesis, Characterization and Evaluation of Ionizable Lysine-Based Lipids for siRNA Delivery, Bioconjug Chem. (2013).

viii. 額外脂質組分 viii. Additional lipid components

在一些實施例中,本揭示案之LNP組合物進一步包含一或多種能夠影響LNP之趨向性的額外脂質組分。在一些實施例中,LNP進一步包含至少一種選自DDAB、EPC、14PA、18BMP、DODAP、DOTAP及C12-200之脂質(參見Cheng等人 Nat Nanotechnol. 2020年4月; 15(4): 313–320.;Dillard等人 PNAS 2021 第118卷第52期。)。In some embodiments, the LNP composition of the present disclosure further comprises one or more additional lipid components capable of affecting the tropism of the LNP. In some embodiments, the LNP further comprises at least one lipid selected from DDAB, EPC, 14PA, 18BMP, DODAP, DOTAP and C12-200 (see Cheng et al. Nat Nanotechnol. 2020 April; 15(4): 313–320.; Dillard et al. PNAS 2021 Vol. 118 No. 52.).

在一些實施例中,本揭示案之LNP組合物包含或進一步包含選自以下之一或多種脂質:1,2-二-O-十八烯基-sn-甘油-3-磷酸膽鹼(18:0 Diether PC)、1,2-二亞麻醯基-sn-甘油-3-磷酸膽鹼(18:3 PC)、醯基肌肽(AC)、1-十六烷基-sn-甘油-3-磷酸膽鹼(C16 Lyso PC)、N-油醯基-鞘磷脂(SPM) (C18:l)、N-木蠟醇基SPM (C24:0)、N-神經醯鞘磷脂(C24:l)、心磷脂(CL)、l,2-雙(二十三碳-10,12-二炔醯基)-sn-甘油-3-磷酸膽鹼(DC8-9PC)、磷酸二鯨蠟酯(DCP)、磷酸雙十六烷基酯(DCP1)、1,2-二棕櫚醯基甘油-3-半琥珀酸酯(DGSucc)、短鏈雙-正十七烷醯基磷脂醯膽鹼(DHPC)、雙十六烷醯基-磷酸乙醇胺(DHPE)、1,2-二亞油醯基-sn-甘油-3-磷酸膽鹼(DLPC)、l,2-二月桂醯基-sn-甘油-3-PE (DLPE)、二肉豆蔻醯基甘油半琥珀酸酯(DMGS)、二肉豆蔻醯基磷脂醯膽鹼(DMPC)、二肉豆蔻醯基磷酸乙醇胺(DMPE)、二肉豆蔻醯基磷脂醯甘油(DMPG)、二油烯基氧基苄醇(DOBA)、1,2-二油醯基甘油-3 -半琥珀酸酯(DOGHEMS)、N-[2-(2-{2-[2-(2,3-雙-十八碳-9-烯基氧基-丙氧基)-乙氧基]-乙氧基}-乙氧基)-乙基]-3-(3,4,5-三羥基-6-羥基甲基-1四氫-哌喃-2-基硫基)-丙醯胺(DOGP4αMan)、二油醯基磷脂醯膽鹼(DOPC)、二油醯基磷脂醯乙醇胺(DOPE)、二油醯基-磷脂醯乙醇胺4-(N-馬來醯亞胺基甲基)-環己烷-1-甲酸酯(DOPE-mal)、二油醯基磷脂醯甘油(DOPG)、1,2-二油醯基-sn-甘油-3-(磷酸-L-絲胺酸) (DOPS)、無細胞融合磷脂(DPhPE)、二棕櫚醯基磷脂醯乙醇胺(DPPE)、二棕櫚醯基磷脂醯甘油(DPPG)、二棕櫚醯基磷脂醯絲胺酸(DPPS)、二硬脂醯基磷脂醯膽鹼(DSPC)、二硬脂醯基-磷脂醯-乙醇胺(DSPE)、二硬脂醯基磷酸乙醇胺咪唑(DSPEI)、1,2-二(十一烷醯基)-sn-甘油-磷酸膽鹼(DUPC)、卵磷脂醯膽鹼(EPC)、組織胺二硬脂醯甘油(HDSG)、1,2-二棕櫚醯基甘油-半琥珀酸酯-Nα-組胺醯基-半琥珀酸酯(HistSuccDG)、N-(5'-羥基-3'-氧基戊基)-10-12-二十五碳二醯胺(h-Pegi-PCDA)、2-[l-己氧基乙基]-2-去乙烯基焦去鎂葉綠素酸-a (HPPH)、氫化大豆磷脂醯膽鹼(HSPC)、1,2-二棕櫚醯基甘油-O-α-組胺醯基-Nα-半琥珀酸酯(IsohistsuccDG)、甘露糖化二棕櫚醯基磷脂醯乙醇胺(ManDOG)、l,2-二油醯基-sn-甘油-3-磷酸乙醇胺-N-[4-(對馬來醯亞胺基甲基)環己烷-甲醯胺] (MCC-PE)、1,2-二植烷醯基-sn-甘油-3-磷酸乙醇胺(ME 16:0 PE)、1-肉豆蔻醯基-2-羥基-sn-甘油-磷酸膽鹼(MHPC)、硫醇反應性馬來醯亞胺頭基脂質(例如,1,2-二油醯基-sn-甘油-3-磷酸乙醇胺-N-[4-(對馬來醯亞胺基苯基)丁醯胺(MPB-PE))、神經酸(NA)、膽酸鈉(NaChol)、l,2-二油醯基-sn-甘油-3-[磷酸乙醇胺-N-十二烷醯基(NC12-DOPE)、1-油醯基-2-膽固醇半琥珀醯基-sn-甘油-3-磷酸膽鹼(OChemsPC)、磷脂醯乙醇胺脂質(PE)、與聚乙二醇(PEG)結合之PE脂質(例如,聚乙二醇-二硬脂醯基磷脂醯乙醇胺脂質(PEG-PE))、磷脂醯甘油(PG)、部分氫化大豆磷脂醯膽鹼(PHSPC)、磷脂醯肌醇脂質(PI)、磷脂醯肌醇-4-磷酸(PIP)、棕櫚醯基油醯基磷脂醯膽鹼(POPC)、磷脂醯乙醇胺(POPE)、棕櫚醯基油醯基磷脂醯甘油(POPG)、磷脂醯絲胺酸(PS)、麗絲胺若丹明B-磷脂醯乙醇胺脂質(Rh-PE)、純化大豆衍生之磷脂混合物(SIOO)、磷脂醯膽鹼(SM)、18-1-反式-PE,1-硬脂醯基-2-油醯基-磷脂醯乙醇胺(SOPE)、大豆磷脂醯膽鹼(SPC)、鞘磷脂(SPM)、α,α-海藻糖-6,6'-二山萮酸酯(TDB)、l,2-二反油醯基-sn-甘油-3-磷酸乙醇胺(反式DOPE)、((23S,5R)-3-(雙(十六烷氧基)甲氧基)-5-(5-甲基-2,4-二側氧基-3,4-二氫嘧啶-1(2H)-基)四氫呋喃-2-基)甲基甲基磷酸酯、1,2-二花生四烯醯基-sn-甘油-3-磷酸膽鹼、1,2-二花生四烯醯基- sn-甘油-3-磷酸乙醇胺、1,2-二(二十二碳六烯醯基)-sn-甘油-3-磷酸膽鹼、1,2 
-二(二十二碳六烯醯基)-sn-甘油-3-磷酸乙醇胺、1,2-二亞麻醯基-sn-甘油-3-磷酸膽鹼、1,2-二亞麻醯基-sn-甘油-3-磷酸乙醇胺、1,2-二亞油醯基-sn-甘油-3-磷酸乙醇胺、1,2-二油烯基-sn-甘油-3-磷酸乙醇胺、1,2-二硬脂醯基-sn-甘油-3-磷酸乙醇胺、16-O-單甲基PE、16-O-二甲基PE及二油烯基磷脂醯乙醇胺。 viii.    LNP 醫藥組合物 In some embodiments, the LNP composition of the present disclosure comprises or further comprises one or more lipids selected from the following: 1,2-di-O-octadecene-sn-glycero-3-phosphocholine (18:0 Diether PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine (18:3 PC), acylcarnosine (AC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), N-oleyl-sphingomyelin (SPM) (C18:1), N-xylyl SPM (C18:2), N-hydroxysphingomyelin (SPM) (C18:3), N-hydroxysphingomyelin (SPM) (C18:1), N-hydroxysphingomyelin (SPM) ... (C24:0), N-neuroyl sphingomyelin (C24:1), cardiolipin (CL), l,2-bis(tricosyl-10,12-diynylyl)-sn-glycero-3-phosphocholine (DC8-9PC), dicetyl phosphate (DCP), dihexadecyl phosphate (DCP1), 1,2-dipalmitoylglycerol-3-hemisuccinate (DGSucc), short-chain di-n-heptadecanylphospholipid acylcholine (DHPC), dihexadecanoyl-phosphoethanolamine (DHPE), 1,2-dilinoleyl-sn-glycero-3-phosphocholine (DLPC), l,2-dilauryl-sn-glycero-3-PE (DLPE), dimyristylglycerol hemisuccinate (DMGS), dimyristylphosphatidylcholine (DMPC), dimyristylphosphoethanolamine (DMPE), dimyristylphosphatidylglycerol (DMPG), dioleyloxybenzyl alcohol (DOBA), 1,2-dioleylglycerol-3 -hemisuccinate (DOGHEMS), N-[2-(2-{2-[2-(2,3-bis-octadec-9-enyloxy-propoxy)-ethoxy]-ethoxy}-ethoxy)-ethyl]-3-(3,4,5-trihydroxy-6-hydroxymethyl-1-tetrahydro-pyran-2-ylthio)-propionamide (DOGP4αMan), dioleylphosphatidylcholine (DOPC), dioleylphosphatidylethanolamine (DOPE), dioleyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dioleylphosphatidylglycerol (DOPG), 1,2-dioleyl-sn-glycero-3-(phospho-L-serine) (DOPS), cell-free phospholipids (DPhPE), dimalmitoylphosphatidylethanolamine (DPPE), dimalmitoylphosphatidylglycerol (DPPG), dimalmitoylphosphatidylserine (DPPS), distearylphosphatidylcholine (DSPC), 
distearyl-phosphatidyl-ethanolamine (DSPE), distearylphosphoethanolamine imidazole (DSPEI), 1,2-di(undecanoyl)-sn-glycero-phosphate Choline (DUPC), phosphatidylcholine (EPC), histamine distearyl glycerol (HDSG), 1,2-dipalmitoylglycerol-hemisuccinate-Nα-histidyl-hemisuccinate (HistSuccDG), N-(5'-hydroxy-3'-oxypentyl)-10-12-pentacosadiamide (h-Pegi-PCDA), 2-[l-hexyloxyethyl]-2-devinylpyrodemagnesium chlorophyllide-a (HPPH), hydrogenated soybean phosphatidylcholine (HSPC), 1,2-dipalmitoylglycerol-O-α-histidyl-Nα-hemisuccinate (IsohistsuccDG), mannosylated dimalmitoylphosphatidylethanolamine (ManDOG), l,2-dioleyl-sn-glycero-3-phosphoethanolamine-N-[4-(p-maleimidomethyl)cyclohexane-formamide] (MCC-PE), 1,2-diphytanyl-sn-glycero-3-phosphoethanolamine (ME 16:0 PE), 1-myristyl-2-hydroxy-sn-glycero-phosphocholine (MHPC), thiol-reactive maleimide head group lipids (e.g., 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-[4-(p-maleimidophenyl)butyramide (MPB-PE)), neuraminic acid (NA), sodium cholate (NaChol), l,2-dioleoyl-sn-glycero-3-[phosphoethanolamine-N-dodecyl (NC12-DOPE), 1-Oleyl-2-cholesterol hemisuccinyl-sn-glycero-3-phosphocholine (OChemsPC), phosphatidylethanolamine lipids (PE), PE lipids conjugated with polyethylene glycol (PEG) (e.g., polyethylene glycol-distearylphosphatidylethanolamine lipids (PEG-PE)), phosphatidylglycerol (PG), partially hydrogenated soybean phosphatidylcholine (PHSPC), phosphatidylinositol lipids (PI), phosphatidylinositol-4-phosphate (PIP), palmitoyloleylphospholipids Phosphatidylcholine (POPC), phosphatidylethanolamine (POPE), palmitoyloleylphosphatidylglycerol (POPG), phosphatidylserine (PS), lissamine rhodamine B-phosphatidylethanolamine lipid (Rh-PE), purified soybean derived phospholipid mixture (SIOO), phosphatidylcholine (SM), 18-1-trans-PE, 1-stearyl-2-oleyl-phosphatidylethanolamine (SOPE), soybean phosphatidylcholine (SPC), sphingomyelin (SPM), α, α-Trehalose-6,6'-dibehenate (TDB), l,2-di-antioleyl-sn-glycero-3-phosphoethanolamine 
(trans-DOPE), ((23S,5R)-3-(bis(hexadecyloxy)methoxy)-5-(5-methyl-2,4-dioxo-3,4-dihydropyrimidin-1(2H)-yl)tetrahydrofuran-2-yl)methyl methyl phosphate, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-di(docosahexaenoyl)-sn-glycero-3-phosphocholine, 1,2-di(docosahexaenoyl)-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleyl-sn-glycero-3-phosphoethanolamine, 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 16-O-monomethyl PE, 16-O-dimethyl PE, and dioleyl phosphatidylethanolamine. viii. LNP pharmaceutical composition

在一些實施例中,奈米顆粒包括可離子化脂質、磷脂、PEG脂質及結構脂質。在某些實施例中,奈米顆粒組合物之脂質組分包括約30 mol%至約60 mol%可離子化脂質、約0 mol%至約30 mol%磷脂、約18.5 mol%至約48.5 mol%結構脂質及約0 mol%至約10 mol% PEG脂質,其限制條件在於總mol%不超過100%。在一些實施例中,奈米顆粒組合物之脂質組分包括約35 mol%至約55 mol%可離子化脂質、約5 mol%至約25 mol%磷脂、約30 mol%至約40 mol%結構脂質及約0 mol%至約10 mol% PEG脂質。在一特定實施例中,脂質組分包括約50 mol%可離子化脂質、約10 mol%磷脂、約38.5 mol%結構脂質及約1.5 mol% PEG脂質。在另一特定實施例中,脂質組分包括約40 mol%可離子化脂質、約20 mol%磷脂、約38.5 mol%結構脂質及約1.5 mol% PEG脂質。在另一特定實施例中,脂質組分包括約48.5 mol%可離子化脂質、約10 mol%磷脂、約40 mol%結構脂質及約1.5 mol% PEG脂質。在另一特定實施例中,脂質組分包括約48.5 mol%可離子化脂質、約10 mol%磷脂、約39 mol%結構脂質及約2.5 mol% PEG脂質。在一些實施例中,磷脂可為DOPE或DSPC。在其他實施例中,PEG脂質可為PEG-DMG及/或結構脂質可為膽固醇。奈米顆粒組合物中之活性劑的量可取決於奈米顆粒組合物之大小、組成、所需標靶及/或應用或其他特性,以及活性劑之特性。例如,奈米顆粒組合物中可用之活性劑的量可取決於活性劑之大小、序列及其他特徵。奈米顆粒組合物中之活性劑及其他要素(例如,脂質)之相對量亦可變化。在一些實施例中,奈米顆粒組合物中之脂質組分與酶之wt/wt比率可為約5:1至約60:1,諸如5:1、6:1、7:1、8:1、9:1、10:1、11:1、12:1、13:1、14:1、15:1、16:1、17:1、18:1、19:1、20:1、25:1、30:1、35:1、40:1、45:1、50:1及60:1。奈米顆粒組合物中之酶的量可例如使用吸收光譜法(例如,紫外-可見光譜法)來量測。In some embodiments, the nanoparticles include ionizable lipids, phospholipids, PEG lipids and structural lipids. In certain embodiments, the lipid component of the nanoparticle composition includes about 30 mol% to about 60 mol% ionizable lipids, about 0 mol% to about 30 mol% phospholipids, about 18.5 mol% to about 48.5 mol% structural lipids and about 0 mol% to about 10 mol% PEG lipids, with the limitation that the total mol% does not exceed 100%. In some embodiments, the lipid component of the nanoparticle composition includes about 35 mol% to about 55 mol% ionizable lipids, about 5 mol% to about 25 mol% phospholipids, about 30 mol% to about 40 mol% structural lipids and about 0 mol% to about 10 mol% PEG lipids. In a specific embodiment, the lipid component includes about 50 mol% ionizable lipids, about 10 mol% phospholipids, about 38.5 mol% structural lipids, and about 1.5 mol% PEG lipids. 
In another specific embodiment, the lipid component includes about 40 mol% ionizable lipids, about 20 mol% phospholipids, about 38.5 mol% structural lipids, and about 1.5 mol% PEG lipids. In another specific embodiment, the lipid component includes about 48.5 mol% ionizable lipids, about 10 mol% phospholipids, about 40 mol% structural lipids, and about 1.5 mol% PEG lipids. In another specific embodiment, the lipid component includes about 48.5 mol% ionizable lipids, about 10 mol% phospholipids, about 39 mol% structural lipids and about 2.5 mol% PEG lipids. In some embodiments, the phospholipids may be DOPE or DSPC. In other embodiments, the PEG lipid may be PEG-DMG and/or the structural lipid may be cholesterol. The amount of the active agent in the nanoparticle composition may depend on the size, composition, desired target and/or application or other characteristics of the nanoparticle composition, as well as the characteristics of the active agent. For example, the amount of the active agent available in the nanoparticle composition may depend on the size, sequence and other characteristics of the active agent. The relative amounts of the active agent and other elements (e.g., lipids) in the nanoparticle composition may also vary. In some embodiments, the wt/wt ratio of the lipid component to the enzyme in the nanoparticle composition can be about 5: 1 to about 60: 1, such as 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 25: 1, 30: 1, 35: 1, 40: 1, 45: 1, 50: 1 and 60: 1. The amount of enzyme in the nanoparticle composition can be measured, for example, using absorption spectroscopy (e.g., UV-Vis spectroscopy).
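
The composition ranges above can be checked with simple arithmetic. The following is a minimal sketch, assuming hypothetical helper names (`total_mol_percent`, `within_limit`, `lipid_mass_mg`) that do not appear in the disclosure; the mol% values are the four specific embodiments recited above, and the wt/wt bounds are the recited 5:1 to 60:1 range.

```python
# Illustrative sketch only: the function and dictionary names are assumptions.
# The mol% values are the four specific embodiments recited in the text.
EMBODIMENTS = {
    "A": {"ionizable": 50.0, "phospholipid": 10.0, "structural": 38.5, "peg": 1.5},
    "B": {"ionizable": 40.0, "phospholipid": 20.0, "structural": 38.5, "peg": 1.5},
    "C": {"ionizable": 48.5, "phospholipid": 10.0, "structural": 40.0, "peg": 1.5},
    "D": {"ionizable": 48.5, "phospholipid": 10.0, "structural": 39.0, "peg": 2.5},
}


def total_mol_percent(composition):
    """Sum the mol% of all lipid components."""
    return sum(composition.values())


def within_limit(composition, tol=1e-9):
    """Check the stated proviso that the total mol% does not exceed 100%."""
    return total_mol_percent(composition) <= 100.0 + tol


def lipid_mass_mg(payload_mass_mg, wt_ratio):
    """Lipid mass needed for a given lipid:payload wt/wt ratio (5:1 to 60:1)."""
    if not 5.0 <= wt_ratio <= 60.0:
        raise ValueError("wt/wt ratio outside the recited 5:1 to 60:1 range")
    return payload_mass_mg * wt_ratio


if __name__ == "__main__":
    for name, comp in EMBODIMENTS.items():
        print(name, total_mol_percent(comp), within_limit(comp))
```

Choosing the upper end of every recited range at once (60 + 30 + 48.5 + 10 mol%) would exceed 100%, which is why the proviso on the total is stated separately from the individual ranges.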

在一些實施例中,調配包含本揭示案之活性劑之奈米顆粒組合物以提供特定E:P比率。該組合物之E:P比率係指一或多種脂質中之氮原子與RNA活性劑中之磷酸酯基的數目之莫耳比。一般而言,較低E:P比率為較佳的。可選擇一或多種酶、脂質及其量以提供約2:1至約30:1之E:P比率,諸如2:1、3:1、4:1、5:1、6:1、7:1、8:1、9:1、10:1、12:1、14:1、16:1、18:1、20:1、22:1、24:1、26:1、28:1或30:1。在某些實施例中,E:P比率可為約2:1至約8:1。在其他實施例中,E:P比率為約5:1至約8:1。例如,E:P比率可為約5.0:1、約5.5:1、約5.67:1、約6.0:1、約6.5:1或約7.0:1。In some embodiments, the nanoparticle composition comprising the active agent of the present disclosure is formulated to provide a specific E:P ratio. The E:P ratio of the composition refers to the molar ratio of the number of nitrogen atoms in one or more lipids to the number of phosphate groups in the RNA active agent. In general, lower E:P ratios are preferred. One or more enzymes, lipids, and amounts thereof may be selected to provide an E:P ratio of about 2:1 to about 30:1, such as 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 12:1, 14:1, 16:1, 18:1, 20:1, 22:1, 24:1, 26:1, 28:1, or 30:1. In certain embodiments, the E:P ratio may be about 2:1 to about 8:1. In other embodiments, the E:P ratio is about 5:1 to about 8:1. For example, the E:P ratio can be about 5.0:1, about 5.5:1, about 5.67:1, about 6.0:1, about 6.5:1, or about 7.0:1.
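
The E:P ratio defined above is a mole ratio and can be estimated from the masses used in a formulation. The sketch below is illustrative only: the function names, the example lipid molecular weight, and the average nucleotide mass of about 330 g/mol (one phosphate per nucleotide) are assumptions, not values from the disclosure.

```python
# Illustrative sketch only. Assumptions (not from the disclosure):
#   - each nucleotide contributes one phosphate group,
#   - an average nucleotide mass of ~330 g/mol,
#   - the lipid molecular weight and nitrogen count are supplied by the user.
AVG_NUCLEOTIDE_MW = 330.0  # g/mol, assumed average


def ep_ratio(lipid_mass_ug, lipid_mw, nitrogens_per_lipid, rna_mass_ug,
             nucleotide_mw=AVG_NUCLEOTIDE_MW):
    """Moles of lipid nitrogen divided by moles of RNA phosphate."""
    nitrogen_umol = lipid_mass_ug / lipid_mw * nitrogens_per_lipid
    phosphate_umol = rna_mass_ug / nucleotide_mw
    return nitrogen_umol / phosphate_umol


def lipid_mass_for_ratio(target_ratio, lipid_mw, nitrogens_per_lipid, rna_mass_ug,
                         nucleotide_mw=AVG_NUCLEOTIDE_MW):
    """Lipid mass (ug) needed to reach a target E:P ratio for a given RNA mass."""
    phosphate_umol = rna_mass_ug / nucleotide_mw
    return target_ratio * phosphate_umol * lipid_mw / nitrogens_per_lipid


if __name__ == "__main__":
    # Hypothetical lipid: MW 600 g/mol with one ionizable nitrogen.
    print(round(ep_ratio(600.0, 600.0, 1, 55.0), 2))
```

With these assumed inputs, 600 µg of the hypothetical lipid and 55 µg of RNA give a ratio of about 6:1, which falls inside the recited range of about 5:1 to about 8:1.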

奈米顆粒組合物之特徵可取決於其組分。例如,包括膽固醇作為結構脂質之奈米顆粒組合物可具有與包括不同結構脂質之奈米顆粒組合物不同的特徵。同樣,奈米顆粒組合物之特徵可取決於其組分之絕對量或相對量。例如,包括較高莫耳分數之磷脂的奈米顆粒組合物可具有與包括較低莫耳分數之磷脂的奈米顆粒組合物不同之特徵。特徵亦可根據奈米顆粒組合物之製備方法及條件而變化。可藉由多種方法來表徵奈米顆粒組合物。例如,顯微術(例如,透射電子顯微術或掃描電子顯微術)可用於檢查奈米顆粒組合物之形態及大小分佈。動態光散射或電位分析法(例如電位滴定)可用於量測ζ電位。動態光散射亦可用於確定粒徑。亦可使用諸如Zetasizer Nano ZS (Malvern Instruments Ltd, Malvern, Worcestershire, UK)之儀器來量測奈米顆粒組合物之多種特徵,諸如粒徑、多分散性指數及ζ電位。The characteristics of a nanoparticle composition may depend on its components. For example, a nanoparticle composition comprising cholesterol as a structural lipid may have different characteristics from a nanoparticle composition comprising a different structural lipid. Similarly, the characteristics of a nanoparticle composition may depend on the absolute or relative amounts of its components. For example, a nanoparticle composition comprising a higher molar fraction of phospholipids may have different characteristics from a nanoparticle composition comprising a lower molar fraction of phospholipids. Characteristics may also vary depending on the preparation method and conditions of the nanoparticle composition. Nanoparticle compositions may be characterized by a variety of methods. For example, microscopy (e.g., transmission electron microscopy or scanning electron microscopy) may be used to examine the morphology and size distribution of nanoparticle compositions. Dynamic light scattering or potentiometry (e.g., potentiometric titration) can be used to measure the zeta potential. Dynamic light scattering can also be used to determine particle size. Instruments such as the Zetasizer Nano ZS (Malvern Instruments Ltd, Malvern, Worcestershire, UK) can also be used to measure various characteristics of nanoparticle compositions, such as particle size, polydispersity index, and zeta potential.

奈米顆粒組合物之平均大小可在數十nm與數百nm之間,例如藉由動態光散射(DLS)來量測。例如,平均大小可為約40 nm至約150 nm,諸如約40 nm、45 nm、50 nm、55 nm、60 nm、65 nm、70 nm、75 nm、80 nm、85 nm、90 nm、95 nm、100 nm、105 nm、110 nm、115nm、120 nm、125 nm、130 nm、135 nm、140 nm、145 nm或150 nm。在一些實施例中,奈米顆粒組合物之平均大小可為約50 nm至約100 nm、約50 nm至約90 nm、約50 nm至約80 nm、約50 nm至約70 nm、約50 nm至約60 nm、約60 nm至約100 nm、約60 nm至約90 nm、約60 nm至約80 nm、約60 nm至約70 nm、約70 nm至約100 nm、約70 nm至約90 nm、約70 nm至約80 nm、約80 nm至約100 nm、約80 nm至約90 nm或約90 nm至約100 nm。在某些實施例中,奈米顆粒組合物之平均大小可為約70 nm至約100 nm。在一特定實施例中,平均大小可為約80 nm。在其他實施例中,平均大小可為約100 nm。The average size of the nanoparticle composition can be between tens of nm and hundreds of nm, for example, as measured by dynamic light scattering (DLS). For example, the average size can be about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. In some embodiments, the nanoparticle composition may have an average size of about 50 nm to about 100 nm, about 50 nm to about 90 nm, about 50 nm to about 80 nm, about 50 nm to about 70 nm, about 50 nm to about 60 nm, about 60 nm to about 100 nm, about 60 nm to about 90 nm, about 60 nm to about 80 nm, about 60 nm to about 70 nm, about 70 nm to about 100 nm, about 70 nm to about 90 nm, about 70 nm to about 80 nm, about 80 nm to about 100 nm, about 80 nm to about 90 nm, or about 90 nm to about 100 nm. In certain embodiments, the nanoparticle composition may have an average size of about 70 nm to about 100 nm. In a particular embodiment, the average size may be about 80 nm. In other embodiments, the average size may be about 100 nm.

奈米顆粒組合物可為相對均質的。多分散性指數可用於指示奈米顆粒組合物之均質性,例如奈米顆粒組合物之粒徑分佈。小(例如,小於0.3)多分散性指數一般指示狹窄粒徑分佈。奈米顆粒組合物可具有約0至約0.25之多分散性指數,諸如0.01、0.02、0.03、0.04、0.05、0.06、0.07、0.08、0.09、0.10、0.11、0.12、0.13、0.14、0.15、0.16、0.17、0.18、0.19、0.20、0.21、0.22、0.23、0.24或0.25。The nanoparticle composition can be relatively homogeneous. The polydispersity index can be used to indicate the homogeneity of the nanoparticle composition, such as the particle size distribution of the nanoparticle composition. A small (e.g., less than 0.3) polydispersity index generally indicates a narrow particle size distribution. The nanoparticle composition can have a polydispersity index of about 0 to about 0.25, such as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, or 0.25.
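
As a rough illustration of how a polydispersity index relates to the width of a size distribution, the sketch below uses the common approximation PDI ≈ (standard deviation / mean)². This is an assumption for illustration: instruments based on dynamic light scattering typically derive the index from a cumulants fit of the correlation function rather than from individual particle diameters, and the function names here are hypothetical.

```python
# Illustrative sketch only: approximates PDI as (std / mean)^2 from a list of
# particle diameters. DLS instruments compute PDI differently (cumulants fit).
from statistics import mean, pstdev


def polydispersity_index(diameters_nm):
    """Approximate PDI as the squared relative standard deviation."""
    avg = mean(diameters_nm)
    return (pstdev(diameters_nm) / avg) ** 2


def in_recited_range(pdi, upper=0.25):
    """Check against the recited range of about 0 to about 0.25."""
    return 0.0 <= pdi <= upper


if __name__ == "__main__":
    sample = [72.0, 78.0, 80.0, 83.0, 87.0]  # hypothetical diameters in nm
    pdi = polydispersity_index(sample)
    print(round(pdi, 4), in_recited_range(pdi))
```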

奈米顆粒組合物之ζ電位可用於指示該組合物之動電位。例如,ζ電位可描述奈米顆粒組合物之表面電荷。具有相對低電荷(正電荷或負電荷)之奈米顆粒組合物通常為合乎需要的,因為更高電荷之物質可能與身體中之細胞、組織及其他要素不合需要地相互作用。在一些實施例中,奈米顆粒組合物之ζ電位可為約-10 mV至約+20 mV、約-10 mV至約+15 mV、約-10 mV至約+10 mV、約-10 mV至約+5 mV、約-10 mV至約0 mV、約-10 mV至約-5 mV、約-5 mV至約+20 mV、約-5 mV至約+15 mV、約-5 mV至約+10 mV、約-5 mV至約+5 mV、約-5 mV至約0 mV、約0 mV至約+20 mV、約0 mV至約+15 mV、約0 mV至約+10 mV、約0 mV至約+5 mV、約+5 mV至約+20 mV、約+5 mV至約+15 mV或約+5 mV至約+10 mV。The zeta potential of a nanoparticle composition can be used to indicate the electrokinetic potential of the composition. For example, the zeta potential can describe the surface charge of the nanoparticle composition. Nanoparticle compositions with relatively low charge (positive or negative) are generally desirable because more highly charged species may interact undesirably with cells, tissues, and other elements in the body. In some embodiments, the zeta potential of the nanoparticle composition can be from about -10 mV to about +20 mV, about -10 mV to about +15 mV, about -10 mV to about +10 mV, about -10 mV to about +5 mV, about -10 mV to about 0 mV, about -10 mV to about -5 mV, about -5 mV to about +20 mV, about -5 mV to about +15 mV, about -5 mV to about +10 mV, about -5 mV to about +5 mV, about -5 mV to about 0 mV, about 0 mV to about +20 mV, about 0 mV to about +15 mV, about 0 mV to about +10 mV, about 0 mV to about +5 mV, about +5 mV to about +20 mV, about +5 mV to about +15 mV, or about +5 mV to about +10 mV.

有效載荷之囊封效率描述相對於所提供之初始量,在製備後經囊封或以其他方式與奈米顆粒組合物締合之有效載荷的量。高囊封效率係合乎需要的(例如,接近100%)。例如,可藉由比較在用一或多種有機溶劑或清潔劑打碎奈米顆粒組合物之前及之後,含有奈米顆粒組合物之溶液中的有效載荷之量來量測囊封效率。可使用螢光來量測溶液中的游離有效載荷之量。對於本文所述之奈米顆粒組合物,治療劑及/或預防劑之囊封效率可為至少50%,例如50%、55%、60%、65%、70%、75%、80%、85%、90%、91%、92%、93%、94%、95%、96%、97%、98%、99%或100%。在一些實施例中,囊封效率可為至少80%。在某些實施例中,囊封效率可為至少90%。The encapsulation efficiency of a payload describes the amount of payload that is encapsulated or otherwise associated with a nanoparticle composition after preparation, relative to the initial amount provided. A high encapsulation efficiency (e.g., close to 100%) is desirable. For example, encapsulation efficiency can be measured by comparing the amount of payload in a solution containing the nanoparticle composition before and after disrupting the nanoparticle composition with one or more organic solvents or detergents. Fluorescence can be used to measure the amount of free payload in a solution. For the nanoparticle compositions described herein, the encapsulation efficiency of the therapeutic and/or prophylactic agent may be at least 50%, such as 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the encapsulation efficiency may be at least 80%. In certain embodiments, the encapsulation efficiency may be at least 90%.
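
The before-and-after comparison described above reduces to a short calculation. The following is a minimal sketch, assuming that the fluorescence signal is proportional to the amount of free payload, that the signal of the intact sample reflects only unencapsulated payload, and that the signal after disruption reflects the total payload; the function name and the example readings are hypothetical.

```python
# Illustrative sketch only. Assumes fluorescence is linear in free payload,
# the intact-sample signal reports unencapsulated payload, and the
# disrupted-sample signal reports total payload.
def encapsulation_efficiency(signal_intact, signal_disrupted, background=0.0):
    """Percent of payload encapsulated: 100 * (total - free) / total."""
    free = signal_intact - background
    total = signal_disrupted - background
    if total <= 0:
        raise ValueError("disrupted-sample signal must exceed background")
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    # Hypothetical plate-reader values in arbitrary fluorescence units.
    print(encapsulation_efficiency(150.0, 1050.0, background=50.0))
```

With the hypothetical readings shown, the free payload is one tenth of the total, giving an encapsulation efficiency of 90%.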

脂質及其製備方法揭示於例如美國專利第8,569,256號、第5,965,542號及美國專利公開案第2016/0199485號、第2016/0009637號、第2015/0273068號、第2015/0265708號、第2015/0203446號、第2015/0005363號、第2014/0308304號、第2014/0200257號、第2013/086373號、第2013/0338210號、第2013/0323269號、第2013/0245107號、第2013/0195920號、第2013/0123338號、第2013/0022649號、第2013/0017223號、第2012/0295832號、第2012/0183581號、第2012/0172411號、第2012/0027803號、第2012/0058188號、第2011/0311583號、第2011/0311582號、第2011/0262527號、第2011/0216622號、第2011/0117125號、第2011/0091525號、第2011/0076335號、第2011/0060032號、第2010/0130588號、第2007/0042031號、第2006/0240093號、第2006/0083780號、第2006/0008910號、第2005/0175682號、第2005/017054號、第2005/0118253號、第2005/0064595號、第2004/0142025號、第2007/0042031號、第1999/009076號及PCT公開案第WO 99/39741號、第WO 2017/117528號、第WO 2017/004143號、第WO 2017/075531號、第WO 2015/199952號、第WO 2014/008334號、第WO 2013/086373號、第WO 2013/086322號、第WO 2013/016058號、第WO 2013/086373號、第WO2011/141705號及第WO 2001/07548號以及Semple等人, Nature Biotechnology, 2010, 28, 172-176中,其完整揭示內容以引用之方式整體併入本文中以達成所有目的。Lipids and methods for preparing the same are disclosed in, for example, U.S. Patent Nos. 8,569,256 and 5,965,542; U.S. Patent Publication Nos. 2016/0199485, 2016/0009637, 2015/0273068, 2015/0265708, 2015/0203446, 2015/0005363, 2014/0308304, 2014/0200257, 2013/086373, 2013/0338210, 2013/0323269, 2013/0245107, 2013/0195920, 2013/0123338, 2013/0022649, 2013/0017223, 2012/0295832, 2012/0183581, 2012/0172411, 2012/0027803, 2012/0058188, 2011/0311583, 2011/0311582, 2011/0262527, 2011/0216622, 2011/0117125, 2011/0091525, 2011/0076335, 2011/0060032, 2010/0130588, 2007/0042031, 2006/0240093, 2006/0083780, 2006/0008910, 2005/0175682, 2005/017054, 2005/0118253, 2005/0064595, 2004/0142025, 2007/0042031, and 1999/009076; PCT Publication Nos. WO 99/39741, WO 2017/117528, WO 2017/004143, WO 2017/075531, WO 2015/199952, WO 2014/008334, WO 2013/086373, WO 2013/086322, WO 2013/016058, WO 2013/086373, WO 2011/141705, and WO 2001/07548; and Semple et al., Nature Biotechnology, 2010, 28, 172-176, the entire disclosures of which are incorporated herein by reference in their entirety for all purposes.

奈米顆粒組合物可包括可用於醫藥組合物之任何物質。例如,奈米顆粒組合物可包括一或多種醫藥學上可接受之賦形劑或附屬成分,諸如但不限於一或多種溶劑、分散介質、稀釋劑、分散助劑、懸浮助劑、造粒助劑、崩解劑、填充劑、助流劑、液體媒劑、黏合劑、表面活性劑、等張劑、增稠或乳化劑、緩衝劑、潤滑劑、油、防腐劑及其他物質。亦可包括賦形劑,諸如蠟、乳酪、著色劑、包衣劑、調味劑及芳香劑。醫藥學上可接受之賦形劑為此項技術中熟知的(參見例如Remington之 The Science and Practice of Pharmacy, 第21版, A. R. Gennaro: Lippincott, Williams & Wilkins, Baltimore, Md., 2006)。包括奈米顆粒之其他不同脂質或脂質體調配物及投與方法包括但不限於美國專利公開案20030203865、20020150626、20030032615及20040048787,該等公開案以引用之方式特定地併入至其揭示調配物以及核酸投與及遞送之其他相關態樣的程度。用於形成顆粒之方法亦揭示於美國專利第5,844,107號、第5,877,302號、第6,008,336號、第6,077,835號、第5,972,901號、第6,200,801號及第5,972,900號中,該等專利針對彼等態樣以引用之方式併入。 Nanoparticle compositions may include any substance that can be used in pharmaceutical compositions. For example, nanoparticle compositions may include one or more pharmaceutically acceptable excipients or accessory ingredients, such as but not limited to one or more solvents, dispersion media, diluents, dispersing aids, suspension aids, granulation aids, disintegrants, fillers, glidants, liquid vehicles, binders, surfactants, isotonic agents, thickeners or emulsifiers, buffers, lubricants, oils, preservatives, and other substances. Excipients such as waxes, creams, colorants, coatings, flavorings, and fragrances may also be included. Pharmaceutically acceptable excipients are well known in the art (see, e.g., Remington's The Science and Practice of Pharmacy, 21st ed., A. R. Gennaro: Lippincott, Williams & Wilkins, Baltimore, Md., 2006). Various other lipid or liposomal formulations, including nanoparticles, and methods of administration include, but are not limited to, those described in U.S. Patent Publications 20030203865, 20020150626, 20030032615, and 20040048787, which are specifically incorporated by reference to the extent that they disclose formulations and other relevant aspects of nucleic acid administration and delivery. Methods for forming particles are also disclosed in U.S. Patent Nos. 5,844,107, 5,877,302, 6,008,336, 6,077,835, 5,972,901, 6,200,801, and 5,972,900, which are incorporated by reference for those aspects.

在一些實施例中,LNP囊封經工程改造之逆轉錄子,例如,如本文所述之經工程改造之核酸構築體、ncRNA、載體系統、RNA分子及/或經工程改造之核酸-酶構築體。In some embodiments, the LNP encapsulates an engineered retron, e.g., an engineered nucleic acid construct, ncRNA, vector system, RNA molecule, and/or engineered nucleic acid-enzyme construct as described herein.

在一些實施例中,脂質奈米顆粒包含:一或多種可離子化脂質;一或多種結構脂質;一或多種PEG化脂質;及一或多種磷脂。在一些實施例中,該一或多種可離子化脂質選自由表X中所揭示之彼等組成之群。In some embodiments, lipid nanoparticles include: one or more ionizable lipids; one or more structural lipids; one or more PEGylated lipids; and one or more phospholipids. In some embodiments, the one or more ionizable lipids are selected from the group consisting of those disclosed in Table X.

在一些實施例中,該一或多種結構脂質選自由以下組成之群:膽固醇、糞甾醇、β麥固醇、麥固醇、麥角甾醇、菜油甾醇、豆甾醇、蕓苔甾醇、番茄鹼、番茄苷、熊果酸、α-生育酚、潑尼松龍、地塞米松、潑尼松及氫化可的松。在一些實施例中,該一或多種PEG化脂質選自由以下組成之群:PEG-c-DOMG、PEG-DMG、PEG-DLPE、PEG-DMPE、PEG-DPPC及PEG-DSPE。In some embodiments, the one or more structural lipids are selected from the group consisting of: cholesterol, fecosterol, β-sitosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, α-tocopherol, prednisolone, dexamethasone, prednisone, and hydrocortisone. In some embodiments, the one or more PEGylated lipids are selected from the group consisting of: PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, and PEG-DSPE.

在一些實施例中,該一或多種磷脂選自由以下組成之群:1,2-二硬脂醯基-sn-甘油-3-磷酸膽鹼(DSPC)、1,2-二油醯基-sn-甘油-3-磷酸乙醇胺(DOPE)、1,2-二亞油醯基-sn-甘油-3-磷酸膽鹼(DLPC)、1,2-二肉豆蔻醯基-sn-甘油-磷酸膽鹼(DMPC)、1,2-二油醯基-sn-甘油-3-磷酸膽鹼(DOPC)、1,2-二棕櫚醯基-sn-甘油-3-磷酸膽鹼(DPPC)、1,2-二(十一烷醯基)-sn-甘油-磷酸膽鹼(DUPC)、1-棕櫚醯基-2-油醯基-sn-甘油-3-磷酸膽鹼(POPC)、1,2-二-O-十八烯基-sn-甘油-3-磷酸膽鹼(18:0 Diether PC)、1-油醯基-2-膽固醇基半琥珀醯基-sn-甘油-3-磷酸膽鹼(OChemsPC)、1-十六烷基-sn-甘油-3-磷酸膽鹼(C16 Lyso PC)、1,2-二亞麻醯基-sn-甘油-3-磷酸膽鹼、1,2-二花生四烯醯基-sn-甘油-3-磷酸膽鹼、1,2-二(二十二碳六烯醯基)-sn-甘油-3-磷酸膽鹼、1,2-二植烷醯基sn-甘油-3-磷酸乙醇胺(ME 16.0 PE)、1,2-二硬脂醯基-sn-甘油-3-磷酸乙醇胺、1,2-二亞油醯基-sn-甘油-3-磷酸乙醇胺、1,2-二亞麻醯基-sn-甘油-3-磷酸乙醇胺、1,2-二花生四烯醯基-sn-甘油-3-磷酸乙醇胺、1,2-二(二十二碳六烯醯基)-sn-甘油-3-磷酸乙醇胺、1,2-二油醯基-sn-甘油-3-磷酸-外消旋-(1-甘油)鈉鹽(DOPG)及鞘磷脂。In some embodiments, the one or more phospholipids are selected from the group consisting of 1,2-distearyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC), 1,2- Dioleyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-di(undecanyl)-sn-glycero-3-phosphocholine (DUPC), 1-palmitoyl-2-oleyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleyl-2-cholesterol hemisuccinyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonyl-sn-glycero-3-phosphocholine, 1,2-di(docosahexaenoyl)-sn-glycero-3-phosphocholine, 1,2-diphytanyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonyl-sn-glycero-3-phosphoethanolamine, 1,2-di(docosahexaenoyl)-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-racemic-(1-glycero) sodium salt (DOPG) and sphingomyelin.

在一些實施例中,脂質奈米顆粒包含約48.5 mol%可離子化脂質、約10 mol%磷脂、約40 mol%結構脂質及約1.5 mol% PEG脂質。In some embodiments, the lipid nanoparticles comprise about 48.5 mol% ionizable lipids, about 10 mol% phospholipids, about 40 mol% structural lipids, and about 1.5 mol% PEG lipids.

在一些實施例中,脂質奈米顆粒包含約48.5 mol%可離子化脂質、約10 mol%磷脂、約39 mol%結構脂質及約2.5 mol% PEG脂質。在一些實施例中,LNP進一步包含可操作地連接至LNP之靶向部分。在一些實施例中,LNP進一步包含選自由以下組成之群的一或多種額外組分:DDAB、EPC、14PA、18BMP、DODAP、DOTAP及C12-200。In some embodiments, the lipid nanoparticles comprise about 48.5 mol% ionizable lipids, about 10 mol% phospholipids, about 39 mol% structural lipids, and about 2.5 mol% PEG lipids. In some embodiments, the LNP further comprises a targeting moiety operably linked to the LNP. In some embodiments, the LNP further comprises one or more additional components selected from the group consisting of: DDAB, EPC, 14PA, 18BMP, DODAP, DOTAP, and C12-200.

In some embodiments, engineered retrons can be used for gene transfer, which can be performed ex vivo or in vivo. Ex vivo gene therapy refers to isolating cells from an individual, delivering nucleic acids to the cells in vitro, and returning the modified cells to the individual. This may involve collecting a biological sample containing cells from the individual. For example, blood can be obtained by venipuncture, and solid tissue samples can be obtained by surgical techniques according to methods well known in the art.

Typically, but not always, the individual receiving the cells (e.g., the recipient) is also the individual from whom the cells were harvested or obtained, which provides the advantage that the donated cells are autologous. However, the cells can be obtained from another individual (e.g., a donor), a cell culture from a donor, or an established cell line. Thus, in some embodiments, the cells are allogeneic to the recipient. The cells can be obtained from the same or a different species than the individual to be treated, but are preferably of the same species, and more preferably have the same immunological profile as the individual. Such cells can be obtained, for example, from a biological sample comprising cells from a close relative or matched donor, then transfected with a nucleic acid (e.g., one comprising an engineered retron), and administered to an individual in need of genome modification, e.g., for treatment of a disease or disorder.

In other embodiments, an engineered retron can be introduced in vivo (e.g., for gene therapy) by physically delivering the engineered retron to an individual. Examples of physically introducing an engineered retron include injection, electroporation, and transfection (e.g., calcium-mediated transfection, lipofection, or the like).

J. Payload

Retron-based gene editing systems and/or components thereof can be delivered by means of LNPs as described herein. In various embodiments, a retron-based gene editing system can be delivered to a cell, tissue, organ, or organism by LNP. Depending on the chosen format, the retron-based gene editing system and/or its individual or combined components can be delivered as DNA molecules (e.g., encoded on one or more plasmids), RNA molecules (e.g., a guide RNA for targeting a programmable nuclease, or a linear or circular mRNA encoding the retron RT or a programmable nuclease component of the retron-based gene editing system), proteins (e.g., a retron polypeptide or an accessory protein with another function, such as a recombinase, nuclease, polymerase, ligase, deaminase, or reverse transcriptase), or protein-nucleic acid complexes (e.g., a complex between a guide RNA and a programmable nuclease protein or a fusion protein comprising a retron RT). These DNAs, RNAs, proteins, or nucleoproteins that correspond to and/or encode the retron-based gene editing system or its components constitute the LNP cargo or payload. In various embodiments, the LNP cargo or payload may comprise a nucleic acid payload, including coding payloads such as linear and circular mRNAs that encode the various components of the retron-based editing system.

1. Nucleic acid payload

在各個實施例中,本文所述之LNP組合物可用於遞送核酸或多核苷酸有效載荷,例如線性或環狀mRNA。In various embodiments, the LNP compositions described herein can be used to deliver a nucleic acid or polynucleotide payload, such as a linear or circular mRNA.

In various embodiments, the retron-based editing compositions described herein may include a nucleic acid or polynucleotide payload, such as a linear or circular mRNA. For example, a retron gene editing system may include one or more coding mRNAs (circular or linear) that encode the retron RT and other accessory proteins (e.g., a programmable nuclease), and these RNA components may be delivered by LNP.

In some embodiments, the LNP is capable of delivering a polynucleotide to a target cell, tissue, or organ. In the broadest sense of the term, a polynucleotide includes any compound and/or substance that is or can be incorporated into an oligonucleotide chain. Exemplary polynucleotides for use in accordance with the present disclosure include, but are not limited to, one or more of the following: deoxyribonucleic acid (DNA), ribonucleic acid (RNA) (including messenger RNA (mRNA)), hybrids thereof, RNAi-inducing agents, RNAi agents, siRNA, shRNA, miRNA, antisense RNA, ribozymes, catalytic DNA, RNA that induces triple helix formation, aptamers, vectors, etc. RNA useful in the compositions and methods described herein can be selected from the group consisting of, but not limited to, shortmers, antagomirs, antisense, ribozymes, short interfering RNA (siRNA), asymmetric interfering RNA (aiRNA), microRNA (miRNA), Dicer-substrate RNA (dsRNA), short hairpin RNA (shRNA), transfer RNA (tRNA), messenger RNA (mRNA), and mixtures thereof. In some embodiments, the polynucleotide is an mRNA. In some embodiments, the polynucleotide is a circular RNA. In some embodiments, the polynucleotide encodes a protein, such as a nucleobase editing enzyme. The polynucleotide can encode any polypeptide of interest, including any naturally or non-naturally occurring or otherwise modified polypeptide. The polypeptide can be of any size and can have any secondary structure or activity. In some embodiments, the polypeptide encoded by the mRNA can have a therapeutic effect when expressed in a cell.

In other embodiments, the polynucleotide is an siRNA. An siRNA may be capable of selectively knocking down or downregulating the expression of a gene of interest. For example, an siRNA can be selected to silence a gene associated with a particular disease, condition, or disorder upon administration of a nanoparticle composition comprising the siRNA to an individual in need thereof. The siRNA may comprise a sequence that is complementary to an mRNA sequence encoding the gene or protein of interest. In some embodiments, the siRNA may be an immunomodulatory siRNA.

在一些實施例中,多核苷酸為shRNA或編碼shRNA之載體或質體。在將適當構築體遞送至細胞核後,可在標靶細胞內部產生shRNA。與shRNA相關之構築體及機制為相關領域中熟知的。In some embodiments, the polynucleotide is shRNA or a vector or plasmid encoding shRNA. After the appropriate construct is delivered to the cell nucleus, shRNA can be produced inside the target cell. The constructs and mechanisms associated with shRNA are well known in the relevant field.

多核苷酸可包括編碼相關多肽之連接核苷的第一區域(例如,編碼區)、位於第一區域之5'末端的第一側接區域(例如,5'-UTR)、位於第一區域之3'末端的第二側接區域(例如,3'-UTR)、至少一個5'-帽區域及3'-穩定區域。在一些實施例中,多核苷酸進一步包括聚A區或Kozak序列(例如,在5'-UTR中)。在一些情況下,多核苷酸可含有能夠自多核苷酸切除之一或多個內含子核苷酸序列。在一些實施例中,多核苷酸(例如,mRNA)可包括5'帽結構、鏈終止核苷酸、莖環、聚A序列及/或多腺苷酸化信號。核酸之任一區域均可包括一或多種替代組分(例如,替代核苷)。例如,3'-穩定區域可含有替代核苷,諸如L-核苷、反向胸苷或2'-O-甲基核苷,及/或編碼區、5'-UTR、3'-UTR或帽區可包括替代核苷,諸如5-取代之尿苷(例如,5-甲氧基尿苷)、1-取代之假尿苷(例如,1-甲基假尿苷或1-乙基-假尿苷)及/或5-取代之胞苷(例如,5-甲基-胞苷)。在一些實施例中,多核苷酸僅含有天然存在之核苷。A polynucleotide may include a first region of linked nucleosides encoding a polypeptide of interest (e.g., a coding region), a first flanking region at the 5' end of the first region (e.g., a 5'-UTR), a second flanking region at the 3' end of the first region (e.g., a 3'-UTR), at least one 5'-cap region, and a 3'-stabilizing region. In some embodiments, the polynucleotide further includes a poly A region or a Kozak sequence (e.g., in a 5'-UTR). In some cases, the polynucleotide may contain one or more intronic nucleotide sequences that can be excised from the polynucleotide. In some embodiments, a polynucleotide (e.g., an mRNA) may include a 5' cap structure, a chain termination nucleotide, a stem loop, a poly A sequence, and/or a polyadenylation signal. Any region of a nucleic acid may include one or more alternative components (e.g., alternative nucleosides). For example, the 3'-stabilizing region may contain alternative nucleosides, such as L-nucleosides, inverted thymidines, or 2'-O-methyl nucleosides, and/or the coding region, 5'-UTR, 3'-UTR, or cap region may include alternative nucleosides, such as 5-substituted uridines (e.g., 5-methoxyuridine), 1-substituted pseudouridines (e.g., 1-methylpseudouridine or 1-ethyl-pseudouridine), and/or 5-substituted cytidines (e.g., 5-methyl-cytidine). In some embodiments, the polynucleotide contains only naturally occurring nucleosides.
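The region layout described above (a coding region flanked by a 5'-UTR and a 3'-UTR, optionally followed by a poly-A region) can be sketched as a small Python data structure. This is a minimal illustration, not a design tool: the class name, the field names, and the short placeholder sequences are hypothetical, and the 5' cap and any alternative nucleosides are not modeled because they are not plain sequence letters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearMrna:
    """Hypothetical container for the sequence regions of a linear mRNA."""

    five_prime_utr: str
    coding_region: str
    three_prime_utr: str
    poly_a_length: int = 0

    def regions(self):
        """Return (name, sequence) pairs in 5'-to-3' order."""
        return [
            ("5'-UTR", self.five_prime_utr),
            ("coding region", self.coding_region),
            ("3'-UTR", self.three_prime_utr),
            ("poly-A", "A" * self.poly_a_length),
        ]

    def sequence(self):
        """Concatenate all regions into the full transcript sequence."""
        return "".join(seq for _, seq in self.regions())

    def region_bounds(self):
        """Map each region name to its 0-based, half-open (start, end) span."""
        bounds, start = {}, 0
        for name, seq in self.regions():
            bounds[name] = (start, start + len(seq))
            start += len(seq)
        return bounds


# Placeholder sequences chosen only to make the example short.
example = LinearMrna("GGGAAA", "AUGGCCUAA", "CCC", poly_a_length=5)
```

Keeping the regions separate until `sequence()` is called makes it easy to swap one region (for example, a different 3'-UTR) without touching the coding region.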

In some cases, the polynucleotide is greater than 30 nucleotides long. In another embodiment, the polynucleotide molecule is greater than 35 nucleotides long. In other embodiments, the length is at least 40, at least 45, at least 50, at least 55, at least 60, at least 80, at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, at least 1200, at least 1300, at least 1400, at least 1500, at least 1600, at least 1800, at least 2000, at least 2500, at least 3000, at least 4000, or at least 5000 nucleotides, or greater than 5000 nucleotides.

In some embodiments, the polynucleotide molecules, and the formulas, compositions, or methods related thereto, comprise one or more polynucleotides having features described in: WO2002/098443, WO2003/051401, WO2008/052770, WO2009/127230, WO2006/122828, WO2008/083949, WO2010/088927, WO2010/037539, WO2004/004743, WO2005/016376, WO2006/024518, WO2007/095976, WO2008/014979, WO2008/077592, WO2009/030481, WO2009/095226, WO2011/069586, WO2011/026641, WO2011/144358, WO2012/019780, WO2012/013326, WO2012/089338, WO2012/113513, WO2012/116811, WO2012/116810, WO2013/113502, WO2013/113501, WO2013/113736, WO2013/143698, WO2013/143699, WO2013/143700, WO2013/120626, WO2013/120627, WO2013/120628, WO2013/120629, WO2013/174409, WO2014/127917, WO2015/024669, WO2015/024668, WO2015/024667, WO2015/024665, WO2015/024666, WO2015/024664, WO2015/101415, WO2015/101414, WO2015/024667, WO2015/062738, WO2015/101416, all of which are incorporated herein by reference.

In some embodiments, the polynucleotide comprises one or more microRNA binding sites. In some embodiments, the microRNA binding site is recognized by a microRNA in a non-target organ. In some embodiments, the microRNA binding site is recognized by a microRNA in the liver. In some embodiments, the microRNA binding site is recognized by a microRNA in hepatocytes.

In certain embodiments, the RNA of the present disclosure comprises one or more phosphonate modifications selected from the group consisting of a phosphorothioate linkage (PS), a phosphorodithioate linkage (PS2), a methylphosphonate linkage (MP), a methoxypropylphosphonate linkage (MOP), a 5'-(E)-vinylphosphonate linkage (5'-(E)-VP), a 5'-methylphosphonate linkage (5'-MP), an (S)-5'-C-methyl with a phosphate linkage, a 5'-phosphorothioate linkage (5'-PS), and a peptide nucleic acid linkage (PNA). In certain embodiments, the RNA of the present disclosure comprises one or more ribose modifications selected from the group consisting of 2'-O-methyl (2'-OMe), 2'-O-methoxyethyl (2'-O-MOE), 2'-deoxy-2'-fluoro (2'-F), 2'-arabino-fluoro (2'-Ara-F), 2'-O-benzyl, 2'-O-methyl-4-pyridine (2'-O-CH2Py(4)), locked nucleic acid (LNA), (S)-cET-BNA, tricyclo-DNA (tcDNA), PMO, unlocked nucleic acid (UNA), and glycol nucleic acid (GNA). In certain embodiments, the RNA comprises a locked nucleic acid (LNA) comprising a methyl bridge, an ethyl bridge, a propyl bridge, a butyl bridge, or an optionally substituted variant of any of the foregoing. In certain embodiments, the RNA of the present disclosure comprises one or more modified bases selected from the group consisting of pseudouridine (ψ), 2'thiouridine (s2U), N6'-methyladenosine (m6A), 5'methylcytidine (m5C), 5'fluoro-2'-deoxyuridine, adenine modified with N-ethylpiperidine 7'-EAA triazole, adenine modified with N-ethylpiperidine 6' triazole, 6'phenylpyrrolo-cytosine (PhpC), 2',4'-difluorotoluyl ribonucleoside (rF), and 5'-nitroindole.

2. Single-stranded DNA payload

In various embodiments, the LNP of the present disclosure may comprise a payload having at least one single-stranded DNA. In certain embodiments, the single-stranded DNA is a linear single-stranded DNA. In certain embodiments, the single-stranded DNA is a circular single-stranded DNA. In certain embodiments, the payload further comprises a nucleobase editing system, such as an enzyme, or a polynucleotide encoding an enzyme, that can independently or co-dependently edit, modify, or alter a target polynucleotide sequence or a target transcript comprising a nucleic acid sequence.

在某些實施例中,環狀單股DNA (CiSSD)有效載荷為PCT公開案WO2020142730A1中所述之有效載荷,該案以引用之方式整體併入本文中。在某些實施例中,CiSSD係用作用於靶向基因體修飾之核鹼基編輯系統的部分之供體模板。在某些實施例中,CiSSD包含DNA插入物、5’同源臂及3’同源臂。在一些實施例中,DNA插入物位於5’同源臂與3’同源臂之間。如本文所用之同源臂係指一系列核苷酸,該等核苷酸與標靶區域中之內源DNA序列中的一系列核苷酸互補。側接DNA插入物之同源臂允許將DNA插入物特異性地插入標靶區域中。標靶區域係發生所需插入或修飾之核酸序列。In some embodiments, the circular single-stranded DNA (CiSSD) payload is a payload described in PCT Publication WO2020142730A1, which is incorporated herein by reference in its entirety. In some embodiments, CiSSD is used as a donor template as part of a nucleobase editing system for targeted genome modification. In some embodiments, CiSSD comprises a DNA insert, a 5' homology arm, and a 3' homology arm. In some embodiments, the DNA insert is located between the 5' homology arm and the 3' homology arm. As used herein, a homology arm refers to a series of nucleotides that complement a series of nucleotides in an endogenous DNA sequence in a target region. The homology arms flanking the DNA insert allow the DNA insert to be specifically inserted into the target region. The target region is a nucleic acid sequence where the desired insertion or modification occurs.
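The donor layout described above (a DNA insert placed between a 5' homology arm and a 3' homology arm) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: sequences are plain strings on a single strand, the arms are matched by exact same-strand identity rather than by base-pairing, the two arms are assumed to be directly adjacent in the target region, and all function names and sequences are hypothetical.

```python
def build_donor(five_prime_arm, insert, three_prime_arm):
    """Assemble a donor template: 5' arm, then insert, then 3' arm."""
    return five_prime_arm + insert + three_prime_arm


def expected_edited_locus(target, five_prime_arm, insert, three_prime_arm):
    """Return the target sequence with the insert placed between the arms.

    Assumes (for this sketch only) that the two arms sit directly next to
    each other in the target and that the junction occurs exactly once.
    """
    junction = five_prime_arm + three_prime_arm
    position = target.find(junction)
    if position < 0:
        raise ValueError("homology arms do not match the target region")
    if target.find(junction, position + 1) >= 0:
        raise ValueError("homology arms match more than one site")
    cut = position + len(five_prime_arm)
    return target[:cut] + insert + target[cut:]


# Toy sequences chosen only to keep the example readable.
TARGET = "TTTACGTACGGATCCAAA"
ARM_5 = "ACGTACG"
ARM_3 = "GATCC"
INSERT = "TTAGG"
```

In this toy case the edited locus contains the full donor sequence, which is what makes the insertion site-specific: the arms determine where the insert lands.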

在某些實施例中,DNA插入物為至少1個核苷酸。在某些實施例中,DNA插入物為至少約0.5 kb、2 kb、2.5 kb、5 kb、10 kb、20 kb、40 kb、80 kb、100 kb、150 kb或200 kb。在某些實施例中,DNA插入物之長度為約0.5 kb至5 kb、約1 kb至5 kb、約1 kb至10 kb、約1.6 kb至5 kb、約1.6 kb至10 kb、約2 kb至5 kb、約2 kb至20 kb、約2.5 kb至5 kb、約2.5 kb至10 kb、約2.5 kb至20 kb及約5 kb至100 kb。在一些實施例中,DNA插入物大小可介於約1 kb至約3 kb、約3 kb至約6 kb、約6 kb至約9 kb、約9 kb至約12 kb、約12 kb至約15 kb、約15 kb至約18 kb或約18 kb至約21 kb範圍內。In some embodiments, the DNA insert is at least 1 nucleotide. In some embodiments, the DNA insert is at least about 0.5 kb, 2 kb, 2.5 kb, 5 kb, 10 kb, 20 kb, 40 kb, 80 kb, 100 kb, 150 kb, or 200 kb. In some embodiments, the DNA insert is about 0.5 kb to 5 kb, about 1 kb to 5 kb, about 1 kb to 10 kb, about 1.6 kb to 5 kb, about 1.6 kb to 10 kb, about 2 kb to 5 kb, about 2 kb to 20 kb, about 2.5 kb to 5 kb, about 2.5 kb to 10 kb, about 2.5 kb to 20 kb, and about 5 kb to 100 kb in length. In some embodiments, the DNA insert size can range from about 1 kb to about 3 kb, about 3 kb to about 6 kb, about 6 kb to about 9 kb, about 9 kb to about 12 kb, about 12 kb to about 15 kb, about 15 kb to about 18 kb, or about 18 kb to about 21 kb.

在一些實施例中,DNA插入物可包含編碼標記物或報告子(例如,螢光標記物、抗生素標記物或任何合適標記物)之核苷酸序列。如本文所用,「標記物」或「報告子」意謂允許例如藉由螢光或抗生素抗性來鑑定及選擇所需細胞之特徵。例如,插入物可包括編碼報告子(例如,GFP、RFP或任何合適報告子)或重組酶之核苷酸序列。例如,報告子為N末端GFP融合報告子。In some embodiments, the DNA insert may include a nucleotide sequence encoding a marker or reporter (e.g., a fluorescent marker, an antibiotic marker, or any suitable marker). As used herein, a "marker" or "reporter" means a characteristic that allows identification and selection of desired cells, such as by fluorescence or antibiotic resistance. For example, the insert may include a nucleotide sequence encoding a reporter (e.g., GFP, RFP, or any suitable reporter) or a recombinase. For example, the reporter is an N-terminal GFP fusion reporter.

在一些實施例中,DNA插入物可包含編碼轉錄單元之核苷酸序列,其中每個轉錄單元可產生細胞產物(例如,蛋白質或RNA)。在一些實施例中,DNA插入物可包含編碼蛋白質之核苷酸序列,該蛋白質例如免疫調節蛋白(例如,細胞介素)、抗體、嵌合抗原受體(CAR)、生長因子、T細胞受體或另一蛋白質。In some embodiments, the DNA insert may include a nucleotide sequence encoding a transcription unit, wherein each transcription unit can produce a cellular product (e.g., a protein or RNA). In some embodiments, the DNA insert may include a nucleotide sequence encoding a protein, such as an immunomodulatory protein (e.g., a cytokine), an antibody, a chimeric antigen receptor (CAR), a growth factor, a T cell receptor, or another protein.

In certain embodiments, the CiSSD comprises a DNA insert that can be inserted at a nucleotide break in a target region of genomic DNA. In some embodiments, the break is a double-strand break (DSB). In certain embodiments, the break is a single-strand DNA break or nick. Precision gene editing technologies (such as CRISPR) create a break near the desired sequence change (the target sequence). CRISPR can be used to produce deletions, disruptions, insertions, replacements, and repairs. The components of the donor template used for these different modifications are generally the same and consist of three basic elements: a 5' homology arm, a DNA insert, and a 3' homology arm. CRISPR-based gene editing can generate gene knockouts by disrupting a gene sequence; however, the efficiency of inserting exogenous DNA (knock-in) or replacing genomic sequence using current methods is very poor. In certain embodiments, CiSSD can be used with CRISPR to generate knock-in modifications. Double-strand breaks can be introduced by any suitable mechanism, including, for example, by a gene editing system using CRISPR, zinc finger nucleases, TALENs (transcription activator-like effector nucleases), or meganucleases, as previously described. In brief, a CRISPR genome editing system generates targeted DSBs using a CRISPR programmable DNA endonuclease, which can be directed to a specific DNA sequence (the target sequence) by a small "guide" RNA (crRNA). Guide RNAs for CRISPR-based modification (i.e., crRNA and tracrRNA) can be generated by any suitable method. In certain embodiments, the crRNA and tracrRNA can be chemically synthesized. In other embodiments, a single guide RNA (sgRNA) can be constructed and synthesized by in vitro transcription.

在某些實施例中,本揭示案之LNP包含本文揭示之CiSSD且進一步包含精確基因編輯系統組分,諸如CRISPR、鋅指核酸酶、TALEN核酸酶(轉錄活化子樣效應子核酸酶)或大範圍核酸酶,或此項技術中已知之任何其他核鹼基編輯系統。In certain embodiments, the LNP of the present disclosure comprises a CiSSD disclosed herein and further comprises a component of a precise gene editing system, such as CRISPR, zinc finger nuclease, TALEN nuclease (transcription activator-like effector nuclease) or meganuclease, or any other nucleobase editing system known in the art.

在某些實施例中,單股DNA (SSD)有效載荷為PCT公開案WO2020232286A1中所述之有效載荷,該案以引用之方式整體併入本文中。In certain embodiments, the single-stranded DNA (SSD) payload is a payload described in PCT Publication WO2020232286A1, which is incorporated herein by reference in its entirety.

In certain embodiments, the SSD comprises an engineered initiator sequence and an engineered terminator sequence from a filamentous phage, and a DNA sequence of interest, wherein the DNA sequence of interest is located 3' of the engineered initiator sequence and 5' of the engineered terminator sequence. In certain embodiments, the SSD comprises a selectable marker.

In certain embodiments, the single-stranded DNA (SSD) payload is prepared by a method described in PCT Publication WO2020232286A1. In certain embodiments, the SSD is prepared by a method comprising the following steps: (a) culturing the host cell of claim 11 under conditions suitable for producing ssDNA from the DNA sequence of interest in the engineered nucleic acid and for producing a plurality of phage proteins from the helper plasmid; (b) assembling the ssDNA and the plurality of phage proteins into an engineered phage; and (c) collecting the engineered phage. In certain embodiments, the method further comprises extracting the SSD from the engineered phage.

In certain embodiments, at least 90% of the SSD is the same length as the DNA sequence of interest. In certain embodiments, at least 95% of the SSD is the same length as the DNA sequence of interest. In certain embodiments, the length of the SSD is between 100 and 20,000 nucleotides. In certain embodiments, the SSD is circular.
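The length criteria above amount to two simple checks on a preparation: what fraction of molecules match the expected length, and whether a given length falls within the stated 100 to 20,000 nucleotide window. The following Python sketch shows one way to express them; the function names are hypothetical, the input is assumed to be a list of per-molecule lengths (how such lengths would be measured is not addressed here), and treating the window as inclusive is an assumption.

```python
def fraction_full_length(observed_lengths, expected_length):
    """Fraction of molecules whose length equals the expected length."""
    if not observed_lengths:
        raise ValueError("no lengths supplied")
    matches = sum(1 for n in observed_lengths if n == expected_length)
    return matches / len(observed_lengths)


def within_length_window(length, minimum=100, maximum=20_000):
    """True if a length lies in the window (bounds assumed inclusive)."""
    return minimum <= length <= maximum
```

For example, a preparation in which 19 of 20 molecules are 3,000 nucleotides long has a full-length fraction of 0.95, which satisfies both the 90% and the 95% thresholds.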

In certain embodiments, the single-stranded DNA (SSD) payload is a payload described in PCT Publication WO2022011082A1, which is incorporated herein by reference in its entirety. In certain embodiments, the SSD comprises a first sequence from a filamentous phage, the first sequence having initiator and terminator functions; a second sequence identical to the first sequence; and a single-stranded DNA sequence of interest located between the first sequence and the second sequence. In certain embodiments, the SSD further comprises a selectable marker. In certain embodiments, the SSD is circular. In certain embodiments, the SSD is linear.

In certain embodiments, the single-stranded DNA (SSD) payload is prepared by a method described in PCT Publication WO2022011082A1. In certain embodiments, the method comprises culturing a host cell under conditions suitable for producing single-stranded DNA from the single-stranded DNA sequence of interest in an isolated nucleic acid and for producing phage proteins from a helper plasmid; assembling the single-stranded DNA and the phage proteins into an engineered phage; and collecting the engineered phage. In certain embodiments, the host cell comprises (i) an isolated nucleic acid comprising: a first sequence from a filamentous phage, the first sequence having initiator and terminator functions; a second sequence identical to the first sequence; and a single-stranded DNA sequence of interest located between the first sequence and the second sequence; and (ii) a helper plasmid for expressing phage proteins capable of assembling the single-stranded DNA into a phage. In certain embodiments, the method further comprises extracting the SSD from the engineered phage.

In certain embodiments, at least 90% of the SSD is the same length as the DNA sequence of interest. In certain embodiments, at least 95% of the SSD is the same length as the DNA sequence of interest. In certain embodiments, the length of the SSD is between 100 and 20,000 nucleotides. In certain embodiments, the SSD is circular.

In certain embodiments, the single-stranded DNA (SSD) payload is a payload described in PCT Publication WO2021055616A1, which is incorporated herein by reference in its entirety.

3. Linear mRNA payload

In various embodiments, the LNP-based pharmaceutical compositions described herein (e.g., LNP-based gene editing systems) may include one or more linear mRNA molecules or linear mRNA payloads. In various embodiments, the mRNA payload may encode one or more components of the gene editing systems described herein. For example, the mRNA payload may encode an amino acid sequence-programmable DNA binding domain (e.g., a retron RT or a programmable nuclease) or a nucleic acid sequence-programmable DNA binding domain (e.g., CRISPR Cas9, CRISPR Cas12a, CRISPR Cas12f, CRISPR Cas13a, CRISPR Cas13b, or TnpB).

取決於基因編輯系統之性質,mRNA有效載荷亦可編碼一或多個效應子結構域,該等結構域提供促進核苷酸序列及/或基因表現之變化之各種功能,諸如但不限於單股DNA結合蛋白質、核酸酶、核酸內切酶、核酸外切酶、去胺酶(例如,胞苷去胺酶或腺苷去胺酶)、聚合酶(例如,逆轉錄酶)、整合酶、重組酶等,以及包含連接在一起的一或多個功能結構域之融合蛋白。Depending on the nature of the gene editing system, the mRNA payload may also encode one or more effector domains that provide various functions that promote changes in nucleotide sequence and/or gene expression, such as but not limited to single-stranded DNA binding proteins, nucleases, endonucleases, exonucleases, deaminases (e.g., cytidine deaminases or adenosine deaminases), polymerases (e.g., reverse transcriptases), integrases, recombinases, etc., as well as fusion proteins comprising one or more functional domains linked together.

Ribonucleic acid (RNA) is a molecule composed of nucleotides, each of which consists of a ribose sugar linked to a nitrogenous base and a phosphate group. The nitrogenous bases include adenine (A), guanine (G), uracil (U), and cytosine (C). In general, RNA exists mostly in single-stranded form, but it can also exist in double-stranded form in certain cases. The length, form, and structure of RNA vary with its purpose. For example, RNA can range in length from short sequences (e.g., siRNA) to long sequences (e.g., lncRNA), can be linear (e.g., mRNA) or circular (e.g., oRNA), and can be a coding sequence (e.g., mRNA) or a non-coding sequence (e.g., lncRNA).

In various embodiments, the LNP-based gene editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein can be used to deliver an mRNA payload as a linear mRNA molecule. In embodiments, the mRNA payload may comprise one or more nucleotide sequences that encode a product of interest, such as, but not limited to, a component of a gene editing system (e.g., an endonuclease, a prime editor, etc.) and/or a therapeutic protein.

In some embodiments, the RNA payload may be a linear mRNA. As used herein, the term "messenger RNA" (mRNA) refers to any polynucleotide that encodes a protein of interest and that can be translated in vitro, in vivo, in situ, or ex vivo to produce the encoded protein of interest.

In general, an mRNA molecule comprises at least a coding region, a 5' untranslated region (UTR), a 3' UTR, a 5' cap, and a poly A tail. In some aspects, one or more structural and/or chemical modifications or alterations may be included in the RNA, which can reduce the innate immune response of a cell into which the mRNA is introduced. As used herein, a "structural" feature or modification is one in which two or more linked nucleotides are inserted, deleted, duplicated, inverted, or randomized in a nucleic acid without significant chemical modification to the nucleotides themselves. Because chemical bonds will necessarily be broken and reformed to effect a structural modification, structural modifications are of a chemical nature and hence are chemical modifications. However, structural modifications will result in a different sequence of nucleotides. For example, the polynucleotide "ATCG" can be chemically modified to "AT-5meC-G".

In general, the coding region of interest in the mRNA used herein may encode a dipeptide, a tripeptide, a tetrapeptide, a pentapeptide, a hexapeptide, a heptapeptide, an octapeptide, a nonapeptide, or a decapeptide. In another embodiment, the mRNA may encode a peptide of 2-30 amino acids, e.g., 5-30, 10-30, 2-25, 5-25, 10-25, or 10-20 amino acids. The mRNA may encode a peptide of at least 10, 11, 12, 13, 14, 15, 17, 20, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids, or a peptide that is no longer than 10, 11, 12, 13, 14, 15, 17, 20, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids.

In general, the mRNA region encoding the product of interest is greater than about 30 nucleotides in length (e.g., at least or greater than about 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,500, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, or 90,000 nucleotides, or up to and including 100,000 nucleotides).

In some embodiments, the mRNA has a total length spanning from about 30 to about 100,000 nucleotides (e.g., 30 to 50, 30 to 100, 30 to 250, 30 to 500, 30 to 1,000, 30 to 1,500, 30 to 3,000, 30 to 5,000, 30 to 7,000, 30 to 10,000, 30 to 25,000, 30 to 50,000, 30 to 70,000, 100 to 250, 100 to 500, 100 to 1,000, 100 to 1,500, 100 to 3,000, 100 to 5,000, 100 to 7,000, 100 to 10,000, 100 to 25,000, 100 to 50,000, 100 to 70,000, 100 to 100,000, 500 to 1,000, 500 to 1,500, 500 to 2,000, 500 to 3,000, 500 to 5,000, 500 to 7,000, 500 to 10,000, 500 to 25,000, 500 to 50,000, 500 to 70,000, 500 to 100,000, 1,000 to 1,500, 1,000 to 2,000, 1,000 to 3,000, 1,000 to 5,000, 1,000 to 7,000, 1,000 to 10,000, 1,000 to 25,000, 1,000 to 50,000, 1,000 to 70,000, 1,000 to 100,000, 1,500 to 3,000, 1,500 to 5,000, 1,500 to 7,000, 1,500 to 10,000, 1,500 to 25,000, 1,500 to 50,000, 1,500 to 70,000, 1,500 to 100,000, 2,000 to 3,000, 2,000 to 5,000, 2,000 to 7,000, 2,000 to 10,000, 2,000 to 25,000, 2,000 to 50,000, 2,000 to 70,000, and 2,000 to 100,000 nucleotides).

In some embodiments, the one or more regions flanking the region encoding the product of interest may independently range from 15 to 1,000 nucleotides in length (e.g., greater than 30, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, and 900 nucleotides, or at least 30, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, and 1,000 nucleotides).

In some embodiments, the mRNA comprises a tailing sequence, which can range from absent to 500 nucleotides in length (e.g., at least 60, 70, 80, 90, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, or 500 nucleotides). Where the tailing region is a poly A tail, the length may be determined in units of, or as a function of, poly A binding protein binding. In this embodiment, the poly A tail is long enough to bind at least 4 monomers of poly A binding protein. Poly A binding protein monomers bind to stretches of approximately 38 nucleotides. As such, poly A tails of about 80 nucleotides and 160 nucleotides have been observed to be functional.
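The relationship between poly A tail length and poly A binding protein occupancy described above can be sketched as simple arithmetic. The following is a minimal illustration only; the function names are hypothetical, and the calculation assumes that each monomer occupies a contiguous stretch of approximately 38 nucleotides, as stated above.

```python
# Approximate poly A stretch bound by one poly A binding protein monomer,
# taken from the text above (an approximation, not a measured constant).
NT_PER_MONOMER = 38


def monomers_accommodated(tail_length_nt: int, footprint_nt: int = NT_PER_MONOMER) -> int:
    """Whole monomers that fit on a tail of the given length (hypothetical helper)."""
    if tail_length_nt < 0:
        raise ValueError("tail length cannot be negative")
    return tail_length_nt // footprint_nt


def min_tail_length(monomers: int, footprint_nt: int = NT_PER_MONOMER) -> int:
    """Shortest tail, in nucleotides, that accommodates the given number of monomers."""
    if monomers < 0:
        raise ValueError("monomer count cannot be negative")
    return monomers * footprint_nt
```

Under this assumption, a tail of about 80 nucleotides accommodates 2 monomers, a tail of about 160 nucleotides accommodates 4, and binding at least 4 monomers calls for a tail of roughly 152 nucleotides or more.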

In some embodiments, the mRNA comprises a capping sequence, which comprises a single cap or a series of nucleotides forming the cap. The capping sequence may be from 1 to 10 (e.g., 2-9, 3-8, 4-7, 1-5, 5-10, or at least 2, or 10 or fewer) nucleotides in length. In some embodiments, the capping sequence is absent.

In some embodiments, the mRNA comprises a region comprising a start codon. The region comprising the start codon may range from 3 to 40 (e.g., 5-30, 10-20, 15, or at least 4, or 30 or fewer) nucleotides in length.

In some embodiments, the mRNA comprises a region comprising a stop codon. The region comprising the stop codon may range from 3 to 40 (e.g., 5-30, 10-20, 15, or at least 4, or 30 or fewer) nucleotides in length.
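The length ranges recited above for the individual mRNA regions can be collected into a simple lookup for checking a candidate design. This is a sketch under stated assumptions, not a prescribed method: the region names and the helper function are hypothetical, "greater than about 30 nucleotides" is read as a lower bound of 31, and a lower bound of 0 denotes a region that may be absent.

```python
# Inclusive length ranges in nucleotides, taken from the paragraphs above.
# Region names are hypothetical labels chosen for this sketch.
REGION_RANGES = {
    "coding": (31, 100_000),        # "greater than about 30" read as >= 31 (assumption)
    "flanking": (15, 1_000),
    "tail": (0, 500),               # tailing sequence may be absent
    "cap": (0, 10),                 # capping sequence may be absent
    "start_codon_region": (3, 40),
    "stop_codon_region": (3, 40),
}


def regions_out_of_range(lengths: dict) -> list:
    """Return the names of regions whose length falls outside REGION_RANGES."""
    problems = []
    for name, length in lengths.items():
        low, high = REGION_RANGES[name]  # raises KeyError for an unknown region name
        if not low <= length <= high:
            problems.append(name)
    return problems
```

For example, a design with a 900-nucleotide coding region, a 47-nucleotide flanking region, and a 120-nucleotide tail would pass this check, whereas a 600-nucleotide tail would be reported.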

In some embodiments, the mRNA comprises a region comprising a restriction sequence. The region comprising the restriction sequence may range from 3 to 40 (e.g., 5-30, 10-20, 15, or at least 4, or 30 or fewer) nucleotides in length.

Untranslated region (UTR)

In various embodiments, the mRNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise at least one untranslated region (UTR) that flanks the region encoding the product of interest and/or is incorporated within the mRNA molecule. UTRs are transcribed but not translated. The mRNA payload may include 5' UTR sequences and 3' UTR sequences as well as internal UTRs.

The RNA payloads of the present disclosure may comprise one or more regions or parts that act or function as an untranslated region. Where a nucleic acid is designed to encode at least one polypeptide of interest, the nucleic acid may comprise one or more of these untranslated regions (UTRs). Wild-type untranslated regions of a nucleic acid are transcribed but not translated. In mRNA, the 5' UTR starts at the transcription start site and continues to the start codon but does not include the start codon, whereas the 3' UTR starts immediately following the stop codon and continues until the transcriptional termination signal. There is a growing body of evidence about the regulatory roles played by UTRs in terms of stability of the nucleic acid molecule and translation. The regulatory features of a UTR can be incorporated into the RNA payload molecules of the present disclosure (e.g., linear and circular mRNA molecules) to, among other things, enhance the stability of the molecule. Specific features can also be incorporated to ensure controlled down-regulation of the transcript in case it is misdirected to undesired organ sites. A variety of 5' UTR and 3' UTR sequences are known and available in the art.
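The boundaries described above (the 5' UTR runs up to, but does not include, the start codon; the 3' UTR begins immediately after the stop codon) can be illustrated with a short sketch. It makes a simplifying assumption that is not taken from the present disclosure: the first AUG in the sequence is treated as the start codon, and the coding region ends at the first in-frame stop codon. The function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(seq: str):
    """Split an mRNA into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the first AUG is the start codon and translation
    ends at the first in-frame stop codon, which is kept in the coding region.
    """
    seq = seq.replace(" ", "").upper().replace("T", "U")
    start = seq.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon from the start codon until a stop codon is reached.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3
            return seq[:start], seq[start:end], seq[end:]
    raise ValueError("no in-frame stop codon found")
```

Applied to a toy transcript such as "GGAAAUAAGAGCCACC" followed by "AUGGCUUAA" and "GCUGCC", the sketch returns those three segments as the 5' UTR, the coding region, and the 3' UTR, respectively.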

In various embodiments, the mRNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise at least one UTR, which may be selected from any of the UTR sequences listed in Table 19 or 20 of U.S. Patent No. 10,709,779, which is incorporated herein by reference.

5' UTR region

In various embodiments, the mRNA payload of the LNP-based gene editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise at least one 5' UTR.

A 5' UTR is the region of an mRNA that is directly upstream (5') of the start codon (the first codon of an mRNA transcript translated by a ribosome). A 5' UTR does not encode a protein (it is non-coding). Natural 5' UTRs bear features that play roles in translation initiation. They harbor signatures such as Kozak sequences, which are commonly known to be involved in the process by which the ribosome initiates translation of many genes. Kozak sequences have the consensus CCR(A/G)CCAUGG (SEQ ID NO: 19421), where R is a purine (adenine or guanine) three bases upstream of the start codon (AUG), which is followed by another "G". 5' UTRs are also known to form secondary structures that are involved in elongation factor binding. 5' UTR sequences are also known to be important for ribosome recruitment to the mRNA and have been reported to play a role in translation (Hinnebusch A et al. (2016) Science 352(6292):1413-6). In addition, a 5' UTR sequence may confer increased half-life, increased expression, and/or increased activity on a polypeptide encoded by the RNA payloads described herein.

In various embodiments, the RNA payload constructs contemplated herein may include 5' UTRs that are found in nature and those that are not. For example, the 5' UTR may be synthetic and/or may be altered in sequence relative to a naturally occurring 5' UTR. Such altered 5' UTRs may include one or more modifications relative to a naturally occurring 5' UTR, such as an insertion, a deletion, or an altered sequence, or the substitution of one or more nucleotide analogs for naturally occurring nucleotides.

The 5' UTR starts at the transcription start site and continues to the start codon but does not include the start codon, whereas the 3' UTR starts immediately following the stop codon and continues until the transcriptional termination signal. While not wishing to be bound by theory, UTRs may have a regulatory role in terms of translation and stability of the nucleic acid.

Natural 5' UTRs usually include features that have a role in translation initiation, as they tend to include Kozak sequences, which are commonly known to be involved in the process by which the ribosome initiates translation of many genes. Kozak sequences have the consensus CCR(A/G)CCAUGG (SEQ ID NO: 19421), where R is a purine (adenine or guanine) three bases upstream of the start codon (AUG), which is followed by another "G". 5' UTRs are also known to form secondary structures that are involved in elongation factor binding.

In one embodiment, the 5' UTR comprises a sequence provided in Table X, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a 5' UTR sequence provided in Table Y, or a variant or fragment thereof (e.g., a fragment lacking the first one, two, three, four, five, or six nucleotides of a 5' UTR sequence provided in Table Y). In one embodiment, the 5' UTR comprises a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to any of the sequences in Table Y.

Table Y - Exemplary nucleotide sequences of 5' UTRs
5' UTR nucleotide sequence | Sequence identifier
ggaaaucgca aaauuugcuc uucgcguuag auuucuuuua guuuucucgc aacuagcaag cuuuuuguuc ucgccgccgc c | SEQ ID NO: 19422
ggaaaucgca aaauuugcuc uucgcguuag auuucuuuua guuuucucgc aacuagcaag cuuuuuguuc ucgccgccgc c | SEQ ID NO: 19423
ggaaaucgca aaauuuucuu uucgcguuag auuucuuuua guuuucuuuc aacuagcaag cuuuuuguuc ucgccgccgc c | SEQ ID NO: 19424
ggaaaucgca aaauuuuugc ucuuuuucgc guuagauuuc uuuuaguuuu cuykcaacua gcaagcuuuu uguucucgcc rcc | SEQ ID NO: 19425
ggaaaucccc acaaccgccu cauauccagg cucaagaaua gagcucagug uuuuguuguu uaaucauucc gacguguuuu gcgauauucg cgcaaagcag ccagucgcgc gcuugcuuuu aaguagaguu guuuuuccac ccguuugcca ggcaucuuua auuuaacaua uuuuuauuuu ucaggcuaac cuacgccgcc acc | SEQ ID NO: 19426
ggaaauaaga gagaaaagaa gaguaagaag aaauauaaga ucucccugag cuucagggag ccccggcgcc gccacc | SEQ ID NO: 19427
ggaaaccccc cacccccgua agagagaaaa gaagaguaag aagaaauaua agaucucccu gagcuucagg gagccccggc gccgccacc | SEQ ID NO: 19428
ggagaacuuc cgcuuccguu ggcgcaagcg cuuucauuuu uucugcuacc gugacuaag | SEQ ID NO: 19429
ggaaauaaga gagaaaagaa gaguaagaag aaauauaaga gccacc | SEQ ID NO: 19430
ggaaauaaga gagaaaagaa gaguaagaag aaauauaaga ccccggcgcc gccacc | SEQ ID NO: 19431
ggaaacuuua uuuaguguua cuuuauuuuc uguuuauuug uguuucuuca guggguuugu ucuaauuucc uuggccgcc | SEQ ID NO: 19432
ggaaaaucug uauuagguug gcguguucuu uggucgguug uuaguauugu uguugauucg uuuguggucg guugccgcc | SEQ ID NO: 19433
ggaaaauuau uaacaucuug guauucucga uaaccauucg uuggauuuua uuguauucgu aguuuggguu ccugccgcc | SEQ ID NO: 19434
ggaaauuauu auuauuucua gcuacaauuu aucauuguau uauuuuagcu auucaucauu auuuacuugg ugaucaaca | SEQ ID NO: 19435
ggaaauaggu uguuaaccaa guucaagccu aauaagcuug gauucuggug acuugcuuca ccguuggcgg gcaccgauc | SEQ ID NO: 19436
ggaaaucgua gagagucgua cuuaguacau aucgacuauc gguggacacc aucaagauua uaaaccaggc caga | SEQ ID NO: 19437
ggaaacccgc ccaagcgacc ccaacauauc agcaguugcc caaucccaac ucccaacaca auccccaagc aacgccgcc | SEQ ID NO: 19438
ggaaagcgau ugaaggcguc uuuucaacua cucgauuaag guuggguauc gucgugggac uuggaaauuu guuguuucc | SEQ ID NO: 19439
ggaaacuaau cgaaauaaaa gagccccgua cucuuuuauu ucuauuaggu uaggagccuu agcauuugua ucuuaggua | SEQ ID NO: 19440
ggaaauguga uuuccagcaa cuucuuuuga auauauugaa uuccuaauuc aaagcgaaca aaucuacaag ccauauacc | SEQ ID NO: 19441
ggaaaucgua gagagucgua cuuacguggu cgccauugca uagcgcgcga aagcaacagg aacaagaacg cgcc | SEQ ID NO: 19442
ggaaaucgua gagagucgua cuuagaauaa acagagucgg gucgacuugu cucugauacu acgacgucac aauc | SEQ ID NO: 19443
ggaaaauuug ccuucggagu ugcguauccu gaacugccca gccuccugau auacaacugu uccgcuuauu cgggccgcc | SEQ ID NO: 19444
ggaaaucuga gcaggaaucc uuugugcauu gaagacuuua gauuccucuc ugcgguagac gugcacuuau aaguauuug | SEQ ID NO: 19445
ggaaagcgau ugaaggcguc uuuucaacua cucgauuaag guuggguauc gucgugggac uuggaaauuu guugccacc | SEQ ID NO: 19446
ggaaaauuuu agccuggaac guuagauaac uguccuguug ucuuuauaua cuuggucccc aaguaguuug ucuuccaaa | SEQ ID NO: 19447
ggaaauuuuu uuuugauauu auaagaguuu uuuuuugaua uuaagaaaau uuuuuuuuga uauuagaaga guaagaagaa auauaagacc ccggcgccgc cacc | SEQ ID NO: 19448
ggaaauaaga gagaaaagaa gaguaagaag aaauauaaga gccaaaaaaa aaaaacc | SEQ ID NO: 19449
ggaaaucucc cugagcuuca gggaguaaga gagaaaagaa gaguaagaag aaauauaaga ccccggcgcc gccacc | SEQ ID NO: 19450
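The sequence identity thresholds recited above can be illustrated with a deliberately simple calculation. The sketch below computes ungapped, position-by-position identity between two sequences of equal length; this is an assumption made for illustration, since identity is ordinarily determined from an alignment, and it compares IUPAC ambiguity codes (such as the y, k, and r in SEQ ID NO: 19425) literally. The function names are hypothetical; the two sequences are SEQ ID NO: 19422 and SEQ ID NO: 19424 from Table Y.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity (%) of two equal-length sequences."""
    a = a.replace(" ", "").upper()
    b = b.replace(" ", "").upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def lacking_first(seq: str, n: int) -> str:
    """Fragment lacking the first n nucleotides, with n from 1 to 6 as in the text."""
    if not 1 <= n <= 6:
        raise ValueError("n must be between 1 and 6")
    return seq.replace(" ", "")[n:]


SEQ_19422 = ("ggaaaucgca aaauuugcuc uucgcguuag auuucuuuua "
             "guuuucucgc aacuagcaag cuuuuuguuc ucgccgccgc c")
SEQ_19424 = ("ggaaaucgca aaauuuucuu uucgcguuag auuucuuuua "
             "guuuucuuuc aacuagcaag cuuuuuguuc ucgccgccgc c")

identity = percent_identity(SEQ_19422, SEQ_19424)
meets_95_percent = identity >= 95.0  # compare against one of the recited thresholds
```

A gapped alignment from an established alignment tool would be needed for sequences of different lengths; the ungapped comparison here is only meant to make the threshold language concrete.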

In some embodiments of the present disclosure, the 5' UTR is a heterologous UTR, i.e., a UTR found in nature in association with a different mRNA. In another embodiment, the 5' UTR is a synthetic UTR, i.e., one that does not occur in nature. Synthetic UTRs include UTRs that have been mutated to improve their properties (e.g., UTRs that increase gene expression) as well as UTRs that are completely synthetic. Exemplary 5' UTRs include those of Xenopus or human α-globin or β-globin (e.g., US8,278,063 and US9,012,219), human cytochrome b-245 polypeptide, hydroxysteroid (17b) dehydrogenase, and tobacco etch virus. The CMV immediate-early 1 (IE1) gene (see US20140206753 and WO2013/185069) and the sequence GGGAUCCUACC (SEQ ID NO: 19451) (WO2014144196) can also be used. In another embodiment, the 5' UTR is the 5' UTR of a TOP gene lacking the 5' TOP motif (the oligopyrimidine tract) (e.g., WO/2015101414, WO2015101415, WO/2015/062738, WO2015024667); a 5' UTR element derived from the ribosomal protein large 32 (L32) gene (WO/2015101414, WO2015101415, WO/2015/062738), a 5' UTR element derived from the 5' UTR of the hydroxysteroid (17-β) dehydrogenase 4 gene (HSD17B4) (WO2015024667), or a 5' UTR element derived from the 5' UTR of ATP5A1 (WO2015024667) can also be used. In one embodiment, an internal ribosome entry site (IRES) is used in place of a 5' UTR.

In some embodiments, the 5' UTR of the present disclosure comprises a sequence selected from SEQ ID NO: 19452 (GGGAAAUAAG AGAGAAAAGA AGAGUAAGAA GAAAUAUAAG AGCCACC) and SEQ ID NO: 19453 (GGGAAATAAG AGAGAAAAGA AGAGTAAGAA GAAATATAAG AGCCACC).

3' UTR region

In various embodiments, the mRNA payload of the LNP-based base editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise at least one 3' UTR. The 3' UTR may be heterologous or synthetic.

A 3' UTR is the region of an mRNA that is directly downstream (3') of the stop codon (the codon of an mRNA transcript that signals termination of translation). A 3' UTR does not encode a protein (it is non-coding). Natural or wild-type 3' UTRs are known to have stretches of adenosines and uridines embedded in them. These AU-rich signatures are particularly prevalent in genes with high rates of turnover. Based on their sequence features and functional properties, the AU-rich elements (AREs) can be separated into three classes (Chen et al., 1995). Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions; c-Myc and MyoD contain class I AREs. Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers; molecules containing this type of ARE include GM-CSF and TNF-α. Class III AREs are less well defined; these U-rich regions do not contain an AUUUA motif, and c-Jun and myogenin are two well-studied examples of this class. Most proteins that bind to AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA. HuR binds to AREs of all three classes. Engineering HuR-specific binding sites into the 3' UTR of a nucleic acid molecule will lead to HuR binding and thus stabilization of the message in vivo.
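The three ARE classes described above can be turned into a rough screening heuristic. The following sketch is an illustration under several assumptions that are not drawn from the present disclosure: a single AUUUA copy is enough to call class I, "overlapping" nonamers are two matches that start fewer than nine nucleotides apart, and a "U-rich region" without AUUUA is taken to be a run of at least five uridines. The function name and thresholds are hypothetical.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
PENTAMER = re.compile(r"(?=AUUUA)")
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")
U_RUN = re.compile(r"U{5,}")  # hypothetical threshold for a "U-rich" stretch


def classify_are(utr: str):
    """Return "I", "II", "III", or None for a 3' UTR (heuristic sketch only)."""
    utr = utr.replace(" ", "").upper().replace("T", "U")
    nonamer_starts = [m.start() for m in NONAMER.finditer(utr)]
    # Class II: two or more nonamers that overlap one another.
    if any(b - a < 9 for a, b in zip(nonamer_starts, nonamer_starts[1:])):
        return "II"
    # Class I: AUUUA motif copies without overlapping nonamers.
    if PENTAMER.search(utr):
        return "I"
    # Class III: U-rich stretch with no AUUUA motif.
    if U_RUN.search(utr):
        return "III"
    return None
```

Such a screen only flags candidate elements; whether a given element affects stability in a particular transcript would have to be determined experimentally.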

The introduction, removal, or modification of 3' UTR AU-rich elements (AREs) can be used to modulate the stability of the mRNA payloads described herein. For example, one or more copies of an ARE can be introduced to make the mRNA less stable and thereby curtail translation and decrease production of the resultant protein. Alternatively, AREs can be identified and removed or mutated to increase intracellular stability and thus increase translation and production of the resultant protein.
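Mutation of AREs as described above can be sketched as a sequence edit. The example below replaces every AUUUA pentamer in a 3' UTR with a substitute of the same length; the substitute AUCUA is a hypothetical choice made purely for illustration and is not stated in the present disclosure to abolish ARE function. The function name is likewise hypothetical.

```python
def disrupt_auuua(utr: str, substitute: str = "AUCUA") -> str:
    """Replace every AUUUA pentamer, including overlapping copies, with the substitute.

    The default substitute is an illustrative assumption, not a validated mutation.
    """
    if len(substitute) != 5 or "AUUUA" in substitute:
        raise ValueError("substitute must be 5 nt long and must not contain AUUUA")
    utr = utr.replace(" ", "").upper().replace("T", "U")
    # Repeat until no copy remains, because replacing one copy of an
    # overlapping pair (AUUUAUUUA) leaves the second copy intact.
    while "AUUUA" in utr:
        utr = utr.replace("AUUUA", substitute)
    return utr
```

The edit preserves the length of the UTR, so downstream coordinates (for example, the position of the poly A tail) are unchanged.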

In some embodiments, introducing features that are typically expressed in genes of a target organ can enhance the stability of the mRNA and protein production in a specific organ and/or tissue. As a non-limiting example, the feature can be a UTR. As another example, the feature can be an intron or a portion of an intron sequence.

Those of ordinary skill in the art will understand that a heterologous or synthetic 5' UTR can be used with any desired 3' UTR sequence. For example, a heterologous 5' UTR can be used with a synthetic 3' UTR having a heterologous 3' UTR.

Non-UTR sequences can also be used as regions or subregions within an RNA payload construct. For example, introns or portions of intron sequences can be incorporated into regions of the nucleic acids of the present disclosure. Incorporation of intronic sequences can increase protein production as well as nucleic acid levels.

Combinations of features can be included in flanking regions and can be contained within other features. For example, the coding region of the polypeptide of interest in the mRNA payload can be flanked by a 5' UTR, which may contain a strong Kozak translational initiation signal, and/or a 3' UTR, which may include an oligo(dT) sequence for templated addition of a poly A tail. The 5' UTR may comprise a first polynucleotide fragment and a second polynucleotide fragment from the same and/or different genes, such as the 5' UTRs described in U.S. Patent Application Publication No. 20100293625 and PCT/US2014/069155, each of which is incorporated herein by reference in its entirety.

It should be understood that any UTR from any gene can be incorporated into the regions of an RNA payload molecule (e.g., a linear mRNA). Furthermore, multiple wild-type UTRs of any known gene can be utilized. It is also within the scope of the present disclosure to provide artificial UTRs that are not variants of wild-type regions. These UTRs or portions thereof can be placed in the same orientation as in the transcript from which they were selected, or can be altered in orientation or location. Hence, a 5' or 3' UTR can be inverted, shortened, lengthened, or made with one or more other 5' UTRs or 3' UTRs. As used herein, the term "altered" as it relates to a UTR sequence means that the UTR has been changed in some way relative to a reference sequence. For example, a 3' UTR or 5' UTR can be altered relative to a wild-type or native UTR by the change in orientation or location taught above, or can be altered by the inclusion of additional nucleotides, deletion of nucleotides, or swapping or transposition of nucleotides. Any of these changes producing an "altered" UTR (whether 3' or 5') comprises a variant UTR.

In some embodiments, a double, triple, or quadruple UTR, such as a 5' UTR or 3' UTR, can be used. As used herein, a "double" UTR is one in which two copies of the same UTR are encoded either in series or substantially in series. For example, a double β-globin 3' UTR can be used as described in U.S. Patent Publication No. 20100129877, the contents of which are incorporated herein by reference in their entirety.

具有圖案化UTR亦在本揭示案之範圍內。如本文所用,「圖案化UTR」係反映重複或交替模式之彼等UTR,諸如重複一次、兩次或超過3次之ABABAB或AABBAABBAABB或ABCABCABC或其變異體。在此等模式中,每個字母A、B或C表示核苷酸層面上之不同UTR。It is also within the scope of the present disclosure to have patterned UTRs. As used herein, "patterned UTRs" are those UTRs that reflect a repeating or alternating pattern, such as ABABAB or AABBAABBAABB or ABCABCABC or variants thereof repeated once, twice, or more than three times. In these patterns, each letter A, B, or C represents a different UTR at the nucleotide level.
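
The letter patterns described above can be illustrated with a minimal Python sketch. It assumes each letter is simply a key for one UTR sequence and that a patterned UTR is the direct concatenation of those sequences in pattern order; the function name `build_patterned_utr` and the placeholder sequences are hypothetical and are not taken from the disclosure.

```python
def build_patterned_utr(pattern: str, utrs: dict[str, str]) -> str:
    """Concatenate UTR sequences in the order given by a letter pattern.

    pattern -- e.g. "ABABAB", "AABBAABBAABB" or "ABCABCABC"
    utrs    -- maps each pattern letter to one UTR sequence
    """
    missing = sorted(set(pattern) - set(utrs))
    if missing:
        raise KeyError(f"no UTR sequence supplied for: {', '.join(missing)}")
    return "".join(utrs[letter] for letter in pattern)


# Hypothetical placeholder sequences, used only to show the mechanics.
example_utrs = {"A": "GGGAAA", "B": "CCCUUU", "C": "AUAUAU"}

alternating = build_patterned_utr("ABABAB", example_utrs)
# A "double" UTR (two tandem copies of the same UTR) is the pattern "AA".
double_utr = build_patterned_utr("AA", example_utrs)
```

Under these assumptions, a "double", "triple" or "quadruple" UTR is just the special case in which the pattern repeats a single letter two, three or four times.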

在一些實施例中,側接區域選自其蛋白質共享共同功能、結構、特徵或特性之轉錄本家族。例如,相關多肽可屬於在特定細胞、組織中或在發育期間之某個時間表現之蛋白質家族。來自此等基因中之任一者之UTR可與相同或不同蛋白質家族之任何其他UTR交換,以產生新的多核苷酸。如本文所用,「蛋白質家族」在最廣泛意義上使用,係指共享至少一種功能、結構、特徵、定位、起源或表現模式之一組兩種或兩種以上相關多肽。In some embodiments, the flanking regions are selected from a family of transcripts whose proteins share a common function, structure, characteristic, or property. For example, related polypeptides may belong to a family of proteins that are expressed in a particular cell, tissue, or at a certain time during development. A UTR from any of these genes may be exchanged with any other UTR of the same or different protein family to generate a new polynucleotide. As used herein, "protein family" is used in the broadest sense to refer to a group of two or more related polypeptides that share at least one function, structure, characteristic, location, origin, or expression pattern.

非轉譯區亦可包括轉譯增強子元件(TEE)。作為非限制性實例,TEE可包括美國申請案第20090226470號(以引用之方式整體併入本文中)中所述之彼等TEE,以及此項技術中已知之彼等TEE。The untranslated region may also include translation enhancer elements (TEEs). As a non-limiting example, the TEEs may include those described in U.S. Application No. 20090226470 (incorporated herein by reference in its entirety), as well as those known in the art.

5' 加帽 5' Capping

在各個實施例中,本文所述之基於LNP之鹼基編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷可包含5’帽結構。In various embodiments, the mRNA payload of the LNP-based base editing system, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise a 5' cap structure.

mRNA之5'帽結構參與核輸出、增加聚核苷酸穩定性且結合mRNA帽結合蛋白(CBP),該CBP經由CBP與聚(A)結合蛋白締合形成成熟環狀mRNA物質,負責細胞中之mRNA穩定性及轉譯勝任性。該帽進一步有助於在mRNA剪接期間移除5'近端內含子。The 5' cap structure of an mRNA is involved in nuclear export, increases polynucleotide stability, and binds the mRNA cap binding protein (CBP), which is responsible for mRNA stability and translation competence in the cell through the association of CBP with poly(A) binding protein to form the mature circular mRNA species. The cap further assists the removal of 5'-proximal introns during mRNA splicing.

內源性mRNA分子可為5'端加帽的,從在mRNA分子之末端鳥苷帽殘基與5'末端轉錄之有義核苷酸之間生成5'-ppp-5'-三磷酸鍵聯。接著可使此5'-鳥苷酸帽甲基化以生成N7-甲基-鳥苷酸殘基。mRNA 5'端之末端及/或前末端轉錄的核苷酸之核糖亦可視情況經2'-O-甲基化。經由鳥苷酸帽結構之水解及裂解實現的5'-去帽可靶向核酸分子,諸如mRNA分子,以進行降解。Endogenous mRNA molecules may be 5'-end capped, generating a 5'-ppp-5'-triphosphate linkage between a terminal guanosine cap residue and the 5'-terminal transcribed sense nucleotide of the mRNA molecule. This 5'-guanylate cap may then be methylated to generate an N7-methyl-guanylate residue. The ribose sugars of the terminal and/or ante-terminal transcribed nucleotides at the 5' end of the mRNA may optionally also be 2'-O-methylated. 5'-decapping through hydrolysis and cleavage of the guanylate cap structure may target a nucleic acid molecule, such as an mRNA molecule, for degradation.

對mRNA之修飾可生成不可水解之帽結構,從而防止去帽且因此增加mRNA半衰期。因為帽結構水解需要5'-ppp-5'磷酸二酯鍵聯之裂解,故可在加帽反應期間使用經修飾核苷酸。例如,來自New England Biolabs (Ipswich, MA)之牛痘加帽酶可根據製造商之說明書與α-硫代-鳥苷核苷酸一起使用以在5'-ppp-5'帽中產生硫代磷酸酯鍵聯。Modification of mRNA can generate a non-hydrolyzable cap structure, thereby preventing decapping and thus increasing the half-life of the mRNA. Because hydrolysis of the cap structure requires cleavage of the 5'-ppp-5' phosphodiester linkage, modified nucleotides can be used during the capping reaction. For example, vaccinia capping enzyme from New England Biolabs (Ipswich, MA) can be used with α-thio-guanosine nucleotides according to the manufacturer's instructions to generate phosphorothioate linkages in the 5'-ppp-5' cap.

可使用額外的經修飾之鳥苷核苷酸,諸如α-甲基-膦酸及硒代-磷酸核苷酸。Additional modified guanosine nucleotides may be used, such as α-methyl-phosphonate and seleno-phosphate nucleotides.

額外修飾包括但不限於mRNA (如上文所提及)之5'末端及/或5'前末端核苷酸的核糖在糖環之2'-羥基上的2'-O-甲基化。可使用多種不同的5'-帽結構來生成核酸分子(諸如mRNA分子)之5'-帽。Additional modifications include, but are not limited to, 2'-O-methylation of the ribose of the 5'-terminal and/or 5'-ante-terminal nucleotide of the mRNA (as mentioned above) on the 2'-hydroxyl group of the sugar ring. A variety of different 5'-cap structures can be used to generate the 5'-cap of a nucleic acid molecule (such as an mRNA molecule).

帽類似物在本文中亦稱為合成帽類似物、化學帽、化學帽類似物或結構或功能性帽類似物,其化學結構不同於天然(亦即,內源性、野生型或生理性) 5'-帽,同時保留帽功能。帽類似物可為以化學方式(亦即,非酶促)或酶促合成的,及/或連接至核酸分子。Cap analogs are also referred to herein as synthetic cap analogs, chemical caps, chemical cap analogs, or structural or functional cap analogs, whose chemical structure is different from the natural (i.e., endogenous, wild-type or physiological) 5'-cap while retaining the cap function. Cap analogs can be chemically (i.e., non-enzymatically) or enzymatically synthesized and/or attached to a nucleic acid molecule.

例如,抗反向帽類似物(ARCA)帽含有藉由5'-5'-三磷酸酯基連接之兩個鳥嘌呤,其中一個鳥嘌呤含有N7-甲基以及3'-O-甲基(亦即,N7,3'-O-二甲基-鳥苷-5'-三磷酸-5'-鳥苷,m7G-3'mppp-G,其可等效地稱為3'-O-Me-m7G(5')ppp(5')G)。另一未經修飾之鳥嘌呤的3'-O原子連接至加帽核酸分子(例如mRNA)之5'末端核苷酸。N7-及3'-O-甲基化鳥嘌呤提供加帽核酸分子(例如mRNA)之末端部分。For example, the anti-reverse cap analog (ARCA) cap contains two guanines linked by a 5'-5'-triphosphate group, one of which contains an N7-methyl group as well as a 3'-O-methyl group (i.e., N7,3'-O-dimethyl-guanosine-5'-triphosphate-5'-guanosine, m7G-3'mppp-G, which can equivalently be designated 3'-O-Me-m7G(5')ppp(5')G). The 3'-O atom of the other, unmodified guanine is linked to the 5'-terminal nucleotide of the capped nucleic acid molecule (e.g., mRNA). The N7- and 3'-O-methylated guanine provides the terminal portion of the capped nucleic acid molecule (e.g., mRNA).

另一例示性帽為mCAP,其類似於ARCA,但在鳥苷上具有2'-O-甲基(亦即,N7,2'-O-二甲基-鳥苷-5'-三磷酸-5'-鳥苷、m7Gm-ppp-G)。Another exemplary cap is mCAP, which is similar to ARCA but has a 2'-O-methyl group on guanosine (i.e., N7,2'-O-dimethyl-guanosine-5'-triphosphate-5'-guanosine, m7Gm-ppp-G).

雖然帽類似物允許在活體外轉錄反應中同時對核酸分子進行加帽,但高達20%之轉錄本可保持不加帽。這以及帽類似物與由內源性細胞轉錄機器產生之核酸的內源性5'-帽結構之結構差異可導致轉譯勝任性降低及細胞穩定性降低。Although cap analogs allow concomitant capping of a nucleic acid molecule in an in vitro transcription reaction, up to 20% of transcripts may remain uncapped. This, together with the structural differences between a cap analog and the endogenous 5'-cap structures of nucleic acids produced by the endogenous cellular transcription machinery, can lead to reduced translational competence and reduced cellular stability.

mRNA亦可使用酶進行轉錄後加帽,以便生成更真實5'-帽結構。如本文所用,措辭「更真實」係指在結構上或功能上密切反映或模擬內源或野生型特徵之特徵。亦即,「更真實」特徵如與先前技術之合成特徵或類似物等相比,更能代表內源性、野生型、天然或生理細胞功能及/或結構,或其在一或多個方面勝過相應的內源性、野生型、天然或生理特徵。更真實5'帽結構之非限制性實例為如下彼等結構,如與此項技術中已知之合成5'帽結構(或野生型、天然或生理5'帽結構)相比,該等結構尤其具有增強的帽結合蛋白之結合、增加之半衰期、降低的對5'核酸內切酶之易感性及/或減少之5'去帽。例如,重組牛痘病毒加帽酶及重組2'-O-甲基轉移酶可在mRNA之5'末端核苷酸與鳥嘌呤帽核苷酸之間產生規範5'-5'-三磷酸鍵聯,其中該帽鳥嘌呤含有N7甲基化且mRNA之5'末端核苷酸含有2'-O-甲基。此類結構稱為Cap1結構。如與例如此項技術中已知之其他5'帽類似物結構相比,此帽導致較高之轉譯勝任性及細胞穩定性以及減少之細胞促發炎細胞介素活化。帽結構包括但不限於7mG(5')ppp(5')N1pN2p (帽0)、7mG(5')ppp(5')N1mpNp (帽1)及7mG(5')-ppp(5')N1mpN2mp (帽2)。mRNA may also be capped post-transcriptionally using enzymes in order to generate more authentic 5'-cap structures. As used herein, the phrase "more authentic" refers to a feature that closely mirrors or mimics, either structurally or functionally, an endogenous or wild-type feature. That is, a "more authentic" feature is more representative of endogenous, wild-type, natural or physiological cellular function and/or structure than synthetic features or analogs of the prior art, or it outperforms the corresponding endogenous, wild-type, natural or physiological feature in one or more respects. Non-limiting examples of more authentic 5' cap structures are those that, among other things, have enhanced binding of cap-binding proteins, increased half-life, reduced susceptibility to 5' endonucleases and/or reduced 5' decapping, as compared to synthetic 5' cap structures known in the art (or to wild-type, natural or physiological 5' cap structures). For example, recombinant vaccinia virus capping enzyme and recombinant 2'-O-methyltransferase can create a canonical 5'-5'-triphosphate linkage between the 5'-terminal nucleotide of an mRNA and a guanine cap nucleotide, wherein the cap guanine contains an N7 methylation and the 5'-terminal nucleotide of the mRNA contains a 2'-O-methyl group. Such a structure is termed the Cap1 structure. This cap results in higher translational competence and cellular stability and reduced activation of cellular pro-inflammatory cytokines, as compared, for example, to other 5' cap analog structures known in the art. Cap structures include, but are not limited to, 7mG(5')ppp(5')N1pN2p (cap 0), 7mG(5')ppp(5')N1mpNp (cap 1), and 7mG(5')-ppp(5')N1mpN2mp (cap 2).

在一些實施例中,5'末端帽可包括內源性帽或帽類似物。In some embodiments, the 5' terminal cap may include an endogenous cap or a cap analog.

在一些實施例中,5'末端帽可包含鳥嘌呤類似物。可用之鳥嘌呤類似物包括但不限於肌苷、N1-甲基-鳥苷、2'氟-鳥苷、7-去氮雜-鳥苷、8-側氧基-鳥苷、2-胺基-鳥苷、LNA-鳥苷及2-疊氮基-鳥苷。In some embodiments, the 5' terminal cap may comprise a guanine analog. Useful guanine analogs include, but are not limited to, inosine, N1-methyl-guanosine, 2'-fluoro-guanosine, 7-deaza-guanosine, 8-oxo-guanosine, 2-amino-guanosine, LNA-guanosine, and 2-azido-guanosine.

IRES 序列 IRES Sequences

在各個實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷可包含一或多個IRES序列。In various embodiments, the mRNA payload of the LNP-based gene editing system, RNA therapeutic agent, and pharmaceutical composition thereof described herein may comprise one or more IRES sequences.

在一些實施例中,mRNA可含有內部核糖體進入位點(IRES)。IRES最初經鑑定為微小RNA病毒RNA之一種特徵,在5'帽結構不存在之情況下,在起始蛋白質合成中發揮重要作用。IRES可充當唯一的核糖體結合位點,或可充當mRNA之多個核糖體結合位點之一。含有超過一個功能性核糖體結合位點之mRNA可編碼由核糖體獨立轉譯之數種肽或多肽。可使用之IRES序列的非限制性實例包括但不限於來自微小RNA病毒(例如,FMDV)、瘟病毒(CFFV)、脊髓灰白質炎病毒(PV)、腦心肌炎病毒(ECMV)、口蹄疫病毒(FMDV)、C型肝炎病毒(HCV)、經典豬瘟病毒(CSFV)、鼠科動物白血病病毒(MLV)、猿免疫缺乏病毒(SIV)或蟋蟀麻痺病毒(CrPV)之彼等IRES序列。In some embodiments, the mRNA may contain an internal ribosome entry site (IRES). IRES was originally identified as a feature of picornavirus RNAs and plays an important role in initiating protein synthesis in the absence of a 5' cap structure. IRES can serve as the only ribosome binding site, or can serve as one of multiple ribosome binding sites for an mRNA. An mRNA containing more than one functional ribosome binding site can encode several peptides or polypeptides that are independently translated by the ribosome. Non-limiting examples of IRES sequences that can be used include, but are not limited to, those from picornaviruses (e.g., FMDV), pestiviruses (CFFV), polioviruses (PV), encephalomyocarditis virus (ECMV), foot-and-mouth disease virus (FMDV), hepatitis C virus (HCV), classical swine fever virus (CSFV), murine leukemia virus (MLV), simian immunodeficiency virus (SIV), or cricket paralysis virus (CrPV).

在一些實施例中,IRES來自Taura症候群病毒、錐蝽病毒、泰勒氏腦脊髓炎病毒、猿病毒40、紅火蟻病毒1、稻麥蚜病毒、網狀內皮增生病毒、人類脊髓灰白質炎病毒1、珀椿腸病毒、喀什米爾蜜蜂病毒、人類鼻病毒2、草翅葉蟬病毒-1、人類免疫缺乏病毒1型、草翅葉蟬病毒-1、Himetobi P病毒、C型肝炎病毒、A型肝炎病毒、GB型肝炎病毒、口蹄疫病毒、人類腸病毒71、馬鼻炎病毒、茶尺蠖微小RNA病毒樣病毒、腦心肌炎病毒、果蠅C病毒、人類柯薩奇病毒B3、十字花科煙草花葉病毒、蟋蟀麻痺病毒、牛病毒性腹瀉病毒1、黑皇后細胞病毒、蚜蟲致死性麻痺病毒、禽腦脊髓炎病毒、急性蜜蜂麻痺病毒、木槿褪綠環斑病毒、經典豬瘟病毒、人類FGF2、人類SFTPA1、人類AML1/RUNX1、果蠅觸角足、人類AQP4、人類AT1R、人類BAG-1、人類BCL2、人類BiP、人類c-IAP1、人類c-myc、人類eIF4G、小鼠NDST4L、人類LEF1、小鼠HIF1α、人類n.myc、小鼠Gtx、人類p27kip1、人類PDGF2/c-sis、人類p53、人類Pim-1、小鼠Rbm3、果蠅reaper、犬Scamper、果蠅Ubx、人類UNR、小鼠UtrA、人類VEGF-A、人類XIAP、果蠅hairless、釀酒酵母TFIID、釀酒酵母YAP1、煙草蝕刻病毒、蕪菁皺縮病毒、EMCV-A、EMCV-B、EMCV-Bf、EMCV-Cf、EMCV pEC9、微小雙節RNA病毒、HCV QC64、人類科薩病毒E/D、人類科薩病毒F、人類科薩病毒JMY、鼻病毒NAT001、HRV14、HRV89、HRVC-02、HRV-A21、薩比亞病毒A SH1、薩比亞病毒FHB、薩比亞病毒NG-J1、人類副腸孤病毒1、Crohivirus B、Yc-3、Rosavirus M-7、Shanbavirus A、Pasivirus A、Pasivirus A 2、埃可病毒E14、人類副腸孤病毒5、愛知病毒、A型肝炎病毒HA16、Phopivirus、CVA10、腸病毒C、腸病毒D、腸病毒J、人類Pegivirus 2、GBV-C GT110、GBV-C K1737、GBV-C Iowa、Pegivirus A 1220、Pasivirus A 3、薩佩羅病毒、Rosavirus B、Bakunsa病毒、Tremovirus A、豬Pasivirus 1、PLV-CHN、Pasivirus A、Sicinivirus、Hepacivirus K、Hepacivirus A、BVDV1、邊界病病毒、BVDV2、CSFV-PK15C、SF573 Dicistrovirus、湖北微小RNA病毒樣病毒、CRPV、唾液病毒A BNS、唾液病毒A BN2、唾液病毒A 02394、唾液病毒A GUT、唾液病毒A CH、唾液病毒A SZ1、唾液病毒FHB、CVB3、CVB1、埃可病毒7、CVB5、EVA71、CVA3、CVA12、EV24或eIF4G適體。In some embodiments, the IRES is from Taura syndrome virus, Triatoma virus, Theiler's encephalomyelitis virus, simian virus 40, Solenopsis invicta virus 1, Rhopalosiphum padi virus, reticuloendotheliosis virus, human poliovirus 1, Plautia stali intestine virus, Kashmir bee virus, human rhinovirus 2, Homalodisca coagulata virus-1, human immunodeficiency virus type 1, Homalodisca coagulata virus-1, Himetobi P virus, hepatitis C virus, hepatitis A virus, hepatitis GB virus, foot-and-mouth disease virus, human enterovirus 71, equine rhinitis virus, Ectropis obliqua picorna-like virus, encephalomyocarditis virus, Drosophila C virus, human coxsackievirus B3, crucifer tobamovirus, cricket paralysis virus, bovine viral diarrhea virus 1, black queen cell virus, aphid lethal paralysis virus, avian encephalomyelitis virus, acute bee paralysis virus, hibiscus chlorotic ringspot virus, classical swine fever virus, human FGF2, human SFTPA1, human AML1/RUNX1, Drosophila Antennapedia, human AQP4, human AT1R, human BAG-1, human BCL2, human BiP, human c-IAP1, human c-myc, human eIF4G, mouse NDST4L, human LEF1, mouse HIF1α, human n.myc, mouse Gtx, human p27kip1, human PDGF2/c-sis, human p53, human Pim-1, mouse Rbm3, Drosophila reaper, canine Scamper, Drosophila Ubx, human UNR, mouse UtrA, human VEGF-A, human XIAP, Drosophila hairless, S. cerevisiae TFIID, S. cerevisiae YAP1, tobacco etch virus, turnip crinkle virus, EMCV-A, EMCV-B, EMCV-Bf, EMCV-Cf, EMCV pEC9, Picobirnavirus, HCV QC64, human Cosavirus E/D, human Cosavirus F, human Cosavirus JMY, rhinovirus NAT001, HRV14, HRV89, HRVC-02, HRV-A21, Salivirus A SH1, Salivirus FHB, Salivirus NG-J1, human parechovirus 1, Crohivirus B, Yc-3, Rosavirus M-7, Shanbavirus A, Pasivirus A, Pasivirus A 2, echovirus E14, human parechovirus 5, Aichi virus, hepatitis A virus HA16, Phopivirus, CVA10, enterovirus C, enterovirus D, enterovirus J, human Pegivirus 2, GBV-C GT110, GBV-C K1737, GBV-C Iowa, Pegivirus A 1220, Pasivirus A 3, Sapelovirus, Rosavirus B, Bakunsa virus, Tremovirus A, swine Pasivirus 1, PLV-CHN, Pasivirus A, Sicinivirus, Hepacivirus K, Hepacivirus A, BVDV1, border disease virus, BVDV2, CSFV-PK15C, SF573 Dicistrovirus, Hubei picorna-like virus, CRPV, Salivirus A BNS, Salivirus A BN2, Salivirus A 02394, Salivirus A GUT, Salivirus A CH, Salivirus A SZ1, Salivirus FHB, CVB3, CVB1, echovirus 7, CVB5, EVA71, CVA3, CVA12, EV24, or an aptamer to eIF4G.

聚A尾及3’穩定區 Poly A Tail and 3' Stabilizing Region

在各個實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷可包含聚A尾。In various embodiments, the mRNA payload of the LNP-based gene editing system, RNA therapeutic agent, and pharmaceutical composition thereof described herein may include a poly A tail.

在RNA加工期間,可將長鏈腺嘌呤核苷酸(聚A尾)添加至多核苷酸(諸如mRNA分子)中以增加穩定性。可在轉錄後立即使轉錄本之3'端裂解以釋放3'羥基。接著,聚A聚合酶將腺嘌呤核苷酸鏈添加至游離3'羥基端。該過程稱為多腺苷酸化,添加一定長度之聚A尾。During RNA processing, long chains of adenine nucleotides (poly A tails) can be added to polynucleotides (such as mRNA molecules) to increase stability. Immediately after transcription, the 3' end of the transcript can be cleaved to release the 3' hydroxyl group. Poly A polymerase then adds chains of adenine nucleotides to the free 3' hydroxyl end. This process is called polyadenylation, and a certain length of poly A tail is added.

在一些實施例中,聚A尾之長度大於30個核苷酸長。在另一實施例中,聚A尾大於35個核苷酸長(例如,至少或大於約35、40、45、50、55、60、70、80、90、100、120、140、160、180、200、250、300、350、400、450、500、600、700、800、900、1,000、1,100、1,200、1,300、1,400、1,500、1,600、1,700、1,800、1,900、2,000、2,500及3,000個核苷酸)且不超過約50、100、200、300、400、500、600、700、800、900、1000、2000或3000個核苷酸長。在一些實施例中,mRNA包括約30至約3,000個核苷酸(例如,30至50、30至100、30至250、30至500、30至750、30至1,000、30至1,500、30至2,000、30至2,500、50至100、50至250、50至500、50至750、50至1,000、50至1,500、50至2,000、50至2,500、50至3,000、100至500、100至750、100至1,000、100至1,500、100至2,000、100至2,500、100至3,000、500至750、500至1,000、500至1,500、500至2,000、500至2,500、500至3,000、1,000至1,500、1,000至2,000、1,000至2,500、1,000至3,000、1,500至2,000、1,500至2,500、1,500至3,000、2,000至3,000、2,000至2,500及2,500至3,000個)之聚A尾。In some embodiments, the poly A tail is greater than 30 nucleotides in length. In another embodiment, the poly A tail is greater than 35 nucleotides in length (e.g., at least or greater than about 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,500, and 3,000 nucleotides) and no more than about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, or 3000 nucleotides in length. In some embodiments, the mRNA includes a poly A tail of about 30 to about 3,000 nucleotides (e.g., 30 to 50, 30 to 100, 30 to 250, 30 to 500, 30 to 750, 30 to 1,000, 30 to 1,500, 30 to 2,000, 30 to 2,500, 50 to 100, 50 to 250, 50 to 500, 50 to 750, 50 to 1,000, 50 to 1,500, 50 to 2,000, 50 to 2,500, 50 to 3,000, 100 to 500, 100 to 750, 100 to 1,000, 100 to 1,500, 100 to 2,000, 100 to 2,500, 100 to 3,000, 500 to 750, 500 to 1,000, 500 to 1,500, 500 to 2,000, 500 to 2,500, 500 to 3,000, 1,000 to 1,500, 1,000 to 2,000, 1,000 to 2,500, 1,000 to 3,000, 1,500 to 2,000, 1,500 to 2,500, 1,500 to 3,000, 2,000 to 3,000, 2,000 to 2,500, and 2,500 to 3,000 nucleotides).

在一些實施例中,聚A尾相對於總mRNA之長度經設計。此設計可基於編碼相關標靶之區域的長度、特定特徵或區域(諸如側接區域)之長度或基於自mRNA表現之最終產物的長度。In some embodiments, the poly A tail is designed relative to the length of the overall mRNA. This design can be based on the length of the region encoding the target of interest, on the length of a particular feature or region (such as a flanking region), or on the length of the final product expressed from the mRNA.

在此情況下,聚A尾之長度可比mRNA或其特徵大10、20、30、40、50、60、70、80、90或100%。聚A尾亦可經設計為其所屬mRNA之一部分。在此情況下,聚A尾可為構築體之總長度或構築體之總長度減去聚A尾之10、20、30、40、50、60、70、80或90%或更高。此外,針對聚A結合蛋白之經工程改造之結合位點及mRNA結合可增強表現。In this case, the poly A tail can be 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% greater in length than the mRNA or a feature thereof. The poly A tail can also be designed as a fraction of the mRNA to which it belongs. In this case, the poly A tail can be 10, 20, 30, 40, 50, 60, 70, 80 or 90% or more of the total length of the construct, or of the total length of the construct minus the poly A tail. In addition, engineered binding sites for, and conjugation of the mRNA to, poly A binding protein can enhance expression.
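
The length relationships described above reduce to simple arithmetic, sketched below in Python. The sketch is illustrative only: the function names are hypothetical, lengths are counted in nucleotides, and each function implements one possible reading of the corresponding sentence (tail longer than a reference by a percentage; tail as a percentage of the total construct including the tail; tail as a percentage of the construct minus the tail).

```python
def tail_len_percent_greater(reference_len: int, percent: int) -> int:
    """Tail that is `percent` % longer than a reference (mRNA or a feature)."""
    return round(reference_len * (100 + percent) / 100)


def tail_len_as_percent_of_total(body_len: int, percent: int) -> int:
    """Tail such that tail / (body + tail) equals `percent` %.

    Solving t = p * (body + t) for t gives t = body * p / (1 - p).
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    return round(body_len * percent / (100 - percent))


def tail_len_as_percent_of_body(body_len: int, percent: int) -> int:
    """Tail equal to `percent` % of the construct length minus the tail."""
    return round(body_len * percent / 100)


# Example: a hypothetical 900-nt construct body (everything except the tail).
print(tail_len_as_percent_of_total(900, 10))  # tail is 10% of body + tail
print(tail_len_as_percent_of_body(900, 10))   # tail is 10% of the body alone
```

The two "fraction" readings differ only in the denominator, which is why the same percentage yields a slightly longer tail when it is expressed relative to the total construct.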

另外,可使用聚A尾之3 '末端之經修飾核苷酸藉由3'端將多種不同mRNA一起連接至PABP (聚A結合蛋白)。轉染實驗可在相關細胞株中進行,且可在轉染後12 h、24 h、48 h、72 h及第7天藉由ELISA分析蛋白質產生。Alternatively, multiple different mRNAs can be linked together to PABP (poly A binding protein) via the 3' end using modified nucleotides at the 3' end of the poly A tail. Transfection experiments can be performed in relevant cell lines and protein production can be analyzed by ELISA at 12 h, 24 h, 48 h, 72 h and 7 days after transfection.

在一些實施例中,mRNA經設計以包括聚A-G四分體。G-四分體為四個鳥嘌呤核苷酸之環狀氫鍵結陣列,其可由DNA及RNA中之富G序列形成。在此實施例中,G-四分體併入聚A尾之末端。In some embodiments, the mRNA is designed to include a polyA-G quartet. A G-quartet is a cyclic hydrogen-bonded array of four guanine nucleotides that can be formed by G-rich sequences in both DNA and RNA. In this embodiment, the G-quartet is incorporated at the end of the poly A tail.

終止密碼子 Stop Codons

在各個實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷可包含一或多種轉譯終止密碼子。轉譯終止密碼子UAA、UAG及UGA為遺傳密碼之重要組分且發出mRNA轉譯終止之信號。在蛋白質合成期間,終止密碼子與蛋白質釋放因子相互作用且此相互作用可調節核糖體活性,由此影響轉譯(Tate WP等人, (2018) Biochem Soc Trans, 46(6):1615-162)。In various embodiments, the mRNA payload of the LNP-based gene editing system, RNA therapeutic agent, and pharmaceutical composition thereof described herein may include one or more translational termination codons. The translational termination codons UAA, UAG, and UGA are important components of the genetic code and signal the termination of mRNA translation. During protein synthesis, the termination codon interacts with the protein release factor and this interaction can regulate ribosome activity, thereby affecting translation (Tate WP et al., (2018) Biochem Soc Trans, 46(6):1615-162).

如本文所用,終止元件係指包含終止密碼子之核酸序列。在DNA之情況下,終止密碼子可選自TGA、TAA及TAG,或在RNA之情況下,可選自UGA、UAA及UAG。在一實施例中,終止元件包含兩個連續終止密碼子。在一實施例中,終止元件包含三個連續終止密碼子。在一實施例中,終止元件包含四個連續終止密碼子。在一實施例中,終止元件包含五個連續終止密碼子。As used herein, a termination element refers to a nucleic acid sequence comprising a stop codon. In the case of DNA, the stop codon may be selected from TGA, TAA and TAG, or in the case of RNA, may be selected from UGA, UAA and UAG. In one embodiment, the termination element comprises two consecutive stop codons. In one embodiment, the termination element comprises three consecutive stop codons. In one embodiment, the termination element comprises four consecutive stop codons. In one embodiment, the termination element comprises five consecutive stop codons.
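
As an illustration of the definition above, the following Python sketch counts how many consecutive stop codons open a termination element, accepting either DNA (T) or RNA (U) spelling. It is a minimal sketch, not part of the disclosure; the function names are hypothetical, and the example sequences are the RNA stop codons named above and the first entry of Table Z.

```python
RNA_STOP_CODONS = {"UAA", "UAG", "UGA"}


def to_rna(sequence: str) -> str:
    """Normalise a DNA or RNA string to upper-case RNA (T -> U)."""
    return sequence.upper().replace("T", "U")


def count_leading_stop_codons(element: str) -> int:
    """Number of consecutive in-frame stop codons at the start of `element`."""
    rna = to_rna(element)
    count = 0
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in RNA_STOP_CODONS:
            count += 1
        else:
            break  # the run of consecutive stop codons has ended
    return count


print(count_leading_stop_codons("UGAUAAUAG"))  # three consecutive stop codons
print(count_leading_stop_codons("TGATAA"))     # DNA spelling, two stop codons
```

Reading the element in frame from its first nucleotide is an assumption of this sketch; a termination element that starts with a single stop codon followed by other nucleotides is counted as containing one.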

在一些實施例中,mRNA可包括一個終止密碼子。在一些實施例中,mRNA可包括兩個終止密碼子。在一些實施例中,mRNA可包括三個終止密碼子。在一些實施例中,mRNA可包括至少一個終止密碼子。在一些實施例中,mRNA可包括至少兩個終止密碼子。在一些實施例中,mRNA可包括至少三個終止密碼子。作為非限制性實例,終止密碼子可選自TGA、TAA及TAG。In some embodiments, the mRNA may include one stop codon. In some embodiments, the mRNA may include two stop codons. In some embodiments, the mRNA may include three stop codons. In some embodiments, the mRNA may include at least one stop codon. In some embodiments, the mRNA may include at least two stop codons. In some embodiments, the mRNA may include at least three stop codons. As a non-limiting example, the stop codon may be selected from TGA, TAA and TAG.

在其他實施例中,終止密碼子可選自表Z之以下終止元件中之一或多者: In other embodiments, the stop codon can be selected from one or more of the following termination elements in Table Z:

表Z:線性mRNA之額外終止元件 Table Z: Additional termination elements for linear mRNA

核苷酸序列(5’至3’) Nucleotide sequence (5' to 3')    序列標識符 Sequence identifier
UGAUAAUAG          SEQ ID NO: 19454
UAAUAGUAA          SEQ ID NO: 19455
UAAGUCUAA          SEQ ID NO: 19456
UAAAGCUAA          SEQ ID NO: 19457
UAAGUCUCC          SEQ ID NO: 19458
UAAGGCUAA          SEQ ID NO: 19459
UAAGCCCCUCCGGGG    SEQ ID NO: 19460
UAAAGCUCCCCGGGG    SEQ ID NO: 19461
UAAGCCCCU          SEQ ID NO: 19462
UAAAGCUCC          SEQ ID NO: 19463
UAAAGCUCC          SEQ ID NO: 19464
UAGGGUUAA          SEQ ID NO: 19465
UAAGCACCC          SEQ ID NO: 19466
UGAUAGUAA          SEQ ID NO: 19467
UAAAGCGCU          SEQ ID NO: 19468

在一些實施例中,mRNA包括終止密碼子TGA及一個額外終止密碼子。在另一實施例中,額外終止密碼子可為TAA。In some embodiments, the mRNA includes the stop codon TGA and one additional stop codon. In another embodiment, the additional stop codon may be TAA.

微小RNA結合位點及其他調節元件 MicroRNA Binding Sites and Other Regulatory Elements

在各個實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷可包含一或多種調節元件,包括但不限於微小RNA (miRNA)結合位點、結構化mRNA序列及/或模體、結合於內源性核酸結合分子之人工結合位點及其組合。In various embodiments, the mRNA payload of the LNP-based gene editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein may comprise one or more regulatory elements, including but not limited to microRNA (miRNA) binding sites, structured mRNA sequences and/or motifs, artificial binding sites that bind to endogenous nucleic acid binding molecules, and combinations thereof.

未經化學修飾之核苷酸 Non-Chemically Modified Nucleotides

在一些實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷未經化學修飾且包含由腺苷、鳥苷、胞嘧啶及尿苷組成之標準核糖核苷酸。在一些實施例中,本揭示案之核苷酸及核苷包含標準核苷殘基,諸如存在於經轉錄RNA中之彼等(例如,A、G、C或U)。在一些實施例中,本揭示案之核苷酸及核苷包含標準去氧核糖核苷,諸如存在於DNA中之彼等(例如,dA、dG、dC或dT)。In some embodiments, the mRNA payload of the LNP-based gene editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein is not chemically modified and comprises standard ribonucleotides consisting of adenosine, guanosine, cytosine, and uridine. In some embodiments, the nucleotides and nucleosides of the present disclosure comprise standard nucleoside residues, such as those present in transcribed RNA (e.g., A, G, C, or U). In some embodiments, the nucleotides and nucleosides of the present disclosure comprise standard deoxyribonucleosides, such as those present in DNA (e.g., dA, dG, dC, or dT).

經化學修飾之核苷酸 Chemically Modified Nucleotides

在一些實施例中,本文所述之基於LNP之基因編輯系統、RNA治療劑及其醫藥組合物的mRNA有效載荷包含至少一種化學修飾。In some embodiments, the mRNA payload of the LNP-based gene editing systems, RNA therapeutics, and pharmaceutical compositions thereof described herein comprises at least one chemical modification.

術語「化學修飾」及「經化學修飾」係指相對於腺苷(A)、鳥苷(G)、尿苷(U)、胸苷(T)或胞苷(C)核糖核苷或去氧核糖核苷在其位置、模式、百分比或群體中之至少一方面進行之修飾。一般而言,此等術語並非指天然存在之5'末端mRNA帽部分中之核糖核苷酸修飾。就多肽而言,術語「修飾」係指相對於規範集合20種胺基酸之修飾。若如本文所提供之多肽含有胺基酸取代、插入或取代及插入之組合,則其亦被視為「經修飾」。The terms "chemical modification" and "chemically modified" refer to modification with respect to adenosine (A), guanosine (G), uridine (U), thymidine (T) or cytidine (C) ribonucleosides or deoxyribonucleosides in at least one of their position, pattern, percentage or population. Generally, these terms do not refer to the ribonucleotide modifications in naturally occurring 5'-terminal mRNA cap moieties. With respect to a polypeptide, the term "modification" refers to a modification relative to the canonical set of 20 amino acids. A polypeptide as provided herein is also considered "modified" if it contains amino acid substitutions, insertions, or a combination of substitutions and insertions.

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含各種(超過一種)不同修飾。在一些實施例中,多核苷酸之特定區域含有一種、兩種或兩種以上(視情況不同)核苷或核苷酸修飾。在一些實施例中,相對於未經修飾之多核苷酸,引入細胞或生物體中的經修飾之RNA多核苷酸(例如,經修飾之mRNA多核苷酸)分別在細胞或生物體中展現減少之降解。在一些實施例中,引入細胞或生物體中的經修飾之RNA多核苷酸(例如,經修飾之mRNA多核苷酸)可分別在細胞或生物體中展現降低之免疫原性(例如,降低之先天反應)。In some embodiments, a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) comprises a variety (more than one) different modifications. In some embodiments, a specific region of a polynucleotide contains one, two, or more (as the case may be) nucleoside or nucleotide modifications. In some embodiments, a modified RNA polynucleotide (e.g., a modified mRNA polynucleotide) introduced into a cell or an organism exhibits reduced degradation in the cell or organism, respectively, relative to an unmodified polynucleotide. In some embodiments, a modified RNA polynucleotide (e.g., a modified mRNA polynucleotide) introduced into a cell or an organism may exhibit reduced immunogenicity (e.g., reduced innate response) in the cell or organism, respectively.

多核苷酸之修飾包括但不限於本文所述之彼等。多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)可包含天然存在、非天然存在之修飾,或多核苷酸可包含天然存在及非天然存在之修飾的組合。多核苷酸可包括任何可用之修飾,例如糖、核鹼基或核苷間鍵聯(例如,連接磷酸酯、磷酸二酯鍵聯或磷酸二酯主鏈)之修飾。Modifications of polynucleotides include, but are not limited to, those described herein. Polynucleotides (e.g., RNA polynucleotides, such as mRNA polynucleotides) may comprise modifications that are naturally occurring or non-naturally occurring, or may comprise a combination of naturally occurring and non-naturally occurring modifications. Polynucleotides may include any useful modification, for example of a sugar, a nucleobase, or an internucleoside linkage (e.g., of a linking phosphate, a phosphodiester linkage, or the phosphodiester backbone).

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含在多核苷酸之合成期間或合成後引入的非天然修飾之核苷酸以實現所需功能或特性。修飾可存在於核苷酸間鍵聯、嘌呤或嘧啶鹼基或糖上。修飾可藉由化學合成或藉由聚合酶引入鏈末端或鏈中之別處。多核苷酸之任何區域均可經化學修飾。In some embodiments, a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) comprises non-naturally modified nucleotides introduced during or after the synthesis of the polynucleotide to achieve a desired function or property. The modification may be present at the internucleotide linkage, the purine or pyrimidine base, or the sugar. The modification may be introduced by chemical synthesis or by a polymerase at the end of the chain or elsewhere in the chain. Any region of a polynucleotide may be chemically modified.

本揭示案提供多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)之經修飾核苷及核苷酸。「核苷」係指含有與有機鹼(例如,嘌呤或嘧啶)或其衍生物(本文中亦稱為「核鹼基」)組合之糖分子(例如,戊糖或核糖)或其衍生物的化合物。「核苷酸」係指包括磷酸酯基之核苷。經修飾核苷酸可藉由任何可用方法,例如以化學方式、酶促或以重組方式合成,以包括一或多種經修飾或非天然核苷。多核苷酸可包含連接核苷之一或多個區域。此類區域可具有可變主鏈鍵聯。該等鍵聯可為標準磷酸二酯鍵聯,在該情況下,多核苷酸將包含核苷酸區域。The present disclosure provides modified nucleosides and nucleotides of polynucleotides (e.g., RNA polynucleotides, such as mRNA polynucleotides). A "nucleoside" refers to a compound containing a sugar molecule (e.g., a pentose or ribose) or a derivative thereof in combination with an organic base (e.g., a purine or pyrimidine) or a derivative thereof (also referred to herein as a "nucleobase"). A "nucleotide" refers to a nucleoside that includes a phosphate group. Modified nucleotides can be synthesized by any useful method, such as chemically, enzymatically, or recombinantly, to include one or more modified or non-natural nucleosides. A polynucleotide may comprise one or more regions of linked nucleosides. Such regions may have variable backbone linkages. The linkages may be standard phosphodiester linkages, in which case the polynucleotide would comprise regions of nucleotides.

經修飾之核苷酸鹼基配對不僅涵蓋標準腺苷-胸腺嘧啶、腺苷-尿嘧啶或鳥苷-胞嘧啶鹼基對,而且涵蓋核苷酸及/或包含非標準或經修飾鹼基之經修飾核苷酸之間形成的鹼基對,其中氫鍵供體及氫鍵受體之排列允許在非標準鹼基與標準鹼基之間或在兩個互補之非標準鹼基結構之間形成氫鍵結。此類非標準鹼基配對之一實例為經修飾核苷酸肌苷與腺嘌呤、胞嘧啶或尿嘧啶之間的鹼基配對。鹼基/糖或連接體之任何組合均可併入本揭示案之多核苷酸中。Modified nucleotide base pairing encompasses not only standard adenosine-thymine, adenosine-uracil or guanosine-cytosine base pairs, but also base pairs formed between nucleotides and/or modified nucleotides comprising non-standard or modified bases, wherein the arrangement of hydrogen bond donors and hydrogen bond acceptors allows hydrogen bond formation between a non-standard base and a standard base or between two complementary non-standard base structures. An example of such non-standard base pairing is base pairing between the modified nucleotides inosine and adenine, cytosine or uracil. Any combination of base/sugar or linker can be incorporated into the polynucleotides of the present disclosure.

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包括至少兩種(例如,2、3、4種或更多種)前述經修飾核鹼基之組合。In some embodiments, a polynucleotide (eg, an RNA polynucleotide, such as an mRNA polynucleotide) includes a combination of at least two (eg, 2, 3, 4 or more) of the aforementioned modified nucleobases.

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)中之經修飾核鹼基選自由以下組成之群:假尿苷(ψ)、N1-甲基假尿苷(m1ψ)、N1-乙基假尿苷、2-硫代尿苷、4′-硫代尿苷、5-甲基胞嘧啶、2-硫代-1-甲基-1-去氮雜-假尿苷、2-硫代-1-甲基-假尿苷、2-硫代-5-氮雜-尿苷、2-硫代-二氫假尿苷、2-硫代-二氫尿苷、2-硫代-假尿苷、4-甲氧基-2-硫代-假尿苷、4-甲氧基-假尿苷、4-硫代-1-甲基-假尿苷、4-硫代-假尿苷、5-氮雜-尿苷、二氫假尿苷、5-甲氧基尿苷及2′-O-甲基尿苷。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包括至少兩種(例如,2、3、4種或更多種)前述經修飾核鹼基之組合。In some embodiments, the modified nucleobase in a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) is selected from the group consisting of pseudouridine (ψ), N1-methylpseudouridine (m1ψ), N1-ethylpseudouridine, 2-thiouridine, 4′-thiouridine, 5-methylcytosine, 2-thio-1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-pseudouridine, 2-thio-5-aza-uridine, 2-thio-dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-pseudouridine, 4-methoxy-2-thio-pseudouridine, 4-methoxy-pseudouridine, 4-thio-1-methyl-pseudouridine, 4-thio-pseudouridine, 5-aza-uridine, dihydropseudouridine, 5-methoxyuridine and 2′-O-methyluridine. In some embodiments, a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) includes a combination of at least two (e.g., 2, 3, 4 or more) of the aforementioned modified nucleobases.

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)中之經修飾核鹼基選自由以下組成之群:1-甲基-假尿苷(m1ψ)、5-甲氧基-尿苷(mo5U)、5-甲基-胞苷(m5C)、假尿苷(ψ)、α-硫代-鳥苷及α-硫代-腺苷。在一些實施例中,多核苷酸包括至少兩種(例如,2、3、4種或更多種)前述經修飾核鹼基之組合。In some embodiments, the modified nucleobase in a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) is selected from the group consisting of 1-methyl-pseudouridine (m1ψ), 5-methoxy-uridine (mo5U), 5-methyl-cytidine (m5C), pseudouridine (ψ), α-thio-guanosine, and α-thio-adenosine. In some embodiments, the polynucleotide comprises a combination of at least two (e.g., 2, 3, 4 or more) of the aforementioned modified nucleobases.

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含假尿苷(ψ)及5-甲基-胞苷(m 5C)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含1-甲基-假尿苷(m 1ψ)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含1-甲基-假尿苷(m 1ψ)及5-甲基-胞苷(m 5C)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含2-硫代尿苷(s 2U)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含2-硫代尿苷及5-甲基-胞苷(m 5C)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含甲氧基-尿苷(mo 5U)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含5-甲氧基-尿苷(mo 5U)及5-甲基-胞苷(m 5C)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含2′-O-甲基尿苷。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含2′-O-甲基尿苷及5-甲基-胞苷(m 5C)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含N6-甲基-腺苷(m 6A)。在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)包含N6-甲基-腺苷(m 6A)及5-甲基-胞苷(mC)。 In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises pseudouridine (ψ) and 5-methyl-cytidine (m 5 C). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 1-methyl-pseudouridine (m 1 ψ). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 1-methyl-pseudouridine (m 1 ψ) and 5-methyl-cytidine (m 5 C). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 2-thiouridine (s 2 U). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 2-thiouridine and 5-methyl-cytidine (m 5 C). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises methoxy-uridine (mo 5 U). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 5-methoxy-uridine (mo 5 U) and 5-methyl-cytidine (m 5 C). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 2′-O-methyluridine. In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises 2′-O-methyluridine and 5-methyl-cytidine (m 5 C). 
In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises N6-methyl-adenosine (m6A). In some embodiments, the polynucleotide (e.g., RNA polynucleotide, such as mRNA polynucleotide) comprises N6-methyl-adenosine (m6A) and 5-methyl-cytidine (m5C).

在一些實施例中,多核苷酸(例如RNA多核苷酸,諸如mRNA多核苷酸)經均一修飾(例如,經完全修飾、在整個序列中經修飾)以進行特定修飾。例如,多核苷酸可經5-甲基-胞苷(m 5C)均一修飾,意謂mRNA序列中之所有胞嘧啶殘基均經5-甲基-胞苷(m 5C)置換。同樣,可藉由用經修飾殘基(諸如上文所陳述之彼等殘基)進行置換,針對序列中存在之任一類型之核苷殘基對多核苷酸進行均一修飾。 In some embodiments, a polynucleotide (e.g., an RNA polynucleotide, such as an mRNA polynucleotide) is uniformly modified (e.g., completely modified, modified throughout the sequence) to carry out a specific modification. For example, a polynucleotide can be uniformly modified with 5-methyl-cytidine ( m5C ), meaning that all cytosine residues in the mRNA sequence are replaced with 5-methyl-cytidine ( m5C ). Similarly, a polynucleotide can be uniformly modified for any type of nucleoside residue present in the sequence by replacing it with a modified residue (such as those described above).
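As an illustration of what uniform modification means at the sequence level, the following minimal Python sketch replaces every residue of one type with a token for its modified counterpart. The function name `uniformly_modify`, the token `"m5C"`, and the example sequence are assumptions made for illustration only; they are not part of the disclosure.

```python
# Illustrative sketch only: names, tokens, and the example sequence are hypothetical.

def uniformly_modify(sequence, residue, modified_token):
    """Return the sequence as a list of tokens in which every occurrence
    of `residue` is replaced by `modified_token`."""
    return [modified_token if base == residue else base for base in sequence]


# Hypothetical 9-nt RNA fragment; every cytidine becomes 5-methyl-cytidine.
mrna = "AUGCCGUAC"
modified = uniformly_modify(mrna, "C", "m5C")
print(modified)
```

After the call, no unmodified `"C"` token remains in the result, which is the sense in which the sequence is uniformly modified for that residue type.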

在一些實施例中,經修飾核鹼基為經修飾胞嘧啶。具有經修飾胞嘧啶之例示性核鹼基及核苷包括N4-乙醯基-胞苷(ac4C)、5-甲基-胞苷(m5C)、5-鹵基-胞苷(例如,5-碘-胞苷)、5-羥基甲基-胞苷(hm5C)、1-甲基-假異胞苷、2-硫代-胞苷(s2C)及2-硫代-5-甲基-胞苷。In some embodiments, the modified nucleobase is a modified cytosine. Exemplary nucleobases and nucleosides having a modified cytosine include N4-acetyl-cytidine (ac4C), 5-methyl-cytidine (m5C), 5-halo-cytidine (e.g., 5-iodo-cytidine), 5-hydroxymethyl-cytidine (hm5C), 1-methyl-pseudoisocytidine, 2-thio-cytidine (s2C), and 2-thio-5-methyl-cytidine.

在一些實施例中,經修飾核鹼基為經修飾尿苷。具有經修飾尿苷之例示性核鹼基及核苷包括5-氰基尿苷及4′-硫代尿苷。In some embodiments, the modified nucleobase is a modified uridine. Exemplary nucleobases and nucleosides having a modified uridine include 5-cyanouridine and 4′-thiouridine.

本揭示案之多核苷酸可沿著分子之整個長度部分地或完全地經修飾。例如,在本發明之多核苷酸中,或在其既定之預定序列區域中(例如,在包括或排除聚A尾之mRNA中),一或多種或所有或既定類型之核苷酸(例如嘌呤或嘧啶,或A、G、U、C中之任何一或多者或所有)可均一地經修飾。在一些實施例中,本揭示案之多核苷酸中(或其既定序列區域中)之所有核苷酸X均為經修飾核苷酸,其中X可為核苷酸A、G、U、C中之任一者或組合A+G、A+U、A+C、G+U、G+C、U+C、A+G+U、A+G+C、G+U+C或A+G+C中之任一者。The polynucleotides of the present disclosure may be partially or completely modified along the entire length of the molecule. For example, in a polynucleotide of the present invention, or in a given predetermined sequence region thereof (e.g., in an mRNA including or excluding a poly A tail), one or more or all or a given type of nucleotide (e.g., a purine or pyrimidine, or any one or more or all of A, G, U, C) may be uniformly modified. In some embodiments, all nucleotides X in a polynucleotide of the present disclosure (or in a given sequence region thereof) are modified nucleotides, wherein X may be any one of the nucleotides A, G, U, C, or any one of the combinations A+G, A+U, A+C, G+U, G+C, U+C, A+G+U, A+G+C, G+U+C, or A+G+C.

多核苷酸可含有約1%至約100%經修飾核苷酸(相對於總核苷酸含量,或相對於一或多種類型之核苷酸,亦即A、G、U或C中之任何一或多者)或任何居中百分比(例如,1%至20%、1%至25%、1%至50%、1%至60%、1%至70%、1%至80%、1%至90%、1%至95%、10%至20%、10%至25%、10%至50%、10%至60%、10%至70%、10%至80%、10%至90%、10%至95%、10%至100%、20%至25%、20%至50%、20%至60%、20%至70%、20%至80%、20%至90%、20%至95%、20%至100%、50%至60%、50%至70%、50%至80%、50%至90%、50%至95%、50%至100%、70%至80%、70%至90%、70%至95%、70%至100%、80%至90%、80%至95%、80%至100%、90%至95%、90%至100%及95%至100%)。應理解,任何剩餘百分比係由未經修飾之A、G、U或C之存在解釋的。The polynucleotide may contain from about 1% to about 100% modified nucleotides (relative to the total nucleotide content, or relative to one or more types of nucleotides, i.e., any one or more of A, G, U, or C), or any intermediate percentage (e.g., 1% to 20%, 1% to 25%, 1% to 50%, 1% to 60%, 1% to 70%, 1% to 80%, 1% to 90%, 1% to 95%, 10% to 20%, 10% to 25%, 10% to 50%, 10% to 60%, 10% to 70%, 10% to 80%, 10% to 90%, 10% to 95%, 10% to 100%, 20% to 25%, 20% to 50%, 20% to 60%, 20% to 70%, 20% to 80%, 20% to 90%, 20% to 95%, 20% to 100%, 50% to 60%, 50% to 70%, 50% to 80%, 50% to 90%, 50% to 95%, 50% to 100%, 70% to 80%, 70% to 90%, 70% to 95%, 70% to 100%, 80% to 90%, 80% to 95%, 80% to 100%, 90% to 95%, 90% to 100%, and 95% to 100%). It is understood that any remaining percentage is accounted for by the presence of unmodified A, G, U, or C.
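The two reference bases for the percentage (total nucleotide content versus one or more nucleotide types) can give quite different numbers for the same molecule. The sketch below, offered only as an illustration, computes both. The token names (`"m5C"`, `"psi"`), the `PARENT` mapping, and the example residue list are hypothetical and not taken from the disclosure.

```python
# Illustrative sketch only: tokens, mapping, and example data are hypothetical.

# Maps each modified-residue token to the unmodified nucleotide type it replaces.
PARENT = {"m5C": "C", "psi": "U"}


def percent_modified(residues, relative_to=None):
    """Percentage of modified residues.

    relative_to=None  -> relative to total nucleotide content.
    relative_to={"C"} -> relative to the given nucleotide type(s) only.
    """
    if relative_to is None:
        pool = residues
    else:
        # Keep residues whose (parent) nucleotide type is in the requested set.
        pool = [r for r in residues if PARENT.get(r, r) in relative_to]
    if not pool:
        return 0.0
    modified = sum(1 for r in pool if r in PARENT)
    return 100.0 * modified / len(pool)


residues = ["A", "m5C", "C", "U", "psi", "G", "m5C", "A"]
print(percent_modified(residues))           # relative to all 8 residues
print(percent_modified(residues, {"C"}))    # relative to cytidine positions only
print(percent_modified(residues, {"U"}))    # relative to uridine positions only
```

In this example three of eight residues are modified overall, while two of the three cytidine positions and one of the two uridine positions are modified, so the same molecule falls in different percentage ranges depending on the chosen reference.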

多核苷酸可含有最少1%且最多100%經修飾核苷酸,或任何居中百分比,諸如至少5%經修飾核苷酸、至少10%經修飾核苷酸、至少25%經修飾核苷酸、至少50%經修飾核苷酸、至少80%經修飾核苷酸或至少90%經修飾核苷酸。例如,多核苷酸可含有經修飾嘧啶,例如經修飾尿嘧啶或胞嘧啶。在一些實施例中,多核苷酸中之至少5%、至少10%、至少25%、至少50%、至少80%、至少90%或100%尿嘧啶由經修飾尿嘧啶(例如,5-取代之尿嘧啶)置換。經修飾尿嘧啶可由具有單一獨特結構之化合物置換,或可由複數種具有不同結構(例如,2、3、4種或更多種獨特結構)之化合物置換。在一些實施例中,多核苷酸中之至少5%、至少10%、至少25%、至少50%、至少80%、至少90%或100%胞嘧啶由經修飾胞嘧啶(例如,5-取代之胞嘧啶)置換。經修飾胞嘧啶可由具有單一獨特結構之化合物置換,或可由複數種具有不同結構(例如,2、3、4種或更多種獨特結構)之化合物置換。 The polynucleotide may contain at least 1% and at most 100% modified nucleotides, or any intermediate percentage, such as at least 5% modified nucleotides, at least 10% modified nucleotides, at least 25% modified nucleotides, at least 50% modified nucleotides, at least 80% modified nucleotides, or at least 90% modified nucleotides. For example, the polynucleotide may contain a modified pyrimidine, such as a modified uracil or cytosine. In some embodiments, at least 5%, at least 10%, at least 25%, at least 50%, at least 80%, at least 90%, or 100% of the uracil in the polynucleotide is replaced by a modified uracil (e.g., a 5-substituted uracil). The modified uracil may be replaced by a compound having a single unique structure, or may be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures). In some embodiments, at least 5%, at least 10%, at least 25%, at least 50%, at least 80%, at least 90%, or 100% of the cytosines in the polynucleotide are replaced by modified cytosines (e.g., 5-substituted cytosines). The modified cytosines can be replaced by a compound having a single unique structure, or can be replaced by a plurality of compounds having different structures (e.g., 2, 3, 4 or more unique structures).

4. 環狀mRNA有效載荷 4. Circular mRNA Payload

在各個實施例中,本文所述之基於LNP之醫藥組合物(例如,基於LNP之基因編輯系統)可包括一或多種環狀mRNA分子或「oRNA」。在各個實施例中,環狀mRNA有效載荷可編碼本文所述之基因編輯系統的一或多種組分或其他相關治療蛋白。例如,環狀mRNA有效載荷可編碼胺基酸序列-可程式化DNA結合結構域(例如,逆轉錄子RT、CRISPR核酸酶、TALEN及鋅指結合結構域)或核酸序列-可程式化DNA結合結構域(例如,CRISPR Cas9、CRISPR Cas12a、CRISPR Cas12f、CRISPR Cas13a、CRISPR Cas13b或TnpB)。In various embodiments, the LNP-based pharmaceutical compositions described herein (e.g., LNP-based gene editing systems) may include one or more circular mRNA molecules, or "oRNAs". In various embodiments, the circular mRNA payload may encode one or more components of the gene editing system described herein or other therapeutic proteins of interest. For example, the circular mRNA payload may encode an amino acid sequence-programmable DNA binding domain (e.g., a retron RT, CRISPR nuclease, TALEN, or zinc finger binding domain) or a nucleic acid sequence-programmable DNA binding domain (e.g., CRISPR Cas9, CRISPR Cas12a, CRISPR Cas12f, CRISPR Cas13a, CRISPR Cas13b, or TnpB).

取決於基因編輯系統之性質,環狀mRNA有效載荷亦可編碼一或多個效應子結構域,該等結構域提供促進核苷酸序列及/或基因表現之變化之各種功能,諸如但不限於單股DNA結合蛋白質、核酸酶、核酸內切酶、核酸外切酶、去胺酶(例如,胞苷去胺酶或腺苷去胺酶)、聚合酶(例如,逆轉錄酶)、整合酶、重組酶等,以及包含連接在一起的一或多個功能結構域之融合蛋白。Depending on the nature of the gene editing system, the circular mRNA payload may also encode one or more effector domains that provide various functions that promote changes in nucleotide sequence and/or gene expression, such as but not limited to single-stranded DNA binding proteins, nucleases, endonucleases, exonucleases, deaminases (e.g., cytidine deaminase or adenosine deaminase), polymerases (e.g., reverse transcriptases), integrases, recombinases, etc., as well as fusion proteins comprising one or more functional domains linked together.

本文所述之環狀RNA係藉由共價或非共價鍵形成連續結構之多核糖核苷酸。由於環狀結構,oRNA與相應線性RNA相比具有改良之穩定性、增加之半衰期、降低之免疫原性及/或改良之功能性(例如,本文所述之功能)。The circular RNA described herein is a polyribonucleotide that forms a continuous structure by covalent or non-covalent bonds. Due to the circular structure, oRNA has improved stability, increased half-life, reduced immunogenicity and/or improved functionality (e.g., the functions described herein) compared to the corresponding linear RNA.

在一些實施例中,oRNA結合標靶。在一些實施例中,oRNA結合受質。在一些實施例中,oRNA結合標靶且結合標靶之受質。在一些實施例中,oRNA結合標靶且介導標靶之受質的調節。在一些實施例中,oRNA將標靶及其受質結合在一起以介導受質之修飾,例如轉譯後修飾。在一些實施例中,oRNA將標靶及其受質結合在一起以介導涉及受質之細胞過程(例如,改變蛋白質降解或信號轉導)。在一些實施例中,標靶為標靶蛋白且受質為受質蛋白。In some embodiments, the oRNA binds to a target. In some embodiments, the oRNA binds to a substrate. In some embodiments, the oRNA binds to a target and binds to a substrate of the target. In some embodiments, the oRNA binds to a target and mediates regulation of the substrate of the target. In some embodiments, the oRNA binds to a target and its substrate to mediate modification of the substrate, such as post-translational modification. In some embodiments, the oRNA binds to a target and its substrate to mediate a cellular process involving the substrate (e.g., altering protein degradation or signal transduction). In some embodiments, the target is a target protein and the substrate is a substrate protein.

在一些實施例中,oRNA包含用於結合於化合物之結合部分。結合部分可為經修飾之多核糖核苷酸。該化合物可藉由結合部分與oRNA結合。在一些實施例中,該化合物結合於標靶且介導標靶之受質的調節。在一些實施例中,oRNA結合標靶之受質,且藉由結合部分與oRNA結合之化合物結合標靶以將標靶及其受質結合在一起,從而介導受質之修飾,例如轉譯後修飾。在一些實施例中,oRNA結合標靶之受質,且藉由結合部分與oRNA結合之化合物結合標靶以將標靶及其受質結合在一起,從而介導受質之修飾以介導涉及受質之細胞過程(例如,改變蛋白質降解或信號轉導)。在一些實施例中,標靶為標靶蛋白且受質為受質蛋白。In some embodiments, the oRNA comprises a binding portion for binding to a compound. The binding portion may be a modified polyribonucleotide. The compound may be bound to the oRNA by a binding portion. In some embodiments, the compound binds to a target and mediates the regulation of the target's substrate. In some embodiments, the oRNA binds to the target's substrate, and the compound bound to the oRNA by a binding portion binds to the target to bind the target and its substrate together, thereby mediating the modification of the substrate, such as post-translational modification. In some embodiments, the oRNA binds to the target's substrate, and the compound bound to the oRNA by a binding portion binds to the target to bind the target and its substrate together, thereby mediating the modification of the substrate to mediate a cellular process involving the substrate (e.g., altering protein degradation or signal transduction). In some embodiments, the target is a target protein and the substrate is a substrate protein.

在一些實施例中,oRNA可在哺乳動物(例如,人類、非人類靈長類動物、兔、大鼠及小鼠)中為非免疫原性的。In some embodiments, the oRNA can be non-immunogenic in mammals (e.g., humans, non-human primates, rabbits, rats, and mice).

在一些實施例中,在來自水產養殖動物(例如,魚、蟹、蝦、牡蠣等)之細胞、哺乳動物細胞、來自寵物或動物園動物(例如,貓、犬、蜥蜴、鳥、獅子、老虎及熊等)之細胞、來自農場或工作動物(例如,馬、牛、豬、雞等)之細胞、人類細胞、經培養細胞、原代細胞或細胞株、幹細胞、祖細胞、分化細胞、生殖細胞、癌細胞(例如,腫瘤發生性、轉移性)、非腫瘤發生細胞(例如,正常細胞)、胎兒細胞、胚胎細胞、成體細胞、有絲分裂細胞、非有絲分裂細胞或其任何組合中,oRNA可能能夠複製,或進行複製。In some embodiments, in cells from aquaculture animals (e.g., fish, crab, shrimp, oysters, etc.), mammalian cells, cells from pet or zoo animals (e.g., cats, dogs, lizards, birds, lions, tigers, and bears, etc.), cells from farm or working animals (e.g., horses, cows, pigs, chickens, etc.), human cells, cultured cells, The oRNA may be capable of replication, or undergo replication, in primary cells or cell lines, stem cells, progenitor cells, differentiated cells, germ cells, cancer cells (e.g., tumorigenic, metastatic), non-tumorigenic cells (e.g., normal cells), fetal cells, embryonic cells, adult cells, mitotic cells, non-mitotic cells, or any combination thereof.

在一態樣中,本文提供一種醫藥組合物,其包含:環狀RNA,該環狀RNA依以下次序包含3’組I內含子片段、內部核糖體進入位點(IRES)、編碼多肽(例如,核鹼基編輯系統或其組分)之表現序列及5’組I內含子片段;及轉移媒劑,該轉移媒劑包含以下至少一者:(i)可離子化脂質,(ii)結構脂質,及(iii)經PEG修飾之脂質,其中該轉移媒劑能夠將環狀RNA多核苷酸遞送至細胞(例如人類細胞,諸如存在於人類個體中之免疫細胞),使得多肽在細胞中進行轉譯。In one aspect, provided herein is a pharmaceutical composition comprising: a circular RNA comprising, in the following order, a 3' group I intron fragment, an internal ribosome entry site (IRES), an expression sequence encoding a polypeptide (e.g., a nucleobase editing system or a component thereof), and a 5' group I intron fragment; and a transfer vehicle comprising at least one of the following: (i) an ionizable lipid, (ii) a structural lipid, and (iii) a PEG-modified lipid, wherein the transfer vehicle is capable of delivering the circular RNA polynucleotide to a cell (e.g., a human cell, such as an immune cell present in a human individual), so that the polypeptide is translated in the cell.

在一些實施例中,該醫藥組合物經調配用於靜脈內投與至有需要之人類個體。在一些實施例中,3’組I內含子片段及5’組I內含子片段為魚腥藻組I內含子片段。In some embodiments, the pharmaceutical composition is formulated for intravenous administration to a human subject in need thereof. In some embodiments, the 3' group I intron fragment and the 5' group I intron fragment are Anabaena group I intron fragments.

在某些實施例中,3’內含子片段及5’內含子片段由完整內含子中之L9a-5排列位點限定。在某些實施例中,3’內含子片段及5’內含子片段由完整內含子中之L8-2排列位點限定。In certain embodiments, the 3' intron fragment and the 5' intron fragment are defined by the L9a-5 permutation site in the intact intron. In certain embodiments, the 3' intron fragment and the 5' intron fragment are defined by the L8-2 permutation site in the intact intron.

在一些實施例中,IRES來自Taura症候群病毒、錐蝽病毒、泰勒氏腦脊髓炎病毒、猿病毒40、紅火蟻病毒1、稻麥蚜病毒、網狀內皮增生病毒、人類脊髓灰白質炎病毒1、珀椿腸病毒、喀什米爾蜜蜂病毒、人類鼻病毒2、草翅葉蟬病毒- 1、人類免疫缺乏病毒1型、草翅葉蟬病毒- 1、Himetobi P病毒、C型肝炎病毒、A型肝炎病毒、GB型肝炎病毒、口蹄疫病毒、人類腸病毒71、馬鼻炎病毒、茶尺蠖微小RNA病毒樣病毒、腦心肌炎病毒、果蠅C病毒、人類柯薩奇病毒B3、十字花科煙草花葉病毒、蟋蟀麻痺病毒、牛病毒性腹瀉病毒1、黑皇后細胞病毒、蚜蟲致死性麻痺病毒、禽腦脊髓炎病毒、急性蜜蜂麻痺病毒、木槿褪綠環斑病毒、經典豬瘟病毒、人類FGF2、人類SFTPA1、人類AML1/RUNX1、果蠅觸角足、人類AQP4、人類AT1R、人類BAG-1、人類BCL2、人類BiP、人類c-IAPl、人類c-myc、人類eIF4G、小鼠NDST4L、人類LEF1、小鼠HIFlα、人類n.myc、小鼠Gtx、人類p27kipl、人類PDGF2/c-sis、人類p53、人類Pim-1、小鼠Rbm3、果蠅reaper、犬Scamper、果蠅Ubx、人類UNR、小鼠UtrA、人類VEGF-A、人類XIAP、果蠅hairless、釀酒酵母TFIID、釀酒酵母YAP1、煙草蝕刻病毒、蕪菁皺縮病毒、EMCV-A、EMCV-B、EMCV-Bf、EMCV-Cf、EMCV pEC9、微小雙節RNA病毒、HCV QC64、人類科薩病毒E/D、人類科薩病毒F、人類科薩病毒JMY、鼻病毒NAT001、HRV14、HRV89、HRVC-02、HRV-A21、薩比亞病毒A SHI、薩比亞病毒FHB、薩比亞病毒NG-J1、人類副腸孤病毒1、Crohivirus B、Yc-3、Rosavirus M-7、Shanbavirus A、Pasivirus A、Pasivirus A 2、埃可病毒E14、人類副腸孤病毒5、愛知病毒、A型肝炎病毒HA 16、Phopivirus、CVA10、腸病毒C、腸病毒D、腸病毒J、人類Pegivirus 2、GBV-C GT110、GBV-C K1737、GBV-C Iowa、Pegivirus A 1220、Pasivirus A 3、薩佩羅病毒、Rosavirus B、Bakunsa病毒、Tremovirus A、豬Pasivirus 1、PLV-CHN、Pasivirus A、Sicinivirus、Hepacivirus K、Hepacivirus A、BVDV1、邊界病病毒、BVDV2、CSFV-PK15C、SF573 Dicistrovirus、湖北微小RNA病毒樣病毒、CRPV、唾液病毒A BN5、唾液病毒A BN2、唾液病毒A 02394、唾液病毒A GUT、唾液病毒A CH、唾液病毒A SZ1、唾液病毒FHB、CVB3、CVB1、埃可病毒7、CVB5、EVA71、CVA3、CVA12、EV24或eIF4G適體。In some embodiments, the IRES is from Taura syndrome virus, cone bug virus, Theilerian encephalomyelitis virus, simian virus 40, red fire ant virus 1, rice aphid virus, reticuloendotheliosis virus, human poliovirus 1, Persian enterovirus, cashmere honey bee virus, human rhinovirus 2, leafhopper virus-1, human immunodeficiency virus type 1, leafhopper virus-1, Himetobi P virus, hepatitis C virus, hepatitis A virus, GB hepatitis virus, foot-and-mouth disease virus, human enterovirus 71, equine rhinitis virus, tea geometrid microRNA virus-like virus, encephalomyocarditis virus, fruit fly C virus, human coxsackievirus B3, cruciferous tobacco mosaic virus, cricket paralysis virus, bovine viral diarrhea virus 1, black queen cell 
virus, aphid lethal paralysis virus, avian encephalomyelitis virus, acute bee paralysis virus, hibiscus chlorotic ringspot virus, classical swine fever virus, human FGF2, human SFTPA1, human AML1/RUNX1, Drosophila Antennapedia, human AQP4, human AT1R, human BAG-1, human BCL2, human BiP, human c-IAP1, human c-myc, human eIF4G, mouse NDST4L, human LEF1, mouse HIF1α, human n.myc, mouse Gtx, human p27kip1, human PDGF2/c-sis, human p53, human Pim-1, mouse Rbm3, Drosophila reaper, canine Scamper, Drosophila Ubx, human UNR, mouse UtrA, human VEGF-A, human XIAP, Drosophila hairless, Saccharomyces cerevisiae TFIID, Saccharomyces cerevisiae YAP1, tobacco etch virus, turnip crinkle virus, EMCV-A, EMCV-B, EMCV-Bf, EMCV-Cf, EMCV pEC9, Picobirnavirus, HCV QC64, human Cosavirus E/D, human Cosavirus F, human Cosavirus JMY, rhinovirus NAT001, HRV14, HRV89, HRVC-02, HRV-A21, Sabiavirus A SHI, Sabiavirus FHB, Sabiavirus NG-J1, human Parechovirus 1, Crohivirus B, Yc-3, Rosavirus M-7, Shanbavirus A, Pasivirus A, Pasivirus A 2, Echovirus E14, human Parechovirus 5, Aichi virus, hepatitis A virus HA 16, Phopivirus, CVA10, Enterovirus C, Enterovirus D, Enterovirus J, human Pegivirus 2, GBV-C GT110, GBV-C K1737, GBV-C Iowa, Pegivirus A 1220, Pasivirus A 3, Sapelovirus, Rosavirus B, Bakunsa virus, Tremovirus A, porcine Pasivirus 1, PLV-CHN, Pasivirus A, Sicinivirus, Hepacivirus K, Hepacivirus A, BVDV1, border disease virus, BVDV2, CSFV-PK15C, SF573 Dicistrovirus, Hubei picorna-like virus, CRPV, Salivirus A BN5, Salivirus A BN2, Salivirus A 02394, Salivirus A GUT, Salivirus A CH, Salivirus A SZ1, Salivirus FHB, CVB3, CVB1, Echovirus 7, CVB5, EVA71, CVA3, CVA12, EV24, or an eIF4G aptamer.

在一些實施例中,IRES包含CVB3 IRES或其片段或變異體。在一些實施例中,該醫藥組合物包含在3’組I內含子片段與IRES之間之第一內部間隔基,及在表現序列與5’組I內含子片段之間之第二內部間隔基。在某些實施例中,第一及第二內部間隔基各自具有約10至約60個核苷酸之長度。In some embodiments, the IRES comprises a CVB3 IRES or a fragment or variant thereof. In some embodiments, the pharmaceutical composition comprises a first internal spacer between the 3' group I intron fragment and the IRES, and a second internal spacer between the expression sequence and the 5' group I intron fragment. In certain embodiments, the first and second internal spacers each have a length of about 10 to about 60 nucleotides.

在一些實施例中,環狀mRNA包含編碼相關多肽,諸如核鹼基編輯系統或治療蛋白(例如,CAR或TCR複合蛋白)之核苷酸序列。In some embodiments, the circular mRNA comprises a nucleotide sequence encoding a polypeptide of interest, such as a nucleobase editing system or a therapeutic protein (e.g., a CAR or TCR complex protein).

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物進一步包含靶向部分。在某些實施例中,靶向部分介導受體介導之內吞作用或在細胞分離或純化不存在之情況下將遞送媒劑(LNP)直接融合至所選細胞群體或組織之所選細胞中。在某些實施例中,靶向部分能夠結合於選自群CD3、CD4、CD8、CD5、CD7、PD-1、4-1BB、CD28、C1q及CD2之蛋白質。在某些實施例中,靶向部分包含對巨噬細胞、樹突狀細胞、NK細胞、NKT或T細胞抗原具有特異性之抗體。在某些實施例中,靶向部分包含scFv、奈米抗體、肽、微型抗體、多核苷酸適體、重鏈可變區、輕鏈可變區或其片段。In some embodiments, the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein further comprise a targeting moiety. In certain embodiments, the targeting moiety mediates receptor-mediated endocytosis or direct fusion of the delivery vehicle (LNP) into selected cells of a selected cell population or tissue in the absence of cell isolation or purification. In certain embodiments, the targeting moiety is capable of binding to a protein selected from the group consisting of CD3, CD4, CD8, CD5, CD7, PD-1, 4-1BB, CD28, C1q, and CD2. In certain embodiments, the targeting moiety comprises an antibody specific for a macrophage, dendritic cell, NK cell, NKT, or T cell antigen. In certain embodiments, the targeting moiety comprises a scFv, a nanobody, a peptide, a minibody, a polynucleotide aptamer, a heavy chain variable region, a light chain variable region, or a fragment thereof.

在一些實施例中,以有效治療人類個體之疾病的量投與本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物(例如,其中該疾病可為癌症、肌肉病症或CNS病症等)。在一些實施例中,當與包括包含編碼相同多肽之外源性DNA的T細胞或載體之醫藥組合物相比時,基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物具有增強之安全性型態。In some embodiments, the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein are administered in an amount effective to treat a disease in a human subject (e.g., where the disease may be cancer, a muscle disorder, or a CNS disorder, etc.). In some embodiments, the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions have an enhanced safety profile when compared to pharmaceutical compositions comprising T cells or vectors comprising exogenous DNA encoding the same polypeptide.

在一些實施例中,以有效誘導基因體中之所需精確編輯的量投與基於LNP之核鹼基編輯系統及其醫藥組合物。在一些實施例中,當與先前技術之基因編輯遞送組合物相比時,基於LNP之核鹼基編輯系統及醫藥組合物具有增強之安全性型態。In some embodiments, the LNP-based nucleobase editing systems and pharmaceutical compositions thereof are administered in an amount effective to induce the desired precise edit in the genome. In some embodiments, the LNP-based nucleobase editing systems and pharmaceutical compositions have an enhanced safety profile when compared to prior art gene editing delivery compositions.

在另一態樣中,本揭示案提供環狀RNA,該環狀RNA依以下次序包含3’組I內含子片段、內部核糖體進入位點(IRES)、編碼多肽(例如,核鹼基編輯系統或其組分)之表現序列及5’組I內含子片段。In another aspect, the present disclosure provides a circular RNA comprising, in the following order, a 3' group I intron fragment, an internal ribosome entry site (IRES), an expression sequence encoding a polypeptide (e.g., a nucleobase editing system or a component thereof), and a 5' group I intron fragment.

在一些實施例中,3’組I內含子片段及5’組I內含子片段為魚腥藻組I內含子片段。在某些實施例中,3’內含子片段及5’內含子片段由完整內含子中之L9a-5排列位點限定。在某些實施例中,3’內含子片段及5’內含子片段由完整內含子中之L8-2排列位點限定。在某些實施例中,IRES包含CVB3 IRES或其片段或變異體。In some embodiments, the 3' group I intron fragment and the 5' group I intron fragment are Anabaena group I intron fragments. In certain embodiments, the 3' intron fragment and the 5' intron fragment are defined by the L9a-5 permutation site in the intact intron. In certain embodiments, the 3' intron fragment and the 5' intron fragment are defined by the L8-2 permutation site in the intact intron. In certain embodiments, the IRES comprises a CVB3 IRES or a fragment or variant thereof.

在一些實施例中,該環狀RNA包含在3’組I內含子片段與IRES之間之第一內部間隔基,及在表現序列與5’組I內含子片段之間之第二內部間隔基。In some embodiments, the circular RNA comprises a first internal spacer between the 3' group I intron fragment and the IRES, and a second internal spacer between the expression sequence and the 5' group I intron fragment.

在某些實施例中,第一及第二內部間隔基各自具有約10至約60個核苷酸之長度。In certain embodiments, the first and second internal spacers each have a length of about 10 to about 60 nucleotides.
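The element order recited above (3' group I intron fragment, first internal spacer, IRES, expression sequence, second internal spacer, 5' group I intron fragment) can be sketched as a simple assembly check. This is a minimal illustration under stated assumptions: the function name `assemble_precursor`, the hard 10 to 60 nt bounds (the text says "about"), and all placeholder sequences are hypothetical and are not real intron, IRES, or coding sequences.

```python
# Illustrative sketch only: function name, hard length bounds, and all
# placeholder sequences below are hypothetical.

def assemble_precursor(intron_3p, spacer_1, ires, expression_seq, spacer_2,
                       intron_5p, min_spacer=10, max_spacer=60):
    """Concatenate the elements in the recited order after checking that
    both internal spacers fall within the assumed length bounds."""
    for name, spacer in (("first", spacer_1), ("second", spacer_2)):
        if not min_spacer <= len(spacer) <= max_spacer:
            raise ValueError(
                f"{name} internal spacer is {len(spacer)} nt; "
                f"expected {min_spacer} to {max_spacer} nt"
            )
    return intron_3p + spacer_1 + ires + expression_seq + spacer_2 + intron_5p


# Placeholder sequences (not real biological sequences).
precursor = assemble_precursor(
    intron_3p="G" * 20,
    spacer_1="A" * 15,
    ires="C" * 30,
    expression_seq="AUG" + "GCU" * 5 + "UAA",
    spacer_2="A" * 15,
    intron_5p="U" * 20,
)
print(len(precursor))
```

The sketch only models the linear ordering and the spacer-length constraint; it does not model splicing or circularization.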

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷由天然核苷酸組成。在一些實施例中,該環狀RNA進一步包含編碼治療蛋白之第二表現序列。在一些實施例中,治療蛋白包含檢查點抑制劑。在某些實施例中,治療蛋白包含細胞介素。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein consists of natural nucleotides. In some embodiments, the circular RNA further comprises a second expression sequence encoding a therapeutic protein. In some embodiments, the therapeutic protein comprises a checkpoint inhibitor. In certain embodiments, the therapeutic protein comprises a cytokine.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含部分地或完全地經密碼子最佳化之核苷酸序列。在一些實施例中,該環狀RNA經最佳化以缺乏存在於等效之預最佳化多核苷酸中的至少一個微小RNA結合位點。在一些實施例中,該環狀RNA經最佳化以缺乏存在於等效之預最佳化多核苷酸中的至少一個核酸內切酶敏感位點。在一些實施例中,該環狀RNA經最佳化以缺乏存在於等效之預最佳化多核苷酸中的至少一個RNA編輯敏感位點。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein comprises a partially or completely codon-optimized nucleotide sequence. In some embodiments, the circular RNA is optimized to lack at least one microRNA binding site present in an equivalent pre-optimized polynucleotide. In some embodiments, the circular RNA is optimized to lack at least one endonuclease sensitive site present in an equivalent pre-optimized polynucleotide. In some embodiments, the circular RNA is optimized to lack at least one RNA editing sensitive site present in an equivalent pre-optimized polynucleotide.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷在人體中之活體內功能半衰期大於具有相同表現序列的等效線性RNA之活體內功能半衰期。在一些實施例中,該環狀RNA具有約100個核苷酸至約10千鹼基之長度。在一些實施例中,該環狀RNA具有至少約20小時之功能半衰期。在一些實施例中,該環狀RNA在人類細胞中具有至少約20小時之治療效應持續時間。在一些實施例中,該環狀RNA在人類細胞中之治療效應持續時間大於或等於包含相同表現序列的等效線性RNA之治療效應持續時間。在一些實施例中,該環狀RNA在人類細胞中之功能半衰期大於或等於包含相同表現序列的等效線性RNA之功能半衰期。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein has a functional half-life in vivo in the human body that is greater than the functional half-life in vivo of an equivalent linear RNA with the same expression sequence. In some embodiments, the circular RNA has a length of about 100 nucleotides to about 10 kilobases. In some embodiments, the circular RNA has a functional half-life of at least about 20 hours. In some embodiments, the circular RNA has a therapeutic effect duration of at least about 20 hours in human cells. In some embodiments, the duration of the therapeutic effect of the circular RNA in human cells is greater than or equal to the duration of the therapeutic effect of an equivalent linear RNA comprising the same expression sequence. In some embodiments, the functional half-life of the circular RNA in human cells is greater than or equal to the functional half-life of an equivalent linear RNA comprising the same expression sequence.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷之半衰期至少為線性配對物之半衰期。在一些實施例中,oRNA之半衰期與線性配對物之半衰期相比有所增加。在一些實施例中,半衰期增加約5%、10%、15%、20%、25%、30%、35%、40%、45%、50%或更多。在一些實施例中,oRNA在細胞中之半衰期或持久性為至少約1小時至約30天,或至少約2小時、6小時、12小時、18小時、24小時(1天)、2天、3天、4天、5天、6天、7天、8天、9天、10天、11天、12天、13天、14天、15天、16天、17天、18天天、19天、20天、21天、22天、23天、24天、25天、26天、27天、28天、29天、30天、60天或更長或其間任何時間。在一些實施例中,oRNA在細胞中之半衰期或持久性不超過約10 min至約7天,或不超過約1小時、2小時、3小時、4小時、5小時、6小時、7小時、8小時、9小時、10小時、11小時、12小時、13小時、14小時、15小時、16小時、17小時、18小時、19小時、20小時、21小時、22小時、24小時(1天)、36小時(1.5天)、48小時(2天)、60小時(2.5天)、72小時(3天)、4天、5天、6天或7天。In some embodiments, the half-life of the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein is at least the half-life of the linear partner. In some embodiments, the half-life of the oRNA is increased compared to the half-life of the linear partner. In some embodiments, the half-life is increased by about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50% or more. In some embodiments, the half-life or persistence of the oRNA in the cell is at least about 1 hour to about 30 days, or at least about 2 hours, 6 hours, 12 hours, 18 hours, 24 hours (1 day), 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days, 15 days, 16 days, 17 days, 18 days, 19 days, 20 days, 21 days, 22 days, 23 days, 24 days, 25 days, 26 days, 27 days, 28 days, 29 days, 30 days, 60 days or longer, or any time in between. In some embodiments, the half-life or persistence of the oRNA in the cell is no more than about 10 min to about 7 days, or no more than about 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 13 hours, 14 hours, 15 hours, 16 hours, 17 hours, 18 hours, 19 hours, 20 hours, 21 hours, 22 hours, 24 hours (1 day), 36 hours (1.5 days), 48 hours (2 days), 60 hours (2.5 days), 72 hours (3 days), 4 days, 5 days, 6 days, or 7 days.
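To make the half-life comparison concrete, the following sketch computes the fraction of RNA remaining over time for a linear RNA and for a circular counterpart whose half-life is increased by 50%. It assumes simple first-order (exponential) decay, which the text does not state, and the 10-hour baseline half-life is a hypothetical value chosen only for the arithmetic.

```python
# Illustrative sketch only: first-order decay and the 10-hour baseline
# half-life are assumptions, not values from the disclosure.

def fraction_remaining(hours, half_life_hours):
    """Fraction of the initial RNA remaining after `hours`,
    assuming first-order decay with the given half-life."""
    return 0.5 ** (hours / half_life_hours)


linear_half_life = 10.0                       # hypothetical baseline, in hours
circular_half_life = linear_half_life * 1.5   # a 50% increase in half-life

for t in (10, 20, 30):
    print(t,
          round(fraction_remaining(t, linear_half_life), 3),
          round(fraction_remaining(t, circular_half_life), 3))
```

Under these assumptions, at every time point the circular RNA retains a larger fraction than the linear counterpart, and the gap widens over the first few half-lives.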

在一些實施例中,當細胞正在分裂時,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷在細胞中具有半衰期或持久性。在一些實施例中,oRNA在細胞分裂後具有半衰期或持久性。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein has a half-life or persistence in the cell when the cell is dividing. In some embodiments, the oRNA has a half-life or persistence after cell division.

在某些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷在分裂細胞中之半衰期或持久性為大於約10分鐘至約30天,或至少約10分鐘、15分鐘、30分鐘、45分鐘、1小時、2小時、3小時、4小時、5小時、6小時、7小時、8小時、9小時、10小時、11小時、12小時、13小時、14小時、15小時、16小時、17小時、18小時、24小時(1天)、2天、3天、4天、5天、6天、7天、8天、9天、10天、11天、12天、13天、14天、15天、16天、17天、18天、19天、20天、21天、22天、23天、24天、25天、26天、27天、28天、29天、30天、60天或更長或其間任何時間。In certain embodiments, the half-life or persistence of the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein in dividing cells is greater than about 10 minutes to about 30 days, or at least about 10 minutes, 15 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 13 hours, 14 hours, 15 hours, 16 hours, 17 hours, 18 hours, 24 hours (1 day), 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days, 15 days, 16 days, 17 days, 18 days, 19 days, 20 days, 21 days, 22 days, 23 days, 24 days, 25 days, 26 days, 27 days, 28 days, 29 days, 30 days, 60 days or longer, or any time in between.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷調節細胞功能,例如瞬時地或長期。在某些實施例中,細胞功能穩定地發生改變,諸如持續至少約1小時至約30天,或至少約2小時、6小時、12小時、18小時、24小時(1天)、2天、3天、4天、5天、6天、7天、8天、9天、10天、11天、12天、13天、14天、15天、16天、17天、18天天、19天、20天、21天、22天、23天、24天、25天、26天、27天、28天、29天、30天、60天或更長之調節。在某些實施例中,細胞功能瞬時地發生改變,諸如持續不超過約30 min至約7天,或不超過約30分鐘、45分鐘、1小時、2小時、3小時、4小時、5小時、6小時、7小時、8小時、9小時、10小時、11小時、12小時、13小時、14小時、15小時、16小時、17小時、18小時、19小時、20小時、21小時、22小時、23小時、24小時(1天)、36小時(1.5天)、48小時(2天)、60小時(2.5天)、72小時(3天)、4天、5天、6天或7天之調節。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein modulates cellular function, for example transiently or chronically. In certain embodiments, the alteration in cellular function is stable, such as lasting at least about 1 hour to about 30 days, or at least about 2 hours, 6 hours, 12 hours, 18 hours, 24 hours (1 day), 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days, 15 days, 16 days, 17 days, 18 days, 19 days, 20 days, 21 days, 22 days, 23 days, 24 days, 25 days, 26 days, 27 days, 28 days, 29 days, 30 days, 60 days or longer of modulation. In certain embodiments, the cellular function is altered transiently, such as lasting no more than about 30 minutes to about 7 days, or no more than about 30 minutes, 45 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 13 hours, 14 hours, 15 hours, 16 hours, 17 hours, 18 hours, 19 hours, 20 hours, 21 hours, 22 hours, 23 hours, 24 hours (1 day), 36 hours (1.5 days), 48 hours (2 days), 60 hours (2.5 days), 72 hours (3 days), 4 days, 5 days, 6 days, or 7 days of modulation.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷為至少約20個核苷酸、至少約30個核苷酸、至少約40個核苷酸、至少約50個核苷酸、至少約75個核苷酸、至少約100個核苷酸、至少約200個核苷酸、至少約300個核苷酸、至少約400個核苷酸、至少約500個核苷酸、至少約1,000個核苷酸、至少約2,000個核苷酸、至少約5,000個核苷酸、至少約6,000個核苷酸、至少約7,000個核苷酸、至少約8,000個核苷酸、至少約9,000個核苷酸、至少約10,000個核苷酸、至少約12,000個核苷酸、至少約14,000個核苷酸、至少約15,000個核苷酸、至少約16,000個核苷酸、至少約17,000個核苷酸、至少約18,000個核苷酸、至少約19,000個核苷酸或至少約20,000個核苷酸。在一些實施例中,oRNA可具有足夠大小以容納核糖體之結合位點。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein is at least about 20 nucleotides, at least about 30 nucleotides, at least about 40 nucleotides, at least about 50 nucleotides, at least about 75 nucleotides, at least about 100 nucleotides, at least about 200 nucleotides, at least about 300 nucleotides, at least about 400 nucleotides, at least about 500 nucleotides, at least about 1,000 nucleotides, at least about 2,000 nucleotides, at least about 5,000 nucleotides, at least about 6,000 nucleotides, at least about 7,000 nucleotides, at least about 8,000 nucleotides, at least about 9,000 nucleotides, at least about 10,000 nucleotides, at least about 12,000 nucleotides, at least about 14,000 nucleotides, at least about 15,000 nucleotides, at least about 16,000 nucleotides, at least about 17,000 nucleotides, at least about 18,000 nucleotides, at least about 19,000 nucleotides, or at least about 20,000 nucleotides. In some embodiments, the oRNA may be of sufficient size to accommodate a binding site for a ribosome.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷之最大大小可受到封裝及遞送RNA至標靶之能力限制。在一些實施例中,oRNA之大小係足以編碼多肽之長度,因此,至少20,000個核苷酸、至少15,000個核苷酸、至少10,000個核苷酸、至少7,500個核苷酸或至少5,000個核苷酸、至少4,000個核苷酸、至少3,000個核苷酸、至少2,000個核苷酸、至少1,000個核苷酸、至少500個核苷酸、至少400個核苷酸、至少300個核苷酸、至少200個核苷酸、至少100個核苷酸之長度可為有用的。In some embodiments, the maximum size of the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein may be limited by the ability to package and deliver the RNA to the target. In some embodiments, the size of the oRNA is a length sufficient to encode a polypeptide, and thus, a length of at least 20,000 nucleotides, at least 15,000 nucleotides, at least 10,000 nucleotides, at least 7,500 nucleotides, or at least 5,000 nucleotides, at least 4,000 nucleotides, at least 3,000 nucleotides, at least 2,000 nucleotides, at least 1,000 nucleotides, at least 500 nucleotides, at least 400 nucleotides, at least 300 nucleotides, at least 200 nucleotides, at least 100 nucleotides may be useful.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含本文中別處所述之一或多種元件。在一些實施例中,該等元件可藉由間隔序列或連接體彼此隔開。在一些實施例中,該等元件可彼此隔開1個核苷酸、2個核苷酸、約5個核苷酸、約10個核苷酸、約15個核苷酸、約20個核苷酸、約30個核苷酸、約40個核苷酸、約50個核苷酸、約60個核苷酸、約80個核苷酸、約100個核苷酸、約150個核苷酸、約200個核苷酸、約250個核苷酸、約300個核苷酸、約400個核苷酸、約500個核苷酸、約600個核苷酸、約700個核苷酸、約800個核苷酸、約900個核苷酸、約1000個核苷酸,最多約1 kb,至少約1000個核苷酸。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing system, RNA therapeutic agent, and pharmaceutical composition described herein comprises one or more elements described elsewhere herein. In some embodiments, the elements may be separated from each other by a spacer sequence or a linker. In some embodiments, the elements can be separated from each other by 1 nucleotide, 2 nucleotides, about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 30 nucleotides, about 40 nucleotides, about 50 nucleotides, about 60 nucleotides, about 80 nucleotides, about 100 nucleotides, about 150 nucleotides, about 200 nucleotides, about 250 nucleotides, about 300 nucleotides, about 400 nucleotides, about 500 nucleotides, about 600 nucleotides, about 700 nucleotides, about 800 nucleotides, about 900 nucleotides, about 1000 nucleotides, up to about 1 kb, and at least about 1000 nucleotides.

在一些實施例中,一或多個元件彼此鄰接,例如缺乏間隔元件。In some embodiments, one or more elements are adjacent to each other, e.g., lacking a spacing element.

在一些實施例中,一或多個元件為構形柔性的。在一些實施例中,構形柔性係歸因於序列實質上不含二級結構。In some embodiments, one or more elements are conformationally flexible. In some embodiments, conformational flexibility is due to the sequence being substantially free of secondary structure.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含容納核糖體、轉譯或滾環轉譯之結合位點之二級或三級結構。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein comprises a secondary or tertiary structure that accommodates a binding site for a ribosome, translation, or rolling circle translation.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含特定序列特徵。例如,oRNA可包含特定核苷酸組合物。在一些此類實施例中,oRNA可包括一或多個富嘌呤區域(腺嘌呤或鳥苷)。在一些實施例中,oRNA可包括一或多個富AU區域或元件(ARE)。在一些實施例中,oRNA可包括一或多個富腺嘌呤區域。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein comprises specific sequence features. For example, the oRNA may have a particular nucleotide composition. In some such embodiments, the oRNA may include one or more purine-rich regions (adenine or guanosine). In some embodiments, the oRNA may include one or more AU-rich regions or elements (AREs). In some embodiments, the oRNA may include one or more adenine-rich regions.
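As a purely illustrative aid, the sequence features named above (purine-rich, AU-rich, and adenine-rich regions) could be located with a sliding-window scan such as the following sketch. The function name `rich_windows`, the window length, and the fraction threshold are assumptions chosen for illustration; the disclosure does not define a numerical criterion for what counts as a "rich" region.

```python
def rich_windows(rna, bases, window=10, min_fraction=0.8):
    """Return (start_index, fraction) for every window whose fraction of
    nucleotides drawn from `bases` is at least `min_fraction`.

    `window` and `min_fraction` are illustrative assumptions only.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Normalize DNA-style input (T) to RNA (U) so both notations work.
    seq = rna.upper().replace("T", "U")
    wanted = set(bases.upper().replace("T", "U"))
    hits = []
    for i in range(len(seq) - window + 1):
        count = sum(1 for b in seq[i:i + window] if b in wanted)
        fraction = count / window
        if fraction >= min_fraction:
            hits.append((i, fraction))
    return hits


# Purine-rich: bases "AG"; AU-rich: bases "AU"; adenine-rich: bases "A".
purine_hits = rich_windows("CCCCAGGAAGCUUU", "AG", window=6, min_fraction=1.0)
au_hits = rich_windows("GGGAUUUAUUUAGGG", "AU", window=9, min_fraction=1.0)
```

The same function covers all three feature types by changing only the `bases` argument, which keeps the scan independent of any particular definition of the feature.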

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含本文中別處所述之一或多種修飾。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein comprises one or more modifications described elsewhere herein.

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含一或多個表現序列且經組態以在個體細胞中活體內持續表現。在一些實施例中,oRNA經組態,使得細胞中之一或多個表現序列在稍後時間點之表現等於或高於較早時間點。在此類實施例中,一或多個表現序列之表現可維持於相對穩定之水準下或可隨時間增加。表現序列之表現可在一段延長時期內相對穩定。例如,在一些情況下,一或多個表現序列在細胞中之表現在至少7、8、9、10、12、14、16、18、20、22、23天或更多天之時期內不會減少50%、45%、40%、35%、30%、25%、20%、15%、10%或5%。在一些情況下,一或多個表現序列在細胞中之表現在至少7、8、9、10、12、14、16、18、20、22、23天或更多天內維持於變化不超過50%、45%、40%、35%、30%、25%、20%、15%、10%或5%之水準下。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein comprises one or more expression sequences and is configured for persistent expression in vivo in a cell of a subject. In some embodiments, the oRNA is configured such that expression of the one or more expression sequences in the cell at a later time point is equal to or higher than at an earlier time point. In such embodiments, expression of the one or more expression sequences can be maintained at a relatively stable level or can increase over time. Expression of the expression sequences can be relatively stable over an extended period of time. For example, in some cases, expression of the one or more expression sequences in the cell does not decrease by 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% over a period of at least 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 23, or more days. In some cases, expression of the one or more expression sequences in the cell is maintained at a level that does not vary by more than 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% for at least 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 23, or more days.

調節元件 Regulatory Elements

在一些實施例中,本文所述之基於LNP之核鹼基編輯系統、RNA治療劑及醫藥組合物的環狀RNA有效載荷包含一或多種調節元件。如本文所用,「調節元件」係修飾表現序列(例如,編碼核鹼基編輯系統或治療蛋白之核苷酸序列,亦即相關編碼區(CROI))之表現的序列。調節元件可包括位於與環狀RNA有效載荷上編碼之相關編碼區相鄰處的序列。調節元件能可操作地連接至編碼相關編碼區(例如,核鹼基編輯系統或治療性多肽)之環狀RNA之核苷酸序列。In some embodiments, the circular RNA payload of the LNP-based nucleobase editing systems, RNA therapeutics, and pharmaceutical compositions described herein comprises one or more regulatory elements. As used herein, a "regulatory element" is a sequence that modifies expression of an expression sequence (e.g., a nucleotide sequence encoding a nucleobase editing system or a therapeutic protein, i.e., a coding region of interest (CROI)). The regulatory element may include a sequence located adjacent to the coding region of interest encoded on the circular RNA payload. The regulatory element may be operably linked to the nucleotide sequence of the circular RNA that encodes the coding region of interest (e.g., a nucleobase editing system or a therapeutic polypeptide).

在一些實施例中,如與不存在調節元件時之表現量相比,調節元件可增加環狀RNA有效載荷上編碼之相關編碼區之表現量。In some embodiments, a regulatory element can increase the amount of expression of the coding region of interest encoded on the circular RNA payload, as compared to the amount expressed in the absence of the regulatory element.

在一些實施例中,調節元件可包含選擇性起始或活化環狀RNA有效載荷上編碼之相關編碼序列之轉譯的序列。In some embodiments, the regulatory element may comprise a sequence that selectively initiates or activates translation of the coding sequence of interest encoded on the circular RNA payload.

在一些實施例中,調節元件可包含起始oRNA或有效載荷或貨物之降解的序列。起始降解之序列的非限制性實例包括但不限於核糖開關適體酶及miRNA結合位點。In some embodiments, the regulatory element may comprise a sequence that initiates degradation of the oRNA or of the payload or cargo. Non-limiting examples of sequences that initiate degradation include riboswitch aptazymes and miRNA binding sites.

在一些實施例中,調節元件可調節oRNA上編碼之相關編碼區之轉譯。調節可產生相關編碼區之表現的增加(增強子)或減少(抑制因子)。調節元件可位於與CROI相鄰處(例如,在CROI之一側或兩側)。In some embodiments, the regulatory element can modulate translation of the coding region of interest encoded on the oRNA. The modulation can increase (enhancer) or decrease (suppressor) expression of the coding region of interest. The regulatory element may be located adjacent to the CROI (e.g., on one side or both sides of the CROI).

轉譯起始序列 Translation Initiation Sequences

在一些實施例中,轉譯起始序列充當調節元件。在一些實施例中,轉譯起始序列包含AUG/ATG密碼子。在一些實施例中,轉譯起始序列包含任何真核起始密碼子,諸如但不限於AUG/ATG、CUG/CTG、GUG/GTG、UUG/TTG、ACG、AUC/ATC、AUU、AAG、AUA/ATA或AGG。在一些實施例中,轉譯起始序列包含Kozak序列。在一些實施例中,轉譯在選擇性條件(例如,應力誘導條件)下,在替代轉譯起始序列(例如,除AUG/ATG密碼子以外之轉譯起始序列)處開始。作為非限制性實例,環狀多核糖核苷酸之轉譯可在替代轉譯起始序列(諸如ACG)處開始。作為另一非限制性實例,環狀多核糖核苷酸轉譯可在替代轉譯起始序列CUG/CTG處開始。作為另一非限制性實例,轉譯可在替代轉譯起始序列GUG/GTG處開始。作為另一非限制性實例,轉譯可在重複相關非AUG (RAN)序列,諸如包括重複RNA (例如CGG、GGGGCC、CAG或CTG)之短延伸段之替代轉譯起始序列處開始。In some embodiments, a translation initiation sequence functions as a regulatory element. In some embodiments, the translation initiation sequence comprises an AUG/ATG codon. In some embodiments, the translation initiation sequence comprises any eukaryotic start codon, such as, but not limited to, AUG/ATG, CUG/CTG, GUG/GTG, UUG/TTG, ACG, AUC/ATC, AUU, AAG, AUA/ATA, or AGG. In some embodiments, the translation initiation sequence comprises a Kozak sequence. In some embodiments, translation begins at an alternative translation initiation sequence (e.g., a translation initiation sequence other than the AUG/ATG codon) under selective conditions (e.g., stress-induced conditions). As a non-limiting example, translation of the circular polyribonucleotide may begin at an alternative translation initiation sequence such as ACG. As another non-limiting example, translation of the circular polyribonucleotide may begin at the alternative translation initiation sequence CUG/CTG. As another non-limiting example, translation may begin at the alternative translation initiation sequence GUG/GTG. As another non-limiting example, translation may begin at a repeat-associated non-AUG (RAN) sequence, such as an alternative translation initiation sequence that includes short stretches of repetitive RNA (e.g., CGG, GGGGCC, CAG, or CTG).

在一些實施例中,oRNA編碼多肽或肽且可包含轉譯起始序列。轉譯起始序列可包含但不限於起始密碼子、非編碼起始密碼子、Kozak序列或Shine-Dalgarno序列。轉譯起始序列可位於與有效載荷或貨物相鄰處(例如,在相關編碼區之一側或兩側)。In some embodiments, the oRNA encodes a polypeptide or peptide and may comprise a translation initiation sequence. The translation initiation sequence may comprise, but is not limited to, a start codon, a non-coding start codon, a Kozak sequence, or a Shine-Dalgarno sequence. The translation initiation sequence may be located adjacent to the payload or cargo (e.g., on one side or both sides of the coding region of interest).

在一些實施例中,轉譯起始序列向oRNA提供構形柔性。在一些實施例中,轉譯起始序列在oRNA之實質上單股區域內。In some embodiments, the translation initiation sequence provides conformational flexibility to the oRNA. In some embodiments, the translation initiation sequence is in a substantially single-stranded region of the oRNA.

oRNA可包括超過1個起始密碼子,諸如但不限於至少2個、至少3個、至少4個、至少5個、至少6個、至少7個、至少8個、至少9個、至少10個、至少11個、至少12個、至少13個、至少14個、至少15個或超過15個起始密碼子。轉譯可在第一起始密碼子上起始或可在第一起始密碼子下游起始。The oRNA may include more than one start codon, such as but not limited to at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15 or more than 15 start codons. Translation may start at the first start codon or may start downstream of the first start codon.
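For illustration only, candidate start codons of the kinds listed above could be enumerated in a sequence with a short scan such as the sketch below. The function name `find_start_codons` and the `include_alternative` flag are hypothetical; the codon sets are taken from the lists in this section, and the scan says nothing about which candidate is actually used for initiation in a cell.

```python
CANONICAL_START = "AUG"
# Alternative eukaryotic start codons as listed in the text above (RNA notation).
ALTERNATIVE_STARTS = {"CUG", "GUG", "UUG", "ACG", "AUC", "AUU", "AAG", "AUA", "AGG"}


def find_start_codons(rna, include_alternative=False):
    """Return (position, codon) for each candidate start codon, in 5'-to-3' order.

    DNA-style input (ATG, CTG, ...) is accepted and converted to RNA notation.
    """
    seq = rna.upper().replace("T", "U")
    allowed = {CANONICAL_START}
    if include_alternative:
        allowed |= ALTERNATIVE_STARTS
    return [
        (i, seq[i:i + 3])
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in allowed
    ]


# Hypothetical example sequence containing two AUG codons and one CUG codon.
example = "GCCACCAUGGCGCUGAUG"
canonical_only = find_start_codons(example)
with_alternatives = find_start_codons(example, include_alternative=True)
```

Because translation may begin at the first start codon or at one downstream of it, the function returns every candidate rather than only the first.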

在一些實施例中,oRNA可在並非第一起始密碼子之密碼子(例如,AUG)處起始。環狀多核糖核苷酸之轉譯可在替代轉譯起始序列,諸如但不限於ACG、AGG、AAG、CUG/CTG、GUG/GTG、AUA/ATA、AUU/ATT、UUG/TTG處起始。在一些實施例中,轉譯在選擇性條件(例如,應力誘導條件)下,在替代轉譯起始序列處開始。作為非限制性實例,oRNA之轉譯可在替代轉譯起始序列(諸如ACG)處開始。作為另一非限制性實例,oRNA轉譯可在替代轉譯起始序列CUG/CTG處開始。作為另一非限制性實例,oRNA轉譯可在替代轉譯起始序列GTG/GUG處開始。作為另一非限制性實例,oRNA可在重複相關非AUG (RAN)序列,諸如包括重複RNA (例如CGG、GGGGCC、CAG、CTG)之短延伸段之替代轉譯起始序列處開始轉譯。In some embodiments, translation of the oRNA may initiate at a codon that is not the first start codon (e.g., AUG). Translation of the circular polyribonucleotide may initiate at an alternative translation initiation sequence, such as, but not limited to, ACG, AGG, AAG, CUG/CTG, GUG/GTG, AUA/ATA, AUU/ATT, or UUG/TTG. In some embodiments, translation begins at an alternative translation initiation sequence under selective conditions (e.g., stress-induced conditions). As a non-limiting example, translation of the oRNA may begin at an alternative translation initiation sequence such as ACG. As another non-limiting example, translation of the oRNA may begin at the alternative translation initiation sequence CUG/CTG. As another non-limiting example, translation of the oRNA may begin at the alternative translation initiation sequence GTG/GUG. As another non-limiting example, translation of the oRNA may begin at a repeat-associated non-AUG (RAN) sequence, such as an alternative translation initiation sequence that includes short stretches of repetitive RNA (e.g., CGG, GGGGCC, CAG, CTG).

IRES 序列 IRES Sequences

在一些實施例中,本文所述之oRNA包含能夠銜接真核核糖體之內部核糖體進入位點(IRES)元件。在一些實施例中,IRES元件為至少約5個核苷酸、至少約8個核苷酸、至少約9個核苷酸、至少約10個核苷酸、至少約15個核苷酸、至少約20個核苷酸、至少約25個核苷酸、至少約30個核苷酸、至少約40個核苷酸、至少約50個核苷酸、至少約100個核苷酸、至少約200個核苷酸、至少約250個核苷酸、至少約350個核苷酸或至少約500個核苷酸。在一實施例中,IRES元件源自生物體之DNA,該生物體包括但不限於病毒、哺乳動物及果蠅。此類病毒DNA可源自但不限於微小RNA病毒互補DNA (cDNA)、腦心肌炎病毒(EMCV) cDNA及脊髓灰白質炎病毒cDNA。在一實施例中,產生IRES元件之果蠅DNA包括但不限於來自黑腹果蠅之觸角足基因。In some embodiments, the oRNA described herein comprises an internal ribosome entry site (IRES) element capable of engaging a eukaryotic ribosome. In some embodiments, the IRES element is at least about 5 nucleotides, at least about 8 nucleotides, at least about 9 nucleotides, at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, at least about 25 nucleotides, at least about 30 nucleotides, at least about 40 nucleotides, at least about 50 nucleotides, at least about 100 nucleotides, at least about 200 nucleotides, at least about 250 nucleotides, at least about 350 nucleotides, or at least about 500 nucleotides. In one embodiment, the IRES element is derived from the DNA of an organism, including but not limited to a virus, a mammal, and Drosophila. Such viral DNA may be derived from, but is not limited to, picornavirus complementary DNA (cDNA), encephalomyocarditis virus (EMCV) cDNA, and poliovirus cDNA. In one embodiment, the Drosophila DNA from which the IRES element is derived includes, but is not limited to, the Antennapedia gene from Drosophila melanogaster.

在一些實施例中,IRES元件至少部分地源自病毒,例如,其可源自病毒IRES元件,諸如ABPV_IGRpred、AEV、ALPV_IGRpred、BQCV_IGRpred、BVDV1_1-385、BVDV1_29-391、CrPV_5NCR、CrPV_IGR、crTMV_IREScp、crTMV_IRESmp75、crTMV_IRESmp228、crTMV_IREScp、crTMV_IREScp、CSFV、CVB3、DCV_IGR、EMCV-R、EoPV_5NTR、ERAV 245-961、ERBV 162-920、EV71_1-748、FeLV-Notch2、FMDV_C_型、GBV-A、GBV-B、GBV-C、gypsy_env、gypsyD5、gypsyD2、HAV_HM175、HCV_1a_型、HiPV_IGRpred、HIV-1、HoCV1_IGRpred、HRV-2、IAPV_IGRpred、idefix、KBV_IGRpred、LINE-1_ORF1_-101_至_-1、LINE-1_ORF1-302_至_-202、LINE-1_ORF2-138_至_-86、LINE-1_ORF1_-44至_-1、PSIV_IGR、PV_1型_Mahoney、PV_3型_Leon、REV-A、RhPV_5NCR、RhPV_IGR、SINV1_IGRpred、SV40_661-830、TMEV、TMV_UI_IRESmp228、TRV_5NTR、TrV_IGR或TSV_IGR。在一些實施例中,IRES元件至少部分地源自細胞IRES,諸如AML1/RUNX1、Antp-D、Antp-DE、Antp-CDE、Apaf-1、Apaf-1、AQP4、AT1R_var1、AT1R_var2、AT1R_var3、AT1R_var4、BAG1_p36delta236 nt、BAG1_p36、BCL2、BiP_-222_-3、c-IAP1_285-1399、c-IAP1_1313-1462、c-jun、c-myc、Cat-1224、CCND1、DAPS、eIF4G、eIF4GI-ext、eIF4GII、eIF4GII-long、ELG1、ELH、FGF1A、FMR1、Gtx-133-141、Gtx-1-166、Gtx-1-120、Gtx-1-196、hairless、HAP4、HIF1a、hSNM1、Hsp101、hsp70、hsp70、Hsp90、IGF2_leader2、Kv1.4_1.2、L-myc、LamB1_-335_-1、LEF1、MNT_75-267、MNT_36-160、MTG8a、MYB、MYT2_997-1152、n-MYC、NDST1、NDST2、NDST3、NDST4L、NDST4S、NRF_-653_-17、NtHSF1、ODC1、p27kip1、03_128-269、PDGF2/c-sis、Pim-1、PITSLRE_p58、Rbm3、reaper、Scamper、TFIID、TIF4631、Ubx_1-966、Ubx_373-961、UNR、Ure2、UtrA、VEGF-A-133-1、XIAP_5-464、XIAP_305-466或YAP1。In some embodiments, the IRES element is at least partially derived from a virus, for example, it can be derived from a viral IRES element, such as ABPV_IGRpred, AEV, ALPV_IGRpred, BQCV_IGRpred, BVDV1_1-385, BVDV1_29-391, CrPV_5NCR, CrPV_IGR, crTMV_IREScp, crTMV_IRESmp75, crTMV_IRESmp228, crTMV_IREScp, crTMV_IREScp, CSFV, CVB3, DCV_IGR, EMCV-R, EoPV_5NTR, ERAV 245-961, ERBV 162-920, EV71_1-748, FeLV-Notch2, FMDV_C_type, GBV-A, GBV-B, GBV-C, gypsy_env, gypsyD5, gypsyD2, HAV_HM175, HCV_1a_type, HiPV_IGRpred, HIV-1, HoCV1_IGRpred, HRV-2, IAPV_IGRpred, idefix, KBV_IGRpred, LINE-1_ORF1_-101_to_-1, LINE-1_ORF1-302_to_-202, LINE-1_ORF2-138_to_-86, LINE-1_ORF1_-44_to_-1, PSIV_IGR, PV_type 1_Mahoney, PV_type 3_Leon, REV-A, RhPV_5NCR, RhPV_IGR, SINV1_IGRpred, SV40_661-830, TMEV, TMV_UI_IRESmp228, TRV_5NTR, TrV_IGR, or TSV_IGR. In some embodiments, the IRES element is derived at least in part from a cellular IRES, such as AML1/RUNX1, Antp-D, Antp-DE, Antp-CDE, Apaf-1, Apaf-1, AQP4, AT1R_var1, AT1R_var2, AT1R_var3, AT1R_var4, BAG1_p36delta236 nt, BAG1_p36, BCL2, BiP_-222_-3, c-IAP1_285-1399, c-IAP1_1313-1462, c-jun, c-myc, Cat-1224, CCND1, DAPS, eIF4G, eIF4GI-ext, eIF4GII, eIF4GII-long, ELG1, ELH, FGF1A, FMR1, Gtx-133-141, Gtx-1-166, Gtx-1-120, Gtx-1-196, hairless, HAP4, HIF1a, hSNM1, Hsp101, hsp70, hsp70, Hsp90, IGF2_leader2, Kv1.4_1.2, L-myc, LamB1_-335_-1, LEF1, MNT_75-267, MNT_36-160, MTG8a, MYB, MYT2_997-1152, n-MYC, NDST1, NDST2, NDST3, NDST4L, NDST4S, NRF_-653_-17, NtHSF1, ODC1, p27kip1, 03_128-269, PDGF2/c-sis, Pim-1, PITSLRE_p58, Rbm3, reaper, Scamper, TFIID, TIF4631, Ubx_1-966, Ubx_373-961, UNR, Ure2, UtrA, VEGF-A-133-1, XIAP_5-464, XIAP_305-466, or YAP1.

在另一實施例中,IRES為來自柯薩奇病毒B3 (CVB3)之IRES序列,蛋白質編碼區編碼Gaussia螢光素酶(Gluc)且間隔序列為聚A-C。In another embodiment, the IRES is an IRES sequence from Coxsackievirus B3 (CVB3), the protein coding region encodes Gaussia luciferase (Gluc), and the spacer sequences are poly A-C.

在一些實施例中,IRES (若存在)為至少約50個核苷酸長。在一實施例中,載體包含IRES,該IRES包含天然序列。在一實施例中,載體包含IRES,該IRES包含合成序列。In some embodiments, the IRES, if present, is at least about 50 nucleotides long. In one embodiment, the vector comprises an IRES comprising a native sequence. In one embodiment, the vector comprises an IRES comprising a synthetic sequence.

IRES可充當唯一的核糖體結合位點,或可充當mRNA之多個核糖體結合位點之一。含有超過一個功能性核糖體結合位點之多核苷酸分子可編碼由核糖體獨立轉譯之數種肽或多肽(例如,多順反子mRNA)。當多核苷酸具有IRES時,進一步視情況提供第二可轉譯區。可根據本揭示案使用之IRES序列的實例包括但不限於來自微小RNA病毒(例如,FMDV)、瘟病毒(CFFV)、脊髓灰白質炎病毒(PV)、腦心肌炎病毒(EMCV)、口蹄疫病毒(FMDV)、C型肝炎病毒(HCV)、經典豬瘟病毒(CSFV)、鼠科動物白血病病毒(MLV)、猿免疫缺乏病毒(SIV)或蟋蟀麻痺病毒(CrPV)之彼等IRES序列。An IRES can serve as the sole ribosome binding site, or can serve as one of multiple ribosome binding sites of an mRNA. A polynucleotide molecule containing more than one functional ribosome binding site can encode several peptides or polypeptides that are translated independently by ribosomes (e.g., a polycistronic mRNA). When a polynucleotide is provided with an IRES, a second translatable region is optionally also provided. Examples of IRES sequences that can be used according to the present disclosure include, without limitation, those from picornaviruses (e.g., FMDV), pestiviruses (CFFV), poliovirus (PV), encephalomyocarditis virus (EMCV), foot-and-mouth disease virus (FMDV), hepatitis C virus (HCV), classical swine fever virus (CSFV), murine leukemia virus (MLV), simian immunodeficiency virus (SIV), or cricket paralysis virus (CrPV).

終止元件 Termination Elements

在一些實施例中,oRNA包括一或多個相關編碼區(亦即,亦稱為產物表現序列),其編碼相關多肽,包括但不限於核鹼基編輯系統及治療蛋白。在各個實施例中,產物表現序列可能具有或可能不具有終止元件。In some embodiments, the oRNA includes one or more coding regions of interest (also referred to as product expression sequences) that encode polypeptides of interest, including but not limited to nucleobase editing systems and therapeutic proteins. In various embodiments, a product expression sequence may or may not have a termination element.

在一些實施例中,oRNA包括一或多個產物表現序列,其缺乏終止元件,使得oRNA連續經轉譯。In some embodiments, the oRNA includes one or more product expression sequences that lack termination elements, such that the oRNA is continuously translated.

排除終止元件可導致編碼肽或多肽之滾環轉譯或連續表現,因為核糖體不會停滯或脫落。在此類實施例中,滾環轉譯藉由產物表現序列連續地進行表現。Excluding the termination element can result in rolling circle translation, or continuous expression, of the encoded peptide or polypeptide, because the ribosome does not stall or fall off. In such embodiments, rolling circle translation expresses the product expression sequence continuously.
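The relationship between termination elements and continuous translation can be illustrated with a simplified sketch that walks codon by codon around a circular sequence and reports where the first stop codon would be read. This is an assumption-laden model, not a method described in the disclosure: it treats only the standard stop codons UAA, UAG, and UGA as termination elements, ignores stagger elements and ribosome stalling, and the function name `codons_until_stop` is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard stop codons; a simplifying assumption


def codons_until_stop(circular_rna, start):
    """Walk codons around a circular RNA beginning at index `start`.

    Returns the number of sense codons read before the first stop codon,
    or None if no stop codon is ever reached (rolling circle competent
    in this simplified model).
    """
    seq = circular_rna.upper().replace("T", "U")
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must contain at least one codon")
    # After n codons (3n nucleotides) the ribosome is back at `start` in the
    # same frame, so checking n codons covers every frame it can visit.
    for k in range(n):
        i = (start + 3 * k) % n
        codon = seq[i] + seq[(i + 1) % n] + seq[(i + 2) % n]
        if codon in STOP_CODONS:
            return k
    return None


# 9-nt circle, length divisible by 3, no in-frame stop: translation never terminates.
endless = codons_until_stop("AUGGCCAAA", 0)
# 10-nt circle: the frame shifts on each lap and a stop is read across the junction.
terminates = codons_until_stop("AUGGCCAAUA", 0)
```

When the circle length is not a multiple of three, each lap is read in a different frame, which is why the walk continues for up to `n` codons rather than a single lap.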

在一些實施例中,oRNA中之一或多個產物表現序列包含終止元件。在一些實施例中,oRNA中並非所有產物表現序列均包含終止元件。在此類情況下,當核糖體遇到終止元件且終止轉譯時,產物表現序列可自核糖體上脫落。In some embodiments, one or more product expression sequences in the oRNA comprise a termination element. In some embodiments, not all of the product expression sequences in the oRNA comprise a termination element. In such cases, the product expression sequence can fall off the ribosome when the ribosome encounters the termination element and terminates translation.

滾環轉譯 Rolling Circle Translation

在一些實施例中,一旦oRNA之轉譯經起始,與oRNA結合之核糖體就不會在完成oRNA之至少一輪轉譯之前自oRNA脫離。在一些實施例中,如本文所述之oRNA勝任滾環轉譯。在一些實施例中,在滾環轉譯期間,一旦oRNA之轉譯經起始,與oRNA結合之核糖體就不會在完成oRNA之至少2輪、至少3輪、至少4輪、至少5輪、至少6輪、至少7輪、至少8輪、至少9輪、至少10輪、至少11輪、至少12輪、至少13輪、至少14輪、至少15輪、至少20輪、至少30輪、至少40輪、至少50輪、至少60輪、至少70輪、至少80輪、至少90輪、至少100輪、至少150輪、至少200輪、至少250輪、至少500輪、至少1000輪、至少1500輪、至少2000輪、至少5000輪、至少10000輪、至少10^5輪或至少10^6輪轉譯之前自oRNA脫離。In some embodiments, once translation of the oRNA is initiated, the ribosome bound to the oRNA does not disengage from the oRNA before completing at least one round of translation of the oRNA. In some embodiments, the oRNA as described herein is competent for rolling circle translation. In some embodiments, during rolling circle translation, once translation of the oRNA is initiated, the ribosome bound to the oRNA does not disengage from the oRNA before completing at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, at least 250, at least 500, at least 1000, at least 1500, at least 2000, at least 5000, at least 10000, at least 10^5, or at least 10^6 rounds of translation of the oRNA.

在一些實施例中,oRNA之滾環轉譯導致生成自oRNA之超過一輪轉譯轉譯的多肽。在一些實施例中,oRNA包含交錯元件,且oRNA之滾環轉譯導致生成自oRNA之單輪轉譯或少於單輪轉譯生成的多肽產物。In some embodiments, rolling circle translation of the oRNA produces a polypeptide translated from more than one round of translation of the oRNA. In some embodiments, the oRNA comprises a stagger element, and rolling circle translation of the oRNA produces a polypeptide product generated from a single round of translation, or less than a single round of translation, of the oRNA.

環化 Circularization

在一實施例中,線性RNA可經環化或多聯體化。在一些實施例中,線性RNA可在調配及/或遞送之前在 活體外環化。在一些實施例中,線性RNA可在細胞內環化。 In one embodiment, the linear RNA can be circularized or concatemerized. In some embodiments, the linear RNA can be circularized in vitro prior to formulation and/or delivery. In some embodiments, the linear RNA can be circularized in cells.

在一些實施例中,環化或多聯體化機制可藉由至少3種不同之徑發生:1)化學,2)酶,及3)核酶催化。新近形成之5'-/3'-鍵聯可為分子內或分子間的。In some embodiments, the cyclization or concatemerization mechanism can occur by at least 3 different pathways: 1) chemical, 2) enzymatic, and 3) ribozyme catalysis. The newly formed 5'-/3'-linkage can be intramolecular or intermolecular.

在第一種路徑中,核酸之5'端及3'端含有化學反應性基團,當該等基團緊挨在一起時,在分子之5'端與3'端之間形成新共價鍵聯。5'端可含有NHS-酯反應性基團且3'端可含有3'-胺基封端之核苷酸,使得在有機溶劑中,合成mRNA分子之3'端的3'-胺基封端之核苷酸將經歷5'-NHS-酯部分上之親核攻擊,從而形成新5'-/3'-醯胺鍵。In the first pathway, the 5' and 3' ends of the nucleic acid contain chemically reactive groups that, when brought together, form a new covalent bond between the 5' and 3' ends of the molecule. The 5' end may contain an NHS-ester reactive group and the 3' end may contain a 3'-amine terminated nucleotide, such that in an organic solvent, the 3'-amine terminated nucleotide at the 3' end of the synthetic mRNA molecule will undergo nucleophilic attack on the 5'-NHS-ester moiety, thereby forming a new 5'-/3'-amide bond.

在第二種路徑中,T4 RNA連接酶可用於將5'-磷酸化核酸分子酶促連接至核酸之3'-羥基,從而形成新磷酸二酯鍵聯。在一例示性反應中,根據製造商之方案,將µg量之核酸分子與1-10單位之T4 RNA連接酶(New England Biolabs, Ipswich, MA)在37℃下培育1小時。該接合反應可在能夠與並置之5'-及3'-區域進行鹼基配對的夾板寡核苷酸存在下發生,以輔助酶促接合反應。In the second pathway, T4 RNA ligase can be used to enzymatically ligate a 5'-phosphorylated nucleic acid molecule to the 3'-hydroxyl group of a nucleic acid, forming a new phosphodiester linkage. In an exemplary reaction, a µg quantity of the nucleic acid molecule is incubated with 1-10 units of T4 RNA ligase (New England Biolabs, Ipswich, MA) at 37°C for 1 hour according to the manufacturer's protocol. The ligation reaction may occur in the presence of a splint oligonucleotide capable of base pairing with both the 5' and 3' regions in juxtaposition, to assist the enzymatic ligation reaction.

在第三種路徑中,cDNA模板之5'端或3'端編碼連接酶核酶序列,使得在活體外轉錄期間,所得核酸分子可含有能夠將核酸分子之5'端接合至核酸分子之3'端的活性核酶序列。該連接酶核酶可源自組I內含子、D型肝炎病毒、髮夾核酶,或可藉由SELEX (指數富集配位體之系統進化)加以選擇。核酶連接酶反應可在0與37℃之間的溫度下耗時1至24小時。In the third pathway, either the 5' or the 3' end of the cDNA template encodes a ligase ribozyme sequence such that, during in vitro transcription, the resulting nucleic acid molecule can contain an active ribozyme sequence capable of ligating the 5' end of the nucleic acid molecule to its 3' end. The ligase ribozyme may be derived from a group I intron, hepatitis D virus, or a hairpin ribozyme, or may be selected by SELEX (systematic evolution of ligands by exponential enrichment). The ribozyme ligase reaction may take 1 to 24 hours at temperatures between 0 and 37°C.

在一些實施例中,oRNA經由線性RNA之環化製得。In some embodiments, oRNA is produced by circularization of linear RNA.

在一些實施例中,以下元件可操作地彼此連接且在一些實施例中,依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.)蛋白質編碼或非編碼區,d.)含有5'剪接位點二核苷酸之5'組I內含子片段,及e.) 3'同源臂。在某些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。在一些實施例中,生物活性RNA為例如miRNA海綿或長非編碼RNA。In some embodiments, the following elements are operably linked to each other and, in some embodiments, are arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) protein coding or non-coding region, d.) 5' group I intron fragment containing 5' splice site dinucleotide, and e.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells. In some embodiments, the biologically active RNA is, for example, a miRNA sponge or a long non-coding RNA.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.)視情況,5'間隔序列,d.)視情況,內部核糖體進入位點(IRES),e.)蛋白質編碼或非編碼區,f.)視情況,3'間隔序列,g.)含有5'剪接位點二核苷酸之5'組I內含子片段,及h.) 3'同源臂。在某些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) optionally, 5' spacer sequence, d.) optionally, internal ribosome entry site (IRES), e.) protein coding or non-coding region, f.) optionally, 3' spacer sequence, g.) 5' group I intron fragment containing 5' splice site dinucleotide, and h.) 3' homology arm. In certain embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.) 5'間隔序列,d.)內部核糖體進入位點(IRES),e.)蛋白質編碼或非編碼區,f.)含有5'剪接位點二核苷酸之5'組I內含子片段,及g.) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) 5' spacer sequence, d.) internal ribosome entry site (IRES), e.) protein coding or non-coding region, f.) 5' group I intron fragment containing 5' splice site dinucleotide, and g.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.) 5'間隔序列,d.)蛋白質編碼或非編碼區,e.) 3'間隔序列,f.)含有5'剪接位點二核苷酸之5'組I內含子片段,及g.) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) 5' spacer sequence, d.) protein coding or non-coding region, e.) 3' spacer sequence, f.) 5' group I intron fragment containing 5' splice site dinucleotide, and g.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.)內部核糖體進入位點(IRES),d.)蛋白質編碼或非編碼區,e.) 3'間隔序列,f)含有5'剪接位點二核苷酸之5'組I內含子片段,及g.) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) internal ribosome entry site (IRES), d.) protein coding or non-coding region, e.) 3' spacer sequence, f) 5' group I intron fragment containing 5' splice site dinucleotide, and g.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.)蛋白質編碼或非編碼區,d.) 3'間隔序列,e.)含有5'剪接位點二核苷酸之5'組I內含子片段,及f.) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) protein coding or non-coding region, d.) 3' spacer sequence, e.) 5' group I intron fragment containing 5' splice site dinucleotide, and f.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.) 5'間隔序列,d.)蛋白質編碼或非編碼區,e.)含有5'剪接位點二核苷酸之5'組I內含子片段,及f.) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) 5' spacer sequence, d.) protein coding or non-coding region, e.) 5' group I intron fragment containing 5' splice site dinucleotide, and f.) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

在一些實施例中,以下元件可操作地彼此連接且依以下次序排列:a.) 5'同源臂,b.)含有3'剪接位點二核苷酸之3'組I內含子片段,c.)內部核糖體進入位點(IRES),d.)蛋白質編碼或非編碼區,e.)含有5'剪接位點二核苷酸之5'組I內含子片段,及f) 3'同源臂。在一些實施例中,該載體允許產生在真核細胞內部可轉譯及/或具有生物活性之環狀RNA。In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing 3' splice site dinucleotide, c.) internal ribosome entry site (IRES), d.) protein coding or non-coding region, e.) 5' group I intron fragment containing 5' splice site dinucleotide, and f) 3' homology arm. In some embodiments, the vector allows the production of circular RNA that is translatable and/or biologically active inside eukaryotic cells.

In some embodiments, the following elements are operably linked to each other and arranged in the following order: a.) 5' homology arm, b.) 3' group I intron fragment containing a 3' splice site dinucleotide, c.) 5' spacer sequence, d.) internal ribosome entry site (IRES), e.) protein coding or non-coding region, f.) 3' spacer sequence, g.) 5' group I intron fragment containing a 5' splice site dinucleotide, and h.) 3' homology arm. In some embodiments, the vector allows production of a circular RNA that is translatable and/or biologically active inside eukaryotic cells.
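
The eight-element arrangement above can be illustrated with a minimal Python sketch that concatenates the elements in the recited 5'-to-3' order. This is only an illustration under stated assumptions: the names `ELEMENT_ORDER`, `LENGTH_LIMITS` and `assemble_precursor` are hypothetical, the sequences are placeholders rather than real intron, IRES or coding sequences, and the length windows are taken from the ranges this description gives for some embodiments of the homology arms (about 5-50 nucleotides) and spacer sequences (at least 7 and no more than 100 nucleotides).

```python
# Hypothetical element names, listed in the order a.) through h.) of the text.
ELEMENT_ORDER = [
    "homology_arm_5p",      # a.) 5' homology arm
    "intron_fragment_3p",   # b.) 3' group I intron fragment
    "spacer_5p",            # c.) 5' spacer sequence
    "ires",                 # d.) internal ribosome entry site
    "coding_region",        # e.) protein coding or non-coding region
    "spacer_3p",            # f.) 3' spacer sequence
    "intron_fragment_5p",   # g.) 5' group I intron fragment
    "homology_arm_3p",      # h.) 3' homology arm
]

# Illustrative length windows in nucleotides (assumed from ranges in the text).
LENGTH_LIMITS = {
    "homology_arm_5p": (5, 50),
    "homology_arm_3p": (5, 50),
    "spacer_5p": (7, 100),
    "spacer_3p": (7, 100),
}


def assemble_precursor(parts):
    """Concatenate the named elements in the recited order after basic checks."""
    missing = [name for name in ELEMENT_ORDER if name not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    for name, (low, high) in LENGTH_LIMITS.items():
        length = len(parts[name])
        if not low <= length <= high:
            raise ValueError(f"{name} is {length} nt, expected {low}-{high} nt")
    return "".join(parts[name] for name in ELEMENT_ORDER)
```

A caller would pass a dictionary mapping each element name to its sequence; any element whose length falls outside its window raises `ValueError` instead of producing a precursor.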

In one embodiment, the 3' group I intron fragment and/or the 5' group I intron fragment is from the cyanobacterium Anabaena pre-tRNA-Leu gene or the T4 phage Td gene.

In one embodiment, the 3' group I intron fragment and/or the 5' group I intron fragment is from the cyanobacterium Anabaena pre-tRNA-Leu gene.

In one embodiment, the protein coding region encodes a protein of eukaryotic or prokaryotic origin. In another embodiment, the protein coding region encodes a human protein or a non-human protein. In some embodiments, the protein coding region encodes one or more antibodies. For example, in some embodiments, the protein coding region encodes a human antibody. In one embodiment, the protein coding region encodes a protein selected from the following: hFIX, SP-B, VEGF-A, human methylmalonyl-CoA mutase (hMUT), CFTR, cancer self-antigens, and additional gene editing enzymes, such as Cpf1, zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs). In another embodiment, the protein coding region encodes a protein for therapeutic use. In one embodiment, the human antibody encoded by the protein coding region is an anti-HIV antibody. In one embodiment, the antibody encoded by the protein coding region is a bispecific antibody. In one embodiment, the bispecific antibody is specific for CD19 and CD22. In another embodiment, the bispecific antibody is specific for CD3 and CLDN6. In one embodiment, the protein coding region encodes a protein for diagnostic use. In one embodiment, the protein coding region encodes Gaussia luciferase (Gluc), firefly luciferase (Fluc), enhanced green fluorescent protein (eGFP), human erythropoietin (hEPO), or Cas9 endonuclease.

In one embodiment, the 5' homology arm is about 5-50 nucleotides long. In another embodiment, the 5' homology arm is about 9-19 nucleotides long. In some embodiments, the 5' homology arm is at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 nucleotides long. In some embodiments, the 5' homology arm is no more than 50, 45, 40, 35, 30, 25, or 20 nucleotides long. In some embodiments, the 5' homology arm is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 nucleotides long.

In one embodiment, the 3' homology arm is about 5-50 nucleotides long. In another embodiment, the 3' homology arm is about 9-19 nucleotides long. In some embodiments, the 3' homology arm is at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 nucleotides long. In some embodiments, the 3' homology arm is no more than 50, 45, 40, 35, 30, 25, or 20 nucleotides long. In some embodiments, the 3' homology arm is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 nucleotides long.

In one embodiment, the 5' spacer sequence is at least 10 nucleotides long. In another embodiment, the 5' spacer sequence is at least 15 nucleotides long. In another embodiment, the 5' spacer sequence is at least 30 nucleotides long. In some embodiments, the 5' spacer sequence is at least 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, or 30 nucleotides long. In some embodiments, the 5' spacer sequence is no more than 100, 90, 80, 70, 60, 50, 45, 40, 35, or 30 nucleotides long. In some embodiments, the 5' spacer sequence is between 20 and 50 nucleotides long. In certain embodiments, the 5' spacer sequence is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In one embodiment, the 5' spacer sequence is a poly A sequence. In another embodiment, the 5' spacer sequence is a poly A-C sequence.

In one embodiment, the 3' spacer sequence is at least 10 nucleotides long. In another embodiment, the 3' spacer sequence is at least 15 nucleotides long. In another embodiment, the 3' spacer sequence is at least 30 nucleotides long. In some embodiments, the 3' spacer sequence is at least 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, or 30 nucleotides long. In some embodiments, the 3' spacer sequence is no more than 100, 90, 80, 70, 60, 50, 45, 40, 35, or 30 nucleotides long. In some embodiments, the 3' spacer sequence is between 20 and 50 nucleotides long. In certain embodiments, the 3' spacer sequence is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In one embodiment, the 3' spacer sequence is a poly A sequence. In another embodiment, the 3' spacer sequence is a poly A-C sequence.

Extracellular circularization

In some embodiments, a linear RNA is circularized or concatemerized using chemical methods to form an oRNA. In some chemical methods, the 5' end and the 3' end of a nucleic acid (e.g., a linear RNA) include chemically reactive groups that, when brought close together, can form a new covalent linkage between the 5' end and the 3' end of the molecule. The 5' end can contain an NHS-ester reactive group and the 3' end can contain a 3'-amino-terminated nucleotide, such that in an organic solvent the 3'-amino-terminated nucleotide at the 3' end of the linear RNA undergoes a nucleophilic attack on the 5'-NHS-ester moiety, thereby forming a new 5'-/3'-amide bond.

In one embodiment, a DNA or RNA ligase can be used to enzymatically ligate a 5'-phosphorylated nucleic acid molecule (e.g., a linear RNA) to the 3'-hydroxyl group of a nucleic acid (e.g., a linear nucleic acid), thereby forming a new phosphodiester linkage. In an exemplary reaction, the linear RNA is incubated with 1-10 units of T4 RNA ligase at 37 °C for 1 hour according to the manufacturer's protocol. The ligation reaction can occur in the presence of a linear nucleic acid capable of base pairing with the juxtaposed 5' and 3' regions to assist the enzymatic ligation reaction. In one embodiment, the ligation is a splint ligation, in which a single-stranded polynucleotide (the splint, such as a single-stranded RNA) is designed to hybridize with both ends of the linear RNA so that the two ends are juxtaposed upon hybridization with the single-stranded splint. A splint ligase can thus catalyze the ligation of the two juxtaposed ends of the linear RNA, thereby generating an oRNA.

In one embodiment, a DNA or RNA ligase can be used in the synthesis of the oRNA. As a non-limiting example, the ligase can be a circ ligase (also called a circular ligase).

In one embodiment, the 5' end or the 3' end of the linear RNA can encode a ligase ribozyme sequence, such that during in vitro transcription the resulting linear RNA includes an active ribozyme sequence capable of ligating the 5' end of the linear RNA to the 3' end of the linear RNA. The ligase ribozyme can be derived from a group I intron, hepatitis D virus, or a hairpin ribozyme, or can be selected by SELEX (systematic evolution of ligands by exponential enrichment).

In one embodiment, a linear RNA can be circularized or concatemerized by using at least one non-nucleic acid moiety. In one aspect, the at least one non-nucleic acid moiety can react with regions or features near the 5' terminus and/or near the 3' terminus of the linear RNA in order to circularize or concatemerize the linear RNA. In another aspect, the at least one non-nucleic acid moiety can be located at, linked to, or located near the 5' terminus and/or the 3' terminus of the linear RNA. The non-nucleic acid moieties contemplated can be homologous or heterologous. As a non-limiting example, the non-nucleic acid moiety can be a linkage, such as a hydrophobic linkage, an ionic linkage, a biodegradable linkage, and/or a cleavable linkage. As another non-limiting example, the non-nucleic acid moiety is a ligation moiety. As another non-limiting example, the non-nucleic acid moiety can be an oligonucleotide or a peptide moiety, such as an aptamer or a non-nucleic acid linker as described herein.

In one embodiment, a linear RNA can be circularized or concatemerized due to a non-nucleic acid moiety that causes an attraction between atoms or molecular surfaces at, near, or linked to the 5' and 3' ends of the linear RNA. As a non-limiting example, one or more linear RNAs can be circularized or concatemerized by intermolecular forces or intramolecular forces. Non-limiting examples of intermolecular forces include dipole-dipole forces, dipole-induced dipole forces, induced dipole-induced dipole forces, Van der Waals forces, and London dispersion forces. Non-limiting examples of intramolecular forces include covalent bonds, metallic bonds, ionic bonds, resonant bonds, agostic bonds, dipolar bonds, conjugation, hyperconjugation, and antibonding.

In one embodiment, the linear RNA can comprise a ribozyme RNA sequence near the 5' terminus and near the 3' terminus. The ribozyme RNA sequence can covalently link to a peptide when the sequence is exposed to the remainder of the ribozyme. In one aspect, the peptides covalently linked to the ribozyme RNA sequence near the 5' terminus and the 3' terminus can associate with each other, causing the linear RNA to circularize or concatemerize. In another aspect, the peptides covalently linked to the ribozyme RNA near the 5' terminus and the 3' terminus can cause the linear RNA to circularize or concatemerize after being ligated using various methods known in the art, such as, but not limited to, protein ligation.

In some embodiments, the linear RNA can include a 5' triphosphate of the nucleic acid that is converted into a 5' monophosphate, for example by contacting the 5' triphosphate with RNA 5' pyrophosphohydrolase (RppH) or an ATP diphosphohydrolase (apyrase). Alternatively, the 5' triphosphate of the linear RNA can be converted into a 5' monophosphate by a two-step reaction comprising: (a) contacting the 5' nucleotide of the linear RNA with a phosphatase (e.g., a thermosensitive phosphatase, shrimp alkaline phosphatase, or calf intestinal phosphatase) to remove all three phosphates; and (b) contacting the 5' nucleotide after step (a) with a kinase (e.g., polynucleotide kinase) that adds a single phosphate.

In some embodiments, RNA can be circularized using the methods described in WO2017222911 and WO2016197121, the contents of each of which are incorporated herein by reference in their entirety.

In some embodiments, RNA can be circularized, for example, by backsplicing of a non-mammalian exogenous intron or by splint ligation of the 5' and 3' ends of a linear RNA. In one embodiment, the circular RNA is produced from a recombinant nucleic acid encoding the target RNA to be made circular. As a non-limiting example, the method comprises: a) producing a recombinant nucleic acid encoding the target RNA to be made circular, wherein the recombinant nucleic acid comprises, in 5' to 3' order: i) a 3' portion of an exogenous intron comprising a 3' splice site, ii) a nucleic acid sequence encoding the target RNA, and iii) a 5' portion of an exogenous intron comprising a 5' splice site; b) performing transcription, whereby RNA is produced from the recombinant nucleic acid; and c) performing splicing of the RNA, whereby the RNA circularizes to produce an oRNA.
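
Steps a) through c) above can be pictured with a toy Python string model. It is a sketch only: the function names `build_precursor`, `splice_to_circle` and `same_circle` are hypothetical, the sequences are placeholders, and real splicing chemistry (splice site recognition, intron folding) is not modeled. The code simply shows that the intron halves flank the target in permuted order and that joining the downstream 5' splice site to the upstream 3' splice site leaves the target as a closed circle.

```python
def build_precursor(intron_3p_part, target, intron_5p_part):
    """Steps a) and b): lay out the transcript in 5'-to-3' order."""
    return intron_3p_part + target + intron_5p_part


def splice_to_circle(precursor, intron_3p_len, intron_5p_len):
    """Step c): drop both intron parts; the returned string is read as a circle
    whose last nucleotide is joined to its first."""
    if intron_3p_len + intron_5p_len >= len(precursor):
        raise ValueError("intron parts leave no target sequence")
    return precursor[intron_3p_len:len(precursor) - intron_5p_len]


def same_circle(seq_a, seq_b):
    """Two strings describe the same circle if one is a rotation of the other."""
    return len(seq_a) == len(seq_b) and seq_a in (seq_b + seq_b)
```

Because a circle has no fixed start, `same_circle` compares sequences up to rotation, which is a convenient way to check a circular product against an expected sequence read from a different starting nucleotide.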

While not wishing to be bound by theory, circular RNA generated with exogenous introns is recognized by the immune system as "non-self" and triggers an innate immune response. On the other hand, circular RNA generated with endogenous introns is recognized by the immune system as "self" and generally does not provoke an innate immune response, even if it carries an exon comprising foreign RNA.

Accordingly, circular RNA can be generated with either an endogenous or an exogenous intron to control immunological self/non-self discrimination as desired. Numerous intron sequences from a wide variety of organisms and viruses are known and include sequences derived from genes encoding proteins, ribosomal RNA (rRNA), or transfer RNA (tRNA).

Circular RNA can be produced from linear RNA in a number of ways. In some embodiments, circular RNA is produced from a linear RNA by backsplicing of a downstream 5' splice site (splice donor) to an upstream 3' splice site (splice acceptor). Circular RNA can be generated in this manner by any non-mammalian splicing method. For example, linear RNAs containing various types of introns, including self-splicing group I introns, self-splicing group II introns, spliceosomal introns, and tRNA introns, can be circularized. In particular, group I and group II introns have the advantage that they can readily be used to produce circular RNA in vitro as well as in vivo, because their autocatalytic ribozyme activity enables them to self-splice.

In some embodiments, circular RNA can be produced in vitro from a linear RNA by chemical or enzymatic ligation of the 5' and 3' ends of the RNA. Chemical ligation can be performed, for example, using cyanogen bromide (BrCN) or ethyl-3-(3'-dimethylaminopropyl)carbodiimide (EDC) to activate a nucleotide phosphomonoester group and allow phosphodiester bond formation. See, e.g., Sokolova (1988) FEBS Lett 232:153-155; Dolinnaya et al. (1991) Nucleic Acids Res. 19:3067-3072; Fedorova (1996) Nucleosides Nucleotides Nucleic Acids 15:1137-1147; each incorporated herein by reference. Alternatively, enzymatic ligation can be used to circularize the RNA. Exemplary ligases that can be used include T4 DNA ligase (T4 Dnl), T4 RNA ligase 1 (T4 Rnl 1), and T4 RNA ligase 2 (T4 Rnl 2).

In some embodiments, splint ligation, using an oligonucleotide splint that hybridizes with both ends of a linear RNA, can be used to bring the ends of the linear RNA together for ligation. The splint can be either DNA or RNA, and its hybridization orients the 5'-phosphate and 3'-OH of the RNA ends for ligation. Subsequent ligation can be performed using either chemical or enzymatic techniques, as described above. For example, enzymatic ligation can be performed with T4 DNA ligase (requires a DNA splint), T4 RNA ligase 1 (requires an RNA splint), or T4 RNA ligase 2 (DNA or RNA splint). In some cases, chemical ligation (such as with BrCN or EDC) is more efficient than enzymatic ligation if the structure of the hybridized splint-RNA complex interferes with enzymatic activity.
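
The geometry of a splint can be made concrete with a short Python sketch. After ligation, the junction of the circle reads, 5' to 3', the last nucleotides of the linear RNA followed by its first nucleotides, so a splint that bridges both ends is the reverse complement of that junction. The function name `design_dna_splint` and the default arm length of 10 nucleotides per side are assumptions made for illustration; the text does not specify a splint length, and practical designs would also weigh melting temperature and secondary structure.

```python
# RNA base -> complementary DNA base (an RNA splint would use U instead of T).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def design_dna_splint(linear_rna, arm=10):
    """Return a DNA splint (5'->3') spanning `arm` nt on each side of the junction."""
    rna = linear_rna.upper()
    if arm < 1 or len(rna) < 2 * arm:
        raise ValueError("RNA is too short for the requested arm length")
    if set(rna) - set(RNA_TO_DNA_COMPLEMENT):
        raise ValueError("RNA may contain only A, C, G and U")
    # Junction as it appears in the circle: 3'-terminal arm, then 5'-terminal arm.
    junction = rna[-arm:] + rna[:arm]
    # The splint pairs antiparallel with the junction: reverse, then complement.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(junction))
```

For instance, calling the function on `"GACUAAAAAAAACCGU"` with `arm=4` yields `"AGTCACGG"`, the reverse complement of the junction `CCGU|GACU`.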

In some embodiments, the oRNA can further comprise an internal ribosome entry site (IRES) operably linked to an RNA sequence encoding a polypeptide. Inclusion of an IRES permits translation of one or more open reading frames from the circular RNA. The IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161.

In some embodiments, the circularization efficiency of the circularization methods provided herein is at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, or 100%. In some embodiments, the circularization efficiency of the circularization methods provided herein is at least about 40%.

Splicing elements

In some embodiments, the oRNA includes at least one splicing element. The splicing element can be a complete splicing element that can mediate splicing of the oRNA, or it can be a residual splicing element from a completed splicing event. For instance, in some cases, a splicing element of a linear RNA can mediate a splicing event that results in circularization of the linear RNA, so that the resulting oRNA comprises a residual splicing element from that splicing-mediated circularization event. In some cases, the residual splicing element is not able to mediate any splicing. In other cases, the residual splicing element can still mediate splicing under certain circumstances. In some embodiments, the splicing element is adjacent to at least one expression sequence. In some embodiments, the oRNA includes a splicing element adjacent to each expression sequence. In some embodiments, the splicing element is on one or both sides of each expression sequence, leading to separation of the expression products (e.g., peptides and/or polypeptides).

In some embodiments, the oRNA includes an internal splicing element wherein, when replicated, the spliced ends are joined together. Some examples may include miniature introns (<100 nt) with splice site sequences and short inverted repeats (30-40 nt), such as AluSq2, AluJr, and AluSz; inverted sequences in flanking introns; Alu elements in flanking introns; and motifs found in (suptable4 enriched motifs) cis-sequence elements proximal to backsplice events, such as sequences in the 200 bp preceding (upstream of) or following (downstream from) a backsplice site with flanking exons. In some embodiments, the oRNA includes at least one repetitive nucleotide sequence described elsewhere herein as an internal splicing element. In such embodiments, the repetitive nucleotide sequence can include repeated sequences from the Alu family of introns. See, e.g., U.S. Patent No. 11,058,706.

In some embodiments, the oRNA can include canonical splice sites that flank the head-to-tail junction of the oRNA.

In some embodiments, the oRNA can include a bulge-helix-bulge motif comprising a 4-base-pair stem flanked by two 3-nucleotide bulges. Cleavage occurs at a site in the bulge region, generating characteristic fragments with a terminal 5'-hydroxyl group and a 2',3'-cyclic phosphate. Circularization proceeds by nucleophilic attack of the 5'-OH group on the 2',3'-cyclic phosphate of the same molecule, forming a 3',5'-phosphodiester bridge.

In some embodiments, the oRNA can include a sequence that mediates self-ligation. Non-limiting examples of sequences that can mediate self-ligation include a self-circularizing intron, e.g., a 5' and 3' splice junction, or a self-circularizing catalytic intron such as a group I, group II, or group III intron. Non-limiting examples of group I intron self-splicing sequences include the self-splicing permuted intron-exon sequences derived from the T4 bacteriophage gene td and the intervening sequence (IVS) rRNA of Tetrahymena.

Other circularization methods

In some embodiments, the linear RNA can include complementary sequences, including either repetitive or non-repetitive nucleic acid sequences within a single intron or across flanking introns. In some embodiments, the oRNA includes a repetitive nucleic acid sequence. In some embodiments, the repetitive nucleotide sequence includes poly CA or poly UG sequences. In some embodiments, the oRNA includes at least one repetitive nucleic acid sequence that hybridizes to a complementary repetitive nucleic acid sequence in another segment of the oRNA, with the hybridized segment forming an internal duplex. In some embodiments, repetitive nucleic acid sequences and complementary repetitive nucleic acid sequences from two separate oRNAs hybridize to generate a single oRNA, with the hybridized segments forming internal duplexes. In some embodiments, the complementary sequences are found at the 5' and 3' ends of the linear RNA. In some embodiments, the complementary sequences include about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more paired nucleotides.
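
As a rough illustration of "paired nucleotides" between complementary sequences at the 5' and 3' ends, the following Python sketch counts how many consecutive Watson-Crick pairs form when the two termini of a linear RNA are aligned antiparallel, starting from the outermost nucleotides. The function name `terminal_paired_nucleotides` is hypothetical, and the model is deliberately simplified: it assumes contiguous pairing from the very ends and ignores G-U wobble pairs, internal loops and thermodynamics.

```python
# Watson-Crick pairs only; G-U wobble pairing is deliberately left out.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def terminal_paired_nucleotides(linear_rna):
    """Count consecutive base pairs between the 5' end and the 3' end."""
    rna = linear_rna.upper()
    left, right = 0, len(rna) - 1
    pairs = 0
    # Walk inward from both termini while the facing bases can pair.
    while left < right and (rna[left], rna[right]) in WATSON_CRICK:
        pairs += 1
        left += 1
        right -= 1
    return pairs
```

A return value of 4, for example, means that the first four and the last four nucleotides can form a 4-base-pair terminal stem under this simplified model.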

In some embodiments, chemical methods of circularization can be used to generate the oRNA. Such methods can include, but are not limited to, click chemistry (e.g., alkyne- and azide-based methods, or clickable bases), olefin metathesis, phosphoramidate ligation, hemiaminal-imine crosslinking, base modification, and any combination thereof. In some embodiments, enzymatic methods of circularization can be used to generate the oRNA. In some embodiments, a ligation enzyme (e.g., a DNA or RNA ligase) can be used to generate the oRNA or its complement, a complementary strand of the oRNA, or a template of the oRNA.

Any of the circular polynucleotides taught in, for example, U.S. Provisional Application No. 61/873,010, filed September 3, 2013, or U.S. Patent No. 10,709,779 can be used herein. The contents of these references are incorporated herein by reference in their entirety. In addition, any of the circular RNAs, methods for making circular RNAs, and circular RNA compositions described in the following publications are contemplated herein and are incorporated by reference in their entirety as part of this specification: U.S. Patents US 11,352,640, US 11,352,641, US 11,203,767, US 10,683,498, US 5,773,244, and US 5,766,903; U.S. Application Publications US 2022/0177540, US 2021/0371494, US 2022/0090137, US 2019/0345503, and US 2015/0299702; and PCT Application Publications WO 2021/226597, WO 2019/236673, WO 2017/222911, WO 2016/187583, WO 2014/082644, and WO 1997/007825.

K. Kits

Also provided are kits comprising an engineered retron (e.g., an engineered nucleic acid construct or an engineered nucleic acid-enzyme construct) as described herein.

In some embodiments, the kit provides an engineered retron construct or a vector system comprising such a retron construct. In some embodiments, the engineered retron construct included in the kit comprises a heterologous sequence capable of providing a cell with a nucleic acid encoding a protein of interest or a regulatory RNA, a cellular barcode, a donor polynucleotide suitable for gene editing (e.g., by homology-directed repair (HDR) or recombination-mediated genetic engineering (recombineering)), or a CRISPR protospacer DNA sequence for molecular recording. Other agents can also be included in the kit, such as transfection agents, host cells, suitable media for culturing cells, buffers, and the like.

In the context of a kit, agents can be provided in liquid or solid form in any convenient packaging (e.g., stick pack, dose pack, etc.). The agents of a kit can be present in the same or separate containers. The kit can contain any one or more of the components described herein in one or more containers. The components can be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they can be housed in a vial or other container for storage. A second container can hold other components prepared sterilely. Alternatively, the kit can include the active agents premixed and shipped in a vial, tube, or other container.

The kits can take a variety of forms, such as a blister pouch, a shrink-wrapped pouch, a vacuum-sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box, or a bag. The kit can be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization technique, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. Depending on the specific application, the kit can also include other components, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric (such as gauze) for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, and so on. Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the retron-based editing system described herein.

In addition to the above components, the subject kits may further include (in some embodiments) instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate (e.g., one or more sheets of paper on which the information is printed), in the packaging of the kit, in a package insert, and the like. Another form of these instructions is a computer-readable medium, e.g., a diskette, a compact disc (CD), a flash drive, and the like, on which the information has been recorded. Yet another form in which these instructions may be present is a website address that can be used via the Internet to access the information at a remote site. In some embodiments, the information provided on the website is updated regularly, for example to provide the most current information. Written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which form may also reflect approval by that agency of manufacture, use, or sale for animal administration. As used herein, "promoted" includes all methods of doing business, including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity, including written, oral, and electronic communication of any form, associated with this disclosure.

Other aspects of this disclosure provide kits comprising one or more nucleic acid constructs (e.g., one or more mRNA or circular RNA molecules encoding components of a retron-based genome editing system). In various embodiments, all of the nucleic acid constructs may be based on RNA molecules, i.e., an "all-RNA system." For example, each component of the editing system may be expressed from an mRNA molecule that is delivered to a target cell by one or more delivery methods (e.g., LNP delivery).

L. Cells

One aspect of this disclosure provides an isolated host cell comprising one or more of the compositions described herein, including but not limited to engineered retrons and/or retron components, engineered ncRNAs, engineered msDNAs, engineered RTs, nucleic acid molecules encoding engineered retrons and/or retron components, and vectors or vector systems encoding engineered retrons and/or retron components, and any combination thereof. In some embodiments, the host cell is a prokaryotic cell, an archaeal cell, or a eukaryotic host cell. In some embodiments, the eukaryotic host cell is a mammalian cell, such as a human cell, a non-human cell, or a non-human mammalian cell. In some embodiments, the host cell is an artificial cell or a genetically modified cell. In some embodiments, the host cell is in vitro, such as a tissue culture cell. In some embodiments, the host cell is within a living host organism.

A cell may contain any of the compositions described herein. The recombinant retrons or components thereof are delivered into eukaryotic cells (e.g., mammalian cells, such as human cells) using the methods described herein. In some embodiments, the cell is in vitro (e.g., a cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and able to be administered back to the same or a different subject).

This disclosure contemplates the use of any suitable host cell. For example, the host cell may be a mammalian cell. Mammalian cells of this disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells), and mouse cells (e.g., MC3T3 cells). A variety of human cell lines exist, including but not limited to human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, Lncap (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma), and Saos-2 (bone cancer) cells. In some embodiments, the cell may be a human embryonic kidney (HEK) cell (e.g., a HEK 293 or HEK 293T cell). In some embodiments, the cell may be a stem cell (e.g., a human stem cell), such as a pluripotent stem cell (e.g., a human pluripotent stem cell, including a human induced pluripotent stem cell (hiPSC)). A stem cell refers to a cell that is able to divide indefinitely in culture and to give rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism but is not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated herein by reference). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).

Some aspects of this disclosure provide cells comprising any of the compositions disclosed herein, including but not limited to engineered retrons and/or retron components, engineered ncRNAs, engineered msDNAs, engineered RTs, nucleic acid molecules encoding engineered retrons and/or retron components, and vectors or vector systems encoding engineered retrons and/or retron components, and any combination thereof. In some embodiments, a host cell is transiently or non-transiently transfected with one or more of the delivery systems described herein, including virus-based systems, virus-like particle systems, and non-virus-based delivery, including LNPs and liposomes. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject, i.e., transfected ex vivo. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A 172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr -/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof.

Cell lines are available from a variety of sources known to those skilled in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more of the retron delivery systems described herein is used to establish a new cell line comprising one or more nucleic acid molecules encoding a recombinant retron-based genome editing system described herein, or at least encoding components of such a system (e.g., a recombinant ncRNA or a recombinant retron RT).

M. Pharmaceutical Compositions

The engineered retron-based genome editing systems described herein, or one or more components thereof (e.g., engineered ncRNAs, engineered msDNAs, engineered RTs, nucleic acid molecules encoding engineered retrons and/or retron components, guide RNAs, programmable nucleases), can be provided as pharmaceutical compositions. For example, one or more LNPs or other non-virus-based delivery systems comprising one or more circular or linear RNA molecules that encode each component of the retron-based genome editing system can be formulated as a pharmaceutical composition for administration to a subject in need thereof (e.g., a human in need of gene editing).

Formulations may include, without limitation, saline, liposomes, lipid nanoparticles, polymers, peptides, proteins, cells transfected with viral vectors (e.g., for transfer or transplantation into a subject), and combinations thereof.

Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. As used herein, the term "pharmaceutical composition" refers to a composition comprising at least one active ingredient and, optionally, one or more pharmaceutically acceptable excipients.

In general, such preparatory methods include the step of bringing the active ingredient into association with an excipient and/or one or more other accessory ingredients. As used herein, the phrase "active ingredient" generally refers to an engineered retron as described herein.

A pharmaceutical composition in accordance with this disclosure may be prepared, packaged, and/or sold in bulk, as a single unit dose, and/or as a plurality of single unit doses. As used herein, a "unit dose" refers to a discrete amount of the pharmaceutical composition comprising a predetermined amount of the active ingredient. The amount of the active ingredient is generally equal to the dosage of the active ingredient that would be administered to a subject and/or a convenient fraction of such a dosage, for example one-half or one-third of such a dosage.

Other aspects of this disclosure relate to pharmaceutical compositions comprising any of the various components of the recombinant retron-based genome editing systems described herein, including but not limited to engineered retrons and/or retron components, engineered ncRNAs, engineered msDNAs, engineered RTs, nucleic acid molecules encoding engineered retrons and/or retron components, programmable nucleases (e.g., RNA-guided nucleases), guide RNAs, and vectors or vector systems encoding engineered retrons and/or retron components, and any combination thereof. As used herein, the term "pharmaceutical composition" refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, for increasing half-life, or other therapeutic compounds).

As used here, the term "pharmaceutically acceptable carrier" means a pharmaceutically acceptable material, composition, or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., a lubricant, talc, magnesium stearate, calcium stearate, zinc stearate, or stearic acid), or solvent encapsulating material, that is involved in carrying or transporting the compound from one site of the body (e.g., the delivery site) to another site (e.g., an organ, tissue, or portion of the body). A pharmaceutically acceptable carrier is "acceptable" in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, at physiological pH, etc.).

Some examples of materials that can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose, and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate, and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol, and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH-buffered solutions; (21) polyesters, polycarbonates, and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL, and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. Terms such as "excipient," "carrier," "pharmaceutically acceptable carrier," and the like are used interchangeably herein.

In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical compositions described herein include, without limitation, topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.

In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., a tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane (such as a sialastic membrane) or a fiber.

In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.

In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject (e.g., a human). In some embodiments, a pharmaceutical composition for administration by injection is a solution in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical can also include a solubilizing agent and a local anesthetic (such as lignocaine) to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example as a dry lyophilized powder or water-free concentrate in a hermetically sealed container (such as an ampoule or sachet) indicating the quantity of active agent. Where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical-grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.

A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer's solution, or Hank's solution. In addition, the pharmaceutical composition can be in solid form and be re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.

The pharmaceutical composition can be contained within a lipid particle or vesicle (such as a liposome, a microcrystal, or an LNP), which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as the composition is contained therein. Compounds can be entrapped in "stabilized plasmid-lipid particles" (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE) and low levels (5-10 mol%) of cationic lipid, stabilized by a polyethylene glycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethyl-ammonium methylsulfate, or "DOTAP," are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.

Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing the recombinant retron-based genome editing system, or one or more components thereof, in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized system of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use, or sale for human administration.

In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials, such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

N. Uses and Methods of Use

Engineered retrons comprising heterologous nucleic acid sequences can be used in a variety of applications, several non-limiting examples of which are described herein. In general, the engineered retrons can be used in any suitable organism. In some embodiments, the organism is a eukaryote.

In some embodiments, the organism is an animal. In some embodiments, the animal is a fish, an amphibian, a reptile, a mammal, or a bird. In some embodiments, the animal is a farm animal or an agricultural animal. Non-limiting examples of farm and agricultural animals include horses, goats, sheep, swine, cattle, llamas, alpacas, and birds, e.g., chickens, turkeys, ducks, and geese. In some embodiments, the animal is a non-human primate, e.g., a baboon, capuchin monkey, chimpanzee, lemur, macaque, marmoset, tamarin, spider monkey, squirrel monkey, or vervet monkey. In some embodiments, the animal is a pet. Non-limiting examples of pets include dogs, cats, horses, wolves, rabbits, ferrets, gerbils, hamsters, chinchillas, fancy rats, guinea pigs, canaries, parakeets, and parrots.

In some embodiments, the organism is a plant. Plants that can be transfected with an engineered retron include monocots and dicots. Particular examples include, but are not limited to, corn (maize), sorghum, wheat, sunflower, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley and oilseed rape, Brassica species, alfalfa, rye, millet, safflower, peanut, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers. Vegetables include, but are not limited to, crucifers, peppers, tomatoes, lettuce, green beans, lima beans, peas, and members of the genus Cucumis (such as cucumber, cantaloupe, and muskmelon). Ornamentals include, but are not limited to, azalea, hydrangea, hibiscus, roses, tulips, daffodils, petunias, carnations, poinsettia, and chrysanthemum.

In some embodiments, a heterologous nucleic acid sequence can be added to an engineered retron of the invention to provide a cell with a heterologous nucleic acid encoding a protein or regulatory RNA of interest, a cellular barcode, a donor polynucleotide suitable for use in gene editing (e.g., by homology-directed repair (HDR) or recombination-mediated genetic engineering (recombineering)), or a CRISPR protospacer DNA sequence for use in molecular recording, as discussed further below. Such heterologous sequences may be inserted, for example, into the msr locus or the msd locus such that the heterologous sequence is transcribed by the retron reverse transcriptase as part of the msDNA product.

In some embodiments, the engineered retrons described herein can be used in research tools (such as kits), in functional genomics assays, and in generating engineered cell lines and animal models for research and drug screening. In addition to the engineered retron, a kit may also comprise one or more reagents, such as buffers, control reagents, control vectors, control RNA polynucleotides, reagents for in vitro production of polypeptides from DNA, and adaptors for sequencing. A buffer can be, for example, a stabilization buffer, a reconstituting buffer, a diluting buffer, a wash buffer, or a buffer for introducing a polypeptide and/or polynucleotide of the kit into a cell. In some cases, a kit can comprise one or more additional reagents specific for plants. The one or more additional reagents for plants can include, for example, soil, nutrients, plants, seeds, spores, Agrobacterium, T-DNA vectors, and pBINAR vectors.

Gene Editing

In some embodiments, a retron is used to perform genome editing at a desired site. The retron is engineered with a heterologous nucleic acid sequence that encodes a donor polynucleotide suitable for use with a nuclease genome editing system. The nuclease is designed to specifically target a position close to the desired edit (the nuclease should be designed so that, once the edit is correctly installed, the nuclease no longer cuts the target). The nuclease (e.g., Cas or non-Cas) is linked to the retron either by direct fusion to the RT or by fusion of the msDNA to the gRNA (the latter only for RNA-guided nucleases). The heterologous nucleic acid sequence is inserted into the retron msd. See, for example, FIG. 3, which shows markers representing edits.
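The requirement that the nuclease no longer cut the target once the edit is installed can be checked computationally. The following is a minimal sketch only, assuming a Cas9-style NGG PAM located immediately 3' of the protospacer, forward-strand matching only, and exact matching (real nucleases tolerate some mismatches). The function name `still_cuttable` and the example sequences are hypothetical and do not come from this disclosure.

```python
import re


def still_cuttable(locus: str, protospacer: str, pam_regex: str = "[ACGT]GG") -> bool:
    """Return True if `protospacer` is still immediately followed by a PAM in `locus`.

    Assumption: forward strand only, exact protospacer match, NGG PAM by default.
    """
    pattern = re.compile(re.escape(protospacer.upper()) + pam_regex)
    return pattern.search(locus.upper()) is not None


# Hypothetical 20-nt protospacer and flanking sequence, for illustration only.
protospacer = "ACGTACGTACGTACGTACGT"
unedited = "TTT" + protospacer + "AGG" + "CCC"  # PAM (AGG) present
edited = "TTT" + protospacer + "AGC" + "CCC"    # the edit destroys the PAM

print(still_cuttable(unedited, protospacer))
print(still_cuttable(edited, protospacer))
```

A donor design would pass this check when the unedited locus is reported as cuttable and the edited locus is not.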

In some embodiments, the heterologous nucleic acid sequence has 10-100 or more bp of nucleic acid sequence homologous to the genome on each side of the desired edit. The desired edit (insertion, deletion, or mutation) lies between the homologous sequences.

In some embodiments, the donor polynucleotide includes a sequence comprising the intended genome edit flanked by a pair of homology arms, which are responsible for targeting the donor polynucleotide to the target locus to be edited in the cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which refers to the position of each homology arm relative to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and the "3' target sequence," respectively.
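The arrangement of a 5' homology arm, the intended edit, and a 3' homology arm can be sketched as simple string assembly. This is an illustrative sketch under stated assumptions, not the method of this disclosure: the function `build_donor`, its parameters, and the toy sequence are hypothetical, and coordinates are 0-based and half-open.

```python
def build_donor(genome: str, edit_start: int, edit_end: int,
                new_seq: str, arm_len: int = 20) -> dict:
    """Assemble a donor that replaces genome[edit_start:edit_end] with new_seq.

    An insertion has edit_start == edit_end; a deletion has new_seq == "".
    """
    if not (0 <= edit_start <= edit_end <= len(genome)):
        raise ValueError("edit coordinates fall outside the genome sequence")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    five_arm = genome[edit_start - arm_len:edit_start]   # matches the 5' target sequence
    three_arm = genome[edit_end:edit_end + arm_len]      # matches the 3' target sequence
    return {
        "five_prime_arm": five_arm,
        "edit": new_seq,
        "three_prime_arm": three_arm,
        "donor": five_arm + new_seq + three_arm,
    }


# Toy locus: replace the central CCCC with TT using 5-nt arms.
toy_genome = "A" * 10 + "CCCC" + "G" * 10
donor = build_donor(toy_genome, 10, 14, "TT", arm_len=5)
print(donor["donor"])
```

The same function covers substitutions, insertions, and deletions, because the edit is defined only by the replaced interval and the replacement sequence.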

The homology arms must be sufficiently complementary to hybridize to the target sequences in order to mediate homologous recombination between the donor polynucleotide and the genomic DNA at the target locus. For example, a homology arm can comprise a nucleotide sequence having at least about 80-100% sequence identity to the corresponding genomic target sequence (including any percent identity within this range, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity thereto), wherein the nucleotide sequence comprising the intended edit can be integrated into the genomic DNA by HDR at the genomic target locus recognized by the 5' and 3' homology arms (i.e., having sufficient complementarity for hybridization).
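The percent-identity criterion for a homology arm can be illustrated with a short calculation. The sketch below assumes an ungapped, position-by-position comparison of two equal-length sequences; practical comparisons usually rely on an alignment tool that handles gaps. The names `percent_identity` and `arm_is_sufficient`, and the default 80% threshold taken from the lower bound stated above, are illustrative choices.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def arm_is_sufficient(arm: str, target: str, threshold: float = 80.0) -> bool:
    """Return True if the arm meets the chosen identity threshold."""
    return percent_identity(arm, target) >= threshold


# One mismatch in ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(arm_is_sufficient("ACGTA", "ACTTT"))
```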

In some embodiments, the corresponding homologous nucleotide sequences in the genomic target sequence (i.e., the "5' target sequence" and the "3' target sequence") flank a specific site for cleavage and/or a specific site for introducing the intended edit. The distance between the specific cleavage site and the homologous nucleotide sequences (e.g., each homology arm) can be several hundred nucleotides. In some embodiments, the distance between a homology arm and the cleavage site is 200 nucleotides or less (e.g., 0, 10, 20, 30, 50, 75, 100, 125, 150, 175, or 200 nucleotides). In most cases, a smaller distance gives rise to a higher gene targeting rate. In some embodiments, the donor polynucleotide is substantially identical to the target genomic sequence across its entire length, except for the sequence changes to be introduced into the portion of the genome that encompasses both the specific cleavage site and the portion of the genomic target sequence to be altered.

The homology arms can be of any length, e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 300 nucleotides or more, 350 nucleotides or more, 400 nucleotides or more, 450 nucleotides or more, 500 nucleotides or more, 1000 nucleotides (1 kb) or more, 5000 nucleotides (5 kb) or more, or 10000 nucleotides (10 kb) or more. In some cases, the 5' and 3' homology arms are substantially equal in length. In other cases, the 5' and 3' homology arms are not necessarily equal in length. For example, one homology arm may be 30% or less shorter than the other, 20% or less shorter, 10% or less shorter, 5% or less shorter, 2% or less shorter, or only a few nucleotides shorter than the other homology arm. In still other cases, the 5' and 3' homology arms differ substantially in length, e.g., one may be 40% or more, 50% or more, sometimes 60% or more, 70% or more, 80% or more, 90% or more, or 95% or more shorter than the other homology arm.

The donor polynucleotide can be used in combination with an RNA-guided nuclease, which is targeted to a particular genomic sequence (i.e., the genomic target sequence to be modified) by a guide RNA. A target-specific guide RNA comprises a nucleotide sequence that is complementary to the genomic target sequence and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele in order to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In some embodiments, the gRNA is designed to bind selectively to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA can be designed to target a disease-relevant mutation of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele in order to target the nuclease-gRNA complex to that allele for the purpose of genome editing to introduce a mutation (such as an insertion, deletion, or substitution) into a gene in the genomic DNA of the cell. Such genetically modified cells can be used, for example, to alter a phenotype, confer new properties, or produce disease models for drug screening.

In some embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of a donor polynucleotide by the HDR mechanism can be used for genome editing, including CRISPR system class 1, type I, type II, or type III Cas nucleases; class 2, type II nucleases (such as Cas9); class 2, type V nucleases (such as Cpf1); or class 2, type VI nucleases (such as C2c2). Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified forms thereof.

In some embodiments, a class 2, type II CRISPR system Cas9 endonuclease is used. A Cas9 nuclease from any species, or a biologically active fragment, variant, analog, or derivative thereof that retains Cas9 endonuclease activity (i.e., catalyzes site-directed cleavage of DNA to generate double-strand breaks), can be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism, but may be produced synthetically or recombinantly. Cas9 sequences from a number of bacterial species are well known in the art and are listed in the National Center for Biotechnology Information (NCBI) database. See, e.g., the NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the filing date of this application) are incorporated herein by reference in their entirety. Any of these sequences, or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto (including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto), can be used for genome editing as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105, for sequence comparisons and a discussion of the genetic diversity and phylogenetic analysis of Cas9.

The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In some embodiments, the target site comprises 20-30 base pairs in addition to a PAM of 3 or more base pairs. Typically, the first nucleotide of the PAM can be any nucleotide, while the two or more other nucleotides depend on the particular Cas9 protein selected. Exemplary PAM sequences are known to those skilled in the art and include, but are not limited to, NNG, NGN, NAG, and NGG, where N represents any nucleotide. In some embodiments, the allele targeted by the gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
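Locating candidate target sites of the kind described above (a protospacer followed by a PAM) can be sketched with a regular-expression scan. This is a minimal illustration only, assuming a 20-nt protospacer, an NGG PAM, and a search of the forward strand; the function name `find_targets` and its parameters are hypothetical, and a practical search would also scan the reverse complement and score off-target matches.

```python
import re


def find_targets(seq: str, protospacer_len: int = 20, pam_regex: str = "[ACGT]GG"):
    """Return (start, protospacer, PAM) tuples for every forward-strand site.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence containing exactly one 20-nt protospacer followed by TGG.
toy = "A" * 20 + "TGG" + "C" * 5
for start, protospacer, pam in find_targets(toy):
    print(start, protospacer, pam)
```

Other PAM patterns can be tried by passing a different `pam_regex`, for example `"[ACGT]AG"` for NAG.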

In some embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules, with the crRNA and tracrRNA sequences residing in separate RNA molecules.

In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1 or Cas12a) is used. Cpf1 is another class 2 CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and can be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and depends only on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than with Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequence 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9. Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526(7571):17-17; Zetsche et al. (2015) Cell. 163(3):759-771; Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926; Zhang et al. (2017) Front. Plant Sci. 8:177; Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; incorporated herein by reference.

C2c1 (Cas12b) is another class 2 CRISPR/Cas system RNA-guided nuclease that may be used. Like Cas9, C2c1 depends on both a crRNA and a tracrRNA for guidance to target sites. See, e.g., Shmakov et al. (2015) Mol. Cell. 60(3):385-397; Zhang et al. (2017) Front. Plant Sci. 8:177; incorporated herein by reference.

In another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355; Pan et al. (2016) Sci. Rep. 6:35794; Tsai et al. (2014) Nat. Biotechnol. 32(6):569-576; incorporated herein by reference.

In other embodiments, any of the other Cas enzymes and variants described in other sections of this application (all incorporated herein) may be used similarly.

In some embodiments, the RNA-guided nuclease is provided in the form of a protein, optionally with the nuclease complexed with a gRNA to form a ribonucleoprotein (RNP) complex. In some embodiments, the RNA-guided nuclease is provided by a nucleic acid encoding it, such as an RNA (e.g., messenger RNA) or a DNA (an expression vector). In some embodiments, the RNA-guided nuclease and the gRNA are both provided by vectors, such as the vectors and vector systems described in other parts of this application (all incorporated herein by reference). Both can be expressed from a single vector or separately from different vectors. The vector(s) encoding the RNA-guided nuclease and the gRNA may be included in the vector system comprising the engineered retron msr gene, msd gene, and ret gene sequences. In some embodiments, the RNA-guided nuclease is fused to the RT and/or the msDNA.

The RNP complex may be administered to a subject or delivered into a cell by methods known in the art, such as those described in U.S. Pat. No. 11,390,884 (which is incorporated herein by reference in its entirety). In some embodiments, the endonuclease/gRNA ribonucleoprotein (RNP) complex is delivered into the cell by electroporation. Direct delivery of the RNP complex to a subject or cell eliminates the need for expression from nucleic acids (e.g., transfection of plasmids encoding Cas9 and the gRNA). It also eliminates unwanted integration of DNA segments derived from nucleic acid delivery (e.g., transfection of plasmids encoding Cas9 and the gRNA). The endonuclease/gRNA ribonucleoprotein (RNP) complex is usually formed prior to administration.

Codon usage may be optimized to further improve production of the RNA-guided nuclease and/or the reverse transcriptase (RT) in a particular cell or organism. For example, a nucleic acid encoding an RNA-guided nuclease or a reverse transcriptase can be modified to substitute codons that have a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared with the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease or reverse transcriptase is introduced into a cell, the protein can be expressed transiently, conditionally, or constitutively in the cell.
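The codon substitution described above can be sketched as a synonymous recoding step. The example below is a simplified illustration, not the optimization procedure of this disclosure: it uses the standard genetic code, swaps each codon for a caller-supplied preferred synonym, and leaves other codons unchanged. The function name `recode` and the three-entry `toy_preferences` table are placeholders; a real table would be derived from codon usage data for the chosen host.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        choice = preferred.get(CODON_TABLE[codon], codon)
        if CODON_TABLE[choice] != CODON_TABLE[codon]:
            raise ValueError("preferred codon is not synonymous")
        out.append(choice)
    return "".join(out)


# Placeholder preference table (assumption for illustration only).
toy_preferences = {"L": "CTG", "K": "AAG", "E": "GAG"}
original = "ATGTTAAAAGAATAA"  # encodes M-L-K-E-stop
optimized = recode(original, toy_preferences)
print(optimized, translate(optimized) == translate(original))
```

Checking that the translation is unchanged, as in the last line, guards against a preference table that accidentally alters the encoded protein.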

In some embodiments, the engineered retron used for genome editing with a nuclease genome editing system may further include accessory or enhancer proteins for recombination. Examples of recombination enhancers include non-homologous end joining (NHEJ) inhibitors (e.g., a DNA ligase IV inhibitor, a KU inhibitor (e.g., KU70 or KU80), a DNA-PKc inhibitor, or an Artemis inhibitor), homology-directed repair (HDR) promoters, or both, which may enhance or improve more precise genome editing and/or the efficiency of homologous recombination. In some embodiments, the recombination accessory or enhancer may comprise C-terminal binding protein interacting protein (CtIP), cyclinB2, or Rad family members (e.g., Rad50, Rad51, Rad52, etc.).

CtIP is a transcription factor containing C2H2 zinc fingers that is involved in early steps of homologous recombination. Mammalian CtIP and its orthologs in other eukaryotes promote the resection of DNA double-strand breaks and are essential for meiotic recombination. HDR can be enhanced by using a Cas9 nuclease associated with (e.g., fused to) an N-terminal domain of CtIP, an approach that forces CtIP to the cleavage site and increases transgene integration by HDR. In some embodiments, an N-terminal fragment of CtIP, called HE (for HDR enhancer), may be sufficient for HDR stimulation, and requires the CtIP multimerization domain and CDK phosphorylation sites to be active. HDR stimulation by the Cas9-HE fusion depends on the guide RNA used, and the guide RNA is therefore designed accordingly.

Using the gene editing system described herein, any target gene or sequence in a host cell can be edited or modified to obtain a desired trait, including but not limited to: myostatin (e.g., GDF8) to increase muscle growth; Pc POLLED to induce hairlessness; KISS1R to induce boar taint; dead end protein (dnd) to induce sterility; Nano2 and DDX to induce sterility; CD163 to induce PRRSV resistance; RELA to induce ASFV resilience; CD18 to induce Mannheimia (Pasteurella) haemolytica resilience; NRAMP1 to induce tuberculosis resilience; and negative regulators of muscle mass (e.g., myostatin) to increase muscle mass.

Diseases and Disorders

Provided herein are methods of treating a disease or disorder, the methods comprising administering to a subject in need thereof a pharmaceutical composition of the present disclosure that includes a retron-based gene editing system and/or components thereof. In various embodiments of the invention, the target of the genomic or epigenetic modification includes a cell having a monogenic disease or disorder. Various monogenic diseases include, but are not limited to: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell disease; Smith-Lemli-Opitz syndrome; Tay-Sachs disease; hereditary tyrosinemia I; influenza; SARS-CoV-2; Alzheimer's disease; Parkinson's disease.

In some cases, target sequences associated with certain diseases and disorders are known. A target sequence or target editing site includes a disease-associated or pathogenic mutation of one or more of the 10,000 monogenic disorders. A list of target sequences can be generated on the basis of the monogenic disorders. Common genetic disorders that can be corrected by the gene editing system described herein include, but are not limited to: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell disease; Smith-Lemli-Opitz syndrome; and Tay-Sachs disease. In other embodiments, the disease-associated gene may be associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes; cancer; and obesity.

The gene editing system disclosed herein can also be used to treat the following genetic disorders by editing the defect in the disease-associated gene, as follows:

Genetic disease | Disease gene
Adrenoleukodystrophy (ALD) | ABCD1
Agammaglobulinemia, non-Bruton type | IGHM
Alport syndrome | COL4A5
Amyloid neuropathy - Andrade disease | TTR
Angioneurotic edema | C1NH
Alpha-1 antitrypsin deficiency | SERPINA1
Bartter syndrome type 4 | BSND
Blepharophimosis - ptosis - epicanthus inversus syndrome (BEPS) | FOXL2
Brugada syndrome - long QT syndrome-3 | SCN5A
Bruton agammaglobulinemia tyrosine kinase | BTK
Ceroid lipofuscinosis, neuronal, type 2 | CLN2
Charcot-Marie-Tooth type 1A (CMT1A) | PMP22
Charcot-Marie-Tooth type X (CMTX) | CMTX
Chronic granulomatous disease (CGD) | CYBB
Cystic fibrosis (CF) | CFTR
Congenital adrenal hyperplasia (CAH) | CYP21A2
Congenital disorder of glycosylation type Ia (CDG Ia) | PMM2
Congenital fibrosis of the extraocular muscles 1 (CFEOM1) | KIF21A
Crigler-Najjar syndrome | UGT1A1
Deafness, autosomal recessive | CX26
Diamond-Blackfan anemia (DBA) | RPS19
Duchenne-Becker muscular dystrophy (DMD/DMB) | DMD
Duncan disease - X-linked lymphoproliferative syndrome (XLPD) | SH2D1A
Ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome (EEC) | p63
Epidermolysis bullosa dystrophica/pruriginosa | COL7A1
Multiple exostoses type I (EXT1) | EXT1
Multiple exostoses type II (EXT2) | EXT2
Facioscapulohumeral muscular dystrophy | FRG1
Factor VII deficiency | F7
Familial Mediterranean fever (FMF) | MEFV
Fanconi anemia A | FANCA
Fanconi anemia G | FANCG
Fragile X | FRAXA
Gangliosidosis (GM1) | GLB1
Gaucher disease (GD) | GBA
Glanzmann thrombasthenia | ITGA2B
Glucose-6-phosphate dehydrogenase deficiency | G6PD
Glutaric acidemia I | GCDH
Hemophilia A | F8
Hemophilia B | F9
Hand-foot-uterus syndrome | HOXD13
Familial hemophagocytic lymphohistiocytosis, type 2 (FHL2) | PRF1
Primary hypomagnesemia | CLDN16
Hypophosphatasia | ALPL
Holt-Oram syndrome (HOS) | TBX5
Homocystinuria | MTHFR
Incontinentia pigmenti | NEMO
Lesch-Nyhan syndrome | HPRT
Limb-girdle muscular dystrophy type 2C (LGMD2C) | SGCG
Long QT syndrome-1 | KCNQ1
Alpha-mannosidosis | MAN2B1
Marfan syndrome | FBN1
Methacrylic aciduria, beta-hydroxyisobutyryl-CoA deacylase deficiency | HIBCH
Mevalonic aciduria | MVK
Myotonic dystrophy (DM) | DMPK
Myotonic dystrophy type 2 (DM2) | ZNF9
Mucopolysaccharidosis type I - Hurler syndrome | IDUA
Mucopolysaccharidosis type IIIA - Sanfilippo syndrome A (MPS3A) | SGSH
Mucopolysaccharidosis type IIIB - Sanfilippo syndrome B (MPS3B) | NAGLU
Mucopolysaccharidosis type VI (MPS VI) - Maroteaux-Lamy syndrome | ARSB
Neuronal ceroid lipofuscinosis 1 - Batten disease (CLN1) | PPT1
Niemann-Pick disease | SMPD1
Noonan syndrome | PTPN11
Hereditary pancreatitis (PCTT) | PRSS1
Paramyotonia congenita (PMC) | SCN4A
Phenylketonuria | PAH
Polycystic kidney disease type 1 (PKD1) | PKD1
Polycystic kidney disease type 2 (PKD2) | PKD2
Polycystic kidney and hepatic disease-1 (ARPKD) | PKHD1
Schwartz-Jampel/Stuve-Wiedemann syndrome | LIFR
Sickle cell anemia | HBB
Synpolydactyly (SPD1) | HOXA13
Smith-Lemli-Opitz syndrome | DHCR7
Spastic paraplegia type 3 | SPG3A
Spinal muscular atrophy (SMA) | SMN
Spinocerebellar ataxia 3 (SCA3) | ATXN3
Spinocerebellar ataxia 7 (SCA7) | ATXN7
Stargardt disease | ABCA4
Tay-Sachs (TSD) | HEXA
Alpha-thalassemia mental retardation syndrome | ATRX
Beta-thalassemia | HBB
Torsion dystonia, early onset (EOTD) | DYT1
Tyrosinemia type 1 | FAH
Tuberous sclerosis 1 | TSC1
Tuberous sclerosis 2 | TSC2
Wiskott-Aldrich syndrome (WAS) | WAS

此外,亦可使用本文所揭示之基因編輯系統,藉由編輯疾病相關基因中或超過一種與特定病症相關之基因中的缺陷來治療以下遺傳病症,如下: A B C D E F 遺傳疾病 疾病相關基因 最常見之(B) (C) 之編碼產物 (C) 之寄存編號 (C) 之產品類型 歸因於21-羥化酶缺乏之腎上腺增生(21-OHD CAH) CYP21A2 CYP21A2 (196) 細胞色素P450家族21子族A成員2 P08686 Aicardi-Goutières症候群腦病 ADAR;IFIH1;RNASEH2A;RNASEH2B;RNASEH2C;SAMHD1;TREX1 RNASEH2B (28) 核糖核酸酶H2次單元B Q5TBB1 α-1-抗胰蛋白酶(A1AT)缺乏(AATD) SERPINA1 SERPINA1 (83) Serpin家族A成員1 P01009 酶抑制劑 致心律不整性右心室心肌病變/發育不良(ARVC、ARVD) 迄今為止,13種不同基因與此病症相關。 PKP2 (138) 親斑蛋白2 Q99959 接合處及中間絲中之黏附蛋白。 常染色體顯性多囊性腎病(ADPKD) BICC1;GANAB;PKD1;PKD2 PKD1 (1154) 多囊蛋白1 P98161 離子通道複合物之次單元 Brugada症候群心室顫動 迄今為止,22種不同基因與此病症相關 SCN5A (725) 鈉電壓閘控通道α次單元5 Q14524 離子通道 兒茶酚胺激導性多形性室性心動過速(CPVT) CALM1;CALM2;CALM3;CASQ2;RYR2;TECRL;TRDN RYR2 (288) 蘭尼鹼受體2 Q92736 離子通道 Charcot–Marie-Tooth d疾病/遺傳性運動及感覺神經病變 迄今為止,75種不同基因與此病症相關。 PMP22 (63) 外周髓磷脂蛋白22 d Q01453 Ill-髓磷脂及雪旺細胞中之作用不明確 先天性腎上腺增生(CAH) CYP11B1;CYP17A1;CYP21A2;HSD3B2;POR;STAR CYP21A2 (214) 細胞色素P450家族21子族A成員2 P08686 先天性蔗糖酶-異麥芽糖酶缺乏(CSID) SI SI (23) 蔗糖酶-異麥芽糖酶 P14410 先天性雙側輸精管缺如 CFTR;ADGRG2 CFTR (120) 囊性纖維化跨膜傳導調節因子 Q20BH0 離子通道 囊性纖維化 CFTR;CLCA4;DCTN4;STX1A;TGFB1 CFTR (1053) 囊性纖維化跨膜傳導調節因子 Q20BH0 離子通道 胱胺酸尿-離胺酸尿症候群/胱胺酸尿 SLC3A1;SLC7A9 SLC7A9 (83) 溶質載體家族7成員9 P82251 膜轉運蛋白 巨細胞型先天性腎上腺增生(AHC) (先天性腎上腺增生之亞型) NR0B1 NR0B1 (112) 核受體子族0 B組成員1 P51843 核受體 牙本質發育不全(DGI) (所有類型) DSPP DSPP (11) 牙本質唾液酸磷蛋白 Q9NZW4 種子生物礦化、牙本質形成 Duchenne肌肉萎縮(DMD) DMD;LTBP4 DMD (830) 抗肌萎縮蛋白 P11532 結構蛋白 異常β脂蛋白血症/高脂蛋白血症3型 APOE APOE (42) 載脂蛋白E P02649 脂質載體、脂蛋白 Ehlers-Danlos症候群 COL1A1;COL5A1;COL5A2 COL5A1 (106) V型膠原蛋白α1鏈、V型膠原蛋白α2鏈 P20908、P05997 結構蛋白 家族性腺瘤性息肉病(FAP) APC;MUTYH APC (539) 結腸腺瘤性息肉病蛋白 P25054 腫瘤抑制因子、調節蛋白 Gardner症候群(家族性腺瘤性息肉病之亞型) APC APC (539) 結腸腺瘤性息肉病蛋白 P25054 腫瘤抑制因子,與微管相關 家族性腦海綿狀血管瘤 CCM2;KRIT1;PDCD10 KRIT1 (80) Krev相互作用捕獲蛋白1 O00522 調節蛋白 家族性低尿鈣性高鈣血症1型(FHH) CASR CASR (373) 鈣敏感受體 P41180 G蛋白偶合受體 家族性高膽固醇血症 APOB;LDLR;LDLRAP1;PCSK LDLR (1254) 低密度脂蛋白受體 P01130 脂蛋白受體 家族性孤立性擴張型心肌病變 迄今為止,45種不同基因與此病症相關。 TTN (672) Titin Q8WZ42 肌肉蛋白 家族性長QT症候群(LQTS),包括Romano-Ward症候群 迄今為止,19種不同基因與此病症相關。 KCNQ1 (448) 鉀電壓閘控通道子族Q成員1 P51787 離子通道 
脆性X症候群/Martin-bell症候群 FMR1 FMR1 (7) 脆性X智力低下1 Q06787 mRNA生物學調節因子 葡萄糖-6-磷酸去氫酶缺乏 G6PD G6PD (218) 葡萄糖-6-磷酸1去氫酶 P11413 肝醣儲積症 迄今為止,27種不同基因與此病症相關。 AGL (117) 肝醣去分支酶 P35573 GM2神經節苷脂症 GM2A;HEXA;HEXB HEXA (124) 胺基己糖苷酶次單元α P06865 血色沈著 BMP6;HAMP;HFE;HJV;SLC40A1;TFR2 HFE (43) 遺傳性血色沈著蛋白 Q30201 結合轉鐵蛋白受體 歸因於紅血球丙酮酸激酶缺乏之溶血性貧血 PKLR PKLR (237) 丙酮酸激酶 P30613 血友病A及B F8;F9 F8 (1898) 凝血因子VIII P00451 因子IXa之輔因子 血友病A F8 F8 (3364) 凝血因子VIII P00451 因子IXa之輔因子 出血性毛細血管擴張症/Osler Weder Rendu疾病 ACVRL1;ENG;GDF2;SMAD4 ENG (187) 內皮糖蛋白 P17813 血管生成之調節 遺傳性血管性水腫(HAE)/血管神經性水腫 ANGPT1;F12;PLG;SERPING1 SERPING1 (252) Serpin家族G成員1 P05155 酶抑制劑 遺傳性乳癌及卵巢癌症候群 迄今為止,14種不同基因與此病症相關。 BRCA1 (1262) 乳癌1型易感蛋白 P38398 E3泛素蛋白連接酶 遺傳性果糖不耐受/果糖血症 ALDOB ALDOB (32) 醛縮酶、果糖-二磷酸B P05062 遺傳性黃嘌呤尿/黃嘌呤結石疾病 MOCOS;XDH MOCOS (8);XDH (17) 鉬輔因子硫酸酶、黃嘌呤去氫酶 Q9C5X8、P47989 少汗性外胚層發育不良(HED) 迄今為止,10種不同基因與此病症相關。 EDA (199) 外異蛋白A Q92838 細胞介素 亞胺基甘胺酸尿症 SLC36A2;SLC6A18;SLC6A19;SLC6A20 SLC36A2 (1) 溶質載體家族36成員2 Q495M3 膜轉運蛋白 Li-Fraumeni症候群 肉瘤、乳癌、白血病及腎上腺(SBLA)症候群 CDKN2A;CHEK2;MDM2;TP53 TP53 (417) 腫瘤蛋白p53 P04637 腫瘤抑制因子、基因調節 長鏈3-羥基醯基-CoA去氫酶缺乏(LCHAD) HADHA HADHA (35) 羥基醯基-CoA去氫酶三功能性多酶複合物次單元α P40939 Lynch症候群 迄今為止,11種不同基因與此病症相關。 MSH2 (34) DNA錯配修復蛋白Msh2 P43246 DNA修復、結合DNA、ATP酶 Marfan症候群 FBN1;TGFBR2 FBN1 (1893) 原纖維蛋白1 P35555 結構蛋白、細胞外基質 母體苯酮尿症/苯酮尿症胚胎病 PAH PAH (690) 苯丙胺酸羥化酶 P00439 中鏈醯基-CoA去氫酶缺乏(MCADD) ACADM ACADM (136) 醯基-CoA去氫酶中鏈 P11310 黏脂貯積症III型(ML3) α/β GNPTAB GNPTAB (68) N-乙醯基葡糖胺1磷酸轉移酶,α及β次單元 Q3T906 黏多醣貯積症4A型(MPS4A)/Morquio疾病A型 GALNS GALNS (269) 半乳糖胺(N-乙醯基)-6-硫酸酯酶 P34059 多發性內分泌贅瘤2型 RET RET (130) Ret原癌基因受體酪胺酸激酶 P07949 受體酪胺酸激酶 多發性骨骺發育不良(MED) COL2A1;COL9A1;COL9A2;COL9A3/膠原蛋白IX型α3鏈;COMP;KIF7;MATN3;SLC26A2 COMP (155) 軟骨寡聚基質蛋白 P49747 結構蛋白 神經纖維瘤病1型(NF1)/Von Recklinghausen疾病 NF1 NF1 (1208) 神經纖維瘤蛋白1 P21359 Ras GTP酶活性之調節因子 眼皮膚白化症(OCA) LRMDA;MC1R;OCA2;SLC24A5;SLC45A2;TYR;TYRP1 TYR (352) 酪胺酸酶 P14679 成骨不全/脆骨病 迄今為止,15種不同基因與此病症相關。 COL1A1 (547);COL1A2 (466) I型膠原蛋白α1鏈、I型膠原蛋白α2鏈 P02452、P08123 結構蛋白 Pendred症候群(PDS)/甲狀腺腫性耳聾 FOXI1;KCNJ10;SLC26A4 SLC26A4 (404) 溶質載體家族26成員4 O43511 膜轉運蛋白 
苯酮尿症(PKU)/苯丙胺酸羥化酶缺乏(PAH缺乏) PAH PAH (690) 苯丙胺酸羥化酶 P00439 近端性脊髓性肌肉萎縮(SMA) NAIP;SMN1;SMN2 SMN1 (47) 運動神經元存活蛋白 Q16637 RNA剪接 色素性視網膜炎(RP) 迄今為止,82種不同基因與此病症相關。 RHO (204) 視紫質 P08100 G蛋白偶合受體 隱性X連鎖魚鱗病(XLI) STS STS (28) 類固醇硫酸酯酶 P08842 視網膜母細胞瘤(RB雙側(40%病例)及單側(60%病例—新突變) NMYC;RB1 RB1 (292) RB轉錄輔阻遏物1 P06400 腫瘤抑制因子、細胞週期調節 Rett症候群 MECP2 MECP2 (246) 甲基-CpG結合蛋白2 P51608 結合於甲基化DNA,基因調節 鐮狀細胞貧血 HBB HBB (433) 血紅素β次單元 P68871 載氧體 Sotos症候群/腦巨人症 APC2;NSD1;SETD2 NSD1 (228) 核受體結合SET結構域蛋白1 Q96L73 Stargardt疾病/眼底黃斑 ABCA4;CNGB3;ELOVL4;PROM1;PRPH2 ABCA4 (789) ATP結合卡匣子族A成員4 P78363 膜轉運蛋白 Stickler症候群/遺傳性進行性關節眼病 COL11A1;COL2A1;COL11A2;COL9A1;COL9A2;COL9A3;LOXL3 COL2A1 (335) II型膠原蛋白α1鏈 P02458 結構蛋白 主動脈瓣上狹窄(SVAS) ELN ELN (25) 彈性蛋白 P15502 結構蛋白 β-地中海型貧血 HBB HBB (434) 血紅素B鏈 P68871 氧載體 脛骨肌肉萎縮/上行肌病 TTN TTN (53) Titin Q8WZ42 肌肉蛋白 結節性硬化症/Bourneville症候群 TSC1;TSC2 TSC2 (518) 馬鈴薯蛋白 P49815 腫瘤抑制因子、mTORC1信號傳導之調節 Von-Hippel Lindau疾病 VHL VHL (218) Von Hippel–Lindau腫瘤抑制因子 P40337 腫瘤抑制因子,在E3泛素連接酶複合物中之作用 Von Willebrand疾病 VWF VWF (636) Von Willebrand因子 P04275 膠原蛋白結合,凝血因子VIII之伴侶蛋白 X連鎖腎上腺腦白質營養不良(ALD) ABCD1 ABCD1 (425) ATP結合卡匣子族D成員1 P33897 膜轉運蛋白 In addition, the gene editing system disclosed herein can also be used to treat the following genetic diseases by editing defects in a disease-associated gene or more than one gene associated with a particular disease, as follows: A B C D E F Genetic diseases Disease-related genes The most common (B) (C) Coded Product (C) Deposit Number (C) Product Type Adrenal hyperplasia due to 21-hydroxylase deficiency (21-OHD CAH) CYP21A2 CYP21A2 (196) Cytochrome P450 family 21 subgroup A member 2 P08686 Enzymes Aicardi-Goutières syndrome encephalopathy ADAR; IFIH1; RNASEH2A; RNASEH2B; RNASEH2C; SAMHD1; TREX1 RNASEH2B (28) RNase H2 subunit B Q5TBB1 Enzymes Alpha-1-antitrypsin (A1AT) deficiency (AATD) SERPINA1 SERPINA1 (83) Serpin family A member 1 P01009 Enzyme inhibitors Arrhythmogenic right ventricular cardiomyopathy/dysplasia (ARVC, ARVD) To date, 13 different genes have been linked to the 
condition. PKP2 (138) Plaquephilin 2 Q99959 Adhesion proteins in junctions and intermediate filaments. Autosomal dominant polycystic kidney disease (ADPKD) BICC1; GANAB; PKD1; PKD2 PKD1 (1154) Polycystin 1 P98161 Subunit of the ion channel complex Brugada syndrome ventricular fibrillation So far, 22 different genes have been linked to the condition. SCN5A (725) Sodium voltage gate channel α subunit 5 Q14524 Ion channel Catecholamine-induced polymorphic ventricular tachycardia (CPVT) CALM1; CALM2; CALM3; CASQ2; RYR2; TECRL; TRDN RYR2 (288) Ryanodine receptor 2 Q92736 Ion channel Charcot–Marie- Tooth disease/hereditary motor and sensory neuropathy To date, 75 different genes have been linked to the condition. PMP22 (63) Myelin protein 22 d Q01453 Ill-Myelin and Schwann cells have unclear roles Congenital adrenal hyperplasia (CAH) CYP11B1; CYP17A1; CYP21A2; HSD3B2; POR; STAR CYP21A2 (214) Cytochrome P450 family 21 subgroup A member 2 P08686 Enzymes Congenital sucrase-isomaltase deficiency (CSID) SI SI (23) Sucrase-Isomaltase P14410 Enzymes Congenital bilateral absence of the vas deferens CFTR;ADGRG2 CFTR (120) Cystic fibrosis transmembrane conductance regulator Q20BH0 Ion channel Cystic fibrosis CFTR; CLCA4; DCTN4; STX1A; TGFB1 CFTR (1053) Cystic fibrosis transmembrane conductance regulator Q20BH0 Ion channel Cystinuria-Lysinuria Syndrome/Cystinuria SLC3A1; SLC7A9 SLC7A9 (83) Solute carrier family 7 member 9 P82251 Membrane transporter Congenital adrenal hyperplasia giant cell type (AHC) (a subtype of congenital adrenal hyperplasia) NR0B1 NR0B1 (112) Nuclear receptor subfamily 0 group B member 1 P51843 Nuclear receptor Dentinogenesis Imperfecta (DGI) (all types) DSPP DSPP (11) Dentin sialophosphoprotein Q9NZW4 Seed biomineralization, dentin formation Duchenne Muscular Dystrophy (DMD) DMD;LTBP4 DMD (830) Dystrophin P11532 Structural proteins Abnormal betalipoproteinemia/hyperlipoproteinemia type 3 APOE APOE (42) Apolipoprotein E P02649 Lipoprotein Ehlers-Danlos 
syndrome COL1A1;COL5A1;COL5A2 COL5A1 (106) Type V collagen α1 chain, type V collagen α2 chain P20908, P05997 Structural proteins Familial Adenomatous Polyposis (FAP) APC; MUTYH APC (539) adenomatous polyposis coli protein P25054 Tumor suppressor factors, regulatory proteins Gardner syndrome (a subtype of familial adenomatous polyposis) APC APC (539) adenomatous polyposis coli protein P25054 Tumor suppressor factor, microtubule-associated Familial cavernous hemangioma CCM2;KRIT1;PDCD10 KRIT1 (80) Krev interacting trap protein 1 O00522 Regulatory proteins Familial Hypocalcemic Hypercalcemia Type 1 (FHH) CASR CASR (373) Calcium sensitive receptor P41180 G protein-coupled receptor Familial hypercholesterolemia APOB; LDLR; LDLRAP1; PCSK LDLR (1254) Low-density lipoprotein receptor P01130 Lipoprotein receptor Familial isolated dilated cardiomyopathy To date, 45 different genes have been linked to the condition. TTN (672) Titin Q8WZ42 Muscle Protein Familial long QT syndrome (LQTS), including Romano-Ward syndrome To date, 19 different genes have been linked to the condition. KCNQ1 (448) Potassium voltage gated channel subfamily Q member 1 P51787 Ion channel Fragile X Syndrome/Martin-bell Syndrome FMR1 FMR1 (7) Fragile X mental retardation 1 Q06787 mRNA biological regulators Glucose-6-phosphate dehydrogenase deficiency G6PD G6PD (218) Glucose-6-phosphate dehydrogenase P11413 Enzymes Glycogen storage disease To date, 27 different genes have been linked to the condition. 
AGL (117) Glycogen debranching enzyme P35573 Enzymes GM2 gangliosidosis GM2A; HEXA; HEXB HEXA (124) Hexosaminidase subunit alpha P06865 Enzymes Bloody BMP6; HAMP; HFE; HJV; SLC40A1; TFR2 HFE (43) Hereditary hemochromatin Q30201 Binds to transferrin receptor Hemolytic anemia due to erythrocyte pyruvate kinase deficiency PKLR PKLR (237) Pyruvate kinase P30613 Enzymes Hemophilia A and B F8; F9 F8 (1898) Coagulation Factor VIII P00451 Factor IXa cofactor Hemophilia A F8 F8 (3364) Coagulation Factor VIII P00451 Factor IXa cofactor Hemorrhagic telangiectasia/Osler Weder Rendu disease ACVRL1;ENG;GDF2;SMAD4 ENG (187) Endoglin P17813 Regulation of angiogenesis Hereditary angioedema (HAE)/angioneurotic edema ANGPT1; F12; PLG; SERPING1 SERPING1 (252) Serpin family G member 1 P05155 Enzyme inhibitors Hereditary breast and ovarian cancer syndrome To date, 14 different genes have been linked to the condition. BRCA1 (1262) Breast cancer susceptibility protein type 1 P38398 E3 ubiquitin protein ligase Hereditary fructose intolerance/fructoseemia ALDOB ALDOB (32) Aldolase, fructose-bisphosphate B P05062 Enzymes Hereditary xanthinuria/xanthine stone disease MOCOS;XDH MOCOS (8); XDH (17) Molybdenum cofactor sulfatase, xanthine dehydrogenase Q9C5X8, P47989 Enzymes Hypohidrotic ectodermal dysplasia (HED) To date, 10 different genes have been linked to the condition. EDA (199) Exogenous protein A Q92838 Interleukin Iminoglycinuria SLC36A2; SLC6A18; SLC6A19; SLC6A20 SLC36A2 (1) Solute carrier family 36 member 2 Q495M3 Membrane transporter Li-Fraumeni syndrome Sarcoma, Breast Cancer, Leukemia and Adrenal (SBLA) syndrome CDKN2A;CHEK2;MDM2;TP53 TP53 (417) Tumor protein p53 P04637 Tumor suppressors, gene regulators Long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency (LCHAD) HADHA HADHA (35) Hydroxyacyl-CoA dehydrogenase trifunctional multienzyme complex subunit alpha P40939 Enzymes Lynch syndrome To date, 11 different genes have been linked to the condition. 
MSH2 (34) DNA mismatch repair protein Msh2 P43246 DNA repair, DNA binding, ATPase Marfan syndrome FBN1;TGFBR2 FBN1 (1893) Tropofibromin 1 P35555 Structural proteins, extracellular matrix Maternal phenylketonuria/phenylketonuria embryopathy PAH PAH (690) Phenylalanine hydroxylase P00439 Enzymes Medium-chain acyl-CoA dehydrogenase deficiency (MCADD) ACADM ACADM (136) Acyl-CoA dehydrogenase middle chain P11310 Enzymes Mucolipidosis type III (ML3) α/β GNPTAB GNPTAB (68) N-acetylglucosamine 1-phosphotransferase, alpha and beta subunits Q3T906 Enzymes Mucopolysaccharidosis type 4A (MPS4A)/Morquio disease type A GALNS GALNS (269) Galactosamine (N-acetyl)-6-sulfatase P34059 Enzymes Multiple endocrine neoplasia type 2 RET RET (130) Ret proto-oncogene receptor tyrosine kinase P07949 Receptor tyrosine kinase Multiple Epiphyseal Dysplasia (MED) COL2A1; COL9A1; COL9A2; COL9A3/collagen type IX α3 chain; COMP; KIF7; MATN3; SLC26A2 COMP (155) Cartilage oligomeric matrix protein P49747 Structural proteins Neurofibromatosis type 1 (NF1)/Von Recklinghausen disease NF1 NF1 (1208) Neurofibromatosis protein 1 P21359 Regulators of Ras GTPase activity Oculocutaneous albinism (OCA) LRMDA; MC1R; OCA2; SLC24A5; SLC45A2; TYR; TYRP1 TYR (352) Tyrosinase P14679 Enzymes Osteogenesis imperfecta/brittle bone disease So far, 15 different genes have been linked to the condition. COL1A1 (547); COL1A2 (466) Type I collagen alpha 1 chain, type I collagen alpha 2 chain P02452、P08123 Structural proteins Pendred syndrome (PDS)/thyroid deafness FOXI1;KCNJ10;SLC26A4 SLC26A4 (404) Solute carrier family 26 member 4 O43511 Membrane transporter Phenylketonuria (PKU)/phenylalanine hydroxylase deficiency (PAH deficiency) PAH PAH (690) Phenylalanine hydroxylase P00439 Enzymes Proximal Spinal Muscular Atrophy (SMA) NAIP; SMN1; SMN2 SMN1 (47) Motor neuron survival protein Q16637 RNA splicing Retinitis Pigmentosa (RP) To date, 82 different genes have been linked to the condition. 
RHO (204) Rhodopsin P08100 G protein-coupled receptor XLI (Latent X-linked Scale Infection) STS STS (28) Steroid sulfatase P08842 Enzymes Retinoblastoma (RB bilateral (40% of cases) and unilateral (60% of cases—new mutation) NMYC;RB1 RB1 (292) RB transcriptional co-repressor 1 P06400 Tumor suppressor factor, cell cycle regulator Rett syndrome MECP2 MECP2 (246) Methyl-CpG binding protein 2 P51608 Binds to methylated DNA, gene regulation Sickle cell anemia HBB HBB (433) Heme β subunit P68871 Oxygen carrier Sotos syndrome/gigantism APC2; NSD1; SETD2 NSD1 (228) Nuclear receptor binding SET domain protein 1 Q96L73 Enzymes Stargardt disease/macular fundus ABCA4; CNGB3; ELOVL4; PROM1; PRPH2 ABCA4 (789) ATP binding cassette family A member 4 P78363 Membrane transporter Stickler syndrome/hereditary progressive arthropathy COL11A1; COL2A1; COL11A2; COL9A1; COL9A2; COL9A3; LOXL3 COL2A1 (335) Type II collagen alpha 1 chain P02458 Structural proteins Supravalvular aortic stenosis (SVAS) ELN ELN (25) Elastin P15502 Structural proteins β-Thalassemia HBB HBB (434) Heme B chain P68871 Oxygen carrier Tibial muscle atrophy/ascending myopathy TTN TTN (53) Titin Q8WZ42 Muscle Protein Tuberous sclerosis/Bourneville syndrome TSC1;TSC2 TSC2 (518) Potato protein P49815 Tumor suppressor, regulation of mTORC1 signaling Von-Hippel Lindau disease VHL VHL (218) Von Hippel–Lindau tumor suppressor factor P40337 Tumor suppressor factor, role in the E3 ubiquitin ligase complex Von Willebrand disease VWF VWF (636) Von Willebrand Factor P04275 Collagen binding, partner protein of factor VIII X-linked adrenoleukodystrophy (ALD) ABCD1 ABCD1 (425) ATP-binding cassette family D member 1 P33897 Membrane transporter

Accordingly, to treat one or more such diseases or disorders, in various aspects of the present invention, one or more target polynucleotide sequences associated with certain diseases and disorders (e.g., gene mutations) are contacted with the retron-based gene editing system disclosed herein and with a guide RNA, wherein the guide RNA comprises a sequence complementary to the target polynucleotide sequence.

In some embodiments, the guide RNA directs the programmable nuclease of the retron-based editing system to the target site or target polynucleotide sequence and, optionally, a ribonucleoprotein complex is formed with the polypeptide and the guide RNA.

Additional therapeutic applications of the genome editing systems disclosed herein include base editing, prime editing, gene insertion, and/or deletion.

Diagnostic applications of the genome editing systems include probes, diagnostics, and theranostics.

Editing systems comprising heterologous nucleic acid sequences can be used in a variety of applications, several non-limiting examples of which are described herein. In general, the editing systems can be used in any suitable organism. In some embodiments, the organism is a eukaryote.

In some embodiments, the organism is an animal. In some embodiments, the animal is a fish, an amphibian, a reptile, a mammal, or a bird. In some embodiments, the animal is a farm animal or an agricultural animal. Non-limiting examples of farm and agricultural animals include horses, goats, sheep, pigs, cattle, llamas, alpacas, and birds, e.g., chickens, turkeys, ducks, and geese. In some embodiments, the animal is a non-human primate, e.g., a baboon, capuchin monkey, chimpanzee, lemur, macaque, marmoset, tamarin, spider monkey, squirrel monkey, or vervet monkey. In some embodiments, the animal is a pet. Non-limiting examples of pets include dogs, cats, horses, rabbits, ferrets, gerbils, hamsters, chinchillas, fancy rats, guinea pigs, canaries, parakeets, and parrots.

In some embodiments, the organism is a plant. Plants that can be transfected with the Cas12a editing system include monocots and dicots. Specific examples include, but are not limited to, corn (maize), sorghum, wheat, sunflower, potato, cotton, rice, soybean, sugar beet, sugarcane, tobacco, barley, oilseed rape, Brassica species, alfalfa, rye, millet, safflower, peanut, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oat, vegetables, ornamentals, and conifers. Vegetables include, but are not limited to, crucifers, peppers, tomatoes, lettuce, green beans, lima beans, peas, and members of the genus Cucumis (such as cucumber, cantaloupe, and musk melon). Ornamentals include, but are not limited to, azaleas, hydrangeas, hibiscus, roses, tulips, daffodils, petunias, carnations, poinsettias, and chrysanthemums.

In some embodiments, heterologous nucleic acid sequences can be added to the editing systems of the present invention to provide a cell with a heterologous nucleic acid encoding a protein or regulatory RNA of interest, a cell barcode, a donor polynucleotide suitable for gene editing (e.g., by homology-directed repair (HDR) or recombination-mediated genetic engineering (recombineering)), or a CRISPR protospacer DNA sequence for molecular recording, as discussed further below. In embodiments involving retron-based gene editing systems, such heterologous sequences can be inserted, for example, into the msr locus or the msd locus, such that the heterologous sequence is transcribed by the retron reverse transcriptase as part of the msDNA product.

In some embodiments, the editing systems described herein can be used in research tools (such as kits), in functional genomics assays, and to generate engineered cell lines and animal models for research and drug screening. In addition to the editing system, a kit can also include one or more reagents, such as buffers, control reagents, control vectors, control RNA polynucleotides, reagents for in vitro production of polypeptides from DNA, and adapters for sequencing. The buffer can be, for example, a stabilization buffer, a reconstitution buffer, a dilution buffer, a wash buffer, or a buffer for introducing the polypeptides and/or polynucleotides of the kit into cells. In some cases, the kit can include one or more additional reagents specific to plants. The one or more additional reagents for plants can include, for example, soil, nutrients, plants, seeds, spores, Agrobacterium, T-DNA vectors, and pBINAR vectors.

Production of protein or RNA

In some embodiments, single-stranded msDNA generated by the engineered retrons of the present invention can be used to produce a desired product of interest in a cell.

In some embodiments, a retron is engineered with a heterologous sequence encoding a polypeptide of interest to allow production of the polypeptide from the retron msDNA generated in a cell. The polypeptide of interest can be any type of protein or peptide, including but not limited to an enzyme, an extracellular matrix protein, a receptor, a transporter, an ion channel or other membrane protein, a hormone, a neuropeptide, an antibody, or a cytoskeletal protein, or a functional fragment or biologically active domain of interest thereof. In some embodiments, the protein is a therapeutic protein, a therapeutic antibody for treating a disease, or a template for repairing a mutation or mutant exon in the genome.

Non-limiting examples of polypeptides of interest include growth hormone, insulin-like growth factor (IGF-1), Fat-1, phytase, xylanase, β-glucanase, lysozyme or lysostaphin, histone deacetylases such as HDAC6, CD163, etc.

In other embodiments, a retron is engineered with a heterologous sequence encoding an RNA of interest to allow production of the RNA from the retron in a cell. The RNA of interest can be any type of RNA, including but not limited to an RNA interference (RNAi) nucleic acid or a regulatory RNA, such as but not limited to a microRNA (miRNA), a small interfering RNA (siRNA), a short hairpin RNA (shRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), an antisense nucleic acid, and the like.

Recombineering

Recombineering (recombination-mediated genetic engineering) can be used to modify chromosomes and episomal replicons in cells, for example, to produce gene replacements, gene knockouts, deletions, insertions, inversions, or point mutations. Recombineering can also be used to modify plasmids or bacterial artificial chromosomes (BACs), for example, to clone genes or insert markers or tags.

The engineered retrons described herein can be used in recombineering applications to provide linear single-stranded or double-stranded DNA for recombination. Homologous recombination can be mediated by bacteriophage proteins, such as RecE/RecT from the Rac prophage or Redαβδ from bacteriophage lambda. The linear DNA should have sufficient homology at its 5' and 3' ends to the target DNA molecule present in the cell (e.g., a plasmid, BAC, or chromosome) to allow recombination.

A linear single-stranded or double-stranded DNA molecule (i.e., a donor polynucleotide) used for recombineering comprises a sequence carrying the intended edit, flanked by two homology arms that target the linear DNA molecule to the target site for homologous recombination. Homology arms used for recombineering typically range from 13 to 300 nucleotides or from 20 to 200 nucleotides in length, including any length within these ranges, such as 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200 nucleotides. In some embodiments, a homology arm is at least 15, at least 20, at least 30, at least 40, or at least 50 or more nucleotides long. Homology arms of 40 to 50 nucleotides generally provide sufficient targeting efficiency for recombination; however, longer homology arms of 150 to 200 bases or more can further improve targeting efficiency. In some embodiments, the 5' homology arm and the 3' homology arm differ in length. For example, the linear DNA can have about 50 bases at the 5' end and about 20 bases at the 3' end that are homologous to the region to be targeted.
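The donor layout described above (5' homology arm, intended edit, 3' homology arm) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_donor`, its parameters, and the toy sequences are assumptions introduced here and do not come from the disclosure.

```python
def build_donor(target: str, edit_start: int, edit_end: int, edit_seq: str,
                left_arm: int = 50, right_arm: int = 50) -> str:
    """Assemble a donor as 5' homology arm + edit + 3' homology arm.

    target      -- target-site sequence, written 5'->3' (hypothetical input)
    edit_start  -- 0-based index of the first base to be replaced
    edit_end    -- 0-based index just past the last base to be replaced
    edit_seq    -- sequence to place between the arms ("" for a deletion)
    left_arm    -- length of the 5' homology arm, in nucleotides
    right_arm   -- length of the 3' homology arm, in nucleotides
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit coordinates fall outside the target sequence")
    if edit_start < left_arm or len(target) - edit_end < right_arm:
        raise ValueError("target is too short for the requested homology arms")
    five_prime_arm = target[edit_start - left_arm:edit_start]
    three_prime_arm = target[edit_end:edit_end + right_arm]
    return five_prime_arm + edit_seq + three_prime_arm


# Asymmetric arms, as in the example in the text: ~50 bases 5', ~20 bases 3'.
toy_target = "A" * 60 + "GGG" + "C" * 30
donor = build_donor(toy_target, 60, 63, "TTT", left_arm=50, right_arm=20)
print(len(donor))
```

Keeping the two arm lengths as separate parameters mirrors the statement that the 5' and 3' homology arms may differ in length.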

The bacteriophage homologous recombination proteins can be provided to a cell as proteins or by one or more vectors (such as a vector or vector system) encoding the recombination proteins. In some embodiments, the one or more vectors encoding the bacteriophage recombination proteins are included in a vector system that comprises the engineered retron msr gene, msd gene, and/or ret gene sequences. In addition, a number of bacterial strains containing prophage recombination systems are available for recombineering, including but not limited to: DY380, which contains a defective λ prophage with the recombination proteins exo, bet, and gam; EL250, derived from DY380, which in addition to the recombination genes found in DY380 also contains a tightly controlled arabinose-inducible flpe gene (flpe mediates recombination between two identical frt sites); EL350, also derived from DY380, which in addition to the recombination genes found in DY380 also contains a tightly controlled arabinose-inducible cre gene (cre mediates recombination between two identical loxP sites); SW102, derived from DY380, which is designed for BAC recombineering using galK positive/negative selection; SW105, derived from EL250, which can also be used for galK positive/negative selection but, like EL250, contains an ara-inducible Flpe gene; and SW106, derived from EL350, which can be used for galK positive/negative selection but, like EL350, contains an ara-inducible Cre gene. Recombineering can be carried out by transfecting bacterial cells of such strains with an engineered retron comprising a heterologous sequence encoding a linear DNA suitable for recombineering. For a discussion of recombineering systems and protocols, see, e.g., Sharan et al. (2009) Nat Protoc. 4(2): 206-223; Zhang et al. (1998) Nature Genetics 20: 123-128; Muyrers et al. (1999) Nucleic Acids Res. 27: 1555-1557; and Yu et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97(11): 5978-5983; incorporated herein by reference.

Molecular recording

In some embodiments, the heterologous sequence in the engineered retron construct comprises a synthetic CRISPR protospacer DNA sequence to allow molecular recording. Bacteria and archaea commonly use the endogenous CRISPR Cas1-Cas2 system to keep track of foreign DNA sequences originating from viral infections by storing short sequences (i.e., protospacers), which confer sequence-specific resistance to invading viral nucleic acids, within genome-based arrays. These arrays not only preserve the spacer sequences but also record the order in which the sequences were acquired, generating a temporal record of acquisition events.

This system can be adapted to record arbitrary DNA sequences into a genomic CRISPR array in the form of "synthetic protospacers" that are introduced into cells using engineered retrons. Engineered retrons carrying a protospacer sequence can be used to integrate synthetic CRISPR protospacer sequences at a specific genomic locus by means of the CRISPR system Cas1-Cas2 complex. Molecular recording can be used to track certain biological events by generating a stable, heritable memory trace. See, e.g., Shipman et al. (2016) Science 353(6298): aaf1175 and International Patent Application Publication No. WO/2018/191525; incorporated herein by reference in their entirety.

In some embodiments, a CRISPR-Cas system is used to record specific and arbitrary DNA sequences into a bacterial genome. The DNA sequences can be produced by an engineered retron within the cell. For example, engineered retrons can be used to produce protospacers within a cell, which are inserted into a CRISPR array within the cell. The cell can be modified to include one or more engineered retrons (or vector systems encoding them) that can produce one or more synthetic protospacers in the cell, wherein the synthetic protospacers are added to the CRISPR array. A record of defined sequences, recorded over many days and in multiple modalities, can be generated.

In some embodiments, the engineered retron comprises an msd protospacer nucleic acid region or an msr protospacer nucleic acid region. In the case of an msr protospacer nucleic acid region, the protospacer sequence is first incorporated into the msr RNA, which is reverse transcribed into protospacer DNA. Double-stranded protospacer DNA is produced when two complementary protospacer DNA sequences hybridize, or when a double-stranded structure (such as a hairpin) forms within a single-stranded protospacer DNA (e.g., a single msDNA can form an appropriate hairpin structure to provide a double-stranded DNA protospacer).
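As a rough illustration of the two routes to double-stranded protospacer DNA described above, the following Python sketch checks whether two single strands are fully complementary, or whether the two ends of one strand could pair into a hairpin stem. It considers exact Watson-Crick pairing only and ignores thermodynamics; the function names, the `min_loop` parameter, and the toy sequences are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strands_hybridize(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands are perfectly complementary over their full length."""
    return strand_a.upper() == reverse_complement(strand_b)


def can_form_hairpin(ssdna: str, stem_len: int, min_loop: int = 3) -> bool:
    """True if the first and last `stem_len` bases can pair, leaving a loop between them.

    `min_loop` is an illustrative lower bound on loop size, not a value
    taken from the disclosure.
    """
    s = ssdna.upper()
    if stem_len <= 0 or len(s) < 2 * stem_len + min_loop:
        return False
    return s[:stem_len] == reverse_complement(s[-stem_len:])


# Toy single-stranded sequence: 7-nt stem, 4-nt loop, 7-nt complementary stem.
toy = "GATTACA" + "TTTT" + reverse_complement("GATTACA")
print(can_form_hairpin(toy, stem_len=7))
print(strands_hybridize("GATTACA", "TGTAATC"))
```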

In some embodiments, single-stranded DNA produced in vivo by a first engineered retron can hybridize with complementary single-stranded DNA produced in vivo by the same retron or by a second engineered retron, or can form a hairpin structure, and is then used as a protospacer sequence for insertion into a CRISPR array as a spacer sequence. The engineered retron should provide a sufficient level of the protospacer sequence within the cell for incorporation into the CRISPR array. The use of protospacers generated within the cell extends the in vivo molecular recording system from capturing only information known to the user to capturing biological or environmental information that may previously have been unknown to the user. For example, an msDNA protospacer sequence in an engineered retron construct can be driven by a promoter that is downstream of a sensor pathway for a biological phenomenon or an environmental toxin. Capture and storage of the protospacer sequence in the CRISPR array records the event. If multiple msDNA protospacers are driven by different promoters, the activity of those promoters (and of anything upstream of them) is recorded, along with the relative order of promoter activity (based on the relative positions of the spacer sequences in the CRISPR array). At any time after the recording has taken place, the CRISPR array can be sequenced to determine whether a given biological or environmental event occurred and, from the presence and relative positions of the msDNA-derived spacers in the CRISPR array, the order of multiple events.
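As a non-limiting sketch of how such a readout might be interpreted, the following Python example maps spacers found in a sequenced CRISPR array back to the events whose promoters drove them and returns the events in inferred chronological order. It assumes, as is commonly described for E. coli type I-E arrays but is not stated in this passage, that new spacers are added at the leader-proximal end, so that the spacer nearest the leader is the most recent. The spacer labels, event names, and function name are hypothetical placeholders.

```python
from typing import Dict, List


def decode_event_order(array_spacers: List[str],
                       spacer_to_event: Dict[str, str]) -> List[str]:
    """Infer the order of recorded events from a sequenced CRISPR array.

    array_spacers: spacers listed leader-proximal first (ASSUMED newest first).
    spacer_to_event: maps each msDNA-derived spacer to the event it reports.
    Returns the events ordered oldest to newest; spacers that are not in the
    mapping (e.g., native spacers) are ignored.
    """
    newest_first = [spacer_to_event[s] for s in array_spacers
                    if s in spacer_to_event]
    return list(reversed(newest_first))


# Hypothetical example: two retron-derived spacers among native spacers.
mapping = {"SPACER_TOXIN": "toxin sensed", "SPACER_HEAT": "heat shock"}
array = ["SPACER_HEAT", "NATIVE_1", "SPACER_TOXIN", "NATIVE_2"]
timeline = decode_event_order(array, mapping)
```

Under the stated assumption, `timeline` lists the toxin event before the heat-shock event, because the toxin-reporting spacer lies farther from the leader. The absence of a mapped spacer indicates only that the corresponding event was not recorded.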

In some embodiments, the synthetic protospacer further comprises an AAG PAM sequence at its 5' end. Protospacers comprising a 5' AAG PAM are acquired by the CRISPR array with greater efficiency than protospacers that do not include a PAM sequence.
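Purely for illustration, the following Python sketch shows one way a designer might prepend the 5' AAG PAM to a candidate protospacer and check for its presence. The function names and the example sequence are hypothetical, and the sketch makes no claim about acquisition efficiency.

```python
PAM = "AAG"  # 5' PAM named in the passage above


def with_5prime_pam(protospacer: str) -> str:
    """Return the protospacer with the AAG PAM added at its 5' end."""
    return PAM + protospacer.upper()


def has_5prime_pam(sequence: str) -> bool:
    """True if the sequence begins with the AAG PAM at its 5' end."""
    return sequence.upper().startswith(PAM)


candidate = "ctgacgtacgatcgatcgtagctagctagcatc"  # hypothetical protospacer
designed = with_5prime_pam(candidate)
```

Note that `has_5prime_pam` cannot distinguish a PAM that was deliberately added from a protospacer that happens to begin with AAG.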

In some embodiments, Cas1 and Cas2 are provided by a vector that expresses Cas1 and Cas2 at levels sufficient to allow the CRISPR array in the cell to acquire the synthetic protospacer sequences produced by the engineered retron. Such vector systems can be used to perform molecular recording in cells that lack endogenous Cas proteins.

Therapeutic Applications

Also provided herein are methods of using the engineered retrons of the invention to diagnose, prognose, treat, and/or prevent a disease, state, or condition in or of a subject.

In general, such methods of diagnosing, prognosing, treating, and/or preventing a disease, state, or condition in or of a subject can include using an engineered retron composition, system, or component thereof as described herein to modify a polynucleotide in the subject or a cell thereof, and/or to detect a diseased or healthy polynucleotide in the subject or a cell thereof.

In some embodiments, the method of treatment or prevention can include using an engineered retron composition, system, or component thereof to modify a polynucleotide of an infectious organism (e.g., a bacterium or virus) within a subject or a cell thereof.

In some embodiments, the method of treatment or prevention can include using an engineered retron composition, system, or component thereof to modify a polynucleotide of an infectious organism or a symbiotic organism within a subject.

In some embodiments, the engineered retron compositions, systems, and components thereof can be used to develop models of diseases, states, or conditions.

In some embodiments, the engineered retron compositions, systems, and components thereof can be used to detect a disease state or a correction thereof, such as by a method of treatment or prevention described herein.

In some embodiments, the engineered retron compositions, systems, and components thereof can be used to screen and select cells that can be used, for example, as treatments or preventions described herein.

In some embodiments, the compositions, systems, and components thereof can be used to develop biologically active agents that can be used to modify one or more biological functions or activities in a subject or a cell thereof.

In general, the method can include delivering an engineered retron composition, system, and/or component thereof to a subject or a cell thereof, or to an infectious or symbiotic organism, by a suitable delivery technique and/or composition. Once administered, the components can operate as described elsewhere herein to elicit a nucleic acid modification event. In some embodiments, the nucleic acid modification event can occur at the genomic, epigenomic, and/or transcriptomic level. DNA and/or RNA cleavage, gene activation, and/or gene deactivation can occur.

The engineered retron compositions, systems, and components thereof as described elsewhere herein can be used to: treat and/or prevent a disease in a subject, such as a genetic and/or epigenetic disease; treat and/or prevent genetic infectious diseases in a subject, such as bacterial infections, viral infections, fungal infections, parasitic infections, and combinations thereof; modify the composition or profile of a microbiome in a subject, which can in turn modify the health status of the subject; modify cells ex vivo, which can then be administered to the subject, whereby the modified cells can treat or prevent a disease or a symptom thereof; or treat mitochondrial diseases, where the mitochondrial disease etiology involves a mutation in the mitochondrial DNA.

Also provided is a method of treating a subject (e.g., a subject in need thereof), the method comprising inducing gene editing by transforming the subject with a polynucleotide encoding one or more components of the composition, system, or complex, or with any of the polynucleotides or vectors of the engineered retron described herein, and administering them to the subject.

Also provided is a method of treating a subject (e.g., a subject in need thereof), the method comprising inducing transcriptional activation or repression of multiple target gene loci by transforming the subject with a polynucleotide or vector described herein, wherein the polynucleotide or vector encodes or comprises one or more components of the engineered retron composition, system, complex, or component thereof, and comprises multiple Cas effectors.

Also provided is a method of treating a subject (e.g., a subject in need thereof), the method comprising inducing gene editing by transforming the subject with a Cas effector, and encoding and expressing in vivo the remaining portions of the engineered retron composition, system (e.g., RNA, guides), complex, or component. A suitable repair template may also be provided by the engineered retron as described elsewhere herein.

Also provided is a method of treating a subject (e.g., a subject in need thereof), the method comprising inducing transcriptional activation or repression by transforming the subject with a system or composition herein.

Also provided is a method of inducing one or more polynucleotide modifications in a eukaryotic or prokaryotic cell or a component thereof (e.g., a mitochondrion) of a subject, an infectious organism, and/or an organism of the microbiome of the subject. The modification can include the introduction, deletion, or substitution of one or more nucleotides at a target sequence of a polynucleotide of one or more cells. The modification can occur in vitro, ex vivo, in situ, or in vivo.

In some embodiments, the method of treating or inhibiting a condition or disease caused by one or more mutations in a genomic locus in a eukaryotic organism or a non-human organism can include manipulating a target sequence within a coding, non-coding, or regulatory element of the genomic locus in a subject or non-human subject in need thereof. This includes modifying the subject or non-human subject by manipulating the target sequence, wherein the condition or disease is susceptible to treatment or inhibition by manipulation of the target sequence, and includes providing a treatment comprising delivering a composition comprising the particle delivery system, the delivery system, or the viral particle of any of the above embodiments, or the cell of any of the above embodiments.

Also provided herein is the use of the particle delivery system, the delivery system, or the viral vector (in a viral particle) of any of the above embodiments, or the cell of any of the above embodiments, in ex vivo or in vivo gene or genome editing, or in in vitro, ex vivo, or in vivo gene therapy.

Also provided herein are the particle delivery systems, non-viral delivery systems, and/or viral particles of any of the above embodiments, or the cells of any of the above embodiments, for use in the manufacture of a medicament for in vitro, ex vivo, or in vivo gene or genome editing; for in vitro, ex vivo, or in vivo gene therapy; for use in a method of modifying an organism or a non-human organism by manipulating a target sequence in a genomic locus associated with a disease; or for use in a method of treating or inhibiting a condition or disease caused by one or more mutations in a genomic locus in a eukaryotic organism or a non-human organism.

In some embodiments, modification of a target polynucleotide using the engineered retrons and related compositions, vectors, systems, and methods of the invention comprises the addition, deletion, or substitution of 1 to about 10k nucleotides at each target sequence of the polynucleotide of the cell(s). The modification can include the addition, deletion, or substitution of at least 1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 100, 200, 250, 300, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, or more nucleotides at each target sequence.

In some embodiments, formation of the system or complex results in cleavage, nicking, and/or another modification of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence.

In some embodiments, a method of modifying a target polynucleotide in a cell to treat or prevent a disease can include allowing an engineered retron composition, system, or component thereof of the invention to bind to the target polynucleotide, e.g., to effect cleavage, nicking, or another modification that the composition or system is capable of making to the target polynucleotide, thereby modifying the target polynucleotide, wherein the composition, system, or component thereof is complexed with a guide sequence, and the guide sequence hybridizes to a target sequence within the target polynucleotide, wherein the guide sequence is optionally linked to a tracr mate sequence, which in turn can hybridize to a tracr sequence. In some embodiments, the modification can include cleaving or nicking one or both strands at the location of the target sequence by one or more components of the composition, system, or component thereof.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat circulatory system diseases. In some embodiments, the treatment can be carried out by using an AAV or a lentiviral vector to deliver the engineered retrons, compositions, systems, and/or vectors described herein to modify hematopoietic stem cells (HSCs) or iPSCs in vivo or ex vivo. In some embodiments, the treatment can be carried out by correcting the HSCs or iPSCs as to the disease using a composition, system, or component thereof herein, wherein the composition or system optionally includes a suitable HDR repair template (e.g., a template in the msDNA of the engineered retron).

In some embodiments, the treatment or prevention for treating a circulatory system or blood disease can include modifying a human cord blood cell. In some embodiments, the treatment or prevention for treating a circulatory system or blood disease can include modifying a granulocyte colony-stimulating factor-mobilized peripheral blood cell (mPB) with any modification described herein. In some embodiments, the human cord blood cell or mPB can be CD34+. In some embodiments, the modified cord blood cells or mPB cells are autologous. In some embodiments, the cord blood cells or mPB cells are allogeneic. In addition to modification of the disease gene(s), allogeneic cells can be further modified using the compositions and systems described herein to reduce the immunogenicity of the cells when delivered to the recipient. The modified cord blood cells or mPB cells can optionally be expanded in vitro. The modified cord blood cells or mPB cells can be delivered to a subject in need thereof using any suitable delivery technique.

The compositions and systems can be engineered to target one or more genetic loci in HSCs. In some embodiments, the components of the systems can be codon-optimized for a eukaryotic cell, and especially a mammalian cell (e.g., a human cell, such as an HSC or iPSC), and sgRNAs targeting one or more loci in HSCs (e.g., for a circulatory disease) can be prepared. These can be delivered via particles, such as the lipid nanoparticle delivery systems described herein. The particles can be formed by mixing the components of the systems herein.

In some embodiments, after ex vivo modification, the HSCs or iPSCs can be expanded prior to administration to the subject. Expansion of HSCs can be carried out by any suitable method, such as that described by Lee, "Improved ex vivo expansion of adult hematopoietic stem cells by overcoming CUL4-mediated degradation of HOXB4." Blood. 2013 May 16;121(20):4082-9. doi: 10.1182/blood-2012-09-455204. Epub 2013 Mar 21.

In some embodiments, the modified HSCs or iPSCs are autologous. In some embodiments, the HSCs or iPSCs are allogeneic. In addition to modification of the disease gene(s), allogeneic cells can be further modified using the compositions and systems described herein to reduce the immunogenicity of the cells when delivered to the recipient.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat neurological diseases. In some embodiments, the neurological diseases include diseases of the brain and CNS.

Delivery options for the brain include encapsulating the system, in the form of either DNA or RNA, in liposomes and conjugating it to molecular Trojan horses for delivery across the blood-brain barrier (BBB). Molecular Trojan horses have been shown to be effective for delivering B-gal expression vectors into the brain of non-human primates. The same approach can be used to deliver the vectors or vector systems of the invention. In other embodiments, an artificial virus can be generated for CNS and/or brain delivery.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat a hearing disease or hearing loss in one or both ears. Deafness is often caused by lost or damaged hair cells that cannot relay signals to auditory neurons. In some embodiments, the composition, system, or modified cells can be delivered to one or both ears to treat or prevent hearing disease or loss by any suitable method or technique known in the art, such as those of US20120328580 (e.g., auricular administration); by intratympanic injection (e.g., into the middle ear) and/or injection into the outer, middle, and/or inner ear; or by administration in situ via a catheter or pump (U.S. 2006/0030837 and Jacobsen, U.S. Patent No. 7,206,639). See also US20120328580. The cells resulting from such methods can then be transplanted or implanted into a patient in need of such treatment.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat diseases of non-dividing cells. Exemplary non-dividing cells include muscle cells and neurons. In such cells, homologous recombination (HR) is generally suppressed in the G1 phase of the cell cycle, but this suppression can be reversed using art-recognized methods, such as that of Orthwein et al. (Nature. 2015 Dec 17; 528(7582): 422-426).

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the eye.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat muscle diseases and cardiovascular diseases.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the liver and kidney.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat epithelial and lung diseases.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat diseases of the skin.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat cancer.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used in adoptive cell therapy.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat infectious diseases.

In some embodiments, the engineered retrons and related compositions, systems, vectors, uses, and methods of use can be used to treat mitochondrial diseases.

All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

O. Sequences

Sequence Summary Table

The following sequences form part of this specification.

SEQ ID NO: | Description
1-7257 | Table A sequences (AA) - Retron reverse transcriptase amino acid sequences computationally discovered according to Example 3 and tested/evaluated throughout the Examples (sequences included in the Sequence Listing filed with this specification, identified as RTX009P1_Sequence_Listing_26Jul23.xml)
7258-14498 | Table A sequences (NT) - Retron reverse transcriptase nucleotide sequences computationally discovered according to Example 3 and tested/evaluated throughout the Examples (sequences included in the Sequence Listing filed with this specification, identified as RTX009P1_Sequence_Listing_26Jul23.xml)
14499-19108 | Table B sequences (NT) - Retron non-coding RNA (ncRNA) nucleotide sequences discovered according to Example 3 and tested/evaluated throughout the Examples (sequences included in the Sequence Listing filed with this specification, identified as RTX009P1_Sequence_Listing_26Jul23.xml)
19109-19125 | Examples: selected RT and ncRNA sequences (sequences included in this specification)
19126-19211 | Examples: RNA sequences (sequences included in this specification)
19212-19217 | Examples: primer sequences (sequences included in this specification)
19218-19221 | Examples: qPCR primer sequences (sequences included in this specification)
19222-12261 | Examples: RT variants (coding sequences) (sequences included in this specification)
12262-19364 | Examples: plasmid sequences (sequences included in this specification)
19365-19398 | Table C - Retron reverse transcriptase consensus amino acid sequences for most of the phylogenetic clades identified in Table A (sequences included in this specification)
Sequences disclosed in reference | Table X - Sequences disclosed in: Mestre et al., Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems, Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647, which is incorporated herein by reference in its entirety.
19399-19401, 19420, 19529-19542 | NLS sequences
19402-19415 | TALE-related sequences
19416 | Meganuclease
19417 | RE cleavage site
19418-19419, 19469-19528 | Linkers
19421 | Kozak sequence
19422-19453 | Table Y and related disclosure - Exemplary nucleotide sequences of 5' UTRs
19454-19468 | Table Z - Additional termination elements for linear mRNA
19543-19926 | Table 31A - Optimized ncRNA library
19927-19930 | RTX_6342 ncRNA retron sequences
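As a convenience for readers navigating the Sequence Listing, and without limitation, the summary table above can be treated as a range lookup. The Python sketch below is illustrative only: it transcribes just the first three rows of the table, and the names `SEQ_ID_RANGES` and `describe_seq_id` are hypothetical.

```python
from typing import Optional

# First three rows of the sequence summary table, as (first, last, description).
SEQ_ID_RANGES = [
    (1, 7257, "Table A sequences (AA)"),
    (7258, 14498, "Table A sequences (NT)"),
    (14499, 19108, "Table B sequences (NT)"),
]


def describe_seq_id(seq_id_no: int) -> Optional[str]:
    """Return the table-row description covering a SEQ ID NO, or None."""
    for first, last, description in SEQ_ID_RANGES:
        if first <= seq_id_no <= last:
            return description
    return None  # not covered by the rows transcribed in this sketch
```

For instance, `describe_seq_id(3980)` falls in the Table A amino acid range, while `describe_seq_id(11231)` falls in the Table A nucleotide range.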

This specification discloses and claims recombinant retrons and components thereof (e.g., recombinant ncRNAs and recombinant retron RTs), as well as genome modification systems comprising the recombinant retrons and/or recombinant components thereof. Such systems include, but are not limited to, recombinant retron-based genome editing systems, recombineering systems, and cell recording systems. This specification also discloses and claims the various components and aspects that make up such systems, as well as their uses and applications, including but not limited to: (a) recombinant nucleic acid molecules encoding the recombinant retrons and retron-based genome modification systems, (b) vector systems (including viral and non-viral) comprising one or more components of the retron-based genome modification systems, including all-DNA vector systems, all-RNA vector systems, and DNA/RNA vector systems, (c) delivery systems (e.g., lipid particles, lipid nanoparticles, and other forms of delivery vehicle) for delivering the vector systems and/or components of the retron-based genome modification systems, (d) formulations comprising any of the foregoing components for delivery to cells and/or tissues, including in vitro, in vivo, and ex vivo delivery, (e) cells modified by the recombinant retron-based genome modification systems and methods described herein, (f) methods of modifying cells by genome editing, recombineering, or cell recording using the retron-based genome modification systems disclosed herein, (g) methods of preparing the recombinant retrons, retron-based genome modification systems, vectors, and formulations described herein, and (h) pharmaceutical compositions and kits for modifying cells under in vitro, in vivo, and ex vivo conditions.

In various embodiments, the recombinant retrons and retron-based genome modification systems described herein can be prepared by introducing one or more modifications into a known retron (e.g., a retron that is publicly known or previously published (e.g., Table X), or that is a novel retron sequence (Tables A and B, as identified in the Examples)).

Table X: Previously known retron reverse transcriptases

Table X provides 1928 non-limiting examples of retron reverse transcriptases that can be modified according to the methods herein to obtain recombinant retron reverse transcriptases for use in the compositions, systems, and methods described herein. These retron sequences are reported in Mestre et al., "Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of The Encoded Tripartite Systems," Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647, the contents of which are incorporated herein by reference. In some embodiments, the retrons of Table X are intended to be excluded from the scope of the claimed subject matter.

Table A, with rows A1-A45: Novel retron reverse transcriptase sequences

Table A provides non-limiting examples of retron reverse transcriptases that can be modified according to the methods herein to obtain recombinant retron reverse transcriptases for use in the compositions, systems, and methods described herein.

In detail, Table A provides sequence identifiers corresponding to the novel retron RTs identified as a result of the computational discovery work described in the Examples. The table provides sequence identifiers corresponding to the contents of the sequence listing included as part of this specification. The table includes RT amino acid sequences and RT nucleic acid sequences. The table is organized into forty-five rows, each row representing the sequences that form a single phylogenetic clade of related retron RTs, as determined by the computational work described in Example 3.

| Row | RT amino acid sequence (SEQ ID NO, as presented in the sequence listing) | RT nucleic acid sequence (SEQ ID NO, as presented in the sequence listing) | Phylogenetic clade |
| --- | --- | --- | --- |
| A1 | 3980-4178 | 11231-11429 | I-A |
| A2 | 4671-4825 | 11922-12075 | I-B1 |
| A3 | 4980-5143 | 12229-12392 | I-B2 |
| A4 | 367-368, 427-441, 494-521, 526-527, 536, 626, 649, 660-668, 675, 679, 687-692, 695, 697, 703, 716, 721-722, 751-763, 767, 770-1411, 1456-1462 | 7624-7625, 7684-7698, 7751-7778, 7783-7784, 7793, 7883, 7906, 7917-7925, 7932, 7936, 7944-7949, 7952, 7954, 7960, 7973, 7978-7979, 8008-8020, 8024, 8027-8667, 8712-8718 | I-C |
| A5 | 1529-1569 | 8784-8823 | I-D |
| A6 | 6697-6701 | 13943-13947 | II |
| A7 | 4179-4670 | 11430-11921 | II-A1 |
| A8 | 4884-4909 | 12134-12159 | II-A1 other |
| A9 | 6919-6972 | 14163-14215 | II-A2 |
| A10 | 2786-2866, 2887-2938 | 10039-10119, 10140-10191 | II-A3 |
| A11 | 4826-4863 | 12076-12113 | II-A4 |
| A12 | 4864-4875 | 12114-12125 | II-A4 fusion |
| A13 | 6974-7002 | 14217-14244 | II-A5 |
| A14 | 2598-2600, 2759-2785 | 9851-9853, 10012-10038 | III |
| A15 | 2445-2582 | 9699-9836 | III-A1 |
| A16 | 1983-2158 | 9237-9412 | III-A2 |
| A17 | 1612-1982 | 8866-9236 | III-A3 |
| A18 | 2601-2678 | 9854-9931 | III-A4 |
| A19 | 2679-2758 | 9932-10011 | III-A5 |
| A20 | 3442-3603 | 10694-10855 | IV |
| A21 | 3604-3708 | 10856-10959 | V |
| A22 | 2939-3441, 3709-3979, 5177-5192 | 10192-10693, 10960-11230, 12426-12441 | VI |
| A23 | 7003-7033 | 14245-14275 | VII-A1 |
| A24 | 7054-7133 | 14296-14374 | VII-A2 |
| A25 | 7034-7049 | 14276-14291 | VIII |
| A26 | 6835-6918 | 14079-14162 | IX |
| A27 | 6823-6834 | 14068-14078 | X |
| A28 | 298-366, 369-373, 442-493, 522-525, 528-535, 537, 551-554, 557, 560-625, 672-674, 680-681, 684-686, 696, 698, 702, 723-742, 764-766, 1412-1450, 1452-1453, 1463-1466, 1571-1577 | 7555-7623, 2626-7630, 7699-7750, 7785-7792, 7794, 7808-7811, 7814, 7817-7882, 7929-7931, 7937-7938, 7941-7943, 7953, 7955, 7959, 7980-7999, 8021-8023, 8668-8706, 8708-8709, 8719-8722, 8825-8831 | XI |
| A29 | 374-426, 539-550, 555-556, 558-559, 671, 682-683, 743, 745-750 | 7631-7683, 7796-7807, 7812-7813, 7815-7816, 7928, 7939-7940, 800, 8002-8007 | XII |
| A30 | 5942-6665 | 13189-13911 | XIII |
| A31 | 1-297, 715, 1580-1603 | 7258-7554, 7972, 8834-8857 | XIV |
| A32 | 705-714 | 7962-7971 | XV |
| A33 | 6681-6694 | 13927-13940 | XVI |
| A34 | 6788-6803 | 14033-14048 | XVII |
| A35 | 1469-1526, 5147-5151 | 8725-8781, 12396-12400 | CRISPR-associated |
| A36 | 2159-2428 | 9413-9682 | Ec107-like |
| A37 | 646-648 | 7903-7905 | RT-ATPase |
| A38 | 2592-2595 | 9846-9849 | RT-DUF4116 |
| A39 | 676-678, 717-720 | 7933-7935, 7974-7977 | RT-HTH |
| A40 | 538, 669, 704, 1454 | 7795, 7926, 7961, 8710 | RT-pddex |
| A41 | 670, 699-701 | 7927, 7956-7958 | RT-unk |
| A42 | 4917-4979 | 12167-12228 | Phage |
| A43 | 4910-4916 | 12160-12166 | Giant phage |
| A44 | 5195-5941 | 12444-13188 | Outgroup |
| A45 | 627-645, 650-659, 693-694, 744, 768-769, 1451, 1455, 1467-1468, 1527-1528, 1570, 1578, 1579, 1604-1611, 2429-2444, 2583-2591, 2596-2597, 2867-2886, 4876-4883, 5144-5146, 5152-5176, 5193-5194, 6666-6680, 6695-6696, 6702-6787, 6804-6822, 6973, 7050-7053, 7134-7257 | 7884-7902, 7907-7916, 7950-7951, 8001, 8025-8026, 8707, 8711, 8723-8724, 8782-8783, 8824, 8832-8833, 8858-8865, 9683-9698, 9837-9845, 9850, 10120-10139, 12126-12133, 12393-12395, 12401-12425, 12442-12443, 13912-13926, 13941-13942, 13948-14032, 14049-14067, 14216, 14292-14295, 14375-14498 | Unclassified |

Table B with rows B1-B45: Exemplary ncRNA sequences

Table B provides non-limiting examples of retron ncRNA sequences that can be modified according to the methods herein to obtain recombinant retron ncRNA sequences for use in the compositions, systems, and methods described herein. These sequences were discovered by the methods of Example 3.

In detail, Table B provides sequence identifiers corresponding to the novel retron RTs identified as a result of the computational discovery work described in the Examples. The table provides sequence identifiers corresponding to the contents of the sequence listing included as part of this specification. The table is organized into forty-five rows, each row representing the sequences that form a single phylogenetic clade of related retron ncRNAs, as determined by the computational work described in Example 3.
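As an illustration of the row layout described for Tables A and B (each row lists SEQ ID NO ranges for one phylogenetic clade), the following is a minimal sketch of how such range lists might be queried. The function names (`parse_ranges`, `clade_of`, `ncrna_ranges_for_rt`) and the dictionary layout are hypothetical and are not part of the disclosure; only three rows are copied from the tables (rows A1/B1, A14/B14, and A32/B32), so this is an excerpt rather than the full data.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first SEQ ID NO, last SEQ ID NO)


def parse_ranges(spec: str) -> List[Range]:
    """Parse a table cell such as '2598-2600, 2759-2785' into inclusive ranges.

    'N/A' (no sequences listed for that clade) yields an empty list.
    """
    ranges: List[Range] = []
    for part in spec.split(","):
        part = part.strip()
        if not part or part.upper() == "N/A":
            continue
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))  # a single number is a one-element range
    return ranges


# Excerpt only (assumption: three rows chosen for illustration).
# Table A, RT amino acid SEQ ID NOs, keyed by phylogenetic clade.
TABLE_A_AA: Dict[str, str] = {
    "I-A": "3980-4178",
    "III": "2598-2600, 2759-2785",
    "XV": "705-714",
}
# Table B, ncRNA SEQ ID NOs, keyed by the same clade names.
TABLE_B_NCRNA: Dict[str, str] = {
    "I-A": "16886-17078",
    "III": "16397-16413",
    "XV": "N/A",
}


def clade_of(seq_id: int, table: Dict[str, str]) -> Optional[str]:
    """Return the clade whose ranges contain seq_id, or None if not in the excerpt."""
    for clade, spec in table.items():
        if any(lo <= seq_id <= hi for lo, hi in parse_ranges(spec)):
            return clade
    return None


def ncrna_ranges_for_rt(rt_seq_id: int) -> List[Range]:
    """For an RT amino acid SEQ ID NO, list the ncRNA ranges of the same clade."""
    clade = clade_of(rt_seq_id, TABLE_A_AA)
    if clade is None:
        return []
    return parse_ranges(TABLE_B_NCRNA.get(clade, "N/A"))
```

For example, `ncrna_ranges_for_rt(3980)` looks up clade I-A in the Table A excerpt and returns the I-A ncRNA range from the Table B excerpt. A lookup of this kind only groups sequences by clade; it does not by itself identify which RT and ncRNA originate from the same genome.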

In various embodiments and claims, the present disclosure provides recombinant retron-based genome editing systems, which include combining a retron RT together with an ncDNA in a cell by various delivery strategies. In various aspects, the retron RT and ncDNA that make up a recombinant retron-based genome editing system can be selected by pairing components that are naturally found in association with each other in nature, i.e., that are derived from the same bacterial species. These are referred to as "cognate" pairings of a retron RT and a retron ncRNA. Cognate pairings of an ncRNA and an RT (amino acid or nucleotide sequence) can be readily identified from the information presented in the sequence listing by identifying those ncRNA sequences in Table B and those RT sequences in Table A that have the same genome accession number, as indicated for each sequence in the sequence listing. In various other aspects, the retron RT component and the ncRNA component can be from different bacterial species, i.e., not found together in nature as a cognate pair. In other embodiments, the retron RT component and the ncRNA component can both be from the same phylogenetic type (e.g., type I-A, type I-B1, type I-B2, type I-C, etc.). For example, a recombinant retron-based genome editing system may comprise a retron RT from type I-A (i.e., SEQ ID NOs: 3980-4178 for AA and SEQ ID NOs: 11231-11429 for NT; see Table A) and a retron ncRNA also from type I-A (i.e., SEQ ID NOs: 16886-17078; see Table B).

| Row | ncRNA sequence (SEQ ID NO, as presented in the sequence listing) | Phylogenetic clade |
| --- | --- | --- |
| B1 | 16886-17078 | I-A |
| B2 | 17478-17622 | I-B1 |
| B3 | 17677-17756 | I-B2 |
| B4 | 14831-14833, 14838, 14847, 14850-15460 | I-C |
| B5 | N/A | I-D |
| B6 | N/A | II |
| B7 | 17079-17477 | II-A1 |
| B8 | 17660-17676 | II-A1 other |
| B9 | 19031-19080 | II-A2 |
| B10 | 16414-16516 | II-A3 |
| B11 | 17623-17659 | II-A4 |
| B12 | N/A | II-A4 fusion |
| B13 | 19081-19108 | II-A5 |
| B14 | 16397-16413 | III |
| B15 | 16195-16320 | III-A1 |
| B16 | 15779-15925 | III-A2 |
| B17 | 15476-15778 | III-A3 |
| B18 | 16321-16366 | III-A4 |
| B19 | 16367-16396 | III-A5 |
| B20 | 16705-16814 | IV |
| B21 | 16815-16885 | V |
| B22 | 16517-16704 | VI |
| B23 | N/A | VII-A1 |
| B24 | N/A | VII-A2 |
| B25 | N/A | VIII |
| B26 | 18949-19030 | IX |
| B27 | N/A | X |
| B28 | 14657-14716, 14778-14824, 14834, 14835-14836, 14839, 15461-15475 | XI |
| B29 | 14717-14777, 14841-14846 | XII |
| B30 | 18413-18936 | XIII |
| B31 | 14499-14656 | XIV |
| B32 | N/A | XV |
| B33 | 18939 | XVI |
| B34 | N/A | XVII |
| B35 | N/A | CRISPR-associated |
| B36 | 15926-16178 | Ec107-like |
| B37 | N/A | RT-ATPase |
| B38 | N/A | RT-DUF4116 |
| B39 | N/A | RT-pddex |
| B40 | N/A | RT-HTH |
| B41 | 14837 | RT-unk |
| B42 | N/A | Phage |
| B43 | N/A | Giant phage |
| B44 | 17757-18412 | Outgroup |
| B45 | 14825-14830, 14840, 14848-14849, 16179-16194, 18937-18938, 18940-18948 | Unclassified |

Table C: Consensus RT amino acid sequences

Table C provides consensus amino acid sequences for most of the RT phylogenetic types identified in Table A.

| RT type (based on phylogenetic clade) | Consensus amino acid sequence | SEQ ID NO: |
| --- | --- | --- |
| Type IA | YKVYXIPKRXXGXRXIAXPXXXLKXXQXXXXXXXXXXXXXHXXXXAYXXXXXIKXNAXXHXXXXYXLKXDXXXFFNSIXXXXXXXXXXXXXXXXXXXXXXXXXXXXFWXXXXXXXXXLXLSXGAPSSPXXSNXXMXXFDXXXXXXCXXXXXXYXRYADDXTFSTXXXXXLXXXPXXXXXXLXXXXXXXXXXNXXKTXFSSKAHNRHXTGXTXXNXXXXSXGRXXKRXIXXLXXXXXXX | 19365 |
| Type IB1 | XXXXXXXXXXXXXXXXXLKXXXXFXXXXXXXXXXXXXXXVXSYRKGXXXXXAVXXHXXXXXFXXXDXXXFFXSIXXXXXXXXXXXXXXXXPXXDXXXXXXXXXXXXXXXXXLPXGXXTSPXXSNXXLXXFDXXXXXXCXXXXXXYTRYXDDXIXSXXXXXXXXXXXXXXXXXLXXXXXXXXXXNXXKXKXXXXGXXXKXLGXXILPXGXXXXXXXXKXXXEXXXXXXXXX | 19366 |
| Type IB2 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSYXXXXXXHXXXXXXXRXDIXXFFXSIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXGXXXSPXXSNXXFRXXDXXIXXXCXXXXXXYXRYADDXLFSXXXXXXXXXXXXFXXXIXXXXXXXXXXXNXXKXXXXXXXXSLNGXXXXXXXXXXXXSXXKXXXXXXXXXXXXXX | 19367 |
| Type IC | YXXFXIXKXXGXXRXIXAPXXXLKXXQXXLXXXLXXXXXXXXXXXXXXXXXXHXFXXXXXIXXNAXXHXXXXXVXNXDLXXFFXXXXFGRVXGXFXXXXXFXXXXXXAXXXAQXXCXXXXLPQGXPXSPXIXNXIXXXLDXXXXXXAXXXXXXYXRYADDXTFSTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGFXXNXXKTRXXXXXXRQXVTGLXVNXXXNXXXXYXXXXRXXXXXXXXX | 19368 |
| Type ID | YXXXXXXKKXGGXRXIXXPXXXLXXXQXXLXXXLXXXYXXXXXXXXXGFXXXXXXXXXXXXIXXNAXXHXXKXXXLNXDXXXFFXSIXXXXXXXXXXXXXFXXXXXXAXXXXLLXTXXXXLPXGAPXSPXXSNXXCXXXDXXLXXXXXXXXXXYXRYADDLTFSXXXXXXXXXXXXXXXXIXXXXFXXNXKKXRXXXXXXXQXVTGXXVNXKXNXXRXXXXXXRAXXHXXXXX | 19369 |
| Type IIA1 | YKXXXIXKXXGXXRXIXXPXXXXKXXQXXXXXXXXXXXXXHXXAXAYXXXXXIXXNAXXHXXXXXXXXXDFXXFFXSIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXLXIGXPXSPXXSNXXXXXXDXXXXXXXXXXXXXYXRYADDXXXSXXXXXXXXXXXXXXXXXXXXXXXXXXXXNXXKTXXXXXXXXXXXTGXXXXXXXXXXXGRXXKRXXXXXXXXXXXX | 19370 |
| Type ID | YXXXXXXKKXGGXRXIXXPXXXLXXXQXXLXXXLXXXYXXXXXXXXXGFXXXXXXXXXXXXIXXNAXXHXXKXXXLNXDXXXFFXSIXXXXXXXXXXXXXFXXXXXXAXXXXLLXTXXXXLPXGAPXSPXXSNXXCXXXDXXLXXXXXXXXXXYXRYADDLTFSXXXXXXXXXXXXXXXXIXXXXFXXNXKKXRXXXXXXXQXVTGXXVNXKXNXXRXXXXXXRAXXHXXXXX | 19371 |
| Type IIA1 | YXXXXXXXXXXXXRXXXXPXXXLKXXQXWXXXXXXXXXXXXXXXXAYXXXXSXXXXAXXHXXXXXXXXXDIXXFFXSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXLXXGXXXSPXIXNXXMXXXDXXXXXXXXXXXXXYXRYXDDIXXSSXXXIXXXXXXXXXXXLXXXXXXXNXXKTXXXXXXXXXXXTGXXXXXXXXXXGXXXXXXXXXXXYXXXXX | 19372 |
| Type IIA2 | YRXFXXXKXXGXXRXIXXPRXFXKXXQXXXXDXXLXXLXXHXXXXXXXXXXSXXXNAXXHXXXXXXXXXDIXXXFXXIXXXXXXXXXXXXXXXXXXXXXXXXXXTXXXXLPQGAPTSPXXSNXXLXXFDXXXXXXXXXXXXXYXRYXDDXTXSXXXXXXXXXXXXXXXXXLXXXXXXXNXXKXRXXXXXXXQXVTGXXXNXXXXPXRXXRXXXRAXXXXAXX | 19373 |
| Type IIA3 | YXXXXXXKXXXXXRXIXXPXXXLKXXQXWILXXILXXXXXSXXXXXFXXXXXXXXNAXXHXXXXXXLXXDXXXFFXXXXXXXVXXXFXXXGYXXXXXXXLXXXCXXXXXLPQGXXXSPXXXNLXXXXLDXRXXXXXXXXXXXYTRYADDXXXSXXXXXXXXXXXXXXXXIXXXEXXXXNXXKXXXXXXXXXXXXTGXXXXXXXXXXXXXXXXXRXXXXXXXX | 19374 |
| Type IIA4 | XXXXXIXXXXXXRKIXTXXXXXXXXXXXHXXXXXXXXXXXXXXXFXKAYXXXXSIXXNAXXHMYNDXFXXXDIXXFFXXIXHXXLXXXLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGLXXGXXXSPXLXNXYXKXFDXIXYGXLKXXXXXXXIYTRYADDXXISFKXXXXXXXXXXXIXXXXXXXLXXXXLXXNXXKXXXXXXXXSNHVXITGXXIXXXXXXXRXXXVGXXXXXXLXXXAXXXXXX | 19375 |
| Type IIA4 | XXXXXXXXXXKXRXXXXXXXXXXGXXXXXXHXXXXXXXXXXXXXXXXSYAYXXXXSIXXCXXXHXXXXXFXKXDIXXFFNSIXXXXLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXGLXXSPXXSDXYXXXXXXXXXXXXXXXXXXYTRYADDIXISXXXXXXXXXXXXIXXXXXXXLXXXXLXXNXXKXXXXXXXXXXXXXXXXGXNIXXXXXXXXXXVGXXXXXXXXXXXXXXXXX | 19376 |
| Type IIA5 | YRXFXXXKXXGXXRXIXXPXTYLKVXQWWIXDXIXXXXXXXXXXXGFXXGXXXXXNAXXHXXXXXXLNXDXXXFFXSXXXXXXXXXFXXXGXXXXXXXXLXXLXXXXXXXPXGAPTSPXXXNXXXXXXDXXLXXXXXXXXXXYXRYADDXTFSXXXXXXXXXXXXXXXXXXXXGFXXXXXKTXXXGXXXRXXVTGXXXNXXXXXXXXXRXXXRXXXHXXXXX | 19377 |
| Type IIIA1 | YRXXXIXKXXGXXRXIXEPLPXLKXIQXWILXXILXXXXXSXXAKAXXXXXXXXXNXXXHXXXXXXXXXDXXXFFXXIXXXXXXXXFXXXGYXXXXXXXLXXLCXXXXXLPQGAPTSPXLSNXXXXXXDXXXXXXXXXXXXXYTRYADDXXFSGXXXXXXXXXXXXXXXXXXXXXXNXXKXXXXXXXXXQXVTGXVVNXKXQXXXXXRXXXRXXXXXIXK | 19378 |
| Type IIIA2 | YRXFXIXKXXGGXRXIXXPXXXLXXXQXXIXXXILXXXXXXXXXXXXXXXXSXXXNAXXHXXXXXXLKXDXXXFFXSIXXXXXXXXFXXXGYXXXXXXXLXXXCXXXXXLPQGAXTSPXLSNXXXXXXDXXLXXXXXXXXXXYXRYADDXXXSGXXXXXXXXXXXXXXXXXXGXXXNXXKXXXXXXXXXXIXTGXXXXXXXXXXPXXXXRXXXXXXXXXXXX | 19379 |
| Type IIIA3 | YXXXXXXKXXXXXRXIXXPXXXLXXXQXXIXXXXLXXXXXHXXXXAXXXXXXXXXXAXXHXXXXWXXKXDXXXFFXXXXEXXXXXXFXXXGYXXLXXXEXARXCTXXXXXXXXXXXXXXXXXXXXXXXXXXXGXLPQGAPTSXXLXNLXXXXXDXXXXXXAXXXXXXYTRYXDDXXXSXXXXXXXXXXXXXXXXXXXXXXXXGXXXXXXKXXXXXPGXXXXVLGLXVXXXXXXLXXXXXXXXXXHXXXXXXX | 19380 |
| Type IIIA4 | YXXXXXXKXXGGXRXIXXPXXXLXXXQXWIXXNILXXXXXXXXXXGFXXXXSIXXNAXXHXXXXXXLXXDLXXFXXXIXXXXXXXXFXXXGYXXXXXXXXAXXXTXXXXXXXXXXXXXXXXXXXXLPQGAPXSPXXXNXXXXXXDXRXXXXXXXXXXXYXRYADDXTFSXXXXXXXXXXXXXXIXXXEXXXXNXXKXXXXXXXXXXXVTGLXXXXXXXXXXXXXXXXXXXXXXXCXK | 19381 |
| Type IIIA5 | YXXXXIXKXXGXXRXXXXPXXXLKXXQXWILXXILXXXXXXXXXXGFXXXXSIXXNAXXHXXXXXXXXXDXXXFFPXIXXXXXXXXFXXXGYXXXXXXXXXXXCTXXXXLPQGXPXSPXXXNXXXXXXDXRXXXXXXXXXXXYXRYADDXTXSGXXXXXXXXXXXXXIXXXXXXXXNXXKXXXXXXXXXQXVTGXXVNXXXXXXXXXXXXXXXXIYXXXKX | 19382 |
| Type IIIunk | YXXXXXXXXXXXXRXXXXPXXXLKXXQXWIXXNILXXXXXXXXXXXXXXXXSIXXNAXXHXXXXXXXXXDIXXFFXSIXXXXXXXXFXXXXXXXXXXXXXXXXXXXXXXXLXQGXPXSPXXXNXXXXXXDXXXXXXXXXXXXXYXRYADDXXXSXXXXXXXXXXXXXXXXXXXXXXXXNXXKTXXXXXXXXXXXTGXXXXXXXXXXXXXXXXXXXEXXYCXX | 19383 |
| Type IV | YXXXXXXKXXGXXRXXXXPXXXXRXXQXRINXRIFXXXXXWPXXXXGSXPXXXXXXXXXXXDYXXCAXXHCXXKXXLKXDIXXFFXNXXXXXVXXXFXXXXXXXXXXXXXLXXXCXXXXXXXQGXXTSSYXAXLXLXXXEXXXXXXXXXKXLXYTRXVDDITXSSXXXXXXFXXXXXXXXXMLXXXXLPXNXXKXXXXXXXXXXLXVHGLRXXXXXPRXPXXEXXXIRXXVXXXXXX | 19384 |
| Type V | YXXXXXXXXXXXKXRXXXXPXXXLKXXQKRINXXIFXXXXXPXYLXGGXXXXXXXRDYXXNXXXHXXXXXXIXLDXXXFYXXIXXXXVXXXXXXXXXFXXXVXXXLXXLXTXXXXXPQGXCTSSYXANLXXXXXEYXXXXXXXXXXXXYXRLLDDXTXSXXXXXXXXXXXXXIXXXXXXXXXXXLXXXXXKXXXXXXXXXXXXXXVTGLWXXXXXPXXXXXXRXXIRXXVXXCXXX | 19385 |
| Type VI | XXXXXXXXXXXXXRXXXXXXXXLXXXXXXXXXXXXXXXXPXXXXXXXXXXXXXXNAXXHXXXXXXXXXDXXXFXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGXXXSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDDXXXSXXXXXXXXXXXXXXXXXXXXXXXXXXXKXXXXXXXXXXXXTGXXXXXXXXXXXXXXXXXXXXXXXXXXXX | 19386 |
| Type VIIA1 | XXXXXXXXXXXXXXXRXVWEXXXXXXXXXKXXXRXXXXFXXXXXXXXPHXXXXGYXXGRXXRXNAXXHXGXXXXXXXDXXXFFPSIXXXRXXXXXXXXGXXXXXXXXLXXFXTIXXXLPLGLXXSPXXXNXXXXXXDXXLXXLAXXXXXXYXRYXDDXXXSXXXXXPXXXXXXXXXXXXXFXXXXXKXXXSKXGQXHXVTGLSXXXXXXPHXPRXXKXXLRQELXXXXXX | 19387 |
| Type VIIA2 | XXXXXXXXXXXXXRXXXXXXXXXXXXXKXXXXXXXXXXXXXXXXGFXXXXXXXXNAXXHXXXXXXXXXDXXXFFXXIXXXXXXXXXXXXXXXXXXXXXXXXXXTXXXXLXXGXXTSPXXXNXXXXXXDXXXXXXXXXXXXXXRYXDDXXFSXXXXXXXXXXXXXXXXXXXXXXXXNXXKXXXXXXGXXQXVTGLXXXXXXXPRXXXXXKXXXRXXXXXXXX | 19388 |
| Type VIII | YXXXXXXKRXXXXXGEXRXVXXAXXXXXXXXHRXXXXXXXXXXXFGXHVQGFXXXRSXXXNAXXHXXXXXXXHADIXXFFXXITXXQVXXXXXXXXXXXXXAXXXAXXCTIDGXLXQGTRCSPXXXNXVCXXXDXXXLXLAXXXXXXXXRYADXXTFSGXXXXXXXXXXXXXXXXGFXLRXXXCYXQXXGXXQXVTGLXVXDXXXPRLPKXXKXXLRLXXXXXXKX | 19389 |
| Type IX | YRXFXIXXXXXXXRXIXAPXVXLKXXQXWXXXXXXXXXXXXXXVXGFXXGXXXXXAAXXHXXAXWXXSXDXXXFFXXXXXXXXXXXLXXXGYXXXXXXXXXXXXXXXXXLXQGXPXSPXXSNXXXXXXDXXXXXXXXXXXXXXXRYADDXXFSGXXXXXXXXXXXXXXXXXXXXWXXXXXKXXXXXXPXRLKVHGLLVXXXXXXLTKGYRNXXRAXXHXXXXX | 19390 |
| Type X | YXXXPXXXXXXXXRWIEAPXXXLKXXQRXXLXXXXYXXXXXXXAHGFXXGRSIXXNAXXHXGXXXVVXXDXXXFFPXXXXXXXXXXXXXXXXXXXXXXXXXXLXXXXXXLPQGAPTSPXLXNLVXXXXDXXLXXXAXXXXXXYTRYADDLXFSXXXXXXXXXXXXXXXXXXIXXXXGXXXXXXKXXXXXXXQRQXVTGXVVNXXXXLPXXXRRXLRAXXXXXXXX | 19391 |
| Type XI | YXXFXIXKXXGXXRXIXXPXXXLKXXQXXXXXXLXXXXXXXXXXXGFXXXXSXXXNAXXHXXXXXXXNXDLXXFFXXIXXXRXXXXXXXXXXXXXXXXAXXXAXXXXXXXXLPQGAPXSPXXSNXXXXXXDXXLXXXAXXXXXXYTRYADDXTXSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXFXXNXXKXXXXXXXXXXXVTGXXXNXXXNXXRXXXXXXXXXXXXXXXX | 19392 |
| Type XII | YXXFXXXKXXGGXRXIXAPXXXLXXXQXXXXXXXXXXXXXXXXAHGFXXXXSXXXNAXXHXXXXXXXXXDXXXFFPXXXXXRVXGXFXXXGYXXXXAXXXAXXXTXXXXXXXXXXXXXXXXXXXXRXLPQGAXXSPXXXNXXXXXLDXRLXXXAXXXXXXYTRYADDXTFSXXXXXXXXXXXXXXXXXXXXXXEGFXXXXXKXXXXXXXXXQXVTGXXVNXXXXXXRXXXXXXRAXXXXXXXX | 19393 |
| Type XIII | YXXFXXPKXXGGXRXIXAPXXXLXXXQXXXLXXXXXXXXXXXXAHGFXXXXSXXXNAXXHXXXXXXXXXDXXXFFPXXXXXRVXGXFXXXGYXXXXAXXXXLXXTXXXXXXXXXXXXXXXXXXXXRXLPQGAXXSPXXXNXXXXXLDXRLXXXAXXXGXXYTRYADDLTFSXXXXXXXXXXXXXXXXXXXXXXEGFXXXXXKXXXXRXXXXQXVTGXVVNXXXXXXRXXXXXXRAXXXXXXXX | 19394 |
| Type XIV | YXXFXIXKKXGXXRXIXXPXXXLXXXXXXXXXXXXXXXXXXXXXXGFXXXXSXXXNAXXHXXXXXVXNXDLXDFFXSXXXXRXXXXXXXXPXXXXXXXXAXXXXXLCXXXXXXXXXXXXXLPQGXPXSPXXXNXXCXXLDXXLXXXAXXXXXXYXRYADDXTFSXXXXXXXXXXXFXXXXXXIXXXXXXXXNXXKTRXXXXXXRQEVTGXXVXXXXNVXXXYXXXXRXXLXXWXXX | 19395 |
| Type XV | YXXFXXXKKSGGXRXIXXPXKSLXIXQXKLSQXLYXXYXPXXXVHGXXXXXSIXTNAXXHXXKXFXLNXDIXDFFXSINXGRVRGXFIAXPYXLXXXVATXXAXICCXXNKLPQGAPXSPIXSNLICXXXDXELQXFAXXXXXXYTRYADDITXSXXXXXLPXXLXXXXXXXXXXXXLGXELXXIIXXNGFXINXXKXRLXYXXQXQXVTGLXVNXXVNVXRKYIRNXXXXLHAWEKX | 19396 |
| Type XVI | YXXFXXXKXXGXXRXIXAPXXXLKXXQXXILXXXLXXVXLXXXAXGFRXXRSIXTNAXXHXXXXXXXKXDXKXFFPSXXXXRVXGXXXXLGYPXXXXXXLTXLXTXXXXLPXGAPTSPXXXNXXXXRXDXRXXXLXXKXXFXYSRYADDXXXSSXXXXXXXXIPFFXXIXXXEGFXXNEXKXXIXRXGXRQXXTGXVVNXKXNXXXXEXXXLRAVXXNCXXX | 19397 |
| Type XVII | YRXFXXXKXDGXXRXXXXPXXXLKXXQXXXXXXXLXXXXXHPXAXXFXXXXSXXXXAXXHAXXXXXXTXDXXDFFXXTXXXRVXXXXXXXXXXXXXXXXLXXLXXXXXXLPQGAPTSPXLSNXVNXXXDXXXXXXXXXXXXXYTRYXDDXXFSWXXXXXPXXFXXXXXXXLXXXGYXXXPXKXXXXXXXXXXPXXTGXXLXXXGXXXXPXXXXXXXXXXXX | 19398 |

Table X – Previously known retron RTs (accession number; organism)

The following sequences are disclosed in Mestre et al., Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of the Encoded Tripartite Systems, Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647, which is incorporated herein by reference in its entirety. The sequences are described by sequence reference number and source organism. Sequence accession number; source organism: fig|670897.3.peg.2382;大腸桿菌2362-75 WP_000111473.1;大腸桿菌(逆轉錄子-Eco7) fig|286156.4.peg.5031;澳洲光桿菌 fig|171439.3.peg.1995;發光光桿菌發光亞種 fig|1004151.3.peg.110;汗光桿菌NC19 fig|1736225.3.peg.2969;伊文氏桿菌屬Leaf53 fig|1897730.3.peg.2912;檸檬酸桿菌屬CFSAN044567 fig|286156.4.peg.5031;澳洲氣單胞菌 fig|1460083.3.peg.4429;液化沙雷氏菌FK01 fig|585.10.peg.2369;普通變形桿菌 WP_140315795.1;副溶血弧菌(逆轉錄子-Vpa1) fig|670.147.peg.3463;副溶血弧菌 fig|1516159.4.peg.4737;珊瑚弧菌 fig|190893.12.peg.246;珊瑚弧菌 fig|643674.5.peg.820;人類產鹼菌 fig|1122619.3.peg.2381;解尿酸寡菌DSM 18253 fig|29489.5.peg.3423;腸油氣單胞菌 fig|1899355.18.peg.3566;海洋螺菌科細菌 fig|49186.3.peg.4362;斯坦尼海桿菌 fig|672.375.peg.4377;創傷弧菌 fig|584.202.peg.1668;奇異變形桿菌 fig|394935.10.peg.4407;溶血色桿菌 fig|1196083.117.peg.637;阿爾維斯諾德格拉斯菌 fig|1196083.120.peg.2046;阿爾維斯諾德格拉斯菌 fig|1196083.114.peg.825;阿爾維斯諾德格拉斯菌 fig|550.250.peg.2975;陰溝腸桿菌 fig|680.27.peg.793;坎貝氏弧菌 fig|1348393.3.peg.352;假交替單胞菌屬H105 fig|1234128.4.peg.4777;副溶血弧菌SNUVpS-1 fig|69219.6.peg.2213;陰溝腸桿菌溶解亞種 fig|208224.13.peg.2962;神戶腸桿菌 fig|672.332.peg.2758;創傷弧菌 fig|1777131.3.peg.2267;色桿菌屬F49 fig|945550.3.peg.1167;錫那羅弧菌DSM 21326 fig|648.75.peg.922;豚鼠氣單胞菌 fig|1238221.3.peg.2053;副溶血弧菌VPTS-2009 fig|56192.3.peg.3860;髂光桿菌 fig|1806667.7.peg.3169;高盧海單胞菌 fig|272773.3.peg.1019;科斯蒂科鹽弧菌嗜鹼菌亞種 WP_073265166.1;普諾假單胞菌 fig|1946584.3.peg.2789;鹵單胞菌屬UBA3074 fig|2030880.3.peg.665;SAR86簇細菌 fig|80854.14.peg.530;黏摩替亞氏菌 fig|1902503.3.peg.1072;海單胞菌屬QM202 fig|1122212.3.peg.1985;微小海螺菌DSM 6287 fig|40576.4.peg.4387;牛致病菌 fig|287094.3.peg.78;交替單胞菌 fig|1805633.3.peg.1469;不動桿菌屬SFA fig|1945927.3.peg.1017;不動桿菌屬UBA1497 fig|202956.9.peg.1680;湯氏不動桿菌 fig|1811612.3.peg.155;莫拉氏菌科細菌REDSEA-S32_B1 fig|573.14330.peg.438;肺炎克雷伯菌 fig|470.1294.peg.971;鮑曼不動桿菌 fig|762966.3.peg.2452;排泄副腸菌YIT 11859 fig|470.3514.peg.1550;鮑曼不動桿菌 
fig|470.2538.peg.3022;鮑曼不動桿菌 fig|48296.130.peg.276;皮特不動桿菌 fig|663.91.peg.4688;溶藻弧菌 fig|296199.3.peg.4813;巨型弧菌 fig|1367490.3.peg.3583;費氏阿里弧菌ETJB5C fig|326537.3.peg.3698;北極科爾韋爾氏菌 fig|1175631.4.peg.4191;芥末果膠桿菌CFBP 3304 WP_001403504.1;大腸桿菌(逆轉錄子-Eco4 / Ec83) fig|549.21.peg.1734;成團泛菌 fig|140100.3.peg.2972;霍亂弧菌 fig|693153.4.peg.1176;大西洋弧菌 fig|1238430.3.peg.1911;黑弧菌AM115 fig|1123036.3.peg.144;北極熱單胞菌DSM 14288 fig|173990.3.peg.3319;太平洋萊茵黑姆氏菌 fig|1869214.4.peg.3809;萊茵黑姆氏菌屬 fig|1898113.7.peg.1514;源洋菌科細菌 fig|29484.39.peg.1876;弗氏耶氏桿菌 fig|1761793.3.peg.274;海桿菌屬DSM 26671 fig|587.48.peg.2666;雷氏普羅威登斯菌 fig|573.4147.peg.1684;肺炎克雷伯菌 fig|1263833.3.peg.2872;黏質沙雷氏菌VGH107 fig|1690502.3.peg.467;泛菌屬CFSAN033090 fig|1029989.3.peg.5037;腸道沙門氏菌腸亞種阿戈納血清型菌株0292 fig|211759.3.peg.770;黏質沙雷氏菌 fig|29483.5.peg.2283;耶氏桿菌 fig|1268238.3.peg.3466;大腸桿菌O5:K4(L):H4菌株ATCC 23502 fig|548.121.peg.2368;產氣克雷伯菌 fig|196024.6.peg.1825;嗜水氣單胞菌 fig|386429.3.peg.3784;假交替單胞菌屬BSi20495 fig|666.2089.peg.3167;霍亂弧菌 WP_159353404.1;霍亂弧菌(逆轉錄子-Vch1 / Vc95) fig|670.362.peg.2186;副溶血弧菌 fig|615.398.peg.1671;黏質沙雷氏菌 fig|571.188.peg.5401;產酸克雷伯菌 fig|1389422.3.peg.2794;肺炎克雷伯菌LAU-KP1 fig|1082704.3.peg.1242;Lonsdalea britannica fig|1686379.3.peg.3365;檸檬酸桿菌屬MGH104 fig|83655.55.peg.221;非去羧勒克氏菌 fig|550.532.peg.617;陰溝腸桿菌 fig|349965.6.peg.153;中間耶氏桿菌ATCC 29909 fig|1947028.3.peg.31;泛菌屬UBA2708 fig|29484.34.peg.3725;弗氏耶氏桿菌 fig|314608.4.peg.222;深淵希瓦氏菌KT99 fig|585.16.peg.3620;普通變形桿菌 fig|1117313.3.peg.4128;北極假交替單胞菌A 37-1-2 fig|1236543.3.peg.1328;腐敗希瓦氏菌JCM 20190 = NBRC 3908 fig|550.520.peg.1818;陰溝腸桿菌 fig|592316.4.peg.43;泛菌屬At-9b fig|1903177.3.peg.4556;弧菌屬10N.261.45.E1 fig|1435069.3.peg.925;弧菌tritonius fig|666.3258.peg.1211;霍亂弧菌 fig|1579504.3.peg.1822;希瓦氏菌屬ECSMB14102 fig|727.548.peg.1576;流感嗜血桿菌 EIJ70524.1;副溶血性嗜血桿菌HK385 fig|1121935.3.peg.14;Hahella ganghwensis DSM 17046 fig|400668.8.peg.2509;海單胞菌屬MWYL1 fig|1777491.3.peg.1212;交替單胞菌屬Mac1 fig|2013797.3.peg.1728;γ-變形菌綱細菌HGW-γ-變形菌綱-15 fig|1008297.7.peg.4158;腸道沙門氏菌腸亞種鼠傷寒血清型菌株798 
EDM6246721.1;腸道沙門氏菌腸亞種鼠傷寒血清型(逆轉錄子-Sen2 (St85)) fig|421.19.peg.3278;甲基單胞菌 fig|758.17.peg.102;嗜肺囓齒菌 fig|726.60.peg.864;溶血性嗜血桿菌 fig|1035188.3.peg.348;皮特曼嗜血桿菌HK 85 fig|670.79.peg.3738;副溶血弧菌 fig|1481663.12.peg.913;氣象弧菌 fig|1123402.3.peg.611;按蚊托賽爾氏菌DSM 18579 fig|668.83.peg.3088;費氏阿里弧菌 fig|290110.6.peg.2319;布達佩斯致病菌 fig|568766.10.peg.822;迪基氏菌屬NCPPB 3274 fig|470.4268.peg.2217;鮑曼不動桿菌 fig|1977881.3.peg.1569;不動桿菌屬ANC 4470 fig|548.171.peg.2395;產氣克雷伯菌 fig|584.105.peg.1823;奇異變形桿菌 fig|1275975.3.peg.1756;腸道沙門氏菌腸亞種紐波特血清型菌株Henan_3 fig|615.474.peg.3994;黏質沙雷氏菌 fig|61647.13.peg.3699;格爾戈維亞多桿菌 fig|549.22.peg.222;成團泛菌 fig|991944.3.peg.3216;霍亂弧菌HE-25 WP_001022871.1;霍亂弧菌(逆轉錄子-Vch2 (Vc81)) fig|1638949.3.peg.1051;弧菌屬ECSMB14106 fig|73010.3.peg.2815;鰻魚氣單胞菌 fig|1444141.3.peg.3893;大腸桿菌3-373-03_S3_C1 fig|232.5.peg.1080;交替單胞菌屬 fig|1175295.3.peg.21;假交替單胞菌屬PAMC 22718 fig|265726.7.peg.3430;耐鹽光桿菌 WP_009585554.1;不動桿菌 fig|2004649.3.peg.1632;不動桿菌屬WCHA29 fig|1324350.3.peg.2817;馬不動桿菌 fig|2048003.3.peg.1682;黃交替單胞菌 fig|571.171.peg.5963;產酸克雷伯菌 fig|573.4060.peg.3574;肺炎克雷伯菌 fig|1173850.3.peg.2995;腸道沙門氏菌腸亞種印第安納血清型菌株ATCC 51959 fig|1123516.3.peg.1267;嗜鹽氫弧菌DSM 15072 fig|1981674.3.peg.1814;假單胞菌屬R9(2017) fig|1947311.3.peg.2053;假單胞菌屬UBA2684 fig|1198309.3.peg.4291;螢光假單胞菌ICMP 11288 fig|715451.3.peg.1743;食萘交替單胞菌 fig|316.285.peg.730;施氏假單胞菌 fig|1190606.3.peg.313;卡爾維腸弧菌1F-211 WP_009176189.1; WP_097050713.1;廈門海螺 fig|1208323.3.peg.893;白氏速生桿菌B30 KZK95863.1;假弧菌屬Ad46 fig|101571.310.peg.3956;烏博伯克霍爾德菌 fig|1882791.3.peg.1790;伯克霍爾德菌屬CF099 fig|1736536.3.peg.4809;貪噬菌屬Root434 PIG30812.1;詹森桿菌屬35 fig|1798244.3.peg.1046;披毛菌書細菌GWA2_55_18 fig|1131551.3.peg.1124;嗜甲基菌屬1P/1 fig|1843082.3.peg.1574;大單胞菌屬BK-30 fig|279058.16.peg.4721;山崗單孢菌arenae fig|1548123.6.peg.1144;短毛黴屬T2 fig|380394.4.peg.276;氧化亞鐵硫桿菌ATCC 53993 WP_080292858.1; fig|101571.162.peg.3605;烏博伯克霍爾德菌 fig|1382803.3.peg.22;亞馬遜色桿菌 fig|930.4.peg.3851;氧化硫酸硫桿菌 fig|1261658.3.peg.1787;海藻百伯史坦菌Y31 fig|1679001.3.peg.631;巴氏桿菌科細菌NI1060 fig|1334187.3.peg.653;流感嗜血桿菌KR494 
fig|1581107.3.peg.1286;奈瑟菌屬HMSC15G01 fig|486.24.peg.152;乳酸奈瑟菌 fig|1953412.3.peg.1956;細菌UBP10_UBA1160 WP_090322045.1;寡養亞硝化單胞菌 fig|2013740.3.peg.1400;δ-變形菌綱細菌HGW-δ-變形菌綱-13 fig|1907413.3.peg.3170;根瘤菌屬RU33A fig|1817963.3.peg.856;沙漠玫瑰單胞菌 fig|2035448.3.peg.1752;根瘤菌屬C5 WP_014077019.1; fig|1648404.4.peg.2797;大西洋紅桿菌 fig|359.11.peg.6331;髮根農桿菌 fig|887144.4.peg.573;太白山根瘤菌 fig|1116389.3.peg.333;隔離德沃斯氏菌DS-56 fig|121719.10.peg.3421;潘隆尼亞鹼湖桿菌 fig|34002.6.peg.3570;嗜鹼副球菌 fig|1940281.4.peg.1560;赫夫勒氏菌屬 fig|1040981.5.peg.1561;西塞里中根瘤菌WSM4083 fig|410764.3.peg.807;多醫院根瘤菌 fig|1825934.3.peg.3111;安徽根瘤菌 fig|1952824.3.peg.3061;紅豆科細菌UBA3976 fig|1871086.3.peg.2153;短波單胞菌屬 fig|588932.9.peg.647;內藏山短波單胞菌 fig|1951751.3.peg.1538;紅桿菌科細菌UBA1460 fig|1843368.3.peg.904;鞘脂菌屬RAC03 fig|155892.10.peg.3219;弧形柄桿菌 fig|43057.4.peg.4537;固氮紅桿菌 fig|1514904.3.peg.974;海洋阿倫斯氏菌 fig|1338034.3.peg.722;副溶血弧菌O1:Kuk菌株FDA_R31 fig|150340.18.peg.1837;古弧菌 fig|196024.5.peg.3821;嗜水氣單胞菌 fig|244366.32.peg.1886;水痘克雷伯菌 fig|180957.35.peg.1654;巴西果膠桿菌 fig|55601.149.peg.665;鰻弧菌 fig|121723.5.peg.2901;光桿菌屬SKA34 fig|584.170.peg.837;奇異變形桿菌 fig|40324.136.peg.3276;嗜麥芽寡養單胞菌 fig|1122188.5.peg.411;海綿溶桿菌DSM 21749 fig|2032566.3.peg.2826;黃單胞菌科細菌NML93-0792 fig|287.1731.peg.2578;綠膿桿菌 fig|251702.3.peg.1529;丁香假單胞桿菌金魚草致病變種 fig|1960829.3.peg.5912;假單胞菌屬MF6394 fig|76759.17.peg.5093;蒙氏假單胞菌 fig|1981678.3.peg.5241;假單胞菌屬R45(2017) fig|1699620.3.peg.3028;假單胞菌屬RIT-PI-r fig|191391.4.peg.2140;薩洛莫假單胞菌 fig|1844093.4.peg.7190;假單胞菌屬22E 5 fig|287.1744.peg.1414;綠膿桿菌 fig|287.1987.peg.910;綠膿桿菌 fig|287.4372.peg.4481;綠膿桿菌 fig|1856685.4.peg.2159;假單胞菌屬TCU-HL1 fig|1718920.3.peg.3357;假單胞菌屬ICMP 8385 fig|1781066.3.peg.2816;杜擀氏菌屬HH101 fig|95485.5.peg.60;穩定伯克霍爾德菌 fig|1572871.6.peg.588;詹森桿菌屬BJB304 WP_034208069.1;洋蔥伯克霍爾德菌 WP_074283015.1;伯克霍爾德菌屬GAS332 fig|1168169.3.peg.2570;甲基單胞菌屬11b fig|1899355.16.peg.1328;海洋螺菌科細菌 WP_093197597.1;貪噬菌屬YR750 fig|1660091.3.peg.1650;博德氏菌屬SCN 67-23 fig|134375.17.peg.4387;無色桿菌屬 fig|426114.10.peg.1990;砷氧化硫單胞菌 fig|1947551.3.peg.1903;寡養單胞菌屬UBA2302 
fig|1914330.4.peg.2242;鹽球菌屬 fig|1947037.3.peg.890;泛菌屬UBA5707 WP_094422719.1;考氏科薩克氏菌 WP_079496884.1; WP_088126255.1;科貝腸桿菌 WP_049614309.1;耶氏桿菌 WP_048263135.1;秘魯果膠桿菌 WP_040197602.1;肺炎克雷伯菌 fig|669.34.peg.1586;哈維弧菌 fig|672.219.peg.1032;創傷弧菌 fig|670.1028.peg.1775;副溶血弧菌 WP_065207673.1;明亮光桿菌 fig|1869214.3.peg.2231;萊茵黑姆氏菌屬 WP_029795910.1;副溶血弧菌 fig|1191302.3.peg.1081;粗牡蠣弧菌9ZC77 fig|668.70.peg.1192;費氏阿里弧菌 fig|28229.4.peg.4229;冷紅科爾韋爾氏菌 fig|1855726.3.peg.270;伯克霍爾德菌屬KK1 fig|1674888.3.peg.829;伯克霍爾德菌Beta_02 fig|687412.4.peg.1108;水生假紅細菌 WP_092465129.1;象牙白栖東海菌 fig|1120653.3.peg.5479;劍菌屬LC384 fig|121719.5.peg.401;潘隆尼亞鹼湖桿菌 fig|1798804.3.peg.1597;根瘤菌屬58 fig|1946675.3.peg.3089;科迪單胞菌屬UBA4487 fig|36861.5.peg.1400;脫氮硫桿菌 fig|1115835.3.peg.1003;通用型嗜甲基菌79 fig|1797188.3.peg.1508;酸桿菌細菌RIFCSPLOWO2_12_FULL_60_22 fig|57320.3.peg.123;深部假脫硫弧菌 fig|1267534.3.peg.1238;酸桿菌科細菌KBS 89 fig|1951344.3.peg.527;酸桿菌科細菌UBA1307 WP_006226461.1;馬普拉無色桿菌 fig|1503054.4.peg.5764;停滯伯克霍爾德菌 WP_006159686.1;巴塞爾貪銅菌 WP_090191767.1;未分類杜擀氏菌 fig|539.8.peg.1698;腐蝕艾肯氏菌 fig|1946925.3.peg.2129;米卡弧菌屬UBA5701 WP_047031309.1;赫夫勒氏菌屬IMCC20628 fig|1946134.3.peg.1092;短波單胞菌屬UBA6547 WP_093914930.1;海洋硫酸桿菌 fig|1862950.3.peg.1234;根瘤菌目細菌NRL2 fig|1166078.4.peg.1483;葉狀金色單胞菌 fig|709015.3.peg.734;海葵海洋桿菌DSM 19842 WP_092160028.1;脫硫弧菌鐵還原菌 fig|2026749.3.peg.3364;Ignavibacteriae細菌 WP_033771991.1;成團泛菌 WP_097097099.1;未分類腸桿菌科(混雜) fig|1444151.3.peg.2733;大腸桿菌2-177-06_S3_C2 WP_137545672.1;大腸桿菌逆轉錄子-Eco3 (Ec73) fig|573.14856.peg.3852;肺炎克雷伯菌 WP_072021595.1;黏質沙雷氏菌 fig|29571.3.peg.478;冰下鹽單胞菌 WP_004534676.1; WP_095622523.1;鹽單胞菌屬WRN001 fig|376427.4.peg.3223;古道鹽單胞菌 fig|862908.3.peg.745;海洋嗜鹽噬菌弧菌SJ SCJ40239.1;未培養之梭菌屬 fig|717962.3.peg.287;貓糞球菌GD/7 WP_014642259.1;嗜鹽喜鹽芽孢桿菌 fig|2009042.3.peg.2106;假單胞菌屬Irchel 3H7 fig|1981718.3.peg.4346;假單胞菌屬B39(2017) fig|665135.13.peg.1401;假單胞菌屬In5 fig|1949067.3.peg.5629;假單胞菌屬PICF141 WP_007948552.1;假單胞菌屬GM21 SFB61662.1;鶴羽田戴爾福特菌 WP_011615687.1; WP_014778098.1; fig|1429083.4.peg.2612;侯賽因假單胞菌 WP_095024014.1;假單胞菌 WP_090203690.1;阿脾假單胞菌 
fig|564423.8.peg.1646;托拉假單胞菌NCPPB 2192 WP_078802277.1;螢光假單胞菌 WP_090453229.1;假單胞菌 fig|1306420.5.peg.1032;鼻疽伯克霍爾德菌MSHR5848 fig|1357270.3.peg.1923;丁香假單胞菌UB246 fig|2018067.3.peg.2950;假單胞菌屬FDAARGOS_380 fig|317.311.peg.3241;丁香假單胞菌 fig|287.2309.peg.126;綠膿桿菌 WP_039522442.1;巴西果膠桿菌 WP_080861357.1;肺炎克雷伯菌 WP_014542745.1;伊文氏桿菌屬Ejp617 OSL25696.1;大腸桿菌TA255 fig|1125693.3.peg.761;奇異變形桿菌WGLW4 KMK80587.1;黑腐果膠桿菌ICMP 1526 fig|550.717.peg.2037;陰溝腸桿菌 ACS86154.1;香蕉迪基氏菌Ech703 WP_050122514.1;弗氏耶氏桿菌 WP_081334048.1;麥克萊迪交替單胞菌 WP_055016254.1;假交替單胞菌屬P1-13-1a fig|56799.5.peg.478;科爾韋爾氏菌屬 fig|666.3375.peg.2486;霍亂弧菌 PIW62005.1;希瓦氏菌屬CG12_big_fil_rev_8_21_14_0_65_47_15 OCA54994.1;南瑙光桿菌 CNK75559.1;弗氏耶氏桿菌 WP_024248662.1;艾氏菌 WP_088618141.1;耐冷甲基卵菌 WP_051669880.1; PCJ98666.1;交替單胞菌科細菌 WP_081919471.1;嗜鐵酸硫桿菌 WP_055769167.1;寡養單胞菌 WP_039422954.1;黃單胞菌vesicatoria WP_078568253.1;野油菜黃單胞菌 WP_093486747.1;未分類之假黃單胞菌 WP_077445058.1;紅桿菌屬C05 WP_092576562.1;無色桿菌屬NFACC18-2 fig|1330528.3.peg.2198;大腸桿菌NCCP 15656 fig|83655.67.peg.2965;非去羧勒克氏菌 fig|573.10044.peg.2850;肺炎克雷伯菌 WP_071888955.1;腸桿菌目 fig|1799789.3.peg.4357;水解居水菌 fig|2024839.8.peg.1563;栖藻海卵菌屬 fig|1381081.7.peg.1167;潘努里弧菌 fig|670.908.peg.3444;副溶血弧菌 fig|626887.3.peg.2431;南海海桿菌D15-8W fig|1913989.101.peg.1616;γ-變形菌綱細菌 fig|262489.9.peg.2938;δ-變形菌MLMS-1 fig|2035207.3.peg.545;詹森桿菌屬67 fig|28095.13.peg.1040;唐菖蒲伯克霍爾德菌 fig|941449.3.peg.1262;脫硫弧菌屬X2 fig|1768806.3.peg.778;紅螺菌科細菌CCH5-H10 WP_083634830.1;脫硫弧菌屬DV fig|1231.4.peg.574;多形亞硝化螺菌 fig|604089.3.peg.1142;中華耐冷黃桿菌 fig|357523.3.peg.1851;黃桿菌屬11 fig|1423323.5.peg.321;黃桿菌屬AED fig|178356.3.peg.502;新疆黃桿菌 fig|1946545.3.peg.3457;黃桿菌屬UBA4120 fig|150146.3.peg.2822;吉利黃桿菌 fig|229203.4.peg.1981;德氏黃桿菌 fig|280093.5.peg.432;顆粒黃桿菌 fig|728056.4.peg.1154;鈎吻黃桿菌 fig|143224.8.peg.2343;潮濕卓貝爾氏黃桿菌 fig|1225176.3.peg.4300;Cecembia lonarensis LW9 fig|1434700.3.peg.581;沈積物漠河桿菌 fig|996.47.peg.468;柱狀黃桿菌 fig|172045.56.peg.2231;米爾伊麗莎白氏菌 fig|2024823.3.peg.2086;Altibacter屬 fig|2026728.18.peg.4090;藏紅花黃色綫菌科細菌 fig|980584.3.peg.2930;噬瓊膠海水菌 
fig|1946744.3.peg.1682;萊文虎克菌屬UBA1003 fig|1046627.3.peg.2526;Bizionia argentinensis JUB59 fig|906888.15.peg.37;Nonlabens ulvanivorans fig|407022.4.peg.2865;國產橄欖形菌 fig|1500282.3.peg.3713;金黃桿菌屬CF365 WP_084550290.1;大眼金黃桿菌 fig|190304.8.peg.741;具核梭桿菌具核亞種ATCC 25586 fig|1352.1731.peg.603;屎腸球菌 fig|1428.658.peg.666;蘇雲金芽孢桿菌 fig|1497681.3.peg.3095;紐約李斯特菌 fig|1396.1440.peg.4237;蠟樣芽孢桿菌 fig|1917876.3.peg.2997;布勞特氏菌屬Marseille-P3087 fig|1952168.3.peg.215;毛螺菌科細菌UBA7480 fig|1907659.3.peg.1085;布勞特氏菌屬Marseille-P3201T fig|1265309.16.peg.461;移動表桿菌F1926 fig|853.163.peg.215;普氏糞桿菌 fig|1264.5.peg.4;白色瘤胃球菌 fig|1500289.3.peg.4469;金黃桿菌屬OV705 fig|1197728.3.peg.2386;概念普雷沃氏菌9403948 fig|1947486.3.peg.2515;鞘氨醇桿菌屬UBA1897 fig|529.12.peg.1303;人蒼白桿菌 fig|1523429.3.peg.2936;根瘤菌屬AAP116 fig|1761878.3.peg.469;類芽孢桿菌屬cl6col fig|1462996.4.peg.2634;永吉類芽孢桿菌 fig|582475.4.peg.4724;木糖離胺酸桿菌 fig|1773.7915.peg.7638;結核分枝桿菌 fig|360310.3.peg.4853;芽孢桿菌屬CDB3 fig|1396.515.peg.2936;蠟樣芽孢桿菌 fig|662367.4.peg.242;內生螺狀菌 fig|1895719.3.peg.2950;擬桿菌門細菌45-6 fig|906888.9.peg.926;Nonlabens ulvanivorans fig|694433.3.peg.2346;大腐生螺旋體DSM 2844 fig|1167006.5.peg.2941;Desulfocapsa sulfexigens DSM 10523 fig|649724.3.peg.304;梭菌屬ATCC BAA-442 fig|1505.32.peg.2959;索氏梭菌 fig|1953142.3.peg.1858;擬桿菌門細菌UBA1947 fig|2029590.3.peg.2754;黏液桿菌屬MD40 fig|29581.33.peg.2300;藍黑紫色桿菌 fig|40324.292.peg.236;嗜麥芽寡養單胞菌 fig|1403329.3.peg.287;單核細胞增生李斯特菌Lm25180 fig|1121865.3.peg.1262;哥倫比亞腸球菌DSM 7374 = ATCC 51263 fig|1120746.3.peg.3113;細菌MS4 fig|1952299.3.peg.221;瘤胃球菌科細菌UBA2656 fig|1965604.3.peg.686;厭氧馬賽桿菌屬An250 fig|1673717.3.peg.805;塞內加爾厭氧馬賽桿菌 WP_116884683.1;食物谷菌 fig|1948697.3.peg.196;慢球菌細菌UBA4640 fig|1232460.3.peg.46;梭菌目細菌VE202-28 WP_007864340.1;梭菌目 WP_055649738.1;Hungatella hathewayi fig|1226325.3.peg.2005;梭菌屬KLE 1755 fig|1432052.10.peg.3166;泰伊艾森伯格菌 fig|208479.8.peg.4376;波爾特氏腸道梭狀菌 fig|1298920.3.peg.1959;[脫硫腸狀菌]滴狀DSM 4024 fig|1776047.3.peg.4241;梭菌屬C105KSO15 fig|1946596.3.peg.2399;Hungatella屬UBA4396 fig|1946603.3.peg.924;Hungatella屬UBA7603 fig|1410651.3.peg.407;[梭菌]耐氧DSM 
5434 fig|1697784.3.peg.9617;梭菌屬細菌UC5.1-1D4 fig|1745713.3.peg.3865;馬賽桿菌 fig|180332.3.peg.1515;Robinsoniella peoriensis WP_072851604.1;Lactonifactor longoviformis WP_003507561.1;梭菌目 fig|1111728.3.peg.587;水生布戴約維采菌DSM 5075 = ATCC 35567 fig|1122977.4.peg.2473;泉水布拉格菌DSM 5563 = ATCC 49100 fig|1950915.3.peg.189;梭菌目細菌UBA644 fig|1950927.3.peg.912;梭菌目細菌UBA7187 ERK60856.1;顫桿菌屬KLE 1728 WP_009260579.1;普氏梭桿菌 fig|1235797.3.peg.2409;顫桿菌屬1-3 fig|1520815.3.peg.1262;瘤胃球菌科細菌D5 fig|1855302.3.peg.1138;假丁酸弧菌屬JW11 fig|43305.5.peg.3631;蛋白分解丁酸弧菌 fig|411463.15.peg.1791;腹腔真桿菌ATCC 27560 fig|1235792.3.peg.3837;毛螺菌科細菌M18-1 fig|97139.3.peg.669;阿拉伯沙氏桿菌 fig|1291051.3.peg.1165;解甘草地中海桿菌JCM 13369 fig|1532.6.peg.4793;類球布勞特氏菌 fig|1121114.4.peg.5478;生產布勞特氏菌ATCC 27340 = DSM 2950 fig|1262776.3.peg.1908;梭菌屬CAG:149 fig|1262792.3.peg.1164;梭菌屬CAG:299 fig|1262995.3.peg.2852;厚壁菌門細菌CAG:646 fig|537007.17.peg.3146;漢遜布勞特氏菌DSM 20583 fig|1965569.3.peg.1928;腸道核心菌屬An169 fig|1952411.3.peg.2018;瘤胃球菌科細菌UBA6353 fig|1965578.3.peg.1947;假梭桿菌屬An187 WP_001775049.1;大腸桿菌逆轉錄子-Eco5 (Ec107) WP_012602583.1; WP_015962464.1;腸桿菌科細菌菌株FGI 57 fig|1005999.3.peg.3342;格氏勒米諾菌ATCC 33999 = DSM 5078 fig|1378073.3.peg.795;腸桿菌屬CC120223-11 fig|911023.3.peg.138;雷金斯堡約克氏菌ATCC 49455 fig|1834193.3.peg.4113;腸球菌屬9E7_DIV0242 fig|1649188.10.peg.406;果阿李斯特菌 fig|1430899.3.peg.278;弗萊施曼尼李斯特菌1991 fig|1211844.4.peg.748;候選Stoquefichus馬賽AP9 fig|1658109.3.peg.34;候選Stoquefichus屬SB1 fig|1262793.3.peg.950;梭菌屬CAG:302 fig|1262908.3.peg.1120;支原體屬CAG:956 fig|1674844.3.peg.242;梭菌目細菌Firm_06 fig|1410672.3.peg.2823;黃色瘤胃球菌ND2009 fig|1947424.3.peg.1718;瘤胃球菌屬UBA4310 fig|1265.9.peg.2602;黃色瘤胃球菌 fig|1336236.3.peg.1817;黃色瘤胃球菌ATCC 19208 CDC65895.1;瘤胃球菌屬CAG:57 WP_092946213.1;瘤胃球菌科細菌YRB3002 fig|1307.1644.peg.1532;豬鏈球菌 WP_050516365.1;大腸桿菌 WP_097505494.1;大腸桿菌逆轉錄子-Eco5 (Ec107) fig|573.15585.peg.2343;肺炎克雷伯菌 WP_023581669.1;豪氏變形桿菌 WP_079656969.1;黏質沙雷氏菌 WP_090085157.1;植物桿菌屬SCO41 fig|573.15584.peg.1543;肺炎克雷伯菌 WP_023330997.1;陰溝腸桿菌複合物 fig|72407.673.peg.2552;肺炎克雷伯菌肺炎亞種 CNM01182.1;假結核耶氏桿菌 
CNG88012.1;小腸結腸炎耶氏桿菌 fig|1925763.3.peg.649;需鹽海桿菌 PKW24121.1;海桿菌屬LV10R510-8 WP_045597342.1;創傷弧菌 WP_098972386.1;氣單胞菌屬CU5 WP_005172873.1;小腸結腸炎耶氏桿菌 WP_052979504.1;腸桿菌科 WP_083069261.1;成團泛菌 WP_053911905.1;假交替單胞菌屬SW0106-04 fig|1916082.18.peg.39;交替單胞菌科細菌 WP_046555216.1;Arsukibacterium屬MJ3 KPW01986.1;假交替單胞菌屬P1-8 WP_094277737.1;鮑曼海洋單胞菌 fig|1414654.3.peg.2005;耐冷海洋球菌 WP_008133621.1;未分類之假交替單胞菌 KQA22543.1;氣象弧菌 WP_000284440.1;霍亂弧菌 WP_011261677.1;費氏阿里弧菌 KEE40622.1; WP_012982829.1; ALL66139.1;加勒比副伯克霍爾德菌MBA4 WP_093223969.1;溫哥華假單胞菌 fig|2015553.3.peg.2940;假單胞菌屬PGPPP1 WP_096082869.1;綠膿桿菌 ONM67687.1;綠膿桿菌 fig|316.213.peg.2906;施氏假單胞菌 WP_078734267.1;螢光假單胞菌 WP_079384669.1;綠膿桿菌 WP_095948157.1;聚硼貪噬菌 WP_011625020.1;希瓦氏菌屬MR-7 WP_100292553.1;岩洞氣單胞菌 WP_055021484.1;假交替單胞菌屬P1-26 PHS01491.1;海洋桿菌屬 fig|2024618.3.peg.1141;不動桿菌屬BS1 WP_114139108.1;肺炎克雷伯菌 WP_077749737.1;假單胞菌屬FSL W5-0299 WP_078451378.1;綠膿桿菌 WP_007245785.1;丁香假單胞菌群 fig|316.280.peg.1454;施氏假單胞菌 WP_086822222.1;綠膿桿菌 WP_073268605.1;普諾假單胞菌 WP_095280108.1;鹹海鮮萊略特氏菌 WP_095715328.1;檸檬酸桿菌屬TSA-1 WP_050111525.1;耶氏桿菌 WP_013724211.1;維氏氣單胞菌 WP_021140819.1;殺鮭氣單胞菌 fig|1094342.5.peg.1611;食烷菌(Alcanivorax xenomutans) fig|1932666.4.peg.1886;海仙菌屬 WP_087148323.1;多孢子鐵細菌 WP_064022638.1;甲基單胞菌屬DH-1 PIY64876.1;希瓦氏菌屬CG_4_10_14_0_8_um_filter_42_13 WP_006710190.1;魚腸弧菌 WP_045040928.1;髂光桿菌 WP_054543201.1;燦爛弧菌 WP_080540293.1;創傷弧菌 fig|2032624.3.peg.2540;鹵單胞菌屬WN018 KJT50308.1;腸道沙門氏菌腸亞種海德堡血清型菌株RI-11-014588逆轉錄子-Sen1 (Se72) WP_005761319.1; ODQ05744.1;志賀桿菌屬FC130 KKW01006.1;候選Saccharibacteria細菌GW2011_GWC2_48_9 KMZ12260.1;候選歐李伯克霍爾德菌 SFQ04394.1;羅爾斯頓氏菌屬NFACC01 WP_025373922.1;Advenella mimigardefordensis WP_093341200.1;貪噬菌屬PDC80 WP_091453700.1;Giesbergeria anulus SAY51889.1;韋氏奈瑟菌 WP_065255232.1;腔隙莫拉氏菌 WP_049330876.1;奈瑟菌 fig|1196095.197.peg.151;蜜蜂吉利桿菌 WP_072956843.1;產氣弧菌 fig|857087.3.peg.3286;甲基單胞菌MC09 fig|1952222.3.peg.1307;甲基球菌科細菌UBA3127 WP_039486261.1;錫那羅州弧菌 WP_065545234.1;大菱鮃弧菌 WP_033094845.1;冷紅科爾韋爾氏菌 WP_057552475.1;霍亂弧菌 WP_004726393.1;弗尼斯弧菌 fig|2020862.3.peg.1934;嗜鹽噬菌弧菌屬 
fig|624.1260.peg.1437;宋內志賀桿菌 WP_011516221.1;伯克霍爾德菌目 fig|1947370.3.peg.1923;短毛黴屬UBA4517 WP_038400955.1;假結核耶氏桿菌 fig|1951903.3.peg.117;Halieaceae細菌UBA3099 WP_024914507.1;Chania multitudinisentens WP_042893228.1;腸桿菌科 WP_038238211.1;嗜線蟲致病菌 EXI65661.1;候選Accumulibacter屬SK-12 WP_016452106.1;戴爾福特菌 WP_013517170.1;脫氮嗜脂環物菌 OXC73828.1;Caballeronia sordidicola AIO65205.1;俄克拉荷馬伯克霍爾德菌 WP_013234866.1;織片草螺菌 WP_082884385.1;魚立克次體科細菌NZ-RLO1 fig|2006849.4.peg.371;黃單胞菌目細菌 WP_074262787.1;副伯克霍爾德菌吩嗪鎓 WP_009906786.1;泰國伯克霍爾德菌 WP_022524328.1; WP_081817450.1;鹽單胞菌屬HL-48 WP_020312233.1;丁香假單胞菌 KPY75916.1;杏仁假單胞菌煙草致病變種 fig|1793966.3.peg.180;河川假單胞菌 fig|1891229.16.peg.2033;假單胞菌目細菌 WP_099454886.1;惡臭假單胞菌 WP_092400423.1;假單胞菌屬NFACC39-1 WP_012315430.1;惡臭假單胞菌 WP_020799819.1;假單胞菌屬G5(2012) WP_004574016.1; fig|1435425.3.peg.787;假單胞菌屬QTF5 WP_045490543.1;假單胞菌屬StFLB209 WP_011506503.1;需鹽色鹽桿菌 fig|1609967.3.peg.3047;鹵單胞菌屬HG01 fig|1492738.3.peg.2698;黃桿菌seoulense WP_092849245.1;噬果膠海藻桿菌 WP_025835957.1;擬桿菌 fig|2025877.3.peg.668;副擬桿菌屬AT13 fig|246787.6.peg.2081;纖維素擬桿菌 fig|1339287.3.peg.1113;脆弱擬桿菌菌株3986 T(B)9 fig|1946017.3.peg.1516;另枝菌屬UBA940 WP_038655380.1;水蛭食黏菌 WP_093669272.1;黏桿菌屬MAR_2009_124 WP_073241067.1;內海黃桿菌 WP_096193803.1;噬纖維菌目細菌TFI 002 WP_076357635.1; WP_073238193.1;污水土地桿菌 WP_076451370.1; WP_091906542.1;紫單胞菌科細菌KH3R12 WP_051365712.1;鹽帽黃桿菌 fig|1938609.3.peg.1765;黃桿菌屬LM4 SDJ72221.1;非離心黃桿菌 fig|1985174.3.peg.2584;噬幾丁質菌科細菌IBVUCB2 WP_092737749.1;鴿咽喉裏默桿菌 fig|192149.3.peg.42;鼠尾菌屬 fig|418630.3.peg.1685;巨紅桿菌 fig|1915314.3.peg.3469;硫硼大桿菌屬DLFJ5-1 fig|2030815.3.peg.2725;馬氏磺醯菌屬 fig|2035451.3.peg.4632;根瘤菌屬L18 WP_043872258.1;印度洋速生桿菌 WP_055683826.1;紅色簡納西氏菌 fig|1947537.3.peg.498;鞘脂單胞菌屬UBA6198 WP_069065961.1;鞘脂菌屬RAC03 WP_084280100.1;新鞘脂菌屬B1 fig|1895845.3.peg.487;鞘脂菌屬66-54 GAK73419.1;懸鉤子農桿菌TR3 = NBRC 13261 WP_090966398.1;葉狀金色單胞菌 WP_091860144.1;刺槐博斯氏菌 WP_085092006.1;水稻固氮螺菌 fig|1528100.4.peg.28;Methylomagnum ishizawai fig|32057.3.peg.9515;眉藻屬PCC 7103 fig|103690.10.peg.3571;念珠藻屬PCC 7120 = FACHB-418 fig|1137095.11.peg.15;偽枝藻屬HK-05 
CDZ48826.1;山羊豆根瘤菌(Neorhizobium galegae bv. officinalis) WP_072340070.1;德沃斯氏菌enhydra OYR18277.1;噻吩蒼白桿菌 WP_093509439.1;鞘脂單胞菌屬YR583 WP_081799025.1;食樹脂新鞘脂菌 PIY55545.1;ζ-變形菌門細菌CG_4_10_14_0_8_um_filter_49_80 SDT44912.1;加那利慢生根瘤菌 WP_096350346.1; WP_074962594.1;紅色簡納西氏菌 WP_038724888.1;鼻疽伯克霍爾德菌 WP_012217410.1;多噬伯克霍爾德菌 WP_100428762.1;詹森桿菌屬67 WP_082161008.1;候選反硝化競爭桿菌 AFL73219.1;紫色硫囊菌屬DSM 198 WP_014427842.1; fig|364030.3.peg.3554;精緻硫單胞菌 KGW20495.1;鼻疽伯克霍爾德菌MSHR2451 SFE83076.1;貪噬菌屬OK212 WP_013028226.1;Sideroxydans lithotrophicus WP_080311424.1;鼻疽伯克霍爾德菌 fig|337.13.peg.3872;莢殼伯克霍爾德菌 WP_082643860.1;假單胞菌 CKH90039.1;綠膿桿菌 WP_083287254.1;未分類之詹森桿菌 WP_122648546.1;鼻疽伯克霍爾德菌 WP_082706753.1;未分類之假單胞菌 WP_080936076.1;肺炎克雷伯菌 WP_000746343.1;腸桿菌科 EMX54653.1;大腸桿菌MP020980.2 WP_053270700.1;大腸桿菌 fig|1736224.3.peg.3731;沙雷氏菌屬Leaf51 fig|1175299.4.peg.709;玉米迪基氏菌ZJU1202 WP_001461245.1;腸桿菌科 fig|617145.3.peg.3535;燦爛弧菌1F-157 fig|1440054.3.peg.3851;弧菌屬OY15 fig|617135.3.peg.594;費氏阿里弧菌ZF-211 WP_023267764.1;脫色希瓦氏菌 fig|1481663.36.peg.3628;氣象弧菌 fig|670.893.peg.2716;副溶血弧菌 fig|680.33.peg.5391;坎貝氏弧菌 fig|298386.8.peg.4344;深海光桿菌SS9 fig|663.73.peg.714;溶藻弧菌 fig|1333511.3.peg.3208;游海假交替單胞菌TAB23 WP_064574154.1;副蜂房哈夫尼菌 WP_064645509.1;變形肥桿菌 fig|630.105.peg.4248;小腸結腸炎耶氏桿菌 fig|400673.7.peg.1969;嗜肺軍團菌菌株Corby WP_092678546.1;Rosenbergiella nectarea WP_069476513.1;解鳥胺酸拉烏爾菌 fig|1267535.3.peg.2394;Bryobacterales細菌KBS 96 WP_000446053.1;鮑曼不動桿菌 fig|1948587.3.peg.786;γ-變形菌綱細菌UBA1902 WP_014949305.1;麥克萊迪交替單胞菌 fig|1797397.3.peg.2386;蛭弧菌目細菌RIFOXYC1_FULL_54_43 fig|1386968.3.peg.847;土倫病弗朗西斯菌新殺亞種PA10-7858 WP_074900850.1; fig|1975705.3.peg.898;嗜冷桿菌屬FDAARGOS_221 WP_066184577.1;弓形桿菌 fig|1780380.4.peg.4010;真桿菌科細菌CHKCI004 fig|556261.3.peg.2546;梭菌屬D5 fig|1193534.6.peg.2375;未培養之梭桿菌屬 fig|1042163.3.peg.3771;側孢短芽孢桿菌LMG 15441 WP_062492190.1;類芽孢桿菌屬32O-W WP_081674606.1;哈爾濱乳桿菌 WP_050781686.1;棒狀乳桿菌 WP_021109137.1;屎腸球菌 WP_046309803.1;葡萄球菌 CBL03706.1;帕梅拉戈登氏桿菌7-10-1-b WP_090944285.1;Pelosinus propionicus WP_077305443.1;拜氏梭菌 fig|410072.5.peg.40;陪伴糞球菌 
WP_011669870.1;波帕特森鉤端螺旋體 WP_015565235.1;普氏糞桿菌 CUO23478.1;普氏糞桿菌 WP_085748688.1;嗜膠根瘤菌 WP_093270014.1;冷桿菌屬OK032 SHE86352.1;Atopostipes suicloacalis DSM 15692 WP_000346292.1;未分類鏈球菌 WP_080465410.1;植物乳桿菌 WP_080662531.1;短乳桿菌 WP_093131554.1;康氏鹽漬芽孢桿菌 WP_093336905.1;耐鹽鹽地桿菌 fig|1974627.3.peg.386;候選列維菌門細菌CG_4_9_14_0_2_um_filter_35_21 fig|1802603.3.peg.453;候選Woykebacteria細菌RIFCSPHIGHO2_12_FULL_45_10 fig|392734.5.peg.3006;玫瑰色庫克菌 AGL61879.1;候選Saccharimonas aalborgensis fig|319224.16.peg.2726;腐敗希瓦氏菌CN-32 fig|1720343.3.peg.1263;假交替單胞菌屬1_2015MBL_MicDiv fig|1136158.3.peg.3691;嗜環弧菌1F97 fig|666.3017.peg.1000;霍亂弧菌 fig|1909458.3.peg.2277;鹽弧菌屬ML198 fig|1638949.3.peg.831;弧菌屬ECSMB14106 fig|493915.3.peg.158;假交替單胞菌屬NJ631 BAC94535.1;創傷弧菌YJ016 fig|1191313.3.peg.1135;燦爛弧菌1S-124 fig|670.1244.peg.3807;副溶血弧菌 fig|1659714.3.peg.4264;布氏檸檬酸桿菌 fig|1192730.4.peg.1976;腸道沙門氏菌腸亞種Kintambo血清型 fig|550.1216.peg.4296;陰溝腸桿菌 WP_072269713.1;沙雷氏菌 WP_053898075.1;大腸桿菌 fig|624.1264.peg.1635;宋內志賀桿菌 fig|1181777.3.peg.78;大腸桿菌KTE233 fig|1802256.3.peg.310;硫單胞菌屬RIFOXYB12_FULL_35_9 PHR73342.1;弓形桿菌屬 fig|2014260.3.peg.3813;細菌(候選Blackallbacteria) CG13_big_fil_rev_8_21_14_2_50_49_14 WP_042497590.1;海生弧菌 WP_063522799.1;弧菌屬HI00D65 WP_004186757.1;腸桿菌科 WP_040122746.1;弧菌 WP_086046550.1;哈維弧菌群 WP_063849005.1;陰溝腸桿菌 WP_023486614.1;腸桿菌科 WP_070992278.1;邊山假交替單胞菌屬 fig|1005665.3.peg.2532;科薩克氏菌oryzendophytica fig|1219066.3.peg.3636;副溶血弧菌NBRC 12711 fig|1225184.4.peg.1222;泛菌屬A4 fig|675814.3.peg.1256;珊瑚弧菌ATCC BAA-450 SFR59865.1;假丁酸弧菌屬NOR37 fig|853.16.peg.1112;普氏糞桿菌 fig|1965572.3.peg.1423;假梭桿菌屬An176 fig|588581.3.peg.3589;解丘疹瘤胃梭菌DSM 2782 fig|1396.1409.peg.4169;蠟樣芽孢桿菌 fig|1428.538.peg.4047;蘇雲金芽孢桿菌 fig|1465.16.peg.946;側孢短芽孢桿菌 WP_087385137.1; AIF42417.1;枝芽孢桿菌屬SK37 WP_076543941.1;康氏嗜鹽厭氧菌 fig|1121093.3.peg.3089;參田芽孢桿菌DSM 19096 fig|29367.3.peg.2029;石榴梭菌 WP_089719707.1;剛果嗜鹽厭氧菌 fig|307249.3.peg.3585;未培養之鼠孢菌屬 WP_072949666.1;黃色瘤胃球菌 CCX81854.1;瘤胃球菌屬CAG:108 fig|1491.669.peg.2217;肉毒梭菌 fig|1872455.3.peg.401;嗜鹼菌屬 fig|576117.5.peg.4005;嗜鹽速生桿菌 
fig|1225647.3.peg.1829;褐桿菌屬11ANDIMAR09 fig|1380380.4.peg.1574;阿倫斯氏菌屬13_GOM-1096m fig|293.7.peg.2956;缺陷短波單胞菌 WP_095437634.1;根瘤菌屬11515TR fig|1912891.7.peg.702;鞘脂菌屬 fig|1736574.3.peg.4024;假黃單胞菌屬Root630 fig|227946.13.peg.4105;半透明黃單胞菌poae致病變種 fig|1761791.3.peg.4793;溶桿菌屬yr284 fig|1560195.5.peg.485;詹森桿菌屬BJB301 fig|1503054.43.peg.6257;停滯伯克霍爾德菌 fig|1207504.10.peg.4279;假多噬伯克霍爾德菌 WP_092172515.1;未分類之假單胞菌 WP_074815429.1;丁香假單胞菌 fig|150146.3.peg.3162;吉利黃桿菌 fig|76832.8.peg.3775;擬香味類香味菌 fig|1202724.3.peg.994;棲阿卡亞黃桿菌 fig|1805473.3.peg.3678;金黃桿菌timonianum fig|253.33.peg.3826;產吲哚金黃桿菌 WP_076561634.1;惰性金黃桿菌 fig|2024823.3.peg.95;Altibacter屬 fig|1250278.4.peg.3462;需鹽桿菌屬Hel_I_6 fig|1797342.3.peg.689;擬桿菌門細菌GWF2_33_38 WP_084184261.1;解尿金黃桿菌 fig|1948560.3.peg.3003;δ-變形菌綱細菌UBA6106 fig|1392.364.peg.2564;炭疽芽孢桿菌 fig|872970.3.peg.1713;海洋兩棲桿菌 fig|1385514.3.peg.313;鹽城海芽孢桿菌Y32 fig|76853.4.peg.2614;銀芽孢桿菌 fig|1423774.3.peg.1262;南特港乳桿菌DSM 16982 fig|1410670.3.peg.2844;黃色瘤胃球菌MA2007 fig|169435.7.peg.1348;厭氧棍狀菌 fig|1946597.3.peg.2104;Hungatella屬UBA4568 fig|1948087.3.peg.796;厚壁菌門細菌UBA6113 fig|642492.3.peg.2638;Cellulosilyticum lentocellum DSM 5427 fig|1950841.3.peg.2383;梭菌目細菌UBA2436 fig|555512.3.peg.1251;海洋Salipiger fig|383381.3.peg.2538;紅桿菌屬JL475 WP_081629462.1; fig|1736258.3.peg.3392;甲基桿菌屬Leaf112 fig|1950192.3.peg.426;厭氧繩菌目細菌UBA2232 fig|170623.6.peg.4661;拜氏固氮菌 fig|170623.7.peg.704;拜氏固氮菌 fig|1981099.3.peg.513;Niveispirillum lacus fig|1250539.3.peg.3491;深淵海橄欖菌 fig|1947582.3.peg.2979;硫酸桿菌屬UBA1132 fig|1909294.17.peg.3456;根瘤菌目細菌 fig|1735583.3.peg.1657;假弧菌屬W64 fig|670.1220.peg.4688;副溶血弧菌 fig|1004786.3.peg.925;地中海交替單胞菌DE1 fig|2013797.3.peg.2109;γ-變形菌綱細菌HGW-γ-變形菌綱-15 fig|1948580.3.peg.3400;γ-變形菌綱細菌UBA1012 fig|1714300.3.peg.306;深海海桿菌 fig|1961547.3.peg.1371;脫硫桿菌科細菌UBA2273 fig|441162.10.peg.6621;俄克拉荷馬伯克霍爾德菌C6786 fig|615.307.peg.4666;黏質沙雷氏菌 fig|631.3.peg.1883;中間耶氏桿菌 fig|1763535.3.peg.1547;氫噬胞菌crassostreae fig|43263.5.peg.2702;產鹼假單胞菌 fig|244366.46.peg.3595;水痘克雷伯菌 fig|1224150.8.peg.3856;香蕉迪基氏菌NCPPB 2511 
fig|61645.10.peg.2019;阿氏腸桿菌 fig|1948706.3.peg.2225;豐祐菌綱細菌UBA1333 fig|2026771.13.peg.1697;豐祐菌綱細菌 fig|2026771.11.peg.1955;豐祐菌綱細菌 fig|2026772.5.peg.424;豐祐菌綱細菌 fig|2026801.20.peg.1798;疣微菌目細菌 fig|2026801.14.peg.1176;疣微菌目細菌 fig|1951369.3.peg.1157;阿克曼氏菌科細菌UBA6946 fig|1977087.12.peg.1918;變形菌門細菌 fig|2026779.14.peg.4171;浮黴菌科細菌 fig|2026779.28.peg.3264;浮黴菌科細菌 fig|2026779.30.peg.3181;浮黴菌科細菌 fig|2026779.29.peg.2310;浮黴菌科細菌 fig|1797235.3.peg.3;不動桿菌屬RIFCSPHIGHO2_12_41_5 fig|316.284.peg.937;施氏假單胞菌 fig|296.11.peg.442;脆弱假單胞菌 fig|1981714.3.peg.993;假單胞菌屬B5(2017) fig|50340.44.peg.6020;褐鞘假單胞菌 fig|1761897.3.peg.509;假單胞菌屬ok272 fig|1402514.3.peg.154;綠膿桿菌BWHPSA014 fig|1938440.3.peg.5997;假單胞菌屬T fig|1566250.3.peg.959;假單胞菌屬NFACC02 fig|316.357.peg.479;施氏假單胞菌 fig|287.4433.peg.2945;綠膿桿菌 fig|1970515.3.peg.709;嗜氫菌目細菌12-61-10 fig|95486.85.peg.1748;新洋蔥伯克霍爾德菌 fig|292.61.peg.8104;洋蔥伯克霍爾德菌 fig|1408450.3.peg.3766;甲基桿菌tundripaludum 21/22 fig|157910.3.peg.5727;隆起副伯克霍爾德 fig|279058.16.peg.4239;山崗單孢菌arenae fig|1537272.3.peg.1916;詹森桿菌屬HH100 fig|1218081.3.peg.1751;久留里副伯克霍爾德硫氧化亞種NBRC 107107 fig|573.14059.peg.3113;肺炎克雷伯菌 fig|40324.192.peg.51;嗜麥芽寡養單胞菌 fig|1219041.3.peg.4613;固氮鞘氨醇單胞菌NBRC 15497 fig|1561196.3.peg.560;伯克霍爾德菌屬E7m39 fig|1882750.3.peg.1035;伯克霍爾德菌屬GAS332 fig|1736266.3.peg.1145;杜擀氏菌屬Leaf126 fig|2015350.3.peg.1640;伯克霍爾德菌屬AU18528 fig|58133.4.peg.815;亞硝化螺菌屬NpAV fig|1691980.3.peg.1912;紅環菌科細菌Paddy-1 fig|305.393.peg.1023;青枯羅爾斯頓氏菌 fig|56449.3.peg.3604;溴黃單胞菌 fig|1281282.5.peg.1894;野油菜黃單胞菌野油菜致病變種菌株CN14 fig|40324.334.peg.1103;嗜麥芽寡養單胞菌 fig|1349793.3.peg.2529;螺紋氫噬胞菌NBRC 102512 fig|1842727.3.peg.1491;韓國紅育菌 fig|1619952.3.peg.5158;伯克霍爾德菌科細菌16 fig|1970380.3.peg.1914;鹽生硫桿菌屬14-55-98 fig|2015568.3.peg.2963;伯克霍爾德菌PBB6 fig|1752215.3.peg.2312;γ-變形菌綱細菌Ga0077554 fig|1706231.5.peg.3125;詹森桿菌屬CG23_2 fig|2013716.3.peg.2169;β-變形菌門細菌HGW-β-變形菌-4 fig|1946997.3.peg.3049;硝化螺旋菌屬UBA7655 fig|765913.3.peg.2527;德氏硫紅球菌AZ1 fig|1743159.3.peg.1891;揚子多核桿菌 fig|1597955.3.peg.3923;棲湖菌屬DM1 fig|1184267.3.peg.1626;外食蛭弧菌JSS fig|101571.190.peg.3007;烏博伯克霍爾德菌 
fig|123899.5.peg.1710;創口博德氏菌 fig|463035.3.peg.3900;博德氏菌基因種12 fig|1395608.4.peg.211;博德氏菌基因種5 fig|1947379.3.peg.2784;紅育菌屬UBA5149 WP_074294985.1;副伯克霍爾德菌吩嗪鎓 fig|1324617.3.peg.820;副伯克霍爾德aspalathi fig|80868.3.peg.3458;卡特氏噬酸菌 fig|1388764.3.peg.1840;氧化亞鐵假高炳根氏菌EGD-HP2 fig|251747.15.peg.4695;鐵杉下色桿菌 fig|670.1020.peg.382;副溶血弧菌 fig|1055803.3.peg.1434;假交替單胞菌屬TB51 fig|1201036.3.peg.177;假蒼白桿菌屬AO18b fig|1220581.4.peg.1434;髮根農桿菌NBRC 13257 fig|398.6.peg.6695;熱帶根瘤菌 fig|931866.6.peg.8184;渥太華慢生根瘤菌 fig|142585.3.peg.1658;慢生根瘤菌屬C9 fig|1082933.13.peg.1537;紫穗槐中根瘤菌CCNWGS0123 fig|1768789.3.peg.791;甲基桿菌屬CCH7-A2 fig|1381123.3.peg.3819;Aliihoeflea屬2WW fig|1297570.3.peg.1970;中根瘤菌屬STM 4661 fig|935546.3.peg.3816;百脈根中根瘤菌NZP2037 fig|1128253.3.peg.1960;日本慢生根瘤菌CCBAU 15354 fig|1444315.4.peg.3983;辣椒溶桿菌AZ78 fig|1185327.3.peg.1608;地毯草黃單胞菌木薯萎蔫致病變種菌株Xam668 fig|1881043.3.peg.2597;假黃單胞菌屬GM95 ALN84423.1;辣椒溶桿菌 fig|56460.15.peg.1977;黃單胞菌vesicatoria fig|1317116.6.peg.2759;海棲菌屬22II-s10i fig|564137.3.peg.4320;南極玫瑰色檸檬形菌 fig|1952800.3.peg.3583;紅細菌科細菌UBA2553 fig|218673.12.peg.3041;可疑硫酸桿菌 fig|1912092.3.peg.2119;沈積物國家海洋研究所菌 fig|1736558.3.peg.5006;劍菌屬Root558 fig|91360.5.peg.3717;新加坡脫硫棍棒形菌 fig|1948756.3.peg.2576;螺旋體綱細菌UBA2205 fig|1855322.3.peg.103;慢生根瘤菌屬Rc3b fig|1437360.11.peg.2429;紅斑慢生根瘤菌 fig|1871052.3.peg.1026;阿菲波菌屬 fig|1038860.3.peg.8756;埃氏慢生根瘤菌WSM2783 fig|1898112.54.peg.3758;紅螺菌科細菌 fig|1660129.3.peg.4854;苯桿菌屬SCN 70-31 fig|1482074.3.peg.4109;固氮哈特曼尼桿菌 fig|1970306.3.peg.552;酸桿菌屬35-58-6 fig|1686310.5.peg.1409;巴爾通體apis fig|1798192.3.peg.1953;海螺屬KO164 fig|1235461.17.peg.11;苜蓿中華根瘤菌GR4 fig|442.12.peg.222;氧化葡糖桿菌 fig|1938607.3.peg.1954;鞘氨醇單胞菌屬LM7 fig|1231624.3.peg.39;茂物朝井氏菌NBRC 16594 fig|1121271.3.peg.4112;蜜環菌DSM 15620 fig|33059.16.peg.1690;喜溫酸硫桿菌 fig|502025.10.peg.925;赭黃嗜鹽囊菌DSM 14365 fig|1734406.3.peg.691;α-變形菌門細菌BRH_c36 fig|1979207.3.peg.4304;短小盒菌屬 fig|1953057.3.peg.74;短小盒菌科細菌UBA4496 fig|858423.3.peg.10004;花生慢生根瘤菌 fig|267128.3.peg.2015;顆粒鞘脂單胞菌 fig|582667.3.peg.5553;擬沙西科拉甲基桿菌 fig|1187852.3.peg.2712;塔哈尼亞甲基桿菌 
fig|582675.3.peg.1247;戈西皮科拉甲基桿菌 fig|1951640.3.peg.515;脫鐵桿菌科細菌UBA6799 fig|1948417.4.peg.1606;α-變形菌門細菌UBA6187 fig|45074.5.peg.981;聖十字軍團菌 fig|1434232.4.peg.2927;Magnetofaba australis IT-1 fig|1945950.3.peg.3568;不動桿菌屬UBA6526 fig|106654.22.peg.994;醫院不動桿菌 fig|1977883.3.peg.3023;不動桿菌屬ANC 3903 fig|1945948.3.peg.700;不動桿菌屬UBA5984 fig|1226327.3.peg.2796;庫奇不動桿菌 fig|1879049.4.peg.5949;不動桿菌屬WCHAc010034 fig|1945955.3.peg.1951;不動桿菌屬UBA7614 fig|1675530.3.peg.2149;不動桿菌基因種33YU fig|1310638.3.peg.1006;鮑曼不動桿菌1437282 fig|1400001.4.peg.34;馬賽壞死桿菌 fig|1132496.5.peg.136;多殺性巴斯德氏菌多殺亞種菌株HN06 fig|1908263.4.peg.2604;海藻糖囓齒菌 fig|375432.4.peg.200;流感嗜血桿菌R3021 fig|400668.8.peg.3776;海單胞菌屬MWYL1 fig|1913989.193.peg.841;γ-變形菌綱細菌 fig|856793.5.peg.1975;銅綠微弧菌ARL-13 SBW23286.1;歐洲檸檬酸桿菌 fig|1736225.3.peg.985;伊文氏桿菌屬Leaf53 fig|29486.12.peg.818;魯氏耶氏桿菌 fig|914128.3.peg.2502;共生沙雷氏菌菌株Tucson fig|1796497.3.peg.952;Grimontia celer fig|1095649.3.peg.3298;霍亂弧菌O1菌株EM-1676A fig|137584.4.peg.1627;深海單胞菌viridans fig|173990.3.peg.1773;太平洋萊茵黑姆氏菌 fig|1720343.3.peg.3189;假交替單胞菌屬1_2015MBL_MicDiv fig|1202962.4.peg.1481;海洋摩替亞氏菌ATCC 15381 fig|669.50.peg.2993;哈維弧菌 fig|691.32.peg.1517;納特里根弧菌 fig|156578.3.peg.2521;交替單胞菌目細菌TW-7 fig|661.14.peg.380;狹光桿菌 fig|654.94.peg.1733;維羅尼氣單胞菌 fig|703.9.peg.319;志賀鄰單胞菌 fig|589873.36.peg.1971;澳洲交替單胞菌 fig|28107.3.peg.3571;埃斯佩吉亞假交替單胞菌 fig|1547444.3.peg.4264;假交替單胞菌屬PLSV fig|629266.7.peg.847;丁香假單胞菌獼猴桃致病變種菌株M302091 fig|251722.19.peg.4059;杏仁假單胞菌aesculi致病變種 fig|587851.4.peg.1470;綠針假單胞菌產金色亞種 fig|1265490.3.peg.2330;假單胞菌屬URMO17WK12:I8 fig|316.101.peg.3534;施氏假單胞菌 fig|1916993.3.peg.4917;惡臭假單胞菌 fig|1628833.3.peg.2448;假單胞菌屬ES3-33 fig|1283291.4.peg.1991;假單胞菌屬URMO17WK12:I11 fig|83963.5.peg.3885;丁香假單胞桿菌皰疹致病變種 fig|1206777.3.peg.4334;假單胞菌屬Lz4W fig|113268.3.peg.3785;深海貽貝甲烷營養鰓共生體 fig|1131284.3.peg.1562;ζ-變形菌SCGC AB-137-C09 fig|2026807.7.peg.2258;ζ-變形菌門細菌 fig|281689.4.peg.2060;乙醯氧化脫硫單胞菌DSM 684 fig|1188231.4.peg.1200;氧化亞鐵深海菌M34 fig|1367489.3.peg.682;費氏阿里弧菌SA1G fig|1873135.3.peg.4249;希瓦氏菌屬SACH fig|663.73.peg.2465;溶藻弧菌 
fig|1588629.3.peg.1134;氣單胞菌屬L_1B5_3 fig|1121922.3.peg.3454;Glaciecola pallidula DSM 14239 = ACAM 615 fig|351745.9.peg.2506;希瓦氏菌屬W3-18-1 fig|29497.20.peg.3798;燦爛弧菌 fig|1367486.3.peg.187;費氏阿里弧菌CB37 fig|511062.4.peg.1890;海洋單胞菌屬GK1 fig|654.12.peg.188;維羅尼氣單胞菌 fig|29497.21.peg.4482;燦爛弧菌 fig|1659713.3.peg.560;布安登腸桿菌 fig|1124991.3.peg.3617;摩氏摩根菌摩氏亞種KT fig|104623.3.peg.1381;沙雷氏菌屬ATCC 39006 fig|1256989.3.peg.902;產鹼普羅威登斯菌R90-1475 fig|1125694.3.peg.1143;奇異變形桿菌WGLW6 fig|574096.6.peg.2693;大蒜泛菌 fig|1095774.3.peg.2623;鳳梨泛菌PA13 fig|869692.4.peg.2910;大腸桿菌3003 WP_140159440.1;大腸桿菌逆轉錄子-Eco2 (Ec67) fig|550.437.peg.1444;陰溝腸桿菌 fig|573.13605.peg.2600;肺炎克雷伯菌 fig|550.285.peg.3783;陰溝腸桿菌 fig|1265672.3.peg.3869;腸道沙門氏菌腸亞種阿戈納血清型菌株 70.E.05 fig|573.10028.peg.542;肺炎克雷伯菌 fig|749537.3.peg.218;大腸桿菌MS 115-1 ANK06786.1;大腸桿菌O25b:H4 fig|670.880.peg.975;副溶血弧菌 fig|1192730.4.peg.3;腸道沙門氏菌腸亞種Kintambo血清型 fig|1224144.4.peg.4030;迪基氏菌屬CSL RW240 fig|568766.10.peg.2937;迪基氏菌屬NCPPB 3274 fig|1076549.3.peg.4260;羅達泛菌 fig|548.102.peg.3401;產氣克雷伯菌 fig|630.90.peg.1795;小腸結腸炎耶氏桿菌 fig|79883.5.peg.266;堀越氏芽孢桿菌 fig|180861.3.peg.3762;蘇雲金芽孢桿菌住吉血清型 fig|1390.157.peg.339;解澱粉芽孢桿菌 fig|293386.15.peg.304;平流層芽孢桿菌 fig|1053181.3.peg.3820;蠟樣芽孢桿菌BAG2X1-3 fig|1884375.3.peg.681;類芽孢桿菌屬PDC88 fig|334735.5.peg.923;韓國芽孢八疊球菌 fig|79884.3.peg.1120;假嗜鹼芽孢桿菌 fig|1628206.3.peg.4802;芽孢桿菌屬LK2 fig|1396.1605.peg.6235;蠟樣芽孢桿菌 fig|182710.3.peg.317;伊海海洋桿菌 fig|860.10.peg.486;牙周梭桿菌 fig|1855308.3.peg.1467;伊利斯毛球菌 fig|931626.3.peg.151;伍氏醋桿菌DSM 1030 fig|1965575.3.peg.2547;腸道核心菌屬An181 fig|1352.2757.peg.71;屎腸球菌 fig|1299895.3.peg.900;單核細胞增生李斯特菌CFSAN002349 fig|53346.29.peg.1591;芒氏腸球菌 fig|1649188.10.peg.1545;印度李斯特菌 fig|158847.6.peg.432;超級巨單胞菌 fig|1121289.3.peg.2775;少食Clostridiisalibacter DSM 22131 fig|1950885.3.peg.858;梭菌目細菌UBA4693 fig|1965576.3.peg.1978;假梭桿菌屬An184 fig|1952416.3.peg.1629;瘤胃球菌科細菌UBA642 fig|1262803.3.peg.8;梭菌屬CAG:413 fig|28037.216.peg.60;輕症鏈球菌 fig|1074052.3.peg.33;遠緣鏈球菌TCI-9 fig|1304.207.peg.1536;唾液鏈球菌 fig|1154859.3.peg.955;無乳鏈球菌LMG 14609 
fig|1080071.3.peg.332;奧裡薩斯鏈球菌 fig|1139219.3.peg.2194;迪斯帕腸球菌ATCC 51266 fig|1834176.3.peg.811;腸球菌屬3G1_DIV0629 fig|1622.15.peg.947;鼠乳桿菌 fig|565651.6.peg.1942;糞腸球菌ARO1/DG fig|1473546.3.peg.703;離胺酸芽孢桿菌屬BF-4 fig|37734.13.peg.137;卡氏腸球菌 fig|492670.92.peg.623;貝萊斯芽孢桿菌 fig|1639.1907.peg.2641;單核細胞增生李斯特菌 fig|1123489.3.peg.170;大型韋榮氏球菌DSM 19857 fig|1280687.3.peg.1880;溶纖維丁酸弧菌YRB2005 fig|1262889.3.peg.680;真桿菌屬CAG:38 fig|1235800.3.peg.2226;毛螺菌科細菌10-1 fig|1897035.3.peg.445;厚壁菌門細菌CAG:552_39_19 fig|199.588.peg.774;簡潔彎曲桿菌 fig|1111133.4.peg.219;嗜腖菌屬BV3AC2 fig|936589.3.peg.875;韋榮氏球菌屬AS16 WP_070600378.1; fig|1896998.3.peg.1750;糞球菌屬CAG:131相關_45_246 fig|41170.3.peg.3013;乙醯微小桿菌 fig|59620.44.peg.897;未培養之梭菌屬 fig|1262843.3.peg.313;梭菌屬CAG:813 fig|1262834.3.peg.1287;梭菌屬CAG:715 fig|1256219.3.peg.760;副乾酪乳桿菌副乾酪亞種Lpp230 fig|115778.31.peg.1994;冷明串珠菌伴氣亞種 fig|29385.174.peg.531;腐生葡萄球菌 fig|1295.21.peg.75;施萊費葡萄球菌 fig|148814.13.peg.1360;昆基乳桿菌 fig|1282.1242.peg.673;表皮葡萄球菌 fig|1581078.3.peg.1186;葡萄球菌屬HMSC10C03 fig|1891097.3.peg.280;格氏大球菌 WP_080703103.1; fig|1214184.3.peg.1129;豬鏈球菌22083 fig|1154771.3.peg.209;無乳鏈球菌FSL C1-487 fig|1415765.3.peg.1578;輕症鏈球菌21/39 fig|1581074.3.peg.720;顆粒鏈菌屬HMSC31F03 fig|1349.233.peg.712;乳房鏈球菌 fig|1946281.3.peg.392;卡塔桿菌屬UBA5893 fig|1328309.5.peg.1889;植物乳桿菌IPLA88 fig|1214190.3.peg.2034;豬鏈球菌YS17 fig|29385.135.peg.2098;腐生葡萄球菌 fig|1715184.3.peg.1265;氣球菌屬HMSC035B07 fig|1881068.3.peg.2940;鞘氨醇單胞菌屬OV641 fig|1522072.3.peg.3829;鞘脂菌屬ba1 fig|1802172.3.peg.237;鞘脂單胞菌屬RIFCSPHIGHO2_12_FULL_65_19 fig|1128204.3.peg.2189;埃氏慢生根瘤菌CCBAU 43297 fig|1708715.5.peg.4517;劍菌aridi fig|195105.3.peg.2062;馬賽血液桿菌 fig|1283312.3.peg.4182;維氏鞘氨醇單胞菌DC-6 fig|1120654.4.peg.406;劍菌屬LC499 fig|529.36.peg.3144;人蒼白桿菌 fig|1194716.3.peg.4774;苜蓿中華根瘤菌AK75 fig|1660088.4.peg.2967;農桿菌屬SCN 61-19 fig|1951259.3.peg.2515;鞘脂單胞菌目細菌UBA6174 fig|1912891.5.peg.2102;鞘脂菌屬 fig|1670800.3.peg.1844;海洋中根瘤菌 fig|2032658.3.peg.157;α-變形菌門細菌WMHbin7 fig|1819565.5.peg.2208;海洋Flavimaricola fig|1245469.3.peg.1160;寡養慢生根瘤菌S58 fig|1615890.4.peg.173;慢生根瘤菌屬LTSP849 
fig|56454.3.peg.3464;霍托魯姆黃單胞菌 fig|40324.384.peg.1060;嗜麥芽寡養單胞菌 fig|1801972.3.peg.1832;浮黴菌門細菌RBG_19FT_COMBO_48_8 fig|1978765.3.peg.3488;硝化螺旋菌屬ST-bin5 fig|2009322.3.peg.2770;細鞘絲藻屬ohadii IS1 fig|1325564.3.peg.3733;日本硝化螺旋菌 fig|43662.9.peg.1688;殺魚假交替單胞菌 fig|670.134.peg.4439;副溶血弧菌 fig|998520.3.peg.3325;食瓊膠假交替單胞菌 fig|1723759.3.peg.401;假交替單胞菌屬P1-26 fig|672.133.peg.585;創傷弧菌 fig|1324960.19.peg.585;殺鮭氣單胞菌溶果膠亞種34mel fig|196024.16.peg.3965;嗜水氣單胞菌 fig|654.27.peg.4266;維羅尼氣單胞菌 fig|1802253.3.peg.1045;硫單胞菌屬RIFCSPLOWO2_12_36_12 fig|636.16.peg.3905;遲緩愛德華氏菌 fig|1124958.3.peg.5012;腸道沙門氏菌腸亞種Muenster血清型菌株0315 fig|573.10007.peg.225;肺炎克雷伯菌 fig|1946737.3.peg.4002;勒克氏菌屬UBA1284 fig|1398203.3.peg.3712;牛致病菌菌株kraussei Quebec fig|615.247.peg.2151;黏質沙雷氏菌 fig|52441.3.peg.3752;河口亞硝化單胞菌 fig|1951948.3.peg.242;生絲單胞菌科細菌UBA2389 fig|165186.29.peg.27;未培養之瘤胃球菌屬 fig|2013842.3.peg.1881;互養菌門細菌HGW-Synergistetes-1 fig|411484.7.peg.436;梭菌屬SS2/1 fig|460384.4.peg.447;拉瓦倫腸道梭狀菌 fig|1761781.3.peg.2961;梭菌屬DSM 8431 fig|1451.25.peg.614;解澱粉類芽孢桿菌 fig|1776378.3.peg.2009;福岡類芽孢桿菌 fig|1866315.3.peg.2122;芽孢桿菌屬N35-10-4 fig|1034836.4.peg.4077;解澱粉芽孢桿菌XH7 fig|1397.14.peg.5097;環狀芽孢桿菌 fig|1497681.5.peg.772;紐約李斯特菌 fig|1053224.3.peg.4333;蠟樣芽孢桿菌VD021 fig|1374.4.peg.2798;庫庫里平球菌 fig|458233.11.peg.419;解酪大球菌JCSC5402 fig|417368.6.peg.944;泰國腸球菌 fig|1353.16.peg.736;鶉雞腸球菌 fig|1639.1307.peg.2578;單核細胞增生李斯特菌 fig|1649188.4.peg.450;印度李斯特菌 fig|333990.5.peg.1279;肉桿菌屬AT7 fig|1121085.3.peg.4805;艾丁芽孢桿菌DSM 18341 fig|659243.6.peg.1163;暹羅芽孢桿菌 fig|1965645.3.peg.1428;另枝菌屬An54 fig|1950664.3.peg.363;擬桿菌門細菌UBA5918 fig|681398.3.peg.1596;江西帕魯迪桿菌 fig|1947481.3.peg.1596;鞘氨醇桿菌屬UBA1498 fig|1946424.3.peg.2345;Dysgonomonas屬UBA4861 fig|188932.3.peg.968;低溫土地桿菌 fig|505249.7.peg.1802;海洋弓形桿菌 fig|1802259.3.peg.374;硫單胞菌屬RIFOXYD12_FULL_33_39 fig|1872629.13.peg.663;弓形桿菌屬 fig|497650.4.peg.949;硫磺菌屬富集培養純系C5 fig|1981711.3.peg.707;假單胞菌屬B8(2017) fig|287.926.peg.3808;綠膿桿菌 fig|157782.3.peg.183;副黃假單胞菌 fig|1225174.5.peg.576;門多薩假單胞菌S5.2 fig|237610.8.peg.4301;耐冷假單胞菌 
fig|1116369.3.peg.182;赫夫勒氏菌屬108 WP_080858354.1; fig|1679460.3.peg.2715;深海海洋小桿菌 fig|1811547.3.peg.510;海棍狀菌屬REDSEA-S28_B5 fig|93684.8.peg.518;耐鹽玫瑰色鮮艷菌 EMZ69714.1;大腸桿菌174900 fig|103796.87.peg.3165;丁香假單胞菌獼猴桃致病變種 WP_078828851.1;鳳梨泛菌 fig|2018067.3.peg.1734;假單胞菌屬FDAARGOS_380 fig|294.255.peg.5151;螢光假單胞菌 fig|287.4271.peg.5445;綠膿桿菌 fig|46677.3.peg.3237;傘菊假單胞菌 fig|83964.10.peg.849;冠狀假單胞菌porri致病變種 fig|1932113.4.peg.2793;假單胞菌屬PA1(2017) fig|1712677.3.peg.189;假單胞菌屬2822-15 fig|1479235.3.peg.2741;鹵單胞菌屬HL-48 fig|227946.12.peg.3247;半透明黃單胞菌poae致病變種 fig|40324.220.peg.2801;嗜麥芽寡養單胞菌 fig|227946.13.peg.35;半透明黃單胞菌poae致病變種 fig|487909.15.peg.4212;半透明黃單胞菌波形致病變種 fig|40324.145.peg.2120;嗜麥芽寡養單胞菌 fig|1182783.3.peg.8;野油菜黃單胞菌JX fig|1736581.3.peg.4144;溶桿菌屬Root667 fig|470.4256.peg.2128;鮑曼不動桿菌 fig|1804984.3.peg.4735;伯克霍爾德菌屬OLGA172 fig|1882792.3.peg.5959;伯克霍爾德菌屬CF145 fig|1458357.5.peg.7849;Caballeronia jiangsuensis fig|674703.3.peg.3992;紅游動菌屬Z2-YC6860 fig|1230476.3.peg.595;慢生根瘤菌屬DFCI-1 fig|1752222.3.peg.1730;根瘤菌目細菌Ga0077525 fig|1948848.3.peg.320;髕骨細菌群細菌UBA6220 fig|1860092.3.peg.3966;α-變形菌門細菌MedPE-SWcel fig|398.7.peg.3301;熱帶根瘤菌 fig|418630.3.peg.960;巨紅桿菌 fig|56.40.peg.5712;纖維素堆囊菌 fig|1660160.3.peg.2510;酸桿菌細菌SCN 69-37 fig|1661042.3.peg.2224;假單胞菌屬NBRC 111127 fig|1712678.3.peg.4198;假單胞菌屬2822-17 fig|1736561.3.peg.128;假單胞菌屬Root562 fig|76760.8.peg.1730;羅德西亞假單胞菌 fig|1295133.4.peg.7170;惡臭假單胞菌JCM 18798 fig|1718917.3.peg.3132;假單胞菌屬ICMP 460 fig|237306.3.peg.591;丁香假單胞菌桃致病變種 fig|1079060.3.peg.1479;薩瓦斯坦假單胞菌菜豆致病變種1644R fig|1981714.3.peg.1068;假單胞菌屬B5(2017) fig|1419583.3.peg.4516;曼德里假單胞菌PD30 fig|1718918.3.peg.4166;假單胞菌屬ICMP 561 fig|64988.7.peg.76;亞德食烷菌 fig|1961564.3.peg.685;脫硫弧菌科細菌UBA5546 fig|2004648.3.peg.1747;不動桿菌屬WCHA39 fig|1080187.3.peg.399;貪銅菌屬UYPR2.512 fig|76114.8.peg.258;芳香類芳香細菌EbN1 fig|196367.9.peg.6286;Caballeronia sordidicola fig|1217418.3.peg.694;貪銅菌屬HPC(L) fig|1752216.3.peg.4007;亞硝化單胞菌目細菌Ga0074132 fig|1249621.3.peg.3614;貪銅菌屬HMR-1 fig|179879.8.peg.6514;伯克霍爾德菌anthina fig|1246301.3.peg.4482;爭論貪噬菌B4 
WP_092746164.1;纈草噬酸菌 fig|536.30.peg.857;紫色色桿菌 fig|1961112.3.peg.115;浮黴菌門細菌UTPLA1 fig|44574.5.peg.4575;普通亞硝化單胞菌 fig|265901.4.peg.190;光桿菌屬J15 fig|80852.21.peg.1289;沃丹斯阿里弧菌 fig|1136159.3.peg.2534;嗜環弧菌1F111 fig|24.6.peg.4539;腐敗希瓦氏菌 fig|888433.3.peg.1974;假交替單胞菌屬GutCa3 fig|196024.6.peg.3651;嗜水氣單胞菌 fig|1352943.3.peg.5028;哈維弧菌E385 WP_088124663.1;霍亂弧菌 fig|29497.15.peg.1220;燦爛弧菌 fig|670.413.peg.1227;副溶血弧菌 fig|584.91.peg.3337;奇異變形桿菌 fig|263819.5.peg.201;阿萊克耶氏桿菌 fig|1656094.3.peg.1449;匯合交替單胞菌 fig|634.5.peg.607;伯氏耶氏桿菌 fig|630.85.peg.4080;小腸結腸炎耶氏桿菌 fig|1212491.3.peg.1847;法洛軍團菌LLAP-10 fig|1498499.3.peg.2812;諾蘭軍團菌 fig|1844092.4.peg.3143;假單胞菌屬8 R 14 fig|1441629.3.peg.2119;菊苣假單胞菌JBC1 WP_092369835.1;硒化假單胞菌 fig|477228.3.peg.2040;施氏假單胞菌TS44 fig|317.249.peg.4299;丁香假單胞菌 fig|1597.16.peg.2055;副乾酪乳桿菌 fig|1184720.6.peg.2708;安徽根瘤菌 fig|1566263.3.peg.185;根瘤菌屬NFR03 fig|1951216.3.peg.1622;根瘤菌目細菌UBA1909 fig|1219052.3.peg.3572;梅毒鞘氨醇單胞菌NBRC 15498 fig|376620.8.peg.122;日本葡糖桿菌 fig|1736587.3.peg.2638;德沃斯氏菌屬Root685 WP_011269850.1;野油菜黃單胞菌 fig|1195246.3.peg.467;野別樣希瓦氏菌BL06 fig|1931276.3.peg.1294;嗜鹽囊菌屬UPWRP_2 fig|1931204.4.peg.12;匯合微生物屬 fig|2052957.3.peg.3497;假紅桿菌屬MZDSW-24AT fig|1952825.3.peg.2772;紅豆科細菌UBA4205 fig|1189622.3.peg.1716;杏仁假單胞菌煙草致病變種菌株6605 fig|294.173.peg.2741;螢光假單胞菌 fig|294.122.peg.3307;螢光假單胞菌 fig|1198456.3.peg.4053;麩關假單胞菌 fig|1855380.3.peg.1951;假單胞菌屬Z003-0.4C(8344-21) fig|1144330.3.peg.3879;假單胞菌屬GM48 fig|86265.3.peg.2799;蒂弗瓦倫假單胞菌 fig|1881035.3.peg.3817;松江菌屬PDC51 fig|511.8.peg.774;糞產鹼菌 fig|1095552.3.peg.2955;藤黃甲基桿菌IMV-B-3098 fig|1690268.3.peg.1172;噬酸菌屬SD340 fig|871652.3.peg.1451;沈積Poseidonocella fig|1946868.3.peg.175;噬甲基菌屬UBA1490 fig|1924940.3.peg.1147;海微生態室菌屬 fig|1912891.7.peg.5370;鞘脂菌屬 fig|1236503.3.peg.1539;桃醋桿菌JCM 25330 fig|1745182.3.peg.1942;副球菌屬MKU1 fig|1112.5.peg.2875;新石卟啉菌 WP_051585410.1;少動鞘氨醇單胞菌 fig|1082931.4.peg.3584;耐鹽遠桿菌B2 fig|1907665.3.peg.5475;農桿菌屬DSM 25558 fig|1841652.4.peg.3782;農桿菌屬13-626 fig|1736312.3.peg.3441;根瘤菌屬Leaf262 fig|1768770.3.peg.4687;柄桿菌屬CCH5-E12 
fig|355591.9.peg.1867;維氏海桿菌 fig|1869214.4.peg.1848;萊茵黑姆氏菌屬 fig|1946470.3.peg.3546;紅桿菌屬UBA2510 fig|1860090.3.peg.231;玫瑰桿菌屬MedPE-SWde fig|2020902.8.peg.1814;Ponticaulis屬 fig|940286.3.peg.3612;溫馴駒形桿菌174Bp2 fig|1736380.3.peg.1842;根瘤菌屬Leaf453 fig|665126.3.peg.2283;突柄微菌hirschii fig|2029410.3.peg.1956;中根瘤菌屬WSM4311 WP_003169203.1;缺陷短波單胞菌 fig|1884373.3.peg.3317;中根瘤菌屬YR577 fig|989436.3.peg.3203;假弧菌屬Ad5 fig|1736359.3.peg.3976;根瘤菌屬Leaf386 fig|104102.12.peg.3797;熱帶醋桿菌 fig|1500305.3.peg.4736;根瘤菌屬OK665 fig|1842535.30.peg.6;芽單胞菌屬RAC04 fig|70775.16.peg.91;變形假單胞菌 fig|287.4262.peg.2063;綠膿桿菌 WP_017702484.1;丁香假單胞菌 fig|1357292.3.peg.4700;丁香假單胞菌豌豆致病變種菌株PP1 fig|287.2436.peg.3554;綠膿桿菌 fig|76758.3.peg.4722;東方假單胞菌 fig|1904755.3.peg.3469;假單胞菌屬43NM1 fig|47879.37.peg.693;波紋假單胞菌 fig|1771311.3.peg.1935;假單胞菌屬ATCC PTA-122608 fig|2008975.3.peg.1976;假單胞菌屬Irchel 3E13 fig|1259798.3.peg.1121;假單胞菌屬LAMO17WK12:I2 fig|1736487.3.peg.2103;新草螺菌屬Root189 fig|1706231.5.peg.1557;詹森桿菌屬CG23_2 fig|1804984.3.peg.4700;伯克霍爾德菌屬OLGA172 fig|54067.3.peg.2925;葡萄細菌性疫病菌 fig|40324.357.peg.1426;嗜麥芽寡養單胞菌 fig|1967657.4.peg.913;腸道沙門氏菌腸亞種Telelkebir血清型 fig|573.14330.peg.816;肺炎克雷伯菌 fig|615.357.peg.16;黏質沙雷氏菌 fig|1122616.3.peg.231;拜氏海洋螺菌DSM 7166 fig|314276.4.peg.1389;波羅的海海源菌OS145 fig|1038921.4.peg.2790;綠針假單胞菌產金色亞種30-84 fig|292.72.peg.3280;洋蔥伯克霍爾德菌 fig|1899355.18.peg.947;海洋螺菌科細菌 fig|2015356.3.peg.5401;伯克霍爾德菌屬AU33647 fig|206665.3.peg.1731;海底脫硫菌 fig|1987165.3.peg.2564;鞘脂菌屬GW456-12-10-14-TSB1 fig|1283312.3.peg.2182;維氏鞘氨醇單胞菌DC-6 fig|1223566.3.peg.1810;慢生根瘤菌屬CCGE-LA001 fig|76761.16.peg.744;維羅尼假單胞菌 PIY00499.1;嗜氫菌目細菌CG_4_10_14_3_um_filter_58_23 fig|305.94.peg.4778;青枯羅爾斯頓氏菌 fig|1758178.5.peg.2546;乙醇速生桿菌 fig|1354263.4.peg.2524;副蜂房哈夫尼菌ATCC 29927 fig|1125979.3.peg.1941;根瘤菌屬PDO1-076 fig|1338032.3.peg.3393;副溶血弧菌O1:K33菌株CDC_K4557 fig|1898112.54.peg.3344;紅螺菌科細菌 fig|1432558.3.peg.4265;肺炎克雷伯菌ISC21 fig|333962.3.peg.2767;海氏普羅威登斯菌 fig|60552.10.peg.2414;越南伯克霍爾德菌 WP_011808964.1;艾森氏蠕形桿菌 fig|1844107.4.peg.2966;假單胞菌屬58 R 12 fig|1952916.3.peg.906;互養菌科細菌UBA5549 
fig|458817.8.peg.542;哈利法克斯希瓦氏菌HAW-EB4 fig|1674859.3.peg.1291;螺旋體目細菌Spiro_03 fig|1121434.3.peg.22;艾氏鹽脫硫弧菌DSM 10141 fig|1262899.3.peg.286;梭桿菌屬CAG:439 fig|57320.3.peg.1084;深部假脫硫弧菌 fig|1736444.3.peg.3753;不動桿菌屬Root1280 fig|1310670.3.peg.2122;不動桿菌屬907131 fig|505345.6.peg.150;卡氏桿菌基因種3 fig|670.887.peg.4391;副溶血弧菌 fig|196024.16.peg.2750;嗜水氣單胞菌 fig|663.48.peg.1321;溶藻弧菌 fig|28141.133.peg.4717;坂崎克羅諾桿菌 fig|1117315.3.peg.3;游海假交替單胞菌ATCC 14393 fig|1917164.4.peg.2739;希瓦氏菌屬UCD-KL21 fig|2006083.3.peg.3222;光桿菌屬CECT 9192 fig|584.227.peg.2146;奇異變形桿菌 fig|1792834.4.peg.1793;沈積物海胞菌 fig|1333513.3.peg.3775;游海假交替單胞菌TAE56 fig|1305826.3.peg.1246;鏈黴菌屬Amel2xC10 WP_048809063.1;人參微桿菌 fig|1987376.3.peg.4246;假諾卡氏菌屬N23 fig|164115.3.peg.6832;尼維鏈黴菌 fig|285676.33.peg.4994;薩愛利塞斯小單孢菌 SFF52649.1;阿爾尼鏈黴菌 fig|1100822.3.peg.6408;鏈黴菌屬AmelKG-E11A WP_098467790.1; fig|1190417.3.peg.2916;地嗜皮菌telluris SDS16714.1;碳農球菌 fig|692370.5.peg.1108;東灘交替紅桿菌 fig|1736370.3.peg.1383;鞘氨醇單胞菌屬Leaf412 fig|1759074.3.peg.2615;鞘脂單胞菌屬HIX fig|1120928.3.peg.922;薑氏不動桿菌DSM 14971 = CIP 107465 fig|470.4268.peg.2032;鮑曼不動桿菌 fig|1217627.3.peg.995;鮑曼不動桿菌NIPH 67 fig|28450.149.peg.5786;鼻疽伯克霍爾德菌 fig|2032650.3.peg.3543;磁球菌目細菌HCHbin5 fig|101571.169.peg.3909;烏博伯克霍爾德菌 fig|396597.7.peg.2128;雙向伯克霍爾德菌MEX-5 fig|869212.3.peg.3514;Turneriella parva DSM 21527 fig|1196083.80.peg.1885;阿爾維斯諾德格拉斯菌 fig|1304886.3.peg.1257;脫硫棒狀菌屬細菌DSM 7044 fig|555.16.peg.1362;胡蘿蔔果膠桿菌胡蘿蔔亞種 fig|1421338.3.peg.151;阿氏腸桿菌L1 fig|443144.3.peg.785;地桿菌屬M21 fig|1265503.3.peg.1271;科爾韋爾氏菌piezophila ATCC BAA-637 fig|55601.100.peg.1601;鰻弧菌 fig|299766.9.peg.4890;霍氏腸桿菌steigerwaltii亞種 fig|243231.5.peg.1360;硫還原地桿菌PCA fig|1263083.3.peg.558;水痘克雷伯菌CAG:634 fig|57706.9.peg.1106;布氏檸檬酸桿菌 fig|1619244.3.peg.1323;布安登腸桿菌 SHO56340.1;五方弧菌 fig|688.15.peg.906;洛氏阿里弧菌 fig|663.75.peg.4800;溶藻弧菌 fig|1967612.3.peg.4508;腸道沙門氏菌豪頓亞種50:z4,z23:-血清型 fig|82985.3.peg.3508;泉布拉格菌 fig|55207.5.peg.1840;β血管果膠桿菌 fig|582.25.peg.388;摩氏摩根菌 fig|1006598.5.peg.80;普城沙雷氏菌RVH1 fig|82977.3.peg.3268;鄉間布丘氏菌 CRY53703.1;中間耶氏桿菌 
fig|595494.3.peg.706;奧湖甲苯單胞菌DSM 9187 fig|1217694.3.peg.3300;不動桿菌屬CIP 64.2 fig|470.2679.peg.715;鮑曼不動桿菌 fig|1879050.4.peg.2797;武侯不動桿菌 fig|2004650.3.peg.1818;中國不動桿菌 fig|648.80.peg.1405;豚鼠氣單胞菌 fig|1217722.3.peg.1866;假單胞菌屬S13.1.2 fig|294.88.peg.4234;螢光假單胞菌 fig|629262.5.peg.1917;丁香假單胞菌日本致病變種菌株M301072 WP_053932309.1;冠狀假單胞菌 fig|1844101.3.peg.4702;假單胞菌屬31 R 17 fig|380021.13.peg.6149;保護假單胞菌 fig|287.3716.peg.2163;綠膿桿菌 fig|287.3208.peg.1464;綠膿桿菌 fig|1952221.3.peg.555;甲基球菌科細菌UBA2780 fig|1869214.3.peg.3542;萊茵黑姆氏菌屬 fig|375286.7.peg.830;詹森桿菌屬Marseille fig|536.26.peg.4616;紫色色桿菌 fig|983548.3.peg.2977;獨島菌屬4H-3-7-5 fig|307480.5.peg.1780;弗里斯塔金黃桿菌 fig|1262921.3.peg.2213;普雷沃氏菌屬CAG:1185 fig|1965649.3.peg.4193;丁酸弧菌屬An62 fig|1951558.3.peg.3731;噬幾丁質菌科細菌UBA4411 fig|1950669.3.peg.2035;擬桿菌門細菌UBA6192 fig|1869230.3.peg.3025;金黃桿菌屬CBo1 fig|1500294.3.peg.2814;金黃桿菌屬YR485 fig|1756149.11.peg.2545;伊麗莎白氏菌bruuniana fig|1137281.3.peg.1436;解明膠黃色海水菌 fig|1964365.5.peg.2525;斯尼斯氏菌屬 fig|28450.428.peg.5049;鼻疽伯克霍爾德菌 fig|1628751.3.peg.813;林氏念珠藻z16 fig|60137.10.peg.1138;龐蒂亞硫酸桿菌 fig|1580596.3.peg.2701;魚類褐桿菌 fig|1041141.4.peg.4741;豆科植物根瘤菌菜豆生物型128C53 WP_063290764.1;未分類之假弧菌 fig|1816219.4.peg.1873;科爾韋爾氏菌屬PAMC 21821 fig|651.3.peg.13;中間氣單胞菌 fig|134375.17.peg.3222;無色桿菌屬 WP_011296194.1;貪銅菌pinatubonensis fig|1513890.4.peg.2322;綠針假單胞菌魚亞種 fig|294.193.peg.980;螢光假單胞菌 fig|287.2516.peg.114;綠膿桿菌 fig|2006083.3.peg.3219;光桿菌屬CECT 9192 fig|458817.8.peg.538;哈利法克斯希瓦氏菌HAW-EB4 fig|1073383.3.peg.1289;維羅尼氣單胞菌AMC34 fig|190893.14.peg.2110;珊瑚弧菌 fig|663.144.peg.2659;溶藻弧菌 fig|55601.106.peg.926;鰻弧菌 fig|669.51.peg.5531;哈維弧菌 fig|1250059.5.peg.3511;黏桿菌屬MAR_2009_124 fig|906888.6.peg.2449;Nonlabens ulvanivorans WP_042276051.1;沈積物Nonlabens fig|1953167.3.peg.1254;擬桿菌門細菌UBA6221 fig|991.14.peg.629;棘球黃桿菌 fig|1121890.3.peg.5;寒冷黃桿菌DSM 17623 fig|387094.4.peg.1115;赫西黃桿菌 fig|1946558.3.peg.2413;黃桿菌屬UBA7665 fig|253.27.peg.2882;產吲哚金黃桿菌 fig|1685010.5.peg.4376;冰金黃桿菌 fig|1500289.3.peg.4103;金黃桿菌屬OV705 fig|1500298.3.peg.2823;金黃桿菌屬YR561 
fig|1797331.3.peg.2180;擬桿菌門細菌GWE2_29_8 fig|1947498.3.peg.1366;鞘氨醇桿菌屬UBA4616 WP_074239321.1;尼阿布噬幾丁質菌 fig|192149.7.peg.174;鼠尾菌屬 fig|718222.3.peg.4924;蠟樣芽孢桿菌TIAC219 fig|1053210.3.peg.211;蠟樣芽孢桿菌HuB4-10 fig|2026089.3.peg.6077;類芽孢桿菌屬XY044 fig|1938610.3.peg.3678;黃桿菌屬LM5 fig|1947482.3.peg.632;鞘氨醇桿菌屬UBA1575 fig|1948844.3.peg.823;髕骨細菌群細菌UBA6130 fig|986.7.peg.83;約氏黃桿菌 fig|1950382.3.peg.461;擬桿菌門細菌UBA1181 fig|1947145.3.peg.376;普雷沃氏菌屬UBA3765 fig|1122989.3.peg.367;口普雷沃氏菌DSM 18711 = JCM 12252 fig|1896974.3.peg.2001;擬桿菌屬43_108 fig|2014804.3.peg.4306;賴文氏菌科細菌SD302 fig|1428.517.peg.2380;蘇雲金芽孢桿菌 fig|1428.574.peg.3351;蘇雲金芽孢桿菌 fig|1428.590.peg.5685;蘇雲金芽孢桿菌 fig|720554.3.peg.188;毛喉梭狀芽孢桿菌DSM 19732 fig|1122203.4.peg.2283;耐鹽海球菌DSM 16375 fig|1462525.3.peg.3489;深海芽孢桿菌屬TM-1 fig|1395513.3.peg.363;乳酸孢子乳桿菌DSM 442 fig|1262834.3.peg.1229;梭菌屬CAG:715 SCI87282.1;未培養之羅氏菌屬 fig|1952116.3.peg.2349;毛螺菌科細菌UBA6480 WP_069150959.1;毛螺菌科 fig|1265.10.peg.3100;黃色瘤胃球菌 fig|1120998.3.peg.2858;Anaerovorax odorimutans DSM 5092 WP_072702499.1;匈牙利丁酸弧菌 fig|1232453.3.peg.2795;梭菌目細菌VE202-21 fig|39485.11.peg.251;挑剔毛螺菌 WP_072832325.1; fig|1509.24.peg.2487;產孢梭菌 SCI88558.1;未培養之梭菌屬 fig|1490.6.peg.2635;雙發酵副梭菌 fig|1947399.3.peg.1322;亨蓋特梭菌科細菌UBA3548 fig|1953138.3.peg.796;擬桿菌門細菌UBA1312 fig|1950875.3.peg.364;梭菌目細菌UBA4139 fig|1396.1518.peg.4860;蠟樣芽孢桿菌 fig|1305675.3.peg.2174;芽孢桿菌solimangrovi fig|1423.436.peg.4365;枯草芽孢桿菌 fig|361277.6.peg.1539;嗜糖土桿菌 fig|1392.356.peg.4724;炭疽芽孢桿菌 fig|1053189.3.peg.4219;蠟樣芽孢桿菌BAG5X1-1 WP_079442297.1;鉻還原梭菌 fig|1953262.3.peg.783;候選Omnitrophica細菌UBA1562 fig|1797955.3.peg.2304;迷蹤菌門細菌RIFOXYA12_FULL_51_18 fig|1953111.3.peg.2436;酸桿菌細菌UBA7540 WP_099010551.1;大腸桿菌逆轉錄子-Eco1 (Ec86) fig|1005565.3.peg.1153;大腸桿菌3006 fig|158822.8.peg.1905;奈氏西地西菌 fig|1444060.3.peg.4830;大腸桿菌4-203-08_S1_C1 fig|29484.22.peg.4571;弗氏耶氏桿菌 fig|529823.3.peg.332;細胞弧菌屬OA-2007 fig|48296.218.peg.3142;皮特不動桿菌 fig|550.518.peg.3445;陰溝腸桿菌 fig|204773.6.peg.4;砷氧化赫山單胞菌 fig|670.190.peg.4348;副溶血弧菌 fig|44577.7.peg.209;尿素亞硝化單胞菌 fig|1125747.3.peg.1;解藻居水菌NO2 
fig|1338034.3.peg.2437;副溶血弧菌O1:Kuk菌株 FDA_R31 fig|1952844.3.peg.2619;紅環菌科細菌UBA5533 fig|1288788.3.peg.2384;副溶血弧菌3631 fig|644.31.peg.975;嗜水氣單胞菌 fig|498292.3.peg.28;搖擺黃桿菌 fig|1948088.3.peg.4515;厚壁菌門細菌UBA6132 fig|1408433.3.peg.3094;藏紅花黃色綫菌catalasitica ATCC 23190 WP_074236572.1;玉米金黃桿菌 fig|1127353.3.peg.1738;腸道沙門氏菌腸亞種紐波特血清型菌株 #11-4 fig|1881110.4.peg.120;芝麻泛菌 fig|34038.6.peg.27;水生雷氏菌 fig|630.95.peg.4256;小腸結腸炎耶氏桿菌 fig|149387.11.peg.1139;腸道沙門氏菌腸亞種布蘭登堡血清型 fig|1343738.3.peg.2232;霍亂弧菌2012EL-1759逆轉錄子-Vch3 (Vc137) fig|1423.175.peg.4339;枯草芽孢桿菌 fig|2021695.3.peg.3399;芽孢桿菌屬7894-2 fig|189426.10.peg.597;氣味類芽孢桿菌 fig|2020949.3.peg.856;羅姆布茨菌weinsteinii fig|1243664.3.peg.1004;馬氏芽孢桿菌 fig|1855345.3.peg.2971;芽孢桿菌屬RRD69 fig|1946358.3.peg.2514;梭菌屬UBA4108 fig|1520.90.peg.2502;拜氏梭菌 fig|79672.3.peg.288;蘇雲金芽孢桿菌麥德林血清型 fig|189426.19.peg.3973;氣味類芽孢桿菌 fig|1497.3.peg.4049;甲酸乙酸梭菌 fig|169760.4.peg.4269;星孢類芽孢桿菌 WP_073588670.1;Anaerocolumna xylanovorans fig|1950815.3.peg.1585;梭菌目細菌UBA1341 fig|1897004.3.peg.2166;真桿菌屬45_250 fig|1946293.3.peg.290;卡塔桿菌屬UBA7571 fig|1796620.3.peg.3489;小鼠急性桿菌 fig|76857.53.peg.2245;具核梭桿菌多態亞種 fig|2013784.3.peg.1260;厚壁菌門細菌HGW-Firmicutes-3 fig|79884.3.peg.1106;假嗜鹼芽孢桿菌 fig|135461.47.peg.1552;枯草芽孢桿菌枯草亞種 WP_016122013.1;蠟樣芽胞桿菌群 fig|1499688.3.peg.3214;芽孢桿菌屬LF1 fig|1318.8.peg.1495;副血鏈球菌 fig|1381091.3.peg.1173;馬鏈球菌獸疫亞種SzAM60 fig|1409369.3.peg.815;金黃色葡萄球菌AMMC6050 fig|1497681.5.peg.3014;紐約李斯特菌 fig|1095727.3.peg.409;鏈球菌屬SK643 fig|1318.12.peg.313;副血鏈球菌 fig|1681184.3.peg.4899;離胺酸芽孢桿菌屬ZYM-1 fig|561879.29.peg.3569;薩法芽孢桿菌 fig|1884359.3.peg.2978;冷桿菌屬OK028 fig|29367.3.peg.1112;石榴梭菌 fig|225345.3.peg.1103;鉻還原梭菌 fig|1345695.10.peg.2438;糖丁酸梭菌DSM 13864 fig|119641.3.peg.784;梭菌uliginosum fig|1761781.3.peg.88;梭菌屬DSM 8431 fig|1946346.3.peg.2999;梭菌屬UBA1056 fig|1492.48.peg.3952;丁酸梭菌 fig|1121302.3.peg.4511;卡文迪許梭菌DSM 21758 fig|398512.4.peg.5301;溶纖維素擬桿菌ATCC 35603 = DSM 2933 fig|1946357.3.peg.500;梭菌屬UBA3947 fig|642492.3.peg.3338;Cellulosilyticum lentocellum DSM 5427 fig|1946690.3.peg.1553;腸道核心菌屬UBA3320 
fig|397290.3.peg.150;毛螺菌科細菌A2 fig|97138.3.peg.1713;梭菌屬ASF356 fig|1965545.3.peg.499;泰勒菌屬An114 fig|1047063.3.peg.240;WS1細菌JGI 0000059-K21 fig|1192034.3.peg.1508;細尖軟骨黴菌DSM 436 AAA88323.1;黃色黏球菌逆轉錄子-Mxa2 (Mx65) fig|33.8.peg.8196;橙色黏球菌 fig|215803.3.peg.1485;鹽水水黏細菌 fig|1406225.3.peg.2150;紫色原囊菌Cb vi76 fig|1952931.3.peg.5137;疣微菌亞門3細菌UBA6082 fig|1972460.3.peg.279;厭氧菌科細菌4572_78 fig|1950201.3.peg.3297;厭氧繩菌目細菌UBA2796 WP_015247705.1; fig|1799658.3.peg.2777;浮黴菌科細菌SCGC AG-212-F19 fig|214688.26.peg.6659;隱球出芽菌UQM 2246 fig|2023130.3.peg.4250;紅小梨形菌屬MGV fig|52.7.peg.9046;紅花軟骨黴菌 fig|448385.16.peg.2914;纖維素堆囊菌So ce56 fig|888845.4.peg.14202;玫瑰微囊藻 fig|1752210.3.peg.1275;δ-變形菌綱細菌Ga0077539 fig|1391654.3.peg.3562;Labilithrix luteola fig|1752218.3.peg.3670;浮黴菌科細菌Ga0077529 WP_006981058.1;黃西索恩氏菌 fig|1952939.3.peg.2584;疣微菌科細菌UBA1938 fig|2024858.3.peg.3711;Sandaracinus屬 WP_009096166.1;紅小梨形菌屬SWK7 fig|595453.3.peg.1506;紅小梨形菌屬SM50 fig|1263868.3.peg.4100;歐洲紅小梨形菌SH398 fig|167547.3.peg.303;海洋原綠球菌菌株MIT 9311 fig|1499501.3.peg.459;原綠球菌屬SS52 fig|1905359.3.peg.4335;海洋細菌AO1-C WP_002700020.1;海洋微顫菌 fig|1913989.145.peg.263;γ-變形菌綱細菌 WP_073154989.1;嗜腖清野氏菌 fig|46223.3.peg.3648;雙岐高溫黃色微球菌 fig|1329796.3.peg.1947;馬賽Risungbinella fig|1123252.3.peg.3225;克里布所島津氏菌DSM 45090 fig|714067.3.peg.3719;象牙色克羅彭斯特德菌 fig|1341151.3.peg.628;糖萊斯氏菌1-1 fig|2026763.3.peg.3089;黏球菌目細菌 fig|1797895.3.peg.2518;δ-變形菌綱細菌RIFOXYA12_FULL_58_15 fig|373672.4.peg.4082;甘布里金黃桿菌 fig|1416778.5.peg.4374;花生金黃桿菌 fig|1603293.4.peg.829;黃桿菌屬316 CCB70859.1;嗜鰓黃桿菌FL-15 fig|1986952.3.peg.951;鞘脂桿菌科細菌GW460-11-11-14-LB5 fig|1476464.3.peg.4304;西溪土土地桿菌 fig|1761785.3.peg.3112;黃桿菌屬ov086 fig|1664068.3.peg.3690;細菌336/3 fig|880071.3.peg.3500;Bernardetia litoralis DSM 6794 fig|1121902.3.peg.2906;秀麗艾森氏桿菌DSM 3317 fig|1509483.4.peg.1923;彎桿菌屬BAB-3569 fig|1166018.3.peg.5312;Fibrella aestuarina BUZ 2 fig|634771.3.peg.967;艾森氏噬幾丁質菌 fig|29529.3.peg.396;阿文西科拉噬幾丁質菌 fig|1891659.3.peg.6032;噬幾丁質菌屬CB10 fig|2033437.3.peg.3442;噬幾丁質菌屬MD30 fig|1004.4.peg.1677;人參土噬幾丁質菌 
fig|1881041.3.peg.3698;噬幾丁質菌屬YR627 fig|1123078.3.peg.2010;玉米古字狀菌DSM 19591 fig|354355.3.peg.1846;容州農研所絲桿菌 fig|1951546.3.peg.1574;噬幾丁質菌科細菌UBA1946 fig|1812911.3.peg.633;黃腐桿菌屬CACIAM 22H1 fig|477680.4.peg.4668;缺陷線狀單胞菌 fig|221126.7.peg.3781;選擇海藻桿菌 fig|342954.4.peg.734;海藻湖養菌 fig|1871037.5.peg.2180;黃桿菌科細菌 fig|669041.3.peg.843;雙中心黏桿菌 WP_074538568.1;波羅的海纖維噬菌體 fig|1248440.3.peg.1511;弗朗茲曼極桿菌ATCC 700399 fig|1121007.3.peg.1574;穆氏海水菌DSM 19832 fig|688867.3.peg.2332;韓國Ohtaekwangia fig|926565.3.peg.717;黏球生孢噬纖維菌DSM 11118 fig|1257021.3.peg.5265;火色桿菌科細菌311 fig|2044937.5.peg.2350;候選分裂KSB3細菌 fig|1499966.3.peg.180;候選絮狀Moduliflexus fig|1948269.3.peg.1254;疣微菌門細菌UBA6053 fig|694433.3.peg.3103;大腐生螺旋體DSM 2844 fig|2008677.3.peg.3337;結節松江菌 fig|946333.3.peg.3570;嗜膠根瘤菌 fig|1736433.3.peg.5559;根瘤菌屬Root1221 fig|1500265.3.peg.5760;甲基養菌屬YR605 fig|1121349.4.peg.2836;堆肥叢毛單胞菌DSM 21721 fig|1082851.3.peg.91;叢毛單胞菌serinivorans fig|1121480.5.peg.5468;紫假杜氏桿菌DSM 15887 fig|2045208.3.peg.1247;紫黑馬賽菌 fig|1736455.3.peg.3692;馬賽菌屬Root133 fig|34073.25.peg.8569;爭論貪噬菌 fig|1884311.3.peg.7121;貪噬菌屬OK202 fig|1123487.3.peg.1763;Uliginosibacterium gangwonense DSM 18521 fig|2029111.3.peg.3032;叢毛單胞菌科細菌NML120219 fig|1977087.20.peg.1226;變形菌門細菌 fig|754436.4.peg.4454;不發光光桿菌 fig|265726.7.peg.1038;耐鹽光桿菌 fig|1121867.3.peg.59;卡爾維腸弧菌DSM 14347 fig|1238431.3.peg.2655;黑弧菌BLFn1 fig|1384589.3.peg.2721;[伊文氏桿菌]油葫蘆 fig|1261127.3.peg.2947;無丙二酸檸檬酸桿菌Y19 fig|349521.8.peg.3588;Hahella chejuensis KCTC 2396 fig|525918.3.peg.1501;硫發菌caldifontis fig|1737490.4.peg.4974;栖紅藻解瓊脂菌 fig|251229.3.peg.427;溫泉擬甲色球藻PCC 7203 fig|1245923.3.peg.9587;米氏僞枝藻VB511283 fig|1503470.5.peg.10896;藍菌TDX16 fig|2005460.3.peg.1118;軟骨囊藻屬NIES-4102 fig|179408.3.peg.4679;墨綠顫藻PCC 7112 fig|1612423.3.peg.5384;林氏念珠藻z1 fig|63737.11.peg.472;點形念珠藻PCC 73102 fig|224013.5.peg.7163;池生念珠藻CENA21 fig|1932621.3.peg.7363;念珠藻屬T09 fig|373994.3.peg.3383;膠須藻屬PCC 7116 fig|2005463.3.peg.257;眉藻屬NIES-4105 fig|2005459.3.peg.7019;單歧藻屬NIES-4075 fig|184925.3.peg.2602;佛氏擬綠膠藍細菌PCC 9212 fig|454136.5.peg.3127;可疑席藻IAM 
M-71 fig|203124.6.peg.2732;紅海束毛藻IMS101 fig|2040638.3.peg.3067;布氏常絲藻FEM_GT703 Fig|1880991.4.peg.2927;顫藻藍藻 菌USR001 fig|1173028.3.peg.7115;顫藻屬PCC 10802 fig|568701.4.peg.2073;Moorea bouillonii PNG fig|927677.3.peg.4187;集胞藻屬PCC 7509 fig|179408.3.peg.6267;墨綠顫藻PCC 7112 fig|1710894.3.peg.2079;水華束絲藻LD13 fig|1947888.3.peg.4484;藍菌細菌UBA6047 fig|1705388.3.peg.1178;擬浮絲藻屬SR001 fig|454136.5.peg.4162;可疑席藻IAM M-71 fig|2005458.3.peg.378;念珠藻屬NIES-4103 fig|1947874.3.peg.4590;藍菌細菌UBA1583 fig|1781255.3.peg.802;Desertifilum屬IPPAS B-1220 fig|1128427.4.peg.2346;絲狀藍菌ESFC-1 fig|1946321.3.peg.3928;氯酸桿菌屬UBA7656 fig|118173.3.peg.1030;假魚腥藻屬PCC 6802 fig|1922337.4.peg.4802;細鞘絲藻屬『hensonii』 fig|927668.3.peg.2766;假魚腥藻屬biceps PCC 7429 fig|1173020.3.peg.6191;微型管孢藻PCC 6605 fig|329726.14.peg.4440;海洋Acaryochloris MBIC11017 fig|215803.3.peg.1649;鹽水水黏細菌 fig|1920190.3.peg.9548;原囊菌屬Cb G35 fig|1961464.3.peg.5181;黏球菌目細菌UBA2376 fig|765913.3.peg.336;德氏硫紅球菌AZ1 fig|1396141.3.peg.2891;鹽桿條菌屬BvORR071 fig|1961463.3.peg.5253;黏球菌目細菌UBA1671 AAL40743.1;侵蝕侏囊菌逆轉錄子-Nex2 (Ne144) fig|54.3.peg.4123;侵蝕侏囊菌 fig|53367.3.peg.3417;銹色淺野氏菌 fig|460265.11.peg.3882;結節甲基桿菌ORS 2060 fig|298794.3.peg.462;變異甲基桿菌 fig|190148.4.peg.3492;慢生根瘤菌paxllaeri fig|1075417.3.peg.445;Catalinimonas alkaloidigena fig|1429438.4.peg.7505;候選蒂殼內菌因子 fig|1977087.12.peg.1756;變形菌門細菌 fig|92487.3.peg.3972;硫發菌eikelboomii fig|1977087.20.peg.1473;變形菌門細菌 fig|1123400.3.peg.3276;柔性Thiofilum DSM 14609 fig|34062.8.peg.73;奧斯陸莫拉氏菌 fig|1699623.3.peg.1502;嗜冷桿菌屬P11G3 fig|1123509.3.peg.848;Zooshikella ganghwensis DSM 15267 fig|2026735.3.peg.2222;ζ-變形菌門細菌 fig|1977087.12.peg.2982;變形菌門細菌 fig|2026763.4.peg.1195;黏球菌目細菌 fig|1977087.12.peg.510;變形菌門細菌 fig|1123508.3.peg.7252;Zavarzinella formosa DSM 19928 fig|214688.26.peg.3091;隱球出芽菌UQM 2246 fig|1908690.5.peg.1204;Fimbriiglobus ruber fig|1805126.3.peg.4431;δ-變形菌綱細菌CG2_30_63_29 fig|1882752.4.peg.1962;Singulisphaera屬GP187 fig|1636152.3.peg.5364;浮黴狀菌屬SH-PL62 APR75442.1;玫瑰微囊藻 fig|54.3.peg.8798;侵蝕侏囊菌 fig|980254.4.peg.4083;Roseimaritima ulvae 
fig|1856297.3.peg.3627;γ-變形菌綱細菌細菌45_16_T64 fig|1219077.3.peg.1945;遠青弧菌NBRC 104587 fig|1334629.3.peg.167;橙色黏球菌124B02 AAA25405.1;黃色黏球菌逆轉錄子-Mxa1 (Mx162) fig|378806.16.peg.4444;橙色標樁菌DW4/3-1 WP_002615305.1;橙色標樁菌逆轉錄子-Sau1 (Sa163) fig|48.3.peg.757;過渡原囊菌 fig|448385.16.peg.2083;纖維素堆囊菌So ce56 fig|52.7.peg.5100;紅花軟骨黴菌 fig|1752210.3.peg.5621;δ-變形菌綱細菌Ga0077539 fig|2024858.3.peg.6345;Sandaracinus屬 WP_012826728.1;赭黃嗜鹽囊菌 fig|927083.3.peg.3408;解澱粉Sandaracinus WP_006977315.1;太平洋近囊藻 fig|1400863.5.peg.627;候選反硝化競爭桿菌Run_A_D11 fig|1961463.3.peg.4303;黏球菌目細菌UBA1671 fig|1898731.3.peg.3099;競爭桿菌屬MCBA15_001 fig|1279028.3.peg.3374;競爭桿菌屬314Chir4.1 fig|1898733.3.peg.2056;競爭桿菌屬MCBA15_004 fig|1795630.3.peg.3476;葉居菌屬PAMC 28766 fig|2033654.3.peg.3461;競爭桿菌屬『Ferrero』 fig|1736329.5.peg.1436;葉居菌屬Leaf304 fig|1736292.3.peg.1083;拉塞氏桿菌屬Leaf185 fig|1736327.3.peg.206;拉塞氏桿菌屬Leaf296 fig|1736311.3.peg.3668;競爭桿菌屬Leaf261 fig|1736308.3.peg.2333;寒冷桿菌屬Leaf254 fig|656366.8.peg.2905;高山節桿菌 fig|494023.3.peg.138;南極類麩胺酸桿菌 ASN40093.1;節桿菌屬7749 fig|1494608.3.peg.465;節桿菌屬PAMC 25486 fig|656366.4.peg.2620;高山節桿菌 fig|656366.3.peg.1944;高山節桿菌 fig|1132441.3.peg.1888;節桿菌屬35W fig|1704044.3.peg.520;節桿菌屬ERGS1:01 fig|1496689.3.peg.681;節桿菌屬L77 fig|1681197.3.peg.149;節桿菌屬RIT-PI-e fig|37921.12.peg.1481;敏捷節桿菌 fig|1736303.3.peg.982;節桿菌屬Leaf234 fig|1312978.3.peg.1472;節桿菌屬H41 fig|1348338.3.peg.1472;橡膠雷夫松氏菌CMS 76R fig|1452536.3.peg.1955;微桿菌屬Cr-K20 fig|1736525.3.peg.446;雷夫松氏菌屬Root4 fig|1529318.3.peg.434;低溫桿菌屬MLB-32 fig|1267973.3.peg.3479;節桿菌屬H5 fig|150121.3.peg.1900;草地阿格雷氏菌 fig|123316.3.peg.955;阿格雷氏菌屬VKM Ac-2052 fig|1052260.3.peg.3617;土壤Klenkia fig|1566299.3.peg.3962;海洋Klenkia fig|1736356.3.peg.3150;摩德桿菌屬Leaf380 fig|1736354.3.peg.1787;地嗜皮菌屬Leaf369 fig|479431.6.peg.3115;Nakamurella multipartita DSM 44233 fig|1090615.3.peg.2397;Nakamurella panacisegetis fig|1306174.4.peg.4778;橙色動孢菌JCM 3230 fig|546871.3.peg.1120;淺黃弗萊德門菌 fig|630515.4.peg.525;土壤小月菌 fig|546874.3.peg.1181;弗萊德門菌sagamiharensis BAK35674.1;積磷小月菌NM-1 fig|1380390.4.peg.72;土壤紅桿菌目細菌URHD0059 
fig|1283299.3.peg.2784;伍氏束縛菌Iso977N fig|929712.3.peg.3165;港區散生桿菌DSM 18081 fig|1123262.3.peg.3125;土壤紅桿菌DSM 22325 fig|1861.4.peg.5240;暗紋地嗜皮菌 fig|1137993.4.peg.1318;非洲地嗜皮菌 fig|1070870.3.peg.778;黑地嗜皮菌 fig|1190417.3.peg.1785;地嗜皮菌telluris fig|477641.3.peg.1023;海洋摩德桿菌 fig|1798228.3.peg.3756;芽球菌屬DSM 46838 WP_091929708.1;芽球菌屬DSM 46786 SHH20361.1;痲瘋樹內生菌 fig|1844.3.peg.1151;藤黃諾卡氏菌 fig|748909.6.peg.1418;高山諾卡氏菌 fig|402596.3.peg.987;微白類諾卡氏菌 fig|1736322.3.peg.1963;諾卡氏菌屬Leaf285 fig|1445613.3.peg.3490;海洋南海所菌屬SE31 fig|543632.4.peg.9742;亞熱帶游動放線菌 fig|1036182.3.peg.2958;暗橙色遊動放線菌 fig|1246995.3.peg.737;弗留利遊動放線菌DSM 7358 fig|56427.3.peg.3052;青藍科奇氏游動菌青藍亞種 fig|1710355.3.peg.2225;遊動放線菌屬TFC3 fig|649831.3.peg.2352;遊動放線菌屬N902-109 fig|35754.4.peg.6321;橘橙指孢囊菌 fig|1881.4.peg.2703;產綠小單孢菌 fig|47863.3.peg.3975;球狀小單孢菌 fig|285665.4.peg.2050;馬桑小單孢菌 fig|1192034.3.peg.4668;細尖軟骨黴菌DSM 436 fig|1198133.3.peg.2442;黃色黏球菌DZ2 fig|33.8.peg.521;橙色黏球菌 fig|394193.3.peg.7794;擬無枝酸菌saalfeldensis fig|369932.4.peg.5621;新瀉擬無枝酸菌 fig|1238180.3.peg.5340;天藍色擬無枝酸菌DSM 43854 fig|589385.3.peg.7940;木聚糖擬無枝酸菌 fig|1068980.3.peg.1527;黑擬無枝酸菌CSC17Ta-90 fig|1854586.3.peg.2100;南極擬無枝酸菌 fig|587909.3.peg.3086;沙漠玉胡氏菌 fig|2030.3.peg.3051;乾旱擬孢囊菌 fig|1382595.4.peg.3164;紅色糖多孢菌D WP_013675061.1;食二氯雜環己烷假諾卡氏菌 fig|1660131.3.peg.2805;假諾卡氏菌屬SCN 72-86 fig|366584.3.peg.4349;木蝴蝶假諾卡氏菌 fig|1885031.4.peg.5241;假諾卡氏菌屬Ae331_Ps2 fig|1690815.5.peg.5350;假諾卡氏菌屬HH130630-07 fig|1123023.3.peg.3229;金合歡假諾卡氏菌DSM 45401 fig|1449976.3.peg.8114;庫茨涅爾氏菌albida DSM 43870 WP_007238159.1; fig|1220583.3.peg.1582;愛知戈登氏菌NBRC 108223 GAB07179.1;污泥戈登氏菌NBRC 15530 fig|1223545.3.peg.704;土壤戈登氏菌NBRC 108243 fig|1223540.3.peg.3237;脫硫戈登氏菌NBRC 100010 fig|1112204.3.peg.4913;食聚異戊二烯戈登氏菌VH2 AFR49048.1;戈登氏菌屬KTR9 fig|402289.3.peg.1558;紅球菌屬HA99 fig|1077144.3.peg.224;迪茨氏菌alimentaria 72 fig|1344003.3.peg.1864;萍婆威廉氏菌 fig|1463823.3.peg.3407;微雙孢菌屬NRRL B-24597 fig|1903117.3.peg.1566;威廉氏菌屬1138 fig|1603258.4.peg.1828;威廉氏菌herbipolensis fig|644548.3.peg.652;雲豹糞便戈登氏菌NRRL B-59395 
fig|1136941.3.peg.1310;戈登氏菌phthalatica fig|1223542.3.peg.3439;污水戈登氏菌NBRC 108250 fig|47312.10.peg.4232;肺塚村菌 fig|57704.14.peg.421;耐酪胺酸塚村菌 fig|521096.6.peg.2443;稍變塚村菌DSM 20162 fig|1123241.3.peg.3642;Nakamurella lactea DSM 19367 fig|1210073.4.peg.1031;殺鮭諾卡氏菌NBRC 100378 fig|1206740.4.peg.4617;泰國諾卡氏菌NBRC 100428 fig|1210064.4.peg.2434;阿爾塔米爾諾卡氏菌NBRC 108246 fig|1123258.3.peg.1651;新瀉小球菌DSM 44881 = NBRC 103563 fig|1443888.3.peg.2891;筋膜紅球菌02-815 fig|1517936.4.peg.882;紅球菌屬CUA-806 fig|398843.6.peg.4214;京均紅球菌 fig|1813677.3.peg.4031;紅球菌屬EPR-157 WP_008711873.1;未命名之生物體 fig|1381122.3.peg.6103;紅平紅球菌DN1 fig|1736210.3.peg.2766;紅球菌屬Leaf7 fig|1736300.3.peg.1279;紅球菌屬Leaf225 fig|1219012.3.peg.1705;棒狀桿菌紅球菌NBRC 14404 fig|1219023.3.peg.2791;椿象紅球菌NBRC 100604 fig|616997.3.peg.2548;阿爾塔米爾霍伊索拉菌 fig|1303689.4.peg.2934;韓國紅球菌JCM 10743 = NBRC 100607 fig|644.85.peg.4392;嗜水氣單胞菌。 使用序列表來鑑定同源RT / ncRNA對 The following sequences are disclosed in Mestre et al., "Systematic Prediction of Genes Functionally Associated with Bacterial Retrons and Classification of the Encoded Tripartite Systems," Nucleic Acids Research, Vol. 48, No. 22, December 16, 2020, pp. 12632-12647, which is incorporated herein by reference in its entirety. The sequences are identified by sequence accession number and source organism. Sequence accession number; source organism fig|670897.3.peg.2382; Escherichia coli 2362-75 WP_000111473.1; Escherichia coli (retron-Eco7) fig|286156.4.peg.5031; Australian Light Rod Bacteria fig|171439.3.peg.1995; Photorhabdus luminescens subsp. luminescens fig|1004151.3.peg.110; Pseudomonas aeruginosa NC19 fig|1736225.3.peg.2969; Erwinia sp. Leaf53 fig|1897730.3.peg.2912; Citrobacter CFSAN044567 fig|286156.4.peg.5031; Aeromonas australis fig|1460083.3.peg.4429; Serratia liquefaciens FK01 fig|585.10.peg.2369; Proteus vulgaris WP_140315795.1; Vibrio parahaemolyticus (retron-Vpa1) fig|670.147.peg.3463; Vibrio parahaemolyticus fig|1516159.4.peg.
4737; Vibrio corallii fig|190893.12.peg.246; Vibrio coralliilyticus fig|643674.5.peg.820; Human alkali-producing bacteria fig|1122619.3.peg.2381; Oligouricobiformis DSM 18253 fig|29489.5.peg.3423; Aeromonas enterica serovar Enterica fig|1899355.18.peg.3566; Oceanospirillaceae bacterium fig|49186.3.peg.4362; Haemophilus stany fig|672.375.peg.4377; Vibrio vulnificus fig|584.202.peg.1668; Proteus mirabilis fig|394935.10.peg.4407; Chromobacterium haemolyticum fig|1196083.117.peg.637; Snodgrassella alvi fig|1196083.120.peg.2046; Snodgrassella alvi fig|1196083.114.peg.825; Snodgrassella alvi fig|550.250.peg.2975; Enterobacter cloacae fig|680.27.peg.793; Vibrio campbellii fig|1348393.3.peg.352; Pseudoalteromonas H105 fig|1234128.4.peg.4777; Vibrio parahaemolyticus SNUVpS-1 fig|69219.6.peg.2213; Enterococcus lysate subsp. vulvar fig|208224.13.peg.2962; Enterobacter kobei fig|672.332.peg.2758; Vibrio vulnificus fig|1777131.3.peg.2267; Chromobacterium F49 fig|945550.3.peg.1167; Vibrio sinaloensis DSM 21326 fig|648.75.peg.922; Aeromonas caviae fig|1238221.3.peg.2053; Vibrio parahaemolyticus VPTS-2009 fig|56192.3.peg.3860; Photorhabditis iliacis fig|1806667.7.peg.3169; Gallicarpospora fig|272773.3.peg.1019; Halobiobrio halophilus subsp. costico WP_073265166.1; Pseudomonas pugnorum fig|1946584.3.peg.2789; Halomonas UBA3074 fig|2030880.3.peg.665; SAR86 cluster bacterium fig|80854.14.peg.530; Myxotrichum fig|1902503.3.peg.1072; Marinomonas QM202 fig|1122212.3.peg.1985; Minutus villosa DSM 6287 fig|40576.4.peg.4387; Xenorhabdus bovienii fig|287094.3.peg.78; Alteromonas fig|1805633.3.peg.1469; Acinetobacter SFA fig|1945927.3.peg.1017; Acinetobacter sp. UBA1497 fig|202956.9.peg.1680; Acinetobacter thompsonii fig|1811612.3.peg.155; Moraxellaceae bacterium REDSEA-S32_B1 fig|573.14330.peg.438; Klebsiella pneumoniae fig|470.1294.peg.
971; Acinetobacter baumannii fig|762966.3.peg.2452; Escherichia coli YIT 11859 fig|470.3514.peg.1550; Acinetobacter baumannii fig|470.2538.peg.3022; Acinetobacter baumannii fig|48296.130.peg.276; Acinetobacter pittii fig|663.91.peg.4688; Vibrio alginolyticus fig|296199.3.peg.4813; Giant Vibrio fig|1367490.3.peg.3583; Vibrio fischeri ETJB5C fig|326537.3.peg.3698; Colvera arctica fig|1175631.4.peg.4191; Pectobacterium wasabiae CFBP 3304 WP_001403504.1; Escherichia coli (retron-Eco4 / Ec83) fig|549.21.peg.1734; Pantoea agglomerans fig|140100.3.peg.2972; Vibrio cholerae fig|693153.4.peg.1176; Atlantic Vibrio fig|1238430.3.peg.1911; Vibrio nigripulchritudo AM115 fig|1123036.3.peg.144; Thermomonas arcticus DSM 14288 fig|173990.3.peg.3319; Rheinheimera pacifica fig|1869214.4.peg.3809; Rheinheimera fig|1898113.7.peg.1514; Bacteria of the family Osmundae fig|29484.39.peg.1876; Yersinia frederiksenii fig|1761793.3.peg.274; Haemophilus sp. DSM 26671 fig|587.48.peg.2666; Providencia rettgeri fig|573.4147.peg.1684; Klebsiella pneumoniae fig|1263833.3.peg.2872; Serratia marcescens VGH107 fig|1690502.3.peg.467; Pantoea CFSAN033090 fig|1029989.3.peg.5037; Salmonella enterica subsp. enterica serovar Agona str. 0292 fig|211759.3.peg.770; Serratia marcescens fig|29483.5.peg.2283; Yersinia fig|1268238.3.peg.3466; Escherichia coli O5:K4(L):H4 strain ATCC 23502 fig|548.121.peg.2368; Klebsiella aerogenes fig|196024.6.peg.1825; Aeromonas hydrophila fig|386429.3.peg.3784; Pseudoalteromonas BSi20495 fig|666.2089.peg.3167; Vibrio cholerae WP_159353404.1; Vibrio cholerae (retron-Vch1 / Vc95) fig|670.362.peg.2186; Vibrio parahaemolyticus fig|615.398.peg.1671; Serratia marcescens fig|571.188.peg.5401; Klebsiella oxytoca fig|1389422.3.peg.2794; Klebsiella pneumoniae LAU-KP1 fig|1082704.3.peg.1242; Lonsdalea britannica fig|1686379.3.peg.
3365; Citrobacter sp. MGH104 fig|83655.55.peg.221; Leclercia adecarboxylata fig|550.532.peg.617; Enterobacter cloacae fig|349965.6.peg.153; Yersinia intermedia ATCC 29909 fig|1947028.3.peg.31; Pantoea UBA2708 fig|29484.34.peg.3725; Yersinia frederiksenii fig|314608.4.peg.222; Shewanella benthica KT99 fig|585.16.peg.3620; Proteus vulgaris fig|1117313.3.peg.4128; Pseudoalteromonas polaris A 37-1-2 fig|1236543.3.peg.1328; Shewanella putrefaciens JCM 20190 = NBRC 3908 fig|550.520.peg.1818; Enterobacter cloacae fig|592316.4.peg.43; Pantoea sp. At-9b fig|1903177.3.peg.4556; Vibrio sp. 10N.261.45.E1 fig|1435069.3.peg.925; Vibrio tritonius fig|666.3258.peg.1211; Vibrio cholerae fig|1579504.3.peg.1822; Shewanella sp. ECSMB14102 fig|727.548.peg.1576; Haemophilus influenzae EIJ70524.1; Haemophilus parahaemolyticus HK385 fig|1121935.3.peg.14; Hahella ganghwensis DSM 17046 fig|400668.8.peg.2509; Marinomonas MWYL1 fig|1777491.3.peg.1212; Alteromonas Mac1 fig|2013797.3.peg.1728; Gammaproteobacteria bacterium HGW-Gammaproteobacteria-15 fig|1008297.7.peg.4158; Salmonella enterica subsp. enterica serovar Typhimurium str. 798 EDM6246721.1; Salmonella enterica subsp. enterica serovar Typhimurium (retron-Sen2 (St85)) fig|421.19.peg.3278; Methylomonas fig|758.17.peg.102; Pneumocystis jirovecii fig|726.60.peg.864; Haemophilus influenzae fig|1035188.3.peg.348; Haemophilus pittmaniae HK 85 fig|670.79.peg.3738; Vibrio parahaemolyticus fig|1481663.12.peg.913; Meteorological Vibrio fig|1123402.3.peg.611; Anopheles trosella DSM 18579 fig|668.83.peg.3088; Vibrio fischeri fig|290110.6.peg.2319; Xenorhabdus budapestensis fig|568766.10.peg.822; Dickeya sp. NCPPB 3274 fig|470.4268.peg.2217; Acinetobacter baumannii fig|1977881.3.peg.1569; Acinetobacter ANC 4470 fig|548.171.peg.2395; Klebsiella aerogenes fig|584.105.peg.1823; Proteus mirabilis fig|1275975.3.peg.
1756; Salmonella enterica subsp. enterica serovar Newport str. Henan_3 fig|615.474.peg.3994; Serratia marcescens fig|61647.13.peg.3699; Pluralibacter gergoviae fig|549.22.peg.222; Pantoea agglomerans fig|991944.3.peg.3216; Vibrio cholerae HE-25 WP_001022871.1; Vibrio cholerae (retron-Vch2 (Vc81)) fig|1638949.3.peg.1051; Vibrio sp. ECSMB14106 fig|73010.3.peg.2815; Aeromonas eel fig|1444141.3.peg.3893; Escherichia coli 3-373-03_S3_C1 fig|232.5.peg.1080; Alteromonas fig|1175295.3.peg.21; Pseudoalteromonas PAMC 22718 fig|265726.7.peg.3430; Halotoxinia halophila WP_009585554.1; Acinetobacter fig|2004649.3.peg.1632; Acinetobacter WCHA29 fig|1324350.3.peg.2817; Acinetobacter equi fig|2048003.3.peg.1682; Alteromonas flavus fig|571.171.peg.5963; Klebsiella oxytoca fig|573.4060.peg.3574; Klebsiella pneumoniae fig|1173850.3.peg.2995; Salmonella enterica subspecies Indiana strain ATCC 51959 fig|1123516.3.peg.1267; Vibrio halophilus DSM 15072 fig|1981674.3.peg.1814; Pseudomonas R9 (2017) fig|1947311.3.peg.2053; Pseudomonas sp. UBA2684 fig|1198309.3.peg.4291; Pseudomonas fluorescens ICMP 11288 fig|715451.3.peg.1743; Alteromonas naphthalenivorans fig|316.285.peg.730; Pseudomonas stutzeri fig|1190606.3.peg.313; Enterovibrio calviensis 1F-211 WP_009176189.1; WP_097050713.1; Xiamen Conch fig|1208323.3.peg.893; Bacillus thuringiensis B30 KZK95863.1; Pseudomycin Ad46 fig|101571.310.peg.3956; Burkholderia ubonensis fig|1882791.3.peg.1790; Burkholderia CF099 fig|1736536.3.peg.4809; Root434 of the genus PIG30812.1; Janthinobacterium sp. 35 fig|1798244.3.peg.1046; Trichoderma book bacteria GWA2_55_18 fig|1131551.3.peg.1124; Methylophilus 1P/1 fig|1843082.3.peg.1574; Macromonas BK-30 fig|279058.16.peg.4721; Monospora arenae fig|1548123.6.peg.1144; Brachymycetes T2 fig|380394.4.peg.276; Acidithiobacillus ferrooxidans ATCC 53993 WP_080292858.1; fig|101571.162.peg.
3605; Burkholderia ubonensis fig|1382803.3.peg.22; Chromobacterium amazonica fig|930.4.peg.3851; Thiobacillus thiosulfate-oxidizing fig|1261658.3.peg.1787; Algae B. berberis Y31 fig|1679001.3.peg.631; Pasteurellaceae NI1060 fig|1334187.3.peg.653; Haemophilus influenzae KR494 fig|1581107.3.peg.1286; Neisseria HMSC15G01 fig|486.24.peg.152; Neisseria lactamica fig|1953412.3.peg.1956; Bacteria UBP10_UBA1160 WP_090322045.1; Oligotrophic Nitrosomonas fig|2013740.3.peg.1400; Deltaproteobacteria bacterium HGW-Deltaproteobacteria-13 fig|1907413.3.peg.3170; Rhizobium RU33A fig|1817963.3.peg.856; Adenophora deserticola fig|2035448.3.peg.1752; Rhizobium C5 WP_014077019.1; fig|1648404.4.peg.2797; Rhodopsinus atlanticus fig|359.11.peg.6331; Agrobacterium radiatum fig|887144.4.peg.573; Taibaishan rhizobium fig|1116389.3.peg.333; Isolation of Devosia DS-56 fig|121719.10.peg.3421; Panronia alkaline lake bacillus fig|34002.6.peg.3570; Paracoccus alkaliphilus fig|1940281.4.peg.1560; Hoeflea sp. fig|1040981.5.peg.1561; Mesorhizobium ciceri WSM4083 fig|410764.3.peg.807; Multi-hospital rhizobium fig|1825934.3.peg.3111; Anhui Rhizobium fig|1952824.3.peg.3061; UBA3976 of the family A. cerevisiae fig|1871086.3.peg.2153; Brevundimonas fig|588932.9.peg.647; Brevundimonas neizangshanensis fig|1951751.3.peg.1538; Rhodopsinaceae bacteria UBA1460 fig|1843368.3.peg.904; Sphingolipids RAC03 fig|155892.10.peg.3219; Pseudomonas aeruginosa fig|43057.4.peg.4537; Rhodopsinus nitrogen-fixing fig|1514904.3.peg.974; Arendella marineensis fig|1338034.3.peg.722; Vibrio parahaemolyticus O1:Kuk strain FDA_R31 fig|150340.18.peg.1837; Archaevibrio fig|196024.5.peg.3821; Aeromonas hydrophila fig|244366.32.peg.1886; Klebsiella variicola fig|180957.35.peg.1654; Pectobacterium brasiliense fig|55601.149.peg.665; Vibrio anguillarum fig|121723.5.peg.2901; Photobacterium sp. SKA34 fig|584.170.peg.
837; Proteobacterium mirabilis fig|40324. 136. peg. 3276; Oligotrophomonas maltophilia fig|1122188. 5. peg. 411; Spongiolyticus spongiolyticus DSM 21749 fig|2032566. 3. peg. 2826; Xanthomonadaceae bacteria NML93-0792 fig|287. 1731. peg. 2578; Green Pseudomonas aeruginosa fig|251702. 3. peg. 1529; Pseudomonas syringae pathogenic variant snapdragon fig|1960829. 3. peg. 5912; Pseudomonas MF6394 fig|76759. 17. peg. 5093; Pseudomonas monteri fig|1981678. 3. peg. 5241; Pseudomonas R45 (2017) fig|1699620. 3. peg. 3028; Pseudomonas RIT-PI-r fig|191391. 4. peg. 2140; Pseudomonas salmoides fig|1844093. 4. peg. 7190; Pseudomonas 22E 5 fig|287. 1744. peg. 1414; Green Pseudomonas aeruginosa fig|287. 1987. peg. 910; Green Pseudomonas aeruginosa fig|287. 4372. peg. 4481; Green Pseudomonas aeruginosa fig|1856685. 4. peg. 2159; Pseudomonas TCU-HL1 fig|1718920. 3. peg. 3357; Pseudomonas ICMP 8385 fig|1781066. 3. peg. 2816; Durum HH101 fig|95485. 5. peg. 60; Stable Burkholderia fig|1572871. 6. peg. 588; Janssenella BJB304 WP_034208069. 1; Burkholderia cepacia WP_074283015. 1; Burkholderia GAS332 fig|1168169. 3. peg. 2570; Methylomonas 11b fig|1899355. 16. peg. 1328; Marine Spirillaceae bacteria WP_093197597. 1; Pseudomonas aeruginosa YR750 fig|1660091. 3. peg. 1650; Bordetella SCN 67-23 fig|134375. 17. peg. 4387; Achromobacterium fig|426114. 10. peg. 1990; Arsenic-oxidizing Thiomonas fig|1947551. 3. peg. 1903; Oligotrophomonas spp. UBA2302 fig|1914330. 4. peg. 2242; Halococcus fig|1947037. 3. peg. 890; Pantoea UBA5707 WP_094422719. 1; Coxsackie WP_079496884. 1; WP_088126255. 1; Enterobacter cobaciformis WP_049614309. 1; Yersinia WP_048263135. 1; Peruvian gum mushroom WP_040197602. 1; Klebsiella pneumoniae fig|669. 34. peg. 1586; Vibrio harveyi fig|672. 219. peg. 1032; Vibrio vulnificus fig|670. 1028. peg. 1775; Vibrio parahaemolyticus WP_065207673. 1; Bright Light Rod Fungus fig|1869214. 3. peg. 2231; Rheinheimia WP_029795910. 1; Vibrio parahaemolyticus fig|1191302. 3. peg. 
1081; Vibrio crassicae 9ZC77 fig|668. 70. peg. 1192; Vibrio fischeri fig|28229. 4. peg. 4229; Cold red Corveria fig|1855726. 3. peg. 270; Burkholderia KK1 fig|1674888. 3. peg. 829; Burkholderia Beta_02 fig|687412. 4. peg. 1108; Pseudomonas aquaticus WP_092465129. 1; P. ivory fig|1120653. 3. peg. 5479; Scutellaria LC384 fig|121719. 5. peg. 401; Panronia alkaline lake bacillus fig|1798804. 3. peg. 1597; Rhizobium 58 fig|1946675. 3. peg. 3089; Codymonas spp. UBA4487 fig|36861. 5. peg. 1400; Denitrificans fig|1115835. 3. peg. 1003; Universal Methylophilus 79 fig|1797188. 3. peg. 1508; Acidobacteria RIFCSPLOWO2_12_FULL_60_22 fig|57320. 3. peg. 123; Pseudodesulfovibrio deep fig|1267534. 3. peg. 1238; Acidobacteriaceae KBS 89 fig|1951344. 3. peg. 527; Acidobacteriaceae UBA1307 WP_006226461. 1; Achromobacterium mapulatum fig|1503054. 4. peg. 5764; Burkholderia stagnans WP_006159686. 1; Basel copper bacteria WP_090191767. 1; Unclassified Duchenne fig|539. 8. peg. 1698; Eikenella corrosiva fig|1946925. 3. peg. 2129; Mikavibrio spp. UBA5701 WP_047031309. 1; Hefflera IMCC20628 fig|1946134. 3. peg. 1092; Brevundimonas spp. UBA6547 WP_093914930. 1; Sulfatium marine bacillus fig|1862950. 3. peg. 1234; Rhizobium bacteria NRL2 fig|1166078. 4. peg. 1483; Goldenomonas phylloides fig|709015. 3. peg. 734; Anemone marinus DSM 19842 WP_092160028. 1; Desulfovibrio ferrosulfuricans fig|2026749. 3. peg. 3364; Ignavibacteriae bacteria WP_033771991. 1; Pantoea agglomerata WP_097097099. 1; Unclassified Enterobacteriaceae (mixed) fig|1444151. 3. peg. 2733; Escherichia coli 2-177-06_S3_C2 WP_137545672. 1; Escherichia coli retron-Eco3 (Ec73) fig|573. 14856. peg. 3852; Klebsiella pneumoniae WP_072021595. 1; Serratia marcescens fig|29571. 3. peg. 478; Halomonas subglacialis WP_004534676. 1; WP_095622523. 1; Halomonas WRN001 fig|376427. 4. peg. 3223; Halomonas gudaoensis fig|862908. 3. peg. 745; Marine halophilic bacteriophage Vibrio SJ SCJ40239. 1; Uncultured Clostridium spp. fig|717962. 3.
peg. 287; Feline fecal cocci GD/7 WP_014642259. 1; Halophilic Bacillus halophilus fig|2009042. 3. peg. 2106; Pseudomonas Irchel 3H7 fig|1981718. 3. peg. 4346; Pseudomonas B39 (2017) fig|665135. 13. peg. 1401; Pseudomonas In5 fig|1949067. 3. peg. 5629; Pseudomonas PICF141 WP_007948552. 1; Pseudomonas GM21 SFB61662. 1; Tsuruhaeida delftia WP_011615687. 1; WP_014778098. 1; fig|1429083. 4. peg. 2612; Pseudomonas husseinensis WP_095024014. 1; Pseudomonas WP_090203690. 1; Pseudomonas aeruginosa fig|564423. 8. peg. 1646; Pseudomonas tora NCPPB 2192 WP_078802277. 1; Pseudomonas fluorescens WP_090453229. 1; Pseudomonas fig|1306420. 5. peg. 1032; Burkholderia pseudomallei MSHR5848 fig|1357270. 3. peg. 1923; Pseudomonas syringae UB246 fig|2018067. 3. peg. 2950; Pseudomonas FDAARGOS_380 fig|317. 311. peg. 3241; Pseudomonas syringae fig|287. 2309. peg. 126; Pseudomonas aeruginosa WP_039522442. 1; Brazil fruit gum bacillus WP_080861357. 1; Klebsiella pneumoniae WP_014542745. 1; Evansella sp. Ejp617 OSL25696. 1; Escherichia coli TA255 fig|1125693. 3. peg. 761; Proteobacterium mirabilis WGLW4 KMK80587. 1; Black rot pectin bacteria ICMP 1526 fig|550. 717. peg. 2037; Enterobacter vaginalis ACS86154. 1; Banana Dickie's bacteria Ech703 WP_050122514. 1; Yersinia freundii WP_081334048. 1; Alteromonas macledy WP_055016254. 1; Pseudoalteromonas P1-13-1a fig|56799. 5. peg. 478; Corveria spp. fig|666. 3375. peg. 2486; Vibrio cholerae PIW62005. 1; Shewanella CG12_big_fil_rev_8_21_14_0_65_47_15 OCA54994. 1; Pseudomonas australis CNK75559. 1; Yersinia freundii WP_024248662. 1; Escherichia coli WP_088618141. 1; Cold-tolerant Methyloomycete WP_051669880. 1; PCJ98666. 1; Alteromonadaceae bacteria WP_081919471. 1; Acidithiobacillus ferroferrophilus WP_055769167. 1; Oligotrophomonas WP_039422954. 1; Xanthomonas vesicatoria WP_078568253. 1; Xanthomonas campestris WP_093486747. 1; Unclassified Pseudocanthomonas WP_077445058. 1; Rhodobacterium C05 WP_092576562. 
1; Achromobacterium NFACC18-2 fig|1330528. 3. peg. 2198; Escherichia coli NCCP 15656 fig|83655. 67. peg. 2965; non-decarboxylating Luxenbergia fig|573. 10044. peg. 2850; Klebsiella pneumoniae WP_071888955. 1; Enterobacteriaceae fig|1799789. 3. peg. 4357; Hydrolytic water-dwelling bacteria fig|2024839. 8. peg. 1563; Oomycetes fig|1381081. 7. peg. 1167; Vibrio pannuri fig|670. 908. peg. 3444; Vibrio parahaemolyticus fig|626887. 3. peg. 2431; South China Sea Bacillus D15-8W fig|1913989. 101. peg. 1616; Gammaproteobacteria fig|262489. 9. peg. 2938; δ-proteobacterium MLMS-1 fig|2035207. 3. peg. 545; Janssenella 67 fig|28095. 13. peg. 1040; Burkholderia gladiolus fig|941449. 3. peg. 1262; Desulfovibrio X2 fig|1768806. 3. peg. 778; Rhodospirillaceae bacteria CCH5-H10 WP_083634830. 1; Desulfovibrio DV fig|1231. 4. peg. 574; Nitrosospira polymorpha fig|604089. 3. peg. 1142; Chinese psychrotrophic yellow bacterium fig|357523. 3. peg. 1851; Flavobacterium 11 fig|1423323. 5. peg. 321; Flavobacterium AED fig|178356. 3. peg. 502;Xinjiang yellow bacterium fig|1946545. 3. peg. 3457; Flavobacterium UBA4120 fig|150146. 3. peg. 2822; Psoralea corylifolia fig|229203. 4. peg. 1981; Flavobacterium delbrueckii fig|280093. 5. peg. 432; Granular yellow bacterium fig|728056. 4. peg. 1154; Flavobacterium argentatum fig|143224. 8. peg. 2343; Xanthomonas zephyrans fig|1225176. 3. peg. 4300; Cecembia lonarensis LW9 fig|1434700. 3. peg. 581; Sediment of Bacillus mohei fig|996. 47. peg. 468; Flavobacterium columnaris fig|172045. 56. peg. 2231; Elizabethia milleri fig|2024823. 3. peg. 2086;Altibacter genus fig|2026728. 18. peg. 4090; Saffron yellow fungus family bacteria fig|980584. 3. peg. 2930; Agarophageal seawater bacteria fig|1946744. 3. peg. 1682;Lewenhoekia UBA1003 fig|1046627. 3. peg. 2526; Bizionia argentinensis JUB59 fig|906888. 15. peg. 37;Nonlabens ulvanivorans fig|407022. 4. peg. 2865; Domestic olive fungus fig|1500282. 3. peg. 3713; Aureobacterium CF365 WP_084550290. 
1; Goldenrod fig|190304. 8. peg. 741; Fusobacterium nucleatum subsp. nucleatum ATCC 25586 fig|1352. 1731. peg. 603; Escherichia coli fig|1428. 658. peg. 666; Bacillus thuringiensis fig|1497681. 3. peg. 3095; Listeria monocytogenes fig|1396. 1440. peg. 4237; Bacillus cereus fig|1917876. 3. peg. 2997; Blautia Marseille-P3087 fig|1952168. 3. peg. 215; Lachnospiraceae bacteria UBA7480 fig|1907659. 3. peg. 1085; Blautia Marseille-P3201T fig|1265309. 16. peg. 461; Bacillus fumigatus F1926 fig|853. 163. peg. 215; Pseudomonas aeruginosa fig|1264. 5. peg. 4; White Ruminococcus fig|1500289. 3. peg. 4469; Aureobacterium OV705 fig|1197728. 3. peg. 2386; Concept Prevotella 9403948 fig|1947486. 3. peg. 2515; Sphingobacterium UBA1897 fig|529. 12. peg. 1303; Candida albicans fig|1523429. 3. peg. 2936; Rhizobium AAP116 fig|1761878. 3. peg. 469; Paenibacillus cl6col fig|1462996. 4. peg. 2634; Yongji Bacillus fig|582475. 4. peg. 4724; Xylose lysine Bacillus fig|1773. 7915. peg. 7638; Mycobacterium tuberculosis fig|360310. 3. peg. 4853; Bacillus CDB3 fig|1396. 515. peg. 2936; Bacillus cereus fig|662367. 4. peg. 242; Endophytic spiral bacteria fig|1895719. 3. peg. 2950; Pseudomonas 45-6 fig|906888. 9. peg. 926;Nonlabens ulvanivorans fig|694433. 3. peg. 2346; Giant saprophytic spirochetes DSM 2844 fig|1167006. 5. peg. 2941; Desulfocapsa sulfexigens DSM 10523 fig|649724. 3. peg. 304; Clostridium ATCC BAA-442 fig|1505. 32. peg. 2959; Clostridium sordellii fig|1953142. 3. peg. 1858; Pseudomonas aeruginosa UBA1947 fig|2029590. 3. peg. 2754; Myxobacterium MD40 fig|29581. 33. peg. 2300; Blue-black-purple fungi fig|40324. 292. peg. 236; Oligotrophomonas maltophilia fig|1403329. 3. peg. 287; Listeria monocytogenes Lm25180 fig|1121865. 3. peg. 1262; Enterococcus Columbiansis DSM 7374 = ATCC 51263 fig|1120746. 3. peg. 3113; Bacteria MS4 fig|1952299. 3. peg. 221; Ruminococcaceae bacteria UBA2656 fig|1965604. 3. peg. 686; Anaerobic Massimobacillum An250 fig|1673717. 3. peg. 
805; Anaerobic Massimobaciella senegal WP_116884683. 1; Food Valley Fungus fig|1948697. 3. peg. 196; Lenticoccobacillus UBA4640 fig|1232460. 3. peg. 46; Clostridiales VE202-28 WP_007864340. 1; Clostridiales WP_055649738. 1;Hungatella hathewayi fig|1226325. 3. peg. 2005; Clostridium KLE 1755 fig|1432052. 10. peg. 3166; Eisenbergia tayii fig|208479. 8. peg. 4376; Clostridium portlandii fig|1298920. 3. peg. 1959; [Desulfocholine] DSM 4024 fig|1776047. 3. peg. 4241; Clostridium C105KSO15 fig|1946596. 3. peg. 2399; Hungatella genus UBA4396 fig|1946603. 3. peg. 924; Hungatella UBA7603 fig|1410651. 3. peg. 407; [Clostridium] oxygen-tolerant DSM 5434 fig|1697784. 3. peg. 9617; Clostridium UC5. 1-1D4 fig|1745713. 3. peg. 3865; Massilia fig|180332. 3. peg. 1515;Robinsoniella peoriensis WP_072851604. 1;Lactonifactor longoviformis WP_003507561. 1; Clostridiales fig|1111728. 3. peg. 587; Aquatic Bacteria DSM 5075 = ATCC 35567 fig|1122977. 4. peg. 2473; Springwater Prague bacteria DSM 5563 = ATCC 49100 fig|1950915. 3. peg. 189; Clostridiales UBA644 fig|1950927. 3. peg. 912; Clostridiales bacterium UBA7187 ERK60856. 1; Rhizobacterium KLE 1728 WP_009260579. 1; Clostridium prausnitzii fig|1235797. 3. peg. 2409; Rhizobacterium 1-3 fig|1520815. 3. peg. 1262; Ruminococcaceae bacteria D5 fig|1855302. 3. peg. 1138; Pseudomonas butyricivibrio JW11 fig|43305. 5. peg. 3631;Proteolytic Butyrivibrio fig|411463. 15. peg. 1791; Eubacterium caeruleum ATCC 27560 fig|1235792. 3. peg. 3837; Lachnospiraceae M18-1 fig|97139. 3. peg. 669;Sabdariffa arabica fig|1291051. 3. peg. 1165; Licorice-degrading Mediterranean bacillus JCM 13369 fig|1532. 6. peg. 4793; Blautrachomatis fig|1121114. 4. peg. 5478; Produces Blautia ATCC 27340 = DSM 2950 fig|1262776. 3. peg. 1908; Clostridium CAG:149 fig|1262792. 3. peg. 1164; Clostridium CAG:299 fig|1262995. 3. peg. 2852; Firmicutes CAG:646 fig|537007. 17. peg. 3146; Hansen's blotch DSM 20583 fig|1965569. 3. peg. 
1928; Intestinal core bacteria genus An169 fig|1952411. 3. peg. 2018; Ruminococcaceae bacteria UBA6353 fig|1965578. 3. peg. 1947; Pseudoclofusobacterium An187 WP_001775049. 1; Escherichia coli retron-Eco5 (Ec107) WP_012602583. 1; WP_015962464. 1; Enterobacteriaceae strain FGI 57 fig|1005999. 3. peg. 3342; Leminoa grisea ATCC 33999 = DSM 5078 fig|1378073. 3. peg. 795; Enterobacter CC120223-11 fig|911023. 3. peg. 138; Yorkella reginsburg ATCC 49455 fig|1834193. 3. peg. 4113; Enterococcus 9E7_DIV0242 fig|1649188. 10. peg. 406; Listeria monocytogenes fig|1430899. 3. peg. 278; Listeria monocytogenes 1991 fig|1211844. 4. peg. 748; Candidate Stoquefichus Marseille AP9 fig|1658109. 3. peg. 34; Candidate Stoquefichus genus SB1 fig|1262793. 3. peg. 950; Clostridium CAG:302 fig|1262908. 3. peg. 1120; Mycoplasma CAG:956 fig|1674844. 3. peg. 242; Clostridiales Firm_06 fig|1410672. 3. peg. 2823; Ruminococcus flavus ND2009 fig|1947424. 3. peg. 1718; Ruminococcus UBA4310 fig|1265. 9. peg. 2602; yellow Ruminococcus fig|1336236. 3. peg. 1817; Ruminococcus flavus ATCC 19208 CDC65895. 1; Ruminococcus CAG:57 WP_092946213. 1; Ruminococcaceae bacteria YRB3002 fig|1307. 1644. peg. 1532; Streptococcus suis WP_050516365. 1; Escherichia coli WP_097505494. 1; Escherichia coli retron-Eco5 (Ec107) fig|573. 15585. peg. 2343; Klebsiella pneumoniae WP_023581669. 1; Proteobacterium houerneri WP_079656969. 1; Serratia marcescens WP_090085157. 1; Plant bacillus SCO41 fig|573. 15584. peg. 1543; Klebsiella pneumoniae WP_023330997. 1; Enterobacterium vulgare complex fig|72407. 673. peg. 2552; Klebsiella pneumoniae subsp. pneumoniae CNM01182. 1; Yersinia pseudotuberculosis CNG88012. 1; Yersinia coli fig|1925763. 3. peg. 649; Halobacterium salinarum PKW24121. 1; Haemophilus sp. LV10R510-8 WP_045597342. 1; Vibrio vulnificus WP_098972386. 1; Aeromonas CU5 WP_005172873. 1; Yersinia coli WP_052979504. 1; Enterobacteriaceae WP_083069261. 1; Pantoea agglomerata WP_053911905.
1; Pseudoalteromonas SW0106-04 fig|1916082. 18. peg. 39; Alteromonadaceae bacteria WP_046555216. 1; Arsukibacterium genus MJ3 KPW01986. 1; Pseudoalteromonas P1-8 WP_094277737. 1; Marine Mononas baumani fig|1414654. 3. peg. 2005; Psychrotolerant marine cocci WP_008133621. 1; Unclassified Pseudoalteromonas KQA22543. 1; Vibrio aeruginosa WP_000284440. 1; Vibrio cholerae WP_011261677. 1; Vibrio fischeri KEE40622. 1; WP_012982829. 1; ALL66139. 1; Paraburkholderia caribe MBA4 WP_093223969. 1; Pseudomonas vancomycin fig|2015553. 3. peg. 2940; Pseudomonas PGPPP1 WP_096082869. 1; Pseudomonas aeruginosa ONM67687. 1; Green Pseudomonas aeruginosa fig|316. 213. peg. 2906; Pseudomonas stutzeri WP_078734267. 1; Pseudomonas fluorescens WP_079384669. 1; Pseudomonas aeruginosa WP_095948157. 1; Polyborophage WP_011625020. 1; Shewanella MR-7 WP_100292553. 1; Aeromonas cavernosa WP_055021484. 1; Pseudoalteromonas P1-26 PHS01491. 1; Marine Bacillaceae fig|2024618. 3. peg. 1141; Acinetobacter BS1 WP_114139108. 1; Klebsiella pneumoniae WP_077749737. 1; Pseudomonas FSL W5-0299 WP_078451378. 1; Pseudomonas aeruginosa WP_007245785. 1; Pseudomonas syringae group fig|316. 280. peg. 1454; Pseudomonas stutzeri WP_086822222. 1; Pseudomonas aeruginosa WP_073268605. 1; Pseudomonas pugnorum WP_095280108. 1; Salted seafood Leucoderma WP_095715328. 1; TSA-1 of Citrobacter spp. WP_050111525. 1; Yersinia WP_013724211. 1; Aeromonas veronii WP_021140819. 1; Aeromonas salmonicida fig|1094342. 5. peg. 1611; Alcanivorax xenomutans fig|1932666. 4. peg. 1886; Marinaria WP_087148323. 1; Ferrobacterium polysporum WP_064022638. 1; Methylomonas DH-1 PIY64876. 1; Shewanella CG_4_10_14_0_8_um_filter_42_13 WP_006710190. 1; Vibrio piscicola WP_045040928. 1; Photorhabditis iliacis WP_054543201. 1; Vibrio scintillans WP_080540293. 1; Vibrio vulnificus fig|2032624. 3. peg. 2540; Halomonas WN018 KJT50308. 1; Enteric Salmonella enterica subsp. Heidelberg strain RI-11-014588 retron-Sen1 (Se72) WP_005761319.
1; ODQ05744. 1; Shigella FC130 KKW01006. 1; Candidate Saccharibacteria bacteria GW2011_GWC2_48_9 KMZ12260. 1; candidate Burkholderia oleracea SFQ04394. 1; Ralstonia NFACC01 WP_025373922. 1;Advenella mimigardefordensis WP_093341200. 1; PDC80 of the genus Pseudomonas WP_091453700. 1; Giesbergeria anulus SAY51889. 1; Neisseria welshii WP_065255232. 1; Moraxella lacunosus WP_049330876. 1; Neisseria fig|1196095. 197. peg. 151; Bee Gillie's fungus WP_072956843. 1; Vibrio aerogenes fig|857087. 3. peg. 3286; Methylomonas MC09 fig|1952222. 3. peg. 1307; Methylococcaceae UBA3127 WP_039486261. 1; Vibrio Sinaloa WP_065545234. 1; Vibrio maxima WP_033094845. 1; Cold red Colwellia WP_057552475. 1; Vibrio cholerae WP_004726393. 1; Vibrio furnissii fig|2020862. 3. peg. 1934; Halophilic Vibrio spp. fig|624. 1260. peg. 1437; Sonneshiga WP_011516221. 1; Burkholderiales fig|1947370. 3. peg. 1923; Brachymycetes UBA4517 WP_038400955. 1; Yersinia pseudotuberculosis fig|1951903. 3. peg. 117; Halieaceae bacteria UBA3099 WP_024914507. 1;Chania multitudinisentens WP_042893228. 1; Enterobacteriaceae WP_038238211. 1; Nematophila pathogenic bacteria EXI65661. 1; Candidate Accumulibacter genus SK-12 WP_016452106. 1; Delftia WP_013517170. 1; Lipocyclilla denitrificans OXC73828. 1; Caballeronia sordidicola AIO65205. 1; Burkholderia oklahomaensis WP_013234866. 1; Spirillum vesiculosus WP_082884385. 1; Fish Rickettsiaceae NZ-RLO1 fig|2006849. 4. peg. 371; Xanthomonadales bacteria WP_074262787. 1; Paraburkholderia phenazinium WP_009906786. 1; Burkholderia thailandensis WP_022524328. 1; WP_081817450. 1; Halomonas HL-48 WP_020312233. 1; Pseudomonas syringae KPY75916. 1; Pseudomonas amygdaloids pathogenic variant nicotianae fig|1793966. 3. peg. 180; Pseudomonas aeruginosa fig|1891229. 16. peg. 2033; Pseudomonadales bacteria WP_099454886. 1; Pseudomonas faecalis WP_092400423. 1; Pseudomonas NFACC39-1 WP_012315430. 1; Pseudomonas faecalis WP_020799819. 1; Pseudomonas G5 (2012) WP_004574016. 1; fig|1435425. 
3. peg. 787; Pseudomonas QTF5 WP_045490543. 1; Pseudomonas StFLB209 WP_011506503. 1; requires salt-colored salt bacteria fig|1609967. 3. peg. 3047; Halomonas HG01 fig|1492738. 3. peg. 2698; Flavobacterium seoulense WP_092849245. 1; Pectinophaga algae WP_025835957. 1; Pseudomonas aeruginosa fig|2025877. 3. peg. 668; Parabacterium AT13 fig|246787. 6. peg. 2081; Cellulose pseudobacteria fig|1339287. 3. peg. 1113; Pseudomonas fragilis strain 3986 T(B)9 fig|1946017. 3. peg. 1516; another genus UBA940 WP_038655380. 1; Leech-eating slime mold WP_093669272. 1; Myxobacterium MAR_2009_124 WP_073241067. 1; Flavobacterium inland sea WP_096193803. 1; Cellulophagoales TFI 002 WP_076357635. 1; WP_073238193. 1; Sewage soil bacteria WP_076451370. 1; WP_091906542. 1; Porphyromonadaceae KH3R12 WP_051365712. 1; Saltcap Flavobacterium fig|1938609. 3. peg. 1765; Flavobacterium LM4 SDJ72221. 1; Non-centrifuged Flavobacterium fig|1985174. 3. peg. 2584; Chitinophages IBVUCB2 WP_092737749. 1; Morchella in pigeon throat fig|192149. 3. peg. 42; Murina fig|418630. 3. peg. 1685; Rhodopseudomonas grandis fig|1915314. 3. peg. 3469; Bordetella sulphurea DLFJ5-1 fig|2030815. 3. peg. 2725; Sulfonylureas martensii fig|2035451. 3. peg. 4632; Rhizobium L18 WP_043872258. 1; Indian Ocean Fast-growing Bacillus WP_055683826. 1; Nassauria rubrum fig|1947537. 3. peg. 498; Sphingomonas sp. UBA6198 WP_069065961. 1; Sphingolipids RAC03 WP_084280100. 1; Neosphingobacteria B1 fig|1895845. 3. peg. 487; Sphingolipids 66-54 GAK73419. 1; Agrobacterium truncatum TR3 = NBRC 13261 WP_090966398. 1; Goldenomonas phyllostomonas WP_091860144. 1; Bosella robiniae WP_085092006. 1; Rice Azospirillum fig|1528100. 4. peg. 28; Methylomagnum ishizawai fig|32057. 3. peg. 9515; PCC 7103 fig|103690. 10. peg. 3571; Nostoc PCC 7120 = FACHB-418 fig|1137095. 11. peg. 15; Pseudocladens HK-05 CDZ48826. 1; Neorhizobium galegae bv. officinalis WP_072340070. 1; Devosulia enhydra OYR18277. 1; Thiophene candida WP_093509439.
1; Sphingomonas YR583 WP_081799025. 1; Neosphingolipids feeding on tree resin PIY55545. 1;ζ-Proteobacteria CG_4_10_14_0_8_um_filter_49_80 SDT44912. 1; Bradyrhizobium canariensis WP_096350346. 1; WP_074962594. 1; Nassauria rubrum WP_038724888. 1; Burkholderia pseudomallei WP_012217410. 1; Burkholderia multiphaga WP_100428762. 1; Janssenella 67 WP_082161008. 1; Candidate denitrifying competitive Bacillus AFL73219. 1; Thiocystis purpurogenum DSM 198 WP_014427842. 1; fig|364030. 3. peg. 3554; Thiomonas elaborata KGW20495. 1; Burkholderia pseudomallei MSHR2451 SFE83076. 1; Coprophagnum OK212 WP_013028226. 1;Sideroxydans lithotrophicus WP_080311424. 1; Burkholderia pseudomallei fig|337. 13. peg. 3872; Burkholderia vesiculata WP_082643860. 1; Pseudomonas CKH90039. 1; Pseudomonas aeruginosa WP_083287254. 1; Unclassified Johnson bacteria WP_122648546. 1; Burkholderia pseudomallei WP_082706753. 1; Unclassified Pseudomonas WP_080936076. 1; Klebsiella pneumoniae WP_000746343. 1; Enterobacteriaceae EMX54653. 1; Escherichia coli MP020980. 2 WP_053270700. 1; Escherichia coli fig|1736224. 3. peg. 3731; Serratia Leaf51 fig|1175299. 4. peg. 709; Dictyophora maydis ZJU1202 WP_001461245. 1; Enterobacteriaceae fig|617145. 3. peg. 3535; Vibrio scindens 1F-157 fig|1440054. 3. peg. 3851; Vibrio OY15 fig|617135. 3. peg. 594; Vibrio fischeri ZF-211 WP_023267764. 1; Shewanella decoloris fig|1481663. 36. peg. 3628; Meteorological Vibrio fig|670. 893. peg. 2716; Vibrio parahaemolyticus fig|680. 33. peg. 5391; Vibrio campbeni fig|298386. 8. peg. 4344; Deep-sea Photorhabdus SS9 fig|663. 73. peg. 714; Vibrio alginolyticus fig|1333511. 3. peg. 3208; Pseudoalteromonas maritima TAB23 WP_064574154. 1; Hafnia paraalvei WP_064645509. 1; Proteobacteria fig|630. 105. peg. 4248; Yersinia coli fig|400673. 7. peg. 1969; Legionella pneumophila strain Corby WP_092678546. 1; Rosenbergiella nectarea WP_069476513. 1; Raoultia oxyornithine fig|1267535. 3. peg. 2394; Bryobacterales bacteria KBS 96 WP_000446053. 
1; Acinetobacter baumannii fig|1948587. 3. peg. 786; Gammaproteobacteria UBA1902 WP_014949305. 1; Alteromonas macledy fig|1797397. 3. peg. 2386; Bdellovibrioles bacteria RIFOXYC1_FULL_54_43 fig|1386968. 3. peg. 847; Francisella tularensis subsp. PA10-7858 WP_074900850. 1; fig|1975705. 3. peg. 898; Psychrobacterium FDAARGOS_221 WP_066184577. 1; Toxoplasma fig|1780380. 4. peg. 4010; Fungal bacteria CHKCI004 fig|556261. 3. peg. 2546; Clostridium D5 fig|1193534. 6. peg. 2375; Uncultured Fusobacteria fig|1042163. 3. peg. 3771; Brevibacterium brevisporum LMG 15441 WP_062492190. 1; Paenibacillus 32O-W WP_081674606. 1; Harbin Lactobacillus WP_050781686. 1; Lactobacillus rod-shaped bacteria WP_021109137. 1; Enterococcus faecium WP_046309803. 1; Staphylococcus CBL03706. 1; Pamela Gordonia 7-10-1-b WP_090944285. 1; Pelosinus propionicus WP_077305443. 1; Clostridium beijerinckii fig|410072. 5. peg. 40; Accompanying Pseudomonas aeruginosa WP_011669870. 1; Leptospira poppatersonii WP_015565235. 1; Pseudomonas aeruginosa CUO23478. 1; Pseudomonas aeruginosa WP_085748688. 1; Rhizobium Glucophilum WP_093270014. 1; Psychrobacterium OK032 SHE86352. 1; Atopostipes suicloacalis DSM 15692 WP_000346292. 1; Unclassified Streptococcus WP_080465410. 1; Lactobacillus plantarum WP_080662531. 1; Lactobacillus brevis WP_093131554. 1; Bacillus koningii WP_093336905. 1; Halotoxin-tolerant Bacillus fig|1974627. 3. peg. 386; Candidate Levenomyces bacteria CG_4_9_14_0_2_um_filter_35_21 fig|1802603. 3. peg. 453; Candidate Woykebacteria RIFCSPHIGHO2_12_FULL_45_10 fig|392734. 5. peg. 3006; K. rosea AGL61879. 1; Candidate Saccharimonas aalborgensis fig|319224. 16. peg. 2726; Shewanella putrefaciens CN-32 fig|1720343. 3. peg. 1263; Pseudoalteromonas 1_2015MBL_MicDiv fig|1136158. 3. peg. 3691; Cyclovibrio 1F97 fig|666. 3017. peg. 1000; Vibrio cholerae fig|1909458. 3. peg. 2277; Halovibrio ML198 fig|1638949. 3. peg. 831; Vibrio spp. ECSMB14106 fig|493915. 3. peg. 158; Pseudoalteromonas NJ631 BAC94535.
1; Vibrio vulnificus YJ016 fig|1191313. 3. peg. 1135; Vibrio scindens 1S-124 fig|670. 1244. peg. 3807; Vibrio parahaemolyticus fig|1659714. 3. peg. 4264; Citrobacter brunneri fig|1192730. 4. peg. 1976; Enteric Salmonella enterica subspecies Kintambo serotype fig|550. 1216. peg. 4296; Enterobacter vulvae WP_072269713. 1; Serratia WP_053898075. 1; Escherichia coli fig|624. 1264. peg. 1635; Songnei Zhiheba fig|1181777. 3. peg. 78; Escherichia coli KTE233 fig|1802256. 3. peg. 310; Thiomonas RIFOXYB12_FULL_35_9 PHR73342. 1; Toxoplasma fig|2014260. 3. peg. 3813; Bacteria (candidate Blackallbacteria) CG13_big_fil_rev_8_21_14_2_50_49_14 WP_042497590. 1; Vibrio marineus WP_063522799. 1; Vibrio HI00D65 WP_004186757. 1; Enterobacteriaceae WP_040122746. 1; Vibrio WP_086046550. 1; Vibrio harveyi group WP_063849005. 1; Enterobacter vulvae WP_023486614. 1; Enterobacteriaceae WP_070992278. 1; Pseudoalteromonas fig|1005665. 3. peg. 2532; Coxsackia oryzendophytica fig|1219066. 3. peg. 3636; Vibrio parahaemolyticus NBRC 12711 fig|1225184. 4. peg. 1222; Pantoea A4 fig|675814. 3. peg. 1256; Vibrio corallii ATCC BAA-450 SFR59865. 1; Pseudomonas butyricivibrio NOR37 fig|853. 16. peg. 1112; Pseudomonas aeruginosa fig|1965572. 3. peg. 1423; Pseudoclosporin An176 fig|588581. 3. peg. 3589; Clostridium papulolyticum DSM 2782 fig|1396. 1409. peg. 4169; Bacillus cereus fig|1428. 538. peg. 4047; Bacillus thuringiensis fig|1465. 16. peg. 946; Brevibacterium brevisporum WP_087385137. 1; AIF42417. 1; Cladosporium SK37 WP_076543941. 1; Halophilic anaerobic bacteria fig|1121093. 3. peg. 3089; Bacillus spondylodis subtilis DSM 19096 fig|29367. 3. peg. 2029; Clostridium punicatum WP_089719707. 1; Congo halophilic anaerobic bacteria fig|307249. 3. peg. 3585; Uncultured Mycosporium WP_072949666. 1; Ruminococcus flavus CCX81854. 1; Ruminococcus CAG:108 fig|1491. 669. peg. 2217; Clostridium botulinum fig|1872455. 3. peg. 401; Alkaliphila fig|576117. 5. peg.
4005; Halophilic fast-growing Bacillus fig|1225647. 3. peg. 1829; Pseudomonas 11ANDIMAR09 fig|1380380. 4. peg. 1574; Arensella 13_GOM-1096m fig|293. 7. peg. 2956; Brevundimonas diminuta WP_095437634. 1; Rhizobium 11515TR fig|1912891. 7. peg. 702; Sphingolipids fig|1736574. 3. peg. 4024; Pseudomonas Root630 fig|227946. 13. peg. 4105; Xanthomonas translucentus poae pathogenic variant fig|1761791. 3. peg. 4793; Lysobacterium yr284 fig|1560195. 5. peg. 485; Janssenella BJB301 fig|1503054. 43. peg. 6257; Burkholderia stagnans fig|1207504. 10. peg. 4279; Burkholderia pseudopolyphaga WP_092172515. 1; Unclassified Pseudomonas WP_074815429. 1; Pseudomonas syringae fig|150146. 3. peg. 3162; Yellow bacterium of Gili fig|76832. 8. peg. 3775; pseudo-scented scent fungi fig|1202724. 3. peg. 994; Akkaya flavonoids fig|1805473. 3. peg. 3678; Flavour bacillus timonianum fig|253. 33. peg. 3826; Indole-producing Aureobacterium WP_076561634. 1; Aureobacterium inertum fig|2024823. 3. peg. 95; Altibacter genus fig|1250278. 4. peg. 3462; Halobacterium Hel_I_6 fig|1797342. 3. peg. 689; Pseudomonas GWF2_33_38 WP_084184261. 1; Urolyticum aureobacterium fig|1948560. 3. peg. 3003; Delta-Proteobacteria UBA6106 fig|1392. 364. peg. 2564; Bacillus anthracis fig|872970. 3. peg. 1713; Marine amphibian bacteria fig|1385514. 3. peg. 313; Yanchenghai Bacillus Y32 fig|76853. 4. peg. 2614; Silver Bacillus fig|1423774. 3. peg. 1262; Lactobacillus nantes DSM 16982 fig|1410670. 3. peg. 2844; Ruminococcus aureus MA2007 fig|169435. 7. peg. 1348; Anaerobic club bacteria fig|1946597. 3. peg. 2104; Hungatella genus UBA4568 fig|1948087. 3. peg. 796; Firmicutes bacteria UBA6113 fig|642492. 3. peg. 2638; Cellulosilyticum lentocellum DSM 5427 fig|1950841. 3. peg. 2383; Clostridiales UBA2436 fig|555512. 3. peg. 1251; Marine Salipiger fig|383381. 3. peg. 2538; Rhodobacterium JL475 WP_081629462. 1; fig|1736258. 3. peg. 3392; Methylobacterium Leaf112 fig|1950192. 3. peg.
426; Anaerobic Rhizobacteriales bacterium UBA2232 fig|170623. 6. peg. 4661; Azotobacter beijer fig|170623. 7. peg. 704; Azotobacter beijer fig|1981099. 3. peg. 513;Niveispirillum lacus fig|1250539. 3. peg. 3491; Deep-sea olive fungus fig|1947582. 3. peg. 2979; Sulfatium spp. UBA1132 fig|1909294. 17. peg. 3456; Rhizobium bacteria fig|1735583. 3. peg. 1657; Pseudovibrio W64 fig|670. 1220. peg. 4688; Vibrio parahaemolyticus fig|1004786. 3. peg. 925; Alteromonas mediterranea DE1 fig|2013797. 3. peg. 2109;γ-Proteobacteria HGW-γ-Proteobacteria-15 fig|1948580. 3. peg. 3400; Gammaproteobacteria UBA1012 fig|1714300. 3. peg. 306; Deep-sea sea bacillus fig|1961547. 3. peg. 1371; Desulfobacteriaceae UBA2273 fig|441162. 10. peg. 6621; Burkholderia oklahomaensis C6786 fig|615. 307. peg. 4666; Serratia marcescens fig|631. 3. peg. 1883; Yarrowia intermedia fig|1763535. 3. peg. 1547; Hydrogenophage crassostreae fig|43263. 5. peg. 2702; Alkali-producing Pseudomonas fig|244366. 46. peg. 3595; Klebsiella varicella fig|1224150. 8. peg. 3856; Dickinia banana NCPPB 2511 fig|61645. 10. peg. 2019; Enterobacter aegypti fig|1948706. 3. peg. 2225; Bacteria fengyouensis UBA1333 fig|2026771. 13. peg. 1697; Fungal bacteria fig|2026771. 11. peg. 1955; Fungal bacteria fig|2026772. 5. peg. 424; Fengyou bacteria fig|2026801. 20. peg. 1798; Verrucomicrobia bacteria fig|2026801. 14. peg. 1176; Verrucomicrobia bacteria fig|1951369. 3. peg. 1157; Akkermansiaceae UBA6946 fig|1977087. 12. peg. 1918; Proteobacteria fig|2026779. 14. peg. 4171; Fungi of the family Planctomycetes fig|2026779. 28. peg. 3264; Fungi of the family Planctomycetes fig|2026779. 30. peg. 3181; Fungi of the family Planctomycetes fig|2026779. 29. peg. 2310; Fungi of the family Planctomycetes fig|1797235. 3. peg. 3; Aerithromyces RIFCSPHIGHO2_12_41_5 fig|316. 284. peg. 937; Pseudomonas stutzeri fig|296. 11. peg. 442; Pseudomonas fragilis fig|1981714. 3. peg. 993; Pseudomonas B5 (2017) fig|50340. 44. peg. 
6020; Pseudomonas sphaeroides fig|1761897. 3. peg. 509; Pseudomonas ok272 fig|1402514. 3. peg. 154; Pseudomonas aeruginosa BWHPSA014 fig|1938440. 3. peg. 5997; Pseudomonas T fig|1566250. 3. peg. 959; Pseudomonas NFACC02 fig|316. 357. peg. 479; Pseudomonas stutzeri fig|287. 4433. peg. 2945; Green Pseudomonas aeruginosa fig|1970515. 3. peg. 709; Hydrogenophiles 12-61-10 fig|95486. 85. peg. 1748; Burkholderia neocypris fig|292. 61. peg. 8104; Burkholderia cepacia fig|1408450. 3. peg. 3766; Methylobacterium tundripaludum 21/22 fig|157910. 3. peg. 5727; Uplift Vice Burkholder fig|279058. 16. peg. 4239; Monospora arenae fig|1537272. 3. peg. 1916; Janssenella HH100 fig|1218081. 3. peg. 1751; Kururi Subburkholderia sulphur-oxidizing subsp. NBRC 107107 fig|573. 14059. peg. 3113; Klebsiella pneumoniae fig|40324. 192. peg. 51; Oligotrophomonas maltophilia fig|1219041. 3. peg. 4613; Sphingomonas azotobacter NBRC 15497 fig|1561196. 3. peg. 560; Burkholderia E7m39 fig|1882750. 3. peg. 1035; Burkholderia GAS332 fig|1736266. 3. peg. 1145; Durum Leaf126 fig|2015350. 3. peg. 1640; Burkholderia AU18528 fig|58133. 4. peg. 815; Nitrosospira NpAV fig|1691980. 3. peg. 1912; Rhodotorulaceae Paddy-1 fig|305. 393. peg. 1023; Ralstonia solanacearum fig|56449. 3. peg. 3604; Bromoxanthomonas fig|1281282. 5. peg. 1894; Xanthomonas campestris pathogenic variant CN14 fig|40324. 334. peg. 1103; Oligotrophomonas maltophilia fig|1349793. 3. peg. 2529; Spiral Hydrophage NBRC 102512 fig|1842727. 3. peg. 1491; Korean red bacteria fig|1619952. 3. peg. 5158; Burkholderiaceae 16 fig|1970380. 3. peg. 1914; Halostaurum 14-55-98 fig|2015568. 3. peg. 2963; Burkholderia PBB6 fig|1752215. 3. peg. 2312; Gammaproteobacteria Ga0077554 fig|1706231. 5. peg. 3125; Janssenella CG23_2 fig|2013716. 3. peg. 2169; β-Proteobacteria HGW-β-Proteobacteria-4 fig|1946997. 3. peg. 3049; Nitrospira UBA7655 fig|765913. 3. peg. 2527; Rhodococcus delbrueckii AZ1 fig|1743159. 3. peg. 1891; Polymyxin Bacillus fig|1597955. 3. peg.
3923; Haemophilus DM1 fig|1184267. 3. peg. 1626; Bdellovibrio exotrophicus JSS fig|101571. 190. peg. 3007; Uboburkholderia fig|123899. 5. peg. 1710; Bordetella woundis fig|463035. 3. peg. 3900; Bordetella genotype 12 fig|1395608. 4. peg. 211; Bordetella genotype 5 fig|1947379. 3. peg. 2784; Rhodophyte UBA5149 WP_074294985. 1; Paraburkholderia phenazinium fig|1324617. 3. peg. 820; Vice Burkholder aspalathi fig|80868. 3. peg. 3458; Acidophage bacterium Carterii fig|1388764. 3. peg. 1840; Pseudomonas ferrooxidans EGD-HP2 fig|251747. 15. peg. 4695; Chromobacteria under Hemlock fig|670. 1020. peg. 382; Vibrio parahaemolyticus fig|1055803. 3. peg. 1434; Pseudoalteromonas TB51 fig|1201036. 3. peg. 177; Pseudocalbuminae AO18b fig|1220581. 4. peg. 1434; Agrobacterium NBRC 13257 fig|398. 6. peg. 6695; Tropical Rhizobium fig|931866. 6. peg. 8184; Bradyrhizobium ottawa fig|142585. 3. peg. 1658; Bradyrhizobium C9 fig|1082933. 13. peg. 1537; Rhizobium in Amorpha fruticosa CCNWGS0123 fig|1768789. 3. peg. 791; Methylobacterium CCH7-A2 fig|1381123. 3. peg. 3819;Aliihoeflea 2WW fig|1297570. 3. peg. 1970; Mesorhizobium STM 4661 fig|935546. 3. peg. 3816; Rhizobium NZP2037 in Rhizobium radix fig|1128253. 3. peg. 1960; Bradyrhizobium japonicum CCBAU 15354 fig|1444315. 4. peg. 3983; Capsicum azetiliensis AZ78 fig|1185327. 3. peg. 1608; Xanthomonas cassava wilt pathogenic variant strain Xam668 fig|1881043. 3. peg. 2597; Pseudomonas GM95 ALN84423. 1; Capsicum lysate fig|56460. 15. peg. 1977; Xanthomonas vesicatoria fig|1317116. 6. peg. 2759; Marine Bacteria 22II-s10i fig|564137. 3. peg. 4320; Antarctic rose-colored lemon fungus fig|1952800. 3. peg. 3583; Rhodobacteriaceae UBA2553 fig|218673. 12. peg. 3041; Suspected Sulfatium fig|1912092. 3. peg. 2119; Sediment National Institute of Oceanography Bacteria fig|1736558. 3. peg. 5006; Sword mushroom Root558 fig|91360. 5. peg. 3717; Desulfurizing Clavatechus singaporeans fig|1948756. 3. peg. 2576; Borrelia spirochetes UBA2205 fig|1855322. 3. 
peg. 103; Bradyrhizobium Rc3b fig|1437360. 11. peg. 2429; Bradyrhizobium erythrocytosis fig|1871052. 3. peg. 1026; Aphidonia fig|1038860. 3. peg. 8756; Bradyrhizobium elsdenii WSM2783 fig|1898112. 54. peg. 3758; Rhodospirillaceae bacteria fig|1660129. 3. peg. 4854; Pseudomonas SCN 70-31 fig|1482074. 3. peg. 4109; Hartmannii nitrogen-fixing bacteria fig|1970306. 3. peg. 552; Acidobacterium 35-58-6 fig|1686310. 5. peg. 1409; Bartonella apis fig|1798192. 3. peg. 1953; Conch genus KO164 fig|1235461. 17. peg. 11; Sinorhizobium lucerne GR4 fig|442. 12. peg. 222; Gluconobacter oxydans fig|1938607. 3. peg. 1954; Sphingomonas LM7 fig|1231624. 3. peg. 39; Bogor Asai bacteria NBRC 16594 fig|1121271. 3. peg. 4112; Armillaria mellea DSM 15620 fig|33059. 16. peg. 1690; Thermothorax acidothiobacil fig|502025. 10. peg. 925; Halobacterium ochraceum DSM 14365 fig|1734406. 3. peg. 691; α-Proteobacteria BRH_c36 fig|1979207. 3. peg. 4304; Brachymetra fig|1953057. 3. peg. 74; UBA4496 of the family Brachymectomycetaceae fig|858423. 3. peg. 10004; Bradyrhizobium oleraceum fig|267128. 3. peg. 2015; Granulosphingomonas fig|582667. 3. peg. 5553; Methylobacterium sicouracifolium fig|1187852. 3. peg. 2712; Methylobacterium tahanianum fig|582675. 3. peg. 1247; Methylobacterium gesipekola fig|1951640. 3. peg. 515; Deferiorhizium bacterium UBA6799 fig|1948417. 4. peg. 1606; α-Proteobacteria UBA6187 fig|45074. 5. peg. 981; Legionella of the Holy Cross fig|1434232. 4. peg. 2927; Magnetofaba australis IT-1 fig|1945950. 3. peg. 3568; Acinetobacter spp. UBA6526 fig|106654. 22. peg. 994; Hospital Acinetobacter fig|1977883. 3. peg. 3023; Acinetobacter ANC 3903 fig|1945948. 3. peg. 700; Acinetobacter spp. UBA5984 fig|1226327. 3. peg. 2796; Acinetobacter kutschii fig|1879049. 4. peg. 5949; Acinetobacter WCHAc010034 fig|1945955. 3. peg. 1951; Acinetobacter spp. UBA7614 fig|1675530. 3. peg. 2149; Acinetobacter genotype 33YU fig|1310638. 3. peg. 1006; Aegilops baumanni 1437282 fig|1400001. 4. peg. 
34; Marseilles bacillus fig|1132496. 5. peg. 136; Pasteurella multocida subsp. multocida strain HN06 fig|1908263. 4. peg. 2604; Trehalose saccharomyces fig|375432. 4. peg. 200; Haemophilus influenzae R3021 fig|400668. 8. peg. 3776; Marinomonas MWYL1 fig|1913989. 193. peg. 841; Gammaproteobacteria fig|856793. 5. peg. 1975; Microvibrio cupreus ARL-13 SBW23286. 1; European Bacillus citrate fig|1736225. 3. peg. 985; Evansella Leaf53 fig|29486. 12. peg. 818; Yersinia ruckeri fig|914128. 3. peg. 2502;Commensal Serratia strain Tucson fig|1796497. 3. peg. 952;Grimontia celer fig|1095649. 3. peg. 3298; Vibrio cholerae O1 strain EM-1676A fig|137584. 4. peg. 1627;Deep-sea monas viridans fig|173990. 3. peg. 1773; Rheinheimia pacifica fig|1720343. 3. peg. 3189; Pseudoalteromonas 1_2015MBL_MicDiv fig|1202962. 4. peg. 1481; M. maritima ATCC 15381 fig|669. 50. peg. 2993; Vibrio harveyi fig|691. 32. peg. 1517; Vibrio natrigensis fig|156578. 3. peg. 2521; Alteromonadales bacteria TW-7 fig|661. 14. peg. 380; Pseudomonas aeruginosa fig|654. 94. peg. 1733; Aeromonas veronii fig|703. 9. peg. 319; Shigamonas fig|589873. 36. peg. 1971; Alteromonas australis fig|28107. 3. peg. 3571; Pseudoalteromonas espegia fig|1547444. 3. peg. 4264; Pseudoalteromonas PLSV fig|629266. 7. peg. 847; Pseudomonas syringae var. kiwifruit strain M302091 fig|251722. 19. peg. 4059; Pseudomonas aesculi pathogenic variant fig|587851. 4. peg. 1470; Pseudomonas chlororaphis subsp. aurea fig|1265490. 3. peg. 2330; Pseudomonas URMO17WK12:I8 fig|316. 101. peg. 3534; Pseudomonas stutzeri fig|1916993. 3. peg. 4917; Pseudomonas foetida fig|1628833. 3. peg. 2448; Pseudomonas ES3-33 fig|1283291. 4. peg. 1991; Pseudomonas URMO17WK12:I11 fig|83963. 5. peg. 3885; Pseudomonas syringae herpes simplex virus variant fig|1206777. 3. peg. 4334; Pseudomonas Lz4W fig|113268. 3. peg. 3785; Deep-sea mussel methane-trophic gill symbiosis fig|1131284. 3. peg. 1562;ζ-Proteobacterium SCGC AB-137-C09 fig|2026807. 7. peg. 
2258; ζ-Proteobacteria fig|281689. 4. peg. 2060; Desulfomonas acetyloxidans DSM 684 fig|1188231. 4. peg. 1200; Ferrous oxide deep-sea bacteria M34 fig|1367489. 3. peg. 682; Vibrio fischeri SA1G fig|1873135. 3. peg. 4249; Shewanella spp. SACH fig|663. 73. peg. 2465; Vibrio alginolyticus fig|1588629. 3. peg. 1134; Aeromonas L_1B5_3 fig|1121922. 3. peg. 3454; Glaciecola pallidula DSM 14239 = ACAM 615 fig|351745. 9. peg. 2506; Shewanella W3-18-1 fig|29497. 20. peg. 3798; Vibrio scintillans fig|1367486. 3. peg. 187; Vibrio fischeri CB37 fig|511062. 4. peg. 1890; Marine Mononas GK1 fig|654. 12. peg. 188; Aeromonas veronii fig|29497. 21. peg. 4482; Vibrio scintillans fig|1659713. 3. peg. 560; Enterobacter baumannii fig|1124991. 3. peg. 3617; Morganella morganii subsp. morganii KT fig|104623. 3. peg. 1381; Serratia ATCC 39006 fig|1256989. 3. peg. 902; Providencia alkali-producing R90-1475 fig|1125694. 3. peg. 1143; Proteobacterium mirabilis WGLW6 fig|574096. 6. peg. 2693; Pantotrichum garlici fig|1095774. 3. peg. 2623; Pantoea pineapple PA13 fig|869692. 4. peg. 2910; Escherichia coli 3003 WP_140159440. 1; Escherichia coli retron-Eco2 (Ec67) fig|550. 437. peg. 1444; Enterobacter vulvae fig|573. 13605. peg. 2600; Klebsiella pneumoniae fig|550. 285. peg. 3783; Enterobacter vulvae fig|1265672. 3. peg. 3869; Enteric Salmonella enterica subspecies Agona serotype 70. E. 05 fig|573. 10028. peg. 542; Klebsiella pneumoniae fig|749537. 3. peg. 218; Escherichia coli MS 115-1 ANK06786. 1; Escherichia coli O25b:H4 fig|670. 880. peg. 975; Vibrio parahaemolyticus fig|1192730. 4. peg. 3; Enteric Salmonella enterica subspecies Kintambo serotype fig|1224144. 4. peg. 4030; Dickinsonella CSL RW240 fig|568766. 10. peg. 2937; Dickie's bacterium NCPPB 3274 fig|1076549. 3. peg. 4260; Pantoea rhoda fig|548. 102. peg. 3401; Klebsiella aerogenes fig|630. 90. peg. 1795; Yersinia coli fig|79883. 5. peg. 266; Bacillus horikoshii fig|180861. 3. peg. 
3762; Bacillus thuringiensis serovar Sumiyoshi fig|1390. 157. peg. 339; Bacillus starch-solubilizing fig|293386. 15. peg. 304; Stratospheric Bacillus fig|1053181. 3. peg. 3820; Bacillus cereus BAG2X1-3 fig|1884375. 3. peg. 681; Paenibacillus PDC88 fig|334735. 5. peg. 923; Korean spore-forming cocci fig|79884. 3. peg. 1120; Bacillus pseudoalkalinophilus fig|1628206. 3. peg. 4802; Bacillus LK2 fig|1396. 1605. peg. 6235; Bacillus cereus fig|182710. 3. peg. 317; Yihai Ocean Bacillus fig|860. 10. peg. 486; Clostridium periodontalis fig|1855308. 3. peg. 1467; Trichococcus ellipsis fig|931626. 3. peg. 151; Acetobacterium wustii DSM 1030 fig|1965575. 3. peg. 2547; Intestinal core bacteria genus An181 fig|1352. 2757. peg. 71; Enterococcus faecium fig|1299895. 3. peg. 900; Listeria monocytogenes CFSAN002349 fig|53346. 29. peg. 1591; Enterococcus mansoni fig|1649188. 10. peg. 1545; Listeria monocytogenes indica fig|158847. 6. peg. 432; Super Megamonas fig|1121289. 3. peg. 2775; Eat less Clostridiisalibacter DSM 22131 fig|1950885. 3. peg. 858; Clostridiales UBA4693 fig|1965576. 3. peg. 1978; Pseudoclosporium An184 fig|1952416. 3. peg. 1629; Ruminococcaceae bacteria UBA642 fig|1262803. 3. peg. 8; Clostridium CAG:413 fig|28037. 216. peg. 60; Mild Streptococcus fig|1074052. 3. peg. 33; Streptococcus tc-9 fig|1304. 207. peg. 1536; Streptococcus salivarius fig|1154859. 3. peg. 955; Streptococcus agalactiae LMG 14609 fig|1080071. 3. peg. 332; Streptococcus oris fig|1139219. 3. peg. 2194; Enterococcus dispar ATCC 51266 fig|1834176. 3. peg. 811; Enterococcus 3G1_DIV0629 fig|1622. 15. peg. 947; Lactobacillus murinus fig|565651. 6. peg. 1942; Enterococcus faecalis ARO1/DG fig|1473546. 3. peg. 703; Lysine Bacillus sp. BF-4 fig|37734. 13. peg. 137; Enterococcus carinii fig|492670. 92. peg. 623; Bacillus velesiae fig|1639. 1907. peg. 2641; Listeria monocytogenes fig|1123489. 3. peg. 170; Large Weilmannella DSM 19857 fig|1280687. 3. peg. 1880; Vibrio fibronectin YRB2005 fig|1262889. 3. peg. 
680; Fungus CAG:38 fig|1235800. 3. peg. 2226; Lachnospiraceae bacteria 10-1 fig|1897035. 3. peg. 445; Firmicutes CAG:552_39_19 fig|199. 588. peg. 774; Simple Bend Fungus fig|1111133. 4. peg. 219; Bacteroides BV3AC2 fig|936589. 3. peg. 875; Veronella AS16 WP_070600378. 1; fig|1896998. 3. peg. 1750; Pseudomonas CAG:131 related_45_246 fig|41170. 3. peg. 3013; Microbacterium acetyltransferase fig|59620. 44. peg. 897; Uncultured Clostridium spp. fig|1262843. 3. peg. 313; Clostridium CAG:813 fig|1262834. 3. peg. 1287; Clostridium CAG:715 fig|1256219. 3. peg. 760; Lactobacillus paracasei subsp. paracasei Lpp230 fig|115778. 31. peg. 1994; Leuconostoc pyralis subsp. fig|29385. 174. peg. 531; Saprophytic Staphylococcus fig|1295. 21. peg. 75; Staphylococcus schleife fig|148814. 13. peg. 1360; Lactobacillus quinquefolius fig|1282. 1242. peg. 673; Staphylococcus epidermidis fig|1581078. 3. peg. 1186; Staphylococcus HMSC10C03 fig|1891097. 3. peg. 280; Macrococcus gattii WP_080703103. 1; fig|1214184. 3. peg. 1129; Streptococcus suis 22083 fig|1154771. 3. peg. 209; Streptococcus agalactiae FSL C1-487 fig|1415765. 3. peg. 1578; Mild Streptococcus 21/39 fig|1581074. 3. peg. 720; Granular Streptozoa HMSC31F03 fig|1349. 233. peg. 712; Streptococcus uberis fig|1946281. 3. peg. 392; Catarrhalis spp. UBA5893 fig|1328309. 5. peg. 1889; Lactobacillus plantarum IPLA88 fig|1214190. 3. peg. 2034; Streptococcus suis YS17 fig|29385. 135. peg. 2098; Saprophytic Staphylococcus fig|1715184. 3. peg. 1265; Aerococcus HMSC035B07 fig|1881068. 3. peg. 2940; Sphingomonas OV641 fig|1522072. 3. peg. 3829; Sphingolipids ba1 fig|1802172. 3. peg. 237; Sphingomonas RIFCSPHIGHO2_12_FULL_65_19 fig|1128204. 3. peg. 2189; Bradyrhizobium elsdenii CCBAU 43297 fig|1708715. 5. peg. 4517; Sword mushroom aridi fig|195105. 3. peg. 2062; Haemobacterium masaiense fig|1283312. 3. peg. 4182; Sphingomonas vermifuss DC-6 fig|1120654. 4. peg. 406; Scutellaria LC499 fig|529. 36. peg. 3144; Candida albicans fig|1194716. 
3. peg. 4774; Sinorhizobium lucerne AK75 fig|1660088. 4. peg. 2967; Agrobacterium SCN 61-19 fig|1951259. 3. peg. 2515; Sphingomonasales bacterium UBA6174 fig|1912891. 5. peg. 2102; Sphingolipids fig|1670800. 3. peg. 1844; Rhizobia in the ocean fig|2032658. 3. peg. 157; α-Proteobacteria WMHbin7 fig|1819565. 5. peg. 2208; Marine Flavimaricola fig|1245469. 3. peg. 1160; Oligotrophic Bradyrhizobium S58 fig|1615890. 4. peg. 173; Bradyrhizobium LTSP849 fig|56454. 3. peg. 3464; Xanthomonas hortorum fig|40324. 384. peg. 1060; Oligotrophomonas maltophilia fig|1801972. 3. peg. 1832; Fungi RBG_19FT_COMBO_48_8 fig|1978765. 3. peg. 3488; Nitrospira ST-bin5 fig|2009322. 3. peg. 2770; Ophanophyta ohadii IS1 fig|1325564. 3. peg. 3733; Nitrospira japonica fig|43662. 9. peg. 1688; Pseudoalteromonas fish-killing fig|670. 134. peg. 4439; Vibrio parahaemolyticus fig|998520. 3. peg. 3325; Pseudoalteromonas agaricus fig|1723759. 3. peg. 401; Pseudoalteromonas P1-26 fig|672. 133. peg. 585; Vibrio vulnificus fig|1324960. 19. peg. 585; Aeromonas salmonis subsp. pectinolyticus 34mel fig|196024. 16. peg. 3965; Aeromonas hydrophila fig|654. 27. peg. 4266; Aeromonas veronii fig|1802253. 3. peg. 1045; Thiomonas RIFCSPLOWO2_12_36_12 fig|636. 16. peg. 3905;Edwardii tarda fig|1124958. 3. peg. 5012; Enteric Salmonella enterica subsp. Muenster serotype 0315 fig|573. 10007. peg. 225; Klebsiella pneumoniae fig|1946737. 3. peg. 4002; Luxenbergia spp. UBA1284 fig|1398203. 3. peg. 3712; Bacillus kraussei Quebec fig|615. 247. peg. 2151; Serratia marcescens fig|52441. 3. peg. 3752; Estuarine Nitrosomonas fig|1951948. 3. peg. 242; Mycelium spp. UBA2389 fig|165186. 29. peg. 27; Uncultured Ruminococcus spp. fig|2013842. 3. peg. 1881; HGW-Synergistetes-1 fig|411484. 7. peg. 436; Clostridium SS2/1 fig|460384. 4. peg. 447; Clostridium lavarroni fig|1761781. 3. peg. 2961; Clostridium DSM 8431 fig|1451. 25. peg. 614; Bacillus starch-degrading bacteria fig|1776378. 3. peg. 2009; Fukuoka Bacillus fig|1866315. 3. peg. 
2122; Bacillus sp. N35-10-4 fig|1034836. 4. peg. 4077; Bacillus starch-solubilizing XH7 fig|1397. 14. peg. 5097; Bacillus annuli fig|1497681. 5. peg. 772; Listeria monocytogenes fig|1053224. 3. peg. 4333; Bacillus cereus VD021 fig|1374. 4. peg. 2798; Pinococcus kucuri fig|458233. 11. peg. 419; Macrococcus casei JCSC5402 fig|417368. 6. peg. 944; Enterococcus thailandensis fig|1353. 16. peg. 736; Enterococcus quail fig|1639. 1307. peg. 2578; Listeria monocytogenes fig|1649188. 4. peg. 450; Listeria monocytogenes indica fig|333990. 5. peg. 1279; Sarcobacterium AT7 fig|1121085. 3. peg. 4805; Bacillus edinii DSM 18341 fig|659243. 6. peg. 1163; Bacillus siamensis fig|1965645. 3. peg. 1428; another genus An54 fig|1950664. 3. peg. 363; Pseudomonas aeruginosa UBA5918 fig|681398. 3. peg. 1596; Jiangxi Parudibacillum fig|1947481. 3. peg. 1596; Sphingobacterium UBA1498 fig|1946424. 3. peg. 2345; Dysgonomonas genus UBA4861 fig|188932. 3. peg. 968; Low-temperature soil bacteria fig|505249. 7. peg. 1802; Marine Toxoplasma fig|1802259. 3. peg. 374; Thiomonas RIFOXYD12_FULL_33_39 fig|1872629. 13. peg. 663; Toxoplasma fig|497650. 4. peg. 949; Sulfur Bacteria enrichment culture pure line C5 fig|1981711. 3. peg. 707; Pseudomonas B8 (2017) fig|287. 926. peg. 3808; Green bacillus fig|157782. 3. peg. 183; Pseudomonas paraxanthinus fig|1225174. 5. peg. 576; Pseudomonas mendocina S5. 2 fig|237610. 8. peg. 4301; Psychrotolerant Pseudomonas fig|1116369. 3. peg. 182; Hefflera 108 WP_080858354. 1; fig|1679460. 3. peg. 2715; Deep-sea marine bacillus fig|1811547. 3. peg. 510; Redsea-S28_B5 fig|93684. 8. peg. 518; Salt-tolerant rose-colored bright fungus EMZ69714. 1; Escherichia coli 174900 fig|103796. 87. peg. 3165; Pseudomonas syringae var. kiwifruit WP_078828851. 1; Pantoea pineapple fig|2018067. 3. peg. 1734; Pseudomonas FDAARGOS_380 fig|294. 255. peg. 5151; Pseudomonas fluorescens fig|287. 4271. peg. 5445; Green Pseudomonas aeruginosa fig|46677. 3. peg. 
3237; Pseudomonas chrysanthemi fig|83964. 10. peg. 849; Pseudomonas porri pathogenic variant fig|1932113. 4. peg. 2793; Pseudomonas PA1 (2017) fig|1712677. 3. peg. 189; Pseudomonas 2822-15 fig|1479235. 3. peg. 2741; Halomonas HL-48 fig|227946. 12. peg. 3247; Xanthomonas translucentus poae pathogenic variant fig|40324. 220. peg. 2801; Oligotrophomonas maltophilia fig|227946. 13. peg. 35; Xanthomonas translucentus poae pathogenic variant fig|487909. 15. peg. 4212; Xanthomonas translucentus pathogenic variant vimentin fig|40324. 145. peg. 2120; Oligotrophomonas maltophilia fig|1182783. 3. peg. 8; Xanthomonas campestris JX fig|1736581. 3. peg. 4144; Lysobacterium Root667 fig|470. 4256. peg. 2128; Acinetobacter baumanni fig|1804984. 3. peg. 4735; Burkholderia OLGA172 fig|1882792. 3. peg. 5959; Burkholderia CF145 fig|1458357. 5. peg. 7849;Caballeronia jiangsuensis fig|674703. 3. peg. 3992; Rhodozoa Z2-YC6860 fig|1230476. 3. peg. 595; Bradyrhizobium DFCI-1 fig|1752222. 3. peg. 1730; Rhizobiales bacteria Ga0077525 fig|1948848. 3. peg. 320; Patellar flora bacteria UBA6220 fig|1860092. 3. peg. 3966; α-Proteobacteria MedPE-SWcel fig|398. 7. peg. 3301; Tropical rhizobia fig|418630. 3. peg. 960; Giant Rhododendron fig|56. 40. peg. 5712; Soilbergia cellulose fig|1660160. 3. peg. 2510; Acidobacterium spp. SCN 69-37 fig|1661042. 3. peg. 2224; Pseudomonas NBRC 111127 fig|1712678. 3. peg. 4198; Pseudomonas 2822-17 fig|1736561. 3. peg. 128; Pseudomonas Root562 fig|76760. 8. peg. 1730; Pseudomonas rhodesianus fig|1295133. 4. peg. 7170; Pseudomonas faecalis JCM 18798 fig|1718917. 3. peg. 3132; Pseudomonas ICMP 460 fig|237306. 3. peg. 591; Pseudomonas syringae pathogenic variant peach fig|1079060. 3. peg. 1479; Pseudomonas savarroa v. bean pathogenic variant 1644R fig|1981714. 3. peg. 1068; Pseudomonas B5 (2017) fig|1419583. 3. peg. 4516; Pseudomonas mandellii PD30 fig|1718918. 3. peg. 4166; Pseudomonas ICMP 561 fig|64988. 7. peg. 76; Alkanophage fig|1961564. 3. peg. 
685; Desulfovibrioideae bacterium UBA5546 fig|2004648. 3. peg. 1747; Acinetobacter WCHA39 fig|1080187. 3. peg. 399; Cupricobacterium UYPR2. 512 fig|76114. 8. peg. 258; Aromatic aromatic bacteria EbN1 fig|196367. 9. peg. 6286; Caballeronia sordidicola fig|1217418. 3. peg. 694; Copperophilus HPC(L) fig|1752216. 3. peg. 4007; Nitrosomonasales Ga0074132 fig|1249621. 3. peg. 3614; Cupricobacterium HMR-1 fig|179879. 8. peg. 6514; Burkholderia anthina fig|1246301. 3. peg. 4482; Controversy over bacteriophage B4 WP_092746164. 1; Acidobacterium valerianum fig|536. 30. peg. 857; Chromobacterium violaceum fig|1961112. 3. peg. 115; UTPLA1 of the Phylum Planctomycetes fig|44574. 5. peg. 4575; Nitrosomonas vulgaris fig|265901. 4. peg. 190; Photorhabdus J15 fig|80852. 21. peg. 1289; Vibrio votansalis fig|1136159. 3. peg. 2534; Cyclovibrio 1F111 fig|24. 6. peg. 4539; Shewanella putrefaciens fig|888433. 3. peg. 1974; Pseudoalteromonas GutCa3 fig|196024. 6. peg. 3651; Aeromonas hydrophila fig|1352943. 3. peg. 5028; Vibrio harveyi E385 WP_088124663. 1; Vibrio cholerae fig|29497. 15. peg. 1220; Vibrio scintillans fig|670. 413. peg. 1227; Vibrio parahaemolyticus fig|584. 91. peg. 3337; Proteobacterium mirabilis fig|263819. 5. peg. 201; Yersinia alexei fig|1656094. 3. peg. 1449; Alteromonas synapomorpha fig|634. 5. peg. 607; Yersinia burnetii fig|630. 85. peg. 4080; Yersinia coli fig|1212491. 3. peg. 1847; Legionella fallopian tuberculosis LLAP-10 fig|1498499. 3. peg. 2812; Legionella nolanii fig|1844092. 4. peg. 3143; Pseudomonas 8 R 14 fig|1441629. 3. peg. 2119; Pseudomonas chicory JBC1 WP_092369835. 1; Pseudomonas selenogenum fig|477228. 3. peg. 2040; Pseudomonas stutzeri TS44 fig|317. 249. peg. 4299; Pseudomonas syringae fig|1597. 16. peg. 2055; Lactobacillus paracasei fig|1184720. 6. peg. 2708; Anhui Rhizobium fig|1566263. 3. peg. 185; Rhizobium NFR03 fig|1951216. 3. peg. 1622; Rhizobium bacteria UBA1909 fig|1219052. 3. peg. 3572; Sphingomonas pallidum NBRC 15498 fig|376620. 
8. peg. 122; Gluconobacter japonicus fig|1736587. 3. peg. 2638; Devospasella Root685 WP_011269850. 1; Xanthomonas campestris fig|1195246. 3. peg. 467; Shewanella spp. BL06 fig|1931276. 3. peg. 1294; Halocystis UPWRP_2 fig|1931204. 4. peg. 12; Syngenetic microorganisms fig|2052957. 3. peg. 3497; Pseudorabacterium MZDSW-24AT fig|1952825. 3. peg. 2772; UBA4205 of the family A. cerevisiae fig|1189622. 3. peg. 1716; Pseudomonas amygdaloids pathogenic variant 6605 fig|294. 173. peg. 2741; Pseudomonas fluorescens fig|294. 122. peg. 3307; Pseudomonas fluorescens fig|1198456. 3. peg. 4053; Pseudomonas glutenin fig|1855380. 3. peg. 1951; Pseudomonas Z003-0. 4C(8344-21) fig|1144330. 3. peg. 3879; Pseudomonas GM48 fig|86265. 3. peg. 2799; Pseudomonas thievery fig|1881035. 3. peg. 3817; Songjiang bacteria PDC51 fig|511. 8. peg. 774; fecal alkali-producing bacteria fig|1095552. 3. peg. 2955; Methylobacterium luteum IMV-B-3098 fig|1690268. 3. peg. 1172; Acidophage SD340 fig|871652. 3. peg. 1451; Sedimentation Poseidonocella fig|1946868. 3. peg. 175; Methylophaga UBA1490 fig|1924940. 3. peg. 1147; Marine microbial colonies fig|1912891. 7. peg. 5370; Sphingolipids fig|1236503. 3. peg. 1539; Acetobacterium serrata JCM 25330 fig|1745182. 3. peg. 1942; Paracoccus MKU1 fig|1112. 5. peg. 2875; Porphyromonas neolithospermum WP_051585410. 1; Sphingomonas paucimobilis fig|1082931. 4. peg. 3584; Halophilic Farbrachiatus B2 fig|1907665. 3. peg. 5475; Agrobacterium DSM 25558 fig|1841652. 4. peg. 3782; Agrobacterium 13-626 fig|1736312. 3. peg. 3441; Rhizobium Leaf262 fig|1768770. 3. peg. 4687; Pseudomonas CCH5-E12 fig|355591. 9. peg. 1867; Haemophilus versicolor fig|1869214. 4. peg. 1848; Rheinheimia fig|1946470. 3. peg. 3546; Rhodobacterium UBA2510 fig|1860090. 3. peg. 231; Roseobacillus MedPE-SWde fig|2020902. 8. peg. 1814; Genus Ponticaulis fig|940286. 3. peg. 3612; Bacillus 174Bp2 fig|1736380. 3. peg. 1842; Rhizobium Leaf453 fig|665126. 3. peg. 2283; Micrococcidiales hirschii fig|2029410. 
3. peg. 1956; Mesorhizobium WSM4311 WP_003169203. 1; Brevundimonas diminuta fig|1884373. 3. peg. 3317; Mesorhizobium YR577 fig|989436. 3. peg. 3203; Pseudomycin Ad5 fig|1736359. 3. peg. 3976; Rhizobium Leaf386 fig|104102. 12. peg. 3797; Tropical Acetobacter fig|1500305. 3. peg. 4736; Rhizobium OK665 fig|1842535. 30. peg. 6; Gemmatimonas RAC04 fig|70775. 16. peg. 91; Pseudomonas aeruginosa fig|287. 4262. peg. 2063; Pseudomonas aeruginosa WP_017702484. 1; Pseudomonas syringae fig|1357292. 3. peg. 4700; Pseudomonas syringae pathogenic variant pea strain PP1 fig|287. 2436. peg. 3554; Green bacillus fig|76758. 3. peg. 4722; Pseudomonas orientalis fig|1904755. 3. peg. 3469; Pseudomonas 43NM1 fig|47879. 37. peg. 693; Pseudomonas undulatus fig|1771311. 3. peg. 1935; Pseudomonas ATCC PTA-122608 fig|2008975. 3. peg. 1976; Pseudomonas Irchel 3E13 fig|1259798. 3. peg. 1121; Pseudomonas LAMO17WK12:I2 fig|1736487. 3. peg. 2103; Neospirillum Root189 fig|1706231. 5. peg. 1557; Janssenella CG23_2 fig|1804984. 3. peg. 4700; Burkholderia OLGA172 fig|54067. 3. peg. 2925; Staphylococcus aureus fig|40324. 357. peg. 1426; Oligotrophomonas maltophilia fig|1967657. 4. peg. 913; Enteric Salmonella enterica subspecies Telelkebir serotype fig|573. 14330. peg. 816; Klebsiella pneumoniae fig|615. 357. peg. 16; Serratia marcescens fig|1122616. 3. peg. 231; Ocelothionemus beijerii DSM 7166 fig|314276. 4. peg. 1389; Baltic Sea-derived bacteria OS145 fig|1038921. 4. peg. 2790; Pseudomonas chlororaphis subsp. aurea 30-84 fig|292. 72. peg. 3280; Burkholderia cepacia fig|1899355. 18. peg. 947; Marine Spirillaceae bacteria fig|2015356. 3. peg. 5401; Burkholderia AU33647 fig|206665. 3. peg. 1731;Desulfurization bacteria on the seafloor fig|1987165. 3. peg. 2564; Sphingolipids GW456-12-10-14-TSB1 fig|1283312. 3. peg. 2182; Sphingomonas vermifuss DC-6 fig|1223566. 3. peg. 1810; Bradyrhizobium CCGE-LA001 fig|76761. 16. peg. 744; Pseudomonas veronii PIY00499. 
1; Hydrophilales bacteria CG_4_10_14_3_um_filter_58_23 fig|305. 94. peg. 4778; Ralstonia solanacearum fig|1758178. 5. peg. 2546; Ethanol fast-growing Bacillus fig|1354263. 4. peg. 2524; Hafnia paraalvei ATCC 29927 fig|1125979. 3. peg. 1941; Rhizobium PDO1-076 fig|1338032. 3. peg. 3393; Vibrio parahaemolyticus O1:K33 strain CDC_K4557 fig|1898112. 54. peg. 3344; Rhodospirillaceae bacteria fig|1432558. 3. peg. 4265; Klebsiella pneumoniae ISC21 fig|333962. 3. peg. 2767; Providencia hirae fig|60552. 10. peg. 2414; Burkholderia vietnamense WP_011808964. 1; Helminbacter essenii fig|1844107. 4. peg. 2966; Pseudomonas 58 R 12 fig|1952916. 3. peg. 906; Intertrophobacteriaceae UBA5549 fig|458817. 8. peg. 542; Shewanella halifax HAW-EB4 fig|1674859. 3. peg. 1291; Spiro_03 of the order Spirochiales fig|1121434. 3. peg. 22; Desulfovibrio eleri DSM 10141 fig|1262899. 3. peg. 286; Fusobacterium CAG:439 fig|57320. 3. peg. 1084; Pseudodesulfovibrio deep fig|1736444. 3. peg. 3753;Acinetobacter Root1280 fig|1310670. 3. peg. 2122; Acinetobacter 907131 fig|505345. 6. peg. 150; Carinii genotype 3 fig|670. 887. peg. 4391; Vibrio parahaemolyticus fig|196024. 16. peg. 2750; Aeromonas hydrophila fig|663. 48. peg. 1321; Vibrio alginolyticus fig|28141. 133. peg. 4717; Cronobacter sakazakii fig|1117315. 3. peg. 3; Pseudoalteromonas marinum ATCC 14393 fig|1917164. 4. peg. 2739; Shewanella spp. UCD-KL21 fig|2006083. 3. peg. 3222; Photorhabdus CECT 9192 fig|584. 227. peg. 2146; Proteobacterium mirabilis fig|1792834. 4. peg. 1793; Sedimentary Marinella fig|1333513. 3. peg. 3775; Pseudoalteromonas maritima TAE56 fig|1305826. 3. peg. 1246; Streptomyces Amel2xC10 WP_048809063. 1; Microbacterium ginseng fig|1987376. 3. peg. 4246; Pseudonocardia N23 fig|164115. 3. peg. 6832; Nivea fungus fig|285676. 33. peg. 4994; Micromonospora sarellisii SFF52649. 1; Alnicotinella fig|1100822. 3. peg. 6408; Streptomyces AmelKG-E11A WP_098467790. 1; fig|1190417. 3. peg. 2916; Geoderma telluris SDS16714. 
1; Carbonaceous agrobacteria fig|692370. 5. peg. 1108; Dongtan Alternaria alternata fig|1736370. 3. peg. 1383; Sphingomonas Leaf412 fig|1759074. 3. peg. 2615; Sphingomonas genus HIX fig|1120928. 3. peg. 922; Acinetobacter gingerii DSM 14971 = CIP 107465 fig|470. 4268. peg. 2032; Acinetobacter baumanni fig|1217627. 3. peg. 995; Acinetobacter baumanni NIPH 67 fig|28450. 149. peg. 5786; Burkholderia mallei fig|2032650. 3. peg. 3543; Magnetococcales bacteria HCHbin5 fig|101571. 169. peg. 3909; Uboburkholderia fig|396597. 7. peg. 2128; Burkholderia bifida MEX-5 fig|869212. 3. peg. 3514;Tuneriella parva DSM 21527 fig|1196083. 80. peg. 1885; Alveus Nordgrassella fig|1304886. 3. peg. 1257; Desulfurized Corynebacterium DSM 7044 fig|555. 16. peg. 1362; Carrot fungus subsp. carotenoids fig|1421338. 3. peg. 151; Enterobacter aegypti L1 fig|443144. 3. peg. 785; Geobacter M21 fig|1265503. 3. peg. 1271; Corveria piezophila ATCC BAA-637 fig|55601. 100. peg. 1601; Vibrio anguillarum fig|299766. 9. peg. 4890; Enterobacter steigerwaltii subsp. fig|243231. 5. peg. 1360; Geobacter sulfadioxidans PCA fig|1263083. 3. peg. 558; Klebsiella varicella CAG:634 fig|57706. 9. peg. 1106; Citrobacter brunneri fig|1619244. 3. peg. 1323; Enterobacter baumannii SHO56340. 1; Vibrio pentatus fig|688. 15. peg. 906; Vibrio rosenbergii fig|663. 75. peg. 4800; Vibrio alginolyticus fig|1967612. 3. peg. 4508; Enteric Salmonella Houghton subspecies 50:z4,z23:-serotype fig|82985. 3. peg. 3508; Spring Prague bacteria fig|55207. 5. peg. 1840;β-angiophore fig|582. 25. peg. 388; Morganella morganii fig|1006598. 5. peg. 80; Serratia marcescens RVH1 fig|82977. 3. peg. 3268; Butyrospermum vulgare CRY53703. 1; Yarrowia intermedia fig|595494. 3. peg. 706; Toluenemonas australis DSM 9187 fig|1217694. 3. peg. 3300; Acinetobacter sp.CIP 64. 2 fig|470. 2679. peg. 715; Acinetobacter baumanni fig|1879050. 4. peg. 2797; Acinetobacter wuhouensis fig|2004650. 3. peg. 1818; Acinetobacter chinensis fig|648. 80. peg. 
1405; Aeromonas caviae fig|1217722. 3. peg. 1866; Pseudomonas S13. 1. 2 fig|294. 88. peg. 4234; Pseudomonas fluorescens fig|629262. 5. peg. 1917; Pseudomonas syringae Japanese pathogenic variant strain M301072 WP_053932309. 1; Pseudomonas coronatus fig|1844101. 3. peg. 4702; Pseudomonas 31 R 17 fig|380021. 13. peg. 6149; Protect Pseudomonas fig|287. 3716. peg. 2163; Green Pseudomonas aeruginosa fig|287. 3208. peg. 1464; Green bacillus fig|1952221. 3. peg. 555; Methylococcaceae UBA2780 fig|1869214. 3. peg. 3542; Rheinheimia fig|375286. 7. peg. 830; Janssenella Marseille fig|536. 26. peg. 4616; Chromobacterium violaceum fig|983548. 3. peg. 2977; Dokshima bacteria 4H-3-7-5 fig|307480. 5. peg. 1780; Aureobacterium frostii fig|1262921. 3. peg. 2213; Prevotella CAG:1185 fig|1965649. 3. peg. 4193; Butyrivibrio An62 fig|1951558. 3. peg. 3731; Chitinophage bacterium UBA4411 fig|1950669. 3. peg. 2035; Pseudomonas aeruginosa UBA6192 fig|1869230. 3. peg. 3025; Aureobacterium CBo1 fig|1500294. 3. peg. 2814; Aureobacterium YR485 fig|1756149. 11. peg. 2545; Elizabethella bruuniana fig|1137281. 3. peg. 1436; Jiemingjiao yellow seawater bacteria fig|1964365. 5. peg. 2525; Schnidella fig|28450. 428. peg. 5049; Burkholderia pseudomallei fig|1628751. 3. peg. 813; Lin's Nostoc z16 fig|60137. 10. peg. 1138; Pontiacillus sulfatide fig|1580596. 3. peg. 2701; Fish brown rod fungus fig|1041141. 4. peg. 4741; Rhizobium vulgaris biotype 128C53 WP_063290764. 1; Unclassified Pseudomonas fig|1816219. 4. peg. 1873; Corveria spp.PAMC 21821 fig|651. 3. peg. 13; Aeromonas intermedius fig|134375. 17. peg. 3222; Achromobacterium WP_011296194. 1; Copper-eating bacteria pinatubonensis fig|1513890. 4. peg. 2322; Pseudomonas chlororaphis fish subspecies fig|294. 193. peg. 980; Pseudomonas fluorescens fig|287. 2516. peg. 114; Green Pseudomonas aeruginosa fig|2006083. 3. peg. 3219; Photorhabditis spp. CECT 9192 fig|458817. 8. peg. 538; Shewanella halifax HAW-EB4 fig|1073383. 3. peg. 
1289; Aeromonas veronii AMC34 fig|190893. 14. peg. 2110; Vibrio corallii fig|663. 144. peg. 2659; Vibrio alginolyticus fig|55601. 106. peg. 926; Vibrio anguillarum fig|669. 51. peg. 5531; Vibrio harveyi fig|1250059. 5. peg. 3511; Myxobacterium MAR_2009_124 fig|906888. 6. peg. 2449;Nonlabens ulvanivorans WP_042276051. 1;SedimentNonlabens fig|1953167. 3. peg. 1254; Pseudomonas aeruginosa UBA6221 fig|991. 14. peg. 629; Echinococcus fig|1121890. 3. peg. 5; Flavopsora frigida DSM 17623 fig|387094. 4. peg. 1115; Flavobacterium hussifolium fig|1946558. 3. peg. 2413; Flavobacterium UBA7665 fig|253. 27. peg. 2882; Indole-producing Aureobacterium fig|1685010. 5. peg. 4376; Ice goldenrod fig|1500289. 3. peg. 4103; Aureobacterium OV705 fig|1500298. 3. peg. 2823; Aureobacterium YR561 fig|1797331. 3. peg. 2180; Pseudomonas GWE2_29_8 fig|1947498. 3. peg. 1366; Sphingobacterium spp. UBA4616 WP_074239321. 1; Niab Chitinophage fig|192149. 7. peg. 174; Murina fig|718222. 3. peg. 4924; Bacillus cerevisiae TIAC219 fig|1053210. 3. peg. 211; Bacillus cereus HuB4-10 fig|2026089. 3. peg. 6077; Bacillus XY044 fig|1938610. 3. peg. 3678; Flavobacterium LM5 fig|1947482. 3. peg. 632; Sphingobacterium UBA1575 fig|1948844. 3. peg. 823; Patellar flora bacteria UBA6130 fig|986. 7. peg. 83; Flavobacterium johnsonii fig|1950382. 3. peg. 461; Pseudomonas aeruginosa UBA1181 fig|1947145. 3. peg. 376; Prevotella UBA3765 fig|1122989. 3. peg. 367; Prevotella oralis DSM 18711 = JCM 12252 fig|1896974. 3. peg. 2001; Pseudomonas 43_108 fig|2014804. 3. peg. 4306; Levinellaceae SD302 fig|1428. 517. peg. 2380; Bacillus thuringiensis fig|1428. 574. peg. 3351; Bacillus thuringiensis fig|1428. 590. peg. 5685; Bacillus thuringiensis fig|720554. 3. peg. 188; Clostridium forskohlii DSM 19732 fig|1122203. 4. peg. 2283; Halococcus halophilus DSM 16375 fig|1462525. 3. peg. 3489; Deep-sea Bacillus TM-1 fig|1395513. 3. peg. 363; Lactobacillus lactis DSM 442 fig|1262834. 3. peg. 1229; Clostridium CAG:715 SCI87282. 
1; Uncultured Roseburia fig|1952116. 3. peg. 2349; Lachnospiraceae bacterium UBA6480 WP_069150959. 1; Lachnospiraceae fig|1265. 10. peg. 3100; yellow Ruminococcus fig|1120998. 3. peg. 2858; Anaerovorax odorimutans DSM 5092 WP_072702499. 1; Butyrivibrio hungarianensis fig|1232453. 3. peg. 2795; Clostridiales VE202-21 fig|39485. 11. peg. 251; Lachnospiraceae WP_072832325. 1; fig|1509. 24. peg. 2487; Clostridium sporogenes SCI88558. 1; Uncultured Clostridium spp. fig|1490. 6. peg. 2635; Bifermentative Clostridium fig|1947399. 3. peg. 1322; Clostridium hengenii UBA3548 fig|1953138. 3. peg. 796; Pseudomonas aeruginosa UBA1312 fig|1950875. 3. peg. 364; Clostridiales UBA4139 fig|1396. 1518. peg. 4860; Bacillus cereus fig|1305675. 3. peg. 2174; Bacillus solimangrovi fig|1423. 436. peg. 4365; Bacillus subtilis fig|361277. 6. peg. 1539; Geobacter saccharophilus fig|1392. 356. peg. 4724; Bacillus anthracis fig|1053189. 3. peg. 4219; Bacillus cereus BAG5X1-1 WP_079442297. 1; Chromium-reducing Clostridium fig|1953262. 3. peg. 783; Candidate Omnitrophica bacteria UBA1562 fig|1797955. 3. peg. 2304; RIFOXYA12_FULL_51_18 fig|1953111. 3. peg. 2436; Acidobacterium spp. UBA7540 WP_099010551. 1; Escherichia coli retron-Eco1 (Ec86) fig|1005565. 3. peg. 1153; Escherichia coli 3006 fig|158822. 8. peg. 1905; Sindiaceae neislui fig|1444060. 3. peg. 4830; Escherichia coli 4-203-08_S1_C1 fig|29484. 22. peg. 4571; Yersinia freundii fig|529823. 3. peg. 332; Vibrio OA-2007 fig|48296. 218. peg. 3142; Acinetobacter pitei fig|550. 518. peg. 3445; Enterobacter vulvae fig|204773. 6. peg. 4; Arsenic-oxidizing Bacillus serrata fig|670. 190. peg. 4348; Vibrio parahaemolyticus fig|44577. 7. peg. 209; Urea Nitrosomonas fig|1125747. 3. peg. 1; Algae-degrading water-dwelling bacteria NO2 fig|1338034. 3. peg. 2437; Vibrio parahaemolyticus O1:Kuk strain FDA_R31 fig|1952844. 3. peg. 2619; Rhodotorulaceae UBA5533 fig|1288788. 3. peg. 2384; Vibrio parahaemolyticus 3631 fig|644. 31. peg. 
975; Aeromonas hydrophila fig|498292. 3. peg. 28; Swinging yellow bacillus fig|1948088. 3. peg. 4515; Firmicutes bacteria UBA6132 fig|1408433. 3. peg. 3094; Saffron yellow thread fungus catalasitica ATCC 23190 WP_074236572. 1; Aureobacterium zeae fig|1127353. 3. peg. 1738; Enteric Salmonella enterica subsp. Newport serotype #11-4 fig|1881110. 4. peg. 120; Pantoea sesame fig|34038. 6. peg. 27; Aquatic Rhabditis elegans fig|630. 95. peg. 4256; Yersinia coli fig|149387. 11. peg. 1139; Enteric Salmonella enterica subspecies Brandenburg serotype fig|1343738. 3. peg. 2232; Vibrio cholerae 2012EL-1759 retron-Vch3 (Vc137) fig|1423. 175. peg. 4339; Bacillus subtilis fig|2021695. 3. peg. 3399; Bacillus 7894-2 fig|189426. 10. peg. 597; Odor-like Bacillus fig|2020949. 3. peg. 856; Rombutsiella weinsteinii fig|1243664. 3. peg. 1004; Bacillus martensii fig|1855345. 3. peg. 2971; Bacillus sp. RRD69 fig|1946358. 3. peg. 2514; Clostridium UBA4108 fig|1520. 90. peg. 2502; Clostridium beijerinckii fig|79672. 3. peg. 288; Bacillus suyungensis serotype Medellin fig|189426. 19. peg. 3973; Odor-like Bacillus fig|1497. 3. peg. 4049; Clostridium formoaceticum fig|169760. 4. peg. 4269; Bacillus asterospora WP_073588670. 1; Anaerocolumna xylanovorans fig|1950815. 3. peg. 1585; Clostridiales UBA1341 fig|1897004. 3. peg. 2166; Fungus 45_250 fig|1946293. 3. peg. 290; Catarrhalis spp. UBA7571 fig|1796620. 3. peg. 3489; Acute bacillus in mice fig|76857. 53. peg. 2245; Fusobacterium nucleatum polymorphic subsp. fig|2013784. 3. peg. 1260; Firmicutes bacteria HGW-Firmicutes-3 fig|79884. 3. peg. 1106; Bacillus pseudoalkalinophilus fig|135461. 47. peg. 1552; Bacillus subtilis subsp. subtilis WP_016122013. 1; Bacillus cereus fig|1499688. 3. peg. 3214; Bacillus LF1 fig|1318. 8. peg. 1495; Streptococcus parahaemolyticus fig|1381091. 3. peg. 1173; Streptococcus equi subspecies zooepidemicus SzAM60 fig|1409369. 3. peg. 815; Staphylococcus aureus AMMC6050 fig|1497681. 5. peg. 
3014; Listeria monocytogenes fig|1095727. 3. peg. 409; Streptococcus SK643 fig|1318. 12. peg. 313; Streptococcus parahaemolyticus fig|1681184. 3. peg. 4899; Lysine Bacillus ZYM-1 fig|561879. 29. peg. 3569; Bacillus safari fig|1884359. 3. peg. 2978; Psychrobacterium OK028 fig|29367. 3. peg. 1112; Clostridium pomegranate fig|225345. 3. peg. 1103; Chromium-reducing Clostridium fig|1345695. 10. peg. 2438; Clostridium saccharobutylicum DSM 13864 fig|119641. 3. peg. 784; Clostridium uliginosum fig|1761781. 3. peg. 88; Clostridium DSM 8431 fig|1946346. 3. peg. 2999; Clostridium UBA1056 fig|1492. 48. peg. 3952; Clostridium butyricum fig|1121302. 3. peg. 4511; Clostridium cavendishi DSM 21758 fig|398512. 4. peg. 5301; Pseudomonas aeruginosa ATCC 35603 = DSM 2933 fig|1946357. 3. peg. 500; Clostridium UBA3947 fig|642492. 3. peg. 3338; Cellulosilyticum lentocellum DSM 5427 fig|1946690. 3. peg. 1553; Intestinal core bacteria genus UBA3320 fig|397290. 3. peg. 150; Lachnospiraceae A2 fig|97138. 3. peg. 1713; Clostridium ASF356 fig|1965545. 3. peg. 499; Theileria An114 fig|1047063. 3. peg. 240; WS1 bacteria JGI 0000059-K21 fig|1192034. 3. peg. 1508; Chondroitinib spiculate DSM 436 AAA88323. 1; Myxococcus flavus retrotransposons-Mxa2 (Mx65) fig|33. 8. peg. 8196; Myxococcus aurantiacus fig|215803. 3. peg. 1485; Salt water slime bacteria fig|1406225. 3. peg. 2150; Purple Protothecoides Cb vi76 fig|1952931. 3. peg. 5137; Verrucomicrobia 3 bacteria UBA6082 fig|1972460. 3. peg. 279; Anaerobic bacteria 4572_78 fig|1950201. 3. peg. 3297; Anaerobic Rhizobacteriales bacterium UBA2796 WP_015247705. 1; fig|1799658. 3. peg. 2777; SCGC AG-212-F19 of the family Planctomycetes fig|214688. 26. peg. 6659; Cryptococcus UQM 2246 fig|2023130. 3. peg. 4250; Pyricularia MGV fig|52. 7. peg. 9046; Chondroitinibacterium safflower fig|448385. 16. peg. 2914; Soilbergia cellulose So ce56 fig|888845. 4. peg. 14202; Microcystis rosea fig|1752210. 3. peg. 1275; δ-Proteobacteria Ga0077539 fig|1391654. 3. peg. 
3562;Labilithrix luteola fig|1752218. 3. peg. 3670; Fungus family Ga0077529 WP_006981058. 1; Xanthomonas flavus fig|1952939. 3. peg. 2584; Verrucomicrobiaceae UBA1938 fig|2024858. 3. peg. 3711; Genus Sandaracinus WP_009096166. 1; Pyrrophyta SWK7 fig|595453. 3. peg. 1506; Pyricularia SM50 fig|1263868. 3. peg. 4100; European red pear-shaped fungus SH398 fig|167547. 3. peg. 303; Marine Prochlorococcus strain MIT 9311 fig|1499501. 3. peg. 459; Prochlorococcus SS52 fig|1905359. 3. peg. 4335; Marine bacteria AO1-C WP_002700020. 1; Microtremorum marineum fig|1913989. 145. peg. 263; Gammaproteobacteria WP_073154989. 1; Seinosporium philum fig|46223. 3. peg. 3648; Biqi high-temperature yellow micrococcus fig|1329796. 3. peg. 1947; Marseille Risungbinella fig|1123252. 3. peg. 3225; Kribissoma shimamotoi DSM 45090 fig|714067. 3. peg. 3719; Kloppensteadia ivory fig|1341151. 3. peg. 628; Leucocephalus saccharus 1-1 fig|2026763. 3. peg. 3089; Myxococcal bacteria fig|1797895. 3. peg. 2518;δ-Proteobacteria RIFOXYA12_FULL_58_15 fig|373672. 4. peg. 4082; Aureobacterium gambryanum fig|1416778. 5. peg. 4374; Peanut goldenrod fig|1603293. 4. peg. 829; Flavobacterium 316 CCB70859. 1; Flavobacterium gillophilum FL-15 fig|1986952. 3. peg. 951; Sphingobacteriaceae GW460-11-11-14-LB5 fig|1476464. 3. peg. 4304; Xixi soil bacteria fig|1761785. 3. peg. 3112; Flavobacterium ov086 fig|1664068. 3. peg. 3690; Bacteria 336/3 fig|880071. 3. peg. 3500;Bernardetia litoralis DSM 6794 fig|1121902. 3. peg. 2906; Eisenbergia elegans DSM 3317 fig|1509483. 4. peg. 1923; Bacillus spp. BAB-3569 fig|1166018. 3. peg. 5312;Fibrella aestuarina BUZ 2 fig|634771. 3. peg. 967; Chitinophaga essenii fig|29529. 3. peg. 396; Chitinophages arvensis fig|1891659. 3. peg. 6032; Chitinophage CB10 fig|2033437. 3. peg. 3442; Chitinophage MD30 fig|1004. 4. peg. 1677; Chitinobacterium ginseng fig|1881041. 3. peg. 3698; Chitinophage YR627 fig|1123078. 3. peg. 2010; Archaeotrophic fungus DSM 19591 fig|354355. 3. peg. 
1846; Rongzhou Agricultural Research Institute Mycobacterium fig|1951546. 3. peg. 1574; Chitinophage bacterium UBA1946 fig|1812911. 3. peg. 633; Flavobacterium CACIAM 22H1 fig|477680. 4. peg. 4668; Lineomonas imperfecta fig|221126. 7. peg. 3781; Select algae bacteria fig|342954. 4. peg. 734; Algae lake bacteria fig|1871037. 5. peg. 2180; Flavobacteriaceae fig|669041. 3. peg. 843; Myxobacterium bicentrum WP_074538568. 1; Baltic fiber phage fig|1248440. 3. peg. 1511; Franzmannii ATCC 700399 fig|1121007. 3. peg. 1574; Musselsia marinum DSM 19832 fig|688867. 3. peg. 2332; Ohtaekwangia, South Korea fig|926565. 3. peg. 717; Myxococcosis fibrosus DSM 11118 fig|1257021. 3. peg. 5265; Pyrochromocytaceae bacteria 311 fig|2044937. 5. peg. 2350; Candidate division KSB3 bacteria fig|1499966. 3. peg. 180; Candidate Moduliflexus fig|1948269. 3. peg. 1254; Verrucomicrobia UBA6053 fig|694433. 3. peg. 3103; Giant saprophytic spirochetes DSM 2844 fig|2008677. 3. peg. 3337; Songjiang nodosa fig|946333. 3. peg. 3570; Rhizobium gloeosporum fig|1736433. 3. peg. 5559; Rhizobium Root1221 fig|1500265. 3. peg. 5760; Methylobacterium YR605 fig|1121349. 4. peg. 2836; Composting Commissurotomus DSM 21721 fig|1082851. 3. peg. 91; serinivorans fig|1121480. 5. peg. 5468; Pseudomonas pseudomallei DSM 15887 fig|2045208. 3. peg. 1247; Purple-black Marseilles fig|1736455. 3. peg. 3692; Marsella Root133 fig|34073. 25. peg. 8569; Debate on phages fig|1884311. 3. peg. 7121; Coprophagnum OK202 fig|1123487. 3. peg. 1763;Uliginosibacterium gangwonense DSM 18521 fig|2029111. 3. peg. 3032; Combretum bacterium NML120219 fig|1977087. 20. peg. 1226; Proteobacteria fig|754436. 4. peg. 4454; Non-luminescent Photorhabditis spp. fig|265726. 7. peg. 1038; Halotoxinia fig|1121867. 3. peg. 59; Enterobacter calvei DSM 14347 fig|1238431. 3. peg. 2655; Vibrio nigromaculata BLFn1 fig|1384589. 3. peg. 2721; [Evansella] Oil gourd fig|1261127. 3. peg. 2947; Citrobacter malonic acid-free Y19 fig|349521. 8. peg. 
3588;Hahella chejuensis KCTC 2396 fig|525918. 3. peg. 1501; Caldifontis fig|1737490. 4. peg. 4974; Agrobacterium spp. fig|251229. 3. peg. 427; PCC 7203 of Pseudochromococcus pyogenes fig|1245923. 3. peg. 9587; Mikimoto's pseudoclade VB511283 fig|1503470. 5. peg. 10896; Cyanobacteria TDX16 fig|2005460. 3. peg. 1118; Chondrocystis NIES-4102 fig|179408. 3. peg. 4679; PCC 7112 fig|1612423. 3. peg. 5384; Lin's Nostoc z1 fig|63737. 11. peg. 472; Nostoc punctata PCC 73102 fig|224013. 5. peg. 7163; Pond-grown Nostoc CENA21 fig|1932621. 3. peg. 7363; Nostoc T09 fig|373994. 3. peg. 3383; PCC 7116 fig|2005463. 3. peg. 257; NIES-4105 fig|2005459. 3. peg. 7019; Monophyllium NIES-4075 fig|184925. 3. peg. 2602; Pseudomonas flexneri PCC 9212 fig|454136. 5. peg. 3127; Suspicious phyllophyta IAM M-71 fig|203124. 6. peg. 2732; Red Sea Trichodesmium IMS101 fig|2040638. 3. peg. 3067; Buchnerella FEM_GT703 Fig|1880991. 4. peg. 2927; Cyanobacteria tremuloides USR001 fig|1173028. 3. peg. 7115; Pseudomonas PCC 10802 fig|568701. 4. peg. 2073;Moorea bouillonii PNG fig|927677. 3. peg. 4187; Synechocystis PCC 7509 fig|179408. 3. peg. 6267; Dark Green Algae PCC 7112 fig|1710894. 3. peg. 2079; Aphanizomenon LD13 fig|1947888. 3. peg. 4484; Cyanobacteria UBA6047 fig|1705388. 3. peg. 1178; Pseudohylostomium SR001 fig|454136. 5. peg. 4162; Suspicious phyllophyta IAM M-71 fig|2005458. 3. peg. 378; Nostoc NIES-4103 fig|1947874. 3. peg. 4590; Cyanobacteria UBA1583 fig|1781255. 3. peg. 802; Desertifilum genus IPPAS B-1220 fig|1128427. 4. peg. 2346; ESFC-1 fig|1946321. 3. peg. 3928; Chlorobacterium UBA7656 fig|118173. 3. peg. 1030; Pseudoanabaena PCC 6802 fig|1922337. 4. peg. 4802; Genus Hensonii fig|927668. 3. peg. 2766; Pseudoanabaena biceps PCC 7429 fig|1173020. 3. peg. 6191; Microcystis PCC 6605 fig|329726. 14. peg. 4440; Marine Acaryochloris MBIC11017 fig|215803. 3. peg. 1649; Salt water slime bacteria fig|1920190. 3. peg. 9548; Protothecoides Cb G35 fig|1961464. 3. peg. 
5181; Myxococcal bacteria UBA2376 fig|765913. 3. peg. 336; Rhodococcus delbrueckii AZ1 fig|1396141. 3. peg. 2891; Halostyra BvORR071 fig|1961463. 3. peg. 5253; Myxococcales UBA1671 AAL40743. 1; Nephrolepis erossima retrotransposons-Nex2 (Ne144) fig|54. 3. peg. 4123; Erosion fungus fig|53367. 3. peg. 3417; Asanoella rustaflora fig|460265. 11. peg. 3882; Methylobacterium nodosa ORS 2060 fig|298794. 3. peg. 462; Methylobacterium mutans fig|190148. 4. peg. 3492; Bradyrhizobium paxllaeri fig|1075417. 3. peg. 445; Catalinimonas alkaloidigena fig|1429438. 4. peg. 7505; Candidate endothelial factor fig|1977087. 12. peg. 1756; Proteobacteria fig|92487. 3. peg. 3972; Sulfur fungus eikelboomii fig|1977087. 20. peg. 1473; Proteobacteria fig|1123400. 3. peg. 3276; Flexible Thiofilum DSM 14609 fig|34062. 8. peg. 73; Oslo Moraxella fig|1699623. 3. peg. 1502; Psychrobacterium P11G3 fig|1123509. 3. peg. 848; Zooshikella ganghwensis DSM 15267 fig|2026735. 3. peg. 2222;ζ-Proteobacteria fig|1977087. 12. peg. 2982; Proteobacteria fig|2026763. 4. peg. 1195; Myxococcal bacteria fig|1977087. 12. peg. 510; Proteobacteria fig|1123508. 3. peg. 7252; Zavarzinella formosa DSM 19928 fig|214688. 26. peg. 3091; Cryptococcus UQM 2246 fig|1908690. 5. peg. 1204; Fimbriiglobus ruber fig|1805126. 3. peg. 4431;δ-Proteobacteria CG2_30_63_29 fig|1882752. 4. peg. 1962; Singulisphaera genus GP187 fig|1636152. 3. peg. 5364; Fungus SH-PL62 APR75442. 1; Microcystis rosea fig|54. 3. peg. 8798; Erosion fungus fig|980254. 4. peg. 4083;Roseimaritima ulvae fig|1856297. 3. peg. 3627; Gammaproteobacteria 45_16_T64 fig|1219077. 3. peg. 1945; Vibrio fargesii NBRC 104587 fig|1334629. 3. peg. 167; Myxococcus aurantiacus 124B02 AAA25405. 1; Myxococcus flavus retrotransposons - Mxa1 (Mx162) fig|378806. 16. peg. 4444; Orange-colored pile fungus DW4/3-1 WP_002615305. 1; Auricularia aurantiaca retrotransposase-Sau1 (Sa163) fig|48. 3. peg. 757; Transitional Protothecoides fig|448385. 16. peg. 
2083; Soilbergia cellulose So ce56 fig|52. 7. peg. 5100; Chondroitinibacterium safflower fig|1752210. 3. peg. 5621; δ-Proteobacteria Ga0077539 fig|2024858. 3. peg. 6345; Genus Sandaracinus WP_012826728. 1; Halobacterium ochraceum fig|927083. 3. peg. 3408;Sandaracinus WP_006977315. 1; Paracystis pacifica fig|1400863. 5. peg. 627; Candidate denitrifying competitive Bacillus Run_A_D11 fig|1961463. 3. peg. 4303; Myxococcal bacteria UBA1671 fig|1898731. 3. peg. 3099; Competitive Bacillus MCBA15_001 fig|1279028. 3. peg. 3374; Competitive Bacillus 314Chir4. 1 fig|1898733. 3. peg. 2056; Competitive Bacillus MCBA15_004 fig|1795630. 3. peg. 3476; Pseudomonas PAMC 28766 fig|2033654. 3. peg. 3461; Competitive Bacillus 'Ferrero' fig|1736329. 5. peg. 1436; Leaf304 fig|1736292. 3. peg. 1083; Lassa Leaf185 fig|1736327. 3. peg. 206; Lassa Leaf296 fig|1736311. 3. peg. 3668; Competitive Bacillus Leaf261 fig|1736308. 3. peg. 2333; Psoraleaf254 fig|656366. 8. peg. 2905; Alpine arborvitae fig|494023. 3. peg. 138; Antarctic Glutamibacterium ASN40093. 1; Arthrobacter 7749 fig|1494608. 3. peg. 465; Arthrobacter PAMC 25486 fig|656366. 4. peg. 2620; Alpine arborvitae fig|656366. 3. peg. 1944; Alpine arborvitae fig|1132441. 3. peg. 1888; Arthrobacter 35W fig|1704044. 3. peg. 520; Arthrobacter ERGS1:01 fig|1496689. 3. peg. 681; Arthrobacter L77 fig|1681197. 3. peg. 149; Arthrobacter RIT-PI-e fig|37921. 12. peg. 1481; Agile rod fungus fig|1736303. 3. peg. 982; Arthrobacterium Leaf234 fig|1312978. 3. peg. 1472; Arthrobacter H41 fig|1348338. 3. peg. 1472;Rafsonia rubber CMS 76R fig|1452536. 3. peg. 1955; Microbacterium Cr-K20 fig|1736525. 3. peg. 446; Revsonella Root4 fig|1529318. 3. peg. 434; Thermobacterium MLB-32 fig|1267973. 3. peg. 3479; Arthrobacter H5 fig|150121. 3. peg. 1900; Agrobacterium prausnitzii fig|123316. 3. peg. 955; Agrelia VKM Ac-2052 fig|1052260. 3. peg. 3617; Soil Klenkia fig|1566299. 3. peg. 3962; Marine Klenkia fig|1736356. 3. peg. 3150; Modena Leaf380 fig|1736354. 3. peg. 
1787; Geoderma Leaf369 fig|479431. 6. peg. 3115; Nakamurella multipartita DSM 44233 fig|1090615. 3. peg. 2397; Nakamurella panacicegetis fig|1306174. 4. peg. 4778; Aurantiacum JCM 3230 fig|546871. 3. peg. 1120;Flemingella flavum fig|630515. 4. peg. 525; Soil Pleurotus eryngii fig|546874. 3. peg. 1181; Freidensis sagamiharensis BAK35674. 1; Phosphate-accumulating NM-1 fig|1380390. 4. peg. 72; Soil Rhodococcales bacteria URHD0059 fig|1283299. 3. peg. 2784; Bindella woodi Iso977N fig|929712. 3. peg. 3165; DSM 18081 of DSM 18081 fig|1123262. 3. peg. 3125; Soil Rhodobacterium DSM 22325 fig|1861. 4. peg. 5240; Dark-striped dermatophyte fig|1137993. 4. peg. 1318; African dermatophytes fig|1070870. 3. peg. 778; Dermatophagoides nigromaculata fig|1190417. 3. peg. 1785; Geodermatophytes telluris fig|477641. 3. peg. 1023; Marine Mordecai bacteria fig|1798228. 3. peg. 3756; Staphylococcus DSM 46838 WP_091929708. 1; Staphylococcus DSM 46786 SHH20361. 1; Endophytes of leprosy tree fig|1844. 3. peg. 1151; Nocardia lutea fig|748909. 6. peg. 1418; Nocardia alpina fig|402596. 3. peg. 987; Nocardia albicans fig|1736322. 3. peg. 1963; Nocardia Leaf285 fig|1445613. 3. peg. 3490; Marine South China Sea Institute of Bacteria SE31 fig|543632. 4. peg. 9742; Subtropical zoospermic actinomycetes fig|1036182. 3. peg. 2958; dark orange motile actinomycetes fig|1246995. 3. peg. 737; Actinozoa friuliensis DSM 7358 fig|56427. 3. peg. 3052; Coccidioides cyanobacteria subsp. cyanobacteria fig|1710355. 3. peg. 2225; Actinomycetes TFC3 fig|649831. 3. peg. 2352; Actinomycetes N902-109 fig|35754. 4. peg. 6321; Dactylocystis citrinum fig|1881. 4. peg. 2703; Micromonospora chlororaphis fig|47863. 3. peg. 3975; Micromonospora sphaeroides fig|285665. 4. peg. 2050; Micromonospora spp. fig|1192034. 3. peg. 4668; Chondroitinib spicata DSM 436 fig|1198133. 3. peg. 2442; Myxococcus flavus DZ2 fig|33. 8. peg. 521; Myxococcus aurantiacus fig|394193. 3. peg. 7794; Amycota saalfeldensis fig|369932. 4. peg. 
5621; Niigata pseudomycotic acid bacteria fig|1238180. 3. peg. 5340; Amycolatopsis coelicolor DSM 43854 fig|589385. 3. peg. 7940; Xylan-producing Amycolatopsis fig|1068980. 3. peg. 1527; Amycolatopsis nigromaculata CSC17Ta-90 fig|1854586. 3. peg. 2100; Antarctic pseudomycotic bacteria fig|587909. 3. peg. 3086; Desert Jade Bacteria fig|2030. 3. peg. 3051; Xerocystis fig|1382595. 4. peg. 3164; Saccharopolyspora rubrum D WP_013675061. 1; Pseudonocardia spp. fig|1660131. 3. peg. 2805; Pseudonocardia SCN 72-86 fig|366584. 3. peg. 4349; Pseudonocardia oxyphylla fig|1885031. 4. peg. 5241; Pseudonocardia Ae331_Ps2 fig|1690815. 5. peg. 5350; Pseudonocardia HH130630-07 fig|1123023. 3. peg. 3229; Pseudonocardia acacia DSM 45401 fig|1449976. 3. peg. 8114; Kutzneria albida DSM 43870 WP_007238159. 1; fig|1220583. 3. peg. 1582; Gordonia aichiensis NBRC 108223 GAB07179. 1; Gordonia sludge NBRC 15530 fig|1223545. 3. peg. 704; Soil Gordonia NBRC 108243 fig|1223540. 3. peg. 3237;Gordonella desulfuricans NBRC 100010 fig|1112204. 3. peg. 4913; Gordonia polyisoprene-feedingensis VH2 AFR49048. 1; Gordonia KTR9 fig|402289. 3. peg. 1558; Rhodococcus HA99 fig|1077144. 3. peg. 224; Dietzella alimentaria 72 fig|1344003. 3. peg. 1864; Pinellia fig|1463823. 3. peg. 3407; Microbial spores NRRL B-24597 fig|1903117. 3. peg. 1566; William's genus 1138 fig|1603258. 4. peg. 1828; Williamsella herbipolensis fig|644548. 3. peg. 652; Clouded Leopard Feces Gordonia NRRL B-59395 fig|1136941. 3. peg. 1310; Gordonia phthalatica fig|1223542. 3. peg. 3439; Sewage Gordonia NBRC 108250 fig|47312. 10. peg. 4232; Pulmonaria fig|57704. 14. peg. 421; Tyrosine-resistant T. Tsukamura fig|521096. 6. peg. 2443; slightly changed Tsukamura fungus DSM 20162 fig|1123241. 3. peg. 3642; Nakamurella lactea DSM 19367 fig|1210073. 4. peg. 1031; Nocardia salinarum NBRC 100378 fig|1206740. 4. peg. 4617;Thai Nocardia NBRC 100428 fig|1210064. 4. peg. 2434; Nocardia altamilis NBRC 108246 fig|1123258. 3. peg. 
1651; Neocystis DSM 44881 = NBRC 103563 fig|1443888. 3. peg. 2891; Rhodococcus fasciatus 02-815 fig|1517936. 4. peg. 882; Rhodococcus CUA-806 fig|398843. 6. peg. 4214; Rhodococcus kyonii fig|1813677. 3. peg. 4031; Rhodococcus EPR-157 WP_008711873. 1; Unnamed organism fig|1381122. 3. peg. 6103; Rhodococcus erythropolis DN1 fig|1736210. 3. peg. 2766; Rhodococcus Leaf7 fig|1736300. 3. peg. 1279; Rhodococcus Leaf225 fig|1219012. 3. peg. 1705; Rhodococcus NBRC 14404 fig|1219023. 3. peg. 2791; Rhodococcus spp. NBRC 100604 fig|616997. 3. peg. 2548; Altamira fig|1303689. 4. peg. 2934; Korean Rhodococcus JCM 10743 = NBRC 100607 fig|644. 85. peg. 4392; Aeromonas hydrophila.

Using sequence listings to identify homologous RT/ncRNA pairs

Cognate pairs of ncRNA and RT (amino acid or nucleotide sequence) can be readily identified from the information presented in the sequence listing by identifying those ncRNA sequences in Table B and those RT sequences in Table A that have the same genome accession number, as indicated for each sequence in the sequence listing. In various other aspects, the retron RT component and the ncRNA component can be from different bacterial species, that is, not found together as a cognate pair in nature. In other embodiments, the retron RT component and the ncRNA component can both be from the same phylogenetic functional type (e.g., type I-A, type I-B1, type I-B2, type I-C, etc.). For example, a recombinant retron-based genome editing system may comprise a retron RT from type I-A (i.e., SEQ ID NOs: 3980-4178 for AA and SEQ ID NOs: 11231-11429 for NT; see Table A) and a retron ncRNA also from type I-A (i.e., SEQ ID NOs: 16886-17078; see Table B).

The sequence listing co-filed with this PCT application forms part of the specification as originally filed. As described above, the sequence listing can be consulted to identify cognate pairs between a retron RT (Table A, AA or NT) and a retron ncRNA (Table B). The following steps can be followed:
Step 1. Select an RT sequence. Select any RT sequence from Table A by its sequence identifier (e.g., SEQ ID NO: 4178).
Step 2. Locate the sequence entry in the sequence listing. Use the text-search function of any suitable text-editing software (e.g., Microsoft Word) capable of reading the sequence listing file, which was submitted with this specification at the time of filing and forms part of the specification as originally filed, to find the selected sequence identifier (e.g., SEQ ID NO: 4178). Because the sequence listing originally included with this specification is an XML file, the search is for the following text (shown in {braces}): {sequenceIDNumber="4178"}.
Step 3. Determine which organism and/or which NCBI sequence reference number is associated with the sequence entry. Once Step 2 has located the searched text, examine the lines of the sequence entry that contain "</INSDQualifier_value>" and note the NCBI reference sequence number shown to the left of that tag on the identified line. The "</INSDQualifier_value>" tag also marks the source organism. For example, for the entry {sequenceIDNumber="4178"}, the organism identified at the first "</INSDQualifier_value>" is "Shewanella sp." and the NCBI reference number identified at the second "</INSDQualifier_value>" is "Related to NZ_JWGX01000025.1". Therefore, for SEQ ID NO: 4178, the NCBI reference number is NZ_JWGX01000025.1 and the organism is Shewanella sp.
Step 4. Find the cognate retron ncRNA sequence. Perform a text search of the sequence listing for the NCBI reference number identified in Step 3, and check the entry for the associated SEQ ID NO. In the case of SEQ ID NO: 4178, the NCBI reference number is "NZ_JWGX01000025.1" and the associated ncRNA sequence is SEQ ID NO: 17078.

This is further illustrated as follows:
Step 1. Pick any RT sequence from Table A (e.g., SEQ ID NO: 4178).
Step 2. Search for and identify SEQ ID NO: 4178 in the sequence listing file by searching for the text in braces: {sequenceIDNumber="4178"}. Examine the hit region, which is reproduced in the following computer screen image and shows in bold (a) the amino acid sequence of SEQ ID NO: 4178, (b) the associated organism, "Shewanella sp.", and (c) the associated NCBI reference number, "NZ_JWGX01000025.1".
Step 3. Identify the NCBI reference number. In this example, the NCBI reference number is NZ_JWGX01000025.1.
Step 4. Use the NCBI reference number to locate the cognate retron ncRNA sequence in the sequence listing. In this case, the cognate ncRNA sequence is SEQ ID NO: 17078, as shown in the following computer screen image of the ncRNA sequence entry for SEQ ID NO: 17078, which, upon inspection, shares the NCBI reference number "NZ_JWGX01000025.1" and the organism "Shewanella sp.".
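The manual text search in Steps 1-4 above can also be sketched programmatically. The following Python example is a minimal illustration only, not part of the filed method. It assumes that the sequence listing is an XML file in which each entry is a `SequenceData` element carrying a `sequenceIDNumber` attribute and containing `INSDQualifier_value` elements, as the steps above suggest. The accession pattern `ACCESSION`, the helper names `accessions_by_seq_id` and `cognate_partners`, and the toy `DEMO` document (including its placeholder accession `NZ_ABCD01000001.1`) are hypothetical and introduced purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of an accession-like token (e.g. NZ_JWGX01000025.1, WP_099010551.1).
ACCESSION = re.compile(r"\b[A-Z]{1,4}_?[A-Z0-9]+\.\d+\b")


def accessions_by_seq_id(xml_text):
    """Map each sequenceIDNumber to the accession-like tokens in its qualifier values."""
    root = ET.fromstring(xml_text)
    table = {}
    for entry in root.iter("SequenceData"):
        seq_id = int(entry.get("sequenceIDNumber"))
        tokens = set()
        # iter() walks all descendants, so deeper nesting in a real file is tolerated.
        for value in entry.iter("INSDQualifier_value"):
            tokens.update(ACCESSION.findall(value.text or ""))
        table[seq_id] = tokens
    return table


def cognate_partners(xml_text, seq_id):
    """Return the other SEQ ID numbers sharing an accession with seq_id (Steps 3 and 4)."""
    table = accessions_by_seq_id(xml_text)
    wanted = table.get(seq_id, set())
    return sorted(
        other for other, tokens in table.items()
        if other != seq_id and tokens & wanted
    )


# Toy stand-in for a sequence listing; the real file nests these elements more deeply.
DEMO = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDQualifier_value>Placeholder organism</INSDQualifier_value>
    <INSDQualifier_value>Related to NZ_ABCD01000001.1</INSDQualifier_value>
  </SequenceData>
  <SequenceData sequenceIDNumber="4178">
    <INSDQualifier_value>Shewanella sp.</INSDQualifier_value>
    <INSDQualifier_value>Related to NZ_JWGX01000025.1</INSDQualifier_value>
  </SequenceData>
  <SequenceData sequenceIDNumber="17078">
    <INSDQualifier_value>Shewanella sp.</INSDQualifier_value>
    <INSDQualifier_value>Related to NZ_JWGX01000025.1</INSDQualifier_value>
  </SequenceData>
</ST26SequenceListing>"""

if __name__ == "__main__":
    print(cognate_partners(DEMO, 4178))
```

On the toy document, looking up SEQ ID NO: 4178 returns the single partner entry 17078, mirroring the worked example. Matching on shared accession tokens rather than on organism name follows Step 4, since several entries may share an organism name without belonging to the same genome record.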

Thus, for embodiments involving a retron editing system that comprises a retron RT paired with its cognate retron ncRNA, the cognate retron RT from Table A and the retron ncRNA from Table B can be readily identified.

Exemplary Embodiments

Exemplary Embodiment Group A

The following paragraphs describe exemplary and non-limiting embodiments of the present disclosure.

Paragraph 1. An engineered retron comprising:
a) an msr locus (which encodes the msr RNA portion of an msDNA);
b) an msd locus encoding the msd RNA portion of the msDNA;
c) a retron reverse transcriptase (RT), wherein the msd RNA is capable of being reverse transcribed by the retron reverse transcriptase (RT) to form the msDNA;
d) a heterologous nucleic acid inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus; and
e) a sequence modification (e.g., an insertion, deletion, and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus, wherein the sequence modification:
i) modulates (e.g., enhances) the reverse transcription, processivity, accuracy/fidelity, and/or production of the msDNA (e.g., in a mammalian cell);
ii) modulates (e.g., reduces) the immunogenicity, in a host (e.g., a host comprising the mammalian cell), of the ncRNA encoded by the engineered retron (e.g., by the msr locus and/or the msd locus); and/or
iii) comprises a nucleotide sequence that modulates (e.g., permanently or transiently inhibits) the function of the msDNA; and/or
iv) modulates (e.g., improves) the efficiency of targeted genome editing/engineering.

Paragraph 2. The engineered retron of Paragraph 1, wherein the engineered retron is engineered based on, and/or to resemble, the secondary structure of a wild-type or consensus retron encoding a wild-type or consensus retron ncRNA encompassed by the following:
1) any one of the sequences and/or structures depicted in SEQ ID NOs: 1-21 and/or Figures 2-22; or
2) a variant of 1) having:
A) up to 1, 2, or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides;
B) up to 4, 5, or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or
C) up to 7, 8, or 9 (e.g., up to 3 or 4) nucleotide changes per 10 grey-letter nucleotides;
and/or optionally further comprising:
a) the presence of 7, 8, 9, or 10 (e.g., 9 or 10) nucleotides per 10 red-circle nucleotides;
b) the presence of 6, 7, 8, 9, or 10 (e.g., 8, 9, or 10) nucleotides per 10 black-circle nucleotides;
c) the presence of 4, 5, 6, 7, 8, 9, or 10 (e.g., 6, 7, 8, 9, or 10) nucleotides per 10 grey-circle nucleotides; and/or
d) the presence of 2, 3, 4, 5, 6, 7, 8, 9, or 10 (e.g., 4, 5, 6, 7, 8, 9, or 10) nucleotides per 10 white-circle nucleotides.

Paragraph 3. The engineered retron of any one of Paragraphs 1-2, wherein the engineered retron is engineered by introducing the sequence modification into a wild-type retron encoding a wild-type ncRNA, or into a synthetic/artificial retron encoding a consensus retron ncRNA.

Paragraph 4. The engineered retron of any one of Paragraphs 1-3, wherein, in the encoded retron ncRNA, the sequence modification comprises one or more of the following:
(i) a modified (e.g., mutated, reduced, or eliminated) bulge in a1, a2, or both a1 and a2;
(ii) an extension or shortening of a1, a2, or both a1 and a2;
(iii) an extension or shortening of a spacer sequence between hairpin loops (e.g., S1, S2, S3, S4, S5, and/or S6);
(iv) a modified (e.g., mutated or eliminated) bulge in a hairpin loop (e.g., L1, L2, L3, L4, L5, and/or L6), e.g., by removing unpaired bases in the bulge or by replacing unpaired bases with an equal number of base pairs;
(v) a modified (e.g., extended or shortened) hairpin loop length (e.g., L1, L2, L3, L4, L5, and/or L6);
(vi) a substitute L1 and/or L2 having a complement, reverse, or reverse-complement sequence;
(vii) a modified (e.g., increased) number of unpaired bases at the tip of a hairpin loop (e.g., L1, L2, L3, L4, L5, and/or L6);
(viii) a modified (e.g., increased or decreased) GC content in a hairpin loop (e.g., L1, L2, L3, L4, L5, and/or L6);
(ix) insertion of the heterologous nucleic acid in a spacer sequence between hairpin loops (e.g., S1, S2, S3, S4, S5, and/or S6) or at the tip of a hairpin loop (e.g., L1, L2, L3, L4, L5, and/or L6);
(x) deletion of a hairpin loop (e.g., L1, L2, L3, L4, L5, and/or L6);
(xi) addition of a new loop in a spacer sequence between hairpin loops (e.g., S1, S2, S3, S4, S5, and/or S6);
(xii) circularization of the ncRNA, wherein the 5' end and the 3' end of the ncRNA are linked directly or via a spacer sequence;
(xiii) a repositioned branching guanosine capable of priming reverse transcription;
(xiv) a staggered end sequence that reduces the immunogenicity of the retron ncRNA, produced by, e.g., adding or removing 5' a1 nucleotides and/or 3' a2 nucleotides; and/or
(xv) an antisense sequence complementary to a CRISPR/Cas guide RNA (gRNA) sequence encoded by the heterologous nucleic acid, wherein the antisense sequence hybridizes with and inhibits the gRNA in the encoded retron ncRNA, and wherein the antisense sequence is removed after reverse transcription of the msDNA.

Paragraph 5. The engineered retron of any one of Paragraphs 1-4, comprising:
(i) a heterologous nucleic acid (such as a coding sequence for an RNA aptamer or a ribozyme) inserted into the msr locus or the msd locus (such as in an S region (e.g., S1, S2, S3, S4, S5, and/or S6) or at the tip of an L region (e.g., L1, L2, L3, L4, L5, and/or L6)), or upstream or downstream of the msr locus or the msd locus; or
(ii) a first heterologous nucleic acid inserted into the msd locus, and a second heterologous nucleic acid inserted upstream of the msr locus or downstream of the msd locus, wherein the second heterologous nucleic acid encodes a CRISPR/Cas guide RNA (gRNA).

Paragraph 6. The engineered retron of any one of Paragraphs 1-5, wherein the retron reverse transcriptase (RT) is a cognate RT, a retron RT from a species within the same clade as the cognate RT, or a retron RT not within the same clade as the cognate RT (such as an unrelated RT or an engineered RT).

Paragraph 7. The engineered retron of any one of Paragraphs 1-6, wherein the heterologous nucleic acid encodes a protein or peptide of interest, or wherein the heterologous nucleic acid comprises or encodes a donor/template sequence (e.g., a donor that corrects/repairs/removes a mutation at a target genomic site (such as a mutant exon in a disease gene); a functional DNA element (such as a promoter, an enhancer, a protein-binding sequence, a methylation site, a homology region for assisting gene editing, etc.); or a coding sequence for a functional RNA element (an ncRNA, etc.)).

段落8.   段落7之經工程改造之逆轉錄子,其中該相關蛋白質或肽包含可用於治療疾病之治療蛋白(諸如疾病細胞中具有缺陷之wt蛋白質,或治療抗體或其抗原結合片段)。 Paragraph 8. The engineered retron of paragraph 7, wherein the protein or peptide of interest comprises a therapeutic protein useful for treating a disease (such as a wild-type (wt) protein that is defective in diseased cells, or a therapeutic antibody or an antigen-binding fragment thereof).

段落9.   段落7之經工程改造之逆轉錄子,其中該供體/模板序列校正/修復/移除標靶基因體位點處之突變(諸如疾病基因中之突變型外顯子)。 Paragraph 9. The engineered retron of paragraph 7, wherein the donor/template sequence corrects/repairs/removes a mutation at a target genomic site (such as a mutant exon in a disease gene).

段落10.        段落1-9中任一項之經工程改造之逆轉錄子,其進一步包含或編碼序列特異性核酸酶(諸如CRISPR/Cas效應酶、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE))及/或DNA修復調節生物分子。 Paragraph 10. The engineered retron of any one of paragraphs 1-9, further comprising or encoding a sequence-specific nuclease (such as a CRISPR/Cas effector enzyme, a ZFN, a TALEN, a meganuclease, TnpB, IscB or a restriction endonuclease (RE)) and/or a DNA repair regulatory biomolecule.

段落11. 段落10之經工程改造之逆轉錄子,其中該序列特異性核酸酶視情況經由柔性連接體( 例如,包含富Gly及Ser序列(諸如G4S重複或GS重複)之柔性連接體)或藉由普遍無序之蛋白質序列(諸如非結構化親水性、生物可降解之蛋白質聚合物, 例如XTEN肽聚合物)與該RT融合。 Paragraph 11. The engineered retron of paragraph 10, wherein the sequence-specific nuclease is fused to the RT, optionally via a flexible linker (e.g., a flexible linker comprising a Gly- and Ser-rich sequence, such as G4S repeats or GS repeats) or via a generally disordered protein sequence (such as an unstructured, hydrophilic, biodegradable protein polymer, e.g., an XTEN peptide polymer).

段落12. 段落10或11之經工程改造之逆轉錄子,其中該核酸酶係與識別標靶序列之引導RNA (gRNA)形成複合物的CRISPR/Cas效應酶,其中該gRNA直接地或藉由連接體/間隔多核苷酸連接至該ncRNA及/或該msDNA。 Paragraph 12. The engineered retron of paragraph 10 or 11, wherein the nuclease is a CRISPR/Cas effector enzyme that forms a complex with a guide RNA (gRNA) that recognizes a target sequence, wherein the gRNA is linked to the ncRNA and/or the msDNA directly or via a linker/spacer polynucleotide.

段落13.        段落10之經工程改造之逆轉錄子,其中該DNA修復調節生物分子為調節( 例如,增強) HDR之調節蛋白,且該調節蛋白視情況經由柔性連接體( 例如,包含富Gly及Ser序列(諸如G4S重複或GS重複)之柔性連接體)或藉由普遍無序之蛋白質序列(諸如非結構化親水性、生物可降解之蛋白質聚合物, 例如XTEN肽聚合物)與該RT或該序列特異性核酸酶融合。 Paragraph 13. The engineered retron of paragraph 10, wherein the DNA repair regulatory biomolecule is a regulatory protein that modulates (e.g., enhances) HDR, and the regulatory protein is fused to the RT or the sequence-specific nuclease, optionally via a flexible linker (e.g., a flexible linker comprising a Gly- and Ser-rich sequence, such as G4S repeats or GS repeats) or via a generally disordered protein sequence (such as an unstructured, hydrophilic, biodegradable protein polymer, e.g., an XTEN peptide polymer).

段落14. 一種包含載體之載體系統,該載體包含段落1-13中任一項之經工程改造之逆轉錄子。 Paragraph 14. A vector system comprising a vector, the vector comprising the engineered retron of any one of paragraphs 1-13.

段落15. 段落14之載體系統,其中該 msr基因座、該 msd基因座及該RT基因由同一載體提供。 Paragraph 15. The vector system of paragraph 14, wherein the msr locus, the msd locus and the RT gene are provided by the same vector.

段落16. 段落14或15之載體系統,其中該同一載體包含可操作地連接至該 msr基因座及/或該 msd基因座之啟動子。 Paragraph 16. The vector system of paragraph 14 or 15, wherein the same vector comprises a promoter operably linked to the msr locus and/or the msd locus.

段落17. 段落14-16中任一項之載體系統,其中該啟動子進一步可操作地連接至該RT基因。Paragraph 17. The vector system of any one of paragraphs 14-16, wherein the promoter is further operably linked to the RT gene.

段落18. 段落14之載體系統,其中該 msr基因座、該 msd基因座及該RT基因由至少兩種不同載體提供。 Paragraph 18. The vector system of paragraph 14, wherein the msr locus, the msd locus and the RT gene are provided by at least two different vectors.

段落19. 段落14或18之載體系統,其中該RT基因相對於該 msr基因及/或該 msd基因以 反式提供。 Paragraph 19. The vector system of paragraph 14 or 18, wherein the RT gene is provided in trans relative to the msr gene and/or the msd gene.

段落20. 段落14-19中任一項之載體系統,其中該一或多種載體包含病毒載體及/或非病毒載體。Paragraph 20. The vector system of any one of paragraphs 14-19, wherein the one or more vectors comprise viral vectors and/or non-viral vectors.

段落21. 段落20之載體系統,其中該非病毒載體包含質體。Paragraph 21. The vector system of paragraph 20, wherein the non-viral vector comprises a plasmid.

段落22. 段落14-21中任一項之載體系統,其包含編碼序列特異性核酸酶之載體。Paragraph 22. The vector system of any one of paragraphs 14-21, comprising a vector encoding a sequence-specific nuclease.

段落23. 段落22之載體系統,其中該序列特異性核酸酶包含RNA引導之序列特異性核酸酶( 例如,CRISPR/Cas效應酶、經工程改造之RNA引導之FokI-核酸酶( 例如dCas-FokI)、RNA引導之DNA核酸內切酶、TnpB、IscB或轉位子相關核酸酶)或非RNA引導之序列特異性核酸酶( 例如,大範圍核酸酶、鋅指核酸酶(ZFN)、TALE核酸酶(TALEN)或限制性核酸內切酶(RE))。 Paragraph 23. The vector system of paragraph 22, wherein the sequence-specific nuclease comprises an RNA-guided sequence-specific nuclease (e.g., a CRISPR/Cas effector enzyme, an engineered RNA-guided FokI nuclease (e.g., dCas-FokI), an RNA-guided DNA endonuclease, TnpB, IscB, or a transposon-associated nuclease) or a non-RNA-guided sequence-specific nuclease (e.g., a meganuclease, a zinc finger nuclease (ZFN), a TALE nuclease (TALEN), or a restriction endonuclease (RE)).

段落24. 段落23之載體系統,其中該Cas效應酶為1類,I型、II型或III型Cas;2類,II型Cas ( 例如Cas9);或2類,V型Cas ( 例如Cpf1)。 Paragraph 24. The vector system of paragraph 23, wherein the Cas effector enzyme is a class 1, type I, type II or type III Cas; a class 2, type II Cas (e.g., Cas9); or a class 2, type V Cas (e.g., Cpf1).

段落25. 段落23之載體系統,其中: 1)    該RNA引導之序列特異性核酸酶包含該CRISPR/Cas效應酶、該經工程改造之RNA引導之FokI核酸酶( 例如dCas-FokI)、該RNA引導之DNA核酸內切酶、TnpB、IscB、IsrB或轉位子相關核酸酶;或, 2)    非RNA引導之序列特異性核酸酶包含該大範圍核酸酶、該鋅指核酸酶(ZFN)、該TALE核酸酶(TALEN)或該限制性核酸內切酶(RE)。 Paragraph 25. The vector system of paragraph 23, wherein: 1) the RNA-guided sequence-specific nuclease comprises the CRISPR/Cas effector enzyme, the engineered RNA-guided FokI nuclease (e.g., dCas-FokI), the RNA-guided DNA endonuclease, TnpB, IscB, IsrB, or a transposon-associated nuclease; or 2) the non-RNA-guided sequence-specific nuclease comprises the meganuclease, the zinc finger nuclease (ZFN), the TALE nuclease (TALEN), or the restriction endonuclease (RE).

段落26. 段落14-25中任一項之載體系統,其進一步包含編碼同源重組增強子蛋白之載體。Paragraph 26. The vector system of any one of paragraphs 14-25, further comprising a vector encoding a homologous recombination enhancer protein.

段落27. 一種經分離之宿主細胞,其包含段落1-13中任一項之經工程改造之逆轉錄子或段落14-26中任一項之載體系統。 Paragraph 27. An isolated host cell comprising the engineered retron of any one of paragraphs 1-13 or the vector system of any one of paragraphs 14-26.

段落28. 段落27之宿主細胞,其中該宿主細胞為原核、古核(archaeon)或真核宿主細胞。 Paragraph 28. The host cell of paragraph 27, wherein the host cell is a prokaryotic, archaeal or eukaryotic host cell.

段落29. 段落28之宿主細胞,其中該真核宿主細胞為哺乳動物宿主細胞。Paragraph 29. The host cell of Paragraph 28, wherein the eukaryotic host cell is a mammalian host cell.

段落30. 段落28之宿主細胞,其中該真核宿主細胞為非人類宿主細胞。Paragraph 30. The host cell of paragraph 28, wherein the eukaryotic host cell is a non-human host cell.

段落31. 段落29之宿主細胞,其中該哺乳動物宿主細胞為人類宿主細胞。Paragraph 31. The host cell of Paragraph 29, wherein the mammalian host cell is a human host cell.

段落32. 段落28-31中任一項之宿主細胞,其中該宿主細胞為人工細胞或經遺傳修飾之細胞。Paragraph 32. The host cell of any one of paragraphs 28-31, wherein the host cell is an artificial cell or a genetically modified cell.

段落33. 一種醫藥組合物,其包含段落1-13中任一項之經工程改造之逆轉錄子。 Paragraph 33. A pharmaceutical composition comprising the engineered retron of any one of paragraphs 1-13.

段落34. 一種遞送媒劑,其包含段落1-13中任一項之經工程改造之逆轉錄子或由段落1-13中任一項之經工程改造之逆轉錄子編碼的ncRNA、段落14-26中任一項之載體、段落27-32中任一項之宿主細胞或段落33之醫藥組合物。 Paragraph 34. A delivery vehicle comprising the engineered retron of any one of paragraphs 1-13 or an ncRNA encoded by the engineered retron of any one of paragraphs 1-13, the vector of any one of paragraphs 14-26, the host cell of any one of paragraphs 27-32, or the pharmaceutical composition of paragraph 33.

段落35. 段落34之遞送媒劑,其為脂質奈米顆粒。 Paragraph 35. The delivery vehicle of paragraph 34, which is a lipid nanoparticle.

段落36. 段落35之遞送媒劑,其中該脂質奈米顆粒包含選自由以下組成之群的至少一種陽離子脂質:表(I)中之脂質、具有式(I)結構之脂質、具有式(II)結構之脂質、具有式(III)結構之脂質、具有式(IV)結構之脂質、具有式(V)結構之脂質、具有式(VI)結構之脂質及其組合。 Paragraph 36. The delivery vehicle of paragraph 35, wherein the lipid nanoparticle comprises at least one cationic lipid selected from the group consisting of a lipid in Table (I), a lipid having a structure of formula (I), a lipid having a structure of formula (II), a lipid having a structure of formula (III), a lipid having a structure of formula (IV), a lipid having a structure of formula (V), a lipid having a structure of formula (VI), and combinations thereof.

段落37. 一種套組,其包含段落1-13中任一項之經工程改造之逆轉錄子或由段落1-13中任一項之經工程改造之逆轉錄子編碼的ncRNA、段落14-26中任一項之載體或載體系統、段落27-32中任一項之宿主細胞、段落33之醫藥組合物或段落34-36中任一項之遞送媒劑以及關於用該經工程改造之逆轉錄子、該ncRNA、該載體/載體系統、該宿主細胞、該醫藥組合物或該遞送媒劑遺傳修飾細胞的說明書。 Paragraph 37. A kit comprising the engineered retron of any one of paragraphs 1-13 or an ncRNA encoded by the engineered retron of any one of paragraphs 1-13, the vector or vector system of any one of paragraphs 14-26, the host cell of any one of paragraphs 27-32, the pharmaceutical composition of paragraph 33, or the delivery vehicle of any one of paragraphs 34-36, and instructions for genetically modifying a cell using the engineered retron, the ncRNA, the vector/vector system, the host cell, the pharmaceutical composition, or the delivery vehicle.

段落38. 一種修飾宿主( 例如,哺乳動物)細胞中之標靶DNA序列的方法,該方法包括將段落1-13中任一項之經工程改造之逆轉錄子或由段落1-13中任一項之經工程改造之逆轉錄子編碼的ncRNA、段落14-26中任一項之載體或載體系統、段落33之醫藥組合物或段落34-36中任一項之遞送媒劑引入該哺乳動物細胞中,以允許在該宿主( 例如,哺乳動物)細胞中產生該msDNA,其中該msDNA中之該經編碼異源核酸整合至該宿主( 例如,哺乳動物)細胞之基因體中的該標靶DNA序列處。 Paragraph 38. A method of modifying a target DNA sequence in a host (e.g., mammalian) cell, the method comprising introducing the engineered retron of any one of paragraphs 1-13 or an ncRNA encoded by the engineered retron of any one of paragraphs 1-13, the vector or vector system of any one of paragraphs 14-26, the pharmaceutical composition of paragraph 33, or the delivery vehicle of any one of paragraphs 34-36 into the mammalian cell to allow production of the msDNA in the host (e.g., mammalian) cell, wherein the encoded heterologous nucleic acid in the msDNA is integrated at the target DNA sequence in the genome of the host (e.g., mammalian) cell.

段落39. 一種段落1-13中任一項之經工程改造之逆轉錄子或由段落1-13中任一項之經工程改造之逆轉錄子編碼的ncRNA、段落14-26中任一項之載體或載體系統、段落33之醫藥組合物或段落34-36中任一項之遞送媒劑的用途,用於實現宿主( 例如,哺乳動物)細胞中之標靶DNA序列的修飾。 Paragraph 39. Use of the engineered retron of any one of paragraphs 1-13 or an ncRNA encoded by the engineered retron of any one of paragraphs 1-13, the vector or vector system of any one of paragraphs 14-26, the pharmaceutical composition of paragraph 33, or the delivery vehicle of any one of paragraphs 34-36 for effecting modification of a target DNA sequence in a host (e.g., mammalian) cell.

例示性實施例組B Exemplary Embodiment Group B

以下段落進一步描述本揭示案之例示性且非限制性實施例。The following paragraphs further describe exemplary and non-limiting embodiments of the present disclosure.

段落1.   一種經工程改造之核酸構築體,其包含: a)  編碼非編碼RNA (ncRNA)之第一多核苷酸,該第一多核苷酸包含: 1)  編碼多複本單股DNA (msDNA)之 msrRNA部分的 msr基因座;及 2)  編碼該msDNA之 msdRNA部分的 msd基因座;及 b)  在選自以下之位置處或內部插入的一或多種異源核酸:該 msd基因座、該 msr基因座上游、該 msd基因座上游及該 msd基因座下游, 其中該ncRNA包含: (I)      表B中列出之ncRNA,或與表B中列出之ncRNA具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之ncRNA;及/或 (II)       具有圖2-27之任一ncRNA結構之保守結構的ncRNA;且 其中該ncRNA視情況排除自然界中與表X之任一逆轉錄子逆轉錄酶相關的任何ncRNA。 Paragraph 1. An engineered nucleic acid construct comprising: a) a first polynucleotide encoding a non-coding RNA (ncRNA), the first polynucleotide comprising: 1) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and 2) an msd locus encoding the msd RNA portion of the msDNA; and b) one or more heterologous nucleic acids inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus, wherein the ncRNA comprises: (I) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or (II) an ncRNA having a conserved structure of any one of the ncRNA structures of Figures 2-27; and wherein the ncRNA optionally excludes any ncRNA associated in nature with any of the retron reverse transcriptases of Table X.
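The percent sequence identity thresholds recited above can be illustrated with a small calculation. The following Python sketch is only an illustration under stated assumptions: the text here does not specify an alignment algorithm or an identity convention, so the two sequences are assumed to be already aligned (equal length, `-` marking gaps), identity is counted over all aligned columns, and the names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both strings come from the same alignment, so they have
    equal length and use '-' for gaps. Columns where both are gaps are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1  # a gap never equals a residue, so gaps count as mismatches
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair reaches the given percent identity (e.g. 90.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, excluding gap columns from the denominator) give different values for the same pair, so any real comparison would need to state which convention is used.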

段落2.   段落1之經工程改造之核酸構築體,其進一步包含編碼逆轉錄酶(RT)或其部分之第二多核苷酸,其中該經編碼RT或其部分能夠合成編碼該msDNA之該 msd基因座中的至少一部分之DNA複本。 Paragraph 2. The engineered nucleic acid construct of Paragraph 1, further comprising a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT or a portion thereof is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA.

段落3.   段落2之經工程改造之核酸構築體, 其中該第二多核苷酸包含: III)       表A中列出之多核苷酸,或與表A中列出之多核苷酸具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多核苷酸;及/或 IV)      編碼表C之共有胺基酸序列,或編碼與表C中列出之胺基酸序列具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之胺基酸序列;及/或 其中該第二多核苷酸編碼: V)     表A中列出之多肽,或與表A中列出之多肽具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多肽;及/或 VI)      包含表C中列出之多肽共有序列的多肽,或與表C中列出之胺基酸序列具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多肽;及/或 其中該第二多核苷酸視情況不編碼表X中列出之胺基酸序列。 Paragraph 3. The engineered nucleic acid construct of paragraph 2, wherein the second polynucleotide comprises: III) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or IV) a sequence encoding a consensus amino acid sequence of Table C, or encoding an amino acid sequence having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and/or wherein the second polynucleotide encodes: V) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or VI) a polypeptide comprising a polypeptide consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and/or wherein the second polynucleotide optionally does not encode an amino acid sequence listed in Table X.

段落4.   一種經工程改造之核酸構築體,其包含: a)  編碼非編碼RNA (ncRNA)之第一多核苷酸,該第一多核苷酸包含: 1)  編碼多複本單股DNA (msDNA)之 msrRNA部分的 msr基因座;及 2)  編碼該msDNA之 msdRNA部分的 msd基因座; b)  在選自以下之位置處或內部插入的一或多種異源核酸:該 msd基因座、該 msr基因座上游、該 msd基因座上游及該 msd基因座下游;及 c)  編碼逆轉錄酶(RT)或其部分之第二多核苷酸,其中該經編碼RT或其部分能夠合成編碼該msDNA之該 msd基因座中的至少一部分之DNA複本,且 其中該第一多核苷酸之該非編碼RNA (ncRNA)視情況具有圖2-27之任一ncRNA結構之保守結構; 其中該第二多核苷酸包含: I)  表A中列出之多核苷酸,或與表A中列出之多核苷酸具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多核苷酸;及/或 其中該第二多核苷酸編碼: II) 表A中列出之多肽,或與表A中列出之多肽具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多肽;及/或 IV)      包含表C中列出之多肽共有序列的多肽,或與表C中列出之多肽具有至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%或至少99.9%序列一致性之多肽;且 其中該第二多核苷酸視情況不編碼表X之胺基酸序列。 Paragraph 4. An engineered nucleic acid construct comprising: a) a first polynucleotide encoding a non-coding RNA (ncRNA), the first polynucleotide comprising: 1) an msr locus encoding the msr RNA portion of a multicopy single-stranded DNA (msDNA); and 2) an msd locus encoding the msd RNA portion of the msDNA; b) one or more heterologous nucleic acids inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and c) a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT or portion thereof is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA, and wherein the non-coding RNA (ncRNA) of the first polynucleotide optionally has a conserved structure of any one of the ncRNA structures of Figures 2-27; wherein the second polynucleotide comprises: I) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or wherein the second polynucleotide encodes: II) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or IV) a polypeptide comprising a polypeptide consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and wherein the second polynucleotide optionally does not encode an amino acid sequence of Table X.

段落4a. 一種經工程改造之核酸構築體,其包含: 1) msr基因座(其編碼msDNA之 msrRNA部分); 2)  編碼該msDNA之 msdRNA部分的 msd基因座; 3)  編碼逆轉錄子逆轉錄酶(RT)之序列,其中該 msdRNA能夠由該逆轉錄子逆轉錄酶(RT)逆轉錄以形成該msDNA;及 4)  在該 msd基因座、該 msr基因座上游、該 msd基因座上游或下游處或內部插入的異源核酸; 其中該經工程改造之核酸構築體視情況具有(a)圖2-27中任一者之野生型ncRNA的二級結構,或 b)  a)之變異體,其具有: i)   每10個紅色字母核苷酸多達1、2或3個(例如,多達1個)核苷酸變化; ii)  每10個黑色字母核苷酸多達4、5或6個(例如,多達1或2個)核苷酸變化;及/或 iii) 每10個灰色字母核苷酸多達7、8或9個(例如,多達3或4個)核苷酸變化。 Paragraph 4a. An engineered nucleic acid construct comprising: 1) an msr locus (which encodes the msr RNA portion of an msDNA); 2) an msd locus encoding the msd RNA portion of the msDNA; 3) a sequence encoding a retron reverse transcriptase (RT), wherein the msd RNA is capable of being reverse transcribed by the retron reverse transcriptase (RT) to form the msDNA; and 4) a heterologous nucleic acid inserted at or within the msd locus, upstream of the msr locus, or upstream or downstream of the msd locus; wherein the engineered nucleic acid construct optionally has a) the secondary structure of the wild-type ncRNA of any one of Figures 2-27, or b) a variant of a), having: i) up to 1, 2 or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; ii) up to 4, 5 or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or iii) up to 7, 8 or 9 (e.g., up to 3 or 4) nucleotide changes per 10 grey-letter nucleotides.

段落5.   段落1至4a中任一項之經工程改造之核酸構築體,其包含 msr基因座及/或 msd基因座中的一或多個序列修飾(例如,一或多個核苷酸之插入、缺失及/或取代),該一或多個序列修飾: a)  調節(例如,增強)該msDNA之逆轉錄、可加工性、準確性/保真度及/或產生(例如,在哺乳動物細胞中); b)  調節(例如,降低)宿主(例如,包含該哺乳動物細胞之宿主)中由該經工程改造之逆轉錄子(例如, msr基因座及/或 msd基因座)編碼的ncRNA之免疫原性; c)  調節(例如,永久或短暫地抑制)該msDNA之功能;及/或 d)  調節(例如,改良)靶向基因體編輯/工程改造之效率。 Paragraph 5. The engineered nucleic acid construct of any one of paragraphs 1 to 4a, comprising one or more sequence modifications (e.g., insertion, deletion and/or substitution of one or more nucleotides) in the msr locus and/or the msd locus, wherein the one or more sequence modifications: a) modulate (e.g., enhance) the reverse transcription, processivity, accuracy/fidelity and/or production of the msDNA (e.g., in mammalian cells); b) modulate (e.g., reduce) the immunogenicity, in a host (e.g., a host comprising the mammalian cell), of the ncRNA encoded by the engineered retron (e.g., by the msr locus and/or the msd locus); c) modulate (e.g., permanently or transiently inhibit) the function of the msDNA; and/or d) modulate (e.g., improve) the efficiency of targeted genome editing/engineering.

段落6.   段落1至4中任一項之經工程改造之核酸構築體,其中該經工程改造之核酸構築體具有編碼以下涵蓋之野生型逆轉錄子ncRNA的野生型逆轉錄子之二級結構: a)  如圖2-27所描繪之任一結構,或 b)  a)之變異體,其具有: i)   每10個紅色字母核苷酸多達1、2或3個(例如,多達1個)核苷酸變化; ii)  每10個黑色字母核苷酸多達4、5或6個(例如,多達1或2個)核苷酸變化;及/或 iii) 每10個灰色字母核苷酸多達7、8或9個(例如,多達3或4個)核苷酸變化;及/或 視情況進一步包含: i)   每10個紅色圓圈核苷酸存在7、8、9或10個(例如,9或10個)核苷酸; ii)  每10個黑色圓圈核苷酸存在6、7、8、9或10個(例如,8、9或10個)核苷酸; iii) 每10個灰色圓圈核苷酸存在4、5、6、7、8、9或10個(例如,6、7、8、9或10個)核苷酸;及/或 iv) 每10個白色圓圈核苷酸存在2、3、4、5、6、7、8、9或10個(例如,4、5、6、7、8、9或10個)核苷酸。 Paragraph 6. The engineered nucleic acid construct of any one of paragraphs 1 to 4, wherein the engineered nucleic acid construct has the secondary structure of a wild-type retron encoding a wild-type retron ncRNA encompassed by: a) any one of the structures depicted in Figures 2-27, or b) a variant of a), having: i) up to 1, 2 or 3 (e.g., up to 1) nucleotide changes per 10 red-letter nucleotides; ii) up to 4, 5 or 6 (e.g., up to 1 or 2) nucleotide changes per 10 black-letter nucleotides; and/or iii) up to 7, 8 or 9 (e.g., up to 3 or 4) nucleotide changes per 10 grey-letter nucleotides; and/or optionally further comprising: i) the presence of 7, 8, 9 or 10 (e.g., 9 or 10) nucleotides per 10 red-circle nucleotides; ii) the presence of 6, 7, 8, 9 or 10 (e.g., 8, 9 or 10) nucleotides per 10 black-circle nucleotides; iii) the presence of 4, 5, 6, 7, 8, 9 or 10 (e.g., 6, 7, 8, 9 or 10) nucleotides per 10 grey-circle nucleotides; and/or iv) the presence of 2, 3, 4, 5, 6, 7, 8, 9 or 10 (e.g., 4, 5, 6, 7, 8, 9 or 10) nucleotides per 10 white-circle nucleotides.
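The per-class tolerances above (changes allowed per 10 red-letter, black-letter or grey-letter nucleotides) amount to a rate check per conservation class. The Python sketch below is one possible reading, offered only as an illustration: it assumes substitutions only (equal-length sequences), assumes each position carries a class label taken from the figure coloring, and applies the rate over all positions of a class rather than over sliding windows of 10, which the text does not specify. The names `MAX_CHANGES_PER_10`, `changes_by_class` and `within_tolerance` are hypothetical, and the default limits are the broadest recited values (3, 6 and 9).

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Broadest recited limits: changes allowed per 10 nucleotides of each class.
MAX_CHANGES_PER_10: Dict[str, int] = {"red": 3, "black": 6, "grey": 9}


def changes_by_class(
    wild_type: str, variant: str, classes: Sequence[str]
) -> Dict[str, Tuple[int, int]]:
    """Return {class: (changed_positions, total_positions)}."""
    if not (len(wild_type) == len(variant) == len(classes)):
        raise ValueError("sequences and class labels must have equal length")
    changed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for wt, var, cls in zip(wild_type.upper(), variant.upper(), classes):
        total[cls] += 1
        if wt != var:
            changed[cls] += 1
    return {cls: (changed[cls], total[cls]) for cls in total}


def within_tolerance(
    wild_type: str,
    variant: str,
    classes: Sequence[str],
    limits: Dict[str, int] = MAX_CHANGES_PER_10,
) -> bool:
    """True if every limited class stays at or below its changes-per-10 rate."""
    for cls, (n_changed, n_total) in changes_by_class(wild_type, variant, classes).items():
        # n_changed / n_total <= limit / 10, written without division
        if cls in limits and n_changed * 10 > limits[cls] * n_total:
            return False
    return True
```

Passing a stricter `limits` mapping, such as `{"red": 1, "black": 2, "grey": 4}`, corresponds to the narrower "e.g." values in the same paragraph.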

段落7.   段落1-6中任一項之經工程改造之核酸構築體,其中該核酸構築體係藉由將該一或多個序列修飾引入編碼表B中列出之野生型ncRNA的野生型逆轉錄子中而經工程改造的。 Paragraph 7. The engineered nucleic acid construct of any one of paragraphs 1-6, wherein the nucleic acid construct is engineered by introducing the one or more sequence modifications into a wild-type retron encoding a wild-type ncRNA listed in Table B.

段落8.   段落1-7中任一項之經工程改造之核酸構築體,其中該ncRNA中之該一或多個序列修飾包含以下一或多者: (i) a1、a2或a1及a2兩者中的經修飾(例如,突變、減少或消除)之凸起; (ii) a1、a2或a1及a2兩者之延長或縮短; (iii) 髮夾環之間的間隔序列之延伸或縮短(例如,S1、S2、S3及/或S4); (iv) 髮夾環中之額外或經修飾(例如,突變或消除)之凸起(例如,L2及/或L3 (例如,藉由移除該凸起中的未配對鹼基,或藉由用相等數目之鹼基對置換未配對鹼基)); (v) 髮夾環之經修飾(例如,延長或縮短)之長度(例如,L1、L2、L3及/或L4); (vi) 具有補體、反向或反向補體序列之替代L1及/或L2; (vii) 在髮夾環(例如,L1、L2、L3及/或L4)之尖端處經修飾(例如,增加)數目之未配對鹼基; (viii) 髮夾環中經修飾(例如,增加或減少)之GC含量(例如,L1、L2、L3及/或L4); (ix) 在髮夾環之間的間隔序列中(例如,S1、S2、S3及/或S4)或在髮夾環之尖端處(例如,L1、L2、L3及/或L4)插入該異源核酸; (x) 缺失一或多個髮夾環(例如,L1、L2、L3及/或L4); (xi) 髮夾環之間的間隔序列中添加新環(例如,S1、S2、S3及/或S4); (xii) 該ncRNA之環化,其中該ncRNA之5'端及3'端直接地或經由間隔序列連接; (xiii) 能夠起始逆轉錄啟動的再定位之分支鏈鳥苷; (xiv) 降低該逆轉錄子ncRNA之免疫原性的交錯末端序列,其藉由例如添加或移除5' a1核苷酸及/或3' a2核苷酸而產生;及/或, (xv) 與由該異源核酸編碼之CRISPR/Cas引導RNA (gRNA)序列互補的反義序列,其中該反義序列與該經編碼之逆轉錄子ncRNA中的該gRNA雜交且抑制該gRNA,且其中該反義序列在該msDNA之逆轉錄後經移除。 Paragraph 8. The engineered nucleic acid construct of any one of paragraphs 1-7, wherein the one or more sequence modifications in the ncRNA comprise one or more of the following: (i) a modified (e.g., mutated, reduced, or eliminated) bulge in a1, a2, or both a1 and a2; (ii) a lengthening or shortening of a1, a2, or both a1 and a2; (iii) a lengthening or shortening of a spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4); (iv) an additional or modified (e.g., mutated or eliminated) bulge in a hairpin loop (e.g., L2 and/or L3 (e.g., by removing the unpaired bases in the bulge, or by replacing the unpaired bases with an equal number of base pairs)); (v) a modified (e.g., lengthened or shortened) length of a hairpin loop (e.g., L1, L2, L3 and/or L4); (vi) an alternative L1 and/or L2 having a complement, reverse, or reverse complement sequence; (vii) a modified (e.g., increased) number of unpaired bases at the tip of a hairpin loop (e.g., L1, L2, L3 and/or L4); (viii) a modified (e.g., increased or decreased) GC content in a hairpin loop (e.g., L1, L2, L3 and/or L4); (ix) insertion of the heterologous nucleic acid in a spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4) or at the tip of a hairpin loop (e.g., L1, L2, L3 and/or L4); (x) deletion of one or more hairpin loops (e.g., L1, L2, L3 and/or L4); (xi) addition of a new loop in a spacer sequence between hairpin loops (e.g., S1, S2, S3 and/or S4); (xii) circularization of the ncRNA, wherein the 5' end and the 3' end of the ncRNA are linked directly or via a spacer sequence; (xiii) a repositioned branching guanosine capable of priming reverse transcription; (xiv) a staggered end sequence that reduces the immunogenicity of the retron ncRNA, generated by, e.g., adding or removing 5' a1 nucleotides and/or 3' a2 nucleotides; and/or (xv) an antisense sequence complementary to a CRISPR/Cas guide RNA (gRNA) sequence encoded by the heterologous nucleic acid, wherein the antisense sequence hybridizes with and inhibits the gRNA in the encoded retron ncRNA, and wherein the antisense sequence is removed after reverse transcription of the msDNA.
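Items (vi) and (viii) above refer to complement, reverse and reverse complement sequences and to GC content. The following Python sketch shows how those four quantities are conventionally computed for an RNA sequence written 5' to 3' in the letters A, C, G and U. It is only an illustration: the function names are hypothetical, and the sequence in the usage note is a made-up example, not a sequence from the disclosure.

```python
# Watson-Crick pairing for RNA (A-U, G-C).
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(rna: str) -> str:
    """Base-by-base complement, keeping the original order."""
    try:
        return "".join(_RNA_COMPLEMENT[base] for base in rna.upper())
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None


def reverse(rna: str) -> str:
    """Same bases in reversed order."""
    return rna.upper()[::-1]


def reverse_complement(rna: str) -> str:
    """Complement read in the opposite direction."""
    return complement(rna)[::-1]


def gc_content(rna: str) -> float:
    """Fraction of G and C bases, between 0.0 and 1.0."""
    if not rna:
        raise ValueError("empty sequence")
    seq = rna.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

For the illustrative sequence `GGAUC`, `complement` gives `CCUAG`, `reverse` gives `CUAGG`, `reverse_complement` gives `GAUCC`, and `gc_content` gives 0.6.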

段落9.   段落1-8中任一項之經工程改造之核酸構築體,其中該一或多個異源核酸序列包含: a)  插入該 msr基因座或該 msd基因座中(諸如S區(例如,S1、S2、S3及/或S4),或L區(例如,L1、L2、L3及/或L4)之尖端,或該 msr基因座或該 msd基因座上游或下游中之異源核酸(諸如RNA適體或核酶之編碼序列);或 b)  插入該 msd基因座中之第一異源核酸,及插入該 msr基因座上游或該 msd基因座下游之第二異源核酸,其中該第二異源核酸編碼引導RNA。 Paragraph 9. The engineered nucleic acid construct of any one of paragraphs 1-8, wherein the one or more heterologous nucleic acid sequences comprise: a) a heterologous nucleic acid (such as a coding sequence for an RNA aptamer or a ribozyme) inserted into the msr locus or the msd locus (such as the tip of the S region (e.g., S1, S2, S3 and/or S4), or the L region (e.g., L1, L2, L3 and/or L4), or upstream or downstream of the msr locus or the msd locus; or b) a first heterologous nucleic acid inserted into the msd locus, and a second heterologous nucleic acid inserted upstream of the msr locus or downstream of the msd locus, wherein the second heterologous nucleic acid encodes a guide RNA.

段落10.     段落1-9中任一項之經工程改造之核酸構築體,其中該異源核酸編碼: (a) 相關蛋白質或肽,或其中該異源核酸包含; (b) DNA供體模板序列; (c) 選自啟動子、增強子、蛋白質結合序列、甲基化位點、用於輔助基因編輯之同源區及其類似元件之功能性DNA元件;或 (d) 選自引導RNA及ncRNA之功能性RNA元件的編碼序列。 Paragraph 10.     The engineered nucleic acid construct of any one of paragraphs 1-9, wherein the heterologous nucleic acid encodes: (a) a protein or peptide of interest, or wherein the heterologous nucleic acid comprises; (b) a DNA donor template sequence; (c) a functional DNA element selected from a promoter, an enhancer, a protein binding sequence, a methylation site, a homologous region for assisting gene editing, and the like; or (d) a coding sequence for a functional RNA element selected from a guide RNA and a ncRNA.

段落11. 段落10之經工程改造之核酸構築體,其中該相關蛋白質或肽包含可用於治療疾病之治療蛋白。 Paragraph 11. The engineered nucleic acid construct of paragraph 10, wherein the protein or peptide of interest comprises a therapeutic protein useful for treating a disease.

段落12. 段落10之經工程改造之核酸構築體,其中該DNA供體模板序列校正/修復/移除標靶基因體位點處之突變。Paragraph 12. The engineered nucleic acid construct of Paragraph 10, wherein the DNA donor template sequence corrects/repairs/removes a mutation at a target genomic site.

段落13. 段落1-12中任一項之經工程改造之核酸構築體,其進一步包含或編碼序列特異性核酸酶(例如,CRISPR/Cas效應酶、ZFN、TALEN、大範圍核酸酶、TnpB、IscB或限制性核酸內切酶(RE))及/或DNA修復調節生物分子。Paragraph 13. The engineered nucleic acid construct of any of paragraphs 1-12, further comprising or encoding a sequence-specific nuclease (e.g., a CRISPR/Cas effector enzyme, a ZFN, a TALEN, a meganuclease, TnpB, IscB, or a restriction endonuclease (RE)) and/or a DNA repair regulatory biomolecule.

段落13b.      段落1-13之經工程改造之核酸構築體,其中該經工程改造之核酸為全RNA組分系統。Paragraph 13b. The engineered nucleic acid construct of paragraphs 1-13, wherein the engineered nucleic acid is an all-RNA component system.

段落13c.      段落1-13之經工程改造之核酸構築體,其中該經工程改造之核酸為全DNA分子系統。 Paragraph 13c. The engineered nucleic acid construct of paragraphs 1-13, wherein the engineered nucleic acid is an all-DNA molecule system.

段落14. 段落13之經工程改造之核酸構築體,其中該序列特異性核酸酶視情況經由柔性連接體(例如,包含富Gly及Ser序列(諸如G4S重複或GS重複)之柔性連接體)或藉由普遍無序之蛋白質序列(諸如非結構化親水性、生物可降解之蛋白質聚合物,例如XTEN肽聚合物)與該RT融合。Paragraph 14. The engineered nucleic acid construct of paragraph 13, wherein the sequence-specific nuclease is fused to the RT via a flexible linker (e.g., a flexible linker comprising Gly and Ser-rich sequences such as G4S repeats or GS repeats) or by a generally disordered protein sequence (e.g., an unstructured hydrophilic, biodegradable protein polymer, such as an XTEN peptide polymer), as appropriate.

段落15. 段落13或14之經工程改造之核酸構築體,其中該核酸酶係與識別標靶序列之引導RNA (gRNA)形成複合物的CRISPR/Cas效應酶,其中該gRNA直接地或藉由連接體/間隔多核苷酸連接至該ncRNA及/或該msDNA。Paragraph 15. The engineered nucleic acid construct of paragraph 13 or 14, wherein the nuclease is a CRISPR/Cas effector enzyme that forms a complex with a guide RNA (gRNA) that recognizes a target sequence, wherein the gRNA is linked to the ncRNA and/or the msDNA directly or via a linker/spacer polynucleotide.

段落16. 段落13之經工程改造之核酸構築體,其中該DNA修復調節生物分子為調節(例如,增強) HDR之調節蛋白,且該調節蛋白視情況經由柔性連接體(例如,包含富Gly及Ser序列(諸如G4S重複或GS重複)之柔性連接體)或藉由普遍無序之蛋白質序列(諸如非結構化親水性、生物可降解之蛋白質聚合物,例如XTEN肽聚合物)視情況與該RT或該序列特異性核酸酶融合。Paragraph 16. The engineered nucleic acid construct of paragraph 13, wherein the DNA repair regulatory biomolecule is a regulatory protein that regulates (e.g., enhances) HDR, and the regulatory protein is fused to the RT or the sequence-specific nuclease, as the case may be, via a flexible linker (e.g., a flexible linker comprising Gly and Ser-rich sequences (such as G4S repeats or GS repeats)) or by a generally disordered protein sequence (such as an unstructured hydrophilic, biodegradable protein polymer, such as an XTEN peptide polymer).

段落17. 一種包含一或多種載體之載體系統,該一或多種載體包含段落1-16中任一項之經工程改造之核酸構築體,其中該載體系統視情況為全RNA的。Paragraph 17. A vector system comprising one or more vectors comprising the engineered nucleic acid construct of any one of paragraphs 1-16, wherein the vector system is optionally all-RNA.

段落18. 段落17之載體系統,其中該 msr基因座、該 msd基因座及編碼該RT之該多核苷酸包含於同一載體內。 Paragraph 18. The vector system of paragraph 17, wherein the msr locus, the msd locus and the polynucleotide encoding the RT are contained in the same vector.

段落19. 段落17或18之載體系統,其中該同一載體進一步包含可操作地連接至該 msr基因座及/或該 msd基因座之啟動子。 Paragraph 19. The vector system of paragraph 17 or 18, wherein the same vector further comprises a promoter operably linked to the msr locus and/or the msd locus.

段落20. 段落19之載體系統,其中該啟動子進一步可操作地連接至編碼該RT之該多核苷酸。Paragraph 20. The vector system of paragraph 19, wherein the promoter is further operably linked to the polynucleotide encoding the RT.

段落21. 一種包含一或多種載體之載體系統,其包含段落1或2之經工程改造之核酸構築體,其中該載體系統進一步包含編碼逆轉錄酶(RT)或其部分之第二多核苷酸,其中該經編碼RT能夠合成編碼該msDNA之該 msd基因座的至少一部分之DNA複本,且其中該 msr基因座、該 msd基因座及編碼該RT之該第二多核苷酸由至少兩種不同載體提供。 Paragraph 21. A vector system comprising one or more vectors, comprising the engineered nucleic acid construct of paragraph 1 or 2, wherein the vector system further comprises a second polynucleotide encoding a reverse transcriptase (RT) or a portion thereof, wherein the encoded RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA, and wherein the msr locus, the msd locus and the second polynucleotide encoding the RT are provided by at least two different vectors.

Paragraph 22. The vector system of paragraph 21, wherein:
a) the second polynucleotide comprises:
i) a polynucleotide listed in Table A, or a polynucleotide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polynucleotide listed in Table A; and/or
b) the second polynucleotide encodes:
i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or
ii) a polypeptide listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and
wherein the second polynucleotide optionally does not encode a polypeptide listed in Table X.

段落23. 段落21或22之載體系統,其中編碼該RT之該多核苷酸相對於該 msr基因及/或該 msd基因以 反式提供。 Paragraph 23. The vector system of paragraph 21 or 22, wherein the polynucleotide encoding the RT is provided in trans relative to the msr gene and/or the msd gene.

段落24. 段落17-23中任一項之載體系統,其中該一或多種載體包含病毒載體。Paragraph 24. The vector system of any of paragraphs 17-23, wherein the one or more vectors comprise a viral vector.

段落25. 段落24之載體系統,其中該病毒載體為逆轉錄病毒載體、慢病毒載體、腺病毒載體、腺相關病毒載體、牛痘病毒載體、痘病毒載體或單純疱疹病毒載體。Paragraph 25. The vector system of Paragraph 24, wherein the viral vector is a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a vaccinia viral vector, a poxvirus vector or a herpes simplex virus vector.

段落26. 段落17-23中任一項之載體系統,其中該一或多種載體包含非病毒載體。Paragraph 26. The vector system of any of paragraphs 17-23, wherein the one or more vectors comprise a non-viral vector.

段落27. 段落26之載體系統,其中該非病毒載體包含質體。Paragraph 27. The vector system of paragraph 26, wherein the non-viral vector comprises a plasmid.

段落28. 段落26之載體系統,其中該非病毒載體包含脂質體、脂質奈米顆粒(LNP)、陽離子聚合物、囊泡或金奈米顆粒。Paragraph 28. The vector system of Paragraph 26, wherein the non-viral vector comprises a liposome, a lipid nanoparticle (LNP), a cationic polymer, a vesicle or a gold nanoparticle.

段落29. 段落17-28中任一項之載體系統,其包含編碼序列特異性核酸酶之載體。Paragraph 29. The vector system of any one of paragraphs 17-28, comprising a vector encoding a sequence-specific nuclease.

Paragraph 30. The vector system of paragraph 29, wherein the sequence-specific nuclease comprises an RNA-guided sequence-specific nuclease (e.g., a CRISPR/Cas effector enzyme, an engineered RNA-guided FokI nuclease (e.g., dCas-FokI), an RNA-guided DNA endonuclease, TnpB, IscB, or a transposon-associated nuclease) or a non-RNA-guided sequence-specific nuclease (e.g., a meganuclease, a zinc finger nuclease (ZFN), a TALE nuclease (TALEN), or a restriction endonuclease (RE)).

Paragraph 31. The vector system of paragraph 30, wherein the Cas effector enzyme is a class 1, type I, type II or type III Cas; a class 2, type II Cas (e.g., Cas9); or a class 2, type V Cas (e.g., Cpf1).

Paragraph 32. The vector system of paragraph 30, wherein:
1) the RNA-guided sequence-specific nuclease comprises the CRISPR/Cas effector enzyme, the engineered RNA-guided FokI nuclease (e.g., dCas-FokI), the RNA-guided DNA endonuclease, TnpB, IscB, IsrB, or a transposon-associated nuclease; or
2) the non-RNA-guided sequence-specific nuclease comprises the meganuclease, the zinc finger nuclease (ZFN), the TALE nuclease (TALEN), or the restriction endonuclease (RE).

段落33. 段落17-32中任一項之載體系統,其進一步包含編碼同源重組增強子蛋白之載體。Paragraph 33. The vector system of any one of paragraphs 17-32, further comprising a vector encoding a homologous recombination enhancer protein.

段落34. 一種RNA分子,該RNA分子由段落1-16中任一項之經工程改造之核酸構築體編碼。Paragraph 34. An RNA molecule encoded by the engineered nucleic acid construct of any of Paragraphs 1-16.

Paragraph 35. A recombinant retron-based genome editing system, comprising:
a) a non-coding RNA (ncRNA) comprising:
i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and
ii) an msd locus encoding the msd RNA portion of the msDNA;
b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and
c) a sequence encoding a reverse transcriptase (RT) or a domain thereof, comprising:
i) a polypeptide listed in Table A, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table A; and/or
ii) a polypeptide listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to a polypeptide listed in Table C; and
wherein the RT optionally does not comprise a polypeptide listed in Table X.

Paragraph 36. A recombinant retron-based genome editing system, comprising:
a) a non-coding RNA (ncRNA) comprising:
i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and
ii) an msd locus encoding the msd RNA portion of the msDNA;
wherein the ncRNA comprises:
i) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or
ii) an ncRNA having a conserved structure of any one of the ncRNA structures of Figures 2-27; and
wherein the ncRNA optionally excludes any ncRNA associated in nature with any retron reverse transcriptase of Table X;
b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus; upstream of the msr locus; upstream of the msd locus; and downstream of the msd locus; and
c) a reverse transcriptase (RT) or a portion thereof, wherein the RT is capable of synthesizing a DNA copy of at least a portion of the msd locus encoding the msDNA.

Paragraph 37. A recombinant retron-based genome editing system, comprising:
a) a non-coding RNA (ncRNA) comprising:
i) an msr locus encoding the msr RNA portion of a multi-copy single-stranded DNA (msDNA); and
ii) an msd locus encoding the msd RNA portion of the msDNA;
wherein the ncRNA comprises:
i) an ncRNA listed in Table B, or an ncRNA having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an ncRNA listed in Table B; and/or
ii) an ncRNA having a conserved structure of any one of the ncRNA structures of Figures 2-27; and
wherein the ncRNA optionally excludes any ncRNA associated in nature with any retron reverse transcriptase of Table X;
b) a heterologous nucleic acid inserted at or within a position selected from: the msd locus, upstream of the msr locus, upstream of the msd locus, and downstream of the msd locus; and
c) a reverse transcriptase (RT) or a domain thereof, wherein the RT comprises:
i) an RT listed in Table A, or an RT having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an RT listed in Table A; and/or
ii) a consensus sequence listed in Table C, or a polypeptide having at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% sequence identity to an amino acid sequence listed in Table C; and
wherein the RT optionally does not comprise an RT listed in Table X.

Paragraph 38. An isolated host cell comprising the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, or the recombinant retron-based genome editing system of any one of paragraphs 35-37.

Paragraph 39. The isolated host cell of paragraph 38, wherein the host cell is a prokaryotic, archaeal or eukaryotic host cell.

段落40. 段落38之經分離之宿主細胞,其中該真核宿主細胞為哺乳動物宿主細胞。Paragraph 40. The isolated host cell of paragraph 38, wherein the eukaryotic host cell is a mammalian host cell.

段落41. 段落39之經分離之宿主細胞,其中該真核宿主細胞為非人類宿主細胞。Paragraph 41. The isolated host cell of Paragraph 39, wherein the eukaryotic host cell is a non-human host cell.

段落42. 段落40之經分離之宿主細胞,其中該哺乳動物宿主細胞為人類宿主細胞。Paragraph 42. The isolated host cell of Paragraph 40, wherein the mammalian host cell is a human host cell.

段落43. 段落38-42中任一項之經分離之宿主細胞,其中該宿主細胞為人工細胞或經遺傳修飾之細胞。Paragraph 43. The isolated host cell of any one of paragraphs 38-42, wherein the host cell is an artificial cell or a genetically modified cell.

Paragraph 44. A pharmaceutical composition comprising:
a) the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the recombinant retron-based genome editing system of any one of paragraphs 35-37, and/or the isolated host cell of any one of paragraphs 38-43; and
b) a pharmaceutically acceptable carrier.

Paragraph 45. A pharmaceutical composition comprising:
a) a lipid nanoparticle (LNP); and
b) the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, and/or the recombinant retron-based genome editing system of any one of paragraphs 35-37.

段落46.     段落45之醫藥組合物,其中該LNP囊封該經工程改造之核酸構築體、該ncRNA、該載體系統、該RNA分子及/或該經工程改造之核酸-酶構築體。Paragraph 46. The pharmaceutical composition of Paragraph 45, wherein the LNP encapsulates the engineered nucleic acid construct, the ncRNA, the vector system, the RNA molecule and/or the engineered nucleic acid-enzyme construct.

段落47.     段落45或46之醫藥組合物,其中該脂質奈米顆粒包含: a)  一或多種可離子化脂質; b)  一或多種結構脂質; c)  一或多種PEG化脂質;及 d)  一或多種磷脂。 Paragraph 47.     The pharmaceutical composition of paragraph 45 or 46, wherein the lipid nanoparticles comprise: a) one or more ionizable lipids; b) one or more structural lipids; c) one or more PEGylated lipids; and d) one or more phospholipids.

段落48.     段落47之醫藥組合物,其中該一或多種可離子化脂質選自由表2中所揭示之彼等組成之群。Paragraph 48. The pharmaceutical composition of Paragraph 47, wherein the one or more ionizable lipids are selected from the group consisting of those disclosed in Table 2.

Paragraph 49. The pharmaceutical composition of paragraph 47 or 48, wherein the one or more structural lipids are selected from the group consisting of: cholesterol, fecosterol, beta-sitosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, prednisolone, dexamethasone, prednisone and hydrocortisone.

段落50.     段落47-49中任一項之醫藥組合物,其中該一或多種PEG化脂質選自由以下組成之群:PEG-c-DOMG、PEG-DMG、PEG-DLPE、PEG-DMPE、PEG-DPPC及PEG-DSPE。Paragraph 50.     The pharmaceutical composition of any of paragraphs 47-49, wherein the one or more PEGylated lipids are selected from the group consisting of: PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC and PEG-DSPE.

Paragraph 51. The pharmaceutical composition of any one of paragraphs 47-50, wherein the one or more phospholipids are selected from the group consisting of: 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG) and sphingomyelin.

段落52.     段落47-51中任一項之醫藥組合物,其中該脂質奈米顆粒包含約48.5 mol%可離子化脂質、約10 mol%磷脂、約40 mol%結構脂質及約1.5 mol% PEG脂質。Paragraph 52.     The pharmaceutical composition of any of paragraphs 47-51, wherein the lipid nanoparticles comprise about 48.5 mol% ionizable lipids, about 10 mol% phospholipids, about 40 mol% structural lipids and about 1.5 mol% PEG lipids.

段落53.     段落47-52中任一項之醫藥組合物,其中該脂質奈米顆粒包含約48.5 mol%可離子化脂質、約10 mol%磷脂、約39 mol%結構脂質及約2.5 mol% PEG脂質。Paragraph 53.     The pharmaceutical composition of any of paragraphs 47-52, wherein the lipid nanoparticles comprise approximately 48.5 mol% ionizable lipids, approximately 10 mol% phospholipids, approximately 39 mol% structural lipids and approximately 2.5 mol% PEG lipids.
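As a quick arithmetic check on the two formulations recited in paragraphs 52 and 53, the following minimal sketch confirms that each set of approximate mol% values sums to 100 and converts the percentages into per-component amounts for a chosen total lipid amount. The dictionary keys, the function names and the 10 µmol total used in the usage note are illustrative assumptions and do not come from the disclosure.

```python
import math

# Approximate mol% values as recited in paragraphs 52 and 53.
# The key names are hypothetical labels chosen for this sketch.
FORMULATION_52 = {"ionizable": 48.5, "phospholipid": 10.0, "structural": 40.0, "peg": 1.5}
FORMULATION_53 = {"ionizable": 48.5, "phospholipid": 10.0, "structural": 39.0, "peg": 2.5}


def sums_to_100(formulation, tol=1e-9):
    """Return True if the mol% values add up to 100 within a small tolerance."""
    return math.isclose(sum(formulation.values()), 100.0, abs_tol=tol)


def component_amounts(formulation, total_umol):
    """Split a total lipid amount (in micromoles) according to the mol% values."""
    return {name: total_umol * pct / 100.0 for name, pct in formulation.items()}
```

For example, under the assumed total of 10 µmol lipid, `component_amounts(FORMULATION_52, 10.0)` assigns 4.85 µmol to the ionizable lipid. Because the recited values are each qualified by "about", an actual formulation need not sum to exactly 100.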

Paragraph 54. The pharmaceutical composition of any one of paragraphs 47-53, wherein the LNP further comprises a targeting moiety operably linked to the LNP.

段落55.     段落47-54中任一項之醫藥組合物,其中該LNP進一步包含選自由以下組成之群的一或多種額外組分:DDAB、EPC、14PA、18BMP、DODAP、DOTAP及C12-200。Paragraph 55.     The pharmaceutical composition of any of paragraphs 47-54, wherein the LNP further comprises one or more additional components selected from the group consisting of: DDAB, EPC, 14PA, 18BMP, DODAP, DOTAP and C12-200.

段落56.     段落45之醫藥組合物,其中該脂質奈米顆粒包含選自由以下組成之群的至少一種陽離子脂質:表2中之脂質、具有式(I)結構之脂質、具有式(II)結構之脂質、具有式(III)結構之脂質、具有式(IV)結構之脂質、具有式(V)結構之脂質、具有式(VI)結構之脂質及其組合。Paragraph 56. The pharmaceutical composition of Paragraph 45, wherein the lipid nanoparticles comprise at least one cationic lipid selected from the group consisting of lipids in Table 2, lipids having a structure of formula (I), lipids having a structure of formula (II), lipids having a structure of formula (III), lipids having a structure of formula (IV), lipids having a structure of formula (V), lipids having a structure of formula (VI), and combinations thereof.

Paragraph 57. A kit comprising the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the recombinant retron-based genome editing system of any one of paragraphs 35-37, the host cell of any one of paragraphs 38-43, or the pharmaceutical composition of any one of paragraphs 44-56, and instructions for genetically modifying a cell using the engineered nucleic acid construct, the ncRNA, the vector system, the host cell, or the pharmaceutical composition.

Paragraph 58. A method of modifying a target DNA sequence in a host (e.g., mammalian) cell, the method comprising introducing the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the recombinant retron-based genome editing system of any one of paragraphs 35-37, or the pharmaceutical composition of any one of paragraphs 44-56 into the mammalian cell to allow production of the msDNA in the host (e.g., mammalian) cell, wherein the heterologous nucleic acid in the msDNA modifies the genome of the host.

段落59.     段落58之方法,其中該修飾包括將插入、缺失及/或取代引入該標靶DNA序列中。Paragraph 59. The method of paragraph 58, wherein the modification comprises introducing insertions, deletions and/or substitutions into the target DNA sequence.

Paragraph 60. A method of treating a disease or condition in an individual in need thereof, the method comprising administering to the individual a therapeutically effective amount of the engineered nucleic acid construct of any one of paragraphs 1-16, an ncRNA encoded by the engineered nucleic acid construct of any one of paragraphs 1-16, the vector system of any one of paragraphs 17-33, the RNA molecule of paragraph 34, the recombinant retron-based genome editing system of any one of paragraphs 35-37, the host cell of any one of paragraphs 38-43, or the pharmaceutical composition of any one of paragraphs 44-56, thereby treating the disease or condition in the individual.

段落61.     一種治療有需要之個體的疾病或疾患之方法,該方法包括向該個體投與治療有效量之段落38-43中任一項之宿主細胞,由此治療該個體之該疾病或疾患。Paragraph 61. A method for treating a disease or condition in a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of a host cell of any one of paragraphs 38-43, thereby treating the disease or condition in the subject.

段落62.     段落61之方法,其中該宿主細胞對於該個體為自體的。Paragraph 62. The method of Paragraph 61, wherein the host cell is autologous to the individual.

Paragraph 63. The method of paragraph 61, wherein the host cell is allogeneic to the individual.

Exemplary Embodiment Group C

以下段落亦描述本揭示案之例示性且非限制性實施例。The following paragraphs also describe exemplary and non-limiting embodiments of the present disclosure.

Paragraph 1. A gene editing system comprising one or more delivery vehicles, wherein:
the delivery vehicle(s) comprise an RNA cargo;
the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) a retron reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease;
each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c);
whereby one delivery vehicle or more than one delivery vehicle delivers (a)(i), (a)(ii), (b) and (c).
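The cargo-distribution condition of paragraph 1 can be read as a set-coverage rule: every vehicle carries at least one of the four components, and together the vehicles carry all four. The following is a minimal sketch of that reading only. The labels `a_i`, `a_ii`, `b` and `c` and the function name are hypothetical shorthand for components (a)(i), (a)(ii), (b) and (c), and the sketch makes no claim about how cargo is actually formulated.

```python
# Hypothetical labels for the four cargo components of paragraph 1:
# a_i  = mRNA encoding the nucleic acid programmable nuclease
# a_ii = mRNA encoding the retron reverse transcriptase
# b    = engineered retron ncRNA
# c    = guide RNA for the programmable nuclease
REQUIRED = frozenset({"a_i", "a_ii", "b", "c"})


def system_is_complete(vehicles):
    """Check the coverage rule for a list of per-vehicle cargo sets.

    Each vehicle must hold at least one recognised component, and the
    union of all vehicles must contain every required component.
    """
    if not vehicles:
        return False
    for cargo in vehicles:
        if not cargo or not set(cargo) <= REQUIRED:
            return False
    delivered = set().union(*vehicles)
    return delivered == REQUIRED
```

A single vehicle holding all four components satisfies the rule, as does a split such as `[{"a_i", "a_ii"}, {"b", "c"}]`; a split that omits the guide RNA does not.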

Paragraph 2. The gene editing system of paragraph 1, wherein (a)(i) and (a)(ii) comprise a single mRNA molecule encoding the nucleic acid programmable nuclease and the retron reverse transcriptase.

Paragraph 3. The gene editing system of paragraph 2, wherein (a)(i) and (a)(ii) are encoded and expressed as a fusion protein.

Paragraph 4. The gene editing system of paragraph 3, wherein the fusion protein comprises the C-terminus of the nucleic acid programmable nuclease fused to the N-terminus of the retron reverse transcriptase (nuclease:RT fusion).

Paragraph 5. The gene editing system of paragraph 3, wherein the fusion protein comprises the N-terminus of the nucleic acid programmable nuclease fused to the C-terminus of the retron reverse transcriptase (RT:nuclease fusion).

Paragraph 6. The gene editing system of paragraph 1, wherein (a)(i) and (a)(ii) comprise a first mRNA molecule encoding the nucleic acid programmable nuclease and a second mRNA molecule encoding the retron reverse transcriptase.

Paragraph 7. The gene editing system of paragraph 1, wherein (c) is provided separately from, or in trans to, (a)(i), (a)(ii) and (b).

Paragraph 8. The gene editing system of paragraph 1, wherein (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis.

Paragraph 9. The gene editing system of paragraph 8, wherein the guide RNA is fused to the 5' end of the retron ncRNA.

Paragraph 10. The gene editing system of paragraph 8, wherein the guide RNA is fused to the 3' end of the retron ncRNA.

Paragraph 11. The gene editing system of paragraph 8, wherein the engineered ncRNA comprises a first guide RNA fused to the 5' end of the retron ncRNA and a second guide RNA fused to the 3' end of the retron ncRNA, and the first and second guide RNAs target different sequences.

段落12. 段落1之基因編輯系統,其中該一或多種遞送媒劑包含脂質體或脂質奈米顆粒(LNP)。Paragraph 12. The gene editing system of Paragraph 1, wherein the one or more delivery vehicles comprise liposomes or lipid nanoparticles (LNPs).

Paragraph 13. The gene editing system of paragraph 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in the same delivery vehicle.

Paragraph 14. The gene editing system of paragraph 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in separate delivery vehicles.

Paragraph 15. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and the separate mRNA molecules of (a)(i) and (a)(ii) are contained in the same delivery vehicle.

Paragraph 16. The gene editing system of paragraph 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and the separate mRNA molecules of (a)(i) and (a)(ii) are contained in different delivery vehicles.

Paragraph 17. The gene editing system of paragraph 1, wherein the engineered retron ncRNA comprises a sequence of interest encoding a donor polynucleotide, the donor polynucleotide comprising an intended edit to be integrated at a target sequence in a cell, and wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence.
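The donor layout described in paragraph 17 (5' homology arm, intended edit, 3' homology arm) can be illustrated at the string level. The following is a toy sketch under stated assumptions: the function name, its parameters and the example sequence are all hypothetical, the arms are simply copied from the sequence flanking the edit site on one strand, and strand orientation, reverse transcription of the msd template and any sequence-design constraints are deliberately ignored.

```python
def build_donor(target, edit_start, edit_end, edit, arm_len):
    """Assemble a donor as 5' arm + edit + 3' arm (toy, single-strand view).

    target:     sequence around the edit site, as a string
    edit_start: index where the replaced region begins
    edit_end:   index where the replaced region ends (exclusive);
                equal to edit_start for a pure insertion
    edit:       replacement sequence ("" for a deletion)
    arm_len:    length of each homology arm
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit coordinates fall outside the target")
    if edit_start < arm_len or edit_end + arm_len > len(target):
        raise ValueError("not enough flanking sequence for the requested arms")
    five_prime_arm = target[edit_start - arm_len:edit_start]
    three_prime_arm = target[edit_end:edit_end + arm_len]
    return five_prime_arm + edit + three_prime_arm
```

With the made-up target `AAACCCGGGTTT`, an insertion of `TAG` at index 6 with 3-nt arms gives `CCCTAGGGG`. The same function covers substitutions (non-empty `edit`, `edit_end > edit_start`) and deletions (empty `edit`), the three edit classes recited elsewhere in these embodiments.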

段落18. 段落1之基因編輯系統,其中該核酸可程式化核酸酶包含Cas9核酸酶、TnpB核酸酶或Cas12a核酸酶。Paragraph 18. The gene editing system of Paragraph 1, wherein the nucleic acid programmable nuclease comprises Cas9 nuclease, TnpB nuclease or Cas12a nuclease.

段落19. 段落18之基因編輯系統,其中該核酸可程式化核酸酶包含Cas9核酸酶。Paragraph 19. The gene editing system of Paragraph 18, wherein the nucleic acid programmable nuclease comprises a Cas9 nuclease.

段落20. 段落1之基因編輯系統,其中該逆轉錄子逆轉錄酶具有表A之胺基酸序列,或與表A之任何胺基酸序列具有至少90%、95%、99%或100%序列一致性之胺基酸序列。Paragraph 20. The gene editing system of Paragraph 1, wherein the retrovirus reverse transcriptase has an amino acid sequence of Table A, or an amino acid sequence having at least 90%, 95%, 99% or 100% sequence identity with any amino acid sequence of Table A.

Paragraph 21. The gene editing system of Paragraph 1, wherein the engineered retron ncRNA comprises:
A) a pre-msr sequence having a first complementary region of the retron ncRNA;
B) an msr sequence comprising an msr stem-loop structure;
C) an msd sequence comprising an msd stem-loop structure and a sequence of interest, wherein the msd sequence templates a single-stranded DNA product (RT-DNA) in the presence of the retron reverse transcriptase; and
D) a post-msd sequence having a second complementary region, wherein the first and second complementary regions form the a1/a2 duplex region of the retron ncRNA,
wherein the msr stem-loop structure, the msd stem-loop structure or the a1/a2 duplex comprises a modification that results in increased editing efficiency in the presence of a nucleic acid programmable nuclease associated with one or more guide RNAs, and
wherein, optionally, the one or more guide RNAs of (c) are coupled to the pre-msr sequence, the post-msd sequence, or both the pre-msr sequence and the post-msd sequence.

Paragraph 22. The gene editing system of Paragraph 21, wherein the sequence of interest encodes a donor polynucleotide comprising an intended edit to be integrated at a target sequence in a cell, wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence.

Paragraph 23. The gene editing system of Paragraph 21, wherein the ncRNA has a nucleotide sequence of any sequence in Table B, or a nucleotide sequence having at least 90%, 95%, 99% or 100% sequence identity to any sequence in Table B.

Paragraph 24. An isolated cell comprising the gene editing system of Paragraph 1.

Paragraph 25. The isolated cell of Paragraph 24, wherein the isolated cell is a mammalian cell.

Paragraph 26. The isolated cell of Paragraph 25, wherein the mammalian cell is a human cell.

Paragraph 27. A composition comprising: a) the gene editing system of Paragraph 1; and b) a pharmaceutically or veterinarily acceptable carrier.

Paragraph 28. The composition of Paragraph 27, wherein the delivery vehicle is a lipid nanoparticle comprising: a) one or more ionizable lipids; b) one or more structural lipids; c) one or more PEGylated lipids; and d) one or more phospholipids.

Paragraph 29. The composition of Paragraph 28, wherein the one or more ionizable lipids comprise an ionizable lipid set forth in Table 2.

Paragraph 30. A method of genetically modifying a cell, the method comprising: contacting the gene editing system of Paragraph 17 with the cell, thereby delivering the RNA cargo to the cell, wherein: the nucleic acid programmable nuclease forms a complex with the guide RNA, wherein the guide RNA directs the complex to the target sequence; the nucleic acid programmable nuclease creates a double-strand break in the target sequence; the retron reverse transcriptase and the engineered retron ncRNA produce RT-DNA comprising the donor polynucleotide; and the donor polynucleotide is integrated at the target sequence.

Examples

Example 1: Retron engineering

This example demonstrates that an engineered (or recombinant) retron as described herein can be engineered based on existing sequence information in sequence databases.

Specifically, retron-like reverse transcriptases (RTs) are first identified from various genomic or metagenomic sequence databases. The ncRNA regions of the identified retrons are then determined by computational prediction and/or empirically, for example by reconstituting the putative retron system in living cells and assaying for msDNA production.

Once a particular wild-type retron is identified and confirmed, one or more sequence elements of the retron are modified based on the methods described herein, and/or the associated RT is modified or engineered, to enhance the overall activity and/or processivity of the retron. For example, a wild-type retron can be engineered by one or more of the following: (a) adding a heterologous nucleic acid sequence of interest (e.g., a nucleotide sequence encoding an HDR donor template) to various parts and structures of the msd locus; (b) performing any or all of the structural modifications described herein; (c) optionally linking the engineered retron ncRNA to one or more CRISPR gRNAs, e.g., a gRNA linked to the 3' end of the retron ncRNA, a gRNA linked to the 5' end of the retron ncRNA, or a pair of gRNAs, one linked to the 3' end and one linked to the 5' end of the retron ncRNA.

The engineered retron or its encoded ncRNA is optionally linked to a sequence-specific nuclease, such as a CRISPR/Cas enzyme and/or gRNA, ZFN, TALEN, TnpB or IscB, and the like.

For example, the RT is fused to a CRISPR/Cas enzyme (such as Cas9 or Cpf1) as an N-terminal or C-terminal fusion, and the fusion is optionally further fused to a nuclear localization signal (NLS) at its N-terminus and/or C-terminus. The ncRNA, or the msDNA obtained after reverse transcription, is also linked at its 5' or 3' end to a guide RNA (gRNA) and can be used together with the cognate CRISPR/Cas enzyme in the methods of the invention.

In another example, the RT is linked to a DNA repair-modulating biomolecule as described above, such as an HDR promoter and/or an NHEJ peptide inhibitor.

Example 2A: Genome engineering

This example demonstrates that engineered retrons can be used to introduce a heterologous nucleic acid sequence (e.g., a targeting DNA donor or template) into the genome of a host cell (e.g., a human cell).

First, the targeting DNA is introduced into the msd portion of the engineered retron, as shown in Figure 3, as a "marker". The heterologous nucleic acid sequence is designed so that it is flanked by homology sequences of 10-100 or more base pairs that are substantially identical/homologous to the genomic sequence at the target site. The desired edit on the "marker" therefore lies between the homology arms and includes insertions, deletions, and/or other mutations.
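The donor layout described above (edit flanked by homology arms taken from the target locus) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_donor`, its parameters, and the toy sequences are hypothetical and do not come from the disclosure, and the sketch ignores strand orientation and the retron msd context in which the donor is actually embedded.

```python
def build_donor(genome: str, cut_site: int, edit: str,
                arm_len: int = 50, delete_len: int = 0) -> str:
    """Return 5' homology arm + edit + 3' homology arm.

    genome     -- reference sequence around the target site (hypothetical input)
    cut_site   -- 0-based index where the edit is placed
    edit       -- sequence to insert (empty string for a pure deletion)
    arm_len    -- homology arm length; the text describes 10-100 bp or more
    delete_len -- number of reference bases replaced by the edit
    """
    if arm_len < 10:
        raise ValueError("homology arms shorter than 10 bp are outside the described range")
    left_start = cut_site - arm_len
    right_end = cut_site + delete_len + arm_len
    if left_start < 0 or right_end > len(genome):
        raise ValueError("homology arms run off the reference sequence")
    five_prime_arm = genome[left_start:cut_site]
    three_prime_arm = genome[cut_site + delete_len:right_end]
    return five_prime_arm + edit + three_prime_arm


# Toy reference: 20 A's followed by 20 C's, with a 4-nt insertion at the junction.
toy_genome = "A" * 20 + "C" * 20
toy_donor = build_donor(toy_genome, cut_site=20, edit="GGGG", arm_len=10)
```

Setting `delete_len` above zero models a substitution or deletion, since the 3' arm then starts downstream of the replaced bases.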

A sequence-specific nuclease (such as Cas9 nuclease) forms a complex with a guide RNA, and the complex specifically targets a position at or near the desired editing site in the genomic sequence. The nuclease is designed so that it does not cut the target once the edit has been correctly installed. In this experiment, the Cas9 nuclease is linked to the retron reverse transcriptase (RT) as a fusion protein. The Cas9 gRNA is also linked to the ncRNA or msDNA produced from the engineered retron.

Next, the engineered retron, encoding the sequence-specific nuclease (and fused RT) and the ncRNA (and linked gRNA), is introduced into a host cell (such as a human cell).

The engineered retron is introduced as part of a plasmid for transfection into the cell, or as part of a viral vector (such as an AAV vector) for infecting the cell.

Alternatively, the transcribed ncRNA (and linked gRNA) can be formulated in vitro, for example in a lipid nanoparticle or delivery system of the invention, for direct delivery into the host cell. The sequence-specific nuclease (and fused RT) can be delivered into the host cell separately (by plasmid transfection, AAV infection, etc.) or together with the ncRNA in the lipid nanoparticle. For example, the coding sequence (e.g., mRNA) of the Cas9-RT fusion is formulated together with the ncRNA in the same lipid nanoparticle, or formulated separately in its own lipid nanoparticle for simultaneous or sequential delivery.

Once inside the cell, the Cas9-RT fusion is translated by the host cell translation machinery, and the ncRNA is transcribed from the engineered retron (if the ncRNA was not delivered directly into the cell). The RT portion of the fusion then reverse-transcribes the ncRNA into msDNA, which includes the heterologous nucleic acid sequence serving as cargo/donor/template. Meanwhile, the CRISPR/Cas9 nuclease generates a double-strand break (DSB) at the target site. After the Cas9 nuclease forms the DSB, the cargo/donor/template sequence is integrated into the host genome at the target site by host cell DNA repair (e.g., HDR).

The cells of interest are then analyzed for correct installation of the edit carried by the heterologous nucleic acid sequence, by phenotypic changes attributable to the edit, by direct DNA sequencing of the target site, or both.

Example 2B. Precise retron editing levels in mammalian cells are comparable between plasmid-delivered and RNA-delivered retrons

Although CRISPR/Cas technology has revolutionized genome editing, precisely editing pathological mutations or inserting large DNA fragments at specific locations to restore cell health is still in its infancy. Delivery of donor DNA for precise editing is a bottleneck for current therapeutic applications. Current therapeutic ex vivo and in vivo delivery of donor DNA relies on AAV (adeno-associated virus) transduction. However, AAV is complex and expensive to manufacture. In addition, AAV has raised safety concerns in some gene therapy clinical trials. The inventors sought to develop a method for delivering DNA donors that avoids the shortcomings of AAV-based DNA delivery.

Instead, the inventors sought to develop an RNA-based method for delivering DNA donors into cells for precise editing.

Retrons have been found in many bacterial species. These genetic elements are defined by their unique ability to produce an unusual satellite DNA called msDNA (multicopy single-stranded DNA). The DNA includes a reverse transcriptase (RT)-encoding gene (ret) and two consecutive inverted non-coding sequences (named msr and msd). The ret gene and the non-coding RNA (ncRNA; msr and msd) are transcribed as a single RNA, which folds into a specific secondary structure. Once translated, the RT binds the RNA template downstream of the msd locus and initiates reverse transcription of the RNA toward its 5' end, assisted by the 2'-OH group of a conserved branching guanosine residue that acts as a primer. Reverse transcription stops before reaching the msr locus, and the resulting DNA (msDNA) remains covalently linked to the RNA template through a 2'-5' phosphodiester bond and base pairing between the msDNA and the 3' end of the RNA template. In this example, these unique features of retrons are used to produce donor DNA for precise genome editing by a validated RNA delivery system, and are shown to yield levels of Cas9-based precise editing in mammalian cells similar to those of plasmid-delivered retrons.

Figures 1L and 1M show the results of the first set of experiments. Three different retrons (Eco1 - R1, Eco3 - R2, Eco5 - R3) were evaluated for their ability to make a 16-base-pair insertion at a specific genomic locus (the EMX1 gene). The mRNA of each reverse transcriptase and the cognate non-coding retron RNA were electroporated into HEK293T mammalian cells expressing Cas9. The Cas9 guide RNA was fused to the non-coding retron RNA to ensure that the guide RNA and the single-stranded donor DNA generated in situ (i.e., the reverse transcription product of the ncRNA template) co-localize at the editing target site. As a control, a previously established plasmid system encoding the Eco1 reverse transcriptase and its non-coding RNA was lipofected into the same cell line. The edited sequences were PCR-amplified from extracted genomic DNA, prepared as NGS libraries, and analyzed by sequencing. The results indicate that retron components delivered as RNA can provide donor DNA templates for precise editing as effectively as those delivered by plasmid transfection.

In a second set of experiments (data not shown), the ratio between the reverse transcriptase mRNA and its non-coding RNA template was optimized. Increasing amounts of non-coding RNA template were combined with a fixed amount of reverse transcriptase mRNA (tested at two different fixed concentrations of reverse transcriptase mRNA). The results indicate that this ratio can be optimized to increase editing efficiency. For example, the data show that increasing the amount of non-coding RNA template relative to a fixed amount of RT improves the integration rate of the 16 bp insert.

Example 3: Computational discovery and phylogenetic analysis of novel retrons to increase retron diversity for genome editing applications

Reverse transcriptases (RTs), also known as RNA-directed DNA polymerases, are enzymes that synthesize DNA using RNA as a template. Although they are present in all three domains of life and in viruses, prokaryotic RTs have historically been less explored than their eukaryotic counterparts. Prokaryotic RTs can be divided into six major groups: (1) group II introns, (2) CRISPR-associated RTs, (3) diversity-generating retroelements, (4) retrons, (5) Abi (abortive infection) RTs, and (6) unknown group (UG) RTs. Over the past five years, numerous studies have increased the understanding of prokaryotic RTs, leading to the discovery of novel putative systems with potential anti-phage properties, including retrons. In this example, a systematic search of public databases was performed with the aim of increasing the number and diversity of known retrons for potential use in genome editing applications. As a result, new types of retrons were identified, and the additional data allowed new associated ncRNAs to be identified.

As the first step of this work, a set of known retron RTs was manually curated, trimmed, and aligned to create a suitable input, which was then used to train a hidden Markov model (HMM) for identifying new retron RTs.

This model was then applied to existing protein sequence databases (e.g., the nr database from NCBI) to identify potential candidate retron RTs.

Next, the identified candidates were grouped by sequence identity, and an individual representative sequence was selected from each group. The RT domains of these representative candidates were then aligned, and an initial phylogenetic tree was built.
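The grouping step above can be illustrated with a simple greedy clustering. The sketch below assumes pre-aligned, equal-length sequences and a 90% identity cutoff; the text specifies neither the threshold nor the clustering tool, so the function names, the cutoff, and the toy sequences are all hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences.

    Columns where both sequences carry a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def pick_representatives(aligned: Dict[str, str],
                         threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy clustering: each sequence joins the first representative it
    matches at >= threshold identity, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in aligned.items():
        for rep in clusters:
            if identity(seq, aligned[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Toy aligned RT fragments (made-up sequences, for illustration only).
toy_alignment = {
    "rt1": "MKTLLVAGQR",
    "rt2": "MKTLLVAGQK",  # 9 of 10 positions identical to rt1
    "rt3": "MPSWWYHDEN",  # unrelated
}
toy_clusters = pick_representatives(toy_alignment, threshold=0.9)
```

The keys of the returned dictionary are the representatives that would be carried forward into the alignment and tree building. Greedy clustering depends on input order, so dedicated clustering tools are generally preferable for a database-scale candidate set.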

Then, using information about other known RT classes, the complete phylogeny was divided into bona fide retron RT candidates and RTs likely belonging to other classes (e.g., group II introns, DGRs, CRISPR-Cas RTs, etc.). A new alignment and phylogenetic tree were then built from these validated candidates, as shown in Figure 28. For all sequences in the new alignment, a protein neighborhood matrix was built, indicating which proteins are present near the candidate retron RTs. Retron types and subtypes were thereby defined based on the RT phylogeny and the identity of the associated effector proteins. The candidate RT sequences are presented in Table A.
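A protein neighborhood matrix of the kind described above is, in its simplest form, a presence/absence table of protein families found near each candidate RT. The sketch below is a hypothetical illustration: the function name, the RT identifiers, and the family labels are invented, and how neighboring proteins are detected and annotated is outside its scope.

```python
from typing import Dict, List, Set, Tuple


def neighborhood_matrix(neighbors: Dict[str, Set[str]]
                        ) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a presence/absence matrix.

    neighbors maps each candidate RT to the set of protein families
    annotated in its genomic neighborhood. Returns the sorted column
    labels and one 0/1 row per RT.
    """
    families = sorted(set().union(*neighbors.values()))
    rows = {
        rt: [1 if family in present else 0 for family in families]
        for rt, present in neighbors.items()
    }
    return families, rows


# Hypothetical annotations for two candidate RTs.
toy_neighbors = {
    "RT_a": {"toprim_domain", "atpase"},
    "RT_b": {"toprim_domain"},
}
toy_columns, toy_rows = neighborhood_matrix(toy_neighbors)
```

RTs with identical or similar rows share an associated effector profile, which is the kind of signal the text uses alongside the RT phylogeny to define types and subtypes.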

To predict ncRNAs near candidate retron RTs, an iterative convergence model was used to extract, align, and analyze structural covariation in the genomic regions near the candidate retron RTs.

ncRNA classification method:

To identify which type of ncRNA a sequence may belong to, the sequence can be checked against covariance models of existing ncRNA types. The Infernal suite (http://eddylab.org/infernal/) provides tools for this. Briefly, a covariance model can be built from a structural alignment of the known ncRNAs of each type, and the models are then compiled into a CM database. A new sequence can then be searched against the database to check whether it fits any of the represented families.
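As a rough sketch of the workflow just described, the Python helper below assembles the corresponding Infernal command lines (`cmbuild`, `cmcalibrate`, `cmpress`, `cmscan`) without running them. The file names are hypothetical, the text does not state which options were used, and only the `--tblout` option of `cmscan` is assumed here.

```python
from pathlib import Path
from typing import List


def infernal_commands(alignments: List[str], cm_db: str,
                      query_fasta: str, hits_table: str) -> List[List[str]]:
    """Return the Infernal invocations for building and searching a CM database.

    alignments  -- Stockholm structural alignments, one per known ncRNA type
    cm_db       -- path of the combined covariance-model database
    query_fasta -- new candidate ncRNA sequences to classify
    hits_table  -- output path for the tabular hit summary
    """
    commands: List[List[str]] = []
    for sto in alignments:
        cm_file = str(Path(sto).with_suffix(".cm"))
        commands.append(["cmbuild", cm_file, sto])   # build one model per type
        commands.append(["cmcalibrate", cm_file])    # calibrate E-value statistics
    # The per-type .cm files are assumed to be concatenated into cm_db
    # (plain file concatenation) before the database is indexed.
    commands.append(["cmpress", cm_db])
    commands.append(["cmscan", "--tblout", hits_table, cm_db, query_fasta])
    return commands


toy_commands = infernal_commands(
    ["retron_type1.sto"], "retron_ncrna.cm", "candidates.fa", "hits.tbl"
)
```

Each inner list can be passed to `subprocess.run` on a machine where Infernal is installed. Building the commands separately from running them keeps the sketch inspectable without the tools present.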

The resulting retron RT sequences are provided in Table A herein. The resulting corresponding retron ncRNA sequences are provided in Table B herein.

Example 4: Design and demonstration of a retron-based editing system in cells

Current genome editing methods can efficiently disrupt target genes by using programmable nucleases to insert and delete genetic information. However, gene disruption that does not distinguish between mutant and wild-type alleles damages not only the mutant allele involved in pathogenesis but also the wild-type allele involved in the organism's normal physiology. For example, in transthyretin (TTR)-induced amyloidosis, the mutant TTR allele causes the disease, but the non-pathogenic wild-type allele plays a role in neuroprotection and injury response (MF, 2021). As with TTR, whose major variants are single-nucleotide substitutions (J, 2021), the vast majority of pathogenic human alleles differ from their non-pathogenic counterparts by small modifications, which require far more precise editing techniques to correct. Homology-directed repair (HDR) stimulated by nuclease-mediated DNA breaks has been widely used to install precise edits. However, HDR depends on delivery of exogenous donor DNA, which can provoke a strong immune response and has been shown to be difficult to deliver efficiently at high abundance in recipients (D, synthetic DNA delivery systems, 2000). Current therapeutic ex vivo and in vivo delivery of donor DNA relies on AAV (adeno-associated virus) transduction. However, AAV is complex and expensive to manufacture. Furthermore, AAV raised safety concerns in 30% of the 255 completed or ongoing clinical trials examined in a meta-analysis (W, 2022). In contrast, RNA delivery has successfully demonstrated its safety, ease of manufacture, and efficacy in two SARS-CoV-2 vaccines. Given this suitability of RNA as cargo, we searched for a system in which donor DNA for precise editing can be generated from RNA delivered into cells.

Retrons are defined by their unique ability to produce an unusual satellite DNA, called msDNA (multicopy single-stranded DNA), from RNA inside the cell (S, 1989). These bacterial elements are involved in phage defense (A, 2020) and consist of a non-coding RNA (ncRNA) and a reverse transcriptase (RT) that reverse-transcribes the ncRNA into msDNA. Their strictly defined reverse transcription sites allow a donor DNA sequence to be inserted into the non-coding RNA substrate. In addition, their compact size is suited to delivery together with a programmable nuclease in an all-RNA format. These features make retrons an attractive tool for precise genome editing in therapeutic applications.

An independent approach for generating new DNA sequences in cells by reverse transcription is prime editing, whose virus-derived RT can achieve insertions of up to 44 bp (AV, 2019). Retrons in their native sequence can produce single-stranded DNA of up to 163 nucleotides (T, 1987), suggesting that longer insertions are feasible with retron-mediated editing. Consistent with this, newly identified retrons can insert 100 bp with a 6 bp deletion into the EMX1 locus of HEK293T cells in a Streptococcus pyogenes Cas9 nuclease system.

A. Evidence of precise editing by retrons from different clades

As a first step, bacterial genomes were searched for retron/retron-like RTs. Phylogenetic analysis of the more than 7,000 retron/retron-like sequences identified in this search yielded the phylogenetic tree of Figure 28. The genomic vicinity of the retron/retron-like RT sequences was searched for non-coding RNA genes with conserved RNA secondary structures resembling the characteristic msr-msd transcript: terminal self-hybridizing inverted repeats and hairpin structures in the msr and msd, such as those shown in Figures 2-27. Using covariance models and consensus structure detection, ncRNAs were identified with high confidence for some RTs. To evaluate the suitability for gene editing of retrons from different phylogenetic lineages, Eco1, Eco3, Eco5, Aco1, RTX3-2042, 6083 v1, and 6943 were selected for further analysis (Figure 30). Eco1, Eco3, and Eco5 are retrons previously validated experimentally to show msDNA production (9). Aco1 was recently annotated in the literature but has not been experimentally validated to produce msDNA (10). RTX3-2042, 6083 v1, and 6943 are novel retrons.

Gene editing assays were first performed in plasmid DNA format. The retron elements were assembled in plasmids in which the RT was transcribed under a CAG promoter, and the ncRNA, fused at its 3' end to the 5' end of the Cas9 nuclease single guide RNA (sgRNA), was transcribed under a U6 or H1 RNA polymerase III promoter (see Figures 31A and 31B). The desired sequence to be inserted at the EMX1 target site, flanked on both sides by homology arms, was inserted into the msd stem-loop of each retron's ncRNA. As described in Figures 32 and 33, the plasmids were transfected by lipofection into human embryonic kidney 293T (HEK293T) cells expressing Cas9. Three days after transfection, the EMX1 target genomic locus was amplified by PCR, and the sequences were analyzed by next-generation sequencing (NGS). The precise edit analyzed was characterized by a 10 bp insertion and a 6 bp substitution, and its percentage, along with those of other editing outcomes, was calculated by CRISPResso2 analysis.
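The read-level bookkeeping behind percentages of this kind can be illustrated with a deliberately simplified Python sketch. It is not how CRISPResso2 works: that tool aligns reads to the reference and expected amplicons, whereas the hypothetical function below uses exact string matching and lumps every non-matching read (indels, partial edits, sequencing errors) into one category. All names and sequences are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_reads(reads: Iterable[str], wild_type: str,
                   precise_edit: str) -> Dict[str, float]:
    """Return the percentage of reads that are precisely edited, unedited, or other.

    wild_type    -- unedited amplicon sequence
    precise_edit -- amplicon sequence carrying exactly the intended edit
    """
    counts: Counter = Counter()
    for read in reads:
        if read == precise_edit:
            counts["precise"] += 1
        elif read == wild_type:
            counts["unedited"] += 1
        else:
            counts["other"] += 1  # indels and anything else that matches neither
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to classify")
    return {key: 100.0 * counts[key] / total
            for key in ("precise", "unedited", "other")}


# Toy amplicons: the edit inserts "TT" into the wild-type sequence.
toy_wt = "ACGTACGT"
toy_edit = "ACGTTTACGT"
toy_reads = [toy_edit, toy_wt, toy_wt, "ACGACGT"]
toy_percentages = classify_reads(toy_reads, toy_wt, toy_edit)
```

Reporting precise edits and other outcomes as percentages of all reads mirrors the way the results below are presented, with precise editing and indel frequencies given side by side.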

Representative editing results are shown in Figure 35. Eco1 (Figure 36), Aco1 (Figure 37), RTX3-2042 (Figure 38), RTX3-6083 v1 and 6943 (Figure 39A) showed precise editing efficiencies of 0.3%, 0.1%, 0.25%, 0.06% and 0.05%, respectively (left panels of Figures 36, 37, 38 and 39). Undesired editing outcomes, defined as indels that incorporate random nucleotides or delete sequence near the Cas9 cut site, reached 50%, 3%, 5%, 2% and 4%, respectively (right panels of Figures 36, 37, 38 and 39A). Subsequent experiments using the same assay indicated that RTX3_6083v1 and RTX3_6943 generated 3-4 times more precise edits than Eco1, while the indels generated with these two retrons were 2-3 times lower (Figures 39B and 39C). RTX3_2042 showed a frequency of precise edits similar to Eco1, but with greater variability than the other samples (Figures 39B and 39C). These data indicate that, in addition to the previously characterized Eco1, the recently reported Aco1 and three novel retrons (RTX3-2042, RTX6083 v1, RTX3-6943) also enable precise gene editing in human cells (11).

Subsequent experiments using the same assay indicated that RTX3-6342S, RTX3-6342L, and RTX-1262 generated precise edits 10-20 times more frequently than Eco1 and approximately 5-10 times more frequently than RTX3-2042, 6083v1, 6943, and 0637 (see Figure 39E, left panel). In addition, indels generated with RTX3-6342S, 6342L, 1262, 0637, and 6083v1 were two-fold less frequent than with Eco1 (Figure 39E, right panel). Indels generated with RTX3-2042 were reduced by 80% compared to Eco1 (Figure 39E, right panel). Furthermore, all of these retrons generated precise edits more frequently than Eco3, another literature-validated retron, although with similar or higher indel frequencies (Figure 39E). A small set of retrons was also modified with deletions in the msd stem to assess whether precise editing could be improved. We note that precise editing increased after shortening the msd stem in six retrons, whereas it decreased or was unchanged in the remaining three (Figures 57 and 58).

These data show that, in addition to Eco1 and Eco3, which were previously characterized for gene editing, the recently reported Aco1 and six novel retrons (RTX3-0637, RTX3-1262, RTX3-2042, RTX3-6083 v1, RTX3-6342, RTX3-6943) also enable precise gene editing in human cells. In addition, modifying the msd stem length can change precise editing, but the effect may be retron-specific.

B. Evidence for retron-mediated gene editing in an all-RNA system

To establish an RNA-based editing system, gene editing assays were first performed with two RNA components (RT mRNA and ncRNA fused to the sgRNA for the Cas9 nuclease) in HEK293T cells that constitutively express the Cas9 nuclease. Because the RT and the ncRNA are supplied as separate RNAs, the ratio between the RT enzyme and its substrate can be optimized, which may make it easier to identify conditions that give higher precise editing and fewer indels than when both are encoded on the same plasmid. Three previously known retrons, Eco1, Eco3 and Eco5, were selected for this first set of experiments in the all-RNA format.

The RT mRNA of each retron and its cognate ncRNA fused to the sgRNA for the Cas9 nuclease were produced by in vitro transcription. The experimental scheme in Figure 40 was followed. Gene editing activity mediated by Eco1, Eco3 and Eco5 was compared in Cas9-expressing HEK293T cells at two different ratios of RT mRNA to ncRNA-sgRNA fusion. More ncRNA than RT mRNA was transfected in order to maximize msDNA production for precise gene editing. The results showed precise editing of up to 0.4% and indels as low as 10% for Eco3 (Figure 41). The precise editing frequency in the RNA-based assay was comparable to that in the plasmid-based assay.

Because Eco3 gave the highest editing, the RNA load and the RT mRNA:ncRNA-sgRNA fusion ratio were further optimized in the Eco3 retron system. The ncRNA-sgRNA fusion was added to 0.2 or 0.5 µg of RT mRNA at RT mRNA:ncRNA ratios of 1:2, 1:3, 1:4, 1:5, 1:8 and 1:10. At every ratio, 0.5 µg of RT mRNA gave higher precise editing than 0.2 µg, and increasing ncRNA correlated with higher precise editing (Figure 42, left panel). Indels were as low as 6% (Figure 42, right panel). With the two RNA components, precise editing was achieved in up to 1% of the cell population.
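The titration described above can be laid out as a simple grid of conditions. The following is a minimal sketch only: it assumes the ratios are by mass (the text does not say whether they are mass or molar ratios), and the function and field names are hypothetical.

```python
# Sketch of the RT mRNA : ncRNA-sgRNA titration grid.
# ASSUMPTION: ratios are mass ratios; the source does not state this.
RT_MRNA_UG = [0.2, 0.5]        # RT mRNA amounts from the text (micrograms)
RATIOS = [2, 3, 4, 5, 8, 10]   # ncRNA parts per 1 part RT mRNA (1:2 ... 1:10)


def titration_grid(rt_amounts_ug, ratios):
    """Return one dict per condition with the ncRNA and total RNA load."""
    grid = []
    for rt_ug in rt_amounts_ug:
        for r in ratios:
            grid.append({
                "rt_mrna_ug": rt_ug,
                "ratio": f"1:{r}",
                "ncrna_ug": round(rt_ug * r, 3),
                "total_rna_ug": round(rt_ug * (1 + r), 3),
            })
    return grid


grid = titration_grid(RT_MRNA_UG, RATIOS)
for row in grid:
    print(row)
```

Under the mass-ratio assumption, the 1:10 condition at 0.5 µg RT mRNA carries 5 µg of ncRNA-sgRNA fusion, which shows how quickly the total RNA load grows across the grid.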

Next, Cas9 mRNA was added to the RT mRNA and ncRNA-sgRNA fusion to create an all-RNA editing system in HEK293T cells (Figure 43). The amount of Cas9 mRNA was titrated with the Eco3 retron at the optimized RNA load and RT mRNA:ncRNA-sgRNA fusion ratio (Figure 44). Precise editing was observed in up to 0.1% of the cell population with 0.2 µg of Cas9 mRNA, and adding more did not increase editing efficiency. Although the precise editing frequency was one order of magnitude lower than in the two-RNA system, editing occurred through the specific action of the Cas9 nuclease and the retron, because omitting either abolished precise editing.

The all-RNA system was also delivered to cells by lipofection using the MessengerMAX reagent (Figure 45). In terms of RNA/lipid complex formation and the mechanism of cellular uptake, this method more closely resembles in vivo delivery of therapeutic RNA loaded in lipid nanoparticles (LNPs). When applied to an Aco1 retron system consisting of Aco1 RT mRNA, Aco1 ncRNA-sgRNA fusion and Cas9 mRNA, precise editing with a 56 bp insertion and a 6 bp deletion was observed in 0.1% of the cell population, with an indel frequency of about 1.5% (Figure 46). The Aco1 retron was recently annotated but has not yet been experimentally validated (MR, 2020). Aco1-mediated precise gene editing strongly suggests that the Aco1 retron can produce msDNA inside cells. Moreover, the insertion length in this precise edit exceeds those produced by editing mediated by other reverse transcriptases (AV, 2019).

C. Enhancement of retron-mediated gene editing in the all-RNA system by sgRNA spike-in

In all of the experiments above, the retron ncRNA was fused to the sgRNA for the Cas9 nuclease. This fusion RNA serves as the template for the RT, while its sgRNA portion may complex with the Cas9 nuclease. Two enzymes acting on a single RNA molecule may produce steric effects that affect the activity of either enzyme. Consistent with this hypothesis, the indel frequency indicates that overall Cas9 nuclease activity was only about 1.5% at 400 ng of ncRNA-sgRNA fusion (Figure 46, right panel).

To test this hypothesis systematically, Cas9 nuclease activity, as indicated by indel frequency, was compared between equimolar amounts of ncRNA-sgRNA fusion and separate sgRNA. When electroporated with 200 ng of Cas9 mRNA, 1 µg (7.5 pmol) of ncRNA-sgRNA fusion showed 20-fold lower activity than 0.266 µg (7.5 pmol) of separate sgRNA alone (Figure 47). This result indicates that the ncRNA-sgRNA fusion may inhibit complex formation with the Cas9 protein, the activity of the Cas9-sgRNA complex, or both. In addition, chemically modified sgRNA showed 6-fold higher activity than unmodified sgRNA. These results indicate that the Cas9 cleavage activity achieved with the ncRNA-sgRNA fusion may limit precise editing, and that adding separate sgRNA may compensate for this. Modification of the separate sgRNA is expected to enhance precise editing further.
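The equimolar comparison above rests on a mass-to-mole conversion, which can be reproduced from the stated amounts. The sketch below derives the molecular weights implied by "1 µg = 7.5 pmol" and "0.266 µg = 7.5 pmol". The length estimate is an assumption: it uses a commonly quoted approximation for single-stranded RNA (about 320.5 g/mol per nucleotide plus 159 g/mol), and the text does not give the RNA lengths.

```python
def implied_mw(mass_ug, amount_pmol):
    """Molecular weight (g/mol) implied by a mass and a molar amount."""
    return (mass_ug * 1e-6) / (amount_pmol * 1e-12)


def mass_for_pmol(amount_pmol, mw_g_per_mol):
    """Mass in micrograms that corresponds to a given molar amount."""
    return amount_pmol * 1e-12 * mw_g_per_mol * 1e6


def approx_length_nt(mw_g_per_mol):
    """ASSUMPTION: ssRNA MW ~ 320.5 * N + 159 (a common approximation)."""
    return (mw_g_per_mol - 159.0) / 320.5


fusion_mw = implied_mw(1.0, 7.5)     # ncRNA-sgRNA fusion
sgrna_mw = implied_mw(0.266, 7.5)    # separate sgRNA

print(f"fusion: {fusion_mw:,.0f} g/mol, ~{approx_length_nt(fusion_mw):.0f} nt")
print(f"sgRNA:  {sgrna_mw:,.0f} g/mol, ~{approx_length_nt(sgrna_mw):.0f} nt")
```

The same two helper functions can be used in reverse to work out how much of a different RNA would be equimolar to a given mass of the fusion.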

Figure 48 outlines the assay for sgRNA spike-in in the all-RNA system, and Figure R. shows the results with the Eco3 retron. At 0.2 µg of Cas9 mRNA and 0.5 µg of RT mRNA, the amount of spiked-in sgRNA was titrated at 50, 100 and 200 ng (Figure 41, left panel). The titration was performed at two RT mRNA:ncRNA-sgRNA fusion ratios, 1:6 and 1:8. At the 1:6 ratio, spiking in 50 ng of sgRNA increased precise editing 40-fold, and precise editing rose gradually with more sgRNA, reaching up to a 50-fold increase at 200 ng. The 1:8 ratio of RT mRNA:ncRNA-sgRNA fusion responded similarly to sgRNA spike-in and gave slightly higher activity than 1:6, with precise editing of up to 13%. As shown in the right panel, the higher precise editing obtained with sgRNA spike-in was accompanied by higher indels.

An orthogonal delivery method, lipofection, was used to confirm the effect of sgRNA spike-in on precise editing (Figure 50). Although the total amount of transfected RNA was five-fold lower than for electroporation, 3.4% precise editing was obtained without sgRNA spike-in, which is 7.5-fold higher than the precise editing achieved by electroporation (Figure 49 and Figure 51, left panel). Adding as little as 2 ng of sgRNA increased precise editing further, by up to 3.5-fold, to an efficiency of 12% (Figure 51, left panel). Increasing the amount of sgRNA to 10 ng did not change the efficiency further. The observed precise editing depended on the retron, because omitting the retron components (RT mRNA and ncRNA-sgRNA fusion) completely abolished editing.
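The fold changes quoted for lipofection can be checked with simple arithmetic. This sketch uses only the values stated in the paragraph above; the variable names are hypothetical, and the electroporation baseline is back-calculated rather than reported directly.

```python
# Values taken from the text (percent precise editing and fold changes).
lipofection_no_spike_pct = 3.4      # lipofection, no sgRNA spike-in
fold_over_electroporation = 7.5     # lipofection vs. electroporation, no spike-in
spike_in_fold = 3.5                 # maximal gain from sgRNA spike-in

# Back-calculated electroporation baseline (not reported directly in the text).
implied_electroporation_pct = lipofection_no_spike_pct / fold_over_electroporation

# Expected precise editing with spike-in, to compare with the reported 12%.
expected_with_spike_pct = lipofection_no_spike_pct * spike_in_fold

print(f"implied electroporation baseline: {implied_electroporation_pct:.2f}%")
print(f"expected with spike-in: {expected_with_spike_pct:.1f}%")
```

The product of 3.4% and 3.5 is close to the reported 12% efficiency, so the stated fold change and endpoint agree with each other.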

These data support the view that physical fusion of the ncRNA to the sgRNA for the Cas9 nuclease may be a limiting factor for Cas9 activity, and therefore for precise editing, and that sgRNA spike-in significantly enhances editing by supplementing it.

D. Separation of ncRNA and sgRNA

The effect on editing of separating the sgRNA from the ncRNA was tested in the Eco3 retron system by comparing editing efficiency side by side between the ncRNA-sgRNA fusion and separate ncRNA and sgRNA (Figure 52). Increasing amounts of sgRNA were added to 300 ng of ncRNA, an amount equimolar to 400 ng of ncRNA-sgRNA fusion. As expected, no precise editing was observed without added sgRNA. Precise editing peaked at 2.23% with 10 ng of sgRNA, compared with 1.78% for the ncRNA-sgRNA fusion (Figure 52, left panel). Increasing the amount of sgRNA beyond 10 ng did not improve precise editing further. Indel frequency (Figure 52, right panel) showed a trend similar to that of precise editing.
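The statement that 300 ng of ncRNA is equimolar to 400 ng of fusion can be cross-checked against the amounts given in section C (1 µg of fusion and 0.266 µg of sgRNA each equal 7.5 pmol). The sketch below makes two assumptions that the text does not confirm: that the same constructs are involved, and that the mass of the fusion is roughly the sum of its ncRNA and sgRNA parts, ignoring any linker.

```python
# From section C: equal molar amounts (7.5 pmol) of fusion and sgRNA.
FUSION_UG_PER_7_5_PMOL = 1.0
SGRNA_UG_PER_7_5_PMOL = 0.266

# ASSUMPTION: fusion mass ~ ncRNA mass + sgRNA mass (no linker), same constructs.
ncrna_mass_fraction = 1.0 - SGRNA_UG_PER_7_5_PMOL / FUSION_UG_PER_7_5_PMOL


def equimolar_ncrna_ng(fusion_ng, fraction=ncrna_mass_fraction):
    """Mass of separate ncRNA carrying as many molecules as `fusion_ng` of fusion."""
    return fusion_ng * fraction


ncrna_ng = equimolar_ncrna_ng(400)
print(f"ncRNA equimolar to 400 ng fusion: ~{ncrna_ng:.0f} ng")
```

Under these assumptions the estimate lands within a few nanograms of the 300 ng used, which is consistent with the equimolar claim.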

These data show that separating the ncRNA and sgRNA can achieve precise editing comparable to, or even higher than, that of the ncRNA-sgRNA fusion.

E. Lead retrons from different clades show precise editing activity in alternative cell types

The examples above showed robust precise editing activity in 293T cells. To determine whether these results extend to other cell types, plasmids encoding the lead retron reverse transcriptases and ncRNA/EMX1 sgRNA fusions were co-electroporated with a plasmid encoding Cas9-mCherry into K562 cells, an erythroleukemia cell line (Figure 87). Figure 88 (left panel) shows that RTX3_2042, 6083v1, 6943, 1262, 6342L and 6342S generated precise editing frequencies of 1.3%, 1.2%, 1%, 2.9%, 5% and 6.7%, respectively. Including an EMX1 gRNA expression cassette on the Cas9-mCherry plasmid could raise the precise editing frequency of RTX3_6342S to 10.5%. This system also generated robust indel frequencies in all samples. These frequencies varied among retrons and did not correlate with precise editing frequencies (Figure 88, right panel). The lack of correlation between indels and precise editing may be explained by steric effects in the ncRNA/sgRNA fusion that affect the interaction between Cas9 and the gRNA or between the reverse transcriptase and the ncRNA.

Precise editing efficiencies of the novel retrons were also compared with those of several experimentally validated retrons in this system, using the Cas9-mCherry EMX1 gRNA plasmid (MR, 2020). The retrons Eco1, Eco3, Aco1, Sau1, Sen1, RTX3_2781, RTX3_1262, RTX3_6342L and RTX3_6342S were tested, and precise editing efficiencies of 0.1%, 0.8%, 14%, 3.4%, 0.1%, 14.2%, 19% and 19.5%, respectively, were observed (Figures 89 and 90, left panels). Indels also varied among samples and did not correlate with the level of precise editing, which may be explained by the same steric effects mentioned above (Figures 89 and 90, right panels). These data indicate that RTX3_1262, 6342L and 6342S generate precise edits at the same or a higher frequency than previously documented retrons.

F. Lead retrons can robustly generate precise edits using RNA-guided nickases

The preceding experiments showed that retrons can install precise edits using a double-strand break (DSB) as the event that initiates recombination. However, initiating repair with a DSB can lead to indel formation and chromosomal rearrangements at off-target sites, which are undesirable for therapeutic development (Schiorli, 2019). To avoid indels, an ssDNA nick rather than a DSB can be used to initiate recombination (13). To determine whether the Cas9 D10A nickase can initiate retron-mediated precise editing, a Cas9 D10A-mCherry EMX1 gRNA plasmid was co-electroporated into K562 cells with plasmids encoding a retron RT and an ncRNA/EMX1 sgRNA fusion. No significant editing was observed with the retrons Eco1, Eco3, RTX3_6083v1 or RTX3_6943 (Figure 91, left panel). However, RTX3_2042, RTX3_1262, RTX3_6342L and RTX3_6342S gave precise editing frequencies of about 0.2%, about 0.6%, about 0.2% and about 0.4%, respectively (Figure 91, left panel). As expected, precise editing frequencies initiated by a nick were more than 10-fold lower than those initiated by a DSB, and indel frequencies were much lower than those seen with the Cas9 nuclease (Figure 91, right panel). These data show that an RNA-guided nickase can be used in vitro with a plasmid-based system to install edits encoded by novel retrons.

G. Retron-mediated gene editing is compatible with the LbCas12a and TnpB nucleases

The examples above used the Cas9 nuclease or nickase to perform retron-mediated gene editing. However, alternative nucleases, such as but not limited to LbCas12a (Zetsche B, 2015) and TnpB (Karvelis, 2021), can be used in retron-based gene editing systems in place of the Cas9 nuclease or nickase. To determine whether Cas12a or TnpB can induce retron-mediated gene editing, plasmids encoding LbCas12a, TnpB or Cas9 together with the corresponding EMX1 gRNA were co-electroporated into K562 cells with plasmids encoding the RTX3_6083v1 retron reverse transcriptase and an ncRNA template for each nuclease. With this approach, LbCas12a and RTX3_6083v1 mediated about 0.6% precise editing (Figure 92, left panel), with about 30% fewer indels than the Cas9 comparator (Figure 92, right panel). In contrast, the precise editing frequency generated by TnpB and RTX3_6083v1 was about 0.04% with one guide and undetectable with the other (Figure 93, left panel). The latter may be due to limited cleavage activity of gRNA#2, because no indels were detected above background. Although indel frequencies with sgRNA#1 were similar to those seen with Cas12a (Figure 93, right panel), the precise editing frequency was substantially lower than that seen with Cas12a (Figure 92, left panel). The difference in precise editing frequency may be due to differences in DSB formation, DSB resolution or protein expression of the nucleases. Notably, using separate Cas9 gRNA and retron ncRNA gave precise editing efficiencies similar to those seen in the earlier experiments in which the ncRNA/sgRNA fusion was co-electroporated with a plasmid encoding Cas9 and an EMX1 gRNA expression cassette (Figure 91, left panel). In summary, retron-mediated gene editing can be achieved with Cas12a-like and TnpB-like nucleases, and the gRNA need not be fused to the retron ncRNA for precise editing with a plasmid-based retron gene editor.

H. Retron-mediated gene editing can be performed with truncated RT proteins

Retrons contain accessory domains that may be critical for their physiological phage defense function but are not required for reverse transcriptase activity (A, 2020). RTX3_6342 contains an N-terminal region composed of several α-helices that are predicted not to align with the reverse transcriptase domain (Figure 94) (Jumper, 2021). To determine whether this region is required for precise editing, N-terminal truncations of different lengths were created and precise editing activity was assessed by NGS. Deletion of residues 1-49, 1-99 or 1-144 did not impair the precise editing activity of RTX3_6342 (Figure 95). As expected, indels were unaffected (Figure 95). These data show that the non-RT domain in 6342S is not required for reverse transcription.

I. Retron-mediated installation of exon-sized inserts of 200-400 base pairs

To determine whether retron-mediated gene editing can install edits larger than 10 bp, edits of 10, 25, 205, 305 and 405 bp were attempted at the EMX1 locus. With RTX3_6342, precise editing efficiencies of about 35%, about 40%, about 30%, about 35% and about 40% were observed for the 10 bp, 25 bp, 205 bp, 305 bp and 405 bp edits, respectively (Figure 96, left panel). RTX3_1262 was tested in the same way and gave precise editing efficiencies of about 22%, about 35%, about 28%, about 2% and about 10% for the 10 bp, 25 bp, 205 bp, 305 bp and 405 bp edits, respectively. Indel frequencies were similar across samples, so nuclease activity did not limit installation of these edits (Figure 96, right panel). Taken together, these data indicate that retron-mediated editing can efficiently install inserts of more than 205 bp, which exceeds the sizes installed by earlier reverse transcriptase-based gene editors (AV, 2019). In addition, RTX3_6342S tends to install longer edits at a more robust frequency than RTX3_1262.
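The comparison between the two retrons is easier to see when each efficiency is expressed relative to that retron's own 10 bp edit. The sketch below tabulates the approximate values quoted above; the normalization to the 10 bp edit is an illustrative choice, not an analysis described in the text.

```python
INSERT_BP = [10, 25, 205, 305, 405]

# Approximate precise editing efficiencies (%) as quoted in the text.
PRECISE_EDIT_PCT = {
    "RTX3_6342": [35, 40, 30, 35, 40],
    "RTX3_1262": [22, 35, 28, 2, 10],
}


def relative_to_baseline(values, baseline_index=0):
    """Express each efficiency as a fraction of the baseline (10 bp) value."""
    base = values[baseline_index]
    return [round(v / base, 2) for v in values]


relative = {name: relative_to_baseline(vals) for name, vals in PRECISE_EDIT_PCT.items()}
for name, fractions in relative.items():
    print(name, dict(zip(INSERT_BP, fractions)))
```

On this scale RTX3_6342 stays near or above its 10 bp level at every insert length, whereas RTX3_1262 falls well below its 10 bp level for the 305 bp and 405 bp edits.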

Because they lack a proofreading domain and differ structurally in the enzyme active site, reverse transcriptases (RTs) tend to be error-prone compared with replicative DNA polymerases (Oscorbin IP, 2021). To determine whether the longer edits were installed accurately by the RTX3_6342S reverse transcriptase, the previous data were reanalyzed with CRISPResso2 parameters that quantify substitutions in the edited allele. When substitutions were quantified with CRISPResso2, the percentage of precise edits fell as the insert became longer (Figure 97, left panel; compare circles with triangles). This trend was not observed when indels were quantified (Figure 97, right panel; compare circles with triangles), which indicates that reverse transcriptase fidelity accounts for the mutations seen in precise edits installed by RTX3_6342S. This suggests that reverse transcriptase fidelity can be optimized further so that large edits are installed accurately.

J. Rational mutagenesis of the retron RT to improve processivity and fidelity

The biochemical properties of many RTs, such as processivity and fidelity, have been characterized extensively in the literature (Oscorbin IP, 2021). In addition, some RTs have been engineered extensively for high processivity and fidelity for applications such as RNA sequencing library preparation and prime editing (Oscorbin IP, 2021) (Yarnall MTN, 2023). However, the fidelity and processivity of retron RTs have not been characterized, and no attempt has been made to improve these properties. The palm, finger and thumb domains found in most DNA polymerases interact with the template molecule and guide the incoming nucleotide to the catalytic core for efficient DNA synthesis (Oscorbib IP, 2020). Here, selected residues in these domains were mutated on the basis of a structural comparison of the AlphaFold-predicted structure of RTX3_6342S with the crystal structures of MMLV or Marathon RT and Eco1 (Table J1 below).

Table J1. Mutations of selected residues in RTX3_6342S.

These residues were prioritized on the basis of their proximity to the RNA/DNA template present in the MMLV/Eco1 structures and of previously engineered MMLV or Marathon variants cited in the literature (Oscorbin IP, 2021) (Grünewald, 2023).

As a proof of concept, 32 RT variants were engineered at 9 residues in the thumb domain of the RTX3_6342 RT. Specifically, the following 32 RT variants were constructed on the basis of RTX3_6342S:
(1) E466K, E466N, E466Q, E466R;
(2) K475N, K475Q, K475R;
(3) M470K, M470N, M470Q, M470R;
(4) N465K, N465Q, N465R;
(5) S468K, S468N, S468Q, S468R;
(6) S472G, S472P;
(7) S476K, S476N, S476Q, S476R;
(8) V477F, V477I, V477L, V477W; and
(9) W473F, W473K, W473R, W473Y.
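The variant list above can be generated and sanity-checked programmatically. The following minimal sketch encodes each mutated position with its replacement residues, expands them into standard substitution names, and confirms the totals stated in the text (9 positions, 32 variants). The dictionary and function names are hypothetical.

```python
# Wild-type residue + position -> replacement residues, copied from the list above.
THUMB_SUBSTITUTIONS = {
    "E466": "KNQR",
    "K475": "NQR",
    "M470": "KNQR",
    "N465": "KQR",
    "S468": "KNQR",
    "S472": "GP",
    "S476": "KNQR",
    "V477": "FILW",
    "W473": "FKRY",
}


def variant_names(substitutions):
    """Expand {site: replacements} into names such as 'E466K'."""
    names = []
    for site, replacements in substitutions.items():
        wild_type = site[0]
        for aa in replacements:
            if aa == wild_type:
                raise ValueError(f"{site}{aa} is not a substitution")
            names.append(f"{site}{aa}")
    return names


variants = variant_names(THUMB_SUBSTITUTIONS)
print(len(THUMB_SUBSTITUTIONS), "positions,", len(variants), "variants")
```

Keeping the design in a single mapping like this makes it straightforward to confirm that no variant is duplicated and that no listed substitution restores the wild-type residue.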

These constructs were then co-electroporated into K562 cells with plasmids encoding Cas9, the EMX1 sgRNA and the RTX3_6342S ncRNA carrying a 10 bp or 305 bp insert. After 72 h, cells were harvested for amplicon sequencing and edits were quantified with CRISPResso. Most of the rationally designed mutations modestly decreased the precise-edit-to-indel ratio (PEIR) with the 10 bp or 305 bp insert (Figure 98, top). However, N465K increased PEIR when the 305 bp insert was installed, which suggests that this mutation increases the processivity of the RTX3_6342 reverse transcriptase or total ssDNA production (Figure 98, bottom). Literature mutations that improve MMLV activity (T306K and W313F), when introduced as the orthologous RTX3_6342 mutations (M470K and V477F) either singly or in combination, decreased PEIR for the 10 bp and 305 bp inserts (Figures 98 and 99). RTX3_6342 V265P (the ortholog of MMLV L139P) decreased PEIR for the 305 bp insert, had no effect with the 205 bp insert and increased the ratio with the 405 bp insert (Figure 99). These results may be due to different sequence features or secondary structures in each individual insert. A second set of mutations was also screened, chosen on the basis of previously engineered Marathon RT variants or favorable Rosetta scores after modeling of the 6342 RT. The mutations E238R, E479K and K255P each increased PEIR relative to the WT 6342 RT (Figure 100). An additional N-terminal truncation was constructed that retains an α-helix that may contact the retron RNA (Figure 101). With this partial truncation, rather than complete removal of the N-terminal domain, the precise edit:indel ratio increased when longer edits were installed (Figure 102).

K. Protein fusions to improve retron RT processivity

DNA polymerase processivity has been optimized not only by mutagenesis of the polymerase itself; domain fusions are also known to improve processivity (Oscorbib IP, 2020). The small DNA-binding proteins Sso7d and Sac7d have been fused to MMLV RT to improve processivity nearly eight-fold during PCR reactions on DNA and RNA templates. In addition, attempts have been made to improve the editing efficiency of prime editors by fusing these proteins to MMLV RT, but no significant improvement in editing efficiency was observed (Yarnall MTN, 2023). However, the edits installed in those experiments were small (< 90 bp), and native MMLV has a processivity of 69 ± 14 bp per binding event, which suggests that much longer edits would need to be installed to observe an improvement (Yarnall MTN, 2023) (Oscorbib IP, 2020). This specification teaches and demonstrates that retron-mediated genome editing can be used to install edits of up to 400 bp. To determine whether fusing Sso7d or Sac7d can improve the efficiency of retron-mediated genome editing, two variants each of Sac7d and Sso7d were fused at the N- or C-terminus to the retron RTs from RTX3_1262 and RTX3_6342. Installation of a 405 bp insert at the EMX1 locus was then assessed. Fusing these domains to the N-terminus of either RT tended to be more robust than C-terminal fusion. In addition, these N-terminal fusions increased the installation frequency of the 405 bp insert about ten-fold with RTX3_1262 and about two-fold with RTX3_6342 (Figure 103). We believe this difference in magnitude may be attributable to differences in baseline processivity or ssDNA generation between RTX3_6342 (about 4%) and RTX3_1262 (about 0.4%), and that the effect of these fusions on RTs such as RTX3_6342 may become more pronounced with longer inserts.

L. Identification of alternative retrons from the RTX3_6342 family

In addition to engineering retrons, screening additional retrons from the same families as certain highly efficient retrons (e.g., RTX3_6342, 6943, 6083 and 1262) may identify novel retrons with more desirable gene editing characteristics. An additional 99 unique retrons were selected from the initial dataset on the basis of at least 70% similarity to the RTX3_6342, 6943, 6083 or 1262 RT or its corresponding ncRNA. These retrons were cloned into a plasmid vector that expresses the RT and the ncRNA from a CAG promoter, with the ncRNA as an ncRNA-gRNA fusion designed to insert 305 bp into the hEMX1 locus. This plasmid was co-electroporated into K562 cells with a second plasmid containing Cas9 and the same hEMX1 sgRNA to increase editing efficiency. After 72 h, genomic DNA was isolated and the hEMX1 locus was amplified for sequencing. All retrons in this dataset that gave precise editing frequencies of >1% were similar to the lead retron RTX3_6342, which indicates that this family is particularly enriched in RTs compatible with retron-mediated gene editing (Figure 104). However, none of these RTs showed a precise editing frequency comparable to that of RTX3_6342.

Example 5: Improved ncRNA generated by transcription from a blunt-ended linearized plasmid template shows higher precise editing efficiency

The previously performed in vitro transcription experiments to generate ncRNA used double-stranded DNA templates containing a 3' overhang (on the same strand as the T7 promoter sequence). New templates with blunt ends were designed and tested. See Figure 52. As shown in Figure 53, four RNAs (Cas9 mRNA, Eco3 RT mRNA, gRNA targeting the EMX1 locus, and ncRNA) were transfected into 293T cells using MessengerMAX lipid transfection reagent. Cells were harvested 3 days later, genomic DNA was isolated, the EMX1 locus was amplified, and an Illumina library was generated and sequenced on a NextSeq. Compared with ncRNA generated using the overhang template, the precise editing percentage of ncRNA generated using the blunt-end template improved more than 5-fold.

Example 6: Improved ncRNA due to improved in vitro template optimization and ncRNA end protection by an MS2 hairpin

By encoding additional elements on the plasmid template, these elements can be added to the ncRNA. Figure 53 shows a modified Eco3 ncRNA that includes an MS2 stem-loop hairpin structure added to its 3' end. As shown in Figure 54, the ncRNA containing the 3' MS2 structure gave a nearly 15-fold increase in precise editing compared with ncRNA generated from the overhang template, and a nearly 3-fold increase compared with ncRNA generated from the blunt-end template.

Figure 53 schematically depicts the modifications made to improve RNA quality. The first concerns how the plasmid DNA template is linearized for in vitro transcription (Figure 53, A). Two plasmids encoding the Eco3 ncRNA were prepared that were identical except for the restriction enzyme site 3' of the ncRNA: one generates a 3' overhang and the other generates a blunt end. A template with a 3' overhang can cause the RNA polymerase to extend on the opposite strand, resulting in longer transcripts containing complementary-strand sequence (13). The second consideration is the addition of an MS2 hairpin 3' of the ncRNA (Figure 53, B). Retron ncRNAs contain mutually reverse-complementary sequences at their 5' and 3' ends, which form a stem-loop structure. However, these sequences may also mediate intermolecular interactions, thereby producing higher-order structures. Adding an MS2 hairpin at one end may reduce such intermolecular interactions. The MS2 sequence is a bacteriophage-derived RNA aptamer that does not interact with endogenous proteins in mammalian cells.

Figure 54 shows the testing of these new RNAs in a gene editing assay. The four components of the all-RNA system (RT mRNA, Cas9 mRNA, ncRNA, sgRNA) were delivered to HEK293T cells by Lipofectamine MessengerMAX. All RNAs were transfected in the fixed amounts shown in the figure. RNA produced from a template containing a 3' overhang gave 1.35% precise editing. Using a template with a blunt end increased precise editing to 5.94%. Adding an MS2 stem loop to the 3' end of the ncRNA further increased precise editing to 12.39%. The right panel shows the indel frequency under the respective conditions. In summary, precise editing increased nine (9)-fold through optimization of the in vitro transcription template and 3' end protection by the MS2 hairpin.

Example 7: Modified ncRNA with a cap and/or tail

ncRNA performance can be altered by adding a cap structure at the 5' end and/or a poly(A) tail at the 3' end, as depicted in Figure 55. As shown in Figure 56, using ncRNA with either or both of the protections provided by the cap and the tail reduces indels compared with uncapped, untailed ncRNA. In this experiment, a 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA-sgRNA + sgRNA) was delivered to HEK293T cells by Lipofectamine MessengerMAX. All RNAs were transfected in fixed amounts: RT mRNA 100 ng, ncRNA-sgRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. The ncRNA-gRNA fusion was capped (+cap/-tail), poly(A)-tailed (-cap/+tail), or both capped and poly(A)-tailed (+cap/+tail). RNA without end protection (-cap/-tail) produced approximately 4.5% precise editing, and editing was retron-dependent, because the absence of RT abolished precise editing. Compared with the uncapped, untailed case, RNA with either or both of the cap and tail protections produced lower precise editing (left panel) but also fewer indels (right panel).

Example 8: Precise editing using Cas9 nickase

Wild-type (WT) Cas9 nuclease generates double-strand breaks (DSBs), which can potentially increase genomic instability by generating larger deletions or chromosomal translocations. In addition, DSBs induce the p53-mediated DNA damage response pathway, which often impairs the physiology of the edited cells. A single point mutation in either Cas9 nuclease domain generates a nickase that cleaves only one strand of the target site. The D10A variant uses the intact HNH nuclease domain to cleave the target strand, and the H840A variant uses the intact RuvC domain to cleave the non-target strand. Nicks can initiate homology-directed repair (HDR) and cause less mutagenic end joining than DSBs. These features have generated considerable interest in using nickases for retron-mediated precise editing.

Figure 73 shows the results of testing Cas9 nickase activity in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX. The D10A and H840A nickases were used in parallel with WT Cas9 and were tested with either separate ncRNA or an ncRNA-sgRNA fusion. All RNAs were transfected in fixed amounts: RT mRNA 100 ng, ncRNA or ncRNA-sgRNA 400 ng, Cas9 variant mRNA 100 ng, and sgRNA 5 ng. When ncRNA or ncRNA-sgRNA was used with additional sgRNA, precise editing with WT Cas9 was about 9%. For the Cas9 H840A mutant, which cleaves the non-target strand, almost no activity above background was seen under any condition. With the Cas9 D10A mutant, which nicks the target strand, precise editing was observed at a frequency of approximately 0.1% in both the separate and the fused ncRNA and sgRNA formats. The right panel shows the indels under the respective conditions of the left panel. As expected, indels were extremely low for either nickase in the presence or absence of additional sgRNA, and only WT Cas9 generated significant indels.

Figure 74 shows negative controls lacking one component of the all-RNA system (Cas9 variant, ncRNA or ncRNA-sgRNA, or RT). They were tested with or without additional sgRNA. None of the controls in the top panel showed precise editing activity above background, supporting that precise editing achieved with WT Cas9 or Cas9 nickase depends on every component of the all-RNA system.

Example 9: Dual sgRNAs achieve higher precise editing than a single sgRNA

It was hypothesized that using multiple sgRNAs near the target site might generate more free DNA ends after cleavage and recruit more repair machinery, which would provide a greater opportunity for precise editing. Two pairs of sgRNAs were identified to test this hypothesis: sgRNA1, immediately to the left of the insertion, combined with sgRNA2 or sgRNA3, to the right of the insertion on the EMX1 gene (schematic on the right of Figure 68). The results of testing dual versus single sgRNAs are shown in Figure 37. The RTX2042 retron in the all-RNA system was delivered to HEK293T cells by lipofection. Additional sgRNA was added to given amounts of RT mRNA, Cas9 mRNA, and ncRNA-sgRNA. The same ncRNA-sgRNA was used in all conditions; its template encodes a 25 bp insertion on the non-target strand of the Cas9 sgRNA. With one additional sgRNA (#1), precise editing was 0.27%; in the presence of a second sgRNA, precise editing increased up to 5-fold, to 1.33% (#2) or 1.13% (#3). The bottom panel shows the indel frequencies under the respective conditions.

These data suggest that dual sgRNAs near the target site may enhance retron-mediated precise editing.

Example 10. Editing with larger insert sizes

To date, most experiments have tested a 10 bp insertion 3' of the EMX1 gene by retron-mediated precise editing. The feasibility of longer insertions was examined by designing new ncRNAs with progressively longer inserts (25, 50, 75, and 100 bp) for the Eco3, Aco1, RTX-2042, and RTX-6943 retrons. The inserted sequences were randomly chosen and share no homology with the human genome, and all inserts had a GC content of 50% so as not to introduce bias. At given amounts of RT mRNA (100 ng) and Cas9 mRNA (100 ng), 400 ng of ncRNA-sgRNA fusion with template lengths ranging from 10 to 100 bp was added and delivered to HEK293T cells by lipofection. Overall, for all retrons, efficiency was comparable across insert sizes. For the 100 bp insertion, precise editing reached 1.04% for Eco3, 1.52% for Aco1, 1.37% for RTX2042, and 0.05% for the RTX6943 retron (Figure 59A). In the bottom row of each figure, the indel frequencies of the respective conditions showed trends very similar to those of precise editing (Figure 59B).
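As an illustration of the insert design constraint described above, the following minimal Python sketch generates random insert sequences with a GC content as close to 50% as the length allows. The function names and the seeding scheme are hypothetical and are not taken from the disclosure; the sketch also does not check for homology with the human genome, which would require a separate alignment step.

```python
import random
from typing import Optional


def random_insert(length: int, gc_fraction: float = 0.5,
                  seed: Optional[int] = None) -> str:
    """Return a random DNA sequence whose GC fraction is as close as
    possible to gc_fraction for the given length (hypothetical helper)."""
    rng = random.Random(seed)
    n_gc = round(length * gc_fraction)  # odd lengths cannot hit exactly 50%
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)  # mix GC and AT positions
    return "".join(bases)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    # Insert lengths mentioned in the text: 10, 25, 50, 75 and 100 bp.
    for length in (10, 25, 50, 75, 100):
        seq = random_insert(length, seed=length)
        print(length, f"GC={gc_content(seq):.2f}", seq)
```

For odd lengths such as 25 or 75 bp, an exact 50% GC content is not achievable, so the sketch settles for the nearest base count.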

When an sgRNA spike-in was added to the above conditions (Figures 60A and 60B), overall activity increased and activity was not compromised by longer insertions. For the 100 bp insertion, precise editing increased 1.8- to 8-fold compared with no sgRNA spike-in: 1.82% for Eco3, 4.50% for Aco1, 4.40% for RTX-2042, and 0.4% for RTX-6943. In the bottom panel, the indel frequencies under the respective conditions also increased.
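The fold increases for the 100 bp insertion can be recomputed directly from the percentages reported in this example. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and are not part of the disclosure.

```python
# Precise editing (%) for the 100 bp insertion, as reported in the text.
without_spike_in = {"Eco3": 1.04, "Aco1": 1.52, "RTX-2042": 1.37, "RTX-6943": 0.05}
with_spike_in = {"Eco3": 1.82, "Aco1": 4.50, "RTX-2042": 4.40, "RTX-6943": 0.4}


def fold_change(before: float, after: float) -> float:
    """Ratio of the editing frequency after a change to the frequency before."""
    if before <= 0:
        raise ValueError("baseline frequency must be positive")
    return after / before


# Fold increase per retron when the sgRNA spike-in is added.
fold = {name: fold_change(without_spike_in[name], with_spike_in[name])
        for name in without_spike_in}

if __name__ == "__main__":
    for name, value in fold.items():
        print(f"{name}: {value:.2f}-fold")
```

The resulting ratios run from roughly 1.75 (Eco3) to 8 (RTX-6943), consistent with the 1.8- to 8-fold range stated above.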

In combination with Cas9 nuclease, retrons can precisely insert 100 bp at a genomic target site with an efficiency of up to 4.5%.

Example 11. Insertion of an entire gene by retron-mediated precise editing

Correcting large mutation hotspots or precisely inserting medium-sized genes into the genome would broaden the applicability of retron-mediated gene editing. To explore this, a GFP gene with its own promoter and poly(A) signal (1267 bp, Figure 61A) was incorporated into the retron ncRNA, and gene editing assays were performed with three retrons: Eco3, Aco1, and RTX6943. As a readout of insertion, the 5' and 3' junctions between the EMX1 host sequence and the inserted sequence were amplified by qPCR. ΔCt values (Ct of the sample without RT minus Ct of the sample with RT) were used for relative quantification of integration. A larger ΔCt value indicates a higher insertion frequency in the sample. For all three retrons, amplicons at the 5' and 3' junctions were detected significantly above background (Figure 61B). Amplicons at the 3' junction were more abundant than those at the 5' junction, possibly because the msDNA produced by reverse transcription is enriched for its 5' end, which promotes insertion at the 3' junction.

Figure 61D shows the PCR products for the Aco1 retron, analyzed on an Agilent TapeStation High Sensitivity D1000 ScreenTape. In the presence of RT (+RT samples, left side of the gel), amplicons of the expected size were detected at both the 5' and 3' junctions of the edited allele, as indicated by the arrows. In the absence of RT (-RT samples, right side of the gel), the edited allele was not detected. The triangles indicate the expected amplicon of the unedited allele, which is present in both the -RT and +RT samples. In addition, Figure 61C shows qPCR data depicting insertion of GFP, assayed at the 5' or 3' junction in the EMX1 locus, by two retrons. The ncRNA-sgRNA engineered to contain the GFP gene insert was transfected into HEK293T cells together with the RT mRNA of each retron and Cas9 mRNA, with or without additional sgRNA. Genomic DNA was analyzed with the primer pairs indicated in the table to detect the GFP gene insertion at the 5' or 3' junction of the edited allele. ΔCt was obtained by subtracting the junction Ct value from the EMX1 Ct value and was used for relative quantification of GFP insertion at the EMX1 locus. For both retrons, amplicons of the 5' and 3' junctions were detected significantly above background. Amplicons of the 3' junction were slightly more abundant than those of the 5' junction, possibly because the msDNA produced by reverse transcription is more enriched for its 5' end, which promotes insertion at the 3' junction. Thus, two different retrons have been shown to insert an entire gene by retron editing.
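The ΔCt readout described above can be illustrated with a short Python sketch. The Ct values below are hypothetical placeholders, not data from Figure 61, and the conversion of ΔCt to a relative abundance assumes ideal amplification (a doubling of product per cycle), which is an assumption not stated in the text.

```python
def delta_ct(reference_ct: float, target_ct: float) -> float:
    """Return delta-Ct = reference Ct - target Ct.

    With the definitions in the text, the reference is either the -RT sample
    or the EMX1 amplicon, and the target is the +RT sample or the junction
    amplicon. A larger value means more target template.
    """
    return reference_ct - target_ct


def relative_abundance(dct: float, amplification_factor: float = 2.0) -> float:
    """Convert delta-Ct to a relative abundance.

    Assumes each PCR cycle multiplies the product by amplification_factor
    (2.0 corresponds to 100% efficiency; real assays are usually lower).
    """
    return amplification_factor ** dct


if __name__ == "__main__":
    # Hypothetical Ct values for a junction amplicon, for illustration only.
    ct_without_rt = 35.0
    ct_with_rt = 29.0
    dct = delta_ct(ct_without_rt, ct_with_rt)
    print(f"delta-Ct = {dct:.1f}, relative abundance = {relative_abundance(dct):.0f}x")
```

Because the relationship is exponential, each additional unit of ΔCt corresponds to roughly a doubling of the junction template under the stated assumption.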

Together, these results demonstrate the feasibility of inserting an entire gene by retron-mediated precise editing.

Example 12. Order of the sgRNA and ncRNA in the fusion RNA

Next, the order of the sgRNA and ncRNA in the fusion was addressed. The order may play an important role in ensuring the activity of both enzymes (Cas9 and the retron RT) on a single fused substrate. The (A) ncRNA-sgRNA and (B) sgRNA-ncRNA orders were compared for precise editing with the Eco3, Aco1, and RTX-2042 retrons (Figure 62A). For Eco3 and RTX-2042, the sgRNA-ncRNA order allowed significantly higher activity than the ncRNA-sgRNA order in both precise editing (6-fold and 3-fold, respectively) and indels (3-fold and 2-fold, respectively). The same order gave higher indels (2-fold) for Aco1, indicating that placing the sgRNA at the 5' end of the fusion allows higher Cas9 activity. However, precise editing with Aco1 was comparable between the two orders.

When an sgRNA spike-in was added to the above conditions, overall activity increased slightly (Figure 62B). The sgRNA spike-in compensated for the poorer Cas9 activity obtained with ncRNA-sgRNA (compared with sgRNA-ncRNA), so that precise editing and indels became comparable to those of the reverse order for all retrons.

Precise editing depends on the presence of RT and Cas9 (Figure 63). When either of these components was removed, precise editing was undetectable or at background level in either order of the ncRNA and sgRNA fusion (Figure 63, top panel). This was consistent across all three retrons. The bottom panel shows the indel frequencies under the respective conditions (Figure 63, bottom panel).

Example 13. Separation of ncRNA and sgRNA

Adding an sgRNA spike-in to either order of the ncRNA and sgRNA fusion increased precise editing and indels (Figures 62A, 62B, and 63). These data suggest that physical fusion of the ncRNA to the sgRNA of the Cas9 nuclease may be a limiting factor for Cas9 activity and therefore for precise editing, so separating the ncRNA and sgRNA may be a better strategy for precise editing.

The effect of separating the sgRNA from the ncRNA on editing was tested in the Eco3, Aco1, RTX-2042, and RTX-6943 retron systems by comparing editing efficiency among (A) ncRNA-sgRNA fusions, (B) sgRNA-ncRNA fusions, and (C) separate ncRNA and sgRNA, with or without additional sgRNA (Figure 64). With Eco3, separation was significantly better (about 5-fold) for precise editing than either fusion order of ncRNA and sgRNA. As expected, no precise editing was observed when no sgRNA was added to the separate ncRNA. On the other hand, the separate forms of the Aco1 and RTX-2042 ncRNAs achieved levels of precise editing comparable to those of the fusion RNAs. The separate ncRNA of RTX-6943 showed lower activity than the fused form, but the overall activity of this retron was an order of magnitude lower than that of the other retrons.

These data suggest that separating the ncRNA and sgRNA can achieve precise editing comparable to, or even higher than, that of ncRNA and sgRNA fusions, depending on the retron type.

Example 14. Retron-mediated precise editing occurs at another genomic locus, AAVS1

All previous gene editing assays were performed at the EMX1 locus. Here, the AAVS1 locus was targeted for precise editing using the Eco3, Aco1, RTX2042, and RTX6943 retrons. The AAVS1 locus, in intron 1 of the PPP1R12C (protein phosphatase 1 regulatory subunit 12C) gene, is referred to as a "safe harbor" because its disruption does not adversely affect the cell, and robust transcription of the locus allows stable expression of exogenously inserted genes. A 25 bp insertion donor sequence was incorporated within the msD stem loop of the ncRNA from each retron, and an sgRNA targeting AAVS1 was fused to the 3' end of the engineered ncRNA.

Figure 65 shows the results of testing the four retrons for editing of the AAVS1 locus. For all four retrons, precise editing was enhanced in the presence of additional sgRNA and ranged from 0.73% to 2.05%. On the right, the indel frequencies under the respective conditions showed trends similar to those of precise editing. Removing RT, ncRNA-sgRNA, or Cas9 mRNA abolished precise editing, demonstrating the retron and Cas9 specificity of editing (Figure 66), while indels, shown in the bottom panel, remained robust.

Example 15. Testing template strand bias for precise editing

Cas9 uses different endonuclease domains to cleave the sgRNA-targeted strand and the non-targeted strand. After cleavage, each nuclease domain of Cas9 is released from the DNA substrate at a different time, which may affect how a single-stranded donor interacts with the broken DNA and the repair pathways (Richardson, 2016).

The results of testing donor templates on opposite strands relative to the Cas9 sgRNA-targeted strand are shown in Figure 67. The Eco3 and Aco1 retrons in the all-RNA system were delivered to HEK293T cells by lipofection. Given amounts of RT mRNA, Cas9 mRNA, additional sgRNA targeting the EMX1 locus, and the appropriate ncRNA-sgRNA indicated below the figure were added. All donor templates tested contained a 25 bp insert. All ncRNAs used in previous experiments encode a donor template for the strand targeted by the Cas9 sgRNA, denoted by the letter T. ncRNAs containing the reverse complement of the Cas9-targeted strand are denoted NT. For Eco3, in the presence of additional sgRNA, precise editing was 1.67% when the template was on the targeted strand and 0.85% when the template was on the non-targeted strand. For Aco1, in the presence of additional sgRNA, precise editing was 2.96% when the template was on the targeted strand and 3.40% when the template was on the non-targeted strand. The bottom row of each figure shows the indel frequencies under the respective conditions.
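The relationship between the T and NT donor templates is a reverse complement, which can be sketched in a few lines of Python. The 25-nt sequence below is a hypothetical placeholder and not a donor sequence from the disclosure; the helper name is likewise illustrative.

```python
# Translation table for DNA base complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical 25-nt donor written for the sgRNA-targeted strand (T).
donor_t = "GATTACAGGCCTTAAGCTAGCATCG"
# The corresponding donor for the non-targeted strand (NT).
donor_nt = reverse_complement(donor_t)

if __name__ == "__main__":
    print("T :", donor_t)
    print("NT:", donor_nt)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing paired T and NT designs.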

At least at the EMX1 locus, retron ncRNAs encoding either the Cas9-targeted strand or the non-targeted strand produced similar levels of precise editing.

Example 16. Improving retron-mediated precise editing with Cas9 nickase by combining dual sgRNAs

Retron-mediated precise editing using Cas9 nickase was observed in the earlier experiments, but at low efficiency (about 0.1%). This is consistent with other studies showing that a single nick does not efficiently induce gene editing, because the induced single-strand break is repaired very efficiently by the base excision repair pathway (Nakajuma K, 2018) (Ran FA, 2013). In contrast, combining a pair of nearby PAM (protospacer adjacent motif)-out sgRNAs with Cas9 nickase results in efficient gene editing (precise editing and indels) (Ran FA, 2013). This double-nicking approach induces site-specific double-strand breaks that lead to indels in the target gene, while off-target activity is reduced 50- to 1000-fold compared with WT Cas9.

To test whether the double-nicking approach supports retron-mediated precise editing, three sgRNAs were selected 3' of the last exon of EMX1 (Figure 70). The sgRNA combinations 1+3, 2+3, and 1+2 were tested with the Cas9 D10A nickase, in parallel with WT Cas9 tested using sgRNA1. All RNAs were transfected in fixed amounts: RT mRNA 100 ng, ncRNA-sgRNA 400 ng, Cas9 variant mRNA 100 ng, and sgRNA 5 ng. With the nickase, the sgRNA 1+3 combination gave the highest precise editing, at 3% efficiency (Figure 70, upper left). The efficiency with dual sgRNAs was about 30-fold higher than that achieved with a single sgRNA, but still lower than that of WT Cas9 guided by sgRNA1 (13% editing efficiency). Editing depended on the retron system, because the absence of RT abolished editing. Precise editing was not proportional to, but did correlate with, the indel frequency (Figure 54, lower left).

Example 17. The novel retron R6083 can insert up to 100 bp in the all-RNA system

Figure 71 shows the results of testing, in the all-RNA system with the R6083 retron, different template lengths and separate versus fused ncRNA and sgRNA. The RNAs were delivered to HEK293T cells by lipofection. At given amounts of RT mRNA and Cas9 mRNA, ncRNA-sgRNA fusions with template lengths ranging from 25 to 100 bp, or a separate ncRNA with a 25 bp insertion, were added. Precise insertion of 100 bp into the EMX1 locus was observed at about 2% efficiency (Figure 71, upper left). The separate ncRNA exhibited lower activity than the ncRNA-sgRNA fusion with the 25 bp insertion. Precise editing depended on the presence of RT and ncRNA, as shown in the upper right panel. The bottom row of each figure shows the indel frequencies under the respective conditions.

Figure 72 shows that the R6083 retron can insert a 25 bp template into the AAVS1 locus in the all-RNA system. With additional sgRNA, an editing efficiency of 3% was observed, and activity was completely dependent on the presence of RT and ncRNA (Figure 72, left panel).

Example 18. Cas9-RT fusion versus separate Cas9 and RT

In all of the preceding experiments, Cas9 and RT mRNAs were delivered separately. A physically fused Cas9-RT might be beneficial for co-localization of the two enzymes in the same cellular compartment, but it might also be detrimental by interfering with the activity of each enzyme. To address this, Cas9 and the Eco3 RT were fused through an SGGSx2-XTEN-SGGSx2 (SEQ ID NO:19932) flexible linker, and precise gene editing was tested in parallel with the separate format. For the separate format, Cas9 mRNA (150 ng) and RT mRNA (50 ng) were used at the same molar amount as 200 ng of the Cas9-RT fusion, and the editing efficiency of the fused or separate enzymes was compared with either separate ncRNA (left side of the left panel of Figure 73) or the ncRNA-sgRNA fusion (right side of the right panel of Figure 73). With both ncRNA (3-fold) and ncRNA-sgRNA (1.5-fold), separate Cas9 and RT achieved higher editing efficiency than the fused enzymes. The advantage of the separate enzymes was more pronounced when the ncRNA was separate from the sgRNA, suggesting that the separate enzymes are more flexible in searching for their respective targets. Consistent with precise editing, indels from the fused enzyme were lower with both ncRNA and the ncRNA-sgRNA fusion. These results suggest that Cas9 and RT in fused form may constrain each other, even with a flexible linker between them.

Example 19. Retron ncRNA end protection by cap and tail (new experiment)

The 5' and 3' ends of eukaryotic mRNAs are modified by capping and tailing, respectively. Both the cap and the tail protect the transcript from exonucleases and aid its export from the nucleus and its translation on ribosomes. More recently, long noncoding RNAs (>200 nt) have been shown to be commonly capped and tailed (L Statello, 2021).

In an attempt to increase the half-life of retron ncRNA in cells, a cap, a tail, or both were added to the ncRNA (schematic in Figure 55). Figure 74 shows the results of end protection of the ncRNA-sgRNA fusion by cap and tail in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX. All RNAs were transfected in fixed amounts: RT mRNA 100 ng, ncRNA-sgRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. The ncRNA-gRNA fusion was capped (+cap/-tail), poly(A)-tailed (-cap/+tail), or both capped and poly(A)-tailed (+cap/+tail). When additional sgRNA was added, ncRNA without end protection (-cap/-tail) produced approximately 4.5% precise editing, and editing depended on the presence of RT. While capping alone did not change editing efficiency, tailing alone slightly increased editing, and cap and tail together significantly improved editing efficiency, by about 2-fold (left panel). In the right panel, indels were comparable under all conditions, as shown.

These results support the view that end protection of the ncRNA by cap and tail may promote retron-mediated precise editing by enhancing ncRNA stability in cells.

Example 20. Circularization of ncRNA

We looked for alternative ways to stabilize the ncRNA. Circular RNAs form covalently closed continuous loops, and this endless form confers extended durability. Figure 75 describes how circular ncRNA is made. Circularization uses the group I self-splicing intron method, which is reported to be better suited to circularizing long RNAs and requires only the addition of GTP and Mg2+ as cofactors (22). The ncRNA is flanked by self-splicing introns and complementary homology sequences (Figure 75, step 1). The external and internal homology sequences bring the two introns into proximity (step 2). Exogenously added GTP initiates a series of transesterifications that excise the introns and ligate the ncRNA into a circular form (step 3).

Figure 76 summarizes the results of testing modified ncRNAs for precise editing at the EMX1 locus in HEK293T cells with the Eco3 all-RNA system. The all-RNA mixture was transfected at fixed amounts of 100 ng RT mRNA, 400 ng ncRNA, 100 ng Cas9 mRNA, and 5 ng sgRNA. With an MS2 stem-loop added to the 3' end of the ncRNA, activity almost doubled relative to the unmodified ncRNA (about 8%), reaching an efficiency of 15%. Circularization of the ncRNA, performed as described in Figure 75, increased efficiency further to 22%. As expected, neither precise editing nor indels were detected without sgRNA. The right panel shows the indels for the respective conditions in the left panel.

Example 21. An RT and its cognate ncRNA from the same retron are required to support precise editing

Retron RT does not require an exogenous primer. Instead, it uses a specific branching G residue within the folded secondary structure of the template ncRNA to initiate reverse transcription (Lampson BC, 2005). Some of the early discovered RTs were shown to specifically recognize the secondary structure of their cognate ncRNA and prime msD synthesis from the G within it (Shimamoto T, 1993). This RT-ncRNA specificity was tested in the gene editing system with the recently characterized Aco1 retron in parallel with Eco3, one of the early retrons. Consistent with previous studies, Aco1 supported precise editing only when paired with its cognate ncRNA, and activity was abolished when it was paired with the Eco3 ncRNA (Figure 77, left panel). The same was observed for the Eco3 retron: only its cognate ncRNA supported precise editing. The right panel shows the indels for the respective conditions in the left panel.

Example 22. Precise insertion of exon-sized sequences

Exon-sized long insertions have clinical value for replacing mutation-rich exons that cause genetic disease. For example, exon 8 is a mutation hotspot in patients with Wilson's disease, with about 500 mutations that vary by region and ethnicity (Liu L, 2019). We tested insertions of different sizes (up to 305 bp) at the EMX1 locus in HEK293T cells using the Aco1, Eco3, and novel R1262 retrons. Aco1 achieved precise insertion of 10, 100, and 205 bp at 8, 4, and 1.3%, respectively (Figure 78, top panel, left). Eco3 achieved precise insertion of 10 and 205 bp at 12 and 2.3%, respectively (Figure 78, top panel, middle). The novel retron R1262 achieved 25, 205, and 305 bp insertions at 20, 15, and 8.8% efficiency, respectively (Figure 78, top panel, right). Compared with Aco1 or Eco3, R1262 was very robust, with only a modest drop in activity as insert size increased from 25 to 205 bp. In addition, the ratio of precise editing to indels (1:2) was the smallest with R1262 (indels in the bottom panel of Figure 78). Precise editing was abolished in the absence of RT or sgRNA, indicating that the system depends on both the retron and the site-specific nuclease.
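
The length dependence described above can be made concrete with a small calculation. The sketch below is illustrative only: it takes the efficiencies quoted in the text for Figure 78 and expresses each retron's 205 bp efficiency as a fraction of its own shortest-insert efficiency. The function name `retention` is a hypothetical helper, and because the shortest inserts differ between retrons (10 bp for Aco1 and Eco3, 25 bp for R1262), the comparison is only approximate.

```python
# Precise-insertion efficiencies (%) quoted in the text for Figure 78.
# Keys are insert lengths in bp.
reported = {
    "Aco1": {10: 8.0, 100: 4.0, 205: 1.3},
    "Eco3": {10: 12.0, 205: 2.3},
    "R1262": {25: 20.0, 205: 15.0, 305: 8.8},
}


def retention(efficiencies, length_bp):
    """Efficiency at length_bp as a fraction of the shortest-insert efficiency."""
    shortest = min(efficiencies)
    return efficiencies[length_bp] / efficiencies[shortest]


# Fraction of short-insert efficiency that each retron retains at 205 bp.
retention_205 = {name: round(retention(eff, 205), 2) for name, eff in reported.items()}

for name, value in retention_205.items():
    print(f"{name}: {value:.2f} of short-insert efficiency retained at 205 bp")
```

On these figures R1262 retains roughly three quarters of its short-insert efficiency at 205 bp, while Aco1 and Eco3 retain under a fifth.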

In addition to the EMX1 locus, the R1262 retron also supported precise insertion of 205 bp at the AAVS1 locus, with an efficiency of 11% (Figure 79). These data extend the potential of retron-mediated gene editing as a genetic medicine.

Example 23. Optimization of the RT:ncRNA ratio for >200 bp long insertions with the R1262 retron
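
The RT:ncRNA ratios in this example are given on a molar basis, and the 1:16.7 molar ratio is stated to correspond to a 1:4 mass ratio. The following is a minimal sketch of that conversion, assuming that moles scale as mass divided by length (that is, the same average mass per nucleotide for both RNAs). The function names are hypothetical, and the mass ratios printed for 1:12.5 and 1:25 are derived under this assumption rather than stated in the text.

```python
def implied_length_ratio(molar_excess, mass_excess):
    """Length ratio (RT mRNA / ncRNA) implied by a matched molar and mass ratio.

    Assumes moles are proportional to mass / length, i.e. the same average
    mass per nucleotide for both RNAs.
    """
    return molar_excess / mass_excess


def molar_to_mass_excess(molar_excess, length_ratio):
    """Convert the ncRNA molar excess over RT mRNA into an ncRNA mass excess."""
    return molar_excess / length_ratio


# Anchor point from the text: 1:16.7 (molar) corresponds to 1:4 (mass).
length_ratio = implied_length_ratio(16.7, 4.0)

# Derived mass excess for each molar ratio tested.
mass_excess = {
    molar: round(molar_to_mass_excess(molar, length_ratio), 1)
    for molar in (12.5, 16.7, 25.0)
}

for molar, mass in mass_excess.items():
    print(f"RT:ncRNA 1:{molar} (molar) ~ 1:{mass} (mass)")
```

Under this assumption the RT mRNA is about four times longer than the ncRNA, so the three molar ratios tested map to mass ratios of roughly 1:3, 1:4, and 1:6.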

The ncRNA is the source of the single-stranded DNA that serves as the repair template, and producing more single-stranded DNA may drive higher editing. However, among RT mRNA, Cas9 mRNA, sgRNA, and ncRNA, the ncRNA probably survives for the shortest time in cells and is the limiting factor in retron-based editing systems, because ncRNA without chemical modification is susceptible to cellular nucleases and, after reverse transcription, is degraded by cellular RNase H (Palka C, 2022). Based on these assumptions, the all-RNA system was formulated with ncRNA as the most abundant component of the mixture, which agrees with earlier studies using the Eco3 retron. Figure 80 shows the results of testing the RT:ncRNA ratio for >200 bp long insertions at the EMX1 locus using the novel retron R1262. RT:ncRNA was tested at molar ratios of 1:12.5, 1:16.7, and 1:25. The 1:16.7 ratio corresponds to the 1:4 mass ratio used in the rest of the studies. With ncRNA (right side of the left panel), the best editing for the 205 and 305 bp insertions was observed at the 1:12.5 ratio. With the ncRNA-sgRNA fusion (left side of the left panel), all ratios tested gave comparable editing, although there was a weak trend toward increased editing as the amount of ncRNA-sgRNA increased.

Example 24. The novel retron R2781 is capable of inserting up to 100 bp in the all-RNA system

Figure 105 shows the results of testing ncRNAs with different template lengths using the novel retron R2781 in the all-RNA system. The all-RNA mixture was delivered to HEK293T cells by lipofection. At fixed amounts of RT mRNA and Cas9 mRNA, ncRNAs carrying inserts of different template lengths were added. In Figure 16, R2781 demonstrated a 10 bp insertion at the EMX1 locus, with activity similar to the other hits R1262, R6342S, and 6342L. Here, two different homology arm (HA) lengths (30 bp on both arms, or a 49 bp left arm with a 65 bp right arm) were used to assess the activity of R2781 in inserting 25 bp, 205 bp, and 405 bp at the EMX1 locus. Precise insertion of 25 bp with 30 bp homology arms was achieved at 11% efficiency (left panel). With the longer homology arms (49/65 bp), the efficiency for an insert of the same length fell by half, suggesting that this retron reverse transcriptase may have limited processivity. In contrast, activity for insertion of 205 or 405 bp dropped markedly. The right panel shows the indels for the respective conditions in the left panel.

Example 25. Precise editing with Cas9 nickase and retron R6342S

Wild-type (WT) Cas9 nuclease generates double-strand breaks (DSBs), which can potentially increase genomic instability by producing larger deletions, chromosomal translocations, or chromosome loss (Nahmad AD, 2022). A single point mutation in either Cas9 nuclease domain generates a nickase that cleaves only one strand of the target site. The D10A variant uses the intact HNH nuclease domain to cleave the target strand, and the H840A variant uses the intact RuvC domain to cleave the non-target strand. Nicks can initiate homology-directed repair (HDR) and cause less mutagenic end joining than DSBs (Maizels N, 2018). These features have generated great interest in using nickases for retron-based precise editing.

Previously, retron-based precise insertion of 10 bp was achieved at 0.1% efficiency using the Eco3 retron in combination with the D10A nickase (Figure 76). Here, the original D10A nickase and a D10A nickase carrying R221K/N394K mutations, which were previously shown to improve Cas9 nuclease activity (Spencer JM, 2017), were tested with one of the lead retrons, R6342S. With the original nickase, the precise editing activity of R6342S was 3 to 4 times higher than that of Eco3, and it was enhanced further (7 times higher) with the R221K/N394K nickase mutant (Figure 106).

Example 26. Retron-mediated precise deletion of genomic DNA

Many genetic diseases could be corrected simply by deleting an abnormal genomic sequence. The dystrophin gene in muscle disease (Nelson CE, 2016), the Col7A1 gene in skin disease (Bonafont J, 2019), and repeat expansion disorders (Meijboom KE, 2022) are good examples in which sequences containing unwanted mutations can be deleted. Here, precise deletion mediated by retron R1262 was tested in the all-RNA system. In Figure 107, the two deletion strategies are depicted in the top panel. The retron ncRNA was designed by juxtaposing the left and right homology arm sequences so that the intervening sequence is deleted. Del1 deletes 214 bp upstream of the Cas9 cleavage site at the EMX1 locus, and del2 deletes 248 bp downstream of the Cas9 cleavage site. Retron-mediated deletion was compared with direct delivery of increasing amounts of a single-stranded DNA (ssDNA) donor (150 and 300 ng). In the bottom panel, the left graph shows that R1262 deleted 248 bp (del2) from the EMX1 locus with activity similar to that of the ssDNA donor. The right graph shows the indels for the respective conditions in the left graph. Notably, retron-mediated deletion produced about 6-fold less unintended indel activity than ssDNA donor-mediated deletion. No activity was detected for the 214 bp (del1) deletion with R1262.
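
The deletion design described above, in which the left and right homology arms are placed directly next to each other so that the intervening sequence is left out, can be sketched as follows. This is a toy illustration under stated assumptions: the function `deletion_donor`, its coordinates, and the example sequence are hypothetical, and the sketch does not model strand orientation, the ncRNA scaffold, or the position of the Cas9 cut site.

```python
def deletion_donor(genome, del_start, del_end, arm_len):
    """Return a donor template that omits genome[del_start:del_end].

    The left arm is the arm_len bases immediately upstream of the deletion and
    the right arm is the arm_len bases immediately downstream; joining them
    directly leaves out the intervening sequence.
    """
    if not (arm_len <= del_start < del_end <= len(genome) - arm_len):
        raise ValueError("deletion and homology arms must fit inside the sequence")
    left_arm = genome[del_start - arm_len:del_start]
    right_arm = genome[del_end:del_end + arm_len]
    return left_arm + right_arm


# Toy sequence for illustration only; this is not the EMX1 locus.
toy_locus = "AAAACCCCGGGGTTTTACGTACGT"
donor = deletion_donor(toy_locus, del_start=8, del_end=16, arm_len=4)
print(donor)
```

The donor matches the locus on both sides of the deleted segment, so repair templated by it yields the locus without the intervening bases.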

These data indicate that retrons can be used for precise deletion of genomic DNA, with activity similar to that of an ssDNA donor but with lower unintended activity.

Example 27. NHEJ-based insertion of retron-derived double-stranded DNA at the target site

Current retron-mediated gene editing methods rely on the homology-directed repair (HDR) mechanism, which is effective only in dividing cells. Most retrons generate single-stranded DNA (ssDNA), which can serve as an HDR template. To apply retron-mediated editing in settings where HDR is ineffective (for example, in non-dividing cells), alternative approaches can be considered. One such alternative is to use non-homologous DNA end joining (NHEJ) to integrate double-stranded DNA (dsDNA) at the target cleavage site. NHEJ-based dsODN/dsDNA integration has been demonstrated many times in the literature and works both in cells and in vivo. This approach can be used for small insertions as well as large insertions, such as exon replacement and safe harbor knock-in.

A dsDNA template can be formed from a retron by using two ncRNAs of any retron whose encoded RT ssDNA products are complementary to each other and anneal into a duplex to become dsDNA (Figure 108, schematic on the left). These dsDNAs carry msR and msD spacer sequences at both ends, which do not hybridize with each other and may inhibit end joining. Including sgRNA recognition sequences (identical to the genomic target sequence) in tandem at both ends of the insert produces blunt-ended double-stranded DNA, and one insertion orientation is installed more stably than the other because it prevents Cas9 from recutting. Insertion in the other orientation allows Cas9 to recut, so the edited sequence is less stable. Using this strategy, the R1262 retron with two complementary ncRNAs inserted about 120 bp of dsDNA at the EMX1 locus at about 1% efficiency, and efficiency was higher when the insert was flanked by the two tandem sgRNA sequences (Figure 108, top right panel). The basal insertion activity of R6342 without tandem sgRNA sequences was slightly higher than that of R1262. The bottom right panel shows the indels for the respective conditions in the top panel. These data indicate that retron-derived dsDNA can be integrated directly at the target site by the NHEJ mechanism, without requiring homology to the integration site.

Example 28. Retron RNA does not elicit an apparent immune response

The potential immunogenicity of retron-mediated editing in the all-RNA system was tested using human peripheral blood mononuclear cells (hPBMCs). PBMCs contain innate and adaptive immune cells, and these cells are equipped with sensors that detect foreign nucleic acids. RNA quality and the presence of a 5' triphosphate on RNA generated by in vitro transcription may activate immune sensors. In addition, the single-stranded DNA that the retron RT makes using the ncRNA as a template could potentially elicit an immune response. To reduce the immune response, uridine in the RT mRNA was replaced with the less immunogenic m1Ψ (Nace KD, 2021), and the 5' triphosphate of the ncRNA was capped with m7G. These modified RNAs were electroporated into hPBMCs, alone or in combination, along with GFP mRNA without base modification (TriLink) as a control. After overnight culture, the supernatants were analyzed for cytokine production. Of the 25 cytokines and chemokines examined, those above the detection limit are shown in Figure 109. Type I interferons (a hallmark of the immune response to foreign nucleic acids) were not detected in any cells transfected with retron RNA. Low levels of some inflammatory cytokines and chemokines were detected in cells transfected with retron RNA, but these levels were much lower than with the GFP mRNA control, except in the case of RT mRNA without U modification. Note that the ncRNA, like the GFP mRNA control, was not U-modified, and that four times more ncRNA was electroporated than RT mRNA or GFP mRNA, reflecting the conditions used for gene editing. Despite this similarity and this difference, ncRNA-transfected cells produced far lower levels of cytokines/chemokines than the GFP mRNA control.

Example 29. Evidence of retron-mediated precise editing in primary cells

First, bone marrow-derived CD34+ human hematopoietic stem cells (HSCs) were used to evaluate the efficacy of retron-mediated gene editing in primary cells. HSCs are long-lived and can regenerate the entire hematopoietic system, so HSC transplantation has been used for decades to treat hematopoietic disorders such as inherited diseases or leukemia. Although transplantation therapy is supportive for patients, it lacks a clear curative outcome. In contrast, directly correcting the mutations present in the genome of a patient's HSCs has the potential to be curative. This view will ultimately be confirmed by HSC gene editing therapy in the clinical setting.

Figure 110 shows the results of testing retron-mediated gene editing with the all-RNA system in human stem and progenitor cells. Frozen HSCs were thawed and expanded for three days in the presence of the cytokines hSCF, hFLT3-L, and hTPO to prevent differentiation. A total of 3 or 5 µg of Cas9 mRNA, guide RNA (gRNA), R6342 RT mRNA, and ncRNA was mixed at the mass ratio indicated in each label of the left panel and electroporated into the HSCs. After a further three days of culture, genomic DNA was extracted and sequenced to measure editing frequency. Using a total of 5 µg of RNA (divided as Cas9:gRNA:ncRNA:RT = 1:0.3:4:1), precise insertion of 25 bp at the AAVS1 locus was observed at a frequency of 0.1% (left panel). The right panel shows the indels for the respective conditions.

Second, T cells are the cells of the immune system responsible for finding and destroying unhealthy cells. These unhealthy cells are usually cells infected with harmful pathogens, but some T cells can also naturally recognize and kill cancer cells. Genetic engineering of the T cell genome can enhance T cell function, and immunotherapies that enhance T cell function have achieved notable clinical success in treating B-cell acute lymphoblastic leukemia (Ellis GI, 2021). Building on these successes, T cell genome engineering is being used in broad clinical development for cancer, infectious disease, and autoimmune disease.

Figure 111 shows the results of testing retron-mediated gene editing with the all-RNA system in human T cells. Human pan-T cells from peripheral blood were thawed and activated for two days in the presence of anti-CD3/anti-CD28-conjugated magnetic beads and the cytokine IL-2. After culture, the anti-CD3/anti-CD28 beads were removed from the cells with a magnet. A total of 3 or 5 µg of Cas9 mRNA, guide RNA (gRNA), R6342 RT mRNA, and ncRNA was mixed at the mass ratio indicated in each label of the left panel and electroporated into the T cells with a Neon or Lonza electroporator. After a further three days of culture, genomic DNA was extracted and sequenced to measure editing frequency. Using a total of 3 µg of RNA (divided as Cas9:gRNA:ncRNA:RT = 1:0.3:3:1) with the Neon instrument, precise insertion of 25 bp at the AAVS1 locus was observed at a frequency of up to 1.7% (left panel). The right panel shows the indels for the respective conditions.

Example 30. Retron-based gene editing in vivo using LNPs

In certain embodiments, retron-mediated gene editing can be achieved with an all-RNA system, without the need for a viral DNA donor or an exogenous DNA donor, because the retron can produce the donor DNA inside the cell from the ncRNA template. As proposed herein, this feature is particularly well suited to lipid nanoparticle (LNP)-based platforms. The efficacy of retron-based gene editing in vivo is studied according to the methods described herein. The retron-based gene editing system may be formulated in one or more LNPs.

Exemplary, non-limiting LNP formulation systems can be prepared for in vivo delivery of the retron editing system.

LNP formulations for in vivo delivery of the retron-based editing system
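
The formulations in this section are specified as molar percentages of four lipid components at a total lipid concentration of about 7.2 mM, with the polynucleotide and lipid solutions mixed at a 3:1 volume ratio and a total flow rate of 12 mL/min. The sketch below converts those figures into per-component concentrations and per-stream flow rates. It is a bookkeeping aid only: the function names are hypothetical, and treating the 3:1 volume ratio as the flow-rate ratio of the two streams is an assumption.

```python
TOTAL_LIPID_MM = 7.2  # approximate total lipid concentration stated in the text

# Molar percentages for formulations A to D as listed in the text.
FORMULATIONS = {
    "A": {"ionizable lipid": 48.5, "DSPC": 10.0, "cholesterol": 39.0, "DMG PEG-2k": 2.5},
    "B": {"ionizable lipid": 55.0, "DSPC": 10.0, "cholesterol": 32.5, "DMG PEG-2k": 2.5},
    "C": {"ionizable lipid": 55.0, "DSPC": 10.0, "cholesterol": 32.5, "DMPE PEG-2k": 2.5},
    "D": {"ionizable lipid": 48.5, "DSPC": 10.0, "cholesterol": 40.0, "DSPE PEG-2k": 1.5},
}


def component_concentrations(mol_percent, total_mm=TOTAL_LIPID_MM):
    """Return each lipid's concentration (mM) in the ethanol phase."""
    total = sum(mol_percent.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"molar percentages sum to {total}, expected 100")
    return {name: total_mm * pct / 100.0 for name, pct in mol_percent.items()}


def stream_flow_rates(total_ml_min=12.0, aqueous_parts=3.0, lipid_parts=1.0):
    """Split a total flow rate into (aqueous, lipid) streams.

    Assumes the 3:1 volume ratio is applied as the flow-rate ratio.
    """
    parts = aqueous_parts + lipid_parts
    return total_ml_min * aqueous_parts / parts, total_ml_min * lipid_parts / parts


for label, mol_percent in FORMULATIONS.items():
    conc = component_concentrations(mol_percent)
    summary = ", ".join(f"{name} {value:.3f} mM" for name, value in conc.items())
    print(f"Formulation {label}: {summary}")

aqueous_flow, lipid_flow = stream_flow_rates()
print(f"aqueous stream {aqueous_flow} mL/min, lipid stream {lipid_flow} mL/min")
```

The sum check guards against transcription errors in the percentages; all four listed formulations sum to 100 mol%.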

The ionizable lipid, phospholipid, cholesterol, and PEG-lipid were dissolved in pure ethanol at the specified molar ratios (exemplary formulations are shown below as formulation molar ratios A, B, C, and D), with a total lipid concentration of about 7.2 mM. The polynucleotide solution (exemplary concentration 0.067 mg/mL) was prepared in an acidic buffer (pH 4.0-5.0) containing the gene editing system (such as Cas9 mRNA/gRNA/ncRNA/RT at a ratio of 1:0.3:4:1). The polynucleotide and lipid solutions were mixed at a 3:1 volume ratio using a NanoAssemblr microfluidic system at a total flow rate of 12 mL/min, resulting in rapid mixing and self-assembly of the LNPs. The formulations were then dialyzed against PBS (pH 7.4) overnight at 4°C, concentrated by centrifugal filtration, and filtered (0.2 µm pore size). Particle size and polydispersity index (PDI) of the formulations were measured by dynamic light scattering (DLS) using a Zetasizer Ultra (Malvern Panalytical). RNA encapsulation efficiency (EE%) was determined by the Ribogreen assay.

Formulation molar ratio A:
Ionizable lipid | DSPC | Cholesterol | DMG PEG-2k
48.5 | 10 | 39 | 2.5

Formulation molar ratio B:
Ionizable lipid | DSPC | Cholesterol | DMG PEG-2k
55 | 10 | 32.5 | 2.5

Formulation molar ratio C:
Ionizable lipid | DSPC | Cholesterol | DMPE PEG-2k
55 | 10 | 32.5 | 2.5

Formulation molar ratio D:
Ionizable lipid | DSPC | Cholesterol | DSPE PEG-2k
48.5 | 10 | 40 | 1.5

In vivo delivery protocol for the LNP-based retron editing system
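
The delivery protocol in this section specifies a dose of 2 mpk (mg per kg body weight) given in a volume of about 5 mL per kg. A minimal sketch of the per-animal arithmetic follows. The function name `dosing_plan` and the 25 g body weight are illustrative assumptions, as is the reading that the dose refers to the mass of RNA payload.

```python
def dosing_plan(body_weight_g, dose_mg_per_kg=2.0, volume_ml_per_kg=5.0):
    """Per-animal payload mass, injection volume, and required concentration."""
    weight_kg = body_weight_g / 1000.0
    return {
        "rna_mg": dose_mg_per_kg * weight_kg,
        "injection_ml": volume_ml_per_kg * weight_kg,
        "concentration_mg_per_ml": dose_mg_per_kg / volume_ml_per_kg,
    }


# Hypothetical 25 g mouse; the text does not state body weights.
plan = dosing_plan(25.0)
for key, value in plan.items():
    print(f"{key}: {value:.3f}")
```

Because both dose and volume scale with body weight, the required concentration of the dosing solution (dose divided by volume per kg) is the same for every animal.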

Each study used female CD-1 mice ranging from neonatal to adult (6-10 weeks). LNPs were administered via the lateral tail vein in a volume of approximately 5 mL per kg of body weight. The animals were observed periodically for adverse effects for at least 24 hours after dosing. Mice were dosed at 2 mpk (mg/kg), and each formulation was administered to 5 animals. After 7 days, the animals were euthanized by exsanguination via cardiac puncture under isoflurane anesthesia. On-target and off-target tissues were collected from each animal for DNA extraction and analysis. Editing in the mouse cohorts was measured by next-generation sequencing (NGS). The overall health and well-being of the animals were recorded to determine whether delivery of the gene editing payload resulted in deleterious off-target editing.

Example 31. Optimization of retron ncRNA sequences by high-throughput screening of an ncRNA variant library

Retrons are native to prokaryotes, so ncRNA sequences can be engineered further to suit their applications and to maximize efficiency in human cells.

To understand and optimize the editing efficiency of retron ncRNA in human cells, a pooled ncRNA library was developed containing 191 variants (sequences provided in Table 31A below) that differ in length and sequence within several key elements (such as, but not limited to, modifications to the length of the a1:a2 stem and the msR stem-loop), plus 2 controls (a positive and a negative control). Each variant is associated with a unique barcode designed into the donor region, and the variant library was synthesized as a pooled oligo library. The synthesized oligo library was cloned into a pooled DNA library, from which pooled ncRNA was produced by in vitro transcription. The RNA library was then transfected into the 293T cell line, and cells were harvested at different time points after transfection. Next-generation sequencing was then applied to the unique barcodes to measure the abundance of each RNA variant, ssDNA donor production, and precise insertion at the edited locus. By comparing these data sets, each ncRNA variant can be scored for stability and efficacy. Based on these scores, an enhanced ncRNA can be optimized/engineered by combining the most effective modified features.

Methods:
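
The barcode-based readout described in this example can be sketched as a simple counting step. The code below is an illustration under stated assumptions, not the analysis pipeline used in the study: it assumes barcodes of the form `ACCTATCATTCANNNNNNNN` (a constant prefix followed by an 8-nt variable region, as in the example barcode given for this library), and the function names, toy reads, and pseudocount-based enrichment score are all hypothetical.

```python
import re
from collections import Counter

# Constant prefix followed by an 8-nt variable region, modeled on the example
# barcode "ACCTATCATTCANNNNNNNN". Reads lacking the prefix are ignored.
BARCODE_PATTERN = re.compile(r"ACCTATCATTCA([ACGT]{8})")


def count_barcodes(reads):
    """Count occurrences of each 8-nt variable barcode across reads."""
    counts = Counter()
    for read in reads:
        match = BARCODE_PATTERN.search(read.upper())
        if match:
            counts[match.group(1)] += 1
    return counts


def relative_enrichment(input_counts, output_counts, pseudocount=1.0):
    """Ratio of output to input barcode frequency (toy score, with pseudocount)."""
    in_total = sum(input_counts.values()) + pseudocount
    out_total = sum(output_counts.values()) + pseudocount
    scores = {}
    for barcode, in_count in input_counts.items():
        in_freq = (in_count + pseudocount) / in_total
        out_freq = (output_counts.get(barcode, 0) + pseudocount) / out_total
        scores[barcode] = out_freq / in_freq
    return scores


# Toy reads for illustration only.
rna_reads = [
    "GGACCTATCATTCAAAAACCCCAAAG",
    "GGACCTATCATTCAAAAACCCCAAAG",
    "GGACCTATCATTCAGGGGTTTTAAAG",
    "NNNN",
]
edited_reads = ["GGACCTATCATTCAAAAACCCCAAAG"]

rna_counts = count_barcodes(rna_reads)
edit_counts = count_barcodes(edited_reads)
scores = relative_enrichment(rna_counts, edit_counts)
print(rna_counts)
print(scores)
```

In practice the same counting would be applied separately to the RNA, ssDNA donor, and edited-locus data sets before comparison; how barcodes map back to variants is outside this sketch.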

191 ncRNA variants were designed, with variation introduced into the following ncRNA elements: (1) a1a2; (2) msD stem; (3) msR stem; (4) msR loop; (5) termination sequence (within the msR spacer); (6) msD and msR spacer deletions (minimal variants). In addition, there was a positive control (WT R6342S ncRNA) and a negative control with a disrupted reverse transcription priming site. Each of these variants is associated with several unique barcodes (for example, "ACCTATCATTCANNNNNNNN") designed into the donor region targeting the human EMX1 gene. The variant library was synthesized as a pooled oligo library. The synthesized oligo library was assembled into a pooled DNA library, from which pooled ncRNA was produced by in vitro transcription. The ncRNA variant sequences and control sequences are provided in Table 31A. A representative library ncRNA construct is depicted in Figure 112. The types of variation in the library are depicted in Figure 113.

Table 31A. ncRNA variant sequences and barcodes
Variant DNA sequence (varied regions marked in lowercase) SEQ ID NO: Barcode SEQ ID NO: Type Positive control (WT) CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG ACCTATCATTCANNNNNNNN Positive control Negative control CATAGATTTCTTatCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG ACCTATCATTCANNNNNNNN Negative control Var0001 tataaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttata 19543 ACCTATCATTCANNNNNNNN 19735 a1a2 variant Var0002 tgataGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtatca 19544 ACCTATCATTCANNNNNNNN 19736 a1a2 variant Var0003 tctacGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgtaga 19545 ACCTATCATTCANNNNNNNN 19737 a1a2 variant Var0004 
gatccGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGggatc 19546 ACCTATCATTCANNNNNNNN 19738 a1a2變異體 Var0005 acgcgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgcgt 19547 ACCTATCATTCANNNNNNNN 19739 a1a2變異體 Var0006 cgcgcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgcgcg 19548 ACCTATCATTCANNNNNNNN 19740 a1a2變異體 Var0007 tattaaatttGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGaaatttaata 19549 ACCTATCATTCANNNNNNNN 19741 a1a2變異體 Var0008 tataatacgaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcgtattata 19550 ACCTATCATTCANNNNNNNN 19742 a1a2變異體 Var0009 ttctgccaatGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGattggcagaa 19551 ACCTATCATTCANNNNNNNN 19743 a1a2變異體 Var0010 
tctcctcgagGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctcgaggaga 19552 ACCTATCATTCANNNNNNNN 19744 a1a2變異體 Var0011 ccgggttcgcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgcgaacccgg 19553 ACCTATCATTCANNNNNNNN 19745 a1a2變異體 Var0012 ggccgggcccGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgggcccggcc 19554 ACCTATCATTCANNNNNNNN 19746 a1a2變異體 Var0013 aaatttaattataaaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtttataattaaattt 19555 ACCTATCATTCANNNNNNNN 19747 a1a2變異體 Var0014 atattctattacttgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcaagtaatagaatat 19556 ACCTATCATTCANNNNNNNN 19748 a1a2變異體 Var0015 gcgtatagtaatctgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcagattactatacgc 19557 ACCTATCATTCANNNNNNNN 19749 a1a2變異體 Var0016 
aacgcgaaacgctggGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGccagcgtttcgcgtt 19558 ACCTATCATTCANNNNNNNN 19750 a1a2變異體 Var0017 gaggcgggtgccgcaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtgcggcacccgcctc 19559 ACCTATCATTCANNNNNNNN 19751 a1a2變異體 Var0018 gcggccgcgggcgggGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcccgcccgcggccgc 19560 ACCTATCATTCANNNNNNNN 19752 a1a2變異體 Var0019 aatattatattaaatattatGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGataatatttaatataatatt 19561 ACCTATCATTCANNNNNNNN 19753 a1a2變異體 Var0020 gaacaaactaaatataaatcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgatttatatttagtttgttc 19562 ACCTATCATTCANNNNNNNN 19754 a1a2變異體 Var0021 cgatttctaagacttcggtaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtaccgaagtcttagaaatcg 19563 ACCTATCATTCANNNNNNNN 19755 a1a2變異體 Var0022 
gtcagggaactccaggaggaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcctcctggagttccctgac 19564 ACCTATCATTCANNNNNNNN 19756 a1a2變異體 Var0023 cgtgggccgcgcttgagccgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcggctcaagcgcggcccacg 19565 ACCTATCATTCANNNNNNNN 19757 a1a2變異體 Var0024 cggccgccggcgccggcgcgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgcgccggcgccggcggccg 19566 ACCTATCATTCANNNNNNNN 19758 a1a2變異體 Var0025 attaatatataaatttaattatataGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtatataattaaatttatatattaat 19567 ACCTATCATTCANNNNNNNN 19759 a1a2變異體 Var0026 gttatactaataagaattatctgaaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttcagataattcttattagtataac 19568 ACCTATCATTCANNNNNNNN 19760 a1a2變異體 Var0027 ttaagatagaggcacttctagtgagGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctcactagaagtgcctctatcttaa 19569 ACCTATCATTCANNNNNNNN 19761 a1a2變異體 Var0028 
caggcagcacgcacacagccgaaatGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGatttcggctgtgtgcgtgctgcctg 19570 ACCTATCATTCANNNNNNNN 19762 a1a2變異體 Var0029 caccactggcgccgccagcaggcggGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGccgcctgctggcggcgccagtggtg 19571 ACCTATCATTCANNNNNNNN 19763 a1a2變異體 Var0030 cgcggccgcgccgggcgcgcgcccgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgggcgcgcgcccggcgcggccgcg 19572 ACCTATCATTCANNNNNNNN 19764 a1a2變異體 Var0031 aataattattatataataatatattaataaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttattaatatattattatataataattatt 19573 ACCTATCATTCANNNNNNNN 19765 a1a2變異體 Var0032 aataatataatcccttagtctattagtttaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtaaactaatagactaagggattatattatt 19574 ACCTATCATTCANNNNNNNN 19766 a1a2變異體 Var0033 taatttacaggacagggactttactcgatcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgatcgagtaaagtccctgtcctgtaaatta 19575 
ACCTATCATTCANNNNNNNN 19767 a1a2變異體 Var0034 tatctggcaacggccagagagagcgcctgaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcaggcgctctctctggccgttgccagata 19576 ACCTATCATTCANNNNNNNN 19768 a1a2變異體 Var0035 gccgtcccgcgcggcctgcccaaacgccctGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGagggcgtttgggcaggccgcgcgggacggc 19577 ACCTATCATTCANNNNNNNN 19769 a1a2變異體 Var0036 cgccgccggccgggcgcgggcgccggcggcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgccgccggcgcccgcgcccggccggcggcg 19578 ACCTATCATTCANNNNNNNN 19770 a1a2變異體 Var0037 aatatatataaatattaataataatattaataaatGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGatttattaatattattattaatatttatatatatt 19579 ACCTATCATTCANNNNNNNN 19771 a1a2變異體 Var0038 ttaatatctaacaattattgaagttgctttatttcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgaaataaagcaacttcaataattgttagatattaa 19580 ACCTATCATTCANNNNNNNN 19772 a1a2變異體 Var0039 
cagttatagcgcgaactagtttgacctgatttgtaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtacaaatcaggtcaaactagttcgcgctataactg 19581 ACCTATCATTCANNNNNNNN 19773 a1a2變異體 Var0040 atagactggccgcccgtaggaacttgaggacgcacGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgtgcgtcctcaagttcctacgggcggccagtctat 19582 ACCTATCATTCANNNNNNNN 19774 a1a2變異體 Var0041 gcgagcgcgaggacgccacgagcacgcgccctgccGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGggcagggcgcgtgctcgtggcgtcctcgcgctcgc 19583 ACCTATCATTCANNNNNNNN 19775 a1a2變異體 Var0042 cggccggcgcccgggccgggcgggcccgcgggcggGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGccgcccgcgggcccgcccggcccgggcgccggccg 19584 ACCTATCATTCANNNNNNNN 19776 a1a2變異體 Var0043 catagagGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctctatg 19585 ACCTATCATTCANNNNNNNN 19777 a1a2變異體 Var0044 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctaccaatactgtgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcacagtattggtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19586 ACCTATCATTCANNNNNNNN 19778 msD莖變異體 Var0045 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctaccaataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtattggtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19587 ACCTATCATTCANNNNNNNN 19779 msD莖變異體 Var0046 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19588 ACCTATCATTCANNNNNNNN 19780 msD莖變異體 Var0047 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19589 ACCTATCATTCANNNNNNNN 19781 msD莖變異體 Var0048 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19590 ACCTATCATTCANNNNNNNN 19782 msD莖變異體 Var0049 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 
19591 ACCTATCATTCANNNNNNNN 19783 msD莖變異體 Var0050 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19592 ACCTATCATTCANNNNNNNN 19784 msD莖變異體 Var0051 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtataaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttataAAATATAAGAATTGTTAGCAAGAAATCTATG 19593 ACCTATCATTCANNNNNNNN 19785 msD莖變異體 Var0052 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtgataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtatcaAAATATAAGAATTGTTAGCAAGAAATCTATG 19594 ACCTATCATTCANNNNNNNN 19786 msD莖變異體 Var0053 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtctacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtagaAAATATAAGAATTGTTAGCAAGAAATCTATG 19595 ACCTATCATTCANNNNNNNN 19787 msD莖變異體 Var0054 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgatccACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAggatcAAATATAAGAATTGTTAGCAAGAAATCTATG 19596 ACCTATCATTCANNNNNNNN 19788 msD莖變異體 Var0055 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAacgcgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgcgtAAATATAAGAATTGTTAGCAAGAAATCTATG 19597 ACCTATCATTCANNNNNNNN 19789 msD莖變異體 Var0056 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcgcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgcgcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19598 ACCTATCATTCANNNNNNNN 19790 msD莖變異體 Var0057 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtattaaatttACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAaaatttaataAAATATAAGAATTGTTAGCAAGAAATCTATG 19599 ACCTATCATTCANNNNNNNN 19791 msD莖變異體 Var0058 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtataatacgaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcgtattataAAATATAAGAATTGTTAGCAAGAAATCTATG 19600 ACCTATCATTCANNNNNNNN 19792 msD莖變異體 Var0059 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAttctgccaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAattggcagaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19601 ACCTATCATTCANNNNNNNN 19793 msD莖變異體 Var0060 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtctcctcgagACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCActcgaggagaAAATATAAGAATTGTTAGCAAGAAATCTATG 19602 ACCTATCATTCANNNNNNNN 19794 msD莖變異體 Var0061 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAccgggttcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgcgaacccggAAATATAAGAATTGTTAGCAAGAAATCTATG 19603 ACCTATCATTCANNNNNNNN 19795 msD莖變異體 Var0062 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAggccgggcccACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgggcccggccAAATATAAGAATTGTTAGCAAGAAATCTATG 19604 ACCTATCATTCANNNNNNNN 19796 msD莖變異體 Var0063 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaaatttaattataaaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtttataattaaatttAAATATAAGAATTGTTAGCAAGAAATCTATG 19605 ACCTATCATTCANNNNNNNN 19797 msD莖變異體 Var0064 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAatattctattacttgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcaagtaatagaatatAAATATAAGAATTGTTAGCAAGAAATCTATG 19606 ACCTATCATTCANNNNNNNN 19798 msD莖變異體 Var0065 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgcgtatagtaatctgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcagattactatacgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19607 ACCTATCATTCANNNNNNNN 19799 msD莖變異體 Var0066 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaacgcgaaacgctggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccagcgtttcgcgttAAATATAAGAATTGTTAGCAAGAAATCTATG 19608 ACCTATCATTCANNNNNNNN 19800 msD莖變異體 Var0067 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgaggcgggtgccgcaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtgcggcacccgcctcAAATATAAGAATTGTTAGCAAGAAATCTATG 19609 ACCTATCATTCANNNNNNNN 19801 msD莖變異體 Var0068 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgcggccgcgggcgggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcccgcccgcggccgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19610 ACCTATCATTCANNNNNNNN 19802 msD莖變異體 Var0069 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaatattatattaaatattatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAataatatttaatataatattAAATATAAGAATTGTTAGCAAGAAATCTATG 19611 ACCTATCATTCANNNNNNNN 19803 msD莖變異體 Var0070 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgaacaaactaaatataaatcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgatttatatttagtttgttcAAATATAAGAATTGTTAGCAAGAAATCTATG 19612 ACCTATCATTCANNNNNNNN 19804 msD莖變異體 Var0071 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcgatttctaagacttcggtaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtaccgaagtcttagaaatcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19613 ACCTATCATTCANNNNNNNN 19805 msD莖變異體 Var0072 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgtcagggaactccaggaggaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcctcctggagttccctgacAAATATAAGAATTGTTAGCAAGAAATCTATG 19614 ACCTATCATTCANNNNNNNN 19806 msD莖變異體 Var0073 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcgtgggccgcgcttgagccgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcggctcaagcgcggcccacgAAATATAAGAATTGTTAGCAAGAAATCTATG 19615 ACCTATCATTCANNNNNNNN 19807 
msD莖變異體 Var0074 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcggccgccggcgccggcgcgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgcgccggcgccggcggccgAAATATAAGAATTGTTAGCAAGAAATCTATG 19616 ACCTATCATTCANNNNNNNN 19808 msD莖變異體 Var0075 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAattaatatataaatttaattatataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtatataattaaatttatatattaatAAATATAAGAATTGTTAGCAAGAAATCTATG 19617 ACCTATCATTCANNNNNNNN 19809 msD莖變異體 Var0076 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgttatactaataagaattatctgaaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttcagataattcttattagtataacAAATATAAGAATTGTTAGCAAGAAATCTATG 19618 ACCTATCATTCANNNNNNNN 19810 msD莖變異體 Var0077 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAttaagatagaggcacttctagtgagACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCActcactagaagtgcctctatcttaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19619 ACCTATCATTCANNNNNNNN 19811 msD莖變異體 Var0078 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcaggcagcacgcacacagccgaaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAatttcggctgtgtgcgtgctgcctgAAATATAAGAATTGTTAGCAAGAAATCTATG 19620 ACCTATCATTCANNNNNNNN 19812 msD莖變異體 Var0079 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcaccactggcgccgccagcaggcggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccgcctgctggcggcgccagtggtgAAATATAAGAATTGTTAGCAAGAAATCTATG 19621 ACCTATCATTCANNNNNNNN 19813 msD莖變異體 Var0080 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcgcggccgcgccgggcgcgcgcccgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgggcgcgcgcccggcgcggccgcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19622 ACCTATCATTCANNNNNNNN 19814 msD莖變異體 Var0081 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaataattattatataataatatattaataaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttattaatatattattatataataattattAAATATAAGAATTGTTAGCAAGAAATCTATG 19623 ACCTATCATTCANNNNNNNN 19815 msD莖變異體 Var0082 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaataatataatcccttagtctattagtttaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtaaactaatagactaagggattatattattAAATATAAGAATTGTTAGCAAGAAATCTATG 19624 ACCTATCATTCANNNNNNNN 19816 msD莖變異體 Var0083 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtaatttacaggacagggactttactcgatcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgatcgagtaaagtccctgtcctgtaaattaAAATATAAGAATTGTTAGCAAGAAATCTATG 19625 ACCTATCATTCANNNNNNNN 19817 msD莖變異體 Var0084 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAtatctggcaacggccagagagagcgcctgaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcaggcgctctctctggccgttgccagataAAATATAAGAATTGTTAGCAAGAAATCTATG 19626 ACCTATCATTCANNNNNNNN 19818 msD莖變異體 Var0085 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgccgtcccgcgcggcctgcccaaacgccctACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAagggcgtttgggcaggccgcgcgggacggcAAATATAAGAATTGTTAGCAAGAAATCTATG 19627 ACCTATCATTCANNNNNNNN 19819 msD莖變異體 Var0086 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcgccgccggccgggcgcgggcgccggcggcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgccgccggcgcccgcgcccggccggcggcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19628 ACCTATCATTCANNNNNNNN 19820 msD莖變異體 Var0087 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAaatatatataaatattaataataatattaataaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAatttattaatattattattaatatttatatatattAAATATAAGAATTGTTAGCAAGAAATCTATG 19629 ACCTATCATTCANNNNNNNN 19821 msD莖變異體 Var0088 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAttaatatctaacaattattgaagttgctttatttcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgaaataaagcaacttcaataattgttagatattaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19630 ACCTATCATTCANNNNNNNN 19822 msD莖變異體 Var0089 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcagttatagcgcgaactagtttgacctgatttgtaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtacaaatcaggtcaaactagttcgcgctataactgAAATATAAGAATTGTTAGCAAGAAATCTATG 19631 ACCTATCATTCANNNNNNNN 19823 msD莖變異體 Var0090 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAatagactggccgcccgtaggaacttgaggacgcacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgcgtcctcaagttcctacgggcggccagtctatAAATATAAGAATTGTTAGCAAGAAATCTATG 19632 ACCTATCATTCANNNNNNNN 19824 msD莖變異體 Var0091 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAgcgagcgcgaggacgccacgagcacgcgccctgccACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAggcagggcgcgtgctcgtggcgtcctcgcgctcgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19633 ACCTATCATTCANNNNNNNN 19825 msD莖變異體 Var0092 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAcggccggcgcccgggccgggcgggcccgcgggcggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccgcccgcgggcccgcccggcccgggcgccggccgAAATATAAGAATTGTTAGCAAGAAATCTATG 19634 ACCTATCATTCANNNNNNNN 19826 msD莖變異體 Var0093 CATAGATTTCTTGGCCTTTATGCTGTGGTGtgtCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19635 ACCTATCATTCANNNNNNNN 19827 msR環變異體 Var0094 
CATAGATTTCTTGGCCTTTATGCTGTGGTGgttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19636 ACCTATCATTCANNNNNNNN 19828 msR環變異體 Var0095 CATAGATTTCTTGGCCTTTATGCTGTGGTGttcCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19637 ACCTATCATTCANNNNNNNN 19829 msR環變異體 Var0096 CATAGATTTCTTGGCCTTTATGCTGTGGTGtctCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19638 ACCTATCATTCANNNNNNNN 19830 msR環變異體 Var0097 CATAGATTTCTTGGCCTTTATGCTGTGGTGcttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19639 ACCTATCATTCANNNNNNNN 19831 msR環變異體 Var0098 CATAGATTTCTTGGCCTTTATGCTGTGGTGtttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19640 ACCTATCATTCANNNNNNNN 19832 msR環變異體 Var0099 CATAGATTTCTTGGCCTTTATGCTGTGGTGttttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19641 ACCTATCATTCANNNNNNNN 19833 msR環變異體 Var0100 
CATAGATTTCTTGGCCTTTATGCTGTGGTGttgcCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19642 ACCTATCATTCANNNNNNNN 19834 msR環變異體 Var0101 CATAGATTTCTTGGCCTTTATGCTGTGGTGcttgCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19643 ACCTATCATTCANNNNNNNN 19835 msR環變異體 Var0102 CATAGATTTCTTGGCCTTTATtataaTTGttataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19644 ACCTATCATTCANNNNNNNN 19836 msR莖變異體 Var0103 CATAGATTTCTTGGCCTTTATtgataTTGtatcaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19645 ACCTATCATTCANNNNNNNN 19837 msR莖變異體 Var0104 CATAGATTTCTTGGCCTTTATtctacTTGgtagaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19646 ACCTATCATTCANNNNNNNN 19838 msR莖變異體 Var0105 CATAGATTTCTTGGCCTTTATgatccTTGggatcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19647 ACCTATCATTCANNNNNNNN 19839 msR莖變異體 Var0106 
CATAGATTTCTTGGCCTTTATacgcgTTGcgcgtGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19648 ACCTATCATTCANNNNNNNN 19840 msR莖變異體 Var0107 CATAGATTTCTTGGCCTTTATcgcgcTTGgcgcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19649 ACCTATCATTCANNNNNNNN 19841 msR莖變異體 Var0108 CATAGATTTCTTGGCCTTTATtattaaatttTTGaaatttaataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19650 ACCTATCATTCANNNNNNNN 19842 msR莖變異體 Var0109 CATAGATTTCTTGGCCTTTATtataatacgaTTGtcgtattataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19651 ACCTATCATTCANNNNNNNN 19843 msR莖變異體 Var0110 CATAGATTTCTTGGCCTTTATttctgccaatTTGattggcagaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19652 ACCTATCATTCANNNNNNNN 19844 msR莖變異體 Var0111 CATAGATTTCTTGGCCTTTATtctcctcgagTTGctcgaggagaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19653 ACCTATCATTCANNNNNNNN 19845 msR莖變異體 Var0112 
CATAGATTTCTTGGCCTTTATccgggttcgcTTGgcgaacccggGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19654 ACCTATCATTCANNNNNNNN 19846 msR stem variant
Var0113 CATAGATTTCTTGGCCTTTATggccgggcccTTGgggcccggccGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19655 ACCTATCATTCANNNNNNNN 19847 msR stem variant
Var0114 CATAGATTTCTTGGCCTTTATaaatttaattataaaTTGtttataattaaatttGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19656 ACCTATCATTCANNNNNNNN 19848 msR stem variant
Var0115 CATAGATTTCTTGGCCTTTATatattctattacttgTTGcaagtaatagaatatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19657 ACCTATCATTCANNNNNNNN 19849 msR stem variant
Var0116 CATAGATTTCTTGGCCTTTATgcgtatagtaatctgTTGcagattactatacgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19658 ACCTATCATTCANNNNNNNN 19850 msR stem variant
Var0117 CATAGATTTCTTGGCCTTTATaacgcgaaacgctggTTGccagcgtttcgcgttGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19659 ACCTATCATTCANNNNNNNN 19851 msR stem variant
Var0118
CATAGATTTCTTGGCCTTTATgaggcgggtgccgcaTTGtgcggcacccgcctcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19660 ACCTATCATTCANNNNNNNN 19852 msR stem variant
Var0119 CATAGATTTCTTGGCCTTTATgcggccgcgggcgggTTGcccgcccgcggccgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19661 ACCTATCATTCANNNNNNNN 19853 msR stem variant
Var0120 CATAGATTTCTTGGCCTTTATaatattatattaaatattatTTGataatatttaatataatattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19662 ACCTATCATTCANNNNNNNN 19854 msR stem variant
Var0121 CATAGATTTCTTGGCCTTTATgaacaaactaaatataaatcTTGgatttatatttagtttgttcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19663 ACCTATCATTCANNNNNNNN 19855 msR stem variant
Var0122 CATAGATTTCTTGGCCTTTATcgatttctaagacttcggtaTTGtaccgaagtcttagaaatcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19664 ACCTATCATTCANNNNNNNN 19856 msR stem variant
Var0123 CATAGATTTCTTGGCCTTTATgtcagggaactccaggaggaTTGtcctcctggagttccctgacGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19665 ACCTATCATTCANNNNNNNN 19857 msR stem variant
Var0124
CATAGATTTCTTGGCCTTTATcgtgggccgcgcttgagccgTTGcggctcaagcgcggcccacgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19666 ACCTATCATTCANNNNNNNN 19858 msR stem variant
Var0125 CATAGATTTCTTGGCCTTTATcggccgccggcgccggcgcgTTGcgcgccggcgccggcggccgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19667 ACCTATCATTCANNNNNNNN 19859 msR stem variant
Var0126 CATAGATTTCTTGGCCTTTATattaatatataaatttaattatataTTGtatataattaaatttatatattaatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19668 ACCTATCATTCANNNNNNNN 19860 msR stem variant
Var0127 CATAGATTTCTTGGCCTTTATgttatactaataagaattatctgaaTTGttcagataattcttattagtataacGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19669 ACCTATCATTCANNNNNNNN 19861 msR stem variant
Var0128 CATAGATTTCTTGGCCTTTATttaagatagaggcacttctagtgagTTGctcactagaagtgcctctatcttaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19670 ACCTATCATTCANNNNNNNN 19862 msR stem variant
Var0129 CATAGATTTCTTGGCCTTTATcaggcagcacgcacacagccgaaatTTGatttcggctgtgtgcgtgctgcctgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19671
ACCTATCATTCANNNNNNNN 19863 msR stem variant
Var0130 CATAGATTTCTTGGCCTTTATcaccactggcgccgccagcaggcggTTGccgcctgctggcggcgccagtggtgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19672 ACCTATCATTCANNNNNNNN 19864 msR stem variant
Var0131 CATAGATTTCTTGGCCTTTATcgcggccgcgccgggcgcgcgcccgTTGcgggcgcgcgcccggcgcggccgcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19673 ACCTATCATTCANNNNNNNN 19865 msR stem variant
Var0132 CATAGATTTCTTGGCCTTTATaataattattatataataatatattaataaTTGttattaatatattattatataataattattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19674 ACCTATCATTCANNNNNNNN 19866 msR stem variant
Var0133 CATAGATTTCTTGGCCTTTATaataatataatcccttagtctattagtttaTTGtaaactaatagactaagggattatattattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19675 ACCTATCATTCANNNNNNNN 19867 msR stem variant
Var0134 CATAGATTTCTTGGCCTTTATtaatttacaggacagggactttactcgatcTTGgatcgagtaaagtccctgtcctgtaaattaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19676 ACCTATCATTCANNNNNNNN 19868 msR stem variant
Var0135
CATAGATTTCTTGGCCTTTATtatctggcaacggccagagagagcgcctgaTTGtcaggcgctctctctggccgttgccagataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19677 ACCTATCATTCANNNNNNNN 19869 msR stem variant
Var0136 CATAGATTTCTTGGCCTTTATgccgtcccgcgcggcctgcccaaacgccctTTGagggcgtttgggcaggccgcgcgggacggcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19678 ACCTATCATTCANNNNNNNN 19870 msR stem variant
Var0137 CATAGATTTCTTGGCCTTTATcgccgccggccgggcgcgggcgccggcggcTTGgccgccggcgcccgcgcccggccggcggcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19679 ACCTATCATTCANNNNNNNN 19871 msR stem variant
Var0138 CATAGATTTCTTGGCCTTTATaatatatataaatattaataataatattaataaatTTGatttattaatattattattaatatttatatatattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19680 ACCTATCATTCANNNNNNNN 19872 msR stem variant
Var0139 CATAGATTTCTTGGCCTTTATttaatatctaacaattattgaagttgctttatttcTTGgaaataaagcaacttcaataattgttagatattaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19681 ACCTATCATTCANNNNNNNN 19873 msR stem variant
Var0140
CATAGATTTCTTGGCCTTTATcagttatagcgcgaactagtttgacctgatttgtaTTGtacaaatcaggtcaaactagttcgcgctataactgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19682 ACCTATCATTCANNNNNNNN 19874 msR stem variant
Var0141 CATAGATTTCTTGGCCTTTATatagactggccgcccgtaggaacttgaggacgcacTTGgtgcgtcctcaagttcctacgggcggccagtctatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19683 ACCTATCATTCANNNNNNNN 19875 msR stem variant
Var0142 CATAGATTTCTTGGCCTTTATgcgagcgcgaggacgccacgagcacgcgccctgccTTGggcagggcgcgtgctcgtggcgtcctcgcgctcgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19684 ACCTATCATTCANNNNNNNN 19876 msR stem variant
Var0143 CATAGATTTCTTGGCCTTTATcggccggcgcccgggccgggcgggcccgcgggcggTTGccgcccgcgggcccgcccggcccgggcgccggccgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19685 ACCTATCATTCANNNNNNNN 19877 msR stem variant
Var0144 CATAGATTTCTTGgcatGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19686 ACCTATCATTCANNNNNNNN 19878 minimal variant
Var0145
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggtgtcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19687 ACCTATCATTCANNNNNNNN 19879 minimal variant
Var0146 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacactaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19688 ACCTATCATTCANNNNNNNN 19880 minimal variant
Var0147 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19689 ACCTATCATTCANNNNNNNN 19881 minimal variant
Var0148 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaataGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19690 ACCTATCATTCANNNNNNNN 19882 minimal variant
Var0149 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggcatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19691 ACCTATCATTCANNNNNNNN 19883 minimal variant
Var0150 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagattttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19692 ACCTATCATTCANNNNNNNN 19884 minimal variant
Var0151
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggtaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19693 ACCTATCATTCANNNNNNNN 19885 minimal variant
Var0152 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaagaattgttagCAAGAAATCTATG 19694 ACCTATCATTCANNNNNNNN 19886 minimal variant
Var0153 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaatataagaaagCAAGAAATCTATG 19695 ACCTATCATTCANNNNNNNN 19887 minimal variant
Var0154 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaatgttagCAAGAAATCTATG 19696 ACCTATCATTCANNNNNNNN 19888 minimal variant
Var0155 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaatatagCAAGAAATCTATG 19697 ACCTATCATTCANNNNNNNN 19889 minimal variant
Var0156 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaagCAAGAAATCTATG 19698 ACCTATCATTCANNNNNNNN 19890 minimal variant
Var0157
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacacataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19699 ACCTATCATTCANNNNNNNN 19891 terminator variant
Var0158 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacacattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19700 ACCTATCATTCANNNNNNNN 19892 terminator variant
Var0159 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacaaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19701 ACCTATCATTCANNNNNNNN 19893 terminator variant
Var0160 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19702 ACCTATCATTCANNNNNNNN 19894 terminator variant
Var0161 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacaaataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19703 ACCTATCATTCANNNNNNNN 19895 terminator variant
Var0162 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacaaattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19704 ACCTATCATTCANNNNNNNN 19896 terminator variant
Var0163
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacatataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19705 ACCTATCATTCANNNNNNNN 19897 terminator variant
Var0164 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacatattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19706 ACCTATCATTCANNNNNNNN 19898 terminator variant
Var0165 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19707 ACCTATCATTCANNNNNNNN 19899 terminator variant
Var0166 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19708 ACCTATCATTCANNNNNNNN 19900 terminator variant
Var0167 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaataaaaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19709 ACCTATCATTCANNNNNNNN 19901 terminator variant
Var0168 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaataaatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19710 ACCTATCATTCANNNNNNNN 19902 terminator variant
Var0169
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatataaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19711 ACCTATCATTCANNNNNNNN 19903 terminator variant
Var0170 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatatatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19712 ACCTATCATTCANNNNNNNN 19904 terminator variant
Var0171 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtaaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19713 ACCTATCATTCANNNNNNNN 19905 terminator variant
Var0172 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgttaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19714 ACCTATCATTCANNNNNNNN 19906 terminator variant
Var0173 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtaaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19715 ACCTATCATTCANNNNNNNN 19907 terminator variant
Var0174 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtaaaatatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19716 ACCTATCATTCANNNNNNNN 19908 terminator variant
Var0175
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgttaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19717 ACCTATCATTCANNNNNNNN 19909 terminator variant
Var0176 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgttaaatatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19718 ACCTATCATTCANNNNNNNN 19910 terminator variant
Var0177 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttatcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19719 ACCTATCATTCANNNNNNNN 19911 terminator variant
Var0178 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttttcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19720 ACCTATCATTCANNNNNNNN 19912 terminator variant
Var0179 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttataaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19721 ACCTATCATTCANNNNNNNN 19913 terminator variant
Var0180 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttattaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19722 ACCTATCATTCANNNNNNNN 19914 terminator variant
Var0181
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttttaaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19723 ACCTATCATTCANNNNNNNN 19915 terminator variant
Var0182 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagattttttaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19724 ACCTATCATTCANNNNNNNN 19916 terminator variant
Var0183 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacacatccttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19725 ACCTATCATTCANNNNNNNN 19917 terminator variant
Var0184 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcaaatacacatcgttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19726 ACCTATCATTCANNNNNNNN 19918 terminator variant
Var0185 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggaggtttgtcaaatacacaccattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19727 ACCTATCATTCANNNNNNNN 19919 terminator variant
Var0186 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagctttgtcaaatacacagcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19728 ACCTATCATTCANNNNNNNN 19920 terminator variant
Var0187
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagattggtcaaatacccatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19729 ACCTATCATTCANNNNNNNN 19921 terminator variant
Var0188 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagattcgtcaaatacgcatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19730 ACCTATCATTCANNNNNNNN 19922 terminator variant
Var0189 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtccaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19731 ACCTATCATTCANNNNNNNN 19923 terminator variant
Var0190 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgtcgaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19732 ACCTATCATTCANNNNNNNN 19924 terminator variant
Var0191 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggagatttgccaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19733 ACCTATCATTCANNNNNNNN 19925 terminator variant

A total of 191 ncRNA variants were designed, with mutations introduced into the following ncRNA elements: (1) a1a2; (2) msD stem; (3) msR stem; (4) msR loop; (5) terminator sequence (within the msR spacer); (6) msD and msR spacer base deletions (minimal variants). The library also includes a positive control (WT R6342S ncRNA) and a negative control with a disrupted reverse transcription start site.
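As an illustration only, the stem-type variants in this listing can be inspected with a short Python sketch that extracts the lowercase (variable) segments of a variant sequence and checks whether the two arms are reverse complements of each other, as base pairing within a stem would require. The function names are hypothetical. The assumption that a stem variant carries exactly two lowercase arms is inferred from the listed msR stem rows (for example, Var0102) and is not stated in the description. The sequence used is a truncated excerpt of the Var0102 row, not the full construct.

```python
import re
from typing import List

# Complement table for DNA bases, covering both cases used in the listing.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def variable_regions(seq: str) -> List[str]:
    """Return the lowercase runs, which the listing uses to mark variable regions."""
    return re.findall(r"[a-z]+", seq)


def arms_are_complementary(seq: str) -> bool:
    """True only if there are exactly two lowercase arms and they can base-pair.

    Assumption (not from the source text): a stem variant has two arms.
    """
    regions = variable_regions(seq)
    if len(regions) != 2:
        return False
    left, right = regions
    return reverse_complement(left) == right


# Truncated excerpt of the Var0102 (msR stem variant) row, for illustration.
var0102_excerpt = "CATAGATTTCTTGGCCTTTATtataaTTGttataGGAGATTTGTC"

print(variable_regions(var0102_excerpt))
print(arms_are_complementary(var0102_excerpt))
```

Loop and minimal variants carry a single lowercase run, so under this assumption the check returns False for them; it is meant only as a sanity check on stem designs.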
Each of these variants is associated with several unique barcodes designed into the donor region, which targets the human EMX1 gene (e.g., "ACCTATCATTCANNNNNNNN"). The variant library was synthesized as a pooled oligo library, which was then assembled into a pooled DNA library from which pooled ncRNAs were generated by in vitro transcription. ncRNA variant sequences and control sequences are provided in Table 31A. Representative library ncRNA constructs are depicted in Figure 112. The variant types in the library are depicted in Figure 113.

Table 31A - ncRNA variant sequences and barcodes
Columns: Mutant; DNA sequence (variable regions are marked with lowercase letters); SEQ ID NO: (sequence); Barcode; SEQ ID NO: (barcode); Type
Positive control (WT) CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG ACCTATCATTCANNNNNNNN Positive control
Negative control CATAGATTTCTTatCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG ACCTATCATTCANNNNNNNN Negative control
Var0001 tataaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttata 19543 ACCTATCATTCANNNNNNNN 19735 a1a2 variant
Var0002 tgataGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtatca 19544 ACCTATCATTCANNNNNNNN 19736 a1a2 variant
Var0003
tctacGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgtaga 19545 ACCTATCATTCANNNNNNNN 19737 a1a2 variant
Var0004 gatccGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgggatc 19546 ACCTATCATTCANNNNNNNN 19738 a1a2 variant
Var0005 acgcgGCCTTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgcgt 19547 ACCTATCATTCANNNNNNNN 19739 a1a2 variant
Var0006 cgcgcGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgcgcg 19548 ACCTATCATTCANNNNNNNN 19740 a1a2 variant
Var0007 tattaaatttGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGaaatttaata 19549 ACCTATCATTCANNNNNNNN 19741 a1a2 variant
Var0008 tataatacgaGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcgtattata 19550 ACCTATCATTCANNNNNNNN 19742 a1a2 variant
Var0009
ttctgccaatGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGattggcagaa 19551 ACCTATCATTCANNNNNNNN 19743 a1a2 variant
Var0010 tctcctcgagGCCTTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctcgaggaga 19552 ACCTATCATTCANNNNNNNN 19744 a1a2 variant
Var0011 ccgggttcgcGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGgcgaacccgg 19553 ACCTATCATTCANNNNNNNN 19745 a1a2 variant
Var0012 ggccgggcccGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgggcccggcc 19554 ACCTATCATTCANNNNNNNN 19746 a1a2 variant
Var0013 aaatttaattataaaGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGtttataattaaattt 19555 ACCTATCATTCANNNNNNNN 19747 a1a2 variant
Var0014 atattctattacttgGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcaagtaatagaatat 19556 ACCTATCATTCANNNNNNNN 19748 a1a2 variant
Var0015
gcgtatagtaatctgGCCTTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGcagattactatacgc 19557 ACCTATCATTCANNNNNNNN 19749 a1a2 variant
Var0016 aacgcgaaacgctggGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGccagcgtttcgcgtt 19558 ACCTATCATTCANNNNNNNN 19750 a1a2 variant
Var0017 gaggcgggtgccgcaGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGtgcggcacccgcctc 19559 ACCTATCATTCANNNNNNNN 19751 a1a2 variant
Var0018 gcggccgcgggcgggGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcccgcccgcggccgc 19560 ACCTATCATTCANNNNNNNN 19752 a1a2 variant
Var0019 aatattatattaaatattatGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGataatatttaatataatatt 19561 ACCTATCATTCANNNNNNNN 19753 a1a2 variant
Var0020 gaacaaactaaatataaatcGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgattttatatttagtttgttc 19562 ACCTATCATTCANNNNNNNN 19754 a1a2 variant
Var0021
cgatttctaagacttcggtaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtaccgaagtcttagaaatcg 19563 ACCTATCATTCANNNNNNNN 19755 a1a2 variant Var0022
gtcagggaactccaggaggaGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcctcctggagttccctgac 19564 ACCTATCATTCANNNNNNNN 19756 a1a2 variant Var0023
cgtgggccgcgcttgagccgGCCTTTTATGCTGTGGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcggctcaagcgcggcccacg 19565 ACCTATCATTCANNNNNNNN 19757 a1a2 variant Var0024
cggccgccggcgccggcgcgGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgcgccggcgccggcggccg 19566 ACCTATCATTCANNNNNNNN 19758 a1a2 variant Var0025
attaatatataaatttaattatataGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtatataattaaatttatatattaat 19567 ACCTATCATTCANNNNNNNN 19759 a1a2 variant Var0026
gttatactaataagaattatctgaaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttcagataattctttattagtataac 19568 ACCTATCATTCANNNNNNNN 19760 a1a2 variant Var0027
ttaagatagaggcacttctagtgagGCCTTTTATGCTGTGGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctcactagaagtgcctctatcttaa 19569 ACCTATCATTCANNNNNNNN 19761 a1a2 variant Var0028
caggcagcacgcacacagccgaaatGCCTTTTATGCTGTGGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGatttcggctgtgtgcgtgctgcctg 19570 ACCTATCATTCANNNNNNNN 19762 a1a2 variant Var0029
caccactggcgccgccagcaggcggGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGccgcctgctggcggcgccagtggtg 19571 ACCTATCATTCANNNNNNNN 19763 a1a2 variant Var0030
cgcggccgcgccgggcgcgcgcccgGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGcgggcgcgcgcccggcgcggccgcg 19572 ACCTATCATTCANNNNNNNN 19764 a1a2 variant Var0031
aataattattatataataatatattaataaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGttattaatatattattatataataattatt 19573 ACCTATCATTCANNNNNNNN 19765 a1a2 variant Var0032
aataataatcccttagtctattagtttaGCCTTTTATGCTGTGGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtaaactaatagactaagggattatattatt 19574 ACCTATCATTCANNNNNNNN 19766 a1a2 variant Var0033
taatttacaggacagggactttactcgatcGCCTTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgatcgagtaaagtccctgtcctgtaaatta 19575 ACCTATCATTCANNNNNNNN 19767 a1a2 variant Var0034
tatctggcaacggccagagagagcgcctgaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtcaggcgctctctctggccgttgccagata 19576 ACCTATCATTCANNNNNNNN 19768 a1a2 variant Var0035
gccgtcccgcgcggcctgcccaaacgccctGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGagggcgtttgggcaggccgcgcgggacggc 19577 ACCTATCATTCANNNNNNNN 19769 a1a2 variant Var0036
cgccgccggccgggcgcgggcgccggcggcGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgccgccggcgcccgcgcccggccggcggcg 19578 ACCTATCATTCANNNNNNNN 19770 a1a2 variant Var0037
aatatataaatattaataataataataaatGCCTTTTATGCTGTGGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGatttattaatattattattaatatttatatatatt 19579 ACCTATCATTCANNNNNNNN 19771 a1a2 variant Var0038
ttaatatctaacaattattgaagttgctttatttcGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgaaataaagcaacttcaataattgttagatattaa 19580 ACCTATCATTCANNNNNNNN 19772 a1a2 variant Var0039
cagttatagcgcgaactagtttgacctgatttgtaGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGtacaaatcaggtcaaactagttcgcgctataactg 19581 ACCTATCATTCANNNNNNNN 19773 a1a2 variant Var0040
atagactggccgcccgtaggaacttgaggacgcacGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGgtgcgtcctcaagttcctacgggcggccagtctat 19582 ACCTATCATTCANNNNNNNN 19774 a1a2 variant Var0041
gcgagcgcgaggacgccacgagcacgcgccctgccGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGggcagggcgcgtgctcgtggcgtcctcgcgctcgc 19583 ACCTATCATTCANNNNNNNN 19775 a1a2 variant Var0042
cggccggcgcccgggccgggcgggcccgcgggcggGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGccgcccgcgggcccgcccggcccgggcgccggccg 19584 ACCTATCATTCANNNNNNNN 19776 a1a2 variant Var0043
catagagGCCTTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGctctatg 19585 ACCTATCATTCANNNNNNNN 19777 a1a2 variant Var0044
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctaccaatactgtgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcacagtattggtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19586 ACCTATCATTCANNNNNNNN 19778 msD stem variant Var0045
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctaccaataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttattggtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19587 ACCTATCATTCANNNNNNNN 19779 msD stem variant Var0046
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacgctacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgctacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19588 ACCTATCATTCANNNNNNNN 19780 msD stem variant Var0047
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcgctacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtacgctgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19589 ACCTATCATTCANNNNNNNN 19781 msD stem variant Var0048
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcaattcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgaagtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19590 ACCTATCATTCANNNNNNNN 19782 msD stem variant Var0049
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgcgcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgtcacaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19591 ACCTATCATTCANNNNNNNN 19783 msD stem variant Var0050
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggttgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcaaccAAATATAAGAATTGTTAGCAAGAAATCTATG 19592 ACCTATCATTCANNNNNNNN 19784 msD stem variant Var0051
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtataaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttataAAATATAAGAATTGTTAGCAAGAAATCTATG 19593 ACCTATCATTCANNNNNNNN 19785 msD stem variant Var0052
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtgataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtatcaAAATATAAGAATTGTTAGCAAGAAATCTATG 19594 ACCTATCATTCANNNNNNNN 19786 msD stem variant Var0053
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtctacACAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtagaAAATATAAGAATTGTTAGCAAGAAATCTATG 19595 ACCTATCATTCANNNNNNNN 19787 msD stem variant Var0054
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgatccACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAggatcAAATATAAGAATTGTTAGCAAGAAATCTATG 19596 ACCTATCATTCANNNNNNNN 19788 msD stem variant Var0055
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAacgcgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgcgtAAATATAAGAATTGTTAGCAAGAAATCTATG 19597 ACCTATCATTCANNNNNNNN 19789 msD stem variant Var0056
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcgcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgcgcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19598 ACCTATCATTCANNNNNNNN 19790 msD stem variant Var0057
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtattaaatttACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAaaatttaataAAATATAAGAATTGTTAGCAAGAAATCTATG 19599 ACCTATCATTCANNNNNNNN 19791 msD stem variant Var0058
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtataatacgaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcgtattataAAATATAAGAATTGTTAGCAAGAAATCTATG 19600 ACCTATCATTCANNNNNNNN 19792 msD stem variant Var0059
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAttctgccaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAattggcagaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19601 ACCTATCATTCANNNNNNNN 19793 msD stem variant Var0060
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtctcctcgagACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCActcgaggagaAAATATAAGAATTGTTAGCAAGAAATCTATG 19602 ACCTATCATTCANNNNNNNN 19794 msD stem variant Var0061
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAccgggttcgcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgcgaacccggAAATATAAGAATTGTTAGCAAGAAATCTATG 19603 ACCTATCATTCANNNNNNNN 19795 msD stem variant Var0062
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAggccgggcccACAAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgggcccggccAAATATAAGAATTGTTAGCAAGAAATCTATG 19604 ACCTATCATTCANNNNNNNN 19796 msD stem variant Var0063
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaaatttaattataaaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtttataattaaatttAAATATAAGAATTGTTAGCAAGAAATCTATG 19605 ACCTATCATTCANNNNNNNN 19797 msD stem variant Var0064
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAatattctattacttgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcaagtaatagaatatAAATATAAGAATTGTTAGCAAGAAATCTATG 19606 ACCTATCATTCANNNNNNNN 19798 msD stem variant Var0065
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgcgtatagtaatctgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcagattactatacgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19607 ACCTATCATTCANNNNNNNN 19799 msD stem variant Var0066
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaacgcgaaacgctggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccagcgtttcgcgttAAATATAAGAATTGTTAGCAAGAAATCTATG 19608 ACCTATCATTCANNNNNNNN 19800 msD stem variant Var0067
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgaggcgggtgccgcaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtgcggcacccgcctcAAATATAAGAATTGTTAGCAAGAAATCTATG 19609 ACCTATCATTCANNNNNNNN 19801 msD stem variant Var0068
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgcggccgcgggcgggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcccgcccgcggccgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19610 ACCTATCATTCANNNNNNNN 19802 msD stem variant Var0069
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaatattatattaaatattatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAataatatttaataataatattAAATATAAGAATTGTTAGCAAGAAATCTATG 19611 ACCTATCATTCANNNNNNNN 19803 msD stem variant Var0070
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgaacaaactaaatataaatcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgatttatatttagtttgttcAAATATAAGAATTGTTAGCAAGAAATCTATG 19612 ACCTATCATTCANNNNNNNN 19804 msD stem variant Var0071
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcgatttctaagacttcggtaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtaccgaagtcttagaaatcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19613 ACCTATCATTCANNNNNNNN 19805 msD stem variant Var0072
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgtcagggaactccaggaggaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcctcctggagttccctgacAAATATAAGAATTGTTAGCAAGAAATCTATG 19614 ACCTATCATTCANNNNNNNN 19806 msD stem variant Var0073
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcgtgggccgcgcttgagccgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcggctcaagcgcggcccacgAAATATAAGAATTGTTAGCAAGAAATCTATG 19615 ACCTATCATTCANNNNNNNN 19807 msD stem variant Var0074
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcggccgccggcgccggcgcgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgcgccggcgccggcggccgAAATATAAGAATTGTTAGCAAGAAATCTATG 19616 ACCTATCATTCANNNNNNNN 19808 msD stem variant Var0075
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAattaatatataaatttaattatataACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttataattaaatttatatattaatAAATATAAGAATTGTTAGCAAGAAATCTATG 19617 ACCTATCATTCANNNNNNNN 19809 msD stem variant Var0076
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgttatactaataagaattatctgaaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttcagataattctttattagtataacAAATATAAGAATTGTTAGCAAGAAATCTATG 19618 ACCTATCATTCANNNNNNNN 19810 msD stem variant Var0077
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAttaagatagaggcacttctagtgagACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCActcactagaagtgcctctatcttaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19619 ACCTATCATTCANNNNNNNN 19811 msD stem variant Var0078
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcaggcagcacgcacacagccgaaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAatttcggctgtgtgcgtgctgcctgAAATATAAGAATTGTTAGCAAGAAATCTATG 19620 ACCTATCATTCANNNNNNNN 19812 msD stem variant Var0079
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcaccactggcgccgccagcaggcggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccgcctgctggcggcgccagtggtgAAATATAAGAATTGTTAGCAAGAAATCTATG 19621 ACCTATCATTCANNNNNNNN 19813 msD stem variant Var0080
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcgcggccgcgccgggcgcgcgcccgACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAcgggcgcgcgcccggcgcggccgcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19622 ACCTATCATTCANNNNNNNN 19814 msD stem variant Var0081
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaataattattatataataatatattaataaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAttattatatattataataattattAAATATAAGAATTGTTAGCAAGAAATCTATG 19623 ACCTATCATTCANNNNNNNN 19815 msD stem variant Var0082
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaataatataatcccttagtctattagtttaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtaaactaatagactaagggattatattattAAATATAAGAATTGTTAGCAAGAAATCTATG 19624 ACCTATCATTCANNNNNNNN 19816 msD stem variant Var0083
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtaatttacaggacagggactttactcgatcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgatcgagtaaagtccctgtcctgtaaattaAAATATAAGAATTGTTAGCAAGAAATCTATG 19625 ACCTATCATTCANNNNNNNN 19817 msD stem variant Var0084
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAtatctggcaacggccagagagagcgcctgaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtcaggcgctctctctggccgttgccagataAAATATAAGAATTGTTAGCAAGAAATCTATG 19626 ACCTATCATTCANNNNNNNN 19818 msD stem variant Var0085
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgccgtcccgcgcggcctgcccaaacgccctACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAagggcgtttgggcaggccgcgcgggacggcAAATATAAGAATTGTTAGCAAGAAATCTATG 19627 ACCTATCATTCANNNNNNNN 19819 msD stem variant Var0086
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcgccgccggccgggcggggcgccggcggcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgccgccggcgcccgcgcccggccggcggcgAAATATAAGAATTGTTAGCAAGAAATCTATG 19628 ACCTATCATTCANNNNNNNN 19820 msD stem variant Var0087
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAaatatataaaatattaataataatataataaatACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAatttattaattattattattaatatttatatatattAAATATAAGAATTGTTAGCAAGAAATCTATG 19629 ACCTATCATTCANNNNNNNN 19821 msD stem variant Var0088
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAttaatatctaacaattattgaagttgctttatttcACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgaaataaagcaacttcaataattgttagatattaaAAATATAAGAATTGTTAGCAAGAAATCTATG 19630 ACCTATCATTCANNNNNNNN 19822 msD stem variant Var0089
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcagttatagcgcgaactagtttgacctgatttgtaACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAtacaaatcaggtcaaactagttcgcgctataactgAAATATAAGAATTGTTAGCAAGAAATCTATG 19631 ACCTATCATTCANNNNNNNN 19823 msD stem variant Var0090
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAatagactggccgcccgtaggaacttgaggacgcacACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAgtgcgtcctcaagttcctacgggcggccagtctatAAATATAAGAATTGTTAGCAAGAAATCTATG 19632 ACCTATCATTCANNNNNNNN 19824 msD stem variant Var0091
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAgcgagcgcgaggacgccacgagcacgcgccctgccACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAggcagggcgcgtgctcgtggcgtcctcgcgctcgcAAATATAAGAATTGTTAGCAAGAAATCTATG 19633 ACCTATCATTCANNNNNNNN 19825 msD stem variant Var0092
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAcggccggcgcccgggccgggcgggcccgcgggcggACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCAccgcccgcgggcccgcccggcccgggcgccggccgAAATATAAGAATTGTTAGCAAGAAATCTATG 19634 ACCTATCATTCANNNNNNNN 19826 msD stem variant Var0093
CATAGATTTCTTGGCCTTTATGCTGTGGTGtgtCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19635 ACCTATCATTCANNNNNNNN 19827 msR cyclomutant Var0094
CATAGATTTCTTGGCCTTTATGCTGTGGTGgttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19636 ACCTATCATTCANNNNNNNN 19828 msR cyclomutant Var0095
CATAGATTTCTTGGCCTTTATGCTGTGGTGttcCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19637 ACCTATCATTCANNNNNNNN 19829 msR cyclomutant Var0096
CATAGATTTCTTGGCCTTTATGCTGTGGTGtctCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19638 ACCTATCATTCANNNNNNNN 19830 msR cyclomutant Var0097
CATAGATTTCTTGGCCTTTATGCTGTGGTGcttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19639 ACCTATCATTCANNNNNNNN 19831 msR cyclomutant Var0098
CATAGATTTCTTGGCCTTTATGCTGTGGTGtttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19640 ACCTATCATTCANNNNNNNN 19832 msR cyclomutant Var0099
CATAGATTTCTTGGCCTTTATGCTGTGGTGttttCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19641 ACCTATCATTCANNNNNNNN 19833 msR cyclomutant Var0100
CATAGATTTCTTGGCCTTTATGCTGTGGTGttgcCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAGAAATCTATG 19642 ACCTATCATTCANNNNNNNN 19834 msR cyclomutant Var0101
CATAGATTTCTTGGCCTTTATGCTGTGGTGcttgCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19643 ACCTATCATTCANNNNNNNN 19835 msR cyclomutant Var0102
CATAGATTTCTTGCCTTTAttataaTTGttataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19644 ACCTATCATTCANNNNNNNN 19836 msR stem variant Var0103
CATAGATTTCTTGCCTTTTATtgataTTGtatcaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19645 ACCTATCATTCANNNNNNNN 19837 msR stem variant Var0104
CATAGATTTCTTGGCCTTTATctacTTGgtagaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19646 ACCTATCATTCANNNNNNNN 19838 msR stem variant Var0105
CATAGATTTCTTGGCCTTTATgatccTTGggatcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19647 ACCTATCATTCANNNNNNNN 19839 msR stem variant Var0106
CATAGATTTCTTGGCCTTTATacgcgTTGcgcgtGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19648 ACCTATCATTCANNNNNNNN 19840 msR stem variant Var0107
CATAGATTTCTTGGCCTTTATcgcgcTTGgcgcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19649 ACCTATCATTCANNNNNNNN 19841 msR stem variant Var0108
CATAGATTTCTTGGCCTTTATattaaatttTTGaaatttaataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19650 ACCTATCATTCANNNNNNNN 19842 msR stem variant Var0109
CATAGATTTCTTGCCTTTATataatacgaTTGtcgtattataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19651 ACCTATCATTCANNNNNNNN 19843 msR stem variant Var0110
CATAGATTTCTTGCCTTTTATttctgccaatTTGattggcagaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAGAAATCTATG 19652 ACCTATCATTCANNNNNNNN 19844 msR stem variant Var0111
CATAGATTTCTTGCCTTTATctcctcgagTTGctcgaggagaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAGAAATCTATG 19653 ACCTATCATTCANNNNNNNN 19845 msR stem variant Var0112
CATAGATTTCTTGGCCTTTATccgggttcgcTTGgcgaacccggGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19654 ACCTATCATTCANNNNNNNN 19846 msR stem variant Var0113
CATAGATTTCTTGGCCTTTATggccgggcccTTGgggcccggccGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAGAAATCTATG 19655 ACCTATCATTCANNNNNNNN 19847 msR stem variant Var0114
CATAGATTTCTTGGCCTTTAaatttaattataaaTTGtttataattaaatttGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19656 ACCTATCATTCANNNNNNNN 19848 msR stem variant Var0115
CATAGATTTCTTGGCCTTTATattattctattacttgTTGcaagtaagaatatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19657 ACCTATCATTCANNNNNNNN 19849 msR stem variant Var0116
CATAGATTTCTTGGCCTTTATgcgtatagtaatctgTTGcagattactatacgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19658 ACCTATCATTCANNNNNNNN 19850 msR stem variant Var0117
CATAGATTTCTTGGCCTTTATaacgcgaaacgctggTTGccagcgtttcgcgttGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAA TATAAGAATTGTTAGCAAGAAATCTATG 19659 ACCTATCATTCANNNNNNNN 19851 msR stem variant Var0118 CATAGATTTCTTGGCCTTTATgaggcgggtgccgcaTTGtgcggcacccgcctcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAAT ATAAGAATTGTTAGCAAGAAATCTATG 19660 ACCTATCATTCANNNNNNNN 19852 msR stem variant Var0119 CATAGATTTCTTGGCCTTTTATgcggccgcgggcgggTTGcccgcccgcggccgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAA TATAAGAATTGTTAGCAAGAAATCTATG 19661 ACCTATCATTCANNNNNNNN 19853 msR stem variant Var0120 CATAGATTTCTTGGCCTTTATaatattatattaaatattatTTGataatatttaataatattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTA GCAAGAAATCTATG 19662 ACCTATCATTCANNNNNNNN 19854 msR stem variant Var0121 CATAGATTTCTTGCCTTTATgaacaaactaaatataaatcTTGgatttatatttagtttgttcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACA ACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19663 ACCTATCATTCANNNNNNNN 19855 msR stem variant Var0122 CATAGATTTCTTGGCCTTTATcgatttctaagacttcggtaTTGtaccgaagtcttagaaatcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACC AAATATAAGAATTGTTAGCAAGAAATCTATG 19664 ACCTATCATTCANNNNNNNN 
19856 msR stem variant Var0123 CATAGATTTCTTGGCCTTTATgtcagggaactccaggaggaTTGtcctcctggagttccctgacGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACC AAATATAAGAATTGTTAGCAAGAAATCTATG 19665 ACCTATCATTCANNNNNNNN 19857 msR stem variant Var0124 CATAGATTTCTTGGCCTTTATcgtgggccgcgcttgagccgTTGcggctcaagcgcggcccacgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATG TCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19666 ACCTATCATTCANNNNNNNN 19858 msR stem variant Var0125 CATAGATTTCTTGGCCTTTATcggccgccggcgccggcgcgTTGcgcgccggcgccggcggccgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATG TCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19667 ACCTATCATTCANNNNNNNN 19859 msR stem variant Var0126 CATAGATTTCTTGCCTTTTattaataatatataaatttaattatataTTGtatataattaaatttatatattaatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAAT ATAAGAATTGTTAGCAAGAAATCTATG 19668 ACCTATCATTCANNNNNNNN 19860 msR stem variant Var0127 CATAGATTTCTTGCCTTTTATgttatactaataagaattatctgaaTTGttcagataattcttattagtataacGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACA CAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19669 ACCTATCATTCANNNNNNNN 19861 msR stem variant Var0128 
CATAGATTTCTTGCCTTTTATTttaagatagaggcacttctagtgagTTGctcactagaagtgcctctatcttaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATG TCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19670 ACCTATCATTCANNNNNNNN 19862 msR stem variant Var0129 CATAGATTTCTTGGCCTTTATcaggcagcacgcacacagccgaaatTTGatttcggctgtgtgcgtgctgcctgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATG GGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19671 ACCTATCATTCANNNNNNNN 19863 msR stem variant Var0130 CATAGATTTCTTGGCCTTTTATcaccactggcgccgccagcaggcggTTGccgcctgctggcggcgccagtggtgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGG ACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19672 ACCTATCATTCANNNNNNNN 19864 msR stem variant Var0131 CATAGATTTCTTGGCCTTTATcgcggccgcgccgggcgcgcgcccgTTGcgggcgcgcgcccggcgcggccgcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACG AAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19673 ACCTATCATTCANNNNNNNN 19865 msR stem variant Var0132 CATAGATTTCTTGGCCTTTAataataattattatataataatatattaataaTTGttattaatatattattatataataattattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAAT ATAAGAATTGTTAGCAAGAAATCTATG 19674 ACCTATCATTCANNNNNNNN 19866 msR stem variant Var0133 
CATAGATTTCTTGGCCTTTATaataataatcccttagtctattagtttaTTGtaaactaatagactaagggattatattattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTC ACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19675 ACCTATCATTCANNNNNNNN 19867 msR stem variant Var0134 CATAGATTTCTTGGCCTTTTATtaatttacaggacagggactttactcgatcTTGgatcgagtaaagtccctgtcctgtaaattaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATG GGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19676 ACCTATCATTCANNNNNNNN 19868 msR stem variant Var0135 CATAGATTTCTTGGCCTTTATtatctggcaacggccagagagcgcctgaTTGtcaggcgctctctctggccgttgccagataGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATG GGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19677 ACCTATCATTCANNNNNNNN 19869 msR stem variant Var0136 CATAGATTTCTTGGCCTTTATgccgtcccgcgcggcctgcccaaacgccctTTGagggcgtttgggcaggccgcgcgggacggcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCA CGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19678 ACCTATCATTCANNNNNNNN 19870 msR stem variant Var0137 CATAGATTTCTTGGCCTTTATcgccgccggccgggcgcgggcgccggcggcTTGgccgccgcgcccgcgcccggccggcggcgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCA CGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19679 ACCTATCATTCANNNNNNNN 19871 msR stem variant Var0138 
CATAGATTTCTTGGCCTTTATaatatataaatattaataataataataaatTTGatttattaatattattattaatatttatatatattGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGA TGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19680 ACCTATCATTCANNNNNNNN 19872 msR stem variant Var0139 CATAGATTTCTTGGCCTTTATttaatatctaacaattattgaagttgctttatttcTTGgaaataaagcaacttcaataattgttagatattaaGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAG GCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19681 ACCTATCATTCANNNNNNNN 19873 msR stem variant Var0140 CATAGATTTCTTGGCCTTTATcagttatagcgcgaactagtttgacctgatttgtaTTGtacaaaatcaggtcaaactagttcgcgctataactgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGC CACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19682 ACCTATCATTCANNNNNNNN 19874 msR stem variant Var0141 CATAGATTTCTTGGCCTTTATatagactggccgcccgtaggaacttgaggacgcacTTGgtgcgtcctcaagttcctacgggcggccagtctatGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGC CACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19683 ACCTATCATTCANNNNNNNN 19875 msR stem variant Var0142 CATAGATTTCTTGCCTTTATgcgagcgcgaggacgccacgagcacgcgccctgccTTGggcagggcgcgtgctcgtggcgtcctcgcgctcgcGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCAT CACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19684 ACCTATCATTCANNNNNNNN 19876 msR stem variant Var0143 
CATAGATTTCTTGGCCTTTATcggccggcgcccgggccgggcgggcccgcgggcggTTGccgcccgcgggcccgcccggcccgggcgccggccgGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACC GGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19685 ACCTATCATTCANNNNNNNN 19877 msR stem variant Var0144 CATAGATTTCTTGgcatGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19686 ACCTATCATTCANNNNNNNN 19878 Minimal variant Var0145 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTggtgtcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19687 ACCTATCATTCANNNNNNNN 19879 Minimal variant Var0146 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacactaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19688 ACCTATCATTCANNNNNNNN 19880 Minimal variant Var0147 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19689 ACCTATCATTCANNNNNNNN 19881 Minimal variant Var0148 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaataGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19690 ACCTATCATTCANNNNNNNN 19882 Minimal variant Var0149 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggcatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19691 ACCTATCATTCANNNNNNNN 19883 Minimal variant Var0150 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagattttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19692 ACCTATCATTCANNNNNNNN 19884 Minimal variant Var0151 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggtaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19693 ACCTATCATTCANNNNNNNN 19885 Minimal variant Var0152 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaagaattgttagCAAGAAATCTATG 19694 ACCTATCATTCANNNNNNNN 19886 Minimal variant Var0153 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaataagaaagCAAGAAATCTATG 19695 ACCTATCATTCANNNNNNNN 19887 Minimal variant Var0154 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaatgttagCAAGAAATCTATG 19696 ACCTATCATTCANNNNNNNN 19888 Minimal variant Var0155 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaatatagCAAGAAATCTATG 19697 ACCTATCATTCANNNNNNNN 19889 Minimal variant Var0156 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCaaagCAAGAAATCTATG 19698 ACCTATCATTCANNNNNNNN 19890 Minimal variant Var0157 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacacataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19699 ACCTATCATTCANNNNNNNN 19891 Stop variant Var0158 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacacattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19700 ACCTATCATTCANNNNNNNN 19892 Stop variant Var0159 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacaaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19701 ACCTATCATTCANNNNNNNN 19893 Stop variant Var0160 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19702 ACCTATCATTCANNNNNNNN 19894 Stop variant Var0161 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacaaataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19703 ACCTATCATTCANNNNNNNN 19895 Stop variant Var0162 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacaaattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19704 ACCTATCATTCANNNNNNNN 19896 Stop variant Var0163 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacatataattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19705 ACCTATCATTCANNNNNNNN 19897 Stop variant Var0164 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacatattattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19706 ACCTATCATTCANNNNNNNN 19898 Stop variant Var0165 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19707 ACCTATCATTCANNNNNNNN 19899 Stop variant Var0166 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19708 ACCTATCATTCANNNNNNNN 19900 Stop variant Var0167 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaataaaaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAG AAATCTATG 19709 ACCTATCATTCANNNNNNNN 19901 Stop variant Var0168 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaataaatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19710 ACCTATCATTCANNNNNNNN 19902 Stop variant Var0169 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaataaaatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19711 ACCTATCATTCANNNNNNNN 19903 Stop variant Var0170 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatatatatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19712 ACCTATCATTCANNNNNNNN 19904 Stop variant Var0171 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtaaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19713 ACCTATCATTCANNNNNNNN 19905 Stop variant Var0172 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgttaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19714 ACCTATCATTCANNNNNNNN 19906 Stop variant Var0173 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtaaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19715 ACCTATCATTCANNNNNNNN 19907 Stop variant Var0174 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtaaaatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19716 ACCTATCATTCANNNNNNNN 19908 Stop variant Var0175 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgttaaataaacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19717 ACCTATCATTCANNNNNNNN 19909 Stop variant Var0176 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgttaaatatacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19718 ACCTATCATTCANNNNNNNN 19910 Stop variant Var0177 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttatcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19719 ACCTATCATTCANNNNNNNN 19911 Stop variant Var0178 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttttcaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19720 ACCTATCATTCANNNNNNNN 19912 Stop variant Var0179 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttataaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19721 ACCTATCATTCANNNNNNNN 19913 Stop variant Var0180 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttattaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19722 ACCTATCATTCANNNNNNNN 19914 Stop variant Var0181 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttttaaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19723 ACCTATCATTCANNNNNNNN 19915 Stop variant Var0182 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagattttttaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19724 ACCTATCATTCANNNNNNNN 19916 Stop variant Var0183 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacacatccttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTTCCCCATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19725 ACCTATCATTCANNNNNNNN 19917 Stop variant Var0184 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcaaatacacatcgttaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19726 ACCTATCATTCANNNNNNNN 19918 Stop variant Var0185 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggaggtttgtcaaatacacaccattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAG AAATCTATG 19727 ACCTATCATTCANNNNNNNN 19919 Stop variant Var0186 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagctttgtcaaatacacagcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCAAATATAAGAATTGTTAGCAAG AAATCTATG 19728 ACCTATCATTCANNNNNNNN 19920 Stop variant Var0187 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagattggtcaaatacccatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG 19729 ACCTATCATTCANNNNNNNN 19921 Stop variant Var0188 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagattcgtcaaatacgcatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACATCACCAAATATAAGAATTGTTAGCAAGAAA TCTATG 19730 ACCTATCATTCANNNNNNNN 19922 Stop variant Var0189 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtccaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19731 ACCTATCATTCANNNNNNNN 19923 Stop variant Var0190 CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgtcgaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19732 ACCTATCATTCANNNNNNNN 19924 Stop variant Var0191 
CATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTggagatttgccaaatacacatcattaGGTTGCGACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACCTGCAGGACCTATCATTCANNNNNNNNAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCACACAACCAAATATAAGAATTGTTAGCAAGAAATC TATG 19733 ACCTATCATTCANNNNNNNN 19925 Stop variant

The screening method is depicted in Figure 114. In brief, the ncRNA library was co-transfected into 293T cells together with Cas9 mRNA, reverse transcriptase mRNA, and a guide RNA targeting the EMX1 site. Total nucleic acid, including RNA and ssDNA, was collected 16 hours after transfection, and genomic DNA (gDNA) was harvested 96 hours after transfection. Barcode abundance was measured by in-house high-throughput sequencing of the ncRNA, the ssDNA template, and the genomic EMX1 site. First, the barcode counts in ssDNA and gDNA were normalized to the input ncRNA level of the initial ncRNA library. Then, the ssDNA template production level or gDNA insertion level of each variant was calculated by averaging over the unique barcodes associated with that variant.

Results:
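As a reading aid for the results, the normalization and per-variant averaging described in the screening method, together with a fold-over-wild-type comparison, can be sketched roughly as follows. This is a minimal illustration and not the applicant's analysis code: the function names, the 8-nt toy barcodes, the variant labels, and all counts are invented, and the sketch assumes that raw counts can be divided directly (a real analysis might first convert counts to read fractions to correct for sequencing depth).

```python
from collections import defaultdict
from statistics import mean


def normalize_barcodes(output_counts, input_counts):
    """Divide each barcode's output count (ssDNA or gDNA) by its input ncRNA count."""
    return {
        bc: output_counts.get(bc, 0) / n_in
        for bc, n_in in input_counts.items()
        if n_in > 0  # barcodes absent from the input library cannot be normalized
    }


def variant_levels(normalized, barcode_to_variant):
    """Average the normalized values over all barcodes assigned to each variant."""
    grouped = defaultdict(list)
    for bc, value in normalized.items():
        variant = barcode_to_variant.get(bc)
        if variant is not None:
            grouped[variant].append(value)
    return {variant: mean(values) for variant, values in grouped.items()}


def improved_variants(ssdna_levels, gdna_levels, wt="WT", fold=1.5):
    """Return variants whose ssDNA and gDNA levels both exceed fold x the WT level."""
    return sorted(
        v for v in ssdna_levels
        if v != wt and v in gdna_levels
        and ssdna_levels[v] > fold * ssdna_levels[wt]
        and gdna_levels[v] > fold * gdna_levels[wt]
    )


# Toy data, invented purely for illustration (hypothetical barcodes and counts).
input_counts = {"AAAAAAAA": 100, "CCCCCCCC": 200, "GGGGGGGG": 50, "TTTTTTTT": 100}
ssdna_counts = {"AAAAAAAA": 50, "CCCCCCCC": 100, "GGGGGGGG": 50, "TTTTTTTT": 20}
gdna_counts = {"AAAAAAAA": 10, "CCCCCCCC": 20, "GGGGGGGG": 10, "TTTTTTTT": 30}
barcode_to_variant = {
    "AAAAAAAA": "WT", "CCCCCCCC": "WT", "GGGGGGGG": "VarA", "TTTTTTTT": "VarB",
}

ssdna_levels = variant_levels(normalize_barcodes(ssdna_counts, input_counts), barcode_to_variant)
gdna_levels = variant_levels(normalize_barcodes(gdna_counts, input_counts), barcode_to_variant)
hits = improved_variants(ssdna_levels, gdna_levels)
```

In this toy example only the hypothetical VarA exceeds 1.5 times the wild-type level in both readouts, which corresponds to a point in the upper right quadrant of a scatter plot of gDNA insertion against ssDNA production.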

The results are shown in the scatter plot of Figure 115. The level of genomic insertion correlates with the level of ssDNA template production (Figure), and the negative control ncRNAs exhibit low efficiency in both ssDNA production and gDNA insertion. ncRNA variants that are more active than the WT R6342S ncRNA appear in Figure 115 as the data points in the upper right quadrant. Table 31B below lists the 13 best-performing ncRNA variants, each of which shows more than 1.5-fold higher activity in both gDNA insertion level and ssDNA production level.

Table 31B: Best-performing ncRNA variants

Variant with improved performance    Variant type
Var0015                              a1a2 variant
Var0020                              a1a2 variant
Var0026                              a1a2 variant
Var0031                              a1a2 variant
Var0044                              msD stem variant
Var0060                              msD stem variant
Var0061                              msD stem variant
Var0062                              msD stem variant
Var0065                              msD stem variant
Var0066                              msD stem variant
Var0070                              msD stem variant
Var0071                              msD stem variant
Var0076                              msD stem variant

Example: Materials and Methods

Mammalian cell culture. HEK293T (ATCC CRL-3216) or 293T-Cas9 (Genecopoeia SL502) cells were cultured in Dulbecco's modified Eagle's medium (DMEM) plus GlutaMAX (Thermo Fisher Scientific) supplemented with 10% (v/v) fetal bovine serum (Gibco). Cells were maintained at 37°C and 5% CO2.

Genomic DNA extraction. After incubation, the medium was removed from the cells, and genomic DNA was extracted by adding prepGEM reagent (Thomas Scientific: PUN0050) directly to each well of the tissue culture plate. The lysed cells were transferred to a 96-well PCR plate and incubated at 72°C for 10 minutes, followed by an enzyme inactivation step at 95°C for 2 minutes.

High-throughput DNA sequencing of genomic DNA samples. The human EMX1 gene or AAVS1 locus was amplified from genomic DNA samples and sequenced on an Illumina NextSeq. Briefly, amplification primers containing Illumina forward and reverse adapters were used in the first round of PCR (PCR1) to amplify the EMX1 target site. Each 25 µl PCR1 reaction contained 0.3 µM each of the forward and reverse primers, 1 µl of genomic DNA extract, and 12.5 µl of KAPA HiFi HotStart PCR master mix. The PCR1 program was: 95°C for 1 minute; 25 cycles of [98°C for 20 seconds, 65°C for 15 seconds, and 72°C for 15 seconds]; and a final extension at 72°C for 2 minutes. The PCR products were purified using Ampure XP beads (Beckman Coulter) and eluted in 20 µl of H2O. In the second PCR (PCR2), a unique Illumina barcoded primer pair was added to each sample. Specifically, each 25 µl PCR2 reaction contained 5 µl of IDT for Illumina UDI primers (Illumina), 1 µl of purified PCR1 product, and 12.5 µl of KAPA HiFi HotStart PCR master mix. The PCR2 program was: 95°C for 3 minutes; 10 cycles of [95°C for 30 seconds, 55°C for 30 seconds, and 72°C for 30 seconds]; and a final extension at 72°C for 5 minutes. The PCR2 products were purified with the SequalPrep normalization plate kit (Thermo Fisher Scientific) and pooled. Size and purity were assessed by TapeStation D1000 analysis (Agilent). DNA concentration was measured by fluorometric quantification (Qubit, Thermo Fisher Scientific), and the libraries were sequenced on an Illumina NextSeq 2000 instrument with a 30% PhiX sequencing control using either the P1 or P2 300-cycle kit. Sequencing reads were demultiplexed, and amplicon sequences were aligned to a reference sequence using CRISPResso2. CRISPResso2 was run in HDR mode with the desired sequence as the expected allele, and the precise editing yield was calculated as the number of HDR-aligned reads divided by the total number of aligned reads.
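The yield calculation at the end of the sequencing workflow is a simple ratio, and a minimal sketch is given here for concreteness. The function name and the example read counts are hypothetical; the sketch only assumes that the number of HDR-aligned reads and the total number of aligned reads have already been taken from the alignment output.

```python
def precise_edit_yield(hdr_aligned_reads, total_aligned_reads):
    """Precise editing yield = HDR-aligned reads / total aligned reads."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    if not 0 <= hdr_aligned_reads <= total_aligned_reads:
        raise ValueError("hdr_aligned_reads must lie between 0 and total_aligned_reads")
    return hdr_aligned_reads / total_aligned_reads


# Hypothetical read counts for one sample (not data from this disclosure).
example_yield = precise_edit_yield(hdr_aligned_reads=150, total_aligned_reads=10_000)
example_percent = 100 * example_yield  # the same yield expressed as a percentage
```

Guarding against a zero denominator matters in practice because a sample with failed amplification can produce no aligned reads at all.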

Junction qPCR for GFP integration. qPCR was used to determine the presence of GFP at the EMX1 locus after editing. Primers were designed to flank the 5' and 3' junctions of the GFP insertion at the EMX1 locus. Briefly, each 25 µl qPCR reaction contained 12.5 µl of KAPA HiFi HotStart ReadyMix (Roche), 0.3 µM each of the forward and reverse primers, 2 µl of gDNA, 1X SYBR Green (Thermo Fisher Scientific: S7563), and 1X Rox reference dye (Thermo Fisher Scientific: 54881). The qPCR program was: 50°C for 2 minutes; 95°C for 3 minutes; 40 cycles of [98°C for 30 seconds, 67°C for 30 seconds, and 72°C for 30 seconds]; and a 72°C extension for 2 minutes, followed by melting curve analysis at 95°C for 15 seconds, 60°C for 1 minute, and 95°C for 1 second. For relative quantification of the GFP insertion at the EMX1 locus, ΔCt was obtained by subtracting the Ct value of the PCR spanning the junction from the Ct value of the PCR spanning the cut site. In addition, the qPCR samples were run on a TapeStation 4200 (Agilent) with High Sensitivity D1000 reagents (Agilent) to confirm the expected band sizes.
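The ΔCt calculation can be illustrated with a short sketch. Two points are assumptions rather than statements from the protocol: the sketch reads the subtraction as Ct(PCR around the cut site) minus Ct(PCR around the junction), and it converts ΔCt into a relative abundance by assuming an amplification efficiency of 2 per cycle, which the text does not state. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_cut_site, ct_junction):
    """ΔCt = Ct of the cut-site PCR minus Ct of the junction PCR (assumed reading)."""
    return ct_cut_site - ct_junction


def relative_junction_abundance(dct, efficiency=2.0):
    """Junction signal relative to the cut-site signal, assuming a fixed
    per-cycle amplification efficiency (2.0 means perfect doubling)."""
    return efficiency ** dct


# Hypothetical Ct values: the junction amplicon appears 5 cycles later.
dct = delta_ct(ct_cut_site=22.0, ct_junction=27.0)
rel = relative_junction_abundance(dct)
```

With these invented values the junction product is about 3% as abundant as the cut-site product; a less negative ΔCt would indicate more GFP insertion relative to the locus.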

Plasmid-based gene editing. Retron-encoding plasmids were synthesized by Twist Bioscience. Plasmids containing Cas12a and TnpB were synthesized by Azenta Life Sciences. Cas9-mCherry and Cas9-mCherry EMX1 sgRNA plasmids were synthesized by Vector Builder Inc. Deletion variants of RTX3_6342S were synthesized by Quintara Biosciences. For 293T-Cas9 transfection, 25,000 293T-Cas9 cells (Genecopoeia: SL502) were transfected using Lipofectamine 3000 (Thermofisher: L3000001) according to the manufacturer's protocol. Transfected cells were incubated at 37°C for 72 h and then lysed in prepGEM reagent (Thomas Scientific: PUN0050) according to the manufacturer's protocol. For K562 cells, the Neon or Lonza electroporation system was used. 100,000 K562 cells were electroporated in OPTI-MEM (Neon) or SF buffer (Lonza) using the following settings: 1450 V with three 10 ms pulses (Neon) or EH120 (Lonza), with 500 ng total plasmid DNA (250 ng retron RT + ncRNA/EMX1 sgRNA fusion, and 250 ng Cas9-mCherry, 250 ng Cas9-mCherry EMX1 sgRNA, 250 ng Cas12a EMX1, or 250 ng TnpB EMX1). Cells were incubated at 37°C for 72 h. A 1 µL aliquot of the crude lysate was used as template to amplify the EMX1 target site by PCR. Illumina adapters and UDIs were added in a second round of PCR, and the products were then loaded onto an Illumina NextSeq. The resulting FASTQ files were used as input to CRISPResso2, which quantified precise edits and indels at the EMX1 locus.
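As an illustration of how FASTQ files might be passed to CRISPResso2 for HDR quantification, the sketch below only assembles a command line and does not run it. The option names used are documented CRISPResso2 options. The guide is the EMX1 gRNA listed in this document (SEQ ID NO: 19109). The file name, sample name, and both amplicon sequences are placeholders, because the amplicon sequences are not given in this section.

```python
from typing import List

# EMX1 gRNA listed in this document (SEQ ID NO: 19109).
EMX1_GUIDE = "GAGTCCGAGCAGAAGAAGAA"


def build_crispresso_hdr_command(
    fastq_r1: str,
    amplicon_seq: str,
    expected_hdr_amplicon_seq: str,
    guide_seq: str = EMX1_GUIDE,
    name: str = "EMX1_sample",  # hypothetical sample name
) -> List[str]:
    """Assemble (but do not execute) a CRISPResso2 HDR-mode command."""
    return [
        "CRISPResso",
        "--fastq_r1", fastq_r1,
        "--amplicon_seq", amplicon_seq,
        "--expected_hdr_amplicon_seq", expected_hdr_amplicon_seq,
        "--guide_seq", guide_seq,
        "--name", name,
    ]


# Placeholder inputs: substitute the real FASTQ path and amplicon sequences.
cmd = build_crispresso_hdr_command(
    "sample_R1.fastq.gz",
    "<unedited EMX1 amplicon sequence>",
    "<expected edited amplicon sequence>",
)
print(" ".join(cmd))
```

Returning the command as a list of arguments keeps it suitable for `subprocess.run` without shell quoting issues, should the command be executed in a real pipeline.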

In vitro transcription. Plasmid templates were synthesized by Twist Bioscience. A T7 promoter was inserted upstream of the RT or ncRNA sequence. Plasmids were linearized with SbfI-HF (NEB, R3642S) or EcoRV (NEB, R0195L) at 37°C for 16 h, and complete linearization was checked by agarose gel electrophoresis. Linearized templates were purified with the GeneJET PCR Purification Kit (K0701, Thermo Fisher Scientific) and quantified by Nanodrop. mRNA was synthesized at 37°C for 16 h using the HiScribe T7 mRNA Kit with CleanCap Reagent AG (NEB, E2080S), and ncRNA was synthesized using the HiScribe T7 High Yield RNA Synthesis Kit (NEB, 2040S), each according to the manufacturer's protocol. After DNase I treatment, RNA was purified using the Qiagen RNeasy midi kit (Qiagen, 75144). RNA concentration was measured by Nanodrop (Thermo Fisher), and purity and size were assessed by RNA ScreenTape analysis on a TapeStation (Agilent). RT mRNA was further poly(A)-tailed with E. coli poly(A) polymerase (NEB, M0276) and purified with the Qiagen RNeasy midi kit. Tail length was estimated by RNA ScreenTape analysis, by comparing transcript size before and after polyadenylation.
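The tail-length estimate described above amounts to a difference between two measured transcript sizes. A minimal sketch follows, assuming both sizes are reported in nucleotides; the function name and the example sizes are hypothetical.

```python
def estimate_poly_a_tail_length(size_before_nt: float, size_after_nt: float) -> float:
    """Estimate poly(A) tail length as the size gained after tailing.

    size_before_nt: apparent transcript size before polyadenylation.
    size_after_nt:  apparent transcript size after polyadenylation.
    Both values are assumed to be in nucleotides.
    """
    if size_before_nt <= 0 or size_after_nt <= 0:
        raise ValueError("sizes must be positive")
    if size_after_nt < size_before_nt:
        raise ValueError("tailed transcript should not be smaller than the untailed one")
    return size_after_nt - size_before_nt


# Illustrative sizes only (not data from the disclosure).
print(estimate_poly_a_tail_length(1200, 1320))
```

Because electrophoretic sizing has limited resolution, a value obtained this way is an estimate rather than an exact tail length.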

RNA transfection by Neon electroporation system. HEK293T (ATCC CRL-3216) or 293T-Cas9 (Genecopoeia SL502) cells were harvested and resuspended in R buffer. 50,000 cells containing 1 µl of RNA mixture were electroporated using a 10 µl Neon tip at 1150 volts with 20 pulses. The electroporated cells were gently transferred to a 96-well plate (VWR) containing pre-warmed medium and cultured for 72 hours before genomic DNA extraction.

RNA transfection by Lipofectamine MessengerMAX. 20,000 HEK293T (ATCC CRL-3216) cells were seeded per well in 96-well plates, or 120,000 cells per well in 24-well plates. 16-24 hours after seeding, cells were transfected with 0.3 µl Lipofectamine MessengerMAX (Thermo Fisher Scientific), or 0.1 µl Lipofectamine MessengerMAX per 100 ng RNA, and the appropriate amount of RNA mixture. Cells were cultured for 72 hours before genomic DNA extraction.
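The reagent ratio given above (0.1 µl Lipofectamine MessengerMAX per 100 ng RNA) can be applied to any RNA amount as sketched below. The function and constant names are assumptions made for illustration.

```python
# Ratio stated in the text: 0.1 µl reagent per 100 ng RNA.
REAGENT_UL_PER_100_NG = 0.1


def messengermax_volume_ul(rna_ng: float) -> float:
    """Return the reagent volume (µl) for a given RNA mass (ng)."""
    if rna_ng < 0:
        raise ValueError("rna_ng must be non-negative")
    return REAGENT_UL_PER_100_NG * (rna_ng / 100.0)


# 300 ng RNA corresponds to the 0.3 µl figure quoted in the text.
print(round(messengermax_volume_ul(300), 3))
```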

Circularization of ncRNA. Templates were designed using the strategy and sequences shown in (27). In vitro transcription was performed as described above, except that 7.5 mM NTPs were used instead of a 10 mM concentration. After DNase I treatment, RNA was purified using the Qiagen RNeasy midi kit (Qiagen, 75144). RNA was treated with RNase R (Biosearch, RNR0725) to enrich for circular RNA and purified using the Qiagen RNeasy midi kit. RNA concentration was measured by Nanodrop (Thermo Fisher), and purity and size were assessed by RNA ScreenTape analysis on a TapeStation (Agilent).

Example: References
A, M. (2020). Bacterial retrons function in anti-phage defense. 183.
AJ, S. (2019). Retrons and their applications in genome engineering. 47(21).
AV, A. (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 149-157.
Bonafont J, M. A. (2019). Clinically relevant correction of Recessive Dystrophic Epidermolysis Bullosa by dual sgRNA CRISPR/Cas9-mediated gene editing. 27(5).
D, L. (2000). Synthetic DNA delivery systems. Nature Biotechnology, 33-37.
Ellis GI, S. N. (2021). Genetic engineering of T cells for immunotherapy. 22.
F J Triana-Alonso, M. D. (1995). Self-coded 3' extension of run-off transcripts produces aberrant products during in vitro transcription with T7 polymerase. 270(11).
Grünewald, J. M. (2023). Engineered CRISPR prime editors with compact, untethered reverse transcriptases. 41.
Hou Z, Z. T. (2021). Lipid nanoparticles for mRNA delivery. 6.
J, G.-M. (2021). Val50Met hereditary transthyretin amyloidosis: not just a medical problem but a psychosocial burden. 16.
Jumper, J. E. (2021). Highly accurate protein structure prediction with AlphaFold. 596.
Karvelis, T. D. (2021). Transposase-associated TnpB is a programmable RNA-guided DNA endonuclease. 599.
L Statello, C.-J. G.-L. (2021). Gene regulation by long non-coding RNAs and its biological functions. 22.
Lampson BC, I. M. (2005). Retrons, msDNA, and the bacterial genome. 110.
Liu L, C. J. (2019). In vivo Exon Replacement in the Mouse Atp7b Gene by the Cas9 system. 30(9).
M Vera, E. T. (2019). Imaging Single mRNA molecules in mammalian cells using an optimized MS2-MCP system. 2038.
Maizels N, D. L. (2018). Initiation of homologous recombination at DNA nicks. 46(14).
Meijboom KE, A. A. (2022). CRISPR/Cas9-mediated excision of ALS/FTD-causing hexanucleotide repeat expansion in C9ORF72 rescues major disease mechanisms in vivo and in vitro. 13(6286).
MF, D. (2021). Are we creating a new phenotype? Neurological Research and Practice.
MR, M. (2020). Systematic prediction of genes functionally associated with bacterial retrons and classification of the encoded tripartite system. 48(22).
Nace KD, M. J. (2021). Modifications in an Emergency: The role of N1-Methylpseudouridine in COVID-19 Vaccines. 7(5).
Nahmad AD, R. E. (2022). Frequent aneuploidy in primary human T cells after CRISPR-Cas9 cleavage. 10.
Nakajima K, Z. Y. (2018). Precise and efficient nucleotide substitution near genomic nick via noncanonical homology-directed repair. 28(2).
Nelson CE, H. C. (2016). In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy. 351(6271).
Oscorbin IP, W. P. (2020). The attachment of a DNA-binding Sso7D-like protein improves processivity and resistance to inhibitors of M-MuLV reverse transcriptase. 594(24).
Oscorbin IP, F. M. (2021). M-MuLV reverse transcriptase: Selected properties and improved mutants. 19.
Palka C, F. C.-K. (2022). Retron reverse transcriptase termination and phage defense are dependent on host RNase H1. 50(6).
Ran FA, H. P. (2013). Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. 154(6).
Richardson, C. D. (2016). Enhancing homology-directed genome editing by active and inactive CRISPR/CAS9 using asymmetric donor DNA. 34.
S, I. (1989). Reverse transcriptase associated with the biosynthesis of the branched RNA-linked msDNA in Myxococcus xanthus. 56.
SC, L. (2021). Precise genome editing across kingdoms of life using retron-derived DNA. 18(2).
Schiroli, G. (2019). Precise Gene Editing Preserves Hematopoietic Stem Cell Function following Transient p53-Mediated DNA Damage Response. 24(4).
Shimamoto T, H. M. (1993). Reverse transcriptase from bacterial retrons require specific secondary structure at the 5'-end of the template for the cDNA priming reaction. 268(4).
Spencer JM, Z. X. (2017). Deep mutational scanning of S. pyogenes Cas9 reveals important functional domains. 7.
T, F. (1987). Branched RNA covalently linked to the 5' end of a single-stranded DNA in Stigmatella aurantiaca. 48(1).
W, S. (2022). rAAV immunogenicity, toxicity, and durability in 255 clinical trials: A meta-analysis. Frontiers in Immunology.
Wesselhoeft RA, K. P. (2018). Engineering circular RNA for potent and stable translation in eukaryotic cells. 9.
Yarnall MTN, I. E.-U. (2023). Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. 41(4).
Zetsche B, G. J. (2015). Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. 163(3).
Example: Sequences

Selected RT and ncRNA sequences - overview
Description | SEQ ID NO
EMX1 gRNA sequence (nt) | 19109
Eco1 ncRNA (nt) | 19110
Aco1 ncRNA (nt) | 19111
RTX_1262 RT (aa) | 1262
RTX_1262 ncRNA (nt) | 15327
RTX3_2042 RT (aa) | 2042 or 19112
RTX_2042 ncRNA (nt) | 19113
RTX_2781 RT (aa) | 2781
RTX_2781 ncRNA (nt) | 16411
RTX_6083v1 RT (aa) | 6083 or 19114
RTX3_6083v1 ncRNA (nt) | 18547
RTX_6342 RT (aa) | 6342
RTX_6342 ncRNA (nt) | 18731
RTX_6342L ncRNA (nt) | 19927
RTX_6342S ncRNA (nt) | 19928
Engineered 6342S ncRNA + HDR template | 19929
Engineered 6342L ncRNA + HDR template | 19930
RTX_6943 RT (aa) | 6943 or 19116
RTX_6943 ncRNA (nt) | 19053
Engineered RTX_6943 ncRNA | 19117
Eco1 RT (nt) | 19118
Eco1 ncRNA (nt) | 19119
Eco3 RT (nt) | 19120
Eco3 ncRNA (nt) | 19121
Eco5 RT (nt) | 19122
Eco5 ncRNA (nt) | 19123
EMX1 sgRNA modified (nt) | 19124
EMX1 sgRNA unmodified (nt) | 19125

Selected RT and ncRNA sequences
EMX1 S.p. Cas9 gRNA GAGTCCGAGCAGAAGAAGAA (SEQ ID NO: 19109)
Eco1 ncRNA + HDR template TGATAAGATTCCGTATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTCGGCATCCTGCATTGAATCTGAGTTACTGTCTGTTTTCCT ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCATACGGAATCTTATCA (SEQ ID NO: 19110)
Aco1 ncRNA + HDR template CCGTAGTGGGAGCCTCAGGCGAGGGTGTGTATCATGCCCGTTCTGCCAAGACCCACCAAAGAAGGGCACCGTGGAGG ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAACAGAAGTGCAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CCTCCACGGTGCAATGCGAAAGCAACTTGAGGCTTTGCTTAGTATGAGGCTCCCACTACGG (SEQ ID NO: 19111)
Retron 1262
RTMISFSEIKSRNDFADALQIPRSVLTHVLYIAKPESFYESFTIPKKNGEDRIIMAPKGTLKSIQTKLSKQLVEYRASISQKGQEKSNISHGFEREKSIITNAQIHRNKRYVINYDLKDFFDSFHFGRVVGFFEKNKHFLLPYEVAVIIAQLTCYNGRLPQGAPTSPVITNLICEILDYRVLKIAKRYKLDYTRYADDLTFSTNYSRFLEVFDSFAKELLQEISNSGFTINQSKTRLLYRDSRQEVTGLVVNKKIGVNREYVKSTRAMAQALYSTGEFTINGIPGTIKQLEGRFGFIDQLDHYNNVIDDAKHDAYSLNGREKQFQEFLFFKTFFFNEYPLVITEGKTDIRYLKAALKSLHQKYPELICKEDDGTFRFKISFFRRSKRWKYFFGISKDGADAMKLLYRFFTGQKGVKNYYRLFAEKYKAVQRNPVIMLFDNEMESKRPLNKFISEEVKIPSSEQQLFKEQLYYHLIPGSKTYLMTHPLPPGKTEAEIEDLFPTEVLGVKLDGKSFSTKDKFDTSKFYGKDIFSSYVYEHWKSIDFSGFIPLLDKINMLVQNEKKPGLNT (SEQ ID NO: 1262) ncRNAGAGCGTGTGACAAGATTTTGGGCTTGTGTTTCGCAAGCTTTGATTAAAATGGTAGATGGATTTGCTATCTATCGTCATTTCCAGTACTGTTATGTTTATGTTCTGTTTACCCGTATGCACCATCCGCATAATGAGTTGTCACACGCTC (SEQ ID NO: 15327) 逆轉錄子 2042 RTMKDDQYSQWKKYYESRGILPEIQDKLLNYAKIHIDNNTPVIFNFEHLTLLLGREKNYLSSVVNSPDSHYRKFKIKKRSGGEREITAPYLSLLEMQYWIYRNILINVKIHYAAHGFAQDKSIITNSRNHLGQKHLLKMDLKDFFPSIKLNRIIYIFKSLGYPNIIAFYLASICSYKGHLPQGSPTSPILSNIVSITLDNRLVKFARKMKLRYSRYADDLTFSGDKIPTNYIKYITDIINDEGFEVNDTKTKLYLKAGKRIVTGISVIGNDPKLPREYKRKLKQELHYIFTYGIGSHMAKKKIKKINYLYRIIGKVNFWLNIEPDNEYARNAKAKLLLLIDN* (SEQ ID NO: 19112或SEQ ID NO: 2042) 同源ncRNAAGGTGGTTATATTCTAGTATTTATGAAGTGTAGTCGCTTCGATCGTTAAGGCTGATTTTAACCTCTGCATAATAATATCGGTAGATATTATTATGCACGCTCCCTTTAGCAGAGCTAAGAATCGCTCACTCAGGCACAAGCTTTGAGGAGCGACCTCCTCAAAGCTTGTGCCTGAGTGAGAGCTAAAGAAAAGAAAAGTAGAATAAGCCACCT (SEQ ID NO: 15833) 經工程改造之2042 ncRNA:RTX3_2042 ncRNA + HDR_ 模板GTTAAGGTGGTTATATTCTAGTATTTATGAAGTGTAGTCGCTTCGATCGTTAAGGCTGATTTTAACCTCTGCATAATAATATCGGTAGATATTATTATGCACGCTCCCTTTAGCAGAGCTAAGAATCGCTCACTCAGGCACAAGCTTTGAGG ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAATCCTAAGAGAAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CCTCAAAGCTTGTGCCTGAGTGAGAGCTAAAGAAAAGAAAAGTAGAATAAGCCACCTTAAC (SEQ ID NO: 19113) 逆轉錄子 2781 
RTMSGRLWKYLTPQQEREFILSLNLLPASCRKRMSGEAQLALLYAISNHVERHYRKVRIPKKDGKSRRLLVPDGLLKGVQRNILRHCLDGRSVSEYACAYRRGLSAADNAAPHAGGGKDKLLLKLDIRDFFDSILFWQVYGAAFPGSLFPPAAAGLLTHLCCCGDRLPQGAPTSPAISNLVMKPFDEFMGAWCRQRQIVYTRYCDDLTFSGAFDPKAVYHKARHLLEAMGFALNAEKTALRNRGMRQSVTGLVVNERVRTGVPYRRRIRQEMYYCRKYGVGEHLQAVGRLTGTETQEEAAEAERAFLCALLGKIGYVLLLEPENREFLEYREICRGMQAEQEKASCPPACGEREQKRRGGKEK (SEQ ID NO: 2781) 同源ncRNA AAGAGCAACTAGATTGAGGCGATTCGCCTCCTTGGAAAAGGGTACTAAGTTTCTGTCGCACACCAATTTATAAGCTTATAAATTGGTGTGCGACAGAAATGAAATAAATAGTAGTTGCTCTT (SEQ ID NO: 16411) 逆轉錄子 6083 RTMSNPQPTRAEIFERIKQSSKQEVILEEMQRLGFWPRSEGQPEVAADLIQREGELQRELAELNKKLAVKRNPERALREMRKQRMKDARDKREVTKRAQAQQRYDKALLWHEKRASHVAYLGPGVSASLHENSSATQEQGDKGKPKRARDRAVPDLQRLTLNGLPALISAAQLAESMGVSVAELRFLSFHREVARTNHYHSFTLPKKTGGERLISAPMPRLKRAQYWVLDNVLAKMPAHDAAHGFLAGRSIISNAKPHAGQDVVINLDVKDFFPSIAFGRIKGVFRQLGYGESIATVFALLCSENRAQAWQVDGERLFVGGKARERVLPQGAPTSPMLTNLLCRRMDRRLLGLAKQLGFVYTRYADDLTFSASGEPARDNVGKLLSRVRWILRDEGFTPHPDKERVMRKGRRQEVTGLVVNSDTPSVSRETRRRLRAALHRASQPDAASKPAHWQGHTAQPSQLLGLATFVHQIDPKQGKTLLADAQQLMRSPIDRANDAAKSASRADAAQQSFRVLAAAGKPPVLADGKNWWQPAPPATPVLEKTDQQRREERQATRRQQAAAAAPPPSSTRRNERPQQAAHEQQGDAQPQNEAPPRFDPDQYAPPPRNVMTYWAQIAISFFLGSILHNRLITIFAMVAVIALYYMRRQRWDVFMGILVVATLLGYLVRGMG* (SEQ ID NO: 19114或SEQ ID NO: 6083) 同源ncRNACCGGAGCAATGAGCAGGCTCTTGCAATCCGGGCGGTGTTTCGCCGCCCTTGTGAACTGCCGTTTCATGCACCACGGGCGCCGTTTTCACTGTGCCACCCCAGCCACGGTAGTGCTTTTCCTGAGTGGCTTGCCACAAGCGAAGCAGCACTACCGTGGCGGGGTGGCGTCGAGCGAACAGCTCCCGTCCCGTGAGCCCTACAGGCTCTTCGACGAGATGCACATTGCTCCG (SEQ ID NO:18547) 經工程改造之6083 ncRNA:RTX3_6083v1 ncRNA + HDR 模板GCTCCGGAGCAATGAGCAGGCTCTTGCAATCCGGGCGGTGTTTCGCCGCCCTTGTGAACTGCCGTTTCATGCACCACGGGCGCCGTTTTCACTGTGCCACCCCAGCCACGGTAGTGCT ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACTACAAGGTTAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AGCACTACCGTGGCGGGGTGGCGTCGAGCGAACAGCTCCCGTCCCGTGAGCCCTACAGGCTCTTCGACGAGATGCACATTGCTCCGGAGC (SEQ ID NO: 19115) 逆轉錄子 6342 
RTMMENQENNMNNRKNYREAVNTMGKKEFTLIKMQEYGFWPKNLPTPYERQESETEEQYKERKCLLEKYEKVIDEISKLYEEKDKINLKLRELQKKYDETWDYERIRLDVSKTIMQESIERRAERKRQRELEKEQRSEEWKKEKENKIVFIGKGYSSLLYDKETDENKLLLQELPIIKDDKELANFLGIEYKKLRFLVYHRDVISVDNYHRYTIPKKKGGVRNIAAPKSILKNSQRIILEEILSKIPTSNYSHGFLKGKSVVSAAKVHVKKPDLLINIDLEDFFPTITFERVRGMFKSFGYSGYVASMLAMICTYCERMKVEVRGEEKYVKISDRILPQGSPASPMITNIICVKLDKRLNGLSTKYDFIYTRYADDMSFSFTGDINELSVGSFMGLVSKIVKEEGFNINKDKTKFLRKNNRQCITGIVINNEEIGVPKKWIKILRAAIYNANKVKNSGEILSNKVINEISGMTSWVKSVNEERYKDIINDAMNLINN (SEQ ID NO: 6342) 同源ncRNAACATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGCGCAATTCGCTACGCTACCAATACTGTGGGCATTAAAAATCGAGCCTTCGGTTATGCCACAGTATTGGTGCTACGCTGAAGTGTCACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG (SEQ ID NO: 18731) RTX_6342S ncRNA CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCG_置放_插入物_此處_CACA ACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19927) RTX_6342L ncRNA CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGCGCAAUUCGCUACGCUACCAAUACUGUGGGCAU_置放_插入物_此處_AUGCCACAGUAUUGGU GCUACGCUGAAGUGUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19928) RTX_6342S ncRNA + HDR模板插入物 CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCG ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAATCGTCAGTGCAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19929) RTX_6342L ncRNA + HDR模板插入物 CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGCGCAAUUCGCUACGCUACCAAUACUGUGGGCAU ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACGTAATCCGGAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AUGCCACAGUAUUGGUGCUACGCUGAAGUGUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19930) 逆轉錄子 6943 
RTMEESTNYKLLVWGLSVIQPATPNEVLNYLTSTLNDNGLLPDVEKMIHYFELLDQLGYIHQVSKRNNLYSLTPRGNERLTPALKRLRDKIRLFMLDNCHSISKLGVLASTDTENMGGDSPSLQLRHNLKEVPHPSLSWAAGTLPSSPRQAWVRIYEQLNIGSMSSDEASTPTTARNAPLSFVGRLGFSLNYYSFNKIDEPLFNNDGVTAIASCIGISPGLITAMVKSPKRYYRTFNLRKKSGGFRSILAPRKFIKTIQYWLKDHVLNRLKIHSSCYSYRSGVSIKDNAINHVKKKFVASIDISDYFGSINKKMVKDCFYKNNIPDHIVNTISGIVTYNDVLPQGAPTSPIISNAILFEFDEEMTAHALTLDCIYTRYSDDISISSDYKENIAILINIAEANLLSAGFTLNRQKQRIASDNSRQVVTGILVNESIRPTRCYRKKIRSAFDHALKEQDGSQLTINKLRGYLNYLKSFETYGFKFNEKKYKETLDFLIALKQS* (SEQ ID NO: 19116或SEQ ID NO: 6943)) ncRNAACTGATACAGAGAATATGGGCGGTGATTCGCCGTCTTTACAGTTAAGGCACAATTTAAAAGAGGTTCCGCACCCAAGCCTGTCTTGGGCTGCAGGGACCCTGCCCAGCAGCCCAAGACAGGCTTGGGTGCGGATCTACGAGCAATTAAATATTGGTTCGATGTCCAGT (SEQ ID NO: 19053) 經工程改造之ncRNA:RTX3_6943 ncRNA + HDR 模板GCGGAGTGCTGGCCTCAACTGATACAGAGAATATGGGCGGTGATTCGCCGTCTTTACAGTTAAGGCACAATTTAAAAGAGGTTCCGCACCCAAGCCTGTCTTGGGCTGC ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACGTATGGCCTAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA GCAGCCCAAGACAGGCTTGGGTGCGGATCTACGAGCAATTAAATATTGGTTCGATGTCCAGTGATGAGGCCAGCACCCGC (SEQ ID NO: 19117) Eco1 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代) 
AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCGAAACAUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUUGAACAUCGACUUGGAGGAUUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUCUUGCCCCAGGGAGCACCCAGCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUAUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCAUCCCGUCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCUCAGGACCACGAUCUCAGCGAAAAGUGACAGGACUCGUCAUUUCUCAAGAAAAAGUCGGUAUAGGGAGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19118) Eco1 ncRNA ncRNA-sgRNA融合(EMX1基因中之6 bp取代) GAAAUGAUAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCCGUUUCUUCUGACGUAAGGGUGCGCAUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19119) Eco3 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 
AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAGUCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUCUUAUAGCCAAUUUCGUCGCCAGGGAGCUGGACGAGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGCCUUAAGGUCUGUCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19120) Eco3 ncRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19121) Eco5 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 
AUGGACGCUACCAGAACGACUCUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUGAAGAACGUUUUGAGUCAAUUUCCACUGUCACCAUUUGCUACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGUGCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAUGAGAAGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCCAGGCGUGCCCUGCGUCAGGAAGUGCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCAUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19122) Eco5 ncRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19123) EMX1 sgRNA經修飾 含有來自Synthego之化學修飾:2'-F、2'-O-甲基、硫代磷酸酯 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 19124) EMX1 sgRNA未經修飾 無修飾 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 19125) RNA 序列 - 概述 序列名稱 描述 SEQ ID NO: Eco1 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 19126 Eco1 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之6 bp取代) 
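19127
Eco3 RT | Coding sequence only. RNA was produced by a CRO, with a proprietary 5' cap, 5' UTR, 3' UTR, and an encoded poly(A) of about 120 bases added. Fully substituted with N1-methylpseudouridine | 19128
Eco3 ncRNA-sgRNA | ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) | 19129
Eco5 RT | Coding sequence only. RNA was produced by a CRO, with a proprietary 5' cap, 5' UTR, 3' UTR, and an encoded poly(A) of about 120 bases added. Fully substituted with N1-methylpseudouridine | 19130
Eco5 ncRNA-sgRNA | ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) | 19131
Cas9 | All U's are N-1-methylpseudouridine. | 19132
EMX1 sgRNA modified | Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate | 19133
EMX1 sgRNA unmodified | No modifications | 19134
Eco3 ncRNA | Eco3 ncRNA only, with 10 bp insertion | 19135
Eco3 ncRNA-MS2 | Eco3 ncRNA only, with 10 bp insertion and 3' MS2 stem-loop | 19136
Aco1 RT | Aco1 RT RNA sequence, including 5' and 3' UTRs | 19137
Aco1 ncRNA-sgRNA_50 | Aco1 ncRNA-sgRNA with 50 bp insertion | 19138
Aco1 sgRNA-ncRNA_50 | Aco1 sgRNA-ncRNA with 50 bp insertion | 19139
Eco3_EMX1 gRNA_ncRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19140
Eco3_ncRNA only | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene | 19141
Eco3_ncRNA_AAVS1 gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19142
Eco3_ncRNA_EMX1 gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19143
Eco3_ncRNA_EMX1 gRNA_50 | Eco3 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19144
Eco3_ncRNA_EMX1 gRNA_75 | Eco3 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19145
Eco3_ncRNA_EMX1 gRNA_100 | Eco3 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19146
Eco3_ncRNA_EMX1 gRNA_GFP gene | Eco3 ncRNA containing a GFP gene insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in the antisense orientation and contains a mini EF1a promoter and a β-globin poly(A) signal. | 19147
Eco3_NT_ncRNA_EMX gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene (on the strand opposite the sgRNA cut), fused to the EMX1 sgRNA at the 3' end | 19148
Aco1_EMX1 gRNA_ncRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19149
Aco1_ncRNA only | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene | 19150
Aco1_ncRNA_AAVS1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19151
Aco1_ncRNA_EMX1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19152
Aco1_ncRNA_EMX1 gRNA_50 | Aco1 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19153
Aco1_ncRNA_EMX1 gRNA_75 | Aco1 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19154
Aco1_ncRNA_EMX1 gRNA_100 | Aco1 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19155
Aco1_ncRNA_EMX1 gRNA_GFP gene | Aco1 ncRNA containing a GFP gene insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in the antisense orientation and contains a mini EF1a promoter and a β-globin poly(A) signal. | 19156
Aco1_NT_ncRNA_EMX1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene (on the strand opposite the sgRNA cut), fused to the EMX1 sgRNA at the 3' end | 19157
R2042_EMX1 gRNA_ncRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19158
R2042_ncRNA only_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene | 19159
R2042_ncRNA_AAVS1 gRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the AAVS1 sgRNA at the 3' end | 19160
R2042_ncRNA_EMX1 gRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19161
R2042_ncRNA_EMX1 gRNA_50 | R2042 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19162
R2042_ncRNA_EMX1 gRNA_75 | R2042 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19163
R2042_ncRNA_EMX1 gRNA_100 | R2042 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19164
R2042_ncRNA_NT_EMX1 gRNA_25_dual guide | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene (on the strand opposite the sgRNA cut), fused to the EMX1 sgRNA at the 3' end | 19165
R2042_RT | R2042 RT mRNA sequence, coding sequence only | 19166
R6943 RT | R6943 RT mRNA sequence, coding sequence only | 19167
R6943_ncRNA_EMX1gRNA_GFP gene | R6943 ncRNA containing a GFP gene insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in the antisense orientation and contains a mini EF1a promoter and a β-globin poly(A) signal. | 19168
R6943_ncRNA_EMX1gRNA_100bp | R6943 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19169
R6943_ncRNA_EMX1gRNA_75bp | R6943 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19170
R6943_ncRNA_EMX1gRNA_50bp | R6943 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19171
R6943_ncRNA_EMX1gRNA_25bp | R6943 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19172
R6943_ncRNA_AAVS1 gRNA_25bp | R6943 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19173
R6943_ncRNA only_25bp | R6943 ncRNA containing a 25 bp insertion in the EMX1 gene | 19174
AAVS1 sgRNA | Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate | 19175
R1262_RT-1 (R1262 RT) | R1262 retron RT mRNA | 19176
R1262_nc-5 (R1262_ncRNA-EMX1_505) | R1262 ncRNA containing a 505 bp insertion in the EMX1 site | 19177
R1262_nc-3 (R1262_ncRNA-EMX1_305) | R1262 ncRNA containing a 305 bp insertion in the EMX1 site | 19178
R1262_nc-1 (R1262_ncRNA-EMX1_25) | R1262 ncRNA containing a 25 bp insertion in the EMX1 site | 19179
R1262_nc-4 (R1262_ncRNA-EMX1_405) | R1262 ncRNA containing a 405 bp insertion in the EMX1 site | 19180
R1262_nc-2 (R1262_ncRNA-EMX1_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 site | 19181
R1262_nc-19 (R1262_ncRNA-sgEMX1_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 site, combined with the EMX1 sgRNA at the 3' end | 19182
R1262_nc-20 (R1262_sgEMX1-ncRNA_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 site, combined with the EMX1 sgRNA at the 5' end | 19183
R1262_nc-13 (R1262_ncRNA-AAVS1_25) | R1262 ncRNA containing a 25 bp insertion in the AAVS1 site | 19184
R1262_nc-14 (R1262_ncRNA-AAVS1_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 site | 19185
R1262_nc-15 (R1262_ncRNA-AAVS1_505) | R1262 ncRNA containing a 505 bp insertion in the AAVS1 site | 19186
R1262_nc-18 (R1262_ncRNA-AAVS1-MS2_505) | R1262 ncRNA containing a 505 bp insertion in the AAVS1 site and an MS2 stem-loop at the 3' end | 19187
R1262_nc-17 (R1262_ncRNA-AAVS1-MS2_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 site and an MS2 stem-loop at the 3' end | 19188
R1262_nc-16 (R1262_ncRNA-AAVS1-MS2_25) | R1262 ncRNA containing a 25 bp insertion in the AAVS1 site and an MS2 stem-loop at the 3' end | 19189
R1262_nc-9 (R1262_ncRNA-EMX1-MS2_305) | R1262 ncRNA containing a 305 bp insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19190
R1262_nc-7 (R1262_ncRNA-EMX1-MS2_25) | R1262 ncRNA containing a 25 bp insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19191
R1262_nc-6 (R1262_ncRNA-EMX1_P2A-GFP) | R1262 ncRNA containing a P2A-GFP insertion in the EMX1 site | 19192
R1262_nc-22 (R1262_sgAAVS1-ncRNA_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 site, combined with sgAAVS1 at the 5' end | 19193
R1262_nc-8 (R1262_ncRNA-EMX1-MS2_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19194
R1262_nc-21 (R1262_ncRNA-sgAAVS1_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 site, combined with sgAAVS1 at the 3' end | 19195
R1262_nc-10 (R1262_ncRNA-EMX1-MS2_405) | R1262 ncRNA containing a 405 bp insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19196
R1262_nc-11 (R1262_ncRNA-EMX1-MS2_505) | R1262 ncRNA containing a 505 bp insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19197
R1262_nc-12 (R1262_ncRNA-EMX1-MS2_P2A-GFP) | R1262 ncRNA containing a P2A-GFP insertion in the EMX1 site and an MS2 stem-loop at the 3' end | 19198
R1262_nc-23 (R1262_ncRNA-EMX1_P2A-GFP_LongHA) | Same as R1262_nc-6, but with longer homology arms | 19199
R1262_nc-24 (R1262_ncRNA-EMX1-MS2_P2A-GFP_LongHA) | Same as R1262_nc-12, but with longer homology arms | 19200
6083 v1 RT | 6083 retron RT mRNA | 19201
6083 v1_ncRNA_AAVS1 gRNA_25bp | R6083 retron containing a 25 bp insertion at the AAVS1 site | 19202
6083 v1_ncRNA only_25bp | R6083 retron containing a 25 bp insertion at the EMX1 site | 19203
6083 v1_ncRNA_EMX1 gRNA_25bp | R6083 retron containing a 25 bp insertion at the EMX1 site and sgEMX1 at the 3' end | 19204
6083 v1_ncRNA_EMX1 gRNA_GFP gene | R6083 retron containing a GFP gene insertion at the EMX1 site and sgEMX1 at the 3' end | 19205
6083 v1_ncRNA_EMX1 gRNA_50bp | R6083 retron containing a 50 bp insertion at the EMX1 site and sgEMX1 at the 3' end | 19206
6083 v1_ncRNA_EMX1 gRNA_100bp | R6083 retron containing a 100 bp insertion at the EMX1 site and sgEMX1 at the 3' end | 19207
6083 v1_ncRNA_EMX1 gRNA_75bp |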
R6083逆轉錄子在EMX1位點處含有75 bp插入且在3'端含有sgEMX1 19208 Aco1_ncRNA-EMX1_100 Aco1 ncRNA在EMX1基因中含有100 bp插入 19209 Aco1_ncRNA-EMX1_205 Aco1 ncRNA在EMX1基因中含有205 bp插入 19210 Eco3_ncRNA-EMX1_205 Eco3 ncRNA在EMX1基因中含有205 bp插入 19211 RNA 序列 序列名稱 描述 序列 Eco1 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCGAAACAUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUUGAACAUCGACUUGGAGGAUUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUCUUGCCCCAGGGAGCACCCAGCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUAUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCAUCCCGUCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCUCAGGACCACGAUCUCAGCGAAAAGUGACAGGACUCGUCAUUUCUCAAGAAAAAGUCGGUAUAGGGAGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19126) Eco1 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之6 bp取代) GAAAUGAUAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCCGUUUCUUCUGACGUAAGGGUGCGCAUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19127) Eco3 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 
AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAGUCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUCUUAUAGCCAAUUUCGUCGCCAGGGAGCUGGACGAGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGCCUUAAGGUCUGUCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19128) Eco3 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19129) Eco5 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 
AUGGACGCUACCAGAACGACUCUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUGAAGAACGUUUUGAGUCAAUUUCCACUGUCACCAUUUGCUACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGUGCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAUGAGAAGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCCAGGCGUGCCCUGCGUCAGGAAGUGCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCAUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19130) Eco5 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU (SEQ ID NO: 19131) Cas9 所有U均為N-1-甲基假尿苷。 GGGUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUGUCGUUGCAGGCCUUAUUCGGAUCCGCCACCAUGGACAAGAAGUACAGCAUCGGACUGGACAUCGGAACAAACAGCGUCGGAUGGGCAGUCAUCACAGACGAAUACAAGGUCCCGAGCAAGAAGUUCAAGGUCCUGGGAAACACAGACAGACACAGCAUCAA GAAGAACCUGAUCGGAGCACUGCUGUUCGACAGCGGAGAAACAGCAGAAGCAACAAGACUGAAGAGAAC AGCAAGAAGAAGAUACACAAGAAGAAAGAACAGAAUCUGCUACCUGCAGGAAAUCUUCAGCAACGAAAU GGCAAAGGUCGACGACAGCUUCUUCCACAGACUGGAAGAAAGCUUCCUGGUCGAAGAAGACAAGAAGCA 
CGAAAGACACCCGAUCUUCGGAAACAUCGUCGACGAAGUCGCAUACCACGAAAAGUACCCGACAAUCUAC CACCUGAGAAAGAAGCUGGUCGACAGCACAGACAAGGCAGACCUGAGACUGAUCUACCUGGCACUGGCA CACAUGAUCAAGUUCAGAGGACACUUCCUGAUCGAAGGAGACCUGAACCCGGACAACAGCGACGUCGAC AAGCUGUUCAUCCAGCUGGUCCAGACAUACAACCAGCUGUUCGAAGAAAACCCGAUCAACGCAAGCGGA GUCGACGCAAAGGCAAUCCUGAGCGCAAGACUGAGCAAGAGCAGAAGACUGGAAAACCUGAUCGCACAG CUGCCGGGAGAAAAGAAGAACGGACUGUUCGGAAACCUGAUCGCACUGAGCCUGGGACUGACACCGAAC UUCAAGAGCAACUUCGACCUGGCAGAAGACGCAAAGCUGCAGCUGAGCAAGGACACAUACGACGACGAC CUGGACAACCUGCUGGCACAGAUCGGAGACCAGUACGCAGACCUGUUCCUGGCAGCAAAGAACCUGAGC GACGCAAUCCUGCUGAGCGACAUCCUGAGAGUCAACACAGAAAUCACAAAGGCACCGCUGAGCGCAAGC AUGAUCAAGAGAUACGACGAACACCACCAGGACCUGACACUGCUGAAGGCACUGGUCAGACAGCAGCUG CCGGAAAAGUACAAGGAAAUCUUCUUCGACCAGAGCAAGAACGGAUACGCAGGAUACAUCGACGGAGGA GCAAGCCAGGAAGAAUUCUACAAGUUCAUCAAGCCGAUCCUGGAAAAGAUGGACGGAACAGAAGAACUG CUGGUCAAGCUGAACAGAGAAGACCUGCUGAGAAAGCAGAGAACAUUCGACAACGGAAGCAUCCCGCAC CAGAUCCACCUGGGAGAACUGCACGCAAUCCUGAGAAGACAGGAAGACUUCUACCCGUUCCUGAAGGAC AACAGAGAAAAGAUCGAAAAGAUCCUGACAUUCAGAAUCCCGUACUACGUCGGACCGCUGGCAAGAGGA AACAGCAGAUUCGCAUGGAUGACAAGAAAGAGCGAAGAAACAAUCACACCGUGGAACUUCGAAGAAGUC GUCGACAAGGGAGCAAGCGCACAGAGCUUCAUCGAAAGAAUGACAAACUUCGACAAGAACCUGCCGAAC GAAAAGGUCCUGCCGAAGCACAGCCUGCUGUACGAAUACUUCACAGUCUACAACGAACUGACAAAGGUC AAGUACGUCACAGAAGGAAUGAGAAAGCCGGCAUUCCUGAGCGGAGAACAGAAGAAGGCAAUCGUCGAC CUGCUGUUCAAGACAAACAGAAAGGUCACAGUCAAGCAGCUGAAGGAAGACUACUUCAAGAAGAUCGAA UGCUUCGACAGCGUCGAAAUCAGCGGAGUCGAAGACAGAUUCAACGCAAGCCUGGGAACAUACCACGAC CUGCUGAAGAUCAUCAAGGACAAGGACUUCCUGGACAACGAAGAAAACGAAGACAUCCUGGAAGACAUC GUCCUGACACUGACACUGUUCGAAGACAGAGAAAUGAUCGAAGAAAGACUGAAGACAUACGCACACCUG UUCGACGACAAGGUCAUGAAGCAGCUGAAGAGAAGAAGAUACACAGGAUGGGGAAGACUGAGCAGAAAG CUGAUCAACGGAAUCAGAGACAAGCAGAGCGGAAAGACAAUCCUGGACUUCCUGAAGAGCGACGGAUUC GCAAACAGAAACUUCAUGCAGCUGAUCCACGACGACAGCCUGACAUUCAAGGAAGACAUCCAGAAGGCA CAGGUCAGCGGACAGGGAGACAGCCUGCACGAACACAUCGCAAACCUGGCAGGAAGCCCGGCAAUCAAG AAGGGAAUCCUGCAGACAGUCAAGGUCGUCGACGAACUGGUCAAGGUCAUGGGAAGACACAAGCCGGAA 
AACAUCGUCAUCGAAAUGGCAAGAGAAAACCAGACAACACAGAAGGGACAGAAGAACAGCAGAGAAAGAAUGAAGAGAAUCGAAGAAGGAAUCAAGGAACUGGGAAGCCAGAUCCUGAAGGAACACCCGGUCGAAAAC ACACAGCUGCAGAACGAAAAGCUGUACCUGUACUACCUGCAGAACGGAAGAGACAUGUACGUCGACCAG GAACUGGACAUCAACAGACUGAGCGACUACGACGUCGACCACAUCGUCCCGCAGAGCUUCCUGAAGGACG ACAGCAUCGACAACAAGGUCCUGACAAGAAGCGACAAGAACAGAGGAAAGAGCGACAACGUCCCGAGCG AAGAAGUCGUCAAGAAGAUGAAGAACUACUGGAGACAGCUGCUGAACGCAAAGCUGAUCACACAGAGAA AGUUCGACAACCUGACAAAGGCAGAGAGAGGAGGACUGAGCGAACUGGACAAGGCAGGAUUCAUCAAGA GACAGCUGGUCGAAACAAGACAGAUCACAAAGCACGUCGCACAGAUCCUGGACAGCAGAAUGAACACAA AGUACGACGAAAACGACAAGCUGAUCAGAGAAGUCAAGGUCAUCACACUGAAGAGCAAGCUGGUCAGCG ACUUCAGAAAGGACUUCCAGUUCUACAAGGUCAGAGAAAUCAACAACUACCACCACGCACACGACGCAU ACCUGAACGCAGUCGUCGGAACAGCACUGAUCAAGAAGUACCCGAAGCUGGAAAGCGAAUUCGUCUACG GAGACUACAAGGUCUACGACGUCAGAAAGAUGAUCGCAAAGAGCGAACAGGAAAUCGGAAAGGCAACAG CAAAGUACUUCUUCUACAGCAACAUCAUGAACUUCUUCAAGACAGAAAUCACACUGGCAAACGGAGAAA UCAGAAAGAGACCGCUGAUCGAAACAAACGGAGAAACAGGAGAAAUCGUCUGGGACAAGGGAAGAGACU UCGCAACAGUCAGAAAGGUCCUGAGCAUGCCGCAGGUCAACAUCGUCAAGAAGACAGAAGUCCAGACAG GAGGAUUCAGCAAGGAAAGCAUCCUGCCGAAGAGAAACAGCGACAAGCUGAUCGCAAGAAAGAAGGACU GGGACCCGAAGAAGUACGGAGGAUUCGACAGCCCGACAGUCGCAUACAGCGUCCUGGUCGUCGCAAAGG UCGAAAAGGGAAAGAGCAAGAAGCUGAAGAGCGUCAAGGAACUGCUGGGAAUCACAAUCAUGGAAAGAAGCAGCUUCGAAAAGAACCCGAUCGACUUCCUGGAAGCAAAGGGAUACAAGGAAGUCAAGAAGGACCUGA UCAUCAAGCUGCCGAAGUACAGCCUGUUCGAACUGGAAAACGGAAGAAAGAGAAUGCUGGCAAGCGCAG GAGAACUGCAGAAGGGAAACGAACUGGCACUGCCGAGCAAGUACGUCAACUUCCUGUACCUGGCAAGCC ACUACGAAAAGCUGAAGGGAAGCCCGGAAGACAACGAACAGAAGCAGCUGUUCGUCGAACAGCACAAGC ACUACCUGGACGAAAUCAUCGAACAGAUCAGCGAAUUCAGCAAGAGAGUCAUCCUGGCAGACGCAAACC UGGACAAGGUCCUGAGCGCAUACAACAAGCACAGAGACAAGCCGAUCAGAGAACAGGCAGAAAACAUCA UCCACCUGUUCACACUGACAAACCUGGGAGCACCGGCAGCAUUCAAGUACUUCGACACAACAAUCGACAG AAAGAGAUACACAAGCACAAAGGAAGUCCUGGACGCAACACUGAUCCACCAGAGCAUCACAGGACUGUA CGAAACAAGAAUCGACCUGAGCCAGCUGGGAGGAGACGGAGGAGGAAGCCCGAAGAAGAAGAGAAAGGU CUAGCUAGCCAUCACAUUUAAAAGCAUCUCAGCCUACCAUGAGAAUAAGAGAAAGAAAAUGAAGAUCAA 
UAGCUUAUUCAUCUCUUUUUCUUUUUCGUUGGUGUAAAGCCAACACCCUGUCUAAAAAACAUAAAUUUC UUUAAUCAUUUUGCCUCUUUUCUCUGUGCUUCAAUUAAUAAAAAAUGGAAAGAACCUCGAGAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAUCUAG (SEQ ID NO: 19132) EMX1 sgRNA經修飾 含有來自Synthego之化學修飾:2'-F、2'-O-甲基、硫代磷酸酯 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 19133) EMX1 sgRNA未經修飾 無修飾 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 19134) Eco3 ncRNA Eco3 ncRNA僅具有10 bp插入 GGGAUAAUUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAU (SEQ ID NO: 19135) Eco3 ncRNA-MS2 Eco3 ncRNA僅具有10 bp插入及3' MS2莖環 GGGAUAAUUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAACAUGAGGAUCACCCAUGUGAU (SEQ ID NO: 19136) Aco1 RT Aco1 RT RNA序列,包括5'及3' UTR 
AGGGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGGAGCCCAAUGACUACGUAAAUAGGUUGCGACAUGCUAUGGAAAUAAGUGAAAACCCGCGCUUUAGCCCUGAGUACAUUGCUCAAUGUUGUACUUACGCCGAGAAUCUCCUCAAGCAAGGGCUGCCAGUUCUGUUCGAUCAAACGCAUAUUCGGAAAGUUCUGGGGAUGGCGGCACCUCGAUUGUGUGAUUAUCACAGAUUCACAAUCCCAAAGCACAACGGAUCUAGAAUUAUCACGGCCCCAUCUAGGAAGCUGAAGCUUCGACAACAAUGGAUAUACCAGAAUAUCCUUAUACGAAAGGAGGCUUCACCGUACACGCACGGAUUUGUUCCUGAACGCAGCAUCGUGACUAACGCAAUCCUCCAUAUAGGAUACGCAUACACCUACUGCGUGGAUAUCACGGAUUUCUUUCCUAGCAUCACUAAGAAGCAGGUCUUGCCUAUAUUCCGAAAUAUGGGCUAUAGUGGUUCUGCUGCAAAUACUCUCUGCGACCUCUGUUGCUAUGACGGGGUCCUCCCCCAGGGGGCGCCUACUAGCCCAUACCUCAGUAACAUGAUUUGUCGCGAUCUUGAUGACGAAUUGGGGGCUAUGGCGCGGCGGUUCCGGGGGAUUUUUACACGGUAUGCGGAUGACAUAGCUAUCUCCACAAACCAACAACAGCCGCAACUUUUGGAUGCCUUGGGACUUAUCCUCGGGAAGCACGGAUUUCUUAUGAAUCUCGAUAAGUGUCGAGUCUAUAAUCCUGGACAGCCCAAAAGAAUUACUGGAUUGACCGUUCACAAUAGAGUAUCAGUUCCGAAAACCUUUAAGCGGAAAUUGCGGCAGGAAAUACAUUACUGUCAGAAGUUCGGAGUGACUGCACAUUUGGAAAACACGAAGGCUGCACGAUCCAUCCACUAUAGGGAACAUCUGUAUGGAAAGGCAUACUAUGUUAAAAUGGUUGAGCCUGAGCUCGGGGCGCACUUCCUCGAUGAGUUGUCAAAGGUAGACUGGCCAGAGCCACCAAAGAAGAAAAGAAAGGUCUGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCCCUGCA (SEQ ID NO: 19137) Aco1 ncRNA-sgRNA_50 Aco1 ncRNA-sgRNA,具有50 bp插入 GGGGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUCCUGCA (SEQ ID NO: 19138) Aco1 sgRNA-ncRNA_50 Aco1 sgRNA-ncRNA,具有50 bp插入 
GGGGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACCCUGCA (SEQ ID NO: 19139) Eco3_EMX1 gRNA_ncRNA_25 Eco3 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUAAUAACAACGAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA(SEQ ID NO: 19140) 僅Eco3_ncRNA Eco3 ncRNA在EMX1基因中含有25 bp插入 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA(SEQ ID NO: 19141) Eco3_ncRNA_AAVS1 gRNA_25 Eco3 ncRNA在AAVS1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGACUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19142) Eco3_ncRNA_EMX1 gRNA_25 Eco3 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19143) Eco3_ncRNA_EMX1 gRNA_50 Eco3 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19144) Eco3_ncRNA_EMX1 gRNA_75 Eco3 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19145) Eco3_ncRNA_EMX1 gRNA_100 Eco3 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19146) Eco3_ncRNA_EMX1 gRNA_GFP基因 Eco3 ncRNA在EMX1基因中含有GFP基因插入,該基因在3'端與EMX1 sgRNA融合。整個GFP卡匣係呈反義取向,且含有微型EF1a啟動子及β球蛋白聚A信號。 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAcccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccu
ucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19147) Eco3_NT_ncRNA_EMX gRNA_25 Eco3 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 19148) Aco1_EMX1 gRNA_ncRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 gaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACguauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac(SEQ ID NO: 19149) 僅Aco1_ncRNA Aco1 ncRNA在EMX1基因中含有25 bp插入 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac(SEQ ID NO: 19150) Aco1_ncRNA_AAVS1 gRNA_25 Aco1 ncRNA在AAVS1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 
guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19151) Aco1_ncRNA_EMX1 gRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19152) Aco1_ncRNA_EMX1 gRNA_50 Aco1 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19153) Aco1_ncRNA_EMX1 gRNA_75 Aco1 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 
19154) Aco1_ncRNA_EMX1 gRNA_100 Aco1 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19155) Aco1_ncRNA_EMX1 gRNA_GFP基因 Aco1 ncRNA在EMX1基因中含有GFP基因插入,該基因在3'端與EMX1 sgRNA融合。整個GFP卡匣係呈反義取向,且含有微型EF1a啟動子及β球蛋白聚A信號。 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgac
aucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19156) Aco1_NT_ncRNA_EMX1 gRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19157) R2042_EMX1 gRNA_ncRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 gaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACGCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC(SEQ ID NO: 19158) R2042_僅ncRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC(SEQ ID NO: 19159) R2042_ncRNA_AAVS1 gRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19160) R2042_ncRNA_EMX1 gRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19161) R2042_ncRNA_EMX1 gRNA_50 R2042 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19162) R2042_ncRNA_EMX1 gRNA_75 R2042 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19163) R2042_ncRNA_EMX1 gRNA_100 R2042 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19164) R2042_ncRNA_NT_EMX1 gRNA_25_雙重引導 R2042 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccuauaguuaguguacugcaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguCCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19165) R2042_RT R2042 RT mRNA序列,僅編碼序列 
AUGAAGGACGACCAGUACUCUCAGUGGAAGAAGUACUACGAGAGCAGGGGCAUCCUGCCCGAGAUCCAGGACAAGCUCCUGAACUACGCCAAGAUCCACAUCGACAACAACACCCCGGUGAUCUUCAACUUCGAGCACCUGACCCUGCUGCUGGGCAGGGAGAAGAACUACCUGUCCAGCGUGGUGAACAGCCCCGACAGCCACUACAGAAAGUUCAAGAUCAAGAAGAGAUCCGGAGGCGAGAGGGAGAUCACCGCUCCCUAUCUCAGCCUCCUGGAGAUGCAGUACUGGAUCUACAGGAACAUCCUGAUCAACGUGAAGAUCCACUACGCCGCUCACGGCUUCGCUCAGGAUAAGAGCAUUAUCACCAACUCCAGGAACCACCUGGGGCAGAAACAUCUGCUCAAGAUGGAUCUGAAGGAUUUCUUCCCCAGCAUCAAGCUGAACAGAAUCAUCUACAUCUUCAAGAGCCUGGGCUACCCCAAUAUCAUCGCCUUCUACCUGGCCAGCAUCUGCAGCUACAAGGGCCACCUGCCCCAGGGCAGCCCCACAAGCCCUAUCCUGAGCAACAUCGUGAGCAUCACCCUGGACAACAGACUGGUGAAGUUCGCCAGAAAGAUGAAGCUGAGAUACAGCAGGUACGCCGACGACCUGACGUUCAGCGGGGACAAGAUCCCCACCAACUACAUCAAGUACAUCACCGACAUCAUCAAUGAUGAGGGCUUCGAGGUGAACGACACCAAGACCAAGCUCUACCUGAAGGCCGGGAAGAGAAUCGUGACGGGCAUUUCCGUGAUCGGAAAUGACCCGAAGCUUCCGCGGGAAUACAAGCGGAAGCUGAAGCAGGAGCUGCACUACAUCUUCACCUACGGCAUCGGCAGCCACAUGGCCAAGAAGAAGAUCAAGAAGAUCAACUACCUCUACAGGAUCAUCGGCAAGGUGAACUUCUGGCUGAACAUCGAGCCCGACAACGAGUACGCCAGAAACGCCAAGGCCAAGCUGCUGCUGCUCAUCGACAACccaccaaagaagaaaagaaaggucuga(SEQ ID NO: 19166) R6943 RT R6943 RT mRNA序列,僅編碼序列 
AUGGAGGAGAGCACCAACUACAAGCUGCUGGUGUGGGGACUGAGCGUGAUCCAACCCGCUACCCCCAACGAGGUGCUGAACUACCUCACCAGCACCCUGAACGAUAACGGGCUGCUGCCCGACGUGGAGAAGAUGAUCCACUACUUUGAGCUGCUGGACCAGCUGGGCUACAUCCACCAGGUGAGCAAGAGGAACAACCUCUACUCACUGACCCCCAGGGGCAACGAAAGGUUGACCCCUGCCCUGAAGAGACUCAGGGACAAGAUCAGACUGUUCAUGCUGGACAACUGCCACAGCAUCAGCAAGCUGGGCGUGCUGGCCAGCACAGAUACAGAGAACAUGGGGGGCGACAGCCCCUCACUCCAGCUGAGGCACAACCUCAAGGAGGUGCCUCAUCCCAGCCUUAGCUGGGCUGCUGGAACCCUGCCUAGCUCUCCUAGGCAGGCUUGGGUUCGGAUCUACGAACAGCUGAACAUCGGCAGCAUGAGCAGCGACGAGGCUAGCACACCCACCACCGCCAGAAAUGCCCCCCUGAGCUUCGUGGGCAGGCUGGGCUUCAGCCUGAACUACUACAGCUUCAACAAGAUCGACGAGCCCCUGUUCAACAACGAUGGCGUGACCGCCAUCGCCAGCUGCAUCGGCAUCAGCCCCGGGCUGAUUACCGCUAUGGUGAAGUCACCAAAGCGGUACUACAGGACCUUCAACCUGAGAAAGAAGUCCGGGGGCUUCAGAUCCAUUCUGGCCCCCAGAAAGUUCAUCAAGACCAUCCAGUACUGGCUGAAGGAUCAUGUGCUGAACAGGCUCAAGAUCCACAGCUCCUGUUACAGCUACAGGAGCGGCGUGUCCAUCAAGGACAACGCCAUCAACCACGUGAAGAAGAAGUUCGUGGCCAGCAUCGACAUUUCCGAUUACUUCGGAAGCAUCAACAAGAAGAUGGUGAAGGACUGCUUUUACAAGAACAAUAUUCCCGAUCACAUCGUGAAUACCAUCAGCGGCAUCGUGACCUACAACGACGUGCUGCCUCAGGGCGCUCCCACCAGCCCUAUCAUUAGCAACGCCAUCCUGUUCGAGUUCGACGAGGAGAUGACGGCUCAUGCCCUCACUCUCGACUGUAUCUACACCAGAUACAGCGACGACAUCUCGAUAUCCUCCGACUAUAAGGAGAAUAUCGCCAUCCUGAUCAACAUCGCCGAGGCCAACCUGUUGAGCGCUGGAUUCACGCUCAACAGACAGAAGCAAAGGAUUGCUUCUGACAACAGCCGCCAGGUUGUGACCGGCAUCCUGGUGAACGAGAGCAUCAGACCCACCAGAUGCUACAGAAAGAAGAUCAGAAGCGCCUUUGAUCACGCCCUGAAGGAGCAGGACGGCUCCCAGCUGACAAUCAACAAGUUGAGGGGCUACCUCAACUACCUGAAGUCCUUCGAGACCUACGGCUUCAAGUUCAACGAGAAGAAGUAUAAGGAGACCCUGGAUUUCCUGAUCGCUCUGAAGCAGAGCccaccaaagaagaaaagaaaggucuga(SEQ ID NO: 19167) R6943_ncRNA_EMX1gRNA_GFP基因 R6943 ncRNA在EMX1基因中含有GFP基因插入,該基因在3'端與EMX1 sgRNA融合。整個GFP卡匣係呈反義取向,且含有微型EF1a啟動子及β球蛋白聚A信號。 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19168) R6943_ncRNA_EMX1gRNA_100bp R6943 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19169) R6943_ncRNA_EMX1gRNA_75bp R6943 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19170) R6943_ncRNA_EMX1gRNA_50bp R6943 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19171) R6943_ncRNA_EMX1gRNA_25bp R6943 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19172)
R6943_ncRNA_AAVS1 gRNA_25bp
R6943 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused at the 3' end to the AAVS1 sgRNA
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19173)
R6943_ncRNA only_25bp
R6943 ncRNA containing a 25 bp insertion in the EMX1 gene
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGC(SEQ ID NO: 19174)
AAVS1 sgRNA
Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate
GGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu (SEQ ID NO: 19175)

Primer sequences
Description / Sequence
AAVS1 NGS_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTGGGACCACCTTATATTCC (SEQ ID NO: 19212)
AAVS1 NGS_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTTCTGGCAAGGAGAGAGATG (SEQ ID NO: 19213)
EMX1_NGS_F3 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGaggtgaaggtgtggttccag (SEQ ID NO: 19214)
EMX1_NGS_R3 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGtagtcattggaggtgacatcg (SEQ ID NO: 19215)
EMX1_NGS_F1 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGaggtgaaggtgtggttccag (SEQ ID NO: 19216)
EMX1_NGS_R1 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGgccagagtccagcttggg (SEQ ID NO: 19217)

qPCR primers (non-NGS)
EMX1 forward primer aggtgaaggtgtggttccag (SEQ ID NO: 19218)
EMX1 reverse primer tagtcattggaggtgacatcg (SEQ ID NO: 19219)
5' junction GFP reverse primer ccccttgagcatctgacttc (SEQ ID NO: 19220)
3' junction GFP forward primer catcactttcccagtttaccc (SEQ ID NO: 19221)

RT variants (coding sequences)
RT variant / ORF sequence / SEQ ID NO:
6342S_N_465_K
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGT
GAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAAGGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19222 6342S_N_465_Q ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCCAGGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19223 6342S_N_465_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCCGGGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19224 6342S_E_466_K 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACAAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19225 6342S_E_466_N 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACAACATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19226 6342S_E_466_Q 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACCAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19227 6342S_E_466_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACCGGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19228 6342S_S_468_K 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAAGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19229 6342S_S_468_N 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAACGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19230 6342S_S_468_Q 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCCAGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19231 6342S_S_468_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCCGGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19232 6342S_M_470_K 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCAAGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19233 6342S_M_470_N 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCAACACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19234 6342S_M_470_Q 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCCAGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19235 6342S_M_470_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCCGGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19236 6342S_S_472_G 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCGGCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19237 6342S_S_472_P 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCCCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19238 6342S_W_473_F 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTTCGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19239 6342S_W_473_K 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCAAGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19240 6342S_W_473_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCCGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19241 6342S_W_473_Y 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTACGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19242 6342S_K_475_N 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAACTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19243 6342S_K_475_Q 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGCAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19244 6342S_K_475_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGCGGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19245 6342S_S_476_K 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGAAGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19246 6342S_S_476_N 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGAACGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19247 6342S_S_476_Q 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGCAGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19248 6342S_S_476_R 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGCGGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19249 6342S_V_477_F 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19250 6342S_V_477_I 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCATCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19251 6342S_V_477_L 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCCTGAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19252 6342S_V_477_W 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCTGGAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19253 6342S_V_265_P 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGCCTCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19254 6342S_M470K_V477W 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAATAAGGTTAAGAATAGTGGAGAAATTCTAAGTAATAAAGTCATAAACGAGATCAGCGGCAAGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACCCACCAAAGAAGAAAAGAAAGGTCTGA SEQ ID NO: 19255 6342S_M470K_V477W_V265P 
ATGATGGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGCCCCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAATAAGGTTAAGAATAGTGGAGAAATTCTAAGTAATAAAGTCATAAACGAGATCAGCGGCAAGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACCCACCAAAGAAGAAAAGAAAGGTCTGA SEQ ID NO: 19256 6342S_ntermtrunc_whelix 
ATGGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAACTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAATAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAAGAGAGATACAAGGACaTTATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaCGGaaggtctga SEQ ID NO: 19257 Sso7d-PASTE (v2)_WT 6342S 
ATGACCGTCAAGTTCAAGTACAAGGGTGAGGAACTTGAAGTTGATATTAGCAAAATCAAGAAGGTTTGGCGCGTTGGTAAAATGATATCTTTTACTTATGACGACAACGGCAAGACAGGTAGAGGGGCAGTGTCTGAGAAAGACGCCCCCAAGGAGCTGTTGCAAATGTTGGAAAAGTCTGGGAAAAAGTCTGGCGGCTCAAAAAGAACCGCCGACGGCAGCGAATTCGAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19258 Sso7d-ncbiprotein (v1)_WT 6342S 
ATGACAGTGAAGTTCAAGTACAAGGGCGAGGAAAAGGAAGTCGACATCAGCAAGATCAAGAAAGTGTGGCGGGTGGGCAAGATGATCAGCTTCACCTACGACGAGGGCGGAGGAAAGACCGGCAGAGGCGCCGTGTCTGAGAAAGATGCCCCTAAGGAGCTGCTGCAGATGCTGGAAAAACAGAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19259 Sac7d-ncbiprotein (v2)_WT 6342S 
ATGGTGAAGGTGAAATTCAAGTACAAGGGCGAGGAAAAAGAGGTGGATACAAGCAAGATCAAGAAAGTGTGGAGAGTGGGAAAGATGGTCAGCTTCACCTACGACGACAACGGCAAGACCGGCAGAGGCGCCGTGTCTGAGAAGGACGCCCCTAAGGAACTGCTGGATATGCTGGCTAGAGCCGAGCGGGAAAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19260 Sac7d-ncbigene (v1)_WT 6342S 
ATGGTCAAGGTGAAGTTCAAGTACAAGGGCGAGGAAAAAGAGGTGGATACAAGCAAGATCAAGAAAGTGTGGAGAGTGGGAAAAATGGTGAGCTTCACCTACGACGACAACGGCAAGACCGGCAGAGGCGCCGTGTCTGAGAAGGACGCCCCTAAGGAACTGCTGGATATGCTGGCTAGAGCCGAGCGGGAAAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19261 Plasmid sequences Name Description Sequence SEQ ID NO: Eco1 RT Coding sequence only. RNA was produced by a CRO, with a proprietary 5' cap, 5' UTR, 3' UTR, and an encoded poly(A) tail of approximately 120 bases added. Fully substituted with N1-methylpseudouridine.
AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCGAAACAUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUUGAACAUCGACUUGGAGGAUUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUCUUGCCCCAGGGAGCACCCAGCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUAUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCAUCCCGUCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCUCAGGACCACGAUCUCAGCGAAAAGUGACAGGACUCGUCAUUUCUCAAGAAAAAGUCGGUAUAGGGAGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGAAAGGUCUGA SEQ ID NO: 19262 Eco1 ncRNA-sgRNA ncRNA-sgRNA fusion (6 bp substitution in the EMX1 gene) GAAAUGAUAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCCGUUUCUUCUGACGUAAGGGUGCGCAUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU SEQ ID NO: 19263 Eco3 RT Coding sequence only. RNA was produced by a CRO, with a proprietary 5' cap, 5' UTR, 3' UTR, and an encoded poly(A) tail of approximately 120 bases added. Fully substituted with N1-methylpseudouridine.
AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAGUCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUCUUAUAGCCAAUUUCGUCGCCAGGGAGCUGGACGAGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGCCUUAAGGUCUGUCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA SEQ ID NO: 19264 Eco3 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU SEQ ID NO: 19265 Eco5 RT 僅編碼序列。由CRO產生RNA,且添加專有5'帽、5' UTR、3' UTR及約120個鹼基之經編碼聚A。經1N-甲基-假尿苷完全取代 
AUGGACGCUACCAGAACGACUCUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUGAAGAACGUUUUGAGUCAAUUUCCACUGUCACCAUUUGCUACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGUGCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAUGAGAAGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCCAGGCGUGCCCUGCGUCAGGAAGUGCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCAUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGUCUGA SEQ ID NO: 19266 Eco5 ncRNA-sgRNA ncRNA-sgRNA融合(EMX1基因中之10 bp插入) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUUUUUU SEQ ID NO: 19267 Cas9 序列取自Intellia患者。所有U均為N-1-甲基假尿苷。 GGGUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUGUCGUUGCAGGCCUUAUUCGGAUC CGCCACCAUGGACAAGAAGUACAGCAUCGGACUGGACAUCGGAACAAACAGCGUCGGAUGGGCAGUCAU CACAGACGAAUACAAGGUCCCGAGCAAGAAGUUCAAGGUCCUGGGAAACACAGACAGACACAGCAUCAA GAAGAACCUGAUCGGAGCACUGCUGUUCGACAGCGGAGAAACAGCAGAAGCAACAAGACUGAAGAGAAC AGCAAGAAGAAGAUACACAAGAAGAAAGAACAGAAUCUGCUACCUGCAGGAAAUCUUCAGCAACGAAAU GGCAAAGGUCGACGACAGCUUCUUCCACAGACUGGAAGAAAGCUUCCUGGUCGAAGAAGACAAGAAGCA 
CGAAAGACACCCGAUCUUCGGAAACAUCGUCGACGAAGUCGCAUACCACGAAAAGUACCCGACAAUCUAC CACCUGAGAAAGAAGCUGGUCGACAGCACAGACAAGGCAGACCUGAGACUGAUCUACCUGGCACUGGCA CACAUGAUCAAGUUCAGAGGACACUUCCUGAUCGAAGGAGACCUGAACCCGGACAACAGCGACGUCGAC AAGCUGUUCAUCCAGCUGGUCCAGACAUACAACCAGCUGUUCGAAGAAAACCCGAUCAACGCAAGCGGA GUCGACGCAAAGGCAAUCCUGAGCGCAAGACUGAGCAAGAGCAGAAGACUGGAAAACCUGAUCGCACAG CUGCCGGGAGAAAAGAAGAACGGACUGUUCGGAAACCUGAUCGCACUGAGCCUGGGACUGACACCGAAC UUCAAGAGCAACUUCGACCUGGCAGAAGACGCAAAGCUGCAGCUGAGCAAGGACACAUACGACGACGAC CUGGACAACCUGCUGGCACAGAUCGGAGACCAGUACGCAGACCUGUUCCUGGCAGCAAAGAACCUGAGC GACGCAAUCCUGCUGAGCGACAUCCUGAGAGUCAACACAGAAAUCACAAAGGCACCGCUGAGCGCAAGC AUGAUCAAGAGAUACGACGAACACCACCAGGACCUGACACUGCUGAAGGCACUGGUCAGACAGCAGCUG CCGGAAAAGUACAAGGAAAUCUUCUUCGACCAGAGCAAGAACGGAUACGCAGGAUACAUCGACGGAGGA GCAAGCCAGGAAGAAUUCUACAAGUUCAUCAAGCCGAUCCUGGAAAAGAUGGACGGAACAGAAGAACUG CUGGUCAAGCUGAACAGAGAAGACCUGCUGAGAAAGCAGAGAACAUUCGACAACGGAAGCAUCCCGCAC CAGAUCCACCUGGGAGAACUGCACGCAAUCCUGAGAAGACAGGAAGACUUCUACCCGUUCCUGAAGGAC AACAGAGAAAAGAUCGAAAAGAUCCUGACAUUCAGAAUCCCGUACUACGUCGGACCGCUGGCAAGAGGA AACAGCAGAUUCGCAUGGAUGACAAGAAAGAGCGAAGAAACAAUCACACCGUGGAACUUCGAAGAAGUC GUCGACAAGGGAGCAAGCGCACAGAGCUUCAUCGAAAGAAUGACAAACUUCGACAAGAACCUGCCGAAC GAAAAGGUCCUGCCGAAGCACAGCCUGCUGUACGAAUACUUCACAGUCUACAACGAACUGACAAAGGUC AAGUACGUCACAGAAGGAAUGAGAAAGCCGGCAUUCCUGAGCGGAGAACAGAAGAAGGCAAUCGUCGAC CUGCUGUUCAAGACAAACAGAAAGGUCACAGUCAAGCAGCUGAAGGAAGACUACUUCAAGAAGAUCGAA UGCUUCGACAGCGUCGAAAUCAGCGGAGUCGAAGACAGAUUCAACGCAAGCCUGGGAACAUACCACGAC CUGCUGAAGAUCAUCAAGGACAAGGACUUCCUGGACAACGAAGAAAACGAAGACAUCCUGGAAGACAUC GUCCUGACACUGACACUGUUCGAAGACAGAGAAAUGAUCGAAGAAAGACUGAAGACAUACGCACACCUG UUCGACGACAAGGUCAUGAAGCAGCUGAAGAGAAGAAGAUACACAGGAUGGGGAAGACUGAGCAGAAAG CUGAUCAACGGAAUCAGAGACAAGCAGAGCGGAAAGACAAUCCUGGACUUCCUGAAGAGCGACGGAUUC GCAAACAGAAACUUCAUGCAGCUGAUCCACGACGACAGCCUGACAUUCAAGGAAGACAUCCAGAAGGCA CAGGUCAGCGGACAGGGAGACAGCCUGCACGAACACAUCGCAAACCUGGCAGGAAGCCCGGCAAUCAAG AAGGGAAUCCUGCAGACAGUCAAGGUCGUCGACGAACUGGUCAAGGUCAUGGGAAGACACAAGCCGGAA 
AACAUCGUCAUCGAAAUGGCAAGAGAAAACCAGACAACACAGAAGGGACAGAAGAACAGCAGAGAAAGAAUGAAGAGAAUCGAAGAAGGAAUCAAGGAACUGGGAAGCCAGAUCCUGAAGGAACACCCGGUCGAAAAC ACACAGCUGCAGAACGAAAAGCUGUACCUGUACUACCUGCAGAACGGAAGAGACAUGUACGUCGACCAG GAACUGGACAUCAACAGACUGAGCGACUACGACGUCGACCACAUCGUCCCGCAGAGCUUCCUGAAGGACG ACAGCAUCGACAACAAGGUCCUGACAAGAAGCGACAAGAACAGAGGAAAGAGCGACAACGUCCCGAGCG AAGAAGUCGUCAAGAAGAUGAAGAACUACUGGAGACAGCUGCUGAACGCAAAGCUGAUCACACAGAGAA AGUUCGACAACCUGACAAAGGCAGAGAGAGGAGGACUGAGCGAACUGGACAAGGCAGGAUUCAUCAAGA GACAGCUGGUCGAAACAAGACAGAUCACAAAGCACGUCGCACAGAUCCUGGACAGCAGAAUGAACACAA AGUACGACGAAAACGACAAGCUGAUCAGAGAAGUCAAGGUCAUCACACUGAAGAGCAAGCUGGUCAGCG ACUUCAGAAAGGACUUCCAGUUCUACAAGGUCAGAGAAAUCAACAACUACCACCACGCACACGACGCAU ACCUGAACGCAGUCGUCGGAACAGCACUGAUCAAGAAGUACCCGAAGCUGGAAAGCGAAUUCGUCUACG GAGACUACAAGGUCUACGACGUCAGAAAGAUGAUCGCAAAGAGCGAACAGGAAAUCGGAAAGGCAACAG CAAAGUACUUCUUCUACAGCAACAUCAUGAACUUCUUCAAGACAGAAAUCACACUGGCAAACGGAGAAA UCAGAAAGAGACCGCUGAUCGAAACAAACGGAGAAACAGGAGAAAUCGUCUGGGACAAGGGAAGAGACU UCGCAACAGUCAGAAAGGUCCUGAGCAUGCCGCAGGUCAACAUCGUCAAGAAGACAGAAGUCCAGACAG GAGGAUUCAGCAAGGAAAGCAUCCUGCCGAAGAGAAACAGCGACAAGCUGAUCGCAAGAAAGAAGGACU GGGACCCGAAGAAGUACGGAGGAUUCGACAGCCCGACAGUCGCAUACAGCGUCCUGGUCGUCGCAAAGG UCGAAAAGGGAAAGAGCAAGAAGCUGAAGAGCGUCAAGGAACUGCUGGGAAUCACAAUCAUGGAAAGAA GCAGCUUCGAAAAGAACCCGAUCGACUUCCUGGAAGCAAAGGGAUACAAGGAAGUCAAGAAGGACCUGA UCAUCAAGCUGCCGAAGUACAGCCUGUUCGAACUGGAAAACGGAAGAAAGAGAAUGCUGGCAAGCGCAG GAGAACUGCAGAAGGGAAACGAACUGGCACUGCCGAGCAAGUACGUCAACUUCCUGUACCUGGCAAGCC ACUACGAAAAGCUGAAGGGAAGCCCGGAAGACAACGAACAGAAGCAGCUGUUCGUCGAACAGCACAAGC ACUACCUGGACGAAAUCAUCGAACAGAUCAGCGAAUUCAGCAAGAGAGUCAUCCUGGCAGACGCAAACC UGGACAAGGUCCUGAGCGCAUACAACAAGCACAGAGACAAGCCGAUCAGAGAACAGGCAGAAAACAUCA UCCACCUGUUCACACUGACAAACCUGGGAGCACCGGCAGCAUUCAAGUACUUCGACACAACAAUCGACAG AAAGAGAUACACAAGCACAAAGGAAGUCCUGGACGCAACACUGAUCCACCAGAGCAUCACAGGACUGUA CGAAACAAGAAUCGACCUGAGCCAGCUGGGAGGAGACGGAGGAGGAAGCCCGAAGAAGAAGAGAAAGGU CUAGCUAGCCAUCACAUUUAAAAGCAUCUCAGCCUACCAUGAGAAUAAGAGAAAGAAAAUGAAGAUCAA 
UAGCUUAUUCAUCUCUUUUUCUUUUUCGUUGGUGUAAAGCCAACACCCUGUCUAAAAAACAUAAAUUUC UUUAAUCAUUUUGCCUCUUUUCUCUGUGCUUCAAUUAAUAAAAAAUGGAAAGAACCUCGAGAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAUCUAG SEQ ID NO: 19268 EMX1 sgRNA經修飾 含有來自Synthego之化學修飾:2'-F、2'-O-甲基、硫代磷酸酯 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19269 EMX1 sgRNA未經修飾 無修飾 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19270 Eco3 ncRNA Eco3 ncRNA僅具有10 bp插入 GGGAUAAUUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAU SEQ ID NO: 19271 Eco3 ncRNA-MS2 Eco3 ncRNA僅具有10 bp插入及3' MS2莖環 GGGAUAAUUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19272 Eco3環狀ncRNA Eco3 ncRNA,僅在環狀形式中具有10 bp插入 UACCGGCGAAACAAAAGAAAAAACCAAAAAAACAAAACACAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAAAAACAAAAAACAAAACGGCUAUUAUGCGUUACCGGCG SEQ ID NO: 19273 Aco1 RT Aco1 RT RNA序列,包括5'及3' UTR 
AGGGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGGAGCCCAAUGACUACGUAAAUAGGUUGCGACAUGCUAUGGAAAUAAGUGAAAACCCGCGCUUUAGCCCUGAGUACAUUGCUCAAUGUUGUACUUACGCCGAGAAUCUCCUCAAGCAAGGGCUGCCAGUUCUGUUCGAUCAAACGCAUAUUCGGAAAGUUCUGGGGAUGGCGGCACCUCGAUUGUGUGAUUAUCACAGAUUCACAAUCCCAAAGCACAACGGAUCUAGAAUUAUCACGGCCCCAUCUAGGAAGCUGAAGCUUCGACAACAAUGGAUAUACCAGAAUAUCCUUAUACGAAAGGAGGCUUCACCGUACACGCACGGAUUUGUUCCUGAACGCAGCAUCGUGACUAACGCAAUCCUCCAUAUAGGAUACGCAUACACCUACUGCGUGGAUAUCACGGAUUUCUUUCCUAGCAUCACUAAGAAGCAGGUCUUGCCUAUAUUCCGAAAUAUGGGCUAUAGUGGUUCUGCUGCAAAUACUCUCUGCGACCUCUGUUGCUAUGACGGGGUCCUCCCCCAGGGGGCGCCUACUAGCCCAUACCUCAGUAACAUGAUUUGUCGCGAUCUUGAUGACGAAUUGGGGGCUAUGGCGCGGCGGUUCCGGGGGAUUUUUACACGGUAUGCGGAUGACAUAGCUAUCUCCACAAACCAACAACAGCCGCAACUUUUGGAUGCCUUGGGACUUAUCCUCGGGAAGCACGGAUUUCUUAUGAAUCUCGAUAAGUGUCGAGUCUAUAAUCCUGGACAGCCCAAAAGAAUUACUGGAUUGACCGUUCACAAUAGAGUAUCAGUUCCGAAAACCUUUAAGCGGAAAUUGCGGCAGGAAAUACAUUACUGUCAGAAGUUCGGAGUGACUGCACAUUUGGAAAACACGAAGGCUGCACGAUCCAUCCACUAUAGGGAACAUCUGUAUGGAAAGGCAUACUAUGUUAAAAUGGUUGAGCCUGAGCUCGGGGCGCACUUCCUCGAUGAGUUGUCAAAGGUAGACUGGCCAGAGCCACCAAAGAAGAAAAGAAAGGUCUGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCCCUGCA SEQ ID NO: 19274 Aco1 ncRNA_12 Aco1 ncRNA在EMX1基因中含有12 bp插入 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguCAGAAGAAGAAGGGCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGuucgaacgaucgGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACGUCACCUCCAAUGACUAGGGguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19275 Aco1 ncRNA-sgRNA_50 Aco1 ncRNA-sgRNA,具有50 bp插入 
GGGGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUCCUGCA SEQ ID NO: 19276 Aco1 sgRNA-ncRNA_50 Aco1 sgRNA-ncRNA,具有50 bp插入 GGGGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACCCUGCA SEQ ID NO: 19277 Eco3_EMX1 gRNA_ncRNA_25 Eco3 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUAAUAACAACGAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA SEQ ID NO: 19278 僅Eco3_ncRNA Eco3 ncRNA在EMX1基因中含有25 bp插入 GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA SEQ ID NO: 19279 Eco3_ncRNA_AAVS1 gRNA_25 Eco3 ncRNA在AAVS1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGACUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19280 Eco3_ncRNA_EMX1 gRNA_25 Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19281 Eco3_ncRNA_EMX1 gRNA_50 Eco3 ncRNA containing a 50 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19282 Eco3_ncRNA_EMX1 gRNA_75 Eco3 ncRNA containing a 75 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19283 Eco3_ncRNA_EMX1 gRNA_100 Eco3 ncRNA containing a 100 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19284 Eco3_ncRNA_EMX1 gRNA_GFP gene Eco3 ncRNA containing a GFP gene insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA. The entire GFP cassette is in the antisense orientation and contains a mini EF1a promoter and a β-globin poly(A) signal. 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAcccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19285 Eco3_NT_ncRNA_EMX gRNA_25 Eco3 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19286 Aco1_EMX1 gRNA_ncRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 gaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACguauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac SEQ ID NO: 19287 僅Aco1_ncRNA Aco1 ncRNA在EMX1基因中含有25 bp插入 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac SEQ ID NO: 19288 Aco1_ncRNA_AAVS1 gRNA_25 Aco1 ncRNA在AAVS1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19289 Aco1_ncRNA_EMX1 gRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 
guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19290 Aco1_ncRNA_EMX1 gRNA_50 Aco1 ncRNA containing a 50 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19291 Aco1_ncRNA_EMX1 gRNA_75 Aco1 ncRNA containing a 75 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19292 Aco1_ncRNA_EMX1 gRNA_100 Aco1 ncRNA containing a 100 bp insertion in the EMX1 gene, fused at the 3' end to the EMX1 sgRNA 
guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19293 Aco1_ncRNA_EMX1 gRNA_GFP基因 Aco1 ncRNA在EMX1基因中含有GFP基因插入,該基因在3'端與EMX1 sgRNA融合。整個GFP卡匣係呈反義取向,且含有微型EF1a啟動子及β球蛋白聚A信號。 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacug
ugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19294 Aco1_NT_ncRNA_EMX1 gRNA_25 Aco1 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgugcgugcguugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguacgcacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19295 R2042_EMX1 gRNA_ncRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在5'端與EMX1 sgRNA融合 gaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACGCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC SEQ ID NO: 19296 R2042_僅ncRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC SEQ ID NO: 19297 R2042_ncRNA_AAVS1 gRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19298 R2042_ncRNA_EMX1 gRNA_25 R2042 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19299 R2042_ncRNA_EMX1 gRNA_50 R2042 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19300 R2042_ncRNA_EMX1 gRNA_75 R2042 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19301 R2042_ncRNA_EMX1 gRNA_100 R2042 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19302 R2042_ncRNA_NT_EMX1 gRNA_25_雙重引導 R2042 ncRNA在EMX1基因中含有25 bp插入(在sgRNA切割之相對股上),該基因在3'端與EMX1 sgRNA融合 GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccuauaguuaguguacugcaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguCCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19303 R2042_RT R2042 RT mRNA序列,僅編碼序列 
AUGAAGGACGACCAGUACUCUCAGUGGAAGAAGUACUACGAGAGCAGGGGCAUCCUGCCCGAGAUCCAGGACAAGCUCCUGAACUACGCCAAGAUCCACAUCGACAACAACACCCCGGUGAUCUUCAACUUCGAGCACCUGACCCUGCUGCUGGGCAGGGAGAAGAACUACCUGUCCAGCGUGGUGAACAGCCCCGACAGCCACUACAGAAAGUUCAAGAUCAAGAAGAGAUCCGGAGGCGAGAGGGAGAUCACCGCUCCCUAUCUCAGCCUCCUGGAGAUGCAGUACUGGAUCUACAGGAACAUCCUGAUCAACGUGAAGAUCCACUACGCCGCUCACGGCUUCGCUCAGGAUAAGAGCAUUAUCACCAACUCCAGGAACCACCUGGGGCAGAAACAUCUGCUCAAGAUGGAUCUGAAGGAUUUCUUCCCCAGCAUCAAGCUGAACAGAAUCAUCUACAUCUUCAAGAGCCUGGGCUACCCCAAUAUCAUCGCCUUCUACCUGGCCAGCAUCUGCAGCUACAAGGGCCACCUGCCCCAGGGCAGCCCCACAAGCCCUAUCCUGAGCAACAUCGUGAGCAUCACCCUGGACAACAGACUGGUGAAGUUCGCCAGAAAGAUGAAGCUGAGAUACAGCAGGUACGCCGACGACCUGACGUUCAGCGGGGACAAGAUCCCCACCAACUACAUCAAGUACAUCACCGACAUCAUCAAUGAUGAGGGCUUCGAGGUGAACGACACCAAGACCAAGCUCUACCUGAAGGCCGGGAAGAGAAUCGUGACGGGCAUUUCCGUGAUCGGAAAUGACCCGAAGCUUCCGCGGGAAUACAAGCGGAAGCUGAAGCAGGAGCUGCACUACAUCUUCACCUACGGCAUCGGCAGCCACAUGGCCAAGAAGAAGAUCAAGAAGAUCAACUACCUCUACAGGAUCAUCGGCAAGGUGAACUUCUGGCUGAACAUCGAGCCCGACAACGAGUACGCCAGAAACGCCAAGGCCAAGCUGCUGCUGCUCAUCGACAACccaccaaagaagaaaagaaaggucuga SEQ ID NO: 19304 R6943 RT R6943 RT mRNA序列,僅編碼序列 
AUGGAGGAGAGCACCAACUACAAGCUGCUGGUGUGGGGACUGAGCGUGAUCCAACCCGCUACCCCCAACGAGGUGCUGAACUACCUCACCAGCACCCUGAACGAUAACGGGCUGCUGCCCGACGUGGAGAAGAUGAUCCACUACUUUGAGCUGCUGGACCAGCUGGGCUACAUCCACCAGGUGAGCAAGAGGAACAACCUCUACUCACUGACCCCCAGGGGCAACGAAAGGUUGACCCCUGCCCUGAAGAGACUCAGGGACAAGAUCAGACUGUUCAUGCUGGACAACUGCCACAGCAUCAGCAAGCUGGGCGUGCUGGCCAGCACAGAUACAGAGAACAUGGGGGGCGACAGCCCCUCACUCCAGCUGAGGCACAACCUCAAGGAGGUGCCUCAUCCCAGCCUUAGCUGGGCUGCUGGAACCCUGCCUAGCUCUCCUAGGCAGGCUUGGGUUCGGAUCUACGAACAGCUGAACAUCGGCAGCAUGAGCAGCGACGAGGCUAGCACACCCACCACCGCCAGAAAUGCCCCCCUGAGCUUCGUGGGCAGGCUGGGCUUCAGCCUGAACUACUACAGCUUCAACAAGAUCGACGAGCCCCUGUUCAACAACGAUGGCGUGACCGCCAUCGCCAGCUGCAUCGGCAUCAGCCCCGGGCUGAUUACCGCUAUGGUGAAGUCACCAAAGCGGUACUACAGGACCUUCAACCUGAGAAAGAAGUCCGGGGGCUUCAGAUCCAUUCUGGCCCCCAGAAAGUUCAUCAAGACCAUCCAGUACUGGCUGAAGGAUCAUGUGCUGAACAGGCUCAAGAUCCACAGCUCCUGUUACAGCUACAGGAGCGGCGUGUCCAUCAAGGACAACGCCAUCAACCACGUGAAGAAGAAGUUCGUGGCCAGCAUCGACAUUUCCGAUUACUUCGGAAGCAUCAACAAGAAGAUGGUGAAGGACUGCUUUUACAAGAACAAUAUUCCCGAUCACAUCGUGAAUACCAUCAGCGGCAUCGUGACCUACAACGACGUGCUGCCUCAGGGCGCUCCCACCAGCCCUAUCAUUAGCAACGCCAUCCUGUUCGAGUUCGACGAGGAGAUGACGGCUCAUGCCCUCACUCUCGACUGUAUCUACACCAGAUACAGCGACGACAUCUCGAUAUCCUCCGACUAUAAGGAGAAUAUCGCCAUCCUGAUCAACAUCGCCGAGGCCAACCUGUUGAGCGCUGGAUUCACGCUCAACAGACAGAAGCAAAGGAUUGCUUCUGACAACAGCCGCCAGGUUGUGACCGGCAUCCUGGUGAACGAGAGCAUCAGACCCACCAGAUGCUACAGAAAGAAGAUCAGAAGCGCCUUUGAUCACGCCCUGAAGGAGCAGGACGGCUCCCAGCUGACAAUCAACAAGUUGAGGGGCUACCUCAACUACCUGAAGUCCUUCGAGACCUACGGCUUCAAGUUCAACGAGAAGAAGUAUAAGGAGACCCUGGAUUUCCUGAUCGCUCUGAAGCAGAGCccaccaaagaagaaaagaaaggucuga SEQ ID NO: 19305 R6943_ncRNA_EMX1gRNA_GFP基因 R6943 ncRNA在EMX1基因中含有GFP基因插入,該基因在3'端與EMX1 sgRNA融合。整個GFP卡匣係呈反義取向,且含有微型EF1a啟動子及β球蛋白聚A信號。 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19306 R6943_ncRNA_EMX1gRNA_100bp R6943 ncRNA在EMX1基因中含有100 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19307 R6943_ncRNA_EMX1gRNA_75bp R6943 ncRNA在EMX1基因中含有75 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19308 R6943_ncRNA_EMX1gRNA_50bp R6943 ncRNA在EMX1基因中含有50 bp插入,該基因在3'端與EMX1 sgRNA融合 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19309 R6943_ncRNA_EMX1gRNA_25bp R6943 ncRNA在EMX1基因中含有25 bp插入,該基因在3'端與EMX1 sgRNA融合 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19310 R6943_ncRNA_AAVS1 gRNA_25bp R6943 ncRNA在AAVS1基因中含有25 bp插入,該基因在3'端與AAVS1 sgRNA融合 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19311 R6943_ncRNA only_25bp R6943 ncRNA在EMX1基因中含有25 bp插入 GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGC SEQ ID NO: 19312 AAVS1 sgRNA 含有來自Synthego之化學修飾:2'-F、2'-O-甲基、硫代磷酸酯 GGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19313 R1262_RT-1 (R1262 RT) R1262逆轉錄子RT mRNA 
AGGAUAAUGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGAUCAGCUUCAGCGAGAUCAAGAGCAGAAACGAUUUUGCAGACGCUCUGCAGAUUCCUCGGAGCGUGCUGACGCACGUUCUGUACAUUGCCAAGCCCGAGAGCUUCUACGAGAGCUUCACCAUCCCCAAGAAGAAUGGGGAGGACAGAAUCAUCAUGGCCCCCAAGGGCACCCUGAAGUCCAUCCAGACCAAGCUGAGCAAGCAGCUGGUGGAGUACAGAGCCUCCAUCAGCCAGAAAGGCCAGGAGAAGUCCAACAUCUCCCAUGGCUUCGAGAGGGAGAAGUCCAUCAUCACCAAUGCUCAGAUACACCGCAACAAGCGGUAUGUCAUCAACUACGACCUGAAGGACUUCUUCGACUCCUUCCACUUCGGCAGAGUGGUGGGAUUCUUCGAGAAGAACAAGCACUUCCUGCUGCCCUACGAGGUGGCCGUGAUCAUCGCCCAGCUGACCUGCUAUAAUGGCAGGCUGCCCCAGGGCGCCCCCACAAGCCCUGUGAUCACCAACCUGAUCUGCGAGAUCCUGGACUACAGGGUGCUCAAGAUCGCCAAGAGAUACAAGCUGGACUACACCAGGUACGCUGACGACCUGACCUUCAGCACCAACUACUCCAGAUUCCUGGAGGUGUUCGACAGCUUCGCCAAGGAGCUGCUCCAGGAGAUCAGCAACUCUGGUUUUACCAUAAACCAGAGCAAGACCAGGCUGCUGUACAGAGACAGCCGCCAGGAGGUCACAGGCCUCGUGGUGAAUAAGAAGAUCGGCGUGAACAGAGAGUACGUCAAGAGCACUAGGGCGAUGGCCCAGGCUCUGUACAGCACCGGCGAGUUCACCAUCAACGGCAUCCCCGGCACCAUCAAACAGCUGGAGGGCAGGUUCGGCUUCAUCGACCAGCUGGACCACUACAACAACGUGAUCGACGAUGCCAAGCACGACGCCUACUCCCUGAACGGCAGGGAGAAGCAGUUCCAGGAGUUCCUGUUCUUCAAGACAUUCUUCUUCAACGAGUACCCCCUGGUGAUCACCGAGGGCAAGACCGACAUCAGAUACCUGAAGGCCGCCCUCAAGAGCCUGCACCAGAAGUACCCCGAGCUGAUCUGCAAGGAGGACGACGGAACCUUUCGGUUCAAGAUCAGCUUCUUCAGAAGAUCCAAGAGAUGGAAGUACUUCUUCGGCAUCAGCAAGGAUGGCGCUGAUGCCAUGAAGCUGCUGUACCGGUUCUUUACCGGACAGAAGGGCGUGAAGAACUACUACAGGCUGUUCGCCGAGAAGUACAAGGCCGUGCAGAGAAACCCCGUGAUCAUGCUGUUUGACAACGAGAUGGAGAGCAAGAGACCCCUCAACAAGUUCAUCUCCGAGGAGGUGAAGAUUCCCUCCUCGGAGCAGCAGCUGUUCAAGGAGCAGCUGUACUACCACCUGAUCCCCGGCAGCAAGACCUACCUGAUGACCCACCCCUUGCCUCCUGGCAAGACAGAGGCCGAGAUCGAGGACCUGUUCCCUACCGAGGUGUUGGGCGUGAAGCUCGACGGCAAGAGCUUCAGCACCAAGGACAAGUUCGACACCAGCAAGUUCUACGGCAAGGACAUCUUCAGCAGCUACGUGUACGAGCACUGGAAGUCCAUCGACUUCAGCGGCUUCAUCCCCCUGCUGGACAAGAUCAACAUGCUCGUGCAGAACGAGAAGAAGCCCGGGCUGAACACACCACCAAAGAAGAAAAGAAAGGUCUGAGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCGAU SEQ ID NO: 19314 R1262_nc-5 (R1262_ncRNA-EMX1_505) R1262 ncRNA在EMX1位點中含有505 bp插入 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19315 R1262_nc-3 (R1262_ncRNA-EMX1_305) R1262 ncRNA在EMX1位點中含有305 bp插入 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACGACUACCAAAUCCGCAUGUUACGGGACUUCUUAUUAAUUCUUUUUUCGUGAGGAGCAGCGGAUCUUAAUGGAUGGCCGCAGGUGGUAUGGAAGCUAAUAGCGCGGGUGAGAGGGUAAUCAGCCGUGUCCACCAACACAACGCUAUCGGGCGAUUCUAUAAGAUUCCGCAUUGCGUCUACUUAUAAGAUGUCUCAACGGUAUCCGCAACUUGCGAUGUGCCUGCUAUCCUUAAAUGCAUAUCUCGCCCAGUAGCUUCCCAAUAUGAGAGCAUCAAUUGUAGAUCGGGCCGGGAUAGUCAUGUCGUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19316 R1262_nc-1 (R1262_ncRNA-EMX1_25) R1262 ncRNA在EMX1位點中含有25 bp插入 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19317 R1262_nc-4 (R1262_ncRNA-EMX1_405) R1262 ncRNA在EMX1位點中含有405 bp插入 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19318 R1262_nc-2 (R1262_ncRNA-EMX1_205) R1262 ncRNA在EMX1位點中含有205 bp插入 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19319 R1262_nc-19 (R1262_ncRNA-sgEMX1_205) R1262 ncRNA在EMX1位點中含有205 bp插入,該位點與3'端之EMX1 sgRNA組合 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUGAU SEQ ID NO: 19320 R1262_nc-20 (R1262_sgEMX1-ncRNA_205) R1262 ncRNA在EMX1位點中含有205 bp插入,該位點與5'端之EMX1 sgRNA組合 
GGGAUAAUGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUAAUAACAACGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19321 R1262_nc-13 (R1262_ncRNA-AAVS1_25) R1262 ncRNA在AAVS1位點中含有25 bp插入 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19322 R1262_nc-14 (R1262_ncRNA-AAVS1_205) R1262 ncRNA在AAVS1位點中含有205 bp插入 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19323 R1262_nc-15 (R1262_ncRNA-AAVS1_505) R1262 ncRNA在AAVS1位點中含有505 bp插入 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19324 R1262_nc-18 (R1262_ncRNA-AAVS1-MS2_505) R1262 ncRNA在AAVS1位點中含有505 bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19325 R1262_nc-17 (R1262_ncRNA-AAVS1-MS2_205) R1262 ncRNA在AAVS1位點中含有205 bp插入且在3'端含有MS2莖環 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19326 R1262_nc-16 (R1262_ncRNA-AAVS1-MS2_25) R1262 ncRNA在AAVS1位點中含有25 bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAACUCGAGCUCUGAAUGACUCCUAUAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19327 R1262_nc-9 (R1262_ncRNA-EMX1-MS2_305) R1262 ncRNA在EMX1位點中含有305 bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACGACUACCAAAUCCGCAUGUUACGGGACUUCUUAUUAAUUCUUUUUUCGUGAGGAGCAGCGGAUCUUAAUGGAUGGCCGCAGGUGGUAUGGAAGCUAAUAGCGCGGGUGAGAGGGUAAUCAGCCGUGUCCACCAACACAACGCUAUCGGGCGAUUCUAUAAGAUUCCGCAUUGCGUCUACUUAUAAGAUGUCUCAACGGUAUCCGCAACUUGCGAUGUGCCUGCUAUCCUUAAAUGCAUAUCUCGCCCAGUAGCUUCCCAAUAUGAGAGCAUCAAUUGUAGAUCGGGCCGGGAUAGUCAUGUCGUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19328 R1262_nc-7 (R1262_ncRNA-EMX1-MS2_25) R1262 ncRNA在EMX1位點中含有25 bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19329 R1262_nc-6 (R1262_ncRNA-EMX1_P2A-GFP) R1262 ncRNA在EMX1位點中含有P2A-GFP插入 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUUCACCGGGGUGGUGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCUUCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACACCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19330 R1262_nc-22 (R1262_sgAAVS1-ncRNA_205) R1262 ncRNA在AAVS1位點中含有205 bp插入,該位點與5'端之sgAAVS1組合 GGGAUAAUGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUAAUAACAACGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19331 R1262_nc-8 (R1262_ncRNA-EMX1-MS2_205) R1262 ncRNA在EMX1位點中含有205 bp插入且在3'端含有MS2莖環 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19332 R1262_nc-21 (R1262_ncRNA-sgAAVS1_205) R1262 ncRNA在AAVS1位點中含有205 bp插入,該位點與3'端之sgAAVS1組合 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCAAUAACAACGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUGAU SEQ ID NO: 19333 R1262_nc-10 (R1262_ncRNA-EMX1-MS2_405) R1262 ncRNA在EMX1位點中含有405 bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19334 R1262_nc-11 (R1262_ncRNA-EMX1-MS2_505) R1262 ncRNA在EMX1位點中含有505 
bp插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19335 R1262_nc-12 (R1262_ncRNA-EMX1-MS2_P2A-GFP) R1262 ncRNA在EMX1位點中含有P2A-GFP插入且在3'端含有MS2莖環 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUUCACCGGGGUGGUGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCUUCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACACCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19336 R1262_nc-23 
(R1262_ncRNA-EMX1_P2A-GFP_LongHA) 與R1262_nc-6相同,但具有較長同源臂 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUGAGGCCCCAGUGGCUGCUCUGGGGGCCUCCUGAGUUUCUCAUCUGUGCCCCUCCCUCCCUGGCCCAGGUGAAGGUGUGGUUCCAGAACCGGAGGACAAAGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUUCACCGGGGUGGUGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCUUCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACACCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACUAGGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUGGCCAGGCCCCUGCGUGGGCCCAAGCGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19337 R1262_nc-24 (R1262_ncRNA-EMX1-MS2_P2A-GFP_長HA) 與R1262_nc-12相同,但具有較長同源臂 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUGAGGCCCCAGUGGCUGCUCUGGGGGCCUCCUGAGUUUCUCAUCUGUGCCCCUCCCUCCCUGGCCCAGGUGAAGGUGUGGUUCCAGAACCGGAGGACAAAGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUUCACCGGGGUGGUGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCUUCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACACCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACUAGGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUGGCCAGGCCCCUGCGUGGGCCCAAGCGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19338 6083 v1 RT 6083逆轉錄子RT mRNA 
AUGAGCAACCCUCAGCCCACCAGGGCUGAGAUCUUCGAGCGCAUCAAACAGAGCUCGAAGCAGGAGGUGAUCCUGGAGGAGAUGCAGAGACUGGGCUUCUGGCCAAGGUCAGAAGGCCAGCCUGAGGUCGCUGCCGACCUCAUCCAGAGAGAAGGCGAGCUCCAAAGGGAGCUCGCCGAACUCAACAAGAAGUUGGCGGUCAAACGCAACCCCGAGAGGGCCCUCAGAGAAAUGAGGAAGCAGAGAAUGAAGGACGCCAGAGACAAGAGAGAGGUGACCAAGAGAGCCCAGGCUCAGCAGAGAUACGACAAGGCCCUGCUGUGGCACGAGAAGAGGGCCAGCCACGUGGCCUAUCUUGGCCCUGGUGUGAGCGCCAGCUUACACGAGAACAGCUCUGCUACCCAGGAGCAGGGCGACAAGGGGAAACCCAAGAGAGCCCGAGAUCGGGCUGUGCCAGACCUGCAGAGGCUGACACUGAACGGGCUGCCUGCUCUGAUUAGCGCAGCCCAACUCGCUGAGUCCAUGGGAGUCAGCGUGGCUGAGCUGAGAUUCCUCAGCUUCCACAGGGAGGUGGCUAGGACAAACCACUACCACAGCUUCACGCUCCCCAAGAAGACAGGUGGGGAGCGUCUGAUCAGCGCCCCCAUGCCCAGACUGAAGAGAGCCCAGUACUGGGUGCUGGACAACGUGCUGGCAAAGAUGCCAGCGCACGAUGCGGCUCACGGCUUCCUGGCCGGCAGAAGCAUCAUCAGCAAUGCCAAGCCCCAUGCGGGGCAGGAUGUGGUCAUUAACUUGGACGUUAAGGACUUCUUCCCCAGCAUCGCCUUCGGCAGAAUCAAGGGCGUGUUCAGACAGCUGGGCUACGGCGAAAGCAUCGCCACAGUGUUCGCCCUGCUGUGCAGCGAGAACAGAGCCCAGGCCUGGCAAGUGGACGGAGAGAGACUCUUCGUCGGCGGCAAGGCCAGAGAAAGAGUGCUGCCUCAGGGCGCUCCUACCAGCCCCAUGCUGACCAACCUGCUGUGCCGGCGGAUGGAUAGACGUCUGCUGGGUCUCGCGAAGCAGCUGGGCUUCGUGUACACCAGAUACGCCGACGACCUGACCUUCAGCGCCUCCGGCGAACCCGCAAGGGAUAACGUCGGAAAGCUGCUGAGCAGGGUCCGGUGGAUCCUGAGGGAUGAGGGGUUCACCCCUCAUCCCGAUAAGGAGAGAGUGAUGAGAAAGGGCAGAAGACAGGAGGUGACCGGCCUUGUGGUCAACUCCGACACUCCCAGCGUGAGCAGGGAGACCAGAAGAAGGCUGAGAGCCGCUCUGCACAGAGCCUCGCAGCCUGACGCUGCGAGCAAACCUGCACAUUGGCAGGGCCAUACGGCCCAGCCAAGUCAGCUGCUGGGCCUGGCCACAUUCGUGCAUCAGAUCGACCCCAAGCAGGGCAAGACCCUGCUCGCUGACGCUCAGCAACUGAUGCGCAGCCCUAUCGACCGCGCUAAUGACGCGGCGAAGUCUGCUAGCAGGGCUGACGCUGCUCAGCAGAGCUUCAGAGUGCUGGCUGCUGCCGGCAAGCCACCAGUUCUUGCCGACGGCAAGAACUGGUGGCAACCUGCUCCUCCUGCCACACCUGUGCUGGAGAAGACCGACCAGCAGAGAAGAGAGGAAAGGCAGGCUACCAGAAGGCAGCAGGCCGCUGCUGCUGCUCCUCCUCCUAGCAGCACAAGAAGAAACGAGCGGCCGCAACAGGCCGCUCAUGAGCAGCAGGGAGAUGCCCAGCCUCAGAACGAGGCCCCUCCUAGAUUCGACCCCGACCAGUACGCUCCUCCCCCGAGGAACGUGAUGACCUACUGGGCCCAGAUCGCCAUCAGCUUCUUUCUGGGCAGCAUCCUGCACAACAGACUGAUCACGAUCUUCGCCAUGGUGGCGGUGAUCGCUCUGUACUACAUGAGAAGACAGAGAUGGGAUGUCUUCAUGGGCAUCCUGGUGGUGGCCACCCUGCUGGGAUACCUGGU
CAGGGGCAUGGGCccaccaaagaagaaaagaaaggucuga SEQ ID NO: 19339 6083 v1_ncRNA_AAVS1 gRNA_25bp R6083 retron containing a 25 bp insertion at the AAVS1 site GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19340 6083 v1_ncRNA only_25bp R6083 retron containing a 25 bp insertion at the EMX1 site GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGC SEQ ID NO: 19341 6083 v1_ncRNA_EMX1 gRNA_25bp R6083 retron containing a 25 bp insertion at the EMX1 site and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19342 6083 v1_ncRNA_EMX1 gRNA_GFP gene R6083 retron containing a GFP gene insertion at the EMX1 site and sgEMX1 at the 3' end 
GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgagugagagacacaaaaaauuccaacacacuauugcaaugaaaauaaauuuccuuuauuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccacaccagccaccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccaccccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19343 6083 v1_ncRNA_EMX1 gRNA_50bp R6083逆轉錄子在EMX1位點處含有50 bp插入且在3'端含有sgEMX1 
GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19344 6083 v1_ncRNA_EMX1 gRNA_100bp R6083逆轉錄子在EMX1位點處含有100 bp插入且在3'端含有sgEMX1 GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19345 6083 v1_ncRNA_EMX1 gRNA_75bp R6083逆轉錄子在EMX1位點處含有75 bp插入且在3'端含有sgEMX1 GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19346 Aco1_ncRNA-EMX1_100 Aco1 ncRNA在EMX1基因中含有100 bp插入 
GGGAUAAUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACGAU SEQ ID NO: 19347 Aco1_ncRNA-EMX1_205 Aco1 ncRNA containing a 205 bp insertion in the EMX1 gene GGGAUAAUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACGAU SEQ ID NO: 19348 Eco3_ncRNA-EMX1_205 Eco3 ncRNA containing a 205 bp insertion in the EMX1 gene GGGAUAAUUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAU SEQ ID NO: 19349 R2781_RT-1 R2781 retron RT 
AGGAUAAUGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGGAUCCGCCACCAUGGGUUACAACUACGAGUACACCAUCAGGCUGUGUGAGACGCUCAAGUCCUUGAGCGCUGACGACGUGUACAUCAGCAAGUGCUGCAACUACGCCGAGGGACUCCUGGAUAAGGAGCUCCCUGUGAUCUUCGACCCCACCCACCUGAAGCAGAUCCUGAGGCUGGACGACAUCAGCCUCGACGAGUACCACAUCUUCUAUAUCGACAAGAAGAACGGGGGCAGCAGAGAGAUCAAUGCCCCCAGCGAGGAGCUGAAGAAGCGCCAACGCUGGAUUCUCAAGAACAUCCUUGAGAAGAUCAGCAUCAGCCACAACGUGCACGGCUUCAUCAAGGGCAAGAGCAUUGUCAGCAAUGCUCGCAAGCACCUGAACAAGGAGUAUGUGCUGAACAUCGACAUCAAGGACUUCUUCCCCAGCGUCACCAAAUACUCCGUGGAGAAGAUCUUCCGGAGAAUGGGCUACUGCAACAGCGUGGCCCAGCUGCUUGCCCGCGUCUGUUGUUACAGAGGCGGGCUGCCUCAGGGAGCUCCUACAAGCCCCUACCUGGCCAACCUCGCUUUUGACGAGGUUGACCAGGAGAUCAUCAACGUGGUGAGAAACCGGGACAUCACCUACACCAGAUACGCCGACGACAUGACCUUCAGCGCCAACUACGACCUGAGCACCUUCAAGAAGGAGGUGUACAAGAGCCUGGGCAAAUACAGAUUCAGCCCCAACAUCAUGAAGACUCACCAGAUGUCUGGCGAGAAGAGAAAGCUGGUGACCGGCCUGAUCGUGGACGACAAGGUGAAGGUGUGCAAGAAGUACAAGAGAAAGCUGCGGCAGGAGAUCUACUACUGCAAGAAGUUCGGCGUGACCAACCACCUGAGAAACUGCCACAGCGAGAAGUCCAUCAACUACAAGGAGUACCUGUACGGCAAGGCCUACUUCAUCAAGAUGGUCGAGGAGAUCGUCGGCGAGAAGUUCCUGGCCGACCUGGACAGCAUUGACUGGUACCCACCAAAGAAGAAAAGAAAGGUCUGACUCGAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC SEQ ID NO: 19350 R2781_nc-01_25bp_49-65HA R2781 ncRNA在具有49及69 bp同源臂之EMX1基因中含有25 bp插入 GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19351 R2781_nc-02_205bp_49-65HA R2781 ncRNA在具有49及69 bp同源臂之EMX1基因中含有205 bp插入 
GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU | SEQ ID NO: 19352
R2781_nc-03_405bp_49-65HA | R2781 ncRNA contains a 405 bp insertion in the EMX1 gene with 49 and 69 bp homology arms | GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU | SEQ ID NO: 19353
R2781_nc-04_25bp_30HA | R2781 ncRNA contains a 25 bp insertion in the EMX1 gene with 30 bp homology arms on each side | GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU | SEQ ID NO: 19354
R2781_nc-05_205bp_30HA | R2781 ncRNA contains a 205 bp insertion in the EMX1 gene with 30 bp homology arms on each side | GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU | SEQ ID NO: 19355
R2781_nc-06_405bp_30HA | R2781 ncRNA contains a 405 bp insertion in the EMX1 gene with 30 bp homology arms on each side |
GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU | SEQ ID NO: 19356
R1262_nc-46 (ncRNA-EMX1 site 1-deletion) | R1262 ncRNA contains an EMX1 deletion (del1) | GAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUAACCCUAUGUAGCCUCAGUCUUCCCAUCAGGCUCUCAGCUCAGCCUGAGUGUUGAGGCCCCAGUGGCUGCUCUGGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19357
R1262_nc-47 (ncRNA-EMX1 site 2-deletion) | R1262 ncRNA contains an EMX1 deletion (del2) | GAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUCAGAAGAAGAAGGGCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACCGCAGCCUCCCAGCUGCUCUCCGUGUCUCCAAUCUCCCUUUUGUUUUGAUGCAGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19358
R1262_nc-61 (ncRNA-EMX1 sgRNA tandem_sense)-001 | R1262 ncRNA contains a sense sequence with tandem sgRNA targets for insertion into EMX1 | GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUgaguccgagcagaagaagaagggAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUugaguccgagcagaagaagaagggGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19359
R1262_nc-62 (ncRNA-EMX1 sgRNA tandem_antisense)-001 | R1262 ncRNA contains an antisense sequence with tandem sgRNA targets for insertion into EMX1 | GGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUgaguccgagcagaagaagaagggAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUugaguccgagcagaagaagaagggGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19360
R1262_nc-63
(ncRNA_NHEJ_sense)-001 | R1262 ncRNA contains a sense sequence for insertion into EMX1 | GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUuGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19361
R1262_nc-64 (ncRNA_NHEJ_antisense)-001 | R1262 ncRNA contains an antisense sequence for insertion into EMX1 | GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUGCAUAAUGAGUUGUCACACGCUC | SEQ ID NO: 19362
R6342S_nc-63 (ncRNA_NHEJ_antisense)-001 | R6342S ncRNA contains a sense sequence for insertion into EMX1 | GGGAUAAUCAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUGACAUGAGGAUCACCCAUGU | SEQ ID NO: 19363
R6342S_nc-64 (ncRNA_NHEJ_sense)-001 | R6342S ncRNA contains an antisense sequence for insertion into EMX1 | GGGAUAAUCAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUuCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUGACAUGAGGAUCACCCAUGU | SEQ ID NO: 19364
ncRNA Circularization
Templates were designed using the strategy and sequences described in (27). In vitro transcription was performed as described above, except that NTPs were used at 7.5 mM instead of 10 mM. After DNase I treatment, RNA was purified using the Qiagen RNeasy Midi Kit (Qiagen, 75144). The RNA was then treated with RNase R (Biosearch, RNR0725) to enrich for circular RNAs and purified again using the Qiagen RNeasy Midi Kit. RNA concentration was measured by NanoDrop (Thermo Fisher), and purity and size were assessed by RNA ScreenTape analysis on a TapeStation (Agilent).
Example: References
A, M. (2020). Bacterial retrons function in anti-phage defense. 183.
AJ, S. (2019). Retrons and their applications in genome engineering. 47(21).
AV, a.
(2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 149-157.
Bonafont J, M. A. (2019). Clinically relevant correction of Recessive Dystrophic Epidermolysis Bullosa by dual sgRNA CRISPR/Cas9-mediated gene editing. 27(5).
D, L. (2000). Synthetic DNA delivery systems. Nature Biotechnology, 33-37.
Ellis GI, S. N. (2021). Genetic engineering of T cells for immunotherapy. 22.
F J Triana-Alonso, M. D. (1995). Self-coded 3' extension of run-off transcripts produces aberrant products during in vitro transcription with T7 polymerase. 270(11).
Grünewald, J. M. (2023). Engineered CRISPR prime editors with compact, untethered reverse transcriptases. 41.
Hou Z, Z. T. (2021). Lipid nanoparticles for mRNA delivery. 6.
J, G.-M. (2021). Val50Met hereditary transthyretin amyloidosis: not just a medical problem but a psychosocial burden. 16.
Jumper, J. E. (2021). Highly accurate protein structure prediction with AlphaFold. 596.
Karvelis, T. D. (2021). Transposase-associated TnpB is a programmable RNA-guided DNA endonuclease. 599.
L Statello, C.-J. G.-L. (2021). Gene regulation by long non-coding RNAs and its biological functions. 22.
Lampson BC, I. M. (2005). Retrons, msDNA, and the bacterial genome. 110.
Liu L, C. J. (2019). In vivo Exon Replacement in the Mouse Atp7b Gene by the Cas9 system. 30(9).
M Vera, E. T. (2019). Imaging Single mRNA molecules in mammalian cells using an optimized MS2-MCP system. 2038.
Maizels N, D. L. (2018). Initiation of homologous recombination at DNA nicks. 46(14).
Meijboom KE, A. A. (2022). CRISPR/Cas9-mediated excision of ALS/FTD-causing hexanucleotide repeat expansion in C9ORF72 rescues major disease mechanisms in vivo and in vitro. 13(6286).
MF, D. (2021). Are we creating a new phenotype? Neurological Research and Practice.
MR, M. (2020).
Systematic prediction of genes functionally associated with bacterial retrons and classification of the encoded tripartite system. 48(22).
Nance KD, M. J. (2021). Modifications in an Emergency: The role of N1-Methylpseudouridine in COVID-19 Vaccines. 7(5).
Nahmad AD, R. E. (2022). Frequent aneuploidy in primary human T cells after CRISPR-Cas9 cleavage. 10.
Nakajima K, Z. Y. (2018). Precise and efficient nucleotide substitution near genomic nick via noncanonical homology-directed repair. 28(2).
Nelson CE, H. C. (2016). In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy. 351(6271).
Oscorbin IP, W. P. (2020). The attachment of a DNA-binding Sso7D-like protein improves processivity and resistance to inhibitors of M-MuLV reverse transcriptase. 594(24).
Oscorbin IP, F. M. (2021). M-MuLV reverse transcriptase: Selected properties and improved mutants. 19.
Palka C, F. C.-K. (2022). Retron reverse transcriptase termination and phage defense are dependent on host RNase H1. 50(6).
Ran FA, H. P. (2013). Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. 154(6).
Richardson, C. D. (2016). Enhancing homology-directed genome editing by active and inactive CRISPR/CAS9 using asymmetric donor DNA. 34.
S, i. (1989). Reverse transcriptase associated with the biosynthesis of the branched RNA-linked msDNA in Myxococcus xanthus. 56.
SC, L. (2021). Precise genome editing across kingdoms of life using retron-derived DNA. 18(2).
Schiroli, G. (2019). Precise Gene Editing Preserves Hematopoietic Stem Cell Function following Transient p53-Mediated DNA Damage Response. 24(4).
Shimamoto T, H. M. (1993). Reverse transcriptase from bacterial retrons requires specific secondary structure at the 5'-end of the template for the cDNA priming reaction. 268(4).
Spencer JM, Z. X. (2017).
Deep mutational scanning of S. pyogenes Cas9 reveals important functional domains. 7.
T, F. (1987). Branched RNA covalently linked to the 5' end of a single-stranded DNA in Stigmatella aurantiaca. 48(1).
W, S. (2022). rAAV immunogenicity, toxicity, and durability in 255 clinical trials: A meta-analysis. Frontiers in Immunology.
Wesselhoeft RA, K. P. (2018). Engineering circular RNA for potent and stable translation in eukaryotic cells. 9.
Yarnall MTN, I. E.-U. (2023). Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. 41(4).
Zetsche B, G. J. (2015). Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. 163(3).
Example: Sequence
Selection of RT and ncRNA sequences - Overview
Description | SEQ ID NO
EMX1 gRNA sequence (nt) | 19109
Eco1 ncRNA (nt) | 19110
Aco1 ncRNA (nt) | 19111
RTX_1262 RT (aa) | 1262
RTX_1262 ncRNA (nt) | 15327
RTX3_2042 RT (aa) | 2042 or 19112
RTX_2042 ncRNA (nt) | 19113
RTX_2781 RT (aa) | 2781
RTX_2781 ncRNA (nt) | 16411
RTX_6083v1 RT (aa) | 6083 or 19114
RTX3_6083v1 ncRNA (nt) | 18547
RTX_6342 RT (aa) | 6342
RTX_6342 ncRNA (nt) | 18731
RTX_6342L ncRNA (nt) | 19927
RTX_6342S ncRNA (nt) | 19928
Engineered 6342S ncRNA + HDR template | 19929
Engineered 6342L ncRNA + HDR template | 19930
RTX_6943 RT (aa) | 6943 or 19116
RTX_6943 ncRNA (nt) | 19053
Engineered RTX_6943 ncRNA | 19117
Eco1 RT (nt) | 19118
Eco1 ncRNA (nt) | 19119
Eco3 RT (nt) | 19120
Eco3 ncRNA (nt) | 19121
Eco5 RT (nt) | 19122
Eco5 ncRNA (nt) | 19123
EMX1 sgRNA modified (nt) | 19124
EMX1 sgRNA unmodified (nt) | 19125
Select RT and ncRNA sequences
EMX1 S.p. Cas9 gRNA: GAGTCCGAGCAGAAGAAGAA (SEQ ID NO: 19109)
Eco1 ncRNA + HDR template: TGATAAGATTCCGTATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTCGGCATCCTGCATTGAATCTGAGTTACTGTCTGTTTTCCT ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCATACGGAATCTTATCA (SEQ ID NO: 19110)
Aco1 ncRNA +HDR 
templateCCGTAGTGGGAGCCTCAGGCGAGGGTGTGTATCATGCCCGTTCTGCCAAGACCCACCAAAGAAGGGCACCGTGGAGG ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAAACAGAAGTGCAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CCTCCACGGTGCAATGCGAAAGCAACTTGAGGCTTTGCTTAGTATGAGGCTCCCACTACGG (SEQ ID NO: 19111) Reverse Recorder 1262 RTMISFSEIKSRNDFADALQIPRSVLTHVLYIAKPESFYESFTIPKKNGEDRIIMAPKGTLKSIQTKLSKQLVEYRASISQKGQEKSNISHGFEREKSIITNAQIHRNKRYVINYDLKDFFDSFHFGRVVGFFEKNKHFLLPYEVAVIIAQLTCYNGRLPQGAPTSPVITNLICEILDYRVLKIAKRYKLDYTRYADDLTFSTNYSRFLEVFDSFAKE LLQEISNSGFTINQSKTRLLYRDSRQEVTGLVVNKKIGVNREYVKSTRAMAQALYSTGEFTINGIPG TIKQLEGRFGFIDQLDHYNNVIDDAKHDAYSLNGREKQFQEFLFFKTFFFNEYPLVITEGKTDIRYLKAALKSLHQKYPELICKEDDGTFRFKISFFRRSKRWKYFFGISKDGADAMKLLYRFFTGQKGVKNYYRLFAEKYKAVQRNPVIMLFDNEMESKRPLNKFISEEVKIPSSEQQLFKEQLYYHLIPGSKTYLMTHPLPPGK TEAEIEDLFPTEVLGVKLDGKSFSTKDKFDTSKFYGKDIFSSYVYEHWKSIDFSGFIPLLDKINMLVQNEKKPGLNT (SEQ ID NO: 1262)ncRNAGAGCGTGTGACAAGATTTTGGGCTTGTGTTTCGCAAGCTTTGATTAAAATGGTAGATGGATTTGCTATCTATCGTCATTTCCAGTACTGTTATGTTTATGTTCTGTTTACCCGTATGCACCATCCGCATAATGAGTTGTCACACGCTC (SEQ ID NO: 15327) Reverse Recorder 2042 RTMKDDQYSQWKKYYESRGILPEIQDKLLNYAKIHIDNNTPVIFNFEHLTLLLLGREKNYLSSVVNSPDSHYRKFKIKKRSGGEREITAPYLSLLEMQYWIYRNILINVKIHYAAHGFAQDKSIITNSRNHLGQKHLLKMDLKDFFPSIKLNRIIYIFKSLGYPNIIAFYLASICSYKGHLPQGSPTSPILSNIVSITLDNRLV KFARKMKLRYSRYADDLTFSGDKIPTNYIKYITDIINDEGFEVNDTKTKLYLKAGKRIVTGISVIGNDPKLPREYKRKLKQELHYIFTYGIGSHMAKKKIKKINYLYRIIGKVNFWLNIEPDNEYARNAKAKLLLLIDN* (SEQ ID NO: 19112 or SEQ ID NO: 2042)Homologous ncRNAAGGTGGTTATATTCTAGTATTTATGAAGTGTAGTCGCTTCGATCGTTAAGGCTGATTTTAACCTCTGCATAATAATATCGGTAGATATTATTATGCACGCTCCCTTTAGCAGAGCTAAGAATCGCTCACTCAGGCACAAGCTTTGAGGAGCGACCTCCTCAAAGCTTGTGCCTGAGTGAGAGCTAAAGAAAAGAAAAGTAGAATAAGCCACCT (SEQ ID NO: 15833) 204 2 ncRNA: RTX3_2042 ncRNA +HDR_ templateGTTAAGGTGGTTATATTCTAGTATTTATGAAGTGTAGTCGCTTCGATCGTTAAGGCTGATTTTAACCTCTGCATAATAATATCGGTAGATATTATTATGCACGCTCCCTTTAGCAGAGCTAAGAATCGCTCACTCAGGCACAAGCTTTGAGG 
ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAATCCTAAGAGAAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CCTCAAAGCTTGTGCCTGAGTGAGAGCTAAAGAAAAGAAAGTAGAATAAGCCACCTTAAC (SEQ ID NO: 19113) Reverse Recorder 2781 RTMSGRLWKYLTPQQEREFILSLNLLPASCRKRMSGEAQLALLYAISNHVERHYRKVRIPKKDGKSRRLLVPDGLLKGVQRNILRHCLDGRSVSEYACAYRRGLSAADNAAPHAGGGKDKLLLKLDIRDFFDSILFWQVYGAAFPGSLFPPAAAGLLTHLCCCGDRLPQGAPTSPAISNLVMKPFDEFMGAWCRQRQIVYTRYCDDLTFSGAFDPKAV YHKARHLLEAMGFALNAEKTALRNRGMRQSVTGLVVNERVRTGVPYRRRIRQEMYYCRKYGVGEHLQAVGRLTGTETQEEAAEAERAFLCALLGKIGYVLLLEPENREFLEYREICRGMQAEQEKASCPPACGEREQKRRGGKEK (SEQ ID NO: 2781)Homologous ncRNAAAGAGCAACTAGATTGAGGCGATTCGCCTCCTTGGAAAAGGGTACTAAGTTTCTGTCGCACACCAATTTATAAGCTTATAAATTGGTGTGCGACAGAAATGAAATAAATAGTAGTTGCTCTT (SEQ ID NO: 16411) Reverse Recorder 6083 RTMSNPQPTRAEIFERIKQSSKQEVILEEMQRLGFWPRSEGQPEVAADLIQREGELQRELAELNKKLAVKRNPERALREMRKQRMKDARDKREVTKRAQAQQRYDKALLWHEKRASHVAYLGPGVSASLHENSSATQEQGDKGKPKRARDRAVPDLQRLTLNGLPALISAAQLAESMGVSVAELRFLSFHREVARTNHYHSFTLPKKTGGERLISAPMPRLK RAQYWVLDNVLAKMPAHDAAHGFLAGRSIISNAKPHAGQDVVINLDVKDFFPSIAFGRIKGVFRQLGYGESIATVFALLCSENRAQAWQVDGERLFVGGKARERVLPQGAPTSPML TNLLCRRMDRRLLGLAKQLGFVYTRYADDLTFSASGEPARDNVGKLLSRVRWILRDEGFTPHPDKERVMRKGRRQEVTGLVVNSDTPSVSRETRRRLRAALHRASQPDAASKPAHWQGHTAQPSQLLGLATFVHQIDPKQGKTLLADAQQLMRSPIDRANDAAKSASRADAAQQSFRVLAAAGKPPVLADGKNWWQPAPPATPVLEK TDQQRREERQATRRQQAAAAAPPPSSTRRNERPQQAAHEQQGDAQPQNEAPPRFDPDQYAPPPRNVMTYWAQIAISFFLGSILHNRLITIFAMVAVIALYYMRRQRWDVFMGILVVATLLGYLVRGMG* (SEQ ID NO: 19114 or SEQ ID NO: 6083)Homologous ncRNACCGGAGCAATGAGCAGGCTCTTGCAATCCGGGCGGTGTTTCGCCGCCCTTGTGAACTGCCGTTCATGCACCACGGGCCCGTTTTCACTGTGCCACCCCAGCCACGGTAGTGCTTTTCCTGAGTGGCTTGCCAGCGAAGCAGCACTACCGTGGCGGGGTGGCGTCGAGCGAACAGCTCCCGTCCCGTGAGCCCTACAGGCTCTTCGACGAGATGCACATTGCTCCG (SEQ ID NO: 18547) Engineered 6083 ncRNA: RTX3_6083v1 ncRNA +HDR templateGCTCCGGAGCAATGAGCAGGCTCTTGCAATCCGGGCGGTGTTTCGCCGCCCTTGTGAACTGCCGTTTCATGCACCACGGGCCGCCGTTTTCACTGTGCCACCCCAGCCACGGTAGTGCT 
ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACTACAAGGTTAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AGCACTACCGTGGCGGGGTGGCGTCGAGCGAACAGCTCCCGTCCCGTGAGCCCTACAGGCTCTTCGACGAGATGCACATTGCTCCGGAGC (SEQ ID NO: 19115) Reverse Recorder 6342 RTMMENQENNMNNRKNYREAVNTMGKKEFTLIKMQEYGFWPKNLPTPYERQESETEEQYKERKCLLEKYEKVIDEISKLYEEKDKINLKLRELQKKYDETWDYERIRLDVSKTIMQESIERRAERKRQRELEKEQRSEEWKKEKENKIVFIGKGYSSLLYDKETDENKLLLQELPIIKDDKELANFLGIEYKKLRFLVYHRDVISV DNYHRYTIPKKKGGVRNIAAPKSILKNSQRIILEEILSKIPTS NYSHGFLKGKSVVSAAKVHVKKPDLLINIDLEDFFPTITTFERVRGMFKSFGYSGYVASMLAMICTYCERMKVEVRGEEKYVKISDRILPQGSPASPMITNIICVKLDKRLNGLSTKYDFIYTRYADDMSFSFTGDINELSVGSFMGLVSKIVKEEGFNINKDKTKFLRKNNRQCITGIVINNEEIGVPKKWIKILRAAIYNANKVKNSGEIL SNKVINEISGMTSWVKSVNEERYKDIINDAMNLINN (SEQ ID NO: 6342)Homologous ncRNAACATAGATTTCTTGGCCTTTATGCTGTGGTGTTGCCACGGTGGAGATTTGTCAAATACACATCATTAGGTTGCGCGCAATTCGCTACGCTACCAATACTGTGGGCATTAAAAATCGAGCCTTCGGTTATGCCACAGTATTGGTGCTACGCTGAAGTGTCACAACCAAATATAAGAATTGTTAGCAAGAAATCTATG (SEQ ID NO: 18731) RTX_6342S ncRNA CAUAGAUU UCUUGGCCUUUAUGCUGGUGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCG_Place_INSERT_HERE_CACA ACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19927) RTX_6342L ncRNA CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGCGCAAUUCGCUACGCUACCAAUACUGUGGGCAU_placement_insert_here_AUGCCACAGUAUUGGU GCUACGCUGAAGUGUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19928) RTX_6342S ncRNA + HDR template insert CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCG ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAATCGTCAGTGCAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA CACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19929) RTX_6342L ncRNA + HDR template insert CAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGCGCAAUUCGCUACGCUACCAAUACUGUGGGCAU 
ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACGTAATCCGGAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA AUGCCACAGUAUUGGUGCUACGCUGAAGUGUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUG (SEQ ID NO: 19930) Reverse Recorder 6943 RTMEESTNYKLLVWGLSVIQPATPNEVLNYLTSTLNDNGLLPDVEKMIHYFELLDQLGYIHQVSKRNNLYSLTPRGNERLTPALKRLRDKIRLFMLDNCHSISKLGVLASTDTENMGGDSPSLQLRHNLKEVPHPSLSWAAGTLPSSPRQAWVRIYEQLNIGSMSSDEASTPTTARNAPLSFVGRLGFSLNYYSFNKIDEPLFNNDGVTAIASCIGISPGL ITAMVKSPKRYYRTFNLRKKSGGFRSILAPR KFIKTIQYWLKDHVLNRLKIHSSCYSYRSGVSIKDNAINHVKKKFVASIDISDYFGSINKKMVKDCFYKNNIPDHIVNTISGIVTYNDVLPQGAPTSPIISNAILFEFDEEMTAHALTLDCIYTRYSDDISISSDYKENIAILINIAEANLLSAGFTLNRQKQRIASDNSRQVVTGILVNESIRPTRCYRKKIRSAFDHALKEQDGSQLTINKLRG YLNYLKSFETYGFKFNEKKYKETLDFLIALKQS* (SEQ ID NO: 19116 or SEQ ID NO: 6943))ncRNAACTGATACAGAGAATATGGGCGGTGATTCGCCGTCTTTACAGTTAAGGCACAATTTAAAAGAGGTTCCGCACCCAAGCCTGTCTTGGGCTGCAGGGACCCTGCCCAGCAGCCCAAGACAGGCTTGGGTGCGGATCTACGAGCAATTAAATATTGGTTCGATGTCCAGT (SEQ ID NO: 19053) Engineered ncRNA: RTX3_6943 ncRNA +HDR templateGCGGAGTGCTGGCCTCAACTGATACAGAGAATATGGGCGGTGATTCGCCGTCTTTACAGTTAAGGCACAATTTAAAAGAGGTTCCGCACCCAAGCCTGTCTTGGGCTGC ACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCGAGCAGAAGAACGTATGGCCTAAAGTTCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAGGCCAATGGGGAGGACATCGATGTCA GCAGCCCAAGACAGGCTTGGGTGCGGATCTACGAGCAATTAAATATTGGTTCGATGTCCAGTGATGAGGCCAGCACCCGC (SEQ ID NO: 19117)Eco1 RT Coding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and encoded poly A of approximately 120 bases are added. 
Completely replaced with 1N-methyl-pseudouridine) AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCGAAAC AUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUGAACAUCGACUUGGAGGAUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUUGCCC CAGGGAGCACCCA GCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCCAUCCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCCAGGACCACGAUCUCAGCGAAAAGUGACAGG ACUCGU CAUUUCUCAAGAAAAAGUCGGUAAGGGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19118) Eco1 ncRNA ncRNA-sgRNA fusion (6 bp substitution in EMX1 gene) GAAAUGAUAAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCC Eco3 RTCoding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and encoded poly A of approximately 120 bases are added. 
Completely replaced with 1N-methyl-pseudouridine AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAG UCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUUAUAGCCAAUUUCGU CG CCAGGGAGCUGGACGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGGCCUUAAGGUCUG UCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO. : 19120) Eco3 ncRNA ncRNA-sgRNA fusion (10 bp insertion in EMX1 gene) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUC GAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUUUUUU (SEQ ID NO: 19121) Eco5 RTCoding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and encoded poly A of approximately 120 bases are added. 
Completely replaced with 1N-methyl-pseudouridine AUGGACGCUACCAGAACGACUCUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUUGAAGAACGUUUUGAGUCAAUUUCCACUGUCACCAUUUGCU ACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGGUCCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAU GAGA AGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCC AGG CGUGCCCUGCGUCAGGAAGUGCCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGU CUGA (SEQ ID NO: 19122) Eco5 ncRNAncRNA-sgRNA fusion (10 bp insertion in EMX1 gene) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCA ACCGGUGGCGCAUU GCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUUUUUU (SEQ ID NO: 19123) EMX1 sgRNA modifiedContains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 19124) EMX1 sgRNA is unmodified Unmodified GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU (SEQ ID NO: 19125)RNA Sequence - Overview 
Sequence Name | Description | SEQ ID NO
Eco1 RT | Coding sequence only. RNA is generated by a CRO, and a proprietary 5' cap, 5' UTR, 3' UTR and an encoded poly A of approximately 120 bases are added. Uridine is completely replaced with N1-methylpseudouridine | 19126
Eco1 ncRNA-sgRNA | ncRNA-sgRNA fusion (6 bp substitution in the EMX1 gene) | 19127
Eco3 RT | Coding sequence only. RNA is generated by a CRO, and a proprietary 5' cap, 5' UTR, 3' UTR and an encoded poly A of approximately 120 bases are added. Uridine is completely replaced with N1-methylpseudouridine | 19128
Eco3 ncRNA-sgRNA | ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) | 19129
Eco5 RT | Coding sequence only. RNA is generated by a CRO, and a proprietary 5' cap, 5' UTR, 3' UTR and an encoded poly A of approximately 120 bases are added. Uridine is completely replaced with N1-methylpseudouridine | 19130
Eco5 ncRNA-sgRNA | ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) | 19131
Cas9 | All U are N1-methylpseudouridine. | 19132
EMX1 sgRNA modified | Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate | 19133
EMX1 sgRNA unmodified | No modification | 19134
Eco3 ncRNA | Eco3 ncRNA, with only a 10 bp insertion | 19135
Eco3 ncRNA-MS2 | Eco3 ncRNA, with only a 10 bp insertion and a 3' MS2 stem loop | 19136
Aco1 RT | Aco1 RT RNA sequence, including 5' and 3' UTR | 19137
Aco1 ncRNA-sgRNA_50 | Aco1 ncRNA-sgRNA with 50 bp insert | 19138
Aco1 sgRNA-ncRNA_50 | Aco1 sgRNA-ncRNA with 50 bp insert | 19139
Eco3_EMX1 gRNA_ncRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19140
Eco3_ncRNA only | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene | 19141
Eco3_ncRNA_AAVS1 gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19142
Eco3_ncRNA_EMX1 gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19143
Eco3_ncRNA_EMX1 gRNA_50 | Eco3 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the
EMX1 sgRNA at the 3' end | 19144
Eco3_ncRNA_EMX1 gRNA_75 | Eco3 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19145
Eco3_ncRNA_EMX1 gRNA_100 | Eco3 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19146
Eco3_ncRNA_EMX1 gRNA_GFP gene | Eco3 ncRNA containing a GFP gene inserted into the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. | 19147
Eco3_NT_ncRNA_EMX gRNA_25 | Eco3 ncRNA containing a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA), fused to the EMX1 sgRNA at the 3' end | 19148
Aco1_EMX1 gRNA_ncRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19149
Aco1_ncRNA only | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene | 19150
Aco1_ncRNA_AAVS1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19151
Aco1_ncRNA_EMX1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19152
Aco1_ncRNA_EMX1 gRNA_50 | Aco1 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19153
Aco1_ncRNA_EMX1 gRNA_75 | Aco1 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19154
Aco1_ncRNA_EMX1 gRNA_100 | Aco1 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19155
Aco1_ncRNA_EMX1 gRNA_GFP gene | Aco1 ncRNA containing a GFP gene inserted into the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. |
19156
Aco1_NT_ncRNA_EMX1 gRNA_25 | Aco1 ncRNA containing a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA), fused to the EMX1 sgRNA at the 3' end | 19157
R2042_EMX1 gRNA_ncRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 5' end | 19158
R2042_ncRNA_25 only | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene | 19159
R2042_ncRNA_AAVS1 gRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the AAVS1 sgRNA at the 3' end | 19160
R2042_ncRNA_EMX1 gRNA_25 | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19161
R2042_ncRNA_EMX1 gRNA_50 | R2042 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19162
R2042_ncRNA_EMX1 gRNA_75 | R2042 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19163
R2042_ncRNA_EMX1 gRNA_100 | R2042 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19164
R2042_ncRNA_NT_EMX1 gRNA_25_Double guide | R2042 ncRNA containing a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA), fused to the EMX1 sgRNA at the 3' end | 19165
R2042_RT | R2042 RT mRNA sequence, coding sequence only | 19166
R6943 RT | R6943 RT mRNA sequence, coding sequence only | 19167
R6943_ncRNA_EMX1gRNA_GFP gene | R6943 ncRNA containing a GFP gene inserted into the EMX1 gene, fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. |
19168
R6943_ncRNA_EMX1gRNA_100bp | R6943 ncRNA containing a 100 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19169
R6943_ncRNA_EMX1gRNA_75bp | R6943 ncRNA containing a 75 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19170
R6943_ncRNA_EMX1gRNA_50bp | R6943 ncRNA containing a 50 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19171
R6943_ncRNA_EMX1gRNA_25bp | R6943 ncRNA containing a 25 bp insertion in the EMX1 gene, fused to the EMX1 sgRNA at the 3' end | 19172
R6943_ncRNA_AAVS1 gRNA_25bp | R6943 ncRNA containing a 25 bp insertion in the AAVS1 gene, fused to the AAVS1 sgRNA at the 3' end | 19173
R6943_ncRNA only_25bp | R6943 ncRNA containing a 25 bp insertion in the EMX1 gene | 19174
AAVS1 sgRNA | Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate | 19175
R1262_RT-1 (R1262 RT) | R1262 retron RT mRNA | 19176
R1262_nc-5 (R1262_ncRNA-EMX1_505) | R1262 ncRNA containing a 505 bp insertion in the EMX1 locus | 19177
R1262_nc-3 (R1262_ncRNA-EMX1_305) | R1262 ncRNA containing a 305 bp insertion in the EMX1 locus | 19178
R1262_nc-1 (R1262_ncRNA-EMX1_25) | R1262 ncRNA containing a 25 bp insertion in the EMX1 locus | 19179
R1262_nc-4 (R1262_ncRNA-EMX1_405) | R1262 ncRNA containing a 405 bp insertion in the EMX1 locus | 19180
R1262_nc-2 (R1262_ncRNA-EMX1_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 locus | 19181
R1262_nc-19 (R1262_ncRNA-sgEMX1_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 locus, combined with the EMX1 sgRNA at the 3' end | 19182
R1262_nc-20 (R1262_sgEMX1-ncRNA_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 locus, combined with the EMX1 sgRNA at the 5' end | 19183
R1262_nc-13 (R1262_ncRNA-AAVS1_25) | R1262 ncRNA containing a 25 bp insertion in the AAVS1 locus | 19184
R1262_nc-14 (R1262_ncRNA-AAVS1_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 locus | 19185
R1262_nc-15 (R1262_ncRNA-AAVS1_505) |
R1262 ncRNA containing a 505 bp insertion in the AAVS1 locus | 19186
R1262_nc-18 (R1262_ncRNA-AAVS1-MS2_505) | R1262 ncRNA containing a 505 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end | 19187
R1262_nc-17 (R1262_ncRNA-AAVS1-MS2_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end | 19188
R1262_nc-16 (R1262_ncRNA-AAVS1-MS2_25) | R1262 ncRNA containing a 25 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end | 19189
R1262_nc-9 (R1262_ncRNA-EMX1-MS2_305) | R1262 ncRNA containing a 305 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end | 19190
R1262_nc-7 (R1262_ncRNA-EMX1-MS2_25) | R1262 ncRNA containing a 25 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end | 19191
R1262_nc-6 (R1262_ncRNA-EMX1_P2A-GFP) | R1262 ncRNA containing a P2A-GFP insertion in the EMX1 locus | 19192
R1262_nc-22 (R1262_sgAAVS1-ncRNA_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 locus, combined with the sgAAVS1 at the 5' end | 19193
R1262_nc-8 (R1262_ncRNA-EMX1-MS2_205) | R1262 ncRNA containing a 205 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end | 19194
R1262_nc-21 (R1262_ncRNA-sgAAVS1_205) | R1262 ncRNA containing a 205 bp insertion in the AAVS1 locus, combined with the sgAAVS1 at the 3' end |
19195 R1262_nc-10 (R1262_ncRNA-EMX1-MS2_405) R1262 ncRNA contains a 405 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end 19196 R1262_nc-11 (R1262_ncRNA-EMX1-MS2_505) R1262 ncRNA contains a 505 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end 19197 R1262_nc-12 (R1262_ncRNA-EMX1-MS2_P2A-GFP) R1262 ncRNA contains a P2A-GFP insertion in the EMX1 locus and an MS2 stem loop at the 3' end 19198 R1262_nc-23 (R1262_ncRNA-EMX1_P2A-GFP_LongHA) Same as R1262_nc-6, but with longer homology arms 19199 R1262_nc-24 (R1262_ncRNA-EMX1-MS2_P2A-GFP_long HA) Same as R1262_nc-12, but with longer homology arms 19200 6083 v1 RT 6083 Retrotranscript RT mRNA 19201 6083 v1_ncRNA_AAVS1 gRNA_25bp The R6083 retrotransposons contain a 25 bp insertion at the AAVS1 locus 19202 6083 v1_ncRNA_only_25bp The R6083 retrotranscript contains a 25 bp insertion at the EMX1 locus 19203 6083 v1_ncRNA_EMX1 gRNA_25bp The R6083 retrotranscript contains a 25 bp insertion at the EMX1 locus and sgEMX1 at the 3' end 19204 6083 v1_ncRNA_EMX1 gRNA_GFP gene The R6083 retrotranscript contains a GFP gene inserted at the EMX1 locus and sgEMX1 at the 3' end 19205 6083 v1_ncRNA_EMX1 gRNA_50bp The R6083 retrotranscript contains a 50 bp insertion at the EMX1 locus and sgEMX1 at the 3' end 19206 6083 v1_ncRNA_EMX1 gRNA_100bp The R6083 retrotranscript contains a 100 bp insertion at the EMX1 locus and sgEMX1 at the 3' end 19207 6083 v1_ncRNA_EMX1 gRNA_75bp The R6083 retrotranscript contains a 75 bp insertion at the EMX1 locus and sgEMX1 at the 3' end 19208 Aco1_ncRNA-EMX1_100 Aco1 ncRNA contains a 100 bp insertion in the EMX1 gene 19209 Aco1_ncRNA-EMX1_205 Aco1 ncRNA contains a 205 bp insertion in the EMX1 gene 19210 Eco3_ncRNA-EMX1_205 Eco3 ncRNA contains a 205 bp insertion in the EMX1 gene 19211 RNA sequence Sequence Name describe sequence Eco1 RT Coding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and approximately 120 bases of the encoded poly A are added. 
Completely replaced with 1N-methyl-pseudouridine AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCG AAAC AUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUGAACAUCGACUUGGAGGAUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUUGCCC CAGGGAGCACCCA GCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCCAUCCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCCAGGACCACGAUCUCAGCGAAAAGUGACAGG ACUCGU CAUUUCUCAAGAAAAAGUCGGUAAGGGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO: 19126) Eco1 ncRNA-sgRNA ncRNA-sgRNA fusion (6 bp substitution in EMX1 gene) GAAAUGAUAAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCC GUUUCUUCUGACGUAAGGGUGCGCAUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUUUUUU (SEQ ID NO: 19127) Eco3 RT Coding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and approximately 120 bases of the encoded poly A are added. 
Completely replaced with 1N-methyl-pseudouridine AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAG UCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUUAUAGCCAAUUUCGU CG CCAGGGAGCUGGACGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGGCCUUAAGGUCUG UCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA (SEQ ID NO. : 19128) Eco3 ncRNA-sgRNA ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUC GAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUUUUUU (SEQ ID NO: 19129) Eco5 RT Coding sequence only. RNA is generated by CRO and a proprietary 5' cap, 5' UTR, 3' UTR and approximately 120 bases of the encoded poly A are added. 
Completely replaced with 1N-methyl-pseudouridine AUGGACGCUACCAGAACGACUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUGAAGAACGUUUUGAGUCAAUUUCCACU GUCACCAUUUGCU ACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGGUCCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAU GAGA AGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCC AGG CGUGCCCUGCGUCAGGAAGUGCCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGU CUGA (SEQ ID NO: 19130) Eco5 ncRNA-sgRNA ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUU GCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUUUUUU (SEQ ID NO: 19131) Cas9 All U are N-1-methylpseudouridine. 
GGGUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUCGUUGCAGGCCUUAUUCGGAUCCGCCACCAUGGACAAGAAGUACAGCAUCGGACUGGACAUCGGAACAAACAGCGUCGGAUGGGCAGUCAUCACAGACGAAUACAAGGUCCCGAGCAAGAAGUUCAAGGUCCUGGGAAACACAGACAGACACAGCAUCAA GAAGAACCUGAUCGGAGCACUGCUGUUCGACAGCGGAGAAA CAGCAGAAGCAACAAGACUGAAGAGAAC AGCAAGAAGAAGAUACACAAGAAGAAAGAACAGAAUCUGCUACCUGCAGGAAAUCUUCAGCAACGAAAU GGCAAAGGUCGACGACAGCUUCUUCCACAGACUGGAAGAAAGCUUCCUGGUCGAAGAAGACAAGAAGCA CGAAAGACACCCGAUCUUCGGAAACAUCGUCGACGAAGUCGCAUACCACGAAAAGUACCCGACAAUCUAC CACCUGAGAAAGAAGCUGGUCGACAGCACAGACAAGGCAGACCUGAGACUGAUCUACCUGGCACUGGCA CACAUGAUCAAGUUCAGAGGACACUUCCUGAUCGAAGGAGACCUGAACCCGGACAACAGCGACGUCGAC AAGCUGUUCAUCCAGCUGGUCCAGACAAUACAACCAGCUGUUCGAAGAAA ACCCGAUCAACGCAAGCGGAGUCGACGCAAAGGCAAUCCUGAGCGCAAGACUGAGCAAGAGCAGAAGACUGGAAAACCUGAUCGCACAG CUGCCGGGAGAAAAGAAGAACGGACUGUUCGGAAACCUGAUCGCACUGAGCCUGGGACUGACACCGAAC UUCAAGAGCAACUUCGACCUGGCAGAAGACGCAAAGCUGCAGCUGAGCAAGGACACAUACGACGACGACCUGGACAACCUGCUGGCACAGAUCGGAGACCAGUACGCAGACCUGUUCCUGGCAGCAAAGAACCUGAGCGACGCAAUCCUGCUGAGCGACAUCCUGAGAGUCAACACAGAAAUCACAAAGGCACCGCUGAGCGCAAGC AUGAUCAAGAGAUACGACGAACCACCAGGACCUGACACUGCUGAA GGCACUGGUCAGACAGCAGCUG CCGGAAAAGUACAAGGAAAUCUUCUUCGACCAGAGCAAGAACGGAUACGCAGGAUACAUCGACGGAGGA GCAAGCCAGGAAGAAUUCUACAAGUUCAUCAAGCCGAUCCUGGAAAAGAUGGACGGAACAGAAGAACUG CUGGUCAAGCUGAACAGAGAAGACCUGCUGAGAAAGCAGAGAACAUUCGACAACGGAAGCAUCCCGCAC CAGAUCCACCUGGGAGAACUGCACGCAAUCCUGAGAAGACAGGAAGACUUCUACCCGUUCCUGAAGGAC AACAGAGAAAAGAUCGAAAAGAUCCUGACAUUCAGAAUCCCGUACUACGUCGGACCGCUGGCAAGAGGA AACAGCAGAUUCGCAUGGAUGACAAGAAAGAGCGAAGAAACAAUC ACACCGUGGAACUUCGAAGAAGUC GUCGACAAGGGAGCAAGCGCACAGAGCUUCAUCGAAAGAAUGACAAACUUCGACAAGAACCUGCCGAAC GAAAAGGUCCUGCCGAAGCACAGCCUGCUGUACGAAUACUUCACAGUCUACAACGAACUGACAAAGGUC AAGUACGUCACAGAAGGAAUGAGAAAGCCGGCAUUCCUGAGCGGAGAACAGAAGAAGGCAAUCGUCGAC CUGCUGUUCAAGACAAACAGAAAGGUCACAGUCAAGCAGCUGAAGGAAGACUACUUCAAGAAGAUCGAA UGCUUCGACAGCGUCGAAAUCAGCGGAGUCGAAGACAGAUUCAACGCAAGCCUGGGAACAUACCACGACCUGCUGAAGAUCAUCAAGGACAAGGACUUCCUGGACAACGAAG 
AAAACGAAGACAUCCUGGAAGACAUCGUCCUGACACUGACACUGUUCGAAGACAGAGAAAUGAUCGAAGAAAGACUGAAGACAUACGCACACCUG UUCGACGACAAGGUCAUGAAGCAGCUGAAGAGAAGAAGAUACACAGGAUGGGGAAGACUGAGCAGAAAG CUGAUCAACGGAAUCAGACAAGCAGAGCGGAAAGACAAUCCUGGACUUCCUGAAGAGCGACGGAUUC GCAAACAGAAACUUCAUGCAGCUGAUCCACGACGACAGCCUGACAUUCAAGGAAGACAUCCAGAAGGCAGGGAGACAGCCUGCACGAACACAUCGCAAACCUGGCAGGAAGCCCGGCAAUCAAG AAGGGAAUCCUGCAGACAGUCAAGGUCGUCGACGAACUGGUCAA GGUCAUGGGAAGACACAAGCCGGAA AACAUCGUCAUCGAAAUGGCAAGAGAAAACCAGACAACACAGAAGGGACAGAAGAACAGCAGAGAAAGAAUGAAGAGAAUCGAAGAAGGAAUCAAGGAACUGGGAAGCCAGAUCCUGAAGGAACACCCGGUCGAAAC ACACAGCUGCAGAACGAAAAGCUGUACCUGUACUACCUGCAGAACGGAAGAGACAUGUACGUCGACCAG GAACUGGACAUCAACAGACUGAGCGACUACGACGUCGACCACAUCGUCCCGCAGAGCUUCCUGAAGGACG ACAGCAUCGACAACAAGGUCCUGACAAGAAGCGACAAGAACAGAGGAAAGAGCGACAACGUCCCGAGCG AAGAAGUCGUCAAGAAGAUGAAGAACUACUGGAGACAGCUGCUGAACGC AAAGCUGAUCACACAGAGAA AGUUCGACAACCUGACAAAGGCAGAGAGAGGAGGACUGAGCGAACUGGACAAGGCAGGAUUCAUCAAGA GACAGCUGGUCGAAACAAGACAGAUCACAAAGCACGUCGCACAGAUCCUGGACAGCAGAAUGAACACAA AGUACGACGAAAACGACAAGCUGAUCAGAGAAGUCAAGGUCAUCACACUGAAGAGCAAGCUGGUCAGCG ACUUCAGAAAGGACUUCCAGUUCUACAAGGUCAGAGAAAUCAACAACUACCACCACGCACACGACGCAU ACCUGAACGCAGUCGUCGGAACAGCACUGAUCAAGAAGUACCCGAAGCUGGAAAGCGAAUUCGUCUACGGAGACUACAAGGUCUACGACGUCAGAAAGAUGAUCGCAAAGAGCGAACAGG AAAUCGGAAAGGCAACAG CAAAGUACUUCUUCUACAGCAACAUCAUGAACUUCUUCAAGACAGAAAUCACACUGGCAAACGGAGAAA UCAGAAAGAGACCGCUGAUCGAAACAAACGGAGAAACAGGAGAAAUCGUCUGGGACAAGGGAAGAGACU UCGCAACAGUCAGAAAGGUCCUGAGCAUGCCGCAGGUCAUCGUCAAGAAGACAGAAGUCCAGACAG GAGGAUUCAGCAAGGAAAGCAUCCUGCCGAAGAGAAACAGCGACAAGCUGAUCGCAAGAAAGAAGGACU GGGACCCGAAGAAGUACGGAGGAUUCGACAGCCCGACAGUCGCAUACAGCGUCCUGGUCGUCGCAAAGG UCGAAAAGGGAAAGAGCAAGAAGCUGAAGAGCGUCAAGGAACUG CUGGGAAUCACAAUCAUGGAAAGAAGCAGCUUCGAAAAGAACCCGAUCGACUUCCUGGAAGCAAAGGGAUACAAGGAAGUCAAGAAGGACCUGA UCAUCAAGCUGCCGAAGUACAGCCUGUUCGAACUGGAAAACGGAAGAAAGAGAAUGCUGGCAAGCGCAG GAGAACUGCAGAAGGGAAACGAACUGGCACUGCCGAGCAAGUACGUCAACUUCCUGUACCUGGCAAGCC ACUACGAAAAGCUGAAGGGAAGCCCGGAAGACAACGAACAGAAGCAGCUGUUCGUCGAACAGCACAAGC 
ACUACCUGGACGAAAUCAUCGAACAGAUCAGCGAAUUCAGCAAGAGAGUCAUCCUGGCAGACGCAAACC UGGACAAGGUCCUGAGCGCAUACAAGCACAGAGACAAGCCGAUCA GAGAACAGGCAGAAAACAUCA UCCACCUGUUCACACUGACAAACCUGGGAGCACCGGCAGCAUUCAAGUACUUCGACACAAUCGACAG AAAGAGAUACACAAGCACAAAGGAAGUCCUGGACGCAACACUGAUCCACCAGAGCAUCACAGGACUGUA CGAAACAAGAAUCGACCUGAGCCAGCUGGGAGGAGACGGAGGAGGAAGCCCGAAGAAGAAGAGAAAGGU CUAGCUAGCCAUCACAUUUAAAAGCAUCUCAGCCUACCAUGAGAAUAAGAGAAAGAAAAUGAAGAUCAA UAGCUUAUUCAUCUCUUUUUCUUUUUCGUUGGUGUAAAGCCAACACCCUGUCUAAAAAACAUAAAUUUC UUUAAUCAUUUUGCCUCUUUUCUCUG UGCUUCAAUUAAUAAAAAAUGGAAAGAACCUCGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAUCUAG (SEQ ID NO: 19132) EMX1 sgRNA modified Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU (SEQ ID NO: 19133) EMX1 sgRNA unmodified No modification GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU (SEQ ID NO: 19134) Eco3 ncRNA Eco3 ncRNA has only a 10 bp insertion GGGAUAAUUGAUAAGAAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGA GGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAU (SEQ ID NO: 19135) Eco3 ncRNA-MS2 Eco3 ncRNA has only a 10 bp insertion and a 3' MS2 stem loop GGGAUAAUUGAUAAGAAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGA GGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAACAUGAGGAUCACCCAUGUGAU (SEQ ID NO: 19136) Aco1 RT Aco1 RT RNA sequence, including 5' and 3' UTR 
AGGGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGGAGCCCAAUGACUACGUAAAUAGGUUGCGACAUGCUAUGGAAAUAAGUGAAAACCCGCGCUUUAGCCCUGAGUACAUUGCUCAAUGUUGUACUUACGCCGAGAAUCUCCUCAAGCAAGGGCUGCCAGUUCUGUUCGAUCAAACGCAUAUUCGGAAAGUUCUGGGGAUGGCGGCACCUCGAUUGUGUGAUUAUCACAGA UUCACAAUCCCAAAGCACAACGGAUCUAGAAUUAUCACGGCCCCAUCUA GGAAGCUGAAGCUUCGACAACAAUGGAUAUACCAGAAUAUCCUUAUACGAAAGGAGGCUUCACCGUACACGCACGGAUUUGUUCCUGAACGCAGCAUCGUGACUAACGCAAUCCUCCAUAUAGGAUACGCAUACACCUACUGCGUGGAUAUCACGGAUUUCUUUCCUAGCAUCACUAAGAAGCAGGUCUUGCCUAUAUUCCGAAAUAUGGGCUAUAGUGGUUCUGCUGCAAAUACUCUGC GACCUCUGUUGCUAUGACGGGGUCCUCCCCCAGGGGGCGCCUACUAGCCCAUACCUCA GUAACAUGAUUUGUCGCGAUCUUGAUGACGAAUUGGGGGCUAUGGCCGGCGGUUCCGGGGGAUUUUUACACGGUAUGCGGAUGACAUAGCUAUCCACAAACCAACAACAGCCGCAACUUUUGGAUGCCUUGGGACUUAUCCUCGGGAAGCACGGAUUUCUUAUGAAUCUCGAUAAGUGUCGAGUCUAUAAUCCUGGACAGCCCAAAAGAAUUACUGGAUUGACCGUUCACAAUA GAGUAUCAGUUCCGAAAACCUUUAAGCGAAAUUGCGGCAGAAAUACAUUACUGUCAGAAGU UCGGAGUGACUGCACAUUUGGAAAACACGAAGGCUGCACGAUCCAUCCACUAUAGGGAACAUCUGUAUGGAAAGGCAUACUAUGUUAAAAUGGUUGAGCCCUGAGCUCGGGGCGCACUUCCUCGAUGAGUUGUCAAAGGUAGACUGGCCAGAGCCACCAAAGAAGAAAAGAAAGGUCUGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCCAGCCCUCCUCCCCUUCCUGC ACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCCCUGCA (SEQ ID NO: 19137) Aco1 ncRNA-sgRNA_50 Aco1 ncRNA-sgRNA with 50 bp insert GGGGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCC ACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUCACAAUUUAUUUAGUGAUCGUUCCCGGUUUAUACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUCCUGCA (SEQ ID NO: 19138) Aco1 sgRNA-ncRNA_50 Aco1 sgRNA-ncRNA with 50 bp insert 
GGGGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGUGCUUUUUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGA AGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACCCUGCA (SEQ ID NO: 191 39) Eco3_EMX1 gRNA_ncRNA_25 Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUAAUAACAACGAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGA ACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA(SEQ ID NO: 19140) Eco3_ncRNA only Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAAGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGA UGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA(SEQ ID NO: 19141) Eco3_ncRNA_AAVS1 gRNA_25 Eco3 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGACUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCCUCCUUCCUAGUCUCCUGAUAUUGGG UCUAACCCCCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGGGGCCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU(SEQ 
ID NO: 19142) Eco3_ncRNA_EMX1 gRNA_25 Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAG CAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUU(SEQ ID NO: 19143) Eco3_ncRNA_EMX1 gRNA_50 Eco3 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACA UCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU(SEQ ID NO: 1 9144) Eco3_ncRNA_EMX1 gRNA_75 Eco3 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCC UAUAaag uuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU (SEQ ID NO: 19145) Eco3_ncRNA_EMX1 gRNA_100 Eco3 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAU GGGGCCCAUGGUUGAAUG ACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUC GGUGCUUUUU(SEQ ID NO: 19146) Eco3_ncRNA_EMX1 gRNA_GFP gene Eco3 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. (SEQ ID NO: 19147) Eco3_NT_ncRNA_EMX gRNA_25 The Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcggacuca ggcccuuccuccuccagcuucugccguuuguUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU(SEQ ID NO: 19148) Aco1_EMX1 gRNA_ncRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end gaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACguauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgcgcgugcguacaaacggcagaagcuggagg aggaagggccugaguccgagcagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauac (SEQ ID 
NO: 19149) Aco1_ncRNA only Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcggcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacg cacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac(SEQ ID NO: 19150) Aco1_ncRNA_AAVS1 gRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgcggcguUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAacgcacgcacggcaa acagacagauccauuauuauuacaauuuauuagugaucguucccgguuuuauacAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19151) Aco1_ncRNA_EMX1 gRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaac gcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu (SEQ ID NO: 19152) Aco1_ncRNA_EMX1 gRNA_50 Aco1 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaac 
gcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19153) Aco1_ncRNA_EMX1 gRNA_75 The Aco1 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaacc gguggc gcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauauuacaauuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgaguc ggugcuuuuu(SEQ ID NO: 19154) Aco1_ncRNA_EMX1 gRNA_100 Aco1 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcggcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUA aaguucuccacaca ucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaag uggcaccgagucggugcuuuuu(SEQ ID NO: 19155) Aco1_ncRNA_EMX1 gRNA_GFP gene Aco1 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. 
(SEQ ID NO: 19156) Aco1_NT_ncRNA_EMX1 gRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguac gcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19157) R2042_EMX1 gRNA_ncRNA_25 The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end gaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACGCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCA CGCUCCCUUUAGCAGAGCUA AGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC(SEQ ID NO: 19158) R2042_ncRNA_25 only R2042 ncRNA contains a 25 bp insertion in the EMX1 gene GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAAAGACUCCUAUAAAGUUC UCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC(SEQ ID NO: 19159) R2042_ncRNA_AAVS1 gRNA_25 The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the AAVS1 sgRNA at the 3' end 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAUGACAGAAAAGCCC CAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu (SEQ ID NO: 19160) R2042_ncRNA_EMX1 gRNA_25 The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGU UCUCCCAUCA CAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuu uu(SEQ ID NO: 19161) R2042_ncRNA_EMX1 gRNA_50 The R2042 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGG CCCAUGGUUGAAUGACUCCUAUAa aguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagu cggugcuuuuu(SEQ ID NO: 19162) R2042_ncRNA_EMX1 gRNA_75 The R2042 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end (SEQ ID NO: 19163) R2042_ncRNA_EMX1 gRNA_100 The R2042 ncRNA contains a 100 bp 
insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end (SEQ ID NO: 19164) R2042_ncRNA_NT_EMX1 gRNA_25_Double guide The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) that is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccuauaguuaguguacugcaacuuUAUAGG AGUCAUUCAGAGC UCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuuguCCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgaguc ggugcuuuuu(SEQ ID NO: 19165) R2042_RT R2042 RT mRNA sequence, coding sequence only (SEQ ID NO: 19166) R6943 RT R6943 RT mRNA sequence, coding sequence only (SEQ ID NO: 19167) R6943_ncRNA_EMX1gRNA_GFP gene The R6943 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. 
(SEQ ID NO: 19168) R6943_ncRNA_EMX1gRNA_100bp The R6943 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end (SEQ ID NO: 19169) R6943_ncRNA_EMX1gRNA_75bp The R6943 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaaaaagu ucucccaucacaucaa ccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcacc gagucggugcuuuuu(SEQ ID NO: 19170) R6943_ncRNA_EMX1gRNA_50bp The R6943 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaaccggugg cgcau ugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcu uuuu(SEQ ID NO: 19171) R6943_ncRNA_EMX1gRNA_25bp The R6943 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggagggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucuccccaucacaucaaccgguggcgcauugccacgaagcag 
gccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19172) R6943_ncRNA_AAVS1 gRNA_25bp The R6943 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCU GAUAUUGGGUCUAACCCCCAGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu(SEQ ID NO: 19173) R6943_ncRNA only_25bp R6943 ncRNA contains a 25 bp insertion in the EMX1 gene GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucuccccaucacaucaaccgguggcgcauugccacgaagcaggccaaugg ggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGC(SEQ ID NO: 19174) AAVS1 sgRNA Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate GGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu (SEQ ID NO: 19175)
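As a reading aid only, and not as part of the disclosed methods, the following Python sketch splits the AAVS1 sgRNA listed above (SEQ ID NO: 19175) into a spacer and a scaffold. It assumes a 20-nt spacer, which matches the upper/lower-case boundary in that entry but is not stated in the table. The function names `split_sgrna` and `rna_to_dna` are hypothetical, and the listed chemical modifications are not represented.

```python
from typing import Tuple

# Assumption: a 20-nt spacer (typical for SpCas9 guides); the table does not state this.
SPACER_LEN = 20

# AAVS1 sgRNA as printed in the table (SEQ ID NO: 19175), modifications not encoded.
AAVS1_SGRNA = (
    "GGGGCCACUAGGGACAGGAU"
    "guuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuug"
    "aaaaaguggcaccgagucggugcuuuuu"
)


def split_sgrna(sgrna: str, spacer_len: int = SPACER_LEN) -> Tuple[str, str]:
    """Return (spacer, scaffold) in upper-case RNA letters.

    Whitespace is dropped first, because the flattened table sometimes
    breaks a sequence with stray spaces.
    """
    seq = "".join(sgrna.split()).upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError(f"non-RNA characters found: {sorted(unexpected)}")
    if len(seq) <= spacer_len:
        raise ValueError("sequence is too short to contain a spacer and a scaffold")
    return seq[:spacer_len], seq[spacer_len:]


def rna_to_dna(rna: str) -> str:
    """Convert an RNA string to its DNA spelling (U -> T)."""
    return rna.upper().replace("U", "T")


spacer, scaffold = split_sgrna(AAVS1_SGRNA)
print("spacer (RNA):", spacer)
print("spacer (DNA):", rna_to_dna(spacer))
print("scaffold starts with:", scaffold[:10])
```

The same spacer can be searched for in fusion constructs such as R6943_ncRNA_AAVS1 gRNA_25bp (SEQ ID NO: 19173), where it appears directly before the same scaffold.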
Not a valid link.Primer sequence describe sequence AAVS1 NGS_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGCTG GGACCACCTTATATTCC(SEQ ID NO: 19212) AAVS1 NGS_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTTCT GGCAAGGAGAGAGATG (SEQ ID NO: 19213) EMX1_NGS_F3 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGaggtgaag gtgtggttccag(SEQ ID NO: 19214) EMX1_NGS_R3 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGtagtcatt ggaggtgacatcg(SEQ ID NO: 19215) EMX1_NGS_F1 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGaggtgaag gtgtggttccag(SEQ ID NO: 19216) EMX1_NGS_R1 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGgccaga gtccagcttggg(SEQ ID NO: 19217) qPCR primers qPCR Primers - Non-NGS EMX1 forward primer aggtgaaggtgtggttccag(SEQ ID NO: 19218) EMX1 reverse primer tagtcattggaggtgacatcg(SEQ ID NO: 19219) 5' junction GFP reverse primer ccccttgagcatctgacttc(SEQ ID NO: 19220) 3' junction GFP forward primer catcactttcccagtttaccc(SEQ ID NO: 19221) RT variant (coding sequence) RT variant ORF sequence SEQ ID NO: 6342S_N_465_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAAGGTGATCAAGGAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19222 6342S_N_465_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAAGGTGATCCAGGAGA 
TCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19223 6342S_N_465_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAAGGTGATCCGGGAGA TCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19224 6342S_E_466_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAAGATC AGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19225 6342S_E_466_N ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACATCA GCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19226 6342S_E_466_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACCAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19227 6342S_E_466_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACCGGAT 
CAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19228 6342S_S_468_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAAGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19229 6342S_S_468_N ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAACGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19230 6342S_S_468_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CCAGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19231 6342S_S_468_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CCGGGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19232 6342S_M_470_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT 
CAGCGGCAAGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19233 6342S_M_470_N ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCAACACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19234 6342S_M_470_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCCAGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19235 6342S_M_470_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCCGGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19236 6342S_S_472_G ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCGGCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19237 6342S_S_472_P ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT 
CAGCGGCATGACCCCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19238 6342S_W_473_F ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTTCGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19239 6342S_W_473_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCAAGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19240 6342S_W_473_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCCGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19241 6342S_W_473_Y ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTACGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19242 6342S_K_475_N ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT 
CAGCGGCATGACCTCCTGGGTGAACTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19243 6342S_K_475_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGCAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19244 6342S_K_475_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGCGGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19245 6342S_S_476_K ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGAAGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19246 6342S_S_476_N ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGAACGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19247 6342S_S_476_Q ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT 
CAGCGGCATGACCTCCTGGGTGAAGCAGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19248 6342S_S_476_R ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGCGGGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19249 6342S_V_477_F ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19250 6342S_V_477_I ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCATCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19251 6342S_V_477_L ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC 
AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCCTGAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19252 6342S_V_477_W ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT 
CAGCGGCATGACCTCCTGGGTGAAGTCCTGGAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19253 6342S_V_265_P ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGCCTCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGAT CAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACccaccaaagaagaaaagaaaggtctga SEQ ID NO: 19254 6342S_M470K_V477W ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA 
AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCC AGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAATAAGGTTAAGAATAGTGGAGAAATTCTAAGTAATAAAGTCATAAACGAG ATCAGCGGCAAGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACCCACCAAAGAAGAAAAGAAAGGTCTGA SEQ ID NO: 19255 6342S_M470K_V477W_V265P ATGATGGAGAATCAGGAGAACAACATGAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGA AGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAG CGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATACACCAT 
CCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGC TTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGCCCCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAG GGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGC TTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAATAAGGTTAAGAATAGTGGAGAAATTCTAAGTAATAAAGTCATAAACGAG ATCAGCGGCAAGACCTCCTGGGTGAAGTCCTTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAACCCACCAAAGAAGAAAAGAAAGGTCTGA SEQ ID NO: 19256 6342S_ntermtrunc_whelix ATGGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAACTGAGATTCCTGGTGTACCACCGGGACGTG ATCTCCGTGGACAACTACCACAGATACACCATCCCC AAGAAGAAGGGCGGCGTGAGAAACATCGCCGCTCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCTTCGGCT ACAGCGGATACGGTGGCCAGCATGCTGGCCATGATC TGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAG CAAGATCGTGAAGGAAGAGGGCTTCAACATCAAC 
AAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAATAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAAGAGAGATACAAGGACaTTATCAACGATGCCATGAACCTGATCAACAA CccaccaaagaagaaaCGGaaggtctga SEQ ID NO: 19257 Sso7d-PASTE (v2)_WT 6342S ATGACCGTCAAGTTCAAGTACAAGGGTGAGGAACTTGAAGTTGATATTAGCAAAATCAAGAAGGTTTGGCGCGTTGGTAAAATGATATCTTTTACTTATGACGACAACGGCAAGACAGGTAGAGGGGCAGTGTCTGAGAAAGACGCCCCCAAGGAGCTGTTGCAAATGTTGGAAAAGTCTGGGAAAAAGTCTGGCGGCTCAAAAAGAACCGCCGACGGCAGCGAATTCGAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGC GGTGGTGTCGAGAATCAGGAGAACAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTAC AAGGAGAGAAAGTGCCTGCTCGAGAAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCCGCTCCGAGGAGTGGAAGAAGGA GAAGGAGAACAAGATCGTCTTCATCGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCC GTGGACAACTACCACAGATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGTCCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGT GAGAGGCATGTTCAAGAGCTTCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATC TGCGTGAAGCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAA 
GAAGTGGATCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19258 Sso7d-ncbiprotein (v1)_WT 6342S ATGACAGTGAAGTTCAAGTACAAGGGCGAGGAAAAGGAAGTCGACATCAGCAAGATCAAGAAAGTGTGGCGGGTGGGCAAGATGATCAGCTTCACCTACGACGAGGGCGGAGGAAAGACCGGCAGAGGCGCCGTGTCTGAGAAAGATGCCTCTAAGGAGCTGCTGCAGATGCTGGAAAAACAGAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACATGAACAACA GAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAGAAGTACG AGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCATCGGCAA GGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACAGATAC ACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGTCCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTTCAAGAGCT TCGGCTACAGCGGATACGTGGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAAGC TGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGAT CAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19259 Sac7d-ncbiprotein (v2)_WT 6342S 
ATGGTGAAGGTGAAATTCAAGTACAAGGGCGAGGAAAAAGAGGTGGATACAAGCAAGATCAAGAAAGTGTGGAGAGTGGGAAAGATGGTCAGCTTCACCTACGACGACAACGGCAAGACCGGCAGAGGCGCCGTGTCTGAGAAGGACGCCCCTAAGGAACTGCTGGATATGCTGGCTAGAGCCGAGCGGGAAAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAACAACAA CATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAG AAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCAT CGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACA GATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGTCCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTT CAAGAGCTTCGGCTACAGCGGATACGTGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAA GCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGA TCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19260 Sac7d-ncbigene (v1)_WT 6342S 
ATGGTCAAGGTGAAGTTCAAGTACAAGGGCGAGGAAAAAGAGGTGGATACAAGCAAGATCAAGAAAGTGTGGAGAGTGGGAAAAATGGTGAGCTTCACCTACGACGACAACGGCAAGACCGGCAGAGGCGCCGTGTCTGAGAAGGACGCCCCTAAGGAACTGCTGGATATGCTGGCTAGAGCCGAGCGGGAAAAGAAGCCCAAGAAGAAGAGGAAAGTCGGGACAGGTGGCGGTGGTGTCGAGAATCAGGAGAA CAACATGAACAACAGAAAGAACTACAGAGAGGCCGTGAACACCATGGGCAAGAAGGAATTCACCCTGATCAAGATGCAGGAGTACGGCTTCTGGCCCAAGAACCTGCCTACTCCCTACGAGAGGCAGGAAAGCGAGACCGAGGAGCAGTACAAGGAGAGAAAGTGCCTGCTCGAG AAGTACGAGAAGGTGATCGACGAGATCAGCAAGCTGTACGAAGAGAAGGACAAGATCAACCTGAAGCTGAGGGAGCTCCAGAAGAAGTACGACGAGACCTGGGACTATGAGCGCATCCGGCTGGATGTGAGCAAGACCATCATGCAGGAGAGCATCGAGAGGAGAGCTGAAAGAAAGCGCCAGCGGGAGCTGGAGAAGGAGCAGCGCTCCGAGGAGTGGAAGAAGGAGAAGGAGAACAAGATCGTCTTCAT CGGCAAGGGGTACAGCAGCCTGCTGTACGACAAGGAGACCGACGAGAACAAGCTGCTGCTGCAGGAGCTCCCTATCATCAAGGATGATAAGGAGCTGGCCAACTTCCTGGGCATCGAGTACAAGAAGCTGAGATTCCTGGTGTACCACCGGGACGTGATCTCCGTGGACAACTACCACA GATACACCATCCCCAAGAAGAAGGGCGGCGTGAGAAACATCGCCGTCCCCAAGAGCATCCTGAAGAACTCCCAGAGGATTATCCTCGAGGAGATTCTGAGCAAGATCCCCACCTCCAACTACAGCCACGGCTTCCTGAAGGGGAAGTCCGTGGTGAGCGCCGCCAAGGTGCACGTGAAGAAACCCGACCTCCTGATCAACATCGATCTGGAGGACTTCTTCCCCACCATCACCTTCGAGAGGGTGAGAGGCATGTT CAAGAGCTTCGGCTACAGCGGATACGTGCCAGCATGCTGGCCATGATCTGCACCTACTGCGAGAGGATGAAGGTGGAGGTGAGGGGCGAGGAAGTACGTCAAGATCAGCGACAGAATTCTGCCCCAGGGCAGCCCTGCCAGCCCCATGATCACCAACATCATCTGCGTGAA GCTGGACAAGAGACTGAACGGCCTGAGCACCAAGTATGACTTCATTTATACTCGGTACGCTGACGACATGTCGTTCAGCTTCACCGGCGACATCAACGAACTGAGCGTCGGCAGCTTCATGGGCCTGGTGAGCAAGATCGTGAAGGAAGAGGGCTTCAACATCAACAAGGACAAGACCAAGTTCCTGAGAAAGAACAACAGACAGTGCATCACCGGCATCGTGATCAACAACGAGGAGATCGGCGTGCCCAAGAAGTGGA TCAAGATCCTGAGAGCCGCCATCTACAACGCCAACAAGGTGAAGAACAGCGGCGAGATCCTGAGCAACAAGGTGATCAACGAGATCAGCGGCATGACCTCCTGGGTGAAGTCCGTCAACGAGGAGAGATACAAGGACATCATCAACGATGCCATGAACCTGATCAACAAC SEQ ID NO: 19261
Plasmid sequence
Name Description Sequence SEQ ID NO:
Eco1 RT Coding sequence only. RNA is generated by a CRO; a proprietary 5' cap, 5' UTR, 3' UTR, and approximately 120 bases of encoded poly(A) are added.
Uridine is completely replaced with N1-methylpseudouridine. AUGAAAUCUGCAGAGUAUCUGAAUACGUUCCGCCUUAGGAAUUUGGGCCUCCCCGUGAUGAACAAUCUCCACGAUAUGAGCAAGGCGACUCGAAUAUCCGUGGAAACGCUGAGACUGCUCAUCUAUACAGCAGACUUUCGGUACAGGAUCUACACGGUCGAAAAGAAGGGGCCUGAGAAACGCAUGCGAACAAUUUAUCAACCUAGCCGAGAGCUCAAGGCGUUGCAGGGCUGGGUUCUUCG AAAC AUCCUUGACAAACUCUCAUCAUCACCCUUUAGUAUUGGGUUUGAAAAGCACCAAAGCAUCCUUAACAACGCGACGCCACACAUAGGUGCCAAUUUCAUAUGAACAUCGACUUGGAGGAUUUUUUCCGAGCCUCACAGCCAAUAAAGUGUUCGGUGUUUUUCACAGUCUUGGGUACAAUCGCCUUAUUAGUUCCGUUCUUACCAAGAUUUGUUGUUACAAGAAUCUUGCCC CAGGGAGCACCCA GCAGUCCGAAAUUGGCGAAUUUGAUUUGUUCCAAGCUCGAUUAUCGAAUACAAGGGUACGCGGGCAGCCGGGGACUCAUCUACCCGCUACGCAGACGAUCUUACGCUGUCUGCCCAAUCAAUGAAGAAGGUCGUAAAGGCGCGGGAUUUCUUGUUUUCUAUCCAUCCCGAGGGCUUGGUAAUUAAUUCCAAAAAGACUUGUAUCCAGGACCACGAUCUCAGCGAAAAGUGACAGG ACUCGU CAUUUCUCAAGAAAAAGUCGGUAAGGGAGAGAAGUAUAAGGAAAUCCGCGCGAAGAUCCACCACAUAUUCUGUGGCAAGAGCAGCGAGAUAGAACACGUCCGAGGCUGGUUGUCCUUCAUACUGAGCGUGGACUCAAAAAGCCACCGCCGGUUGAUCACCUAUAUUUCAAAACUGGAAAAGAAAUAUGGAAAGAACCCACUCAACAAAGCUAAAACACCACCAAAGAAGAAAAGGUCUGA SEQ ID NO: 19262 Eco1 ncRNA-sgRNA ncRNA-sgRNA fusion (6 bp substitution in the EMX1 gene) GAAAUGAUAAAGAUUCCGUAUGCGCACCCUUAGCGAGAGGUUUAUCAUUAAGGUCAACCUCUGGAUGUUGUUUCGGCAUCCUGCAUUGAAUCUGAGUUACUGUCUGUUUUCCUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAAGGAAACCC GUUUCUUCUGACGUAAGGGUGCGCAUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUUUUUUU SEQ ID NO: 19263 Eco3 RT Coding sequence only. RNA is generated by a CRO; a proprietary 5' cap, 5' UTR, 3' UTR, and approximately 120 bases of encoded poly(A) are added.
Uridines fully replaced with N1-methylpseudouridine AUGCGCAUUUACUCUCUGAUCGACAGCCAAACCUUAAUGACCAAAGGGUUCGCAUCCGAGGUCAUGAGGAGCCCAGAACCCCCUAAGAAGUGGGACAUUGCGAAGAAGAAGGGCGGAAUGCGUACGAUACCAUCCCUCUUCUAAGGUGAAGCUGAUACAGUACUGGCUGAUGAACAACGUGUUCUCCAAAUUGCCGAUGCACAACGCCGCGUACGCUUUCGUGAAGAAUAGAUCUAUCAAG UCUAACGCACUGCUGCACGCAGAGAGUAAGAACAAAUACUACGUUAAGAUUGACCUGAAGGACUUCUUUCCAAGCAUCAAGUUCACAGACUUCGAAUAUGCCUUUACCCGGUACCGUGACAGAAUAGAGUUCACGACCGAGUACGACAAAGAACUGCUUCAGCUGAUUAAGACCAUUUGUUUCAUUUCUGACUCUACACUGCCAAUAGGCUUCCCCACUUCCCCUUAUAGCCAAUUUCGU CG CCAGGGAGCUGGACGAAGCUCACUCAGAAGCUGAACGCUAUAGACAAGCUCAACGCUACGUACACUCGCUACGCAGACGACAUAAUCGUGAGCACGAACAUGAAGGGCGCCUCUAAGCUGAUCUUAGACUGCUUCAAGCGGACCAUGAAGGAAAUCGGACCCGAUUUCAAGAUCAAUAUCAAGAAGUUCAAAAUAUGCUCUGCCAGUGGCGGCUCAAUUGUCGUGACGGGGCCUUAAGGUCUG UCAUGACUUCCACAUAACUCUGCACCGGUCUAUGAAGGACAAGAUCCGCCUGCACCUCUCUCUCCUGUCCAAAGGUAUUCUGAAGGACGAGGACCACAACAAGCUGUCCGGGUACAUCGCCUACGCUAAGGACAUCGAUCCACACUUCUACACCAAGCUCAAUAGGAAGUACUUCCAGGAGAUCAAGUGGAUACAAAACCUGCAUAAUAAGGUGGAGCCACCAAAGAAGAAAAGAAAGGUCUGA SEQ ID NO: 19264 Eco3 ncRNA-sgRNA ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUC GAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUUUUUUU SEQ ID NO: 19265 Eco5 RT Coding sequence only. RNA was produced by a CRO, which added a proprietary 5' cap, 5' UTR, and 3' UTR and an encoded poly(A) tail of approximately 120 bases.
Uridines fully replaced with N1-methylpseudouridine AUGGACGCUACCAGAACGACUCCUUGCAUUGGAUCUCUUCGGAUCUCCAGGUUGGUCCGCCGAUAAAGAAAUUCAGAGGCUUCAUGCGCUCAGUAAUCAUGCUGGAAGGCAUUACAGAAGGAUUAUAUUAAGUAAAAGGCACGGCGGACAGCGUCUUGUGCUUGCACCUGAUUACUUGUUAAAGACCGUUCAGCGCAACAUUUUGAAGAACGUUUUGAGUCAAUUUCCACU GUCACCAUUUGCU ACAGCCUACAGACCGGGAUGCCCAAUCGUGUCUAACGCGCAGCCACACUGCCAACAGCCACAGAUCUUGAAACUCGAUAGAAAACUUCUUCGAUUCUAUUAGUUGGUUGCAGGUGUGGCGGGUGUUUCGCCAGGCCCAGUUGCCCCGAAAUGUCGUAAACGAUGCUCACUUGGAUAUGUUGUUAUAACGACGCACUUCCGCAGGGGUCCCCCUACAUCCCCUGCAAUUUCCAAUCUCGUCAU GAGA AGGUUUGAUGAACGGAUUGGAGAAUGGUGUCAGGCUCGAGGGAUUACCUACACUCGCUACUGCGAUGACAUGACGUUUAGUGGACACUUCAAUGCAAGCAGGUCAAGAAUAAAGUCUGCGGUCUCUUAGCUGAGCUGGGCCUUUCCCUGAAUAAACGGAAAGGCUGCCUCAUAGCGGCUUGUAAGCGCCAGCAAGUCACCGGCAUUGUUGUGAAUCACAAGCCACAGCUUGCCCGAGAAGCC AGG CGUGCCCUGCGUCAGGAAGUGCCACCUGUGCCAGAAAUAUGGAGUUAUCUCUCUCUCACAUAGAGGUGAACUGGAUCCUAGCGGAGAUCUGCACGCUCAGGCGACAGCGUAUCUCUAUGCACUCCAGGGGAGAAUUAACUGGCUUCUUCAAAUUAACCCUGAGGAUGAGGCGUUUCAACAGGCCCGGGAGUCCGUUAAGAGGAUGUUAGUUGCCUGGCCACCAAAGAAGAAAAGAAAGGU CUGA SEQ ID NO: 19266 Eco5 ncRNA-sgRNA ncRNA-sgRNA fusion (10 bp insertion in the EMX1 gene) GAAAUGAUAAGAUUCCGUACGCCAGCAGUGGCAAUAGCGUUUCCGGCCUUUUGUGCCGGGAGGGUCGGCGAGUCGCUGACUUAACGCCAGUAGUAUGUCCAUAUACCCAAAGUCGCUUCAUUGUAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUCCUUCGAGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUU GCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUACAGUUACGCGCCUUCGGGAUGGUUUAAUGGUAUUGCCGCUGUUGGCGUACGGAAUCUUUACAGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUUUUUUU SEQ ID NO: 19267 Cas9 Sequence obtained from Intellia patents. All Us are N1-methylpseudouridine.
GGGUCCCGCAGUCGGCGUCCAGCGGCUGCUUGUUCGUGUGUGUCGUUGCAGGCCUUAUUCGGAUC CGCCACCAUGGACAAGAAGUACAGCAUCGGACUGGACAUCGGAACAAACAGCGUCGGAUGGGCAGUCAU CACAGACGAAUACAAGGUCCCGAGCAAGAAGUUCAAGGUCCUGGGAAACACAGACAGACACAGCAUCAA GAAGAACCUGAUCGGAGCACUGCUGUUCGACAGCGG AGAAACAGCAGAAGCAACAAGACUGAAGAAC AGCAAGAAGAAGAUACACAAGAAGAAAGAACAGAAUCUGCUACCUGCAGGAAAUCUUCAGCAACGAAAU GGCAAAGGUCGACGACAGCUUCUUCCACAGACUGGAAGAAAGCUUCCUGGUCGAAGAAGACAAGAAGCA CGAAAGACACCCGAUCUUCGGAAACAUCGUCGACGAAGUCGCAUACCACGAAAAGUACCCGACAAUCUAC CACCUGAGAAAGAAGCUGGUCGACAGCACAGACAAGGCAGACCUGAGACUGAUCUACCUGGCACUGGCA CACAUGAUCAAGUUCAGAGGACACUUCCUGAUCGAAGGAGACCUGAACCCGGACAACAGCGACGUCGAC AAGCUGUUCAUCCAGCUGGUCCAGACAAUACAACCAGCUGUUCGAAGAAA ACCCGAUCAACGCAAGCGGAGUCGACGCAAAGGCAAUCCUGAGCGCAAGACUGAGCAAGAGCAGAAGACUGGAAAACCUGAUCGCACAG CUGCCGGGAGAAAAGAAGAACGGACUGUUCGGAAACCUGAUCGCACUGAGCCUGGGACUGACACCGAAC UUCAAGAGCAACUUCGACCUGGCAGAAGACGCAAAGCUGCAGCUGAGCAAGGACACAUACGACGACGACCUGGACAACCUGCUGGCACAGAUCGGAGACCAGUACGCAGACCUGUUCCUGGCAGCAAAGAACCUGAGCGACGCAAUCCUGCUGAGCGACAUCCUGAGAGUCAACACAGAAAUCACAAAGGCACCGCUGAGCGCAAGC AUGAUCAAGAGAUACGACGAACCACCAGGACCUGACACUGCUGAA GGCACUGGUCAGACAGCAGCUG CCGGAAAAGUACAAGGAAAUCUUCUUCGACCAGAGCAAGAACGGAUACGCAGGAUACAUCGACGGAGGA GCAAGCCAGGAAGAAUUCUACAAGUUCAUCAAGCCGAUCCUGGAAAAGAUGGACGGAACAGAAGAACUG CUGGUCAAGCUGAACAGAGAAGACCUGCUGAGAAAGCAGAGAACAUUCGACAACGGAAGCAUCCCGCAC CAGAUCCACCUGGGAGAACUGCACGCAAUCCUGAGAAGACAGGAAGACUUCUACCCGUUCCUGAAGGAC AACAGAGAAAAGAUCGAAAAGAUCCUGACAUUCAGAAUCCCGUACUACGUCGGACCGCUGGCAAGAGGA AACAGCAGAUUCGCAUGGAUGACAAGAAAGAGCGAAGAAACAAUC ACACCGUGGAACUUCGAAGAAGUC GUCGACAAGGGAGCAAGCGCACAGAGCUUCAUCGAAAGAAUGACAAACUUCGACAAGAACCUGCCGAAC GAAAAGGUCCUGCCGAAGCACAGCCUGCUGUACGAAUACUUCACAGUCUACAACGAACUGACAAAGGUC AAGUACGUCACAGAAGGAAUGAGAAAGCCGGCAUUCCUGAGCGGAGAACAGAAGAAGGCAAUCGUCGAC CUGCUGUUCAAGACAAACAGAAAGGUCACAGUCAAGCAGCUGAAGGAAGACUACUUCAAGAAGAUCGAA UGCUUCGACAGCGUCGAAAUCAGCGGAGUCGAAGACAGAUUCAACGCAAGCCUGGGAACAUACCACGACCUGCUGAAGAUCAUCAAGGACAAGGACUUCCUGGACAACGAAG 
AAAACGAAGACAUCCUGGAAGACAUCGUCCUGACACUGACACUGUUCGAAGACAGAGAAAUGAUCGAAGAAAGACUGAAGACAUACGCACACCUG UUCGACGACAAGGUCAUGAAGCAGCUGAAGAGAAGAAGAUACACAGGAUGGGGAAGACUGAGCAGAAAG CUGAUCAACGGAAUCAGACAAGCAGAGCGGAAAGACAAUCCUGGACUUCCUGAAGAGCGACGGAUUC GCAAACAGAAACUUCAUGCAGCUGAUCCACGACGACAGCCUGACAUUCAAGGAAGACAUCCAGAAGGCAGGGAGACAGCCUGCACGAACACAUCGCAAACCUGGCAGGAAGCCCGGCAAUCAAG AAGGGAAUCCUGCAGACAGUCAAGGUCGUCGACGAACUGGUCAA GGUCAUGGGAAGACACAAGCCGGAA AACAUCGUCAUCGAAAUGGCAAGAGAAAACCAGACAACACAGAAGGGACAGAAGAACAGCAGAGAAAGAAUGAAGAGAAUCGAAGAAGGAAUCAAGGAACUGGGAAGCCAGAUCCUGAAGGAACACCCGGUCGAAAC ACACAGCUGCAGAACGAAAAGCUGUACCUGUACUACCUGCAGAACGGAAGAGACAUGUACGUCGACCAG GAACUGGACAUCAACAGACUGAGCGACUACGACGUCGACCACAUCGUCCCGCAGAGCUUCCUGAAGGACG ACAGCAUCGACAACAAGGUCCUGACAAGAAGCGACAAGAACAGAGGAAAGAGCGACAACGUCCCGAGCG AAGAAGUCGUCAAGAAGAUGAAGAACUACUGGAGACAGCUGCUGAACGC AAAGCUGAUCACACAGAGAA AGUUCGACAACCUGACAAAGGCAGAGAGAGGAGGACUGAGCGAACUGGACAAGGCAGGAUUCAUCAAGA GACAGCUGGUCGAAACAAGACAGAUCACAAAGCACGUCGCACAGAUCCUGGACAGCAGAAUGAACACAA AGUACGACGAAAACGACAAGCUGAUCAGAGAAGUCAAGGUCAUCACACUGAAGAGCAAGCUGGUCAGCG ACUUCAGAAAGGACUUCCAGUUCUACAAGGUCAGAGAAAUCAACAACUACCACCACGCACACGACGCAU ACCUGAACGCAGUCGUCGGAACAGCACUGAUCAAGAAGUACCCGAAGCUGGAAAGCGAAUUCGUCUACGGAGACUACAAGGUCUACGACGUCAGAAAGAUGAUCGCAAAGAGCGAACAGG AAAUCGGAAAGGCAACAG CAAAGUACUUCUUCUACAGCAACAUCAUGAACUUCUUCAAGACAGAAAUCACACUGGCAAACGGAGAAA UCAGAAAGAGACCGCUGAUCGAAACAAACGGAGAAACAGGAGAAAUCGUCUGGGACAAGGGAAGAGACU UCGCAACAGUCAGAAAGGUCCUGAGCAUGCCGCAGGUCAUCGUCAAGAAGACAGAAGUCCAGACAG GAGGAUUCAGCAAGGAAAGCAUCCUGCCGAAGAGAAACAGCGACAAGCUGAUCGCAAGAAAGAAGGACU GGGACCCGAAGAAGUACGGAGGAUUCGACAGCCCGACAGUCGCAUACAGCGUCCUGGUCGUCGCAAAGG UCGAAAAGGGAAAGAGCAAGAAGCUGAAGAGCGUCAAGGAACUG CUGGGAAUCACAAUCAUGGAAAGAA GCAGCUUCGAAAAGAACCCGAUCGACUUCCUGGAAGCAAAGGGAUACAAGGAAGUCAAGAAGGACCUGA UCAUCAAGCUGCCGAAGUACAGCCUGUUCGAACUGGAAAACGGAAGAAAGAGAAUGCUGGCAAGCGCAG GAGAACUGCAGAAGGGAAACGAACUGGCACUGCCGAGCAAGUACGUCAACUUCCUGUACCUGGCAAGCC ACUACGAAAAGCUGAAGGGAAGCCCGGAAGACAACGAACAGAAGCAGCUGUUCGUCGAACAGCACAAGC 
ACUACCUGGACGAAAUCAUCGAACAGAUCAGCGAAUUCAGCAAGAGAGUCAUCCUGGCAGACGCAAACC UGGACAAGGUCCUGAGCGCAUACAAGCACAGAGACAAGCCGAUCA GAGAACAGGCAGAAAACAUCA UCCACCUGUUCACACUGACAAACCUGGGAGCACCGGCAGCAUUCAAGUACUUCGACACAAUCGACAG AAAGAGAUACACAAGCACAAAGGAAGUCCUGGACGCAACACUGAUCCACCAGAGCAUCACAGGACUGUA CGAAACAAGAAUCGACCUGAGCCAGCUGGGAGGAGACGGAGGAGGAAGCCCGAAGAAGAAGAGAAAGGU CUAGCUAGCCAUCACAUUUAAAAGCAUCUCAGCCUACCAUGAGAAUAAGAGAAAGAAAAUGAAGAUCAA UAGCUUAUUCAUCUCUUUUUCUUUUUCGUUGGUGUAAAGCCAACACCCUGUCUAAAAAACAUAAAUUUC UUUAAUCAUUUUGCCUCUUUUCUCUG UGCUUCAAUUAAUAAAAAAUGGAAAGAACCUCGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAUCUAG SEQ ID NO: 19268 EMX1 sgRNA modified Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU SEQ ID NO: 19269 EMX1 sgRNA unmodified No modification GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU SEQ ID NO: 19270 Eco3 ncRNA Eco3 ncRNA has only a 10 bp insertion GGGAUAAUUGAUAAGAAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGA GGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAGAU SEQ ID NO: 19271 Eco3 ncRNA-MS2 Eco3 ncRNA has only a 10 bp insertion and a 3' MS2 stem loop GGGAUAAUUGAUAAGAAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGA GGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19272 Eco3 circular ncRNA Eco3 ncRNA, with a 10 bp insertion only in circular form 
UACCGGCGAAACAAAAGAAAAAACCAAAAAAACAAAACACAUGAUAAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCCCAUCACAUCAACCGGUGGCGCAU UGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAAAAACAAAAAACAAAACGGCUAUUAUGCGUUACCGGCG SEQ ID NO: 19273 Aco1 RT Aco1 RT RNA sequence, including 5' and 3' UTR AGGGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGGAGCCCAAUGACUACGUAAAUAGGUUGCGACAUGCUAUGGAAAUAAGUGAAAACCCGCGCUUUAGCCCUGAGUACAUUGCUCAAUGUUGUACUUACGCCGAGAAUCUCCUCAAGCAAGGGCUGCCAGUUCUGUUCGAUCAAACGCAUAUUCGGAAAGUUCUGGGGAUGGCGGCACCUCGAUUGUGUGAUUAUCACAGA UUCACAAUCCCAAAGCACAACGGAUCUAGAAUUAUCACGGCCCCAUCUA GGAAGCUGAAGCUUCGACAACAAUGGAUAUACCAGAAUAUCCUUAUACGAAAGGAGGCUUCACCGUACACGCACGGAUUUGUUCCUGAACGCAGCAUCGUGACUAACGCAAUCCUCCAUAUAGGAUACGCAUACACCUACUGCGUGGAUAUCACGGAUUUCUUUCCUAGCAUCACUAAGAAGCAGGUCUUGCCUAUAUUCCGAAAUAUGGGCUAUAGUGGUUCUGCUGCAAAUACUCUGC GACCUCUGUUGCUAUGACGGGGUCCUCCCCCAGGGGGCGCCUACUAGCCCAUACCUCA GUAACAUGAUUUGUCGCGAUCUUGAUGACGAAUUGGGGGCUAUGGCCGGCGGUUCCGGGGGAUUUUUACACGGUAUGCGGAUGACAUAGCUAUCCACAAACCAACAACAGCCGCAACUUUUGGAUGCCUUGGGACUUAUCCUCGGGAAGCACGGAUUUCUUAUGAAUCUCGAUAAGUGUCGAGUCUAUAAUCCUGGACAGCCCAAAAGAAUUACUGGAUUGACCGUUCACAAUA GAGUAUCAGUUCCGAAAACCUUUAAGCGAAAUUGCGGCAGAAAUACAUUACUGUCAGAAGU UCGGAGUGACUGCACAUUUGGAAAACACGAAGGCUGCACGAUCCAUCCACUAUAGGGAACAUCUGUAUGGAAAGGCAUACUAUGUUAAAAUGGUUGAGCCCUGAGCUCGGGGCGCACUUCCUCGAUGAGUUGUCAAAGGUAGACUGGCCAGAGCCACCAAAGAAGAAAAGAAAGGUCUGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCCAGCCCUCCUCCCCUUCCUGC ACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCCCUGCA SEQ ID NO: 19274 Aco1 ncRNA_12 Aco1 ncRNA contains a 12 bp insertion in the EMX1 gene guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgcggcguCAGAAGAAGAAGGGCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGuucgaacgaucgGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUacgcacgc 
acggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAAACGUCACCUCCAAUGACUAGGGguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19275 Aco1 ncRNA-sgRNA_50 Aco1 ncRNA-sgRNA with 50 bp insert GGGGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCC ACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUCACAAUUUAUUUAGUGAUCGUUCCCGGUUUAUACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUCCUGCA SEQ ID NO: 19276 Aco1 sgRNA-ncRNA_50 Aco1 sgRNA-ncRNA with 50 bp insert GGGGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGUGCUUUUUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGA AGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACCCUGCA SEQ ID NO: 19277 Eco3_EMX1 gRNA_ncRNA_25 The Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end GAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUAAUAACAACGAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCU GAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA SEQ ID NO: 19278 Eco3_ncRNA only Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAUUACGUCUGCAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUC GAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCA SEQ ID NO: 19279 Eco3_ncRNA_AAVS1 gRNA_25 Eco3 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGACUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUC UAACCCCCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU SEQ ID NO: 19280 Eco3_ncRNA_EMX1 gRNA_25 Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAG CAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUU SEQ ID NO: 19281 Eco3_ncRNA_EMX1 gRNA_50 Eco3 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCA 
CAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU SEQ ID NO: 19282 Eco3_ncRNA_EMX1 gRNA_75 Eco3 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACU CCUUAa aguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUU SEQ ID NO: 19283 Eco3_ncRNA_EMX1 gRNA_100 Eco3 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAU GGGGCCCAUGGUUGAA UGACUCCUAUAaaguuCUCCCAUUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAG UCGGUGCUUUUU SEQ ID NO: 19284 Eco3_ncRNA_EMX1 gRNA_GFP gene Eco3 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAcccauauguccuuccgagugagagacacaaaaaauuccaacacaccuauugcaaugaaaauaa auuuccuuuuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccaccagccacccaccuugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgaga gugaucccggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguaguguugucgggcagcagcagcacggggccgucgccgaugggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugu cggccaugauauagacguuguggcuguaguuguacuccagcuugugccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccagggugucgccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagcc uucgggcauggcggacuugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccag gaugggcaccacccggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcgggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaa cguucacggcgacuacugcacuuauauacgguucucccccacccucggggaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGC CAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGGCUUUUU SEQ ID NO: 19285 Eco3_NT_ncRNA_EMX gRNA_25 The Eco3 ncRNA contains a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) which is fused to the EMX1 sgRNA at the 3' end 
GAAAUGAUAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcggacuca ggcccuuccuccuccagcuucugccguuuguUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCUUAUCAAAUAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUU SEQ ID NO: 19286 Aco1_EMX1 gRNA_ncRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end gaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACguauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccgcgcgugcguacaaacggcagaagcugga ggaggaagggccugaguccgagcagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauac SEQ ID NO: 19287 Aco1_ncRNA only Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcggcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacg cacgcacggcaaacagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauac SEQ ID NO: 19288 Aco1_ncRNA_AAVS1 gRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcggcguUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAacgcacgcacggcaa acagacagauccauuauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19289 Aco1_ncRNA_EMX1 gRNA_25 The Aco1 ncRNA contains 
a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaac gcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19290 Aco1_ncRNA_EMX1 gRNA_50 The Aco1 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaac gcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19291 Aco1_ncRNA_EMX1 gRNA_75 The Aco1 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaucaacc ggg gcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauuacaauuauuuagugaucguucccgguuuuuacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccga gucggugcuuuuu SEQ ID NO: 19292 Aco1_ncRNA_EMX1 gRNA_100 Aco1 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end 
guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcggcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAa aguucuccaca caucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaacgcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaa aguggcaccgagucggugcuuuuu SEQ ID NO: 19293 Aco1_ncRNA_EMX1 gRNA_GFP gene Aco1 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgugagagagacacacaaaaaauuccaaccacacuauugcaaugaaaauaaauuuccuuuuagccagaagucagaug cucaaggggcuucaugaugucccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccaccagccacccucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcggcggucacgaacuccagc aggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguagugguugucgggcagcagcacggggccgucgccgaugggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugauauagacguuguggcuguguagu uguacuccagcuuggccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccaggggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacuugaagaag ucggcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgccuccgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccacccggugaac 
agcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcac uuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugu caacgcacgcacggcaaacagacagauccauuauauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19294 Aco1_NT_ncRNA_EMX1 gRNA_25 The Aco1 ncRNA contains a 25 bp insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) which is fused to the EMX1 sgRNA at the 3' end guauaaaaccgggaacgaucagaccggggugaauucgcccccuugaucaaacggcacuaaccacuguuugccggcgugcguugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccgguugaugugaugggagaacuuUAUAGGAGUCAUUCAGAGCUCGAGUuuucuucugcucggacucaggcccuuccuccuccagcuucugccguuugu acgcacgcacggcaaacagacagauccauuauuacaauuuauuuagugaucguucccgguuuuauacAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19295 R2042_EMX1 gRNA_ncRNA_25 The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 5' end gaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuuAAUAACAACGCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCA CGCUCCCUUUAGCAGAGC UAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC SEQ ID NO: 19296 R2042_ncRNA_25 only R2042 ncRNA contains a 25 bp insertion in the EMX1 gene 
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAUGACUCCUAUAAAGU UCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGC SEQ ID NO: 19297 R2042_ncRNA_AAVS1 gRNA_25 The R2042 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAUGACAGAAAAGC CCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19298 R2042_ncRNA_EMX1 gRNA_25 The R2042 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAAAAGACUCCUAUAAAGUUC UCCCAU CACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu uuu SEQ ID NO: 19299 R2042_ncRNA_EMX1 gRNA_50 The R2042 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end
GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAUCAGUAUCAUGGGGCCC AUGGUUGAAUGACUCCUAU AaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcacc gagucggugcuuuuu SEQ ID NO: 19300 R2042_ncRNA_EMX1 gRNA_75 The R2042 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACCGCAGCUUGCC AGCACUUUCAGUAUCAUGGGGCCCAUGGUUG AAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuuga aaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19301 R2042_ncRNA_EMX1 gRNA_100 The R2042 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAaACUCGAGCUCUGAGCCCCACUGUCGAGAA GUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUG GGGCCCAUGGUUGAAUGACUCCUAUAaaguuCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCAAAGCUUGUGCCUGAGUGAGAGGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccgu uaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19302 R2042_ncRNA_NT_EMX1 gRNA_25_Dual guide The R2042 ncRNA contains a 25 bp 
insertion in the EMX1 gene (on the opposite strand to that cut by the sgRNA) that is fused to the EMX1 sgRNA at the 3' end GCGUUAAGGUGGUUAUAUUCUAGUAUUUAUGAAGUGUAGUCGCUUCGAUCGUUAAGGCUGAUUUUAACCUCUGCAUAAUAAUAUCGGUAGAUAUUAUUAUGCACGCUCCCUUUAGCAGAGCUAAGAAUCGCUCACUCAGGCACAAGCUUUGAGGugacaucgauguccuccccauuggccugcuucguggcaaugcgccaccuauaguuaguguacugcaacuuUAUAGGAG UCAUUCAGA GCUCGAGUuuucuucugcucggacucaggcccuuccuccaccagcuucugccguuuguCCUCAAAGCUUGUGCCUGAGUGAGAGCUAAAGAAAAGAAAAGUAGAAUAAGCCACCUUAACGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagu cggugcuuuuu SEQ ID NO: 19303 R2042_RT R2042 RT mRNA sequence, coding sequence only AUGAAGGACGACCAGUACUCUCAGUGGAAGAAGUACUACGAGAGCAGGGGCAUCCUGCCCGAGAUCCAGGACAAGCUCCUGAACUACGCCAAGAUCCACAUCGACAACAACACCCCGGUGAUCUUCAACUUCGAGCACCUGACCCUGCUGCUGGGCAGGGAGAAGAACUACCUGUCCAGCGUGGUGAACAGCCCCGACAGCCACUACAGAAAGUUCAAGAUCAAGAAGAGAUCCGGAGGCGAGAGGGAGAUC ACCGCUCCC UAUCUCAGCCUCCUGGAGAUGCAGUACUGAUCUACAGGAACAUCCUGAUCAACGUGAAGAUCCACUACGCCGCUCACGGCUUCGCUCAGGAUAAGAGCAUUAUCACCAACUCCAGGAACCACCUGGGGCAGAAACAUCUGCUCAAGAUGGAUCUGAAGGAUUUCUUCCCCAGCAUCAAGCUGAACAGAAUCAUCUACAUCUUCAAGAGCCUGGGCUACCCCAUAAUCAUCGCCUUCUACCUGGC CAGCAUCUGCAGCUACA AGGGCCACCUGCCCCAGGGCAGCCCCACAAGCCCUAUCCUGAGCAACAUCGUGAGCAUCACCCUGGACAACAGACUGGUGAAGUUCGCCAGAAAGAUGAAGCUGAGAUCAGCAGGUACGCCGACGACCUGACGUUCAGCGGGGACAAGAUCCCCACCAACUACAUCAAGUACAUCACCGACAUCAUCAAUGAUGAGGGCUUCGAGGUGAACGACACCAAGACCAAGCUCUACCUGAAGGCCGGGAAGAGAAUCGUGACG GG CAUUUCCGUGAUCGGAAAUGACCCGAAGCUUCCGCGGGAAUACAAGCGGAAGCUGAAGCAGGAGCUGCACUACAUCUUCACCUACGGCAUCGGCAGCCACAUGGCCAAGAAGAAGAUCAAGAAGAUCAACUACCUACAGGAUCAUCGGCAAGGUGAACUUCUGGCUGAACAUCGAGCCCGACAACGAGUACGCCAGAAACGCCAAGGCCAAGCUGCUGCUGCUCAUCGACAACccaccaaagaaaaaa agaaagucuga SEQ ID NO: 19304 R6943 RT R6943 RT mRNA sequence, coding sequence only 
AUGGAGGAGAGCACCAACUACAAGCUGCUCGGUGUGGGGACUGAGCGUGAUCCAACCCGCUACCCCCAACGAGGUGCUGAACUACCUCACCAGCACCCUGAACGAUAACGGGCUGCUGCCCGACGUGGAGAAGAUGAUCCACUACUUUGAGCUGCUGGACCAGCUGGGCUACAUCCACCAGGUGAGCAAGAGGAACAACCUCUACUCACUGACCCCCAGGGGCAACGAAAGGUUGACCCCUGCCCUGAAGAGACU CAGGGACAAGAUCAGACUGUUCAUGCUGGACAACUGCCACAGCAUCAGCAAGCUGGGCGUGCUGGCCAGCACAGAUACAGAGAACAUGGGGGGCGACAGCCCCUCACUCCAGCUGAGGCACAACCUC AAGGAGGUGCCUCCAUCCUUAGCUGGGCUGCUGGAACCCUGCCUAGCUCCUAGGCAGGCUUGGGUUCGGAUCUACGAACAGCUGAACAUCGGCAGCAUGAGCAGCGACGAGGCUAGCACACCCACCACCGCCAGAAAUGCCCCCCUGAGCUUCGUGGGCAGGCUGGGCUUCAGCCUGAACUACUACAGCUUCAACAAGAUCGACGAGCCCCUGUUCAACAACGAUGGCGUGACCGCCAUCGCCAGCUG CAUCGGCAUCAGCCCCGGGCUGAUUACCGCUAUGGUGAAGUCACCAAAGCGGUACUACAGGACCUUCAACCUGAGAAAGAAGUCCGGGGGCUUCAGAUCCAUUCUGGCCCCCAGAAAGUUCAUCAAG ACCAUCCAGUACUGGCUGAAGGAUCAUGUGCUGAACAGGCUCAAGAUCCACAGCUCCUGUUACAGCUACAGGAGCGGCGUGUCCAUCAAGGACAACGCCAUCAACCACGUGAAGAAGAAGUUCGUGGCCAGCAUCGACAUUUCCGAUUACUUCGGAAGCAUCAACAAGAAGAUGGUGAAGGACUGCUUUUACAAGAACAAUAUUCCCGAUCACAUCGUGAAUACCAUCAGCGGCAUCGUGACCUACAA CGACGUGCUGCCUCAGGGCGCUCCCACCAGCCCUAUCAUUAGCAACGCCAUCCUGUUCGAGUUCGACGAGGAGAUGACGGCUCAUGCCCUCACUCUCGACUGUAUCUACACCAGAUACAGCGACGACAUCUCG AUAUCCUCCGACUAUAAGGAGAAUAUCGCCAUCCUGAUCAACAUCGCCGAGGCCAACCUGUUGAGCGCUGGAUUCACGCUCAACAGACAGAAGCAAAGGAUUGCUUCUGACAACAGCCGCCAGGUUGUGACCGGCAUCCUGGUGAACGAGCAUCAGACCCACCAGAUGCUACAGAAAGAAGAUCAGAAGCGCCUUUGAUCACGCCCUGAAGGAGCAGGACGGCUCCCAGCUGACAAUCAACAAGUU GAGGGGCUACCUCAACUACCUGAAGUCCUUCGAGACCUACGGCUUCAAGUUCAACGAGAAGAAGUAUAAGGAGACCCUGGAUUUCCUGAUCGCUCUGAAGCAGAGCccaccaaagaagaaaagaaaggucuga SEQ ID NO: 19305 R6943_ncRNA_EMX1gRNA_GFP gene The R6943 ncRNA contains a GFP gene inserted into the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end. The entire GFP cassette is in antisense orientation and contains a miniature EF1a promoter and a β-globin poly A signal. 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaacccauauguccuuccgugagagacacacaaaaaauuccaacacacuauugcaaugaaaauaauuuccuu uauuagccagaagucagaugcucaaggggcuucaugauguccccuaaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccaccagccaccaccuugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugaucccggcg gcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacuggggcucagguaguguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugucggccaugau auagacguuguggcuguuguaguuguacuccagcuuggccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccaggggugucgccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggac uugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccacccc ggugaacagcuccucgcccuugcuccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcgggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcac uuauauacgguucucccccacccucgggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugu caGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19306 R6943_ncRNA_EMX1gRNA_100bp The R6943 ncRNA contains a 100 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAU GGUUGAAUGACUCCUAUAaaguu cucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuuga aaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19307 R6943_ncRNA_EMX1gRNA_75bp The R6943 ncRNA contains a 75 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggagggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaaaaagu ucucccaucacauc aaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggca ccgagucggugcuuuuu SEQ ID NO: 19308 R6943_ncRNA_EMX1gRNA_50bp The R6943 ncRNA contains a 50 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucccaucacaaccggugg cgc auugccacgaagcaggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucgg cuuuuu SEQ ID NO: 19309 R6943_ncRNA_EMX1gRNA_25bp The R6943 ncRNA contains a 25 bp insertion in the EMX1 gene, which is fused to the EMX1 sgRNA at the 3' end 
GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggagggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaagc aggccaauggggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19310 R6943_ncRNA_AAVS1 gRNA_25bp The R6943 ncRNA contains a 25 bp insertion in the AAVS1 gene, which is fused to the AAVS1 sgRNA at the 3' end GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCCGCUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUC CUGAUAUUGGGUCUAACCCCCAGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGCAAUAACAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19311 R6943_ncRNA only_25bp R6943 ncRNA contains a 25 bp insertion in the EMX1 gene GCGGAGUGCUGGCCUCAACUGAUACAGAGAAUAUGGGCGGUGAUUCGCCGUCUUUACAGUUAAGGCACAAUUUAAAAGAGGUUCCGCACCCAAGCCUGUCUUGGGCUGCacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucuccccaucacaucaaccgguggcgcauugccacgaagcaggccaaugg ggaggacaucgaugucaGCAGCCCAAGACAGGCUUGGGUGCGGAUCUACGAGCAAUUAAAUAUUGGUUCGAUGUCCAGUGAUGAGGCCAGCACCCGC SEQ ID NO: 19312 AAVS1 sgRNA Contains chemical modifications from Synthego: 2'-F, 2'-O-methyl, phosphorothioate GGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19313 R1262_RT-1 (R1262 RT) R1262 retrotranscript RT mRNA 
AGGAUAAUGGGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACCAUGAUCAGCUUCAGCGAGAUCAAGAGCAGAAACGAUUUUGCAGACGCUCUGCAGAUUCCUCGGAGCGUGCUGACGCACGUUCUGUACAUUGCCAAGCCCGAGAGCUUCUACGAGAGCUUCACCAUCCCCAAGAAGAAUGGGGAGGACAGAAUCAUCAUGGCCCCCAAGGGCACCCUGAAGU CCAUCCAGACCAAGCUGAGCAAGCAGCUGGUGGAGUACAGAGCCUCCAUCAGCCAGAAAGGCCAGGAGAAGUCCAACAUCUCCCAUGGCUUCGAGAGGGAGAAGUCCAUCAUCACCAAUGCUCAGAUACACCGCAACAAGCGGUAUGUCAUCAACUACGACCUGAAGGACUUCUUCGACUCCUUCCACUUCGGCAGAGUGGUGGGAUUCUUCGAGAAGAACAAGCACUUCCUGCUGC CCUACGAGGUGGCCGUGAUCAUCGCCCAGCUGACCUGCUAUAAUGGCAGGCUGCCCCAGGGCGCCCCCACAAGCCCUGUGAUCACCAACCUGAUCUGCGAGAUCCUGGACUACAGGGUGCUCAAGAUCGCCAAGAGAUACAAGCUGGACUACACCAGGUACGCUGACGACCUGACCUUCAGCACCAACUACUCCAGAUUCCUGGAGGUGUUCGACAGCUUCGCCAAGGAGCUGCUC CAGGAGAUCAGCAACUCUGGUUUUACCAUAAACCAGAGCAAGACCAGGCUGCUGUACAGAGACAGCCGCCAGGAGGUCACAGGCCUCGUGGUGAAUAAGAAGAUCGGCGUGAACAGAGAGUACGUCAAGAGCACUAGGGCGAUGGCCCAGGCUCUGUACAGCACCGGCGAGUUCACCAUCAACGGCAUCCCCGGCACCAUCAAACAGCUGGAGGGCAGGUUCGGCUUCAUCGACCAG CUGGACCACUACAACAACGUGAUCGACGAUGCCAAGCACGACGCCUACUCCCUGAACGGCAGGGAGAAGCAGUUCCAGGAGUUCCUGUUCUUCAAGACAUUCUUCAACGAGUACCCCCUGGUGAUCACCGAGGGCAAGACCGACAUCAGAUACCUGAAGGCCGCCCUCAAGAGCCUGCACCAGAAGUACCCCGAGCUGAUCUGCAAGGAGGACGACGGAACCUUUCGGUUCAA GAUCAGCUUCUUCAGAAGAUCCAAGAGAUGGAAGUACUUCUGGCAUCAGCAAGGAUGGCGCUGAUGCCAUGAAGCUGCUGUACCGGUUCUUUACCGGACAGAAGGGCGUGAAGAACUACUACAGGCUGUUCGCCGAGAAGUACAAGGCCGUGCAGAGAAACCCCGUGAUCAUGCUGUUUGACAACGAGAUGGAGAGCAAGAGACCCCUCAACAAGUUCAUCUCCGAGGAGGUGAA GAUUCCCUCCUCGGAGCAGCAGCUGUUCAAGGAGCAGCUGUACUACCACCUGAUCCCCGGCAGCAAGACCUACCUGAUGACCCACCCCUUGCCUCCUGGCAAGACAGAGGCCGAGAUCGAGGACCUGUUCCCUACCGAGGUGUUGGGCGUGAAGCUCGACGGCAAGAGCUUCAGCACCAAGGACAAGUUCGACACCAGCAAGUUCUACGGCAAGGACAUCUUCAGCAGCUACGUGUA CGAGCACUGGAAGUCCAUCGACUUCAGCGGCUUCAUCCCCCUGCUGGACAAGAUCACAUGCUCGUGCAGAACGAGAAGAAGCCCGGGCUGAACACACCACCAAAGAAGAAAAGAAAGGUCUGAGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCGAU SEQ ID NO: 19314 R1262_nc-5 (R1262_ncRNA-EMX1_505) R1262 ncRNA contains a 505 bp insertion in the EMX1 locus 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAAC GAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAG CACGUGGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACC UUUAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19315 R1262_nc-3 (R1262_ncRNA-EMX1_305) R1262 ncRNA contains a 305 bp insertion in the EMX1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACGACUACCAAAUCCGCAUGUUACGGGACUUCUUAUUAAUUCUUUUUUCGUGAGGAGCAGCGGAUCUUAAUGGAUGGCC GCAGGUGGUAUGGAAGCUAAUAGCGCGGGUGAGAGGGUAAUCAGCCG UGUCCACCAACACAACGCUAUCGGGCGAUUCUAUAAGAUUCCGCAUUGCGUCUACUUAUAAGAUGUCUCAACGGUAUCCGCAACUUGCGAUGUGCCUGCUAUCCUUAAAUGCAUAUCUCGCCCAGUAGCUUCCCAAUAUGAGAGCAUCAAUUGUAGAUCGGGCCGGGAUAGUCAUGUCGUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGA UGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19316 R1262_nc-1 (R1262_ncRNA-EMX1_25) R1262 ncRNA contains a 25 bp insertion in the EMX1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUC GAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19317 R1262_nc-4 (R1262_ncRNA-EMX1_405) R1262 ncRNA contains a 405 bp insertion in the EMX1 locus 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGG UAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCA ACUCAUAAGAUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCCGGCAUCAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCCCAUC ACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19318 R1262_nc-2 (R1262_ncRNA-EMX1_205) R1262 ncRNA contains a 205 bp insertion in the EMX1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCA AGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19319 R1262_nc-19 (R1262_ncRNA-sgEMX1_205) The R1262 ncRNA contains a 205 bp insertion in the EMX1 locus, which is combined with the EMX1 sgRNA at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCCAGUGUACCUGCAAGCCGAGAUG GCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCA CCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCAAUAACAACGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUU AUCAACUUGAAAAAGUGGCACCGAGUCCGGUGCUUUUUGAU SEQ ID NO: 19320 R1262_nc-20 
(R1262_sgEMX1-ncRNA_205) The R1262 ncRNA contains a 205 bp insertion in the EMX1 locus, which is combined with the EMX1 sgRNA at the 5' end GGGAUAAUGAGUCCGAGCAGAAGAAGAAGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUAAUAACAACGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGA AGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGG UCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUG GGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19321 R1262_nc-13 (R1262_ncRNA-AAVS1_25) R1262 ncRNA contains a 25 bp insertion in the AAVS1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCG AGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19322 R1262_nc-14 (R1262_ncRNA-AAVS1_205) R1262 ncRNA contains a 205 bp insertion in the AAVS1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCG AGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19323 R1262_nc-15 (R1262_ncRNA-AAVS1_505) R1262 ncRNA contains a 505 bp insertion in the AAVS1 locus 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCUCCACCCCACAGUGGGGCCACUAGGGACAGACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGG AAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGU GGUGGCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCU UAUCUGCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19324 R1262_nc-18 (R1262_ncRNA-AAVS1-MS2_505) The R1262 ncRNA contains a 505 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCUCCACCCCACAGUGGGGCCACUAGGGACAGACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAACGAGCCAGCGG AAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUGGCGCC CACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCUGC GGUCCAACUUAGGCAUAAACCUCCAUGCUACAAUUAAUGACAGAAAAGCCCCAUCCUUAAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19325 R1262_nc-17 (R1262_ncRNA-AAVS1-MS2_205) The R1262 ncRNA contains a 205 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUG UGCUUGGAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19326 R1262_nc-16 (R1262_ncRNA-AAVS1-MS2_25) The R1262 ncRNA contains a 25 bp insertion in the AAVS1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCAGUGGGGCCACUAGGGACAGAACUCGAGCUCUGAAUGACUCCUAUAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCCUGAUAUUGGGUCUAACCCCCAG CAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19327 R1262_nc-9 (R1262_ncRNA-EMX1-MS2_305) R1262 ncRNA contains a 305 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACGACUACCAAAUCCGCAUGUUACGGGACUUCUUAUUAAUUCUUUUUUCGUGAGGAGCAGCGGAUCUUAAUGGAUGGCC GCAGGUGGUAUGGAAGCUAAUAGCGCGGGUGAGAGGGUAAUCAGCCGUGUCCACCA ACACAACGCUAUCGGGCGAUUCUAUAAGAUUCCGCAUUGCGUCUACUUAUAAGAUGUCUCAACGGUAUCCGCAACUUGCGAUGUGCCUGCUAUCCUUAAAUGCAUAUCUCGCCCAGUAGCUUCCCAAUGAGAGCAUCAAUUGUAGAUCGGGCCGGGAUAGUCAUGUCGUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAG CAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19328 R1262_nc-7 (R1262_ncRNA-EMX1-MS2_25) R1262 ncRNA contains a 25 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACA UCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19329 R1262_nc-6 (R1262_ncRNA-EMX1_P2A-GFP) R1262 ncRNA contains a P2A-GFP insertion in the EMX1 locus GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCU GUUCACCGGGGUGGUGCCCA UCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCU UCU UCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACA CCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCGGCAUGGACGAGCUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCG AU SEQ ID NO: 19330 R1262_nc-22 (R1262_sgAAVS1-ncRNA_205) The R1262 ncRNA contains a 205 bp insertion in the AAVS1 locus, which is combined with the sgAAVS1 at the 5' end. 
GGGAUAAUGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCCGGGCUUUUUAAUAACAACGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCUCCC ACCCCACAGUGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGU CGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCCUCCUUCCUAGUCCUGAAUAUUG GGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19331 R1262_nc-8 (R1262_ncRNA-EMX1-MS2_205) R1262 ncRNA contains a 205 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAU GGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19332 R1262_nc-21 (R1262_ncRNA-sgAAVS1_205) The R1262 ncRNA contains a 205 bp insertion in the AAVS1 locus, which is combined with sgAAVS1 at the 3' end. 
GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUCUGUCCCCUCCACCCCACAGUGGGGGCCACUAGGGACAGAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUG GAGUCAAUCGCAUGUAGGAUGGUCUCCAGACACCGGGGCACCAGU UUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCUCCUUCCUAGUCUCCUGAUAUUGGGUCUAACCCCCAGCAUAAUGAGUUGUCACACGCUCAAUAACAACGGGGCCACUAGGGACAGGAUGUUUCAGAGCUAUGCUGGAAACAGCAUAGCAAGUUGAAAUAAGGCUAGUCCGUUAUCAACU UGAAAAGUGGCACCGAGUCCGGUGCUUUUUGAU SEQ ID NO: 19333 R1262_nc-10 (R1262_ncRNA-EMX1-MS2_405) R1262 ncRNA contains a 405 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUA UGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAG AUGUCUCAACGGCAUGCGCAACUUGUGAAGUGUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCCCCAUCACAUCA ACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19334 R1262_nc-11 (R1262_ncRNA-EMX1-MS2_505) R1262 ncRNA contains a 505 bp insertion in the EMX1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAACUUCAGCUAAGGAAGCUACCAAUAUUUAGUUUCUGAGUCUCACGACAGACCUCGCGCGUAGAUUGCCAUGCGUAGAGCUAAC GAGCCAGCGGAAAGCGUGAGGCGCUUUUAAGCAUGGCGAGUAAGUGAUCCAACGCUUCGGAUAUGACUAUACUUAGGUUCGAUCUCGUCCCGAGAAUUCUAAGCCUCAACAUCUAUGAGUUAUGAGGUUAGCCGAAAAAGCACGUGGUG 
GCGCCCACCGACUGUUCCCAGACUGUAGCUCUUUGUUCUGUCAAGGCCCGACCUUCAUCGCGGCCGAUUCCUUCUGCGGACCAUACCGUCCUGAUACUUUGGUCAUGUUUCCGUUGUAGGAGUGAACCCACUUGCCUUUGCGUCUUAAUACCAAUGAAAAACCUAUGCACUUUGUACAGGGUACCAUCGGGAUUCUGAACCCUCAGAUAGUGGGGAUCCCGGGUAUAGACCUUUAUCU GCGGUCCAACUUAGGCAUAAACCUCCAUGCUACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19335 R1262_nc-12 (R1262_ncRNA-EMX1-MS2_P2A-GFP) R1262 ncRNA contains a P2A-GFP insertion in the EMX1 locus and an MS2 stem loop at the 3' end GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCU GUUCACCGGGGUGGUGCCCAUCCU GGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCGACCACAUGAAGCAGCACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCU UCAAGGACG ACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAACAGCCACAACGUCUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAACACCCCCAUCGGC GAC GGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACC CAUGUGAU SEQ ID NO: 19336 R1262_nc-23 (R1262_ncRNA-EMX1_P2A-GFP_LongHA) Same as R1262_nc-6, but with longer homology arms GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUGAGGCCCCAGUGGCUGCUCUGGGGGCCUCCUGAGUUUCUCAUCUGUGCCCCUCCCUCCCUGGCCCAGGUGAAGGGUGGUUCCAGAACCGGAGGACAAAGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGA 
GUCCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAA GAGAAUCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUUCACCGGGGUGGGGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGC ACGACUUCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCAC CAUCUUCUUCAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCGCCGACCA CUACCAGCAGAACACCCCCAUCGGCGACGGCCCCGUGCUGCCGCCCGACAAC CACUACCUGAGCACCCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACUAGGGUGGGCAACCACAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUG GCCAGGCCCCUGCGUGGGCCCAAGCGCAUAAUGAGUUGUCACACGCUCGAU SEQ ID NO: 19337 R1262_nc-24 (R1262_ncRNA-EMX1-MS2_P2A-GFP_long HA) Same as R1262_nc-12, but with longer homology arms GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUGAGGCCCCAGUGGCUGCUCUGGGGGCCUCCUGAGUUUCUCAUCUGUGCCCCUCCCUCCCUGGCCCAGGUGAAGGGUGGUUCCAGAACCGGAGGACAAAGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGU CCGAGCAGAAGAAGGAAGCGGAGCCACUAACUUCUCCCUGUUGAAACAAGCAGGGGAUGUCGAAGAGAA UCCCGGGCCAGUGAGCAAGGGCGAGGAGCUGUGUACCGGGGUGGGGCCCAUCCUGGUCGAGCUGGACGGCGACGUAAACGGCCACAAGUUCAGCGUGUCCGGCGAGGGCGAGGGCGAUGCCACCUACGGCAAGCUGACCCUGAAGUUCAUCUGCACCACCGGCAAGCUGCCCGUGCCCUGGCCCACCCUCGUGACCACCCUGACCUACGGCGUGCAGUGCUUCAGCCGCUACCCCGACCACAUGAAGCAGCACGACU UCUUCAAGUCCGCCAUGCCCGAAGGCUACGUCCAGGAGCGCACCAUCUUCUU CAAGGACGACGGCAACUACAAGACCCGCGCCGAGGUGAAGUUCGAGGGCGACACCCUGGUGAACCGCAUCGAGCUGAAGGGCAUCGACUUCAAGGAGGACGGCAACAUCCUGGGGCACAAGCUGGAGUACAACUACAGCCACAACGUCUAUAUCAUGGCCGACAAGCAGAAGAACGGCAUCAAGGUGAACUUCAAGAUCCGCCACAACAUCGAGGACGGCAGCGUGCAGCUCGCCGACCACUACCAGCAGAAC 
ACCCCCAUCGGCGACGGCCCCGUGCUGCUGCCCGACAACCACUACCUGAGCAC CCAGUCCGCCCUGAGCAAAGACCCCAACGAGAAGCGCGAUCACAUGGUCCUGCUGGAGUUCGUGACCGCCGCCGGGAUCACUCGGCAUGGACGAGCUGUACAAGUAGAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACUAGGGUGGGCAACCACAAACCACGAGGGCAGAGUGCUGCUUGCUGCUGGCCAGGCCCCUGC GUGGGCCCAAGCGCAUAAUGAGUUGUCACACGCUCACAUGAGGAUCACCCAUGUGAU SEQ ID NO: 19338 6083 v1 RT 6083 Retrotranscript RT mRNA AUGAGCAACCCUCAGCCCACCAGGGCUGAGAUCUUCGAGCGCAUCAAACAGAGCUCGAAGCAGGAGGUGAUCCUGGAGGAGAUGCAGAGACUGGGCUUCUGGCCAAGGUCAGAAGGCCAGCCUGAGGUCGCUGCCGACCUCAUCCAGAGAGAAGGCGAGCUCCAAAGGGAGCUCGCCGAACUCAACAAGAAGUUGGCGGUCAAACGCAACCCCGAGAGGGCCCUCAGAGAAAUGAGGAAGCAGAGAAUGAAGGAC GCCAGAGACAAGAGAGAGGUGACCAAGAGAGCCCAGGCUCAGCAGAUACGACAAGGCCCUGCUGUGGCACGAGAAGAGGGCCAGCCACGUGGCCUAUCUUGGCCCUGGUGUGAGCGCCAGCUUACACGAGAACAGCUCUGCUACCCAGGAGCAGGGCGACAAGGGGAAACCCAAGAGCCCGAGAUCGGGCUGUGCCAGACCUGCAGAGGCUGACACUGAACGGGCUGCCUGCUGAUUAGCGCAG CCCAA CUCGCUGAGUCCAUGGGAGUCAGCGUGGCUGAGCUGAGAUUCCUCAGCUUCCACAGGGAGGUGGCUAGGACAAACCACUACCACAGCUUCACGCUCCCCAAGAAGACAGGUGGGGAGCGUCUGAUCAGGCCCCCAUGCCCAGACUGAAGAGGCCCAGUACUGGGUGCUGGACAACGUGCUGGCAAAGAUGCCAGCGCACGAUGCGGCUCACGGCUUCCUGGCCGGCAGAAGCAUCAUCAAGCAAUG CCAAGCCC CAUGCGGGGCAGGAUGUGGUCAUUAACUUGGACGUUAAGGACUUCUUCCCCAGCAUCGCCUUCGGCAGAAUCAAGGGCGUGUUCAGACAGCUGGGCUACGGCGAAAGCAUCGCCACAGUGUUCGCCCUGCUGGCAGCGAGAACAGAGCCCAGGCCUGGCAAGUGGACGGAGAGAGACUCUUCGUCGGCGGCAAGGCCAGAGAAAGAGUGCUGCCUCAGGGCGCUCCUACCAGCCCCAUGCCAGACCAACC UGCUG UGCCGGCGGAUGGAUAGACGUCUGCUGGGUCUCGCGAAGCAGCUGGGCUUCGUGUACACCAGAUACGCCGACGACCUGACCUUCAGCGCCUCCGGCGAACCCGCAAGGGAUAACGUCGGAAAGCUGCUGAGCAGGGUCCGUGGAUCCUGAGGGAUGAGGGGUUCACCCCUCAUCCCGAUAAGGAGAGAGUGAUGAGAAAGGGCAGAAGACAGGAGGUGACCGGCCUUGUGGUCAACUCCGACACUCCC AGCGUG AGCAGGGAGACCAGAAGAAGGCUGAGAGCCGCUCUGCACAGAGCCUCGCAGCCUGACGCUGCGAGCAAACCUGCACAUUGGCAGGGCCAUACGGCCCAGCCAAGUCAGCUGCUGGGCCUGGCCACAUUCGUGCAUCAGAUCGACCCCAAGCAGGGCAAGACCCUGCUCGCUGACGCUCAGCAACUGAUGCGCAGCCCUAUCGACCGCGCUAAUGACGCGGCGAAGUCUGCUAGCAGGGCUGACGCUGCU CAGCAG 
AGCUUCAGAGUGCUGGCUGCUGCCGGCAAGCCACCAGUUCUUGCCGACGGCAAGAACUGGUGGCAACCUGCUCCUCGCCACACCUGUGCCGGAGAAGACCGACCAGCAGAGAAGAGAGGAAAGGCAGGCUACCAGAAGGCAGCAGGCCGCUGCUGCUGCUCCUCCUCCUAGCAGCACAAGAAGAAACGAGCGGCCGCAACAGGCCGCUCAUGAGCAGCAGGGAGAUGCCCAGCCUCAGAACGAGGC CCCUCCU AGAUUCGACCCCGACCAGUACGCUCCUCCCCCGAGGAACGUGAUGACCUACUGGGCCCAGAUCGCCAUCAGCUUCUUUCUGGGCAGCAUCCUGCACAACAGACUGAUCACGAUCUUCGCCAUGGUGGCGGUGAUCGCUCUGUACUACAUGAGAAGACAGAGAUGGGAUGUCUUCAUGGGCAUCCUGGUGGUGGCCACCCUGCUGGGAUACCUGGUCAGGGGCAUGGGCccaccaaagaaaaaga aaggucuga SEQ ID NO: 19339 6083 v1_ncRNA_AAVS1 gRNA_25bp The R6083 retrotranscript contains a 25 bp insertion at the AAVS1 locus GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUUCUGUCCCCUCCACCCCACAGUGGGGCCACUAGGGACAGAAAUGACAGUGGUUGGUGCUCUAAAAAUUAAUGACAGAAAAGCCCCAUCCUUAGGCCUCCCUCCUUCCUAGUCUC CUGAUAUUGGGUCUAACCCCCAAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACGGGGCCACUAGGGACAGGAUguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19340 6083 v1_ncRNA_only_25bp The R6083 retrotranscript contains a 25 bp insertion at the EMX1 locus GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaag caggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGC SEQ ID NO: 19341 6083 v1_ncRNA_EMX1 gRNA_25bp The R6083 retrotranscript contains a 25 bp insertion at the EMX1 locus and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAAUGACUCCUAUAaaguucucccaucacaucaaccgguggcgcauugccacgaag c 
aggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuu uu SEQ ID NO: 19342 6083 v1_ncRNA_EMX1 gRNA_GFP gene The R6083 retrotranscript contains a GFP gene inserted at the EMX1 locus and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaccauauguccuuccgugagagacacaaaaaauuccaacacaccuauugcaaugaaaauaa auuuccuuuuuagccagaagucagaugcucaaggggcuucaugauguccccauaauuuuuggcagagggaaaaagaucucagugguauuugugagccagggcauuggccaccagccacccaccuucugauaggcagccugcaccugaggagugcggccgcuuuacuuguacagcucguccaugccgagagugauccc ggcggcggucacgaacuccagcaggaccaugugaucgcgcuucucguuggggucuuugcucagggcggacugggugcucagguaguguugucgggcagcagcacggggccgucgccgauggggguguucugcugguaguggucggcgagcugcacgcugccguccucgauguuguggcggaucuugaaguucaccuugaugccguucuucugcuugcggc caugauauagacguuguggcuguaguaguuguacuccagcuuggccccaggauguugccguccuccuugaagucgaugcccuucagcucgaugcgguucaccaggggugucgcccucgaacuucaccucggcgcgggucuuguaguugccgucguccuugaagaagauggugcgcuccuggacguagccuucgggcauggcggacu ugaagaagucgugcugcuucauguggucgggguagcggcugaagcacugcacgccguaggucaggguggucacgagggugggccagggcacgggcagcuugccgguggugcagaugaacuucagggucagcuugccguagguggcaucgcccucgcccucgccggacacgcugaacuuguggccguuuacgucgccguccagcucgaccaggaugggcaccacccc ggugaacagcuccucgcccuugcucaccaugguggcgaccgguggaucccgggcccgcgguaccgucgacugcagaauucgaagcuugagcucgagaucugaguccgguagcgcuagcgggaucugacgguucacuaaacccuguguucuggcggcaaacccguugcgaaaaagaacguucacggcgacuacugcacuuauau acgguucucccccacccucggggaaaaaggcggagccaguacacgacaucacuuucccaguuuaccccgcgccaccuucucuaggcaccgguucaauugccgaccccuccccccaacuucucggggacugugggcgaugugcgcucugcccguucucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAG 
CACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACgaguccgagcagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19343 6083 v1_ncRNA_EMX1 gRNA_50bp The R6083 retrotranscript contains a 50 bp insertion at the EMX1 locus and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggagggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguucucca caucaaccgguggcgca uugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagu cggugcuuuuu SEQ ID NO: 19344 6083 v1_ncRNA_EMX1 gRNA_100bp The R6083 retrotranscript contains a 100 bp insertion at the EMX1 locus and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUA UCAUGGGGCCCAUGGUUGAAUGACUCCUAUAaaguuc ucccaucacaucaaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuauca acuugaaaaaguggcaccgagucggugcuuuuu SEQ ID NO: 19345 6083 v1_ncRNA_EMX1 gRNA_75bp The R6083 retrotranscript contains a 75 bp insertion at the EMX1 locus and sgEMX1 at the 3' end GCUCCGGAGCAAUGAGCAGGCUCUUGCAAUCCGGGCGGUGUUUCGCCGCCCUUGUGAACUGCCGUUUCAUGCACCACGGGCGCCGUUUUCACUGUGCCACCCCAGCCACGGUAGUGCUacaaacggcagaagcuggaggaggaagggccugaguccgagcagaagaaaACUCGAGCUCUGAGCCCCACCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGA CUCCUAUAaaguucucccaucacauc 
aaccgguggcgcauugccacgaagcaggccaauggggaggacaucgaugucaAGCACUACCGUGGCGGGGUGGCGUCGAGCGAACAGCUCCCGUCCCCUGAGCCCUACAGGCUCUUGGACGAGAUGCACAUUGCUCCGGAGCAAUAACAACgaguccgagcagaagaagaaguuucagagcuaugcuggaaacagcauagcaaguugaaauaaggcuaguccguuaucaacuugaaa aaguggcaccgagucggugcuuuuu SEQ ID NO: 19346 Aco1_ncRNA-EMX1_100 Aco1 ncRNA contains a 100 bp insertion in the EMX1 gene GGGAUAAUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAAAGACUCCUAUAAAGUUCUC CCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGCACGGCAAACAGACAGAUCCAUUAUUAUACAAUUUAUUUAGUGAUCGUUCCCGGUUUUAUACGAU SEQ ID NO: 19347 Aco1_ncRNA-EMX1_205 Aco1 ncRNA contains a 205 bp insertion in the EMX1 gene GGGAUAAUGUAUAAAACCGGGAACGAUCAGACCGGGGUGAAUUCGCCCCCUUGAUCAAACGGCACUAACCACUGUUUGCCGUGCGUGCGUACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCG CAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAACGCACGGCAAACAGACAGAUCCAUUAUUAUUACAAUUUAUUUAGUGAUCGUUCCCGGUUU UAUACGAU SEQ ID NO: 19348 Eco3_ncRNA-EMX1_205 Eco3 ncRNA contains a 205 bp insertion in the EMX1 gene GGGAUAAUUGAUAAAGAUUCCGUAAGAGCCAAACCUAGCAUUUUAUGGGUUAAUAGCCCAUCGGGCCAUGAGUCAUGGUUUCGCCUAGUAUUUUAGCUAUGCCCGUCGUUCAGUUCGCUGAACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGU GUACCUGCAAGCCGAG AUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAUCAGCGAACUGAUCGACGUGCUCAAGUAGGUUUGGCUCUUACGGAAUCU UAUCAGAU SEQ ID NO: 19349 R2781_RT-1 R2781 Retrotranscript 
AGGAUAAUGGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGGAUCCGCCACCAUGGGUUACAACUACGAGUACACCAUCAGGCUGUGUGAGACGCUCAAGUCCUUGAGCGCUGACGACGUGUACAUCAGCAAGUGCUGCAACUACGCCGAGGGACUCCUGGAUAAGGAGCUCCCUGUGAUCUUCGACCCCACCCACCUGAAGCAGAUCCUGAGGCUGGACGACAUCAGCCUCGACGAGUACCACAUCU UCUAUAUCGACAAGAAGAACGGGGGCAGCAGAGAGAUCAAUGC CCCCAGCGAGGAGCUGAAGAAGCGCCAACGCUGGAUUCUCAAGAACAUCCUUGAGAAGAUCAGCAUCAGCCACAACGUGCACGGCUUCAUCAAGGGCAAGAGCAUUGUCAGCAAUGCUCGCAAGCACCUGAACAAGGAGUAUGUGCUGAACAUCGACAUCAAGGACUUCUUCCCCAGCGUCACCAAAUACUCCGUGGAGAAGAUCUUCCGGAGAAUGGGCUACUGCAACAGCGUGGCCCAGCUGCUUGCCCGC GUCUGUUGUUACAGAGGCGGGCUGCCUCAGGGAGCUCCUACAAGCCC CUACCUGGCCAACCUCGCUUUUGACGAGGUUGACCAGGAGAUCAUCAACGUGGUGAGAAACCGGGACAUCACCUACACCAGAUACGCCGACGACAUGACCUUCAGCGCCAACUACGACCUGAGCACCUUCAAGAAGGAGGUGUACAAGAGCCUGGGCAAAUACAGAUUCAGCCCCAACAUCAUGAAGACUCACCAGAUGUCUGGCGAGAAGAGAAAGCUGGUGACCGGCCUGAUCGUGGACGACAAGG UGAAGGUGUGCAAGAAGUACAAGAGAAAGCUGCGGCAGGAGAUCUACUACUG CAAGAAGUUCGGCGUGACCAACCACCUGAGAAACUGCCACAGCGAGAAGUCCAUCAACUACAAGGAGUACCUGUACGGCAAGGCCUACUUCAUCAAGAUGGUCGAGGAGAUCGUCGGCGAAGUUCCUGGCCGACCUGGACAGCAUUGACUGGUACCCACCAAAGAAGAAAAGAAAGGUCUGACUCGAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCCAGCCCUCCUCCCCUUCCU GCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGC SEQ ID NO: 19350 R2781_nc-01_25bp_49-65HA R2781 ncRNA contains a 25 bp insertion in the EMX1 gene with 49 and 69 bp homology arms GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUUGCUCUU SEQ ID NO: 19351 R2781_nc-02_205bp_49-65HA R2781 ncRNA contains a 205 bp insertion in the EMX1 gene with 49 and 69 bp homology arms GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCGAGUUCCGUCCAGUUGAGCGUGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACC 
GGGGCACCAGUUUUCACGCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19352 R2781_nc-03_405bp_49-65HA R2781 ncRNA contains a 405 bp insertion in the EMX1 gene with 49 and 69 bp homology arms GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCACAAAACGGCAGAAGCUGGAGGAGGAAGGGCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAUUAGGCGUGU UCACCUACGCUACGCUAACGGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGUCUCAACGG CAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUG CCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCAGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19353 R2781_nc-04_25bp_30HA R2781 ncRNA contains a 25 bp insertion in the EMX1 gene with 30 bp homology arms on each side GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAACUCGAGCUCUGAAUGACUCCUAUAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19354 R2781_nc-05_205bp_30HA R2781 ncRNA contains a 205 bp insertion in the EMX1 gene with 30 bp homology arms on each side GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCUGAGUCCGAGCAGAAGAAAAAGCGGUCUUACGGUCAGUCGUAUUCCUUCUCGAGUUCCGUCCAGUUGAGCGUCACUCCCAGUGUACCUGCAAGCCGAGAUGGCUGUGCUUGGAGUCAAUCGCAUGUAGGAUGGUCCAGACACCGGGGCACCAGUUUUCAC GCCUAAAGCAUAAACGACGAGCAGUCAUGAAAGUCUUAGUACUGGACGUGCCGUUUCACAAAGUUCUCCCAUCACAUCAACCGGUGGCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19355 R2781_nc-06_405bp_30HA R2781 ncRNA contains a 405 bp insertion in the EMX1 gene with 30 bp homology arms on each side 
GGGAUAAUAAGAGCAACUAGAUUGAGGCGAUUCGCCUCCUUGGAAAAGGGUACUAAGUUUCUGUCGGAGGAAGGGCCCUGAGUCCGAGCAGAAGAACUGCUAAAUCCGCGUGAUAGGGGAUUUGAAGUUUAAUCUUCUAUCGCAAGGAACUGCCGAUCUUAAUGGAUGGCCGGAGGUGGUAUGGAAGCUAUAAGCGCGGGUGAGAGGGUAAUUAGGCGUGUUCACCUACGCUACGCUA ACGGCGAUUCUAUAAGAUUGCACAUUGCGUCAACUCAUAAGAUGU CUCAACGGCAUGCGCAACUUGUGAAGUGUCUACUAUCCUUAAACGCAUAUCUCGCACAGUAACUCCCGAAUAUGUCCGGCAUCUGAUGUUGCCCGGGCCGAGUUAGUGUUGAGCUCACGGAACUUAUUGUAUGAGUAGUGAUUUGUAAGAGUUGUCAGUUAGCUCGUUCAGGUAAUAGUUGCCCACACAACGUCAAAAUAAGAGAACGGUCGUAACAUAAAGUUCUCCCAUCACAUCAACCGGUG GCGCAUUGCGACAGAAAUGAAAUAAAUAGUAGUUGCUCUU SEQ ID NO: 19356 R1262_nc-46 (ncRNA-EMX1 site 1-deletion) R1262 ncRNA contains EMX1 deletion (del1) GAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUUAACCCUAUGUAGCCUCAGUCUUCCCAUUCAGGCUCAGCUCAGCCUGAGUGUUGAGGCCCCAGUGGCUGCUCUGGGUGGGCAACCACAAAACCCACGAGGGCAGAGUGCUGCUUGCUGCUGCAUAAUGAGUUGUCACA CGCUC SEQ ID NO: 19357 R1262_nc-47 (ncRNA-EMX1 site 2-deletion) R1262 ncRNA contains EMX1 deletion (del2) GAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUCAGAAGAAGAAGGGCUCCCAUCACAUCAACCGGUGGCGCAUUGCCACGAAGCAGGCCAAUGGGGAGGACAUCGAUGUCACCUCCAAUGACCGCAGCCUCCCAGCUGCUCUCCGUGUCUCCAAUCUCCCUUUUGUUUUGAU GCAGCAUAAUGAGUUGUCACACGCUC SEQ ID NO: 19358 R1262_nc-61 (ncRNA-EMX1 sgRNA tandem_sense)-001 The R1262 ncRNA contains a sense sequence with a tandem sgRNA target for insertion into EMX1 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUgaguccgagcagaagaagaagggAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUugaguccgagcaga agaagaagggGCAUAAUGAGUUGUCACACGCUC SEQ ID NO: 19359 R1262_nc-62 (ncRNA-EMX1 sgRNA tandem_antisense)-001 The R1262 ncRNA contains an antisense sequence with a tandem sgRNA target for insertion into EMX1 
GGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUgaguccgagcagaagaagaagggAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUugaguccgagcaga agaagaagggGCAUAAUGAGUUGUCACACGCUC SEQ ID NO: 19360 R1262_nc-63 (ncRNA_NHEJ_sense)-001 R1262 ncRNA contains a sense sequence to insert into EMX1 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUAAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUuGCAAUGAGUUGUCACACGCUC SEQ ID NO: 19361 R1262_nc-64 (ncRNA_NHEJ_antisense)-001 R1262 ncRNA contains an antisense sequence to insert into EMX1 GGGAUAAUGAGCGUGUGACAAGAUUUUGGGCUUGUGUUUCGCAAGCUUUGAUUAAAAUGGUAGAUGGAUUUGCUAUCUAUCGUCAUUUCCAGUACUGUUAUGUaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUGCAUAAUGAGUUGUCACACGCUC SEQ ID NO: 19362 R6342S_nc-63 (ncRNA_NHEJ_antisense)-001 R6342S ncRNA contains a sense sequence to insert into EMX1 GGGAUAAUCAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGaACUCGAGCUCUGAGCCCCACUGUCGAGAAGUAUGUAUCUCGCUCCCGCAGCUUGCCAGCACUUUCAGUAUCAUGGGGCCCAUGGUUGAAUGACUCCUAUCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUGACAUGAGGAUCACCCAUGU SEQ ID NO: 19363 R6342S_nc-64 (ncRNA_NHEJ_sense)-001 R6342S ncRNA contains an antisense sequence to insert into EMX1 GGGAUAAUCAUAGAUUUCUUGGCCUUUAUGCUGUGGUGUUGCGCCACGGUGGAGAUUUGUCAAAUACACAUCAUUAGGUUGCGAUAGGAGUCAUUCAACCAUGGGCCCCAUGAUACUGAAAGUGCUGGCAAGCUGCGGGAGCGAGAUACAUACUUCUCGACAGUGGGGCUCAGAGCUCGAGUuCACAACCAAAUAUAAGAAUUGUUAGCAAGAAAUCUAUGACAUGAGGAUCACCCAUGU SEQ ID NO: 19364
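The sense and antisense NHEJ constructs listed above (for example, SEQ ID NO: 19360 and SEQ ID NO: 19361) carry inserts that appear to be reverse complements of one another. The following is a minimal sketch, not part of the disclosure, of how that relationship could be checked; the helper name `reverse_complement_rna` is hypothetical, and the two 20-nt fragments are copied from the inserts of those two entries.

```python
# Illustrative helper (an assumption for this sketch, not from the disclosure).
# Maps each RNA base to its Watson-Crick partner, preserving letter case.
_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence.

    Whitespace is removed first, because the listing above contains
    stray spaces introduced during text extraction.
    """
    cleaned = "".join(seq.split())
    return cleaned.translate(_COMPLEMENT)[::-1]


# 20-nt fragment from the start of the antisense insert (SEQ ID NO: 19361).
antisense_fragment = "ACUCGAGCUCUGAGCCCCAC"
# 20-nt fragment from the end of the sense insert (SEQ ID NO: 19360).
sense_fragment = "GUGGGGCUCAGAGCUCGAGU"

# True when the two fragments are reverse complements of each other.
fragments_match = reverse_complement_rna(antisense_fragment) == sense_fragment
print(fragments_match)
```

The same helper can be applied to full-length inserts once the extraction spaces are stripped, which the function already does.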

The following drawings form a part of this specification and are included to further demonstrate certain aspects of the present disclosure, which may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.

FIG. 1A is a schematic depicting a naturally occurring retron, from the genomic DNA stage through production of the msDNA chimeric satellite molecule. The retron is encoded in the bacterial genome and comprises a non-coding RNA (ncRNA) portion and a portion encoding a specialized reverse transcriptase (RT). The ncRNA and RT are initially transcribed from the retron DNA as a single polycistronic message. The initial transcript is processed, resulting in removal or separation of the transcript encoding the retron RT. The remaining transcript is the ncRNA, which folds into a secondary structure having several characteristic stem-loops and a duplex formed between the 5' and 3' regions of the ncRNA (i.e., the a1/a2 duplex). The folded ncRNA is recognized by the accompanying RT, which is translated separately and provided in trans. The translated RT typically recognizes certain secondary structures in the ncRNA and binds the RNA template downstream of the msd region. The RT initiates reverse transcription of the RNA toward its 5' end, starting from the 2' end of a conserved guanosine (G) residue found immediately after the double-stranded RNA structure (the a1/a2 region) within the ncRNA. A portion of the ncRNA (i.e., the msd region) serves as the template for reverse transcription, and reverse transcription terminates before reaching the msr locus. During reverse transcription, cellular RNase H degrades the ncRNA segment serving as the template but not other portions of the ncRNA. The product of reverse transcription, the msDNA (bottom right of the schematic), remains covalently linked to the RNA template via a 2'-5' phosphodiester bond and base-pairs with the RNA template using the 3' end of the msDNA.

FIG. 1B is a schematic depicting one embodiment of a recombinant retron contemplated by the present disclosure. In this embodiment, the nucleic acid molecule comprises a nucleotide sequence encoding the retron ncRNA region (msr/msd region), as depicted at top left. The msr region has been modified by introducing one or more nucleotide modifications (e.g., nucleotide substitutions, deletions, or insertions). For example, it may be desirable to introduce one or more nucleotide substitutions in the msr to enhance functionality (e.g., binding of the corresponding ncRNA to the RT, improved stability, improved folding, etc.). The modified msr is referred to as msr'. In addition, the msd has been modified by introducing a heterologous nucleotide sequence encoding an HDR donor template. Finally, the retron DNA, which is configured on a DNA vector (e.g., a plasmid), has been modified to introduce a nucleotide sequence encoding a guide RNA at the 3' end of the retron DNA sequence. The DNA is shown transcribed as a polycistronic message that includes the msr'/msd' region (forming the ncRNA) fused at its 3' end to the guide RNA (in other embodiments, the guide RNA may be fused to the 5' end of the retron ncRNA). This intermediate is shown forming a complex with a reverse transcriptase provided in trans (e.g., by means of a separate expression vector or a delivered mRNA). The top right schematic shows the complex formed between the recombinant ncRNA and the RT and the start of reverse transcription from the covalently linked conserved guanosine (G) (i.e., the "priming G" or "priming guanosine") using the msd RNA as the template sequence. After reverse transcription of the template RNA sequence and RNase H degradation, a recombinant msDNA is formed that comprises three modifications, as follows: (a) a guide RNA linked to the 3' end of the msDNA, (b) nucleotide changes in msr', and (c) a region of the reverse-transcribed single-stranded DNA that serves as an HDR donor template. Such a recombinant msDNA may then facilitate various genome modification applications in cells, including genome editing using an RNA-guided nuclease provided to the cell in trans.

FIG. 1C is a schematic depicting the recombinant retron-based genome editing system described herein. In the case of genome editing involving an RNA-guided nuclease, the components of such a system may include (a) a guide RNA provided in cis (e.g., fused to the recombinant retron msDNA) and/or in trans (e.g., expressed separately in the cell), (b) a recombinant ncRNA (including at least a sequence encoding an HDR donor template and, optionally, a guide RNA fused to the ncRNA), (c) a reverse transcriptase, and (d) a programmable nuclease. These components are provided to the cell in the form of DNA and/or RNA and/or protein by a delivery means (e.g., LNP, liposome, virus-based delivery, or passive/active transport). Once inside the cell, the recombinant msDNA is formed. The msDNA and the programmable nuclease translocate to the nucleus to carry out gene editing at the target DNA site, thereby producing an edited DNA target.

FIG. 1D provides a simplified schematic of the natural life cycle of a retron. Retrons typically comprise a reverse transcriptase (RT) and two non-coding contiguous inverted sequences (msr and msd) that are transcribed as a single RNA, which folds into a specific secondary structure. The conserved NAXXH motif and VTG triplet in the retron RT are indicated. The RT binds the RNA downstream of the msd locus and, assisted by the 2'-OH group present in the conserved branching G residue that serves as a primer, initiates reverse transcription of the RNA template toward its 5' end. Reverse transcription stops before reaching the msr locus, and the resulting msDNA remains covalently linked to the RNA template via a 2'-5' phosphodiester bond and base pairing of the 3' ends of the molecules.

FIG. 1E provides a detailed representation of the natural biological pathway of a retron in a cell, ending with production of the msDNA satellite molecule. This figure parallels FIG. 1D but depicts the stages of msDNA production more completely. (1) depicts the retron locus, which includes the ncRNA locus, having the msr locus and the msd locus (both non-coding), and the reverse transcriptase (RT) locus. The ncRNA locus and the RT locus are transcribed as a single RNA transcript, which is depicted in (2). The color representing each component in (1) is carried through each of stages (2) to (6). Stages (3) and (4) depict the ncRNA portion folding into a series of stem-loops, with the 5' and 3' ends of the ncRNA forming a duplex. The position of the conserved branching guanosine residue having the 2'-OH group is also shown. The branching guanosine serves as the future priming site for the reverse transcriptase. Stage (4) further shows that the region of the transcript encoding the reverse transcriptase is removed and separately translated to produce the reverse transcriptase. In stage (5), the reverse transcriptase binds the folded ncRNA and begins polymerizing single-stranded DNA (i.e., the reverse transcription product) from the primer site (i.e., the conserved branching guanosine residue having a 2'-OH terminus), using the msd RNA sequence as the template. Reverse transcription terminates at the msr region. The msd RNA template is removed exonucleolytically, yielding a chimeric molecule comprising the msr RNA region, which is covalently joined to the ssDNA reverse transcription product through covalent linkage to the conserved guanosine primer residue. A short duplex region also forms between the 3' end of the msr RNA and the 3' end of the ssDNA reverse transcription product. The complete molecule is referred to as "msDNA".

FIG. 1F is a schematic depicting that the recombinant retron-based genome modification systems disclosed herein may be implemented as (a) a cell recorder system, (b) a genome editing system, and (c) a recombineering system. These uses are not intended to be limiting.

FIG. 1G is a schematic depicting various configurations contemplated for the recombinant retrons disclosed herein. (1) shows the operon structure of a wild-type retron; (2) shows the operon structure of a recombinant retron configured to encode an HDR donor template in the final msDNA molecule; (3) shows (2), further modified to encode a guide RNA at the 3' end of the retron; (4) shows (2), further modified to encode a trans guide RNA.

FIG. 1H is a schematic emphasizing that any suitable configuration is contemplated for delivering the components of the recombinant retron-based genome modification systems disclosed herein to a cell, including configurations in which the RT and/or the programmable nuclease are provided in trans relative to the retron ncRNA. In some embodiments, the RT and the programmable nuclease may be provided as a fusion protein.

FIG. 1I depicts that the RT and the programmable nuclease may be provided as a fusion protein (top, middle) or separately from each other.

FIG. 1J depicts that nuclear localization signals may be engineered into the polypeptides of the present disclosure (e.g., RNA-guided nucleases) to facilitate translocation into the nucleus of the cell, where editing occurs.

FIG. 1K is a schematic (not to scale) depicting an embodiment of genome editing in which a double-strand break (DSB) produced by a suitable nuclease (such as a CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB, or restriction enzyme (RE)) facilitates insertion of a donor or template sequence (shown here as a "marker", provided on a "donor vector" and flanked by homologous sequences that match the sequences flanking the DSB).

FIG. 1L depicts the ability of three different retrons (Eco1-R1, Eco3-R2, Eco5-R3) to insert a 16-base-pair insertion at a specific genomic site (the EMX1 gene).

FIG. 1M outlines the procedure used to evaluate the ability of retrons to produce genomic insertions.

FIGs. 2-27 are each a schematic of the consensus secondary structure of the retron ncRNA msr/msd for the retron type listed below, generated by computational structural alignment of the ncRNA sequences from Table B as described in Example 3. In each figure, colored dots indicate the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red circles indicate that a base is present in 97% of cases, black in 90-97% of cases, gray in 75-90% of cases, and white in 50-75% of cases), while colored letters indicate bases with differing degrees of conservation (e.g., red denotes 97%+ conserved, black 90%+ conserved, and gray at least 75% conserved). Each highlighted base pair denotes a significantly covarying base pair.
FIG. 2 (SEQ ID NO: 19970): type IA/IIA1 retrons.
FIG. 3 (SEQ ID NO: 19971): type IB1 retrons.
FIG. 4 (SEQ ID NO: 19972): type IB2 retrons.
FIG. 5 (SEQ ID NO: 19973): type IC retrons.
FIG. 6 (SEQ ID NO: 19974): type IIA other retrons.
FIG. 7 (SEQ ID NO: 19975): type IIA2 retrons.
FIG. 8 (SEQ ID NO: 19976): type IIA3 retrons.
FIG. 9 (SEQ ID NO: 19977): type IIA4 retrons.
FIG. 10 (SEQ ID NO: 19978): type IIA5 retrons.
FIG. 11 (SEQ ID NO: 19979): type IIIA1 retrons.
FIG. 12 (SEQ ID NO: 19980): type IIIA2 retrons.
FIG. 13 (SEQ ID NO: 19981): type IIIA3 retrons.
FIG. 14 (SEQ ID NO: 19982): type IIIA4 retrons.
FIG. 15 (SEQ ID NO: 19983): type IIIA5 retrons.
FIG. 16 (SEQ ID NO: 19984): type IIIunk retrons.
FIG. 17 (SEQ ID NO: 19985): type IV retrons.
FIG. 18 (SEQ ID NO: 19986): type IX retrons.
FIG. 19 (SEQ ID NO: 19987): type V retrons.
FIG. 20 (SEQ ID NO: 19988): type VI retrons.
FIG. 21 (SEQ ID NO: 19989): type XI group 1 retrons.
FIG. 22 (SEQ ID NO: 19990): type XI group 2 retrons.
FIG. 23 (SEQ ID NO: 19991): type XII retrons.
FIG. 24 (SEQ ID NO: 19992): type XIII retrons.
FIG. 25 (SEQ ID NO: 19993): type XIV retrons.
FIG. 26 (SEQ ID NO: 19994): Ec107 retrons.
FIG. 27 (SEQ ID NO: 19995): outgroup A retrons.

FIG. 28 is a phylogenetic tree of RT sequences constructed according to Example 3.

FIG. 29 is a structural representation of the retron loci associated with each retron type in FIG. 28.

FIG. 30 shows the positions of certain retrons (Eco1, Eco3, Eco5, Aco1, RTX003_2042, RTX003_6083v1, and RTX003_6943) within the retron phylogenetic tree of FIG. 28.

FIG. 31A is a plasmid map of an exemplary retron (Eco1) tested in the Examples herein.

FIG. 31B is a linear representation, in the 5' to 3' direction, of the plasmid map of FIG. 31A.

FIG. 31C is a plasmid map of an exemplary retron (RTX3_6083v1) tested in the Examples herein.

FIG. 31D is a linear representation, in the 5' to 3' direction, of the plasmid map of FIG. 31C.

FIG. 32 represents the plasmid-based assay used to measure retron precise editing and indels, as performed in the Examples. In step 1, a plasmid (e.g., the plasmid of FIG. 31A or FIG. 31C) is transfected into human cells engineered to express Cas9 (e.g., HEK293T cells). Editing is allowed to proceed for 72 hours at 37°C. In step 2, genomic DNA is extracted from the cells and used to prepare a next-generation sequencing (NGS) library for sequencing. The library is sequenced across the target site of editing (e.g., EMX1) to produce sequence reads. In step 3, the sequencing reads are analyzed to obtain the frequency of sequence reads containing the desired edit (percentage of precise edits) and the frequency of indels at the desired editing site (percentage of indels).

FIG. 33 is an equivalent representation of FIG. 32.

FIG. 34 represents the method used to transfect HEK293T cells. One day before transfection, cells are seeded in a 24-well plate. Appropriate amounts of plasmid and transfection reagent (e.g., Lipofectamine 3000) are mixed and transferred to the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted editing region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and the percentages of precise edits and indels are calculated.

FIG. 35 (SEQ ID NO: 19996-20003) (top to bottom) is an example of the reference sequence and the desired editing outcome for the Eco3 retron at the EMX1 genomic site. Editing outcomes were analyzed using the CRISPResso2 pipeline. In this example, the editing template inserts a 10 bp insertion (TTACGTCTGC) (SEQ ID NO: 19931) into the EMX1 gene while also introducing a 6 bp substitution to mutate the PAM sequence (GAAGGG>AAAGTT) (SEQ ID NO: 19954).
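The read-level scoring described for FIGs. 32-35 (percentage of precise edits versus percentage of indels) can be illustrated with a deliberately simplified sketch. This is not CRISPResso2 and not the analysis used in the Examples: it relies on exact string matching and a length comparison rather than alignment, and the function names and the short flanking sequences are hypothetical. Only the 10 bp insert (TTACGTCTGC) is taken from the description of FIG. 35.

```python
from collections import Counter

CATEGORIES = ("precise_edit", "unmodified", "indel", "other")


def classify_read(read: str, reference: str, edited: str) -> str:
    """Assign one amplicon read to a coarse outcome category.

    Assumption for this sketch: reads span the full amplicon, so a read
    identical to the expected edited amplicon is a precise edit, and any
    other read whose length differs from the reference carries an indel.
    """
    if read == edited:
        return "precise_edit"
    if read == reference:
        return "unmodified"
    if len(read) != len(reference):
        return "indel"
    return "other"  # same length as the reference, but with substitutions


def summarize(reads, reference: str, edited: str) -> dict:
    """Return the percentage of reads in each category."""
    if not reads:
        raise ValueError("no reads to summarize")
    counts = Counter(classify_read(r, reference, edited) for r in reads)
    return {name: 100.0 * counts[name] / len(reads) for name in CATEGORIES}


# Toy amplicon: hypothetical flanks around the 10 bp insert from FIG. 35.
toy_reference = "AAACCCGGGTTT"
toy_edited = "AAACCC" + "TTACGTCTGC" + "GGGTTT"
toy_reads = [toy_edited, toy_reference, toy_reference, toy_reference[:-2]]

print(summarize(toy_reads, toy_reference, toy_edited))
```

A real analysis would align reads to the reference and restrict indel calling to a window around the cut site; exact matching is used here only to keep the example short.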
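FIG. 36 shows results of the plasmid-based assay (e.g., according to FIG. 33), demonstrating that use of the Eco1 retron in Cas9-expressing HEK293T cells achieved up to about 0.3% precise editing and as low as 40% indels. A plasmid encoding the Eco1 RT and an Eco1 ncRNA-sgRNA fusion targeting EMX1 was transfected by lipofection using two different amounts of Lipofectamine.

FIG. 37 shows results of the plasmid-based assay (e.g., according to FIG. 33), demonstrating that use of Aco1 achieved up to about 0.1% precise editing and as low as 3% indels. The Aco1 retron had not previously been experimentally verified to produce msDNA. The precise editing activity observed in this experiment strongly supports that the Aco1 retron can generate msDNA in human cells.

FIG. 38 shows results of the plasmid-based assay (e.g., according to FIG. 33), demonstrating that use of RTX003_2042 achieved up to about 0.3% precise editing and as low as 5% indels. This retron achieved precise editing comparable to that of Eco1 but with significantly lower (10-fold) indels. RTX003_2042 is a novel retron, and the precise editing activity observed in this experiment strongly supports that the RTX003_2042 retron may generate msDNA in human cells.

FIG. 39A shows results of the plasmid-based assay, demonstrating that use of RTX003_6083v1 and 6943 achieved up to about 0.05-0.08% precise editing and as low as 2.5-4% indels. Both are novel retrons, and the precise editing activity observed in this experiment strongly supports that the RTX003_6083v1 and 6943 retrons may generate msDNA in human cells.

FIG. 39B shows a follow-up experiment using the same assay as FIG. 39A, indicating that RTX3_6083v1 and RTX3_6943 generated 3-4-fold more precise edits than Eco1, while the indels generated by these two retrons were 2-3-fold lower. RTX3_2042 showed precise editing at a frequency similar to that of Eco1 but with greater variability than the other samples.

FIG. 39C shows a follow-up experiment using the same assay as FIG. 39A, indicating that RTX3_6083v1 and RTX3_6943 generated 3-4-fold more precise edits than Eco1, while the indels generated by these two retrons were 2-3-fold lower. RTX3_2042 showed precise editing at a frequency similar to that of Eco1 but with greater variability than the other samples.

FIG. 39D shows results of the plasmid-based assay, demonstrating that, compared with empty vector and Eco1, use of RTX003_0637, RTX003_1262, and RTX003_6342 achieved up to about 0.7% precise editing and as low as about 4% indels. RTX003_0637, RTX003_1262, and RTX003_6342 are novel retrons, and the precise editing activity observed in this experiment strongly supports that these retrons may generate msDNA in cells.

FIG. 39E shows results of the plasmid-based assay, demonstrating precise editing and indel generation for an array of retrons including Eco1, Eco3, RTX3_2042_RT_inactivated, RTX3_2042, RTX3_6083v1, RTX3_6943, RTX3_6943, RTX3_1262, RTX3_6342S, and RTX3_6342L. Compared with empty vector and Eco1 as controls, use of Eco1, Eco3, RTX3_2042_RT_inactivated, RTX3_2042, RTX3_6083v1, RTX3_6943, RTX3_6943, RTX3_1262, RTX3_6342S, and RTX3_6342L achieved up to about 2.5% precise editing and as low as about 4% indels.

FIG. 40 represents the 2-RNA editing assay used in the Examples, which measures the relative editing efficiency of exemplary retrons using electroporation-based delivery of two RNA components into HEK293T cells. Appropriate amounts of RT mRNA and ncRNA-sgRNA fusion are mixed and electroporated into the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and precise edits and indels are calculated.

FIG. 41 shows results for the 2-RNA system (RT mRNA + ncRNA-sgRNA fusion) delivered by electroporation to Cas9-expressing HEK293T cells. The Eco1, Eco3, and Eco5 retrons were tested. The results show precise editing of up to 0.4% for Eco3 (left panel) and indels as low as 10% for Eco3 (right panel). Eco3-mediated precise editing increased as the amount of ncRNA-sgRNA fusion increased, at ratios of RT mRNA to ncRNA-sgRNA fusion of 1:2 to 1:4.

FIG. 42 shows titration results for the two-component Eco3 RNA system (RT mRNA + ncRNA-sgRNA fusion) delivered by electroporation to Cas9-expressing 293T cells. RT mRNA and ncRNA were mixed at ratios of 1:2, 1:3, 1:4, 1:5, 1:8, and 1:10, respectively, and delivered at two different amounts of RT mRNA (0.2 or 0.5 µg). The data on the left show that 0.5 µg of Eco3 produced the highest percentage of precise editing at RT mRNA:ncRNA ratios of 1:3 and 1:5. The data on the right further show a trend in which more comparable RT mRNA:ncRNA ratios led to lower indel percentages.

FIG. 43 represents the three-RNA retron editing system, which involves delivering three RNA components (RT mRNA, retron ncRNA-sgRNA fusion, and Cas9 mRNA) into HEK293T cells by electroporation. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, and Cas9 mRNA are mixed and electroporated into the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and precise edits and indels are calculated.

FIG. 44 shows Cas9 mRNA titration results for the three-component Eco3 RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA) delivered by electroporation to 293T cells. RT mRNA and ncRNA-sgRNA fusion were mixed at the amounts given in the figure, and the amount of Cas9 mRNA was titrated. At 0.2 µg of Cas9 mRNA, up to 0.1% precise editing was observed. Although the editing efficiency was an order of magnitude lower than with the 2-RNA system, editing occurred through the specific action of Cas9 and the retron, because the absence of either abolished editing.

FIG. 45 depicts the lipofection process using the three-RNA system in HEK293T cells. One day before transfection, cells are seeded in a 96-well plate. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA, and Lipofectamine reagent are mixed and transferred to the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and precise edits and indels are calculated.

FIG. 46 shows results for the three-component Aco1 RNA system (RT mRNA + ncRNA + Cas9 mRNA) delivered by lipofection to HEK293T cells. RT mRNA, ncRNA, and Cas9 mRNA were mixed at the amounts indicated in the figure and transfected into HEK293T cells. A 56 bp insertion and a 6 bp deletion at the EMX1 locus were scored as precise edits, and about 0.1% of the cell population in the left panel had undergone precise editing. Editing depended on the Cas9 nuclease, because its absence abolished editing. The indel frequency in the right panel was about 1.5%.

FIG. 47 shows results indicating minimal Cas9 nuclease activity when the sgRNA is fused to the ncRNA of the Eco3 retron. Cas9 activity was assessed by indel frequency. 1 µg of ncRNA-sgRNA fusion showed 20-fold lower activity than an equimolar amount of separate, isolated sgRNA. In addition, when the activity of chemically modified sgRNA was compared with that of unmodified sgRNA under the conditions described in the figure, the former showed 6-fold higher activity than the latter.

FIG. 48 represents the all-RNA editing assay used in the Examples to measure the relative editing efficiency of sample retrons in all-RNA form, modified by a trans guide RNA spike-in step. Electroporation is performed in HEK293T cells using the three-RNA system + sgRNA trans spike-in. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA, and sgRNA are mixed and electroporated into the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and precise edits and indels are calculated.

FIG. 49 shows guide RNA spike-in results in the all-RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA + sgRNA) delivered by electroporation to HEK293T cells. At the amounts of Cas9 and RT mRNA given in the figure, the amount of spiked-in guide RNA was titrated at 50, 100, and 200 ng. The titration was performed at two different RT mRNA:ncRNA-sgRNA fusion ratios, 1:6 or 1:8. Guide RNA spike-in in the all-RNA system increased precise editing by up to about 50-fold. Increasing the amount of guide RNA progressively increased precise editing, and an RT mRNA:ncRNA-sgRNA fusion ratio of 1:8 performed slightly better than 1:6, reaching 13% precise editing. The right panel shows the indel frequencies under the respective conditions.

FIG. 50 represents the lipofection process using the three-RNA system + gRNA trans spike-in in HEK293T cells. One day before transfection, cells are seeded in a 96-well plate. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA, sgRNA, and Lipofectamine reagent are mixed and transferred to the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified into a sequencing library. The sequencing data are analyzed with CRISPResso2, and precise edits and indels are calculated.

FIG. 51 shows guide RNA spike-in results (i.e., delivery of a separate molecular bolus of guide RNA) in the all-RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA + sgRNA) delivered by lipofection to HEK293T cells. At the amounts of RT mRNA, ncRNA-sgRNA fusion, and Cas9 mRNA given in the figure, the amount of spiked-in guide RNA was titrated at 2, 5, and 10 ng. Guide RNA spike-in in the all-RNA system increased precise editing by up to 3.5-fold, to 12% efficiency. Increasing the amount of guide RNA within this range did not further increase precise editing. Precise editing was entirely dependent on the presence of the retron machinery. The right panel shows the indel frequencies under the respective conditions.

FIG. 52 shows results of separating the ncRNA-sgRNA fusion in the all-RNA system (RT mRNA + Cas9 mRNA + ncRNA-sgRNA fusion, or separate ncRNA + sgRNA) delivered by lipofection to HEK293T cells. At the amounts of RT mRNA and Cas9 mRNA given in the figure, the amount of spiked-in guide RNA was titrated at 0, 2, 5, 10, 50, and 100 ng. Precise editing peaked at 2.23% at 10 ng of guide RNA, compared with 1.78% for the ncRNA-sgRNA fusion. Increasing the amount of guide RNA within this range did not further increase precise editing. The right panel shows the indel frequencies under the respective conditions.

FIG. 53 is a schematic of an improved template for in vitro transcription to produce ncRNA (left, or A) and of an ncRNA modification (right, or B). (A) relates to optimizing RNA production by in vitro transcription. Previously performed in vitro transcription experiments to produce RNA used a double-stranded DNA template containing a 3' overhang (on the same strand as the T7 promoter sequence). A new template with blunt ends was designed and tested and, as shown in FIG. 54, led to increased precise editing efficiency. (B) relates to a modified ncRNA, which is modified by adding an MS2 stem-loop hairpin at the 3' end of the ncRNA. Without being bound by theory, the MS2 loop helps stabilize the ncRNA and leads to significantly improved precise editing efficiency, as shown in FIG. 54.

FIG. 54 shows results of separating the ncRNA-sgRNA fusion in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered by Lipofectamine MessengerMAX to HEK293T cells. All RNAs were transfected at fixed amounts as shown in the figure. Use of RNA generated from a linearized plasmid template containing a 3' overhang produced 1.35% precise editing. Use of RNA generated from the improved linearized plasmid template containing blunt ends increased precise editing to 5.94%. Adding an MS2 stem-loop to the 3' end (blunt end) of the ncRNA further increased precise editing to 12.39%. The right panel shows the indel frequencies under the respective conditions.

FIG. 55 (SEQ ID NO: 19969) provides a schematic of end protection of RNA by capping and tailing against cellular nuclease activity. In (A), a 7-methylguanosine cap0 is added onto the 5' triphosphate of the RNA. In (B), a poly(A) tail is added to the 3' end by enzymatic addition. The tail length is estimated to exceed 50 nucleotides. In (C), an RNA containing both a 5' cap and a 3' tail is shown. The results are shown in FIG. 56.

FIG. 56 shows results of end protection of the ncRNA-sgRNA fusion by cap and tail in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered by Lipofectamine MessengerMAX to HEK293T cells. All RNAs were transfected at fixed amounts: RT mRNA 100 ng, ncRNA-sgRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. The ncRNA-gRNA fusion was capped (+cap -tail), poly(A)-tailed (-cap +tail), or both capped and poly(A)-tailed (+cap +tail). Use of RNA without end protection (-cap -tail) produced about 4.5% precise editing, and editing depended on the retron, because the absence of RT abolished precise editing. Compared with no cap and no tail, use of RNA with either or both of the cap and tail protections produced lower precise editing (left panel) but reduced indels (right panel).

FIG. 57 shows results indicating that shortening the msd stem can modulate precise editing in a retron-specific manner. The modified retrons include RTX3_4536 (long (L) and short (S) forms), RTX3_6279 (long (L) and short (S) forms), RTX3_6342 (long (L) and short (S) forms), RTX3_6438 (long (L) and short (S) forms), RTX3_6549 (long (L) and short (S) forms), and RTX3_6605 (long (L) and short (S) forms).

FIG. 58 shows results indicating that shortening the msd stem can modulate precise editing in a retron-specific manner. The modified retrons include RTX3_5752 (long (L) and short (S) forms), RTX3_6221 (long (L) and short (S) forms), and RTX3_6034 (long (L) and short (S) forms).

FIG. 59A demonstrates that four retrons from diverse clades are capable of inserting up to 100 bp into the EMX1 locus. The figure shows results of testing different template lengths for four different retrons in the all-RNA system delivered by lipofection to HEK293T cells. At given amounts of RT mRNA and Cas9 mRNA, ncRNA-sgRNA fusions with different template lengths ranging from 10 to 100 bp were added. For the 100 bp insertion, precise editing was 1.04% for Eco3, 1.52% for Aco1, 1.37% for R2042, and 0.05% for R6943. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 59B demonstrates that four retrons from diverse clades are capable of inserting up to 100 bp into the EMX1 locus. The figure shows the indel results corresponding to the constructs of FIG. 59A, illustrating the indel frequencies under the respective conditions.

FIG. 60A demonstrates that an additional sgRNA targeting the same genomic locus increases indel and insertion frequencies. This figure reports the same conditions as FIGs. 59A-59B but with added sgRNA, which increases the frequencies of precise editing and indels. For the 100 bp insertion, precise editing was 1.82% for Eco3, 4.50% for Aco1, 4.40% for R2042, and 0.38% for R6943. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 60B demonstrates that an additional sgRNA targeting the same genomic locus increases indel and insertion frequencies. This figure reports the indel frequencies associated with the retrons of FIG. 60A.

FIG. 61A is a schematic showing the unedited allele and the allele edited by GFP gene insertion at the EMX1 locus using the retron editing system. The primer pair EMX1 forward and reverse hybridizes just outside the 5' and 3' homology arms and amplifies 169 bp on the unedited allele. On the edited allele, it amplifies 1433 bp spanning the GFP gene insertion. The primers that detect the 5' and 3' junctions are indicated on the edited allele. The 5' junction on the edited allele is amplified by the primer pair EMX1 forward and 5' junction GFP reverse, and the 3' junction is amplified by the primer pair 3' junction GFP forward and EMX1 reverse. Neither the 5' junction GFP reverse primer nor the 3' junction GFP forward primer binds the unedited allele.

FIG. 61B demonstrates that three retrons from diverse clades are capable of GFP integration at the EMX1 locus. The figure provides qPCR data depicting GFP insertion by three retrons at the 5' or 3' junction in the EMX1 locus. ncRNA-sgRNA engineered to contain the GFP gene insert was transfected into HEK293T cells together with the RT mRNA of each retron and Cas9 mRNA, with or without additional sgRNA. Genomic DNA was analyzed with the indicated primer pairs to detect GFP gene insertion at the 5' or 3' junction on the edited allele. ΔCt, used for relative quantification of GFP insertion at the EMX1 locus, was obtained by subtracting the Ct value of the +RT sample from the Ct value of the -RT sample. A larger ΔCt value indicates a higher insertion frequency in the sample. For all three retrons, amplicons of the 5' and 3' junctions were detected significantly above background levels. Amplicons of the 3' junction were more abundant than amplicons of the 5' junction, possibly because the 5' end of the msDNA produced by reverse transcription is more enriched, facilitating insertion at the 3' junction.

FIG. 61C demonstrates that two retrons are capable of GFP integration at the EMX1 locus. The figure provides qPCR data depicting GFP insertion by two retrons at the 5' or 3' junction in the EMX1 locus. ncRNA-sgRNA engineered to contain the GFP gene insert was transfected into HEK293T cells together with the RT mRNA of each retron and Cas9 mRNA, with or without additional sgRNA. Genomic DNA was analyzed with the primer pairs indicated in the table to detect GFP gene insertion at the 5' or 3' junction on the edited allele. ΔCt, used for relative quantification of GFP insertion at the EMX1 locus, was obtained by subtracting the junction Ct value from the EMX1 Ct value. For both retrons, amplicons of the 5' and 3' junctions were detected significantly above background levels. Amplicons of the 3' junction were slightly more abundant than amplicons of the 5' junction, possibly because the 5' end of the msDNA produced by reverse transcription is more enriched, facilitating insertion at the 3' junction.

FIG. 61D shows the PCR products for Aco1 run on an Agilent TapeStation High Sensitivity D1000 ScreenTape. In the presence of RT (+RT samples, left side of the gel), amplicons of the expected size were detected at both the 5' and 3' junctions of the edited allele, as indicated by arrows. In the absence of RT (-RT samples, right side of the gel), the edited allele was not detected. Triangles indicate the expected amplicon of the unedited allele, which is present in both the -RT and +RT samples.

FIG. 62A demonstrates that, within the fusion RNA, the order of sgRNA and ncRNA in which the sgRNA is upstream of the ncRNA is more active with respect to precise editing and indels. Shown are results of testing different ncRNA-sgRNA fusion orders for three different retrons in the all-RNA system delivered by lipofection to HEK293T cells. At given amounts of RT mRNA and Cas9 mRNA, ncRNA fusions with different orders were added. For Eco3, Aco1, and R2042, the increase in precise editing was greater with sgRNA-ncRNA than with ncRNA-sgRNA. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 62B demonstrates that an additional sgRNA compensates for the lower editing activity of the configuration in which the sgRNA is downstream of the ncRNA. The conditions are the same as in the preceding figure but with added sgRNA; in the presence of additional sgRNA, the difference between ncRNA-sgRNA and sgRNA-ncRNA depends on the retron. For Eco3 and R2042, sgRNA-ncRNA performed better than ncRNA-sgRNA. For Aco1, ncRNA-sgRNA performed slightly better than sgRNA-ncRNA. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 63 demonstrates that precise editing requires RT and Cas9. Shown are results of testing negative controls in the all-RNA system delivered by lipofection to HEK293T cells. Precise editing requires RT and Cas9. When either of these components was removed, precise editing was undetectable or at background levels. The bottom panel shows the indel frequencies under the respective conditions.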
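The ΔCt calculation described for FIG. 61B (the Ct value of the +RT sample subtracted from the Ct value of the -RT sample) can be written out as a small worked example. This is a sketch under stated assumptions: the function names are hypothetical, the Ct values are made-up placeholders rather than data from the disclosure, and the conversion of ΔCt to an approximate fold difference assumes ideal amplification (a doubling per cycle), which the source does not state.

```python
def delta_ct(ct_minus_rt: float, ct_plus_rt: float) -> float:
    """ΔCt as described for FIG. 61B: Ct(-RT sample) minus Ct(+RT sample).

    A larger ΔCt indicates that the junction amplicon appears earlier in
    the +RT sample, i.e. a higher insertion frequency.
    """
    return ct_minus_rt - ct_plus_rt


def approximate_fold_difference(dct: float, amplification_factor: float = 2.0) -> float:
    """Convert ΔCt to a fold difference.

    Assumption (not from the source): each PCR cycle multiplies the
    product by `amplification_factor`, which is 2.0 for ideal efficiency.
    """
    return amplification_factor ** dct


# Placeholder Ct values for illustration only (not measured data).
example_ct_minus_rt = 35.0
example_ct_plus_rt = 30.0

example_dct = delta_ct(example_ct_minus_rt, example_ct_plus_rt)
print(example_dct, approximate_fold_difference(example_dct))
```

FIG. 61C uses a different subtraction (the junction Ct value subtracted from the EMX1 Ct value); the same `delta_ct` helper applies if the arguments are supplied in that order.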
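FIG. 64 demonstrates that separating the ncRNA and gRNA does not reduce the frequency of precise editing. Shown are results of testing ncRNA-sgRNA fusions versus separate ncRNA for four different retrons in the all-RNA system delivered by lipofection to HEK293T cells. At given amounts of RT mRNA, Cas9 mRNA, and the additional sgRNA described below the figure, ncRNA/ncRNA-sgRNA/sgRNA-ncRNA was added. All insertion templates tested were 25 bp. The optimal ncRNA format depends on the retron. For Eco3, the highest precise editing (9.88%) was observed with separate ncRNA + additional sgRNA. For Aco1, the highest precise editing (2.96%) was observed with ncRNA-sgRNA + additional sgRNA. For R2042, the highest precise editing (7.4%) was observed with separate ncRNA + additional sgRNA. For R6943, the highest precise editing (0.39%) was observed with ncRNA-sgRNA fusion + additional sgRNA. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 65 demonstrates that retrons can achieve precise editing at the AAVS1 locus. Shown are results of testing four different retrons for insertion of a 25 bp template into the AAVS1 locus. The retrons were delivered as all-RNA to HEK293T cells by lipofection. For all four retrons, the most precise editing was observed in the presence of additional sgRNA and ranged from 0.73% to 2.05%, depending on the retron used. The indel frequencies under the respective conditions are shown on the right.

FIG. 66 demonstrates that retrons can achieve precise editing at the AAVS1 locus. Shown are the negative control results for the AAVS1 experiment in the preceding figure. Removing the RT, ncRNA-sgRNA, or Cas9 mRNA abolished precise editing. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 67 demonstrates that a template on either the target strand or the non-target strand can be used for precise editing. Shown are results of testing templates on different strands, relative to the Cas9 sgRNA target strand, for two different retrons in the all-RNA system delivered by lipofection to HEK293T cells. At given amounts of RT mRNA, Cas9 mRNA, and additional sgRNA, ncRNA-sgRNA was added. All insertion templates tested were 25 bp. All previous figures encoded the insertion template on the same strand targeted by the Cas9 sgRNA, denoted by the letter T. For Eco3, in the presence of additional sgRNA, precise editing was 1.67% when the template was on the target strand and 0.85% when the template was on the non-target strand. For Aco1, in the presence of additional sgRNA, precise editing was 2.96% when the template was on the target strand and 3.40% when the template was on the non-target strand. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 68 demonstrates that retron-mediated precise editing is observed using Cas9 nickases in the 4-component all-RNA system. Shown is Cas9 nickase activity in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA/ncRNA-sgRNA + sgRNA) delivered by Lipofectamine MessengerMAX to HEK293T cells. All RNAs were transfected at fixed amounts: RT mRNA 100 ng, ncRNA/ncRNA-sgRNA 400 ng, Cas9 variant mRNA 100 ng, and sgRNA 5 ng. Separate ncRNA or ncRNA-sgRNA fusion was used. When ncRNA or ncRNA-sgRNA was used together with additional sgRNA, precise editing with Cas9 WT (wild type) was about 9%. With the Cas9 H480A mutant, which nicks the non-target strand, almost no activity above background was seen under any condition. With the Cas9 D10A mutant, which nicks the target strand, precise editing at a frequency of about 0.1% was observed in both the separate and the fused ncRNA and sgRNA formats. The right panel shows the indels under the respective conditions of the left panel. As expected, indels were extremely low for either nickase in the presence or absence of additional sgRNA, and only WT Cas9 generated significant insertions/deletions.

FIG. 69 demonstrates that the retron-mediated precise editing observed with Cas9 nickases depends on each component of the all-RNA system. The conditions are the same as in the preceding figure, but negative controls lacking one component of the all-RNA system (Cas9, ncRNA/ncRNA-sgRNA, or RT), with or without additional sgRNA, were tested. In the top panel, all control conditions tested showed background precise editing activity, indicating that precise editing achieved by Cas9 WT or Cas9 nickase depends on each component of the all-RNA system. In the bottom panel, significant indels were detectable only with Cas9 WT.

FIG. 70 demonstrates that dual sgRNAs increase precise editing without increasing indels. The figure shows results of testing dual sgRNAs for R2042 in the all-RNA system delivered by lipofection to HEK293T cells. At given amounts of RT mRNA, Cas9 mRNA, and ncRNA-sgRNA, additional sgRNA was added. Under all conditions, the same ncRNA-sgRNA was used, whose template encodes a 25 bp insertion on the non-target strand of the Cas9 sgRNA. With one additional sgRNA (#1), precise editing was 0.27%; in the presence of a second sgRNA, precise editing increased to 1.33% (#2) or 1.13% (#3). The bottom panel shows the indel frequencies under the respective conditions. See Example 9.

FIG. 71 shows precise editing and indel results in HEK293T cells to which the R6083-based all-RNA retron editing system was administered by lipofection in various configurations. The configurations include testing of different template lengths (25 bp to 100 bp) using fused ncRNA-sgRNA constructs (with and without spiked-in sgRNA) and separate ncRNA and sgRNA constructs. At given amounts of RT mRNA and Cas9 mRNA, ncRNA-sgRNA fusions with different template lengths ranging from 25 to 100 bp, or ncRNA with a 25 bp insertion, were added. Precise insertion of 100 bp was observed at about 2% efficiency. The separate ncRNA showed lower activity than the ncRNA-sgRNA fusion with the 25 bp insertion. Precise editing depended on the presence of the RT and ncRNA, as shown in the top right panel. The bottom row of each panel shows the indel frequencies under the respective conditions.

FIG. 72 shows results of testing the R6083 retron for insertion of a 25 bp template into the AAVS1 locus in HEK293T cells. The retron (ncRNA-sgRNA fusion), retron RT, and Cas9 components were delivered as all-RNA to HEK293T cells by lipofection. With additional sgRNA, an editing efficiency of 3% was observed, and activity was entirely dependent on the presence of the RT and the ncRNA. The indel frequencies under the respective conditions are shown on the right.

FIG. 73 demonstrates that, in retron editing with Eco3, separate Cas9 and RT enzymes show higher editing efficiency than when they are fused. Shown are results of testing precise editing by Eco3 with Cas9 and RT fusions in the all-RNA system delivered by Lipofectamine MessengerMAX to HEK293T cells. The separate enzymes were tested using Cas9 or RT mRNA at the same molar concentration as the Cas9-RT fusion format, and the editing efficiencies of the fused or separate enzymes were compared for Eco3 ncRNA (left side of the left panel) or Eco3 ncRNA-sgRNA fusion (right side of the right panel). With both ncRNA and ncRNA-sgRNA, separate Cas9 and RT achieved higher editing efficiency than when fused. The right panel shows the indels under the respective conditions of the left panel.

FIG. 74 demonstrates that cap and tail modifications of the ncRNA together significantly increase editing efficiency, by about 2-fold. Shown are results of end protection of the ncRNA-sgRNA fusion by cap and tail in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered by Lipofectamine MessengerMAX to HEK293T cells. All RNAs were transfected at fixed amounts: RT mRNA 100 ng, ncRNA-sgRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. The ncRNA-gRNA fusion was capped (+cap -tail), poly(A)-tailed (-cap +tail), or both capped and poly(A)-tailed (+cap +tail). Use of RNA without end protection (-cap -tail) produced about 4.5% precise editing, and editing depended on the retron, because the absence of RT abolished precise editing. Although capping alone did not change editing efficiency, tailing alone slightly increased editing, and cap and tail together significantly increased editing efficiency, by about 2-fold (left panel). In the right panel, indels were comparable under all conditions.

FIG. 75 is a schematic of ncRNA circularization, which increases editing efficiency as demonstrated in FIG. 76. Step 1: the ncRNA is flanked by an internal homology sequence, an intron, and another homology sequence, proceeding outward toward both ends. Step 2: the two homology sequences bring the two ends closer together. Exogenous GTP initiates a series of transesterifications. Step 3: the intron is excised and the ncRNA is ligated into a circular form.

FIG. 76 shows results of testing precise editing with modified ncRNAs (3' MS2 modification and circular ncRNA) in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered by Lipofectamine MessengerMAX to HEK293T cells. All RNAs were transfected at fixed amounts: RT mRNA 100 ng, ncRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. With an MS2 stem-loop added at the 3' end of the ncRNA, activity nearly doubled relative to unmodified ncRNA (about 8%), to an efficiency of 15%. Circularization of the ncRNA achieved a further increase in efficiency, bringing the editing efficiency to 22%. Circularization of the ncRNA was performed as described in FIG. 75. Without sgRNA, as expected, precise editing and indels could not be detected. The right panel shows the indels under the respective conditions of the left panel.

FIG. 77 shows results indicating that pairing between an RT and its cognate ncRNA is required for precise editing. Shown are the results of pairing RTs and ncRNAs from the same or different retrons for precise editing at the EMX1 locus in HEK293T cells by the all-RNA system. All RNAs were transfected at fixed amounts: RT mRNA 100 ng, ncRNA 400 ng, Cas9 mRNA 100 ng, and sgRNA 5 ng. In the left panel, Aco1 RT was paired with Eco3 ncRNA or with its cognate Aco1 ncRNA (left graph). Eco3 RT was paired with Aco1 ncRNA or with its cognate Eco3 ncRNA (right graph). The system supported precise editing only when the RT was paired with its cognate ncRNA. The right panel shows the indels under the respective conditions of the left panel.

FIG. 78 shows that retron editing supports exon-sized long insertions of up to 305 bp. Shown are results of testing exon-sized long insertions at the EMX1 locus using the Aco1, Eco3, and R1262 retrons. Aco1 achieved precise insertion of 10, 100, and 205 bp at 8, 4, and 1.3% (left panel). Eco3 achieved precise insertion of 10 and 205 bp at 12 and 2.3%, respectively. The novel retron R1262 achieved 25, 205, and 305 bp insertions at 20, 15, and 8.8% efficiency. The bottom panels show the indels under the respective conditions of the top panels.

FIG. 79 demonstrates that the novel retron R1262 inserts a 205 bp insertion at AAVS1 with more than 11% efficiency. Shown are results of testing a 205 bp insertion at the AAVS1 locus using the novel retron R1262. R1262 achieved 25 and 205 bp insertions at 25% and 11.1% efficiency, respectively. As the controls show, precise editing depended on the presence of the RT and sgRNA. The right panel shows the indels under the respective conditions of the left panel.

FIG. 80 shows optimization of the RT:ncRNA ratio for installing 205-305 bp insertions by precise editing with R1262. Shown are results of testing RT:ncRNA ratios for long insertions of >200 bp at the EMX1 locus using the novel retron R1262. RT:ncRNA ratios were tested at molar ratios of 1:12.5, 1:16.7, and 1:25. The 1:16.7 ratio corresponds to the 1:4 mass ratio used in the rest of the study. With ncRNA (right side of the left panel), the best editing for the 205 and 305 bp insertions was observed at the 1:12.5 ratio. With ncRNA-sgRNA fusion (left side of the left panel), all ratios tested generated comparable editing, although a slight trend of increasing editing was seen as the amount of ncRNA-sgRNA increased. The right panel shows the indels under the respective conditions of the left panel.

FIG. 81 shows NHEJ-based insertion of retron-made double-stranded DNA at a target site. A. Some retrons may produce double-stranded DNA using the msR-msD hybrid as a primer (e.g., Sen1). The desired sequence to be inserted into the genome, flanked by guide RNA recognition sequences, is incorporated into the msD of the ncRNA. Both ends of the double-stranded DNA produced by reverse transcription are trimmed by the nuclease-guide RNA complex, and the DNA is inserted into the target site cleaved by the nuclease-guide RNA complex. B. The ncRNA is designed as in A to contain the sequence to be inserted into the target site, flanked by guide RNA recognition sequences. A second ncRNA includes the reverse complement of the first ncRNA, so that the two form a duplex. Such double-stranded DNA is trimmed by the nuclease-guide RNA complex and inserted into the target site cleaved by the nuclease-guide RNA complex.

FIG. 82A shows non-coding RNA optimization A. Multiple structural components are present in the retron ncRNA: 1. the a1/a2 inverted repeats, 2. the msR stem-loops, 3. the msD stem-loops, and 4. the msR DNA. Fine-tuning the strength of these structural elements by changing the GC content, length, and number of stem-loops may enhance msD DNA production and subsequent gene editing.

FIG. 82B shows non-coding RNA optimization B. Separating the msR RT primer from the msD RT template allows chemical modification of the msR RT primer. In combination with end protection of the msD RT template by cap and tail, this may enhance the stability of the ncRNA. Using reverse-complementary sequences at the 3' end of the RT primer and the 5' end of the msD RT template (depicted with dashed lines) stabilizes the primer/template complex.

FIG. 83A provides a schematic outline of a modified retron gene editing system comprising an engineered variant reverse transcriptase (RT) having improved fidelity and/or processivity. The engineered RT is produced by introducing selected amino acid substitutions at residue positions in the retron RT sequence that are orthologous to residues in murine leukemia virus (MMLV) RT whose alteration improves the fidelity and processivity of MMLV RT. In step (a), a structural alignment is constructed between the retron RT and an engineered MMLV RT having one or more amino acid substitutions that are associated with improved fidelity and processivity in the MMLV RT background. In step (b), the alignment is examined to identify the orthologous amino acid residues in the retron RT that correspond to the substituted MMLV RT amino acids. Once these are identified, conventional recombinant engineering methods are used to construct an engineered retron RT comprising one or more corresponding orthologous amino acid substitutions. In step (c), nonessential residues may optionally be deleted based on the structural alignment information. In one embodiment, the retron gene editing system may comprise, in (d), an ncRNA, a guide, a programmable nuclease (or an RNA encoding it), and an engineered RT (or an RNA encoding it). In step (e), the components (which may be in all-RNA form) are delivered to a cell (e.g., by LNP delivery), which leads to (f) targeted single-strand (in the case of a nickase) or double-strand cleavage, followed by (g) targeted repair at the cleavage site by the reverse transcription product of the ncRNA. This results in (h) DNA having the precise edit.

FIG. 83B provides an overview of the mutations installed in RTX3_1262 RT and RTX3_6242 as well as Eco1 RT, which are orthologous to known beneficial amino acid substitutions made in MMLV as reported in the literature. In addition, the table provides the reported phenotypic changes associated with the mutations in MMLV and the domains mutated.

FIG. 83C provides a structural alignment between RTX3_6342 RT (yellow) and MMLV-RT (Protein Data Bank 4MH8) (green), which indicates that the N-terminus of RTX3-6342 RT can be truncated, for example at the suggested truncation point.

FIG. 83D shows the three-dimensional structure of MMLV RT near Q190 and the corresponding orthologous position Q161 in Eco1. In the top picture, the glutamine is conserved in both MMLV and Eco1. The middle picture shows that position Q161 in Eco1 participates in nucleic acid contacts. The bottom picture shows that an engineered MMLV variant having the Q190F mutation increases fidelity by about 2-fold. The corresponding substitution at Q161 in Eco1 (i.e., the Q161F mutation) may likewise increase the fidelity of the Eco1 RT.

FIG. 83E shows mutations that improve processivity in MMLV RT and Eco1. The three-dimensional structures of E302 of MMLV RT and the corresponding residue G283 in Eco1 are provided, both of which appear to participate in binding to nucleic acid. Residues in the nucleic acid binding domain can be mutated to force stronger interactions. MMLV E302R creates a positive charge to interact with the negatively charged template. Eco1 G283 lies in a conserved DNA-binding α-helix. Mutating G283 and orthologous retron residues to positively charged amino acids may improve processivity.

FIG. 83F shows mutations that improve processivity in MMLV RT and Eco1. The three-dimensional structures of T306 of MMLV RT and the corresponding residue F287 in Eco1 are provided, both of which appear to participate in binding to nucleic acid. Residues in the nucleic acid binding domain can be mutated to force stronger interactions. MMLV T306K creates a positive charge to interact with the negatively charged template. Eco1 F287 lies in a conserved DNA-binding α-helix. Mutating F287 and orthologous retron residues to positively charged amino acids may improve processivity.

FIG. 84 describes the preparation of variant retron RTs by saturation mutagenesis of targeted domains involved in processivity and fidelity (e.g., the palm, fingers, and thumb domains). The engineered RT is produced by introducing saturation mutagenesis into the palm, fingers, and thumb domains of the retron RT, which domains are identified by structural alignment with MMLV RT. In step (a), a structural alignment is constructed between the retron RT and an engineered MMLV RT having one or more amino acid substitutions that, in the MMLV 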
RT背景中經改良之保真度及可加工性相關,尤其在手掌、手指及拇指結構域中。在步驟(b)中,鑑定手掌、手指及拇指結構域。在步驟(c)中,使用飽和突變誘發方法對手掌、手指及/或拇指結構域進行突變誘發,且在活性分析中篩選所得突變體以選擇具有增加之可加工性及/或保真度的一或多種先導變異體。在步驟(e)中,將包括先導逆轉錄子RT變異體在內的組分(其可呈全RNA形式)遞送至細胞(例如,藉由LNP遞送),此舉導致(f)靶向單股(在切口酶之情況下)或雙股切割,隨後(g)藉由ncRNA之逆轉錄產物在切割位點處進行靶向修復。這導致(h)具有精確編輯之DNA。 圖85A描述藉由構建嵌合RT蛋白來製備變異體逆轉錄子RT,該等嵌合RT蛋白將逆轉錄子RT之「Y區」(亦即,據報告負責結合至逆轉錄子ncRNA之msr區域的區域)與另一RT (例如,MMLV RT)融合或將一種逆轉錄子RT之「Y區」置換為另一逆轉錄子RT之Y區。以此方式,該嵌合蛋白可能針對任何特定ncRNA進行工程改造以具有對應Y區,預期該區域結合至彼ncRNA。 圖85B(SEQ ID NO:19955-19968) (頂部至底部)提供多種逆轉錄子RT之胺基酸序列之多重序列比對(頂部)。加框區域指示保守VTG三聯體之位置,該三聯體標記Y區之N末端側(如Anna J Simon, Andrew D Ellington, Ilya J Finkelstein, Retrons and their applications in genome engineering, Nucleic Acids Research, 第47卷, 第21期, 2019年12月02日,第11007–11019頁中所報告,其以引用之方式併入本文中),其長度為約90個胺基酸。下方示意圖為頂部圖之放大版。 圖86描述藉由融合一或多種可加工性增強因子(諸如Sso7d及Sac7d)及/或一或多種保真度增強因子(例如3’>5’核酸外切酶結構域)以增加校對活性來製備變異體逆轉錄子RT (例如,POLE1、POLD1、POLG、Pfu或KOD)。在選擇適當因子來增強逆轉錄子RT之可加工性及/或保真度時,參考Oscorbin等人, 「The attachment of a DNA-binding Sso7d-like protein improves processivity and resistance to inhibitors of M-MuLV reverse transcriptase,」 FEBS Lett, 594: 4338-4356及Yarnall等人, 「Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases,」 Nature Biotechnol, 2022。 圖87在K562細胞中使用質體進行電穿孔。將適當質體混合物混合且使用Neon電穿孔系統電穿孔至細胞中。72小時培育後,提取基因體DNA且將靶向區域擴增至測序文庫中。藉由CRISPResso2分析測序資料且計算精確編輯及插入缺失。 圖88K562細胞中基於質體之分析的結果證明使用RTX003_2042、6083v1、6943、1262、6342L及6342S時,高達約1~10%之精確編輯且低至5~30%之插入缺失。其均為新穎逆轉錄子,且此實驗中觀察到的精確編輯活性強烈地支持此等新穎逆轉錄子可能在K562細胞內生成msDNA。此等資料複製了先前在293T細胞中可見之結果,表明逆轉錄子介導之基因編輯不受細胞類型特異性效應影響。 圖89基於質體之分析的結果,該分析將文獻注釋之逆轉錄子與K562細胞中之新穎逆轉錄子進行比較。此等資料證明,新穎逆轉錄子RTX3_1262、RTX3_6342L及RTX3_6342S可生成約15-19%之精確編輯及低至20~35%之插入缺失。新穎逆轉錄子之效果與Aco1一樣好或優於Aco1,且優於其他文獻驗證之逆轉錄子:Eco1、Eco3、Sau1及Sen1。 圖90基於質體之分析的結果,該分析將RTX3_2781與K562細胞中之先導逆轉錄子RTX3_1262及RTX3 _6342S/L進行比較。此等資料證明,RTX3_2781以與RTX3_1262及6342S/L可相當之效率安裝精確編輯。 圖91K562細胞中基於質體之分析的結果證明使用Cas9 
D10A切口酶,使用RTX003_2042、6083v1、6943、1262、6342L及6342S時,高達約0.6%之精確編輯且低至5%之插入缺失。其均為新穎逆轉錄子,且此實驗中觀察到的精確編輯活性強烈地支持此等新穎逆轉錄子可能在細胞內生成msDNA,可用於切口酶介導之基因編輯。 圖92 K562細胞中基於質體之分析的結果證明使用LbCas12a核酸酶實現高達0.5%之精確編輯。此系統中之精確編輯強烈地表明細胞內之msDNA生成以及與用於基因編輯之Cas12a樣核酸酶之可相容性。 圖93 K562細胞中基於質體之分析的結果證明使用TnpB核酸酶實現高達0.5%之精確編輯。此系統中之精確編輯表明細胞內之msDNA生成以及與用於基因編輯之TnpB樣核酸酶之可相容性。 圖94 RTX3_6342 (自AlphaFold預測之結構)(黃色)與MMLV-RT (PDB 4MH8) (綠色)進行比對。原生RTX3_6342 RT在N末端融合有非RT相關結構域,該結構域對於逆轉錄可能並非必需的。箭頭標記潛在之相關截短點。Jumper等人 Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)。https://doi.org/10.1038/s41586-021-03819-2。 圖95 K562細胞中基於質體之分析的結果證明使用RTX003_6342S及截短變異體時,高達約20%之精確編輯及約20%之插入缺失。此實驗中觀察到的精確編輯活性強烈地支持此等截短保留在細胞內生成msDNA之能力。 圖96 K562細胞中基於質體之分析的結果證明使用具有不同插入物大小之RTX003_6342S及RTX3_1262時,高達約40%之精確編輯及約60%之插入缺失。此實驗中觀察到的精確編輯活性表明,RTX3_6342S可將長達405 bp之序列插入EMX1基因座中,而不會降低精確編輯。在編輯效率下降之前,RTX3_1262可在EMX1基因座處插入多達205 bp。在此實驗中之CRISPResso2分析期間,忽略讀數中之取代。 圖97 K562細胞中基於質體之分析的結果證明使用具有不同插入物大小之RTX003_6342S及RTX3_1262,同時忽略或定量CRISPResso2分析期間之取代時,高達約40%之精確編輯及約40%之插入缺失。此實驗中觀察到的精確編輯活性表明,RTX3_6342S可將長達405 bp之序列插入EMX1基因座中,但RT保真度可能限制預期編輯之準確安裝(比較忽略(圓圈)與計數(三角形)取代)。 圖98 K562細胞中基於質體之分析的結果,該分析篩選RTX3_6342逆轉錄酶中之突變,在將預測之AlphaFold RTX3_6342結構及逆轉錄子Eco1晶體結構與ncRNA進行結構比對後,預測該等突變與ncRNA相互作用。大多數突變對較短10 bp插入物之安裝具有適度影響,但僅兩者(N465K及N465R)可能增加較大(305 bp)插入物之安裝頻率。 圖99 K562細胞中基於質體之分析的結果證明,RTX3_6342S中之異種同源MMLV突變不會始終增加>205 bp插入物之安裝頻率。MMLV (L139P) == RTX3_6342 (V265P),MMLV (T306K) == RTX3_6342 (M470K),MMLV (W313F) == RTX3_6342 (V477F)。 圖100 K562細胞中基於質體之分析的結果,該分析篩選RTX3_6342逆轉錄酶中之突變,在將預測之AlphaFold RTX3_6342結構及逆轉錄子Eco1晶體結構與ncRNA進行結構比對後,預測該等突變與ncRNA相互作用。E238R、E479K及K255P係可能增加305 bp插入物在EMX1基因座處之安裝頻率之突變。 圖101 野生型及具有不同N末端截短之RTX3_6342的預測結構。用AlphaFold生成預測結構,且使用PyMOL比對與Eco1/CryoEM結構(79VU.pdb)進行比對。 圖102 K562細胞中基於質體之分析的結果,該分析評估RTX3_6342 N末端螺旋之部分或全部缺失對較長插入物之安裝頻率的影響。吾人觀察到部分N末端截短會增強較長插入物之精確編輯/插入缺失比率。 圖103 K562細胞中基於質體之分析的結果,該分析評估DNA結合結構域與RTX3_6342之融合對405 bp插入物之安裝頻率的影響。Sac7d或Sso7d DNA結合結構域與逆轉錄子逆轉錄酶N末端之融合導致405 bp插入物之安裝頻率增加兩倍(RTX3_6342S)或十倍(RTX3_1262)。 
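The orthologous substitution pairs listed in the Figure 99 caption (MMLV L139P, T306K, W313F and their RTX3_6342 counterparts V265P, M470K, V477F) can be held in a small lookup table. The following is a minimal sketch under stated assumptions, not a method from the disclosure: the function name `to_retron_mutation` and the table name `ORTHOLOGOUS_SITES` are hypothetical, and the table holds only the three pairs given in the caption. In practice the mapping would come from a structural alignment such as the one described for Figure 83A.

```python
import re

# Orthologous sites copied from the Figure 99 caption:
# (MMLV wild-type residue, MMLV position) -> (RTX3_6342 wild-type residue, position).
# Only these three pairs are stated in the text; nothing else is assumed.
ORTHOLOGOUS_SITES = {
    ("L", 139): ("V", 265),
    ("T", 306): ("M", 470),
    ("W", 313): ("V", 477),
}

# A point mutation written as <wild-type residue><position><new residue>, e.g. "T306K".
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def to_retron_mutation(mmlv_mutation: str) -> str:
    """Translate an MMLV RT point mutation into the orthologous RTX3_6342 mutation.

    Hypothetical helper for illustration; raises if the site is not in the table.
    """
    match = _MUTATION.match(mmlv_mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {mmlv_mutation!r}")
    wild_type, position, substituted = match.group(1), int(match.group(2)), match.group(3)
    key = (wild_type, position)
    if key not in ORTHOLOGOUS_SITES:
        raise KeyError(f"no orthologous site recorded for MMLV {wild_type}{position}")
    retron_wild_type, retron_position = ORTHOLOGOUS_SITES[key]
    # The substituted residue is carried over unchanged; only the site is remapped.
    return f"{retron_wild_type}{retron_position}{substituted}"


if __name__ == "__main__":
    for mutation in ("L139P", "T306K", "W313F"):
        print(mutation, "->", to_retron_mutation(mutation))
```

Keeping the wild-type residue in the key means a typo in the MMLV residue (for example "A306K") fails loudly instead of silently mapping to position 470.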
圖104 K562細胞中基於質體之分析的結果,該分析篩選與RTX3_6342、6083、6943或1262相似之逆轉錄子家族成員。吾人注意到,大多數具有強大基因編輯活性之逆轉錄子屬於RTX3_6342家族。 圖105 全RNA系統中之新穎逆轉錄子R2781的精確編輯活性之測試結果。在圖16中,R2781在質體系統中之EMX1基因座處證明10 bp插入,與其他命中R1262、R6342S、6342L具有相似活性。此處,使用兩種不同同源臂(HA)長度(雙臂30 bp或49 bp左臂/65 bp右臂)來評估R2781在EMX1基因座處插入25 bp、205 bp及405 bp之活性。以11%效率實現25 bp及30 bp同源臂之精確插入。使用較長同源臂(49/65 bp)時,相同長度插入之效率降低至一半,表明此逆轉錄子逆轉錄酶可能具有有限的酶可加工性。相反,插入205或405 bp之活性顯著下降。右圖顯示左圖之各別條件下之插入缺失。 圖106 具有Cas9 D10A之全RNA系統中之新穎逆轉錄子R6342S的精確編輯活性之測試結果。先前,已證明逆轉錄子R6342S可能藉由Cas9 WT實現精準編輯。此處,使用Cas9 D10A或Cas9 D10A R221K/N394K突變體以及10或50 ng sgRNA,在293T細胞中評估EMX1基因座中之25 bp插入之活性。Cas9 D10A證明高達0.4%之編輯,而Cas9 D10A R221K/N394K顯示更高活性,高達0.7%之編輯。此等資料表明,使用Cas9切口酶以及產生雙股斷裂之Cas9 WT可能實現逆轉錄子介導之基因體插入。右圖顯示左圖之各別條件下之插入缺失。 圖107 全RNA系統中之新穎逆轉錄子R1262的精確缺失活性之測試結果。在頂部圖中,顯示兩種缺失策略。藉由並置左及右同源臂序列以缺失介入序列來設計逆轉錄子ncRNA。Del1在EMX1基因座處之Cas9切割位點上游缺失214 bp且del2在Cas9切割位點下游缺失248 bp。將逆轉錄子介導之缺失與直接遞送增加量之單股DNA (ssDNA)供體(150及300 ng)進行比較。左圖中之底部圖顯示R1262能夠以與ssDNA供體相似之活性自EMX1基因座缺失248 bp (del2)。右圖顯示左圖之各別條件下之插入缺失。應注意,逆轉錄子介導之缺失導致的意外插入缺失活性比ssDNA供體介導之缺失低約6倍。未偵測到R1262之214 bp (del1)缺失活性。 圖108 基於全RNA系統中之新穎逆轉錄子R1262及R6342之插入活性的非同源末端接合(NHEJ)之測試結果。在左側,描繪衍生雙股DNA (dsDNA)之策略,該雙股DNA為NHEJ機制之受質。1.逆轉錄子逆轉錄酶由含有有義或反義插入序列之兩個ncRNA生成互補單股DNA。2.互補單股DNA形成雙鏈體 3. Cas9-sgRNA複合物移除兩個末端之Y形單股部分(msR及msD間隔序列)。4. 
Cas9修剪之鈍端雙股DNA成功地插入標靶位點處。在插入物之兩端串聯包括sgRNA識別序列(與基因體標靶序列相同)會引起鈍端雙股DNA,且僅一個插入方向允許藉由防止Cas9進行再切割來實現更穩定整合。使用此策略,具有兩個互補ncRNA之R1262以約1%效率在EMX1基因座上整合約120 bp dsDNA,當插入序列側接兩個串聯sgRNA序列時,效率更高(頂部圖右側)。無串聯sgRNA序列之R6342之基礎插入活性略微高於R1262之彼活性。底部圖右側顯示頂部圖之各別條件下之插入缺失。 圖109 對逆轉錄子RNA之免疫反應之測試結果。使用人類外周血單核細胞(hPBMC)來評估免疫反應,因為PBMC含有先天性及適應性免疫細胞,且其配備有感測器來偵測外源核酸。將冷凍之hPBMC解凍,靜置隔夜。如標籤所示,具有U或m1Ψ之RT mRNA及5’端具有/不具有cap0 (m7G)之ncRNA個別地或一起進行電穿孔,同時來自TriLink之未經修飾之GFP mRNA作為對照。隔夜培養後,分析上清液之細胞介素產生。在所檢查之25種細胞介素及趨化介素中,顯示經偵測高於偵測極限之彼等。在任何逆轉錄子RNA轉染之細胞中均未偵測到I型干擾素(對外源核酸之免疫反應的標誌),且在逆轉錄子RNA轉染之細胞中偵測到低水準之一些發炎性細胞介素及趨化介素,但其水準比除不具有U修飾之RT mRNA以外的對照GFP mRNA低得多。 圖110 全RNA系統中的逆轉錄子介導之基因編輯在人類幹祖細胞(HSPC)中之測試結果。使用骨髓源性CD34+人類幹祖細胞(HSPC)來評估原代細胞中逆轉錄子介導之基因編輯之效能。將冷凍之HSPC解凍以在hSCF、hFLT3-L、hTPO細胞介素存在下擴增三天,從而防止分化。使Cas9 mRNA、引導RNA (gRNA)、R6342 RT mRNA及ncRNA以每個標籤中所指示之質量比混合,且藉由Lonza電穿孔器使用兩種不同程式電穿孔至HSPC。再培養三天後,提取基因體DNA且進行測序以量測編輯頻率。使用共計5微克RNA (以1: 0.3 : 4: 1=Cas9 :gRNA :ncRNA :RT分開),以0.1%頻率觀察到25 bp在AAVS1基因座處之精確插入(左圖)。右圖示出在各別條件下之插入缺失。 圖111 全RNA系統中的逆轉錄子介導之基因編輯在人類T細胞中之測試結果。使用來自外周血之人類全T細胞來評估原代細胞中逆轉錄子介導之基因編輯之效能。將冷凍T細胞解凍且在抗CD3/抗CD28結合之磁珠及IL-2細胞介素存在下活化兩天。使Cas9 mRNA、引導RNA (gRNA)、R6342 RT mRNA及ncRNA以每個標籤中所指示之質量比混合,且藉由Neon或Lonza電穿孔器電穿孔至T細胞。再培養三天後,提取基因體DNA且進行測序以量測編輯頻率。使用共計3微克RNA (以1: 0.3 : 3: 1=Cas9 :gRNA :ncRNA :RT分開),使用Neon機器以高達1.7%頻率觀察到25 bp在AAVS1基因座處之精確插入(左圖)。右圖示出在其各別條件下之插入缺失。 圖112提供逆轉錄子ncRNA文庫之示意圖。ncRNA中之不同元件之變異體(經修飾之a1:a2莖、msR、msD區域)與獨特條碼相關,且該文庫經合成為oligo文庫。藉由對插入基因體基因座處之條碼進行測序,可量測人類細胞中之相關ncRNA變異體之效率。 圖113概述實例31之變異體文庫。 圖114提供示意圖,描繪用於篩選實例31中所述之ncRNA變異體文庫的篩選過程。 圖115提供散點圖,概述圖31之結果。在該散點圖中,每個點表示一種變異體,其中y軸上為相對基因體插入水準且x軸上為ssDNA產生水準。虛線表示WT R6342S對照之基因體插入及ssDNA產生水準。因此,虛線右上部分之ncRNA變異體在兩個層面上均優於WT R6342S對照。可見,大多數此等變異體均具有a1a2或msD莖元件。吾人列出13種此類變異體,該等變異體在基因體插入及ssDNA產生水準上均顯示比WT R6342S對照高出超過1.5倍。 Figure 1A is a schematic diagram depicting a naturally occurring retron, from the genomic DNA stage through generation of the chimeric msDNA satellite molecule. 
Retrons are encoded in the bacterial genome and include a noncoding RNA (ncRNA) portion and a portion encoding a specific reverse transcriptase (RT). The ncRNA and RT are initially transcribed from the retron DNA as a single polycistronic message. The initial transcript is processed, resulting in removal or separation of the portion of the transcript encoding the retron RT. The remaining transcript is the ncRNA, which folds into a secondary structure having several characteristic stem-loops and a duplex formed between the 5' and 3' regions of the ncRNA (i.e., the a1/a2 duplex). The folded ncRNA is recognized by the accompanying RT, which is translated separately and provided in trans. The translated RT usually recognizes certain secondary structures in the ncRNA and binds to the RNA template downstream of the msd region. The RT primes from the 2'-OH of the conserved guanosine (G) residue found immediately after the double-stranded RNA structure (the a1/a2 region) within the ncRNA and initiates reverse transcription of the RNA toward its 5' end. A portion of the ncRNA (i.e., the msd region) serves as the template for reverse transcription, and reverse transcription terminates before reaching the msr locus. During reverse transcription, cellular RNase H degrades the ncRNA segment that serves as the template but does not degrade other parts of the ncRNA. The product of reverse transcription, msDNA (lower right of the schematic), remains covalently linked to the RNA template via a 2'-5' phosphodiester bond, and the 3' end of the msDNA base-pairs with the RNA template. Figure 1B is a schematic diagram depicting one embodiment of a recombinant retron contemplated in the present disclosure. In this embodiment, the nucleic acid molecule comprises a nucleotide sequence encoding the ncRNA region (msr/msd region) of the retron, as depicted in the upper left corner. 
The msr region has been modified by introducing one or more nucleotide modifications (e.g., nucleotide substitutions, deletions, or insertions). For example, it may be necessary to introduce one or more nucleotide substitutions in the msr to enhance functionality (e.g., binding of the corresponding ncRNA to the RT, improved stability, improved folding, etc.). The modified msr is referred to as msr'. In addition, the msd has been modified by introducing a heterologous nucleotide sequence encoding an HDR donor template. Finally, the retron DNA has been modified to introduce a nucleotide sequence encoding a guide RNA at the 3' end of the retron DNA sequence, and the retron DNA is carried on a DNA vector (e.g., a plasmid). The DNA is shown being transcribed as a polycistronic message that includes an msr'/msd' region (forming an ncRNA) fused to the guide RNA at its 3' end (in other embodiments, the guide RNA may be fused to the 5' end of the retron ncRNA). This intermediate is shown forming a complex with a reverse transcriptase provided in trans (e.g., by means of a separate expression vector or delivered mRNA). The schematic at the upper right shows the formation of a complex between the recombinant ncRNA and the RT, and the initiation of reverse transcription from the covalently linked conserved guanosine (G) (i.e., the "activating G" or "activating guanosine") using the msd RNA as the template sequence. After the template RNA sequence is reverse transcribed and degraded by RNase H, a recombinant msDNA is formed, which includes three modifications: (a) a guide RNA attached to the 3' end of the msDNA, (b) nucleotide changes in msr', and (c) a region of the reverse-transcribed single-stranded DNA that serves as an HDR donor template. This type of recombinant msDNA may then support a variety of genome modification applications in cells, including genome editing using RNA-guided nucleases provided to cells in trans. 
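The layout described for Figure 1B (a modified msr', an msd carrying a heterologous HDR donor template, and an optional guide RNA fused at the 3' or 5' end) can be pictured as plain string assembly. The sketch below is illustrative only. The class name `EngineeredNcRNA`, its field names, and the toy sequences are hypothetical and do not come from the disclosure. The msr-then-msd ordering and the placement of the donor at a single point inside the msd are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineeredNcRNA:
    """Toy model of a recombinant retron ncRNA (all names are hypothetical)."""

    msr: str                      # msr' region, possibly carrying nucleotide changes
    msd_left: str                 # msd sequence 5' of the donor insertion point
    donor_template: str           # heterologous HDR donor template placed inside the msd
    msd_right: str                # msd sequence 3' of the donor insertion point
    guide_rna: str = ""           # optional guide RNA fused in cis; empty means in trans
    guide_at_3prime: bool = True  # 3' fusion as in Figure 1B; False models the 5' fusion

    def sequence(self) -> str:
        """Return the full ncRNA sequence, written 5' to 3'."""
        core = self.msr + self.msd_left + self.donor_template + self.msd_right
        if not self.guide_rna:
            return core
        if self.guide_at_3prime:
            return core + self.guide_rna
        return self.guide_rna + core


if __name__ == "__main__":
    # Placeholder sequences chosen only so the segments are easy to tell apart.
    fused = EngineeredNcRNA(
        msr="AAA", msd_left="CC", donor_template="GGGG", msd_right="CC", guide_rna="UUU"
    )
    print(fused.sequence())  # AAACCGGGGCCUUU
```

Leaving `guide_rna` empty models the configuration in which the guide RNA is supplied separately, in trans.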
Figure 1C is a schematic diagram depicting the recombinant retron-based genome editing system described herein. In the case of genome editing involving RNA-guided nucleases, the components of such systems may include (a) a guide RNA provided in cis (e.g., fused to the recombinant retron msDNA) and/or in trans (e.g., expressed separately in cells), (b) a recombinant ncRNA (including at least a sequence encoding an HDR donor template and, optionally, a guide RNA fused to the ncRNA), (c) a reverse transcriptase, and (d) a programmable nuclease. These components are provided to cells in the form of DNA and/or RNA and/or protein by delivery means (e.g., LNP, liposomes, virus-based delivery, or passive/active transport). Once inside the cell, recombinant msDNA is formed. The msDNA and the programmable nuclease translocate to the nucleus to perform gene editing at the target DNA site, thereby generating an edited DNA target. Figure 1D provides a simplified schematic diagram of the natural life cycle of a retron. A retron typically comprises a reverse transcriptase (RT) and two non-coding contiguous inverted sequences (msr and msd) that are transcribed into a single RNA, which folds into a specific secondary structure. The conserved NAXXH motif and VTG triplet in the retron RT are indicated. The RT binds to the RNA downstream of the msd locus and initiates reverse transcription of the RNA template toward its 5' end, primed by the 2'-OH group of the conserved branching G residue. Reverse transcription stops before reaching the msr locus, and the resulting msDNA remains covalently linked to the RNA template via a 2'-5' phosphodiester bond and by base pairing at the 3' ends of the molecules. Figure 1E provides a detailed representation of the natural biological pathway of a retron in the cell, ending with the production of msDNA satellite molecules. 
This figure parallels Figure 1D but depicts the stages of msDNA production more completely. (1) depicts the retron locus, which includes an ncRNA locus with an msr locus and an msd locus (both of which are non-coding) and a reverse transcriptase (RT) locus. The ncRNA locus and RT locus are transcribed into a single RNA transcript, which is depicted in (2). The colors representing each component in (1) are carried through stages (2) to (6). Stages (3) and (4) depict the ncRNA portion folding into a series of stem-loops, wherein the 5' and 3' ends of the ncRNA form a duplex. The position of the conserved branching guanosine residue with a 2'-OH group is also shown. The branching guanosine later serves as the priming site for the reverse transcriptase. Stage (4) further shows that the transcript region encoding the reverse transcriptase is removed and translated separately to produce the reverse transcriptase. In stage (5), the reverse transcriptase binds to the folded ncRNA, starts from the priming site (i.e., the conserved branching guanosine residue with its 2'-OH), and uses the msd RNA sequence as a template to polymerize single-stranded DNA (i.e., the reverse transcription product). Reverse transcription terminates at the msr region. The msd RNA template is removed by exonucleolytic excision, generating a chimeric molecule comprising an msr RNA region covalently linked to the ssDNA reverse transcript through the conserved guanosine primer residue. A short duplex region is also formed between the 3' end of the msr RNA and the 3' end of the ssDNA reverse transcript. The complete molecule is called "msDNA". Figure 1F is a schematic diagram depicting that the recombinant retron-based genome modification system disclosed herein can be implemented as (a) a cell recorder system, (b) a genome editing system, and (c) a recombineering system. 
These uses are not intended to be limiting. Figure 1G is a schematic diagram depicting various configurations contemplated for the recombinant retrons disclosed herein. (1) shows the operon structure of a wild-type retron; (2) shows the operon structure of a recombinant retron configured to encode an HDR donor template in the final msDNA molecule; (3) shows (2), further modified to encode a guide RNA at the 3' end of the retron; (4) shows (2), further modified to encode a guide RNA provided in trans. Figure 1H is a schematic diagram emphasizing that any suitable configuration for presenting the components of the recombinant retron-based genome modification system disclosed herein to a cell is contemplated, including the case where the RT and/or programmable nuclease are provided in trans relative to the retron ncRNA. In some embodiments, the RT and programmable nuclease can be provided as a fusion protein. Figure 1I depicts that the RT and programmable nuclease can be provided as fusion proteins (top, middle) or separately from each other. Figure 1J depicts that nuclear localization signals can be engineered into polypeptides of the present disclosure (e.g., RNA-guided nucleases) to facilitate translocation of the nuclease into the nucleus of the cell in which editing occurs. Figure 1K is a schematic (not to scale) depicting an embodiment of genome editing in which a double-strand break (DSB) generated by an appropriate nuclease (such as a CRISPR/Cas effector enzyme, ZFN, TALEN, meganuclease, TnpB, IscB, or restriction enzyme (RE)) facilitates insertion of a donor or template sequence (shown here as a "marker" flanked by homologous sequences, provided on the "donor vector", that match the sequences flanking the DSB). Figure 1L depicts the ability of three different types of retrons (Eco1-R1, Eco3-R2, Eco5-R3) to insert a 16 base pair insertion at a specific genomic site (the EMX1 gene). Figure 1M summarizes the procedure used to evaluate the ability of retrons to produce genome insertions. Figure 2 (SEQ ID NO: 19970) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IA/IIA1 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 3 (SEQ ID NO: 19971) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IB1 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
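The color legend shared by the consensus-structure figures can be read as two threshold lookups applied to each column of the ncRNA alignment: one on how often any base (rather than a gap) is present, and one on how often the most common base occurs. The sketch below is a hedged illustration, not the software used to draw the figures. The function names and the toy columns are hypothetical. Treating each tier as "greater than or equal to" its lower bound, and dividing by the total number of sequences (gaps included) when computing conservation, are assumptions.

```python
from collections import Counter

# Legend tiers from the figure captions, highest threshold first.
PRESENCE_TIERS = [(0.97, "red"), (0.90, "black"), (0.75, "gray"), (0.50, "white")]
IDENTITY_TIERS = [(0.97, "red"), (0.90, "black"), (0.75, "gray")]


def tier(value, tiers):
    """Return the color of the first tier whose threshold the value reaches, else None."""
    for threshold, color in tiers:
        if value >= threshold:
            return color
    return None


def column_legend(column):
    """Classify one alignment column ('-' marks a gap).

    Returns (circle_color, consensus_base, letter_color). A None color means the
    column falls below every tier and would carry no colored circle or letter.
    """
    total = len(column)
    if total == 0:
        raise ValueError("empty alignment column")
    bases = [symbol for symbol in column if symbol != "-"]
    presence = len(bases) / total
    if not bases:
        return tier(presence, PRESENCE_TIERS), None, None
    consensus_base, count = Counter(bases).most_common(1)[0]
    # Assumption: conservation is measured against all sequences, gaps included.
    identity = count / total
    return tier(presence, PRESENCE_TIERS), consensus_base, tier(identity, IDENTITY_TIERS)


if __name__ == "__main__":
    for col in ("GGGGGGGGGG", "GGGGGGGGG-", "AAAAAA----"):
        print(col, column_legend(col))
```

A column that is mostly gaps, such as the third toy column, still receives a white circle for presence but no colored letter, which matches the legend listing four presence tiers and only three conservation tiers.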
Figure 4 (SEQ ID NO: 19972) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IB2 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 5 (SEQ ID NO: 19973) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IC retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 6 (SEQ ID NO: 19974) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of other type IIA retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 7 (SEQ ID NO: 19975) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIA2 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 8 (SEQ ID NO: 19976) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIA3 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 9 (SEQ ID NO: 19977) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIA4 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 10 (SEQ ID NO: 19978) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIA5 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 11 (SEQ ID NO: 19979) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIIA1 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 12 (SEQ ID NO: 19980) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIIA2 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 13 (SEQ ID NO: 19981) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIIA3 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 14 (SEQ ID NO: 19982) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIIA4 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. Figure 15 (SEQ ID NO: 19983) is a schematic diagram of the consensus secondary structure of the retron ncRNA (msr/msd) of type IIIA5 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored circles represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., a red circle indicates that a base is present in at least 97% of sequences, black in 90-97%, gray in 75-90%, and white in 50-75%), and colored letters represent bases with varying degrees of conservation (e.g., red indicates at least 97% conservation, black at least 90%, and gray at least 75%). Each highlighted base pair represents a significantly covariant base pair. 
Figure 16 (SEQ ID NO: 19984) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type IIIunk retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 17 (SEQ ID NO: 19985) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type IV retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 18 (SEQ ID NO: 19986) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type IX retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 19 (SEQ ID NO: 19987) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type V retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 20 (SEQ ID NO: 19988) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type VI retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 21 (SEQ ID NO: 19989) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type XI group 1 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 22 (SEQ ID NO: 19990) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type XI group 2 retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 23 (SEQ ID NO: 19991) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type XII retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 24 (SEQ ID NO: 19992) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type XIII retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 25 (SEQ ID NO: 19993) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of type XIV retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 26 (SEQ ID NO: 19994) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of the Ec107 retron, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 27 (SEQ ID NO: 19995) is a schematic diagram of the consensus secondary structure of the msr/msd ncRNA of outgroup A retrons, generated by computational structural alignment of ncRNA sequences from Table B as described in Example 3. Colored dots represent the probability that a base, as opposed to a gap (no base), is present at that position (e.g., red indicates that a base is present at least 97% of the time, black 90-97% of the time, gray 75-90% of the time, and white 50-75% of the time), and colored letters represent bases with varying degrees of conservation (e.g., red indicates 97%+ conservation, black 90%+ conservation, and gray at least 75% conservation). Each highlighted base pair represents a significantly covariant base pair.
Figure 28 is a phylogenetic tree of RT sequences constructed according to Example 3.
Figure 29 is a structural representation of the retron locus associated with each retron type in Figure 28.
Figure 30 shows the positions of selected retrons (Eco1, Eco3, Eco5, Aco1, RTX003_2042, RTX003_6083v1 and RTX003_6943) in the retron phylogenetic tree of Figure 28.
Figure 31A is a plasmid map of an exemplary retron (Eco1) tested in the examples herein.
Figure 31B is a linear representation, in the 5' to 3' direction, of the plasmid map of Figure 31A.
Figure 31C is a plasmid map of an exemplary retron (RTX3_6083v1) tested in the examples herein.
Figure 31D is a linear representation, in the 5' to 3' direction, of the plasmid map of Figure 31C.
Figure 32 depicts the plasmid-based assay used in the examples to measure precise editing and indels produced by retrons. In step 1, a plasmid (e.g., the plasmid of Figure 31A or Figure 31C) is transfected into human cells (e.g., HEK293T cells) engineered to express Cas9. Editing is allowed to proceed at 37°C for 72 hours. In step 2, genomic DNA is extracted from the cells and used to prepare a next-generation sequencing (NGS) library. The library is sequenced at the target site of the edit (e.g., EMX1) to generate sequence reads. In step 3, the sequence reads are analyzed to obtain the frequency of reads containing the desired edit (percentage of precise edits) and the frequency of indels at the desired edit site (percentage of indels).
Figure 33 is an equivalent representation of Figure 32.
Figure 34 depicts a method for transfecting HEK293T cells. One day before transfection, cells are seeded in 24-well plates. Appropriate amounts of plasmid and transfection reagent (e.g., Lipofectamine 3000) are mixed and transferred to the cells. After 72 hours of incubation, genomic DNA is extracted and the targeted region is amplified for the sequencing library. Sequencing data are analyzed with CRISPResso2, and the percentages of precise edits and indels are calculated.
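The two readouts described for step 3 of Figure 32 (percentage of precise edits and percentage of indels) are simple read fractions. The sketch below assumes each read has already been classified by an upstream tool; the label names and the example counts are hypothetical and are not data from the examples, and the code does not reproduce the CRISPResso2 pipeline.

```python
from collections import Counter

# Hypothetical labels assigned to each sequencing read by an upstream classifier.
PRECISE = "precise_edit"
INDEL = "indel"
UNMODIFIED = "unmodified"


def editing_frequencies(read_labels):
    """Return the percentage of precise-edit reads and indel reads among all reads."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to analyze")
    return {
        "precise_edit_pct": 100.0 * counts[PRECISE] / total,
        "indel_pct": 100.0 * counts[INDEL] / total,
    }


if __name__ == "__main__":
    # Made-up read counts, for illustration only.
    labels = [UNMODIFIED] * 960 + [INDEL] * 37 + [PRECISE] * 3
    print(editing_frequencies(labels))
```

Both percentages use the total number of analyzed reads as the denominator, which is an assumption about how the frequencies are normalized.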
Figure 35 (SEQ ID NO: 19996-20003) (top to bottom) shows an example of the reference sequence and the desired editing outcomes for the Eco3 retron at the EMX1 genomic site. The editing outcomes were analyzed using the CRISPResso2 pipeline. In this example, the editing template introduced a 10 bp insertion into the EMX1 gene (TTACGTCTGC) (SEQ ID NO: 19931) and simultaneously introduced a 6 bp substitution to mutate the PAM sequence (GAAGGG>AAAGTT) (SEQ ID NO: 19954).
Figure 36 shows the results of the plasmid-based assay (e.g., according to Figure 33), demonstrating that use of the Eco1 retron in HEK293T cells expressing Cas9 achieves precise editing of up to about 0.3% and indels as low as 40%. Plasmids encoding the Eco1 RT and Eco1 ncRNA-sgRNA fusions targeting EMX1 were delivered by lipofection using two different amounts of Lipofectamine.
Figure 37 shows the results of the plasmid-based assay (e.g., according to Figure 33), demonstrating that use of Aco1 achieves precise editing of up to about 0.1% and indels as low as 3%. The Aco1 retron has not been experimentally verified to produce msDNA. The precise editing activity observed in this experiment strongly supports that the Aco1 retron can generate msDNA in human cells.
Figure 38 shows the results of the plasmid-based assay (e.g., according to Figure 33), demonstrating that precise editing of up to about 0.3% and indels as low as 5% were achieved using RTX003_2042. This retron achieves precise editing comparable to that of Eco1, but with significantly (10-fold) lower indels. RTX003_2042 is a novel retron, and the precise editing activity observed in this experiment strongly supports that the RTX003_2042 retron may generate msDNA in human cells.
Figure 39A shows the results of the plasmid-based assay, demonstrating that precise editing of up to about 0.05-0.08% and indels as low as 2.5-4% were achieved using RTX003_6083v1 and RTX003_6943.
Both are novel retrons, and the precise editing activity observed in this experiment strongly supports that the RTX003_6083v1 and RTX003_6943 retrons may generate msDNA in human cells.
Figure 39B shows a follow-up experiment using the same assay as Figure 39A, indicating that RTX3_6083v1 and RTX3_6943 generate 3-4 times more precise edits than Eco1, while the indels generated by these two retrons are 2-3 times lower. RTX3_2042 shows a frequency of precise edits similar to that of Eco1, but with greater variability than the other samples.
Figure 39C shows a follow-up experiment using the same assay as Figure 39A, indicating that RTX3_6083v1 and RTX3_6943 generate 3-4 times more precise edits than Eco1, while the indels generated by these two retrons are 2-3 times lower. RTX3_2042 shows a frequency of precise edits similar to that of Eco1, but with greater variability than the other samples.
Figure 39D shows the results of the plasmid-based assay, demonstrating that precise edits as high as about 0.7% and indels as low as about 4% were achieved using RTX003_0637, RTX003_1262, and RTX003_6342, compared to empty vector and Eco1. RTX003_0637, RTX003_1262, and RTX003_6342 are novel retrons, and the precise editing activity observed in this experiment strongly supports that these retrons can generate msDNA in cells.
Figure 39E shows the results of the plasmid-based assay, demonstrating precise editing and indel generation for an array of retrons including Eco1, Eco3, RTX3_2042_RT_inactive, RTX3_2042, RTX3_6083v1, RTX3_6943, RTX3_6943, RTX3_1262, RTX3_6342S, and RTX3_6342L. Compared to empty vector and Eco1 as controls, precise editing of up to about 2.5% and indels as low as about 4% were achieved using Eco1, Eco3, RTX3_2042_RT_inactive, RTX3_2042, RTX3_6083v1, RTX3_6943, RTX3_6943, RTX3_1262, RTX3_6342S, and RTX3_6342L.
Figure 40 shows the 2-RNA editing assay used in the examples, which uses electroporation-based delivery of two RNA components into HEK293T cells to measure the relative editing efficiency of exemplary retrons. Appropriate amounts of RT mRNA and ncRNA-sgRNA fusion were mixed and electroporated into cells. After 72 hours of incubation, genomic DNA was extracted and the targeted region was amplified for the sequencing library. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
Figure 41 shows the results of a 2-RNA system (RT mRNA + ncRNA-sgRNA fusion) delivered by electroporation to HEK293T cells expressing Cas9. The Eco1, Eco3 and Eco5 retrons were tested. The results show that precise editing for Eco3 was as high as 0.4% (left panel) and that indels for Eco3 were as low as 10% (right panel). Eco3-mediated precise editing increased with increasing amounts of ncRNA-sgRNA fusion, at RT mRNA:ncRNA-sgRNA fusion ratios of 1:2 to 1:4.
Figure 42 shows the titration results for a two-component Eco3 RNA system (RT mRNA + ncRNA-sgRNA fusion) delivered by electroporation to HEK293T cells expressing Cas9. RT mRNA and ncRNA were mixed at ratios of 1:2, 1:3, 1:4, 1:5, 1:8, and 1:10, and two different amounts of RT mRNA (0.2 or 0.5 µg) were delivered. The left panel shows that 0.5 µg of Eco3 RT mRNA produced the highest percentage of precise edits at RT mRNA:ncRNA ratios of 1:3 and 1:5. The right panel further shows that the closer the RT mRNA:ncRNA ratio is to 1:1, the lower the indel percentage tends to be.
Figure 43 depicts a three-RNA retron editing system involving delivery of three RNA components (RT mRNA, retron ncRNA-sgRNA fusion, and Cas9 mRNA) to HEK293T cells by electroporation. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, and Cas9 mRNA were mixed and electroporated into cells.
After 72 hours of incubation, genomic DNA was extracted and the targeted region was amplified for the sequencing library. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
Figure 44 shows the Cas9 mRNA titration results for the three-component Eco3 RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA) delivered to HEK293T cells by electroporation. RT mRNA and ncRNA-sgRNA fusion were mixed in the amounts given in the figure, and the amount of Cas9 mRNA was titrated. At 0.2 µg Cas9 mRNA, precise edits of up to 0.1% were observed. Although the editing efficiency was orders of magnitude lower than with the 2-RNA system, the editing occurred through the specific action of Cas9 and the retron, as the absence of either abolished editing.
Figure 45 depicts the lipofection process in HEK293T cells using the three-RNA system. One day before transfection, cells were seeded in 96-well plates. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA and Lipofectamine reagent were mixed and transferred to the cells. After 72 hours of incubation, genomic DNA was extracted and the targeted region was amplified for the sequencing library. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
Figure 46 shows the results of a three-component Aco1 RNA system (RT mRNA + ncRNA + Cas9 mRNA) delivered to HEK293T cells by lipofection. RT mRNA, ncRNA, and Cas9 mRNA were mixed in the amounts indicated in the figure and transfected into HEK293T cells. 56 bp insertions and 6 bp deletions at the EMX1 locus were scored as precise edits; in the left panel, approximately 0.1% of the cell population had undergone precise editing. Editing depends on the Cas9 nuclease, as its absence abolishes editing. The indel frequency in the right panel is approximately 1.5%.
Figure 47 shows that Cas9 nuclease activity is minimal when the sgRNA is fused to the ncRNA of the Eco3 retron.
Cas9 activity was assessed by indel frequency. 1 µg of ncRNA-sgRNA fusion showed 20-fold lower activity than the standalone sgRNA at the same molar amount. In addition, the activities of chemically modified and unmodified sgRNA were compared; under the conditions described in the figure, the former showed 6-fold higher activity than the latter.
Figure 48 shows the all-RNA editing assay used in the examples to measure the relative editing efficiency of exemplary retrons in all-RNA form, modified by a step of adding guide RNA in trans. Electroporation was performed in HEK293T cells using the three-RNA system plus sgRNA added in trans. Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA and sgRNA were mixed and electroporated into cells. After 72 hours of incubation, genomic DNA was extracted and the targeted region was amplified for the sequencing library. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
Figure 49 shows the results of guide RNA addition in the all-RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA + sgRNA) delivered to HEK293T cells by electroporation. At the amounts of Cas9 mRNA and RT mRNA given in the figure, the amount of added guide RNA was titrated at 50, 100 and 200 ng. The titration was performed at two RT mRNA:ncRNA-sgRNA fusion ratios, 1:6 and 1:8. Guide RNA addition in the all-RNA system increases precise editing by up to about 50-fold. Increasing the amount of guide RNA gradually increases precise editing, and the 1:8 RT mRNA:ncRNA-sgRNA fusion ratio performs slightly better than 1:6, achieving 13% precise editing. The right panel shows the indel frequencies under the respective conditions.
Figure 50 shows the lipofection process in HEK293T cells using the three-RNA system plus gRNA added in trans. One day before transfection, cells were seeded in 96-well plates.
Appropriate amounts of RT mRNA, ncRNA-sgRNA fusion, Cas9 mRNA, sgRNA and Lipofectamine reagent were mixed and transferred to the cells. After 72 hours of incubation, genomic DNA was extracted and the targeted region was amplified for the sequencing library. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
Figure 51 shows the results of guide RNA addition in the all-RNA system (RT mRNA + ncRNA-sgRNA fusion + Cas9 mRNA + sgRNA) delivered to HEK293T cells by lipofection (i.e., the guide RNA was delivered as a separate molecule). At the amounts of RT mRNA, ncRNA-sgRNA fusion and Cas9 mRNA given in the figure, the amount of added guide RNA was titrated at 0, 2, 5, 10, 50 and 100 ng. The addition of guide RNA in the all-RNA system increased precise editing by up to 3.5-fold and the efficiency by 12%. Increasing the amount of guide RNA within this range did not further increase precise editing. Precise editing is completely dependent on the presence of the retron machinery. The right panel shows the indel frequencies under the respective conditions.
Figure 52 shows the results of separating the ncRNA-sgRNA fusion in the all-RNA system (RT mRNA + Cas9 mRNA + either ncRNA-sgRNA fusion or separate ncRNA + sgRNA) delivered to HEK293T cells by lipofection. At the amounts of RT mRNA and Cas9 mRNA given in the figure, the amount of added guide RNA was titrated at 0, 2, 5, 10, 50 and 100 ng. At 10 ng guide RNA, precise editing peaked at 2.23%, compared to 1.78% for the ncRNA-sgRNA fusion. Increasing the amount of guide RNA within this range did not further increase precise editing. The right panel shows the indel frequencies under the respective conditions.
Figure 53 is a schematic diagram of the improved template used for in vitro transcription to generate ncRNA (left side, or A) and of the ncRNA modification (right side, or B). (A) concerns optimizing RNA production by in vitro transcription.
Previous in vitro transcription experiments to produce RNA used double-stranded DNA templates containing a 3' overhang (on the same strand as the T7 promoter sequence). New templates with blunt ends were designed and tested, and resulted in increased precise editing efficiency, as shown in Figure 54. (B) concerns the modified ncRNA, which was modified by adding an MS2 stem-loop hairpin to the 3' end of the ncRNA. Without being bound by theory, the MS2 loop helps stabilize the ncRNA and results in significantly improved precise editing efficiency, as shown in Figure 54.
Figure 54 shows the results of optimizing RNA production, with the ncRNA and sgRNA supplied separately rather than as a fusion, in a 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX. The RNAs were transfected at the fixed amounts shown. RNA generated from a linearized plasmid template containing a 3' overhang produced 1.35% precise edits. RNA generated from a modified linearized plasmid template with blunt ends increased precise edits to 5.94%. Adding an MS2 stem-loop to the 3' end of the ncRNA (blunt-end template) further increased precise edits to 12.39%. The right panel shows the indel frequencies under the respective conditions.
Figure 55 (SEQ ID NO: 19969) provides a schematic diagram of end protection of RNA from cellular nuclease activity by capping and tailing. In (A), a 7-methylguanosine cap (cap 0) is added to the 5' triphosphate of the RNA. In (B), a poly(A) tail is added enzymatically to the 3' end. The tail length is estimated to be over 50 nucleotides. In (C), RNA containing both a 5' cap and a 3' tail is shown. The results are shown in Figure 56.
Figure 56 shows the results of end protection of ncRNA-sgRNA fusions by caps and tails in a 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX.
The RNAs were transfected at fixed amounts: 100 ng RT mRNA, 400 ng ncRNA-sgRNA, 100 ng Cas9 mRNA, and 5 ng sgRNA. The ncRNA-sgRNA fusions were capped (+cap-tail), poly(A)-tailed (-cap+tail), or both capped and poly(A)-tailed (+cap+tail). RNA without end protection (-cap-tail) resulted in approximately 4.5% precise editing, and editing was retron-dependent, as the absence of RT abolished precise editing. Compared to RNA without cap or tail, RNA with cap protection, tail protection, or both resulted in lower precise editing (left panel) but also reduced indels (right panel).
Figure 57 shows that shortening the msd stem can modulate precise editing in a retron-specific manner. The modified retrons include RTX3_4536, RTX3_6279, RTX3_6342, RTX3_6438, RTX3_6549 and RTX3_6605, each in long (L) and short (S) forms.
Figure 58 shows that shortening the msd stem can modulate precise editing in a retron-specific manner. The modified retrons include RTX3_5752, RTX3_6221 and RTX3_6034, each in long (L) and short (S) forms.
Figure 59A demonstrates that four retrons from multiple evolutionary branches can insert up to 100 bp into the EMX1 locus. The figure shows the results of testing different template lengths for four different retrons in an all-RNA system delivered to HEK293T cells by lipofection. At given amounts of RT mRNA and Cas9 mRNA, ncRNA-sgRNA fusions with template lengths ranging from 10 to 100 bp were added. For the 100 bp insertion, precise editing was 1.04% for Eco3, 1.52% for Aco1, 1.37% for R2042, and 0.05% for R6943.
In the bottom row of each figure, the indel frequencies under the respective conditions are shown.
Figure 59B demonstrates that four retrons from multiple evolutionary branches can insert up to 100 bp into the EMX1 locus. The figure shows the indel results corresponding to the constructs of Figure 59A, illustrating the indel frequencies under the individual conditions.
Figure 60A demonstrates that additional sgRNAs targeting the same genomic locus increase indel and insertion frequencies. This figure reports the same conditions as Figures 59A-59B, but with the addition of sgRNA, which increases the frequency of precise editing and indels. For the 100 bp insertion, precise editing was 1.82% for Eco3, 4.50% for Aco1, 4.40% for R2042, and 0.38% for R6943. In the bottom row of each figure, the indel frequencies under the respective conditions are shown.
Figure 60B demonstrates that additional sgRNAs targeting the same genomic locus increase indel and insertion frequencies. This figure reports the indel frequencies associated with the retrons of Figure 60A.
Figure 61A is a schematic diagram showing the unedited allele and the allele edited by inserting the GFP gene at the EMX1 locus using the retron editing system. The EMX1 forward and reverse primers flank the 5' and 3' homology arms and amplify a 169 bp product from the unedited allele. From the edited allele, they amplify a 1433 bp product spanning the GFP gene insertion. The primers for detecting the 5' and 3' junctions are indicated on the edited allele. The 5' junction on the edited allele was amplified with the primer pair EMX1 forward and 5' junction GFP reverse, and the 3' junction was amplified with the primer pair 3' junction GFP forward and EMX1 reverse. Neither the 5' junction GFP reverse primer nor the 3' junction GFP forward primer binds to the unedited allele.
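The amplicon sizes stated for Figure 61A (169 bp for the unedited allele and 1433 bp for the edited allele, both with the EMX1 forward and reverse primers) can be used to tell the two alleles apart by product size. The sketch below is illustrative only; the tolerance value and the function names are assumptions, and the length difference equals the size of the inserted sequence only if the insertion replaces no genomic bases.

```python
# Amplicon sizes stated for the EMX1 forward/reverse primer pair in Figure 61A.
UNEDITED_AMPLICON_BP = 169
EDITED_AMPLICON_BP = 1433


def net_length_difference():
    """Net size difference between the edited and unedited amplicons, in bp."""
    return EDITED_AMPLICON_BP - UNEDITED_AMPLICON_BP


def classify_amplicon(size_bp, tolerance_bp=10):
    """Assign an observed product size to an allele.

    tolerance_bp is a hypothetical sizing tolerance, not a value from the source.
    """
    if abs(size_bp - UNEDITED_AMPLICON_BP) <= tolerance_bp:
        return "unedited"
    if abs(size_bp - EDITED_AMPLICON_BP) <= tolerance_bp:
        return "edited"
    return "unassigned"


if __name__ == "__main__":
    print(net_length_difference())   # 1264
    print(classify_amplicon(171))    # unedited
```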
Figure 61B demonstrates that three retrons from multiple evolutionary branches are capable of GFP integration at the EMX1 locus. The figure provides qPCR data depicting the insertion of GFP at the 5' or 3' junction in the EMX1 locus by the three retrons. ncRNA-sgRNAs engineered to contain the GFP gene insert were transfected into HEK293T cells, with or without additional sgRNA, together with the RT mRNA of each retron and Cas9 mRNA. Genomic DNA was analyzed with the indicated primer pairs to detect the GFP gene insertion at the 5' or 3' junction on the edited allele. The ΔCt was obtained by subtracting the Ct value of the +RT sample from the Ct value of the -RT sample, for relative quantification of GFP insertion at the EMX1 locus. The larger the ΔCt value, the higher the insertion frequency in the indicated sample. For all three retrons, the 5' and 3' junction amplicons were detected at levels significantly above background. The 3' junction amplicon is more abundant than the 5' junction amplicon, probably because the 5' end of the msDNA produced by reverse transcription is more enriched, which promotes insertion at the 3' junction.
Figure 61C demonstrates that two retrons are capable of GFP integration at the EMX1 locus. The figure provides qPCR data depicting the insertion of GFP at the 5' or 3' junction in the EMX1 locus by the two retrons. ncRNA-sgRNAs engineered to contain the GFP gene insert were transfected into HEK293T cells, with or without additional sgRNA, together with the RT mRNA of each retron and Cas9 mRNA. Genomic DNA was analyzed with the primer pairs indicated in the table to detect the GFP gene insertion at the 5' or 3' junction on the edited allele. The ΔCt was obtained by subtracting the junction Ct value from the EMX1 Ct value, for relative quantification of GFP insertion at the EMX1 locus. For both retrons, amplicons at the 5' and 3' junctions were detected at levels significantly above background.
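The two ΔCt definitions given for Figures 61B and 61C are both plain differences of Ct values. The sketch below shows the arithmetic; the function names are hypothetical, and the conversion of ΔCt to an approximate fold difference assumes ideal amplification (template doubling every cycle), which the source does not state.

```python
def delta_ct(ct_reference, ct_sample):
    """Generic ΔCt: reference Ct minus sample Ct (larger means more template in the sample)."""
    return ct_reference - ct_sample


def delta_ct_fig_61b(ct_minus_rt, ct_plus_rt):
    """ΔCt as described for Figure 61B: Ct(-RT sample) minus Ct(+RT sample)."""
    return delta_ct(ct_minus_rt, ct_plus_rt)


def delta_ct_fig_61c(ct_emx1, ct_junction):
    """ΔCt as described for Figure 61C: EMX1 Ct minus junction Ct."""
    return delta_ct(ct_emx1, ct_junction)


def approximate_fold_difference(delta, efficiency=2.0):
    """Approximate fold difference implied by a ΔCt.

    Assumes each cycle multiplies the template by `efficiency`
    (2.0 means perfect doubling, an idealization).
    """
    return efficiency ** delta


if __name__ == "__main__":
    # Made-up Ct values, for illustration only.
    d = delta_ct_fig_61b(ct_minus_rt=30.0, ct_plus_rt=25.0)
    print(d, approximate_fold_difference(d))
```

With both definitions, a larger ΔCt corresponds to more junction template in the sample of interest, consistent with the statement that a larger ΔCt indicates a higher insertion frequency.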
The amplicon at the 3' junction is slightly more abundant than the amplicon at the 5' junction, probably because the 5' end of the msDNA generated by reverse transcription is more enriched, promoting insertion at the 3' junction.
Figure 61D shows the PCR products for Aco1 run on an Agilent TapeStation High Sensitivity D1000 ScreenTape. In the presence of RT (+RT sample, left side of gel), amplicons of the expected size were detected at both the 5' and 3' junctions of the edited allele, as shown by the arrows. In the absence of RT (-RT sample, right side of gel), the edited allele was not detected. The triangle indicates the expected amplicon of the unedited allele, which is present in both -RT and +RT samples.
Figure 62A demonstrates that fusion RNAs in which the sgRNA is located upstream of the ncRNA are more active in terms of precise editing and indels. Results of testing different orders of the ncRNA and sgRNA in fusions for three different retrons in an all-RNA system delivered to HEK293T cells by lipofection. With given amounts of RT mRNA and Cas9 mRNA, ncRNA fusions in the different orders were added. For Eco3, Aco1, and R2042, precise editing was higher with sgRNA-ncRNA than with ncRNA-sgRNA. In the bottom row of each figure, the indel frequency under the respective conditions is shown.
Figure 62B demonstrates that, when the sgRNA is located downstream of the ncRNA, additional sgRNA compensates for the lower editing activity of that configuration. Same as the previous figure, but with the addition of sgRNA. The difference between ncRNA-sgRNA and sgRNA-ncRNA in the presence of additional sgRNA depends on the retron. For Eco3 and R2042, sgRNA-ncRNA is more effective than ncRNA-sgRNA. For Aco1, ncRNA-sgRNA is slightly more effective than sgRNA-ncRNA. In the bottom row of each figure, the indel frequency under the respective conditions is shown.
Figure 63 demonstrates that RT and Cas9 are required for precise editing. Results of testing negative controls in an all-RNA system delivered to HEK293T cells by lipofection. When either of these components is removed, precise editing is undetectable or at background levels. In the bottom graph, the indel frequencies under the respective conditions are shown.
Figure 64 demonstrates that separating the ncRNA and sgRNA does not decrease the frequency of precise editing. Results of testing ncRNA-sgRNA fusions of four different retrons against separate ncRNAs in an all-RNA system delivered to HEK293T cells by lipofection. ncRNA, ncRNA-sgRNA, or sgRNA-ncRNA was added with given amounts of RT mRNA, Cas9 mRNA, and additional sgRNA as described below the figure. All insert templates tested were 25 bp. The optimal ncRNA format depends on the retron. For Eco3, the highest precise editing (9.88%) was observed with separate ncRNA plus additional sgRNA. For Aco1, the highest precise editing (2.96%) was observed with ncRNA-sgRNA plus additional sgRNA. For R2042, the highest precise editing (7.4%) was observed with separate ncRNA plus additional sgRNA. For R6943, the highest precise editing (0.39%) was observed with the ncRNA-sgRNA fusion plus additional sgRNA. In the bottom row of each figure, the indel frequency under each condition is shown.
Figure 65 demonstrates that retrons can achieve precise editing at the AAVS1 locus. Results of testing four different retrons inserting a 25 bp template into the AAVS1 locus. The retrons were delivered as all-RNA systems to HEK293T cells by lipofection. For all four retrons, the highest precise editing was observed in the presence of additional sgRNA and ranged from 0.73% to 2.05%, depending on the retron used. On the right, the indel frequencies under the respective conditions are shown.
Figure 66 demonstrates that retrons can achieve precise editing at the AAVS1 locus. Negative control results for the AAVS1 experiment in the previous figure. Removing RT, ncRNA-sgRNA, or Cas9 mRNA abolishes precise editing. In the bottom row of each figure, the indel frequencies under the respective conditions are shown.
Figure 67 demonstrates that templates on either the target strand or the non-target strand can be used for precise editing. Results of testing templates encoded on different strands, relative to the strand targeted by the Cas9 sgRNA, for two different retrons in an all-RNA system delivered to HEK293T cells by lipofection. ncRNA-sgRNA was added with given amounts of RT mRNA, Cas9 mRNA, and additional sgRNA. All insert templates tested were 25 bp. In all previous figures, the insert template is encoded on the same strand that is targeted by the Cas9 sgRNA, indicated by the letter T. For Eco3, in the presence of additional sgRNA, precise editing was 1.67% when the template was on the target strand and 0.85% when the template was on the non-target strand. For Aco1, in the presence of additional sgRNA, precise editing was 2.96% when the template was on the target strand and 3.40% when the template was on the non-target strand. In the bottom row of each figure, the indel frequency under each condition is shown.
Figure 68 demonstrates that retron-mediated precise editing is observed using Cas9 nickase in the 4-component all-RNA system. Cas9 nickase activity in the 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA/ncRNA-sgRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX. The RNAs were transfected at fixed amounts: 100 ng RT mRNA, 400 ng ncRNA/ncRNA-sgRNA, 100 ng Cas9 variant mRNA, and 5 ng sgRNA. Either separate ncRNA or an ncRNA-sgRNA fusion was used. When ncRNA or ncRNA-sgRNA was used with additional sgRNA, precise editing by Cas9 WT (wild type) was about 9%.
Using the Cas9 H840A mutant, which nicks the non-target strand, almost no activity above background was seen under any condition. Using the Cas9 D10A mutant, which nicks the target strand, precise editing was observed at a frequency of about 0.1% in both the separate and the fused ncRNA and sgRNA formats. The right panel shows the indels under the respective conditions of the left panel. As expected, indels were extremely low for either nickase in the presence or absence of additional sgRNAs, and only WT Cas9 generated significant indels.
Figure 69 demonstrates that the retron-mediated precise editing observed with Cas9 nickase depends on each component of the all-RNA system. Same as the previous figure, but testing negative controls that lacked one component of the all-RNA system (Cas9, ncRNA/ncRNA-sgRNA, or RT), with or without additional sgRNAs. In the top figure, all control conditions tested showed only background precise editing activity, indicating that precise editing achieved by Cas9 WT or Cas9 nickase depends on each component of the all-RNA system. In the bottom figure, significant indels were detected only when Cas9 WT was used.
Figure 70 demonstrates that dual sgRNAs increase precise editing but not indels. The figure shows the results of testing dual sgRNAs with R2042 in an all-RNA system delivered to HEK293T cells by lipofection. Additional sgRNAs were added with given amounts of RT mRNA, Cas9 mRNA, and ncRNA-sgRNA. Under all conditions, the same ncRNA-sgRNA was used, whose template encoded a 25 bp insertion on the non-target strand of the Cas9 sgRNA. When one additional sgRNA (#1) was used, precise editing was 0.27%, which increased to 1.33% (#2) or 1.13% (#3) in the presence of a second sgRNA. In the bottom figure, the indel frequency under the respective conditions is shown. See Example 9.
FIG. 71 Precise editing and indel results in HEK293T cells after lipofection of the R6083-based all-RNA retron editing system in various configurations. The configurations tested different template lengths (25 bp to 100 bp) using fusion ncRNA-sgRNA constructs (with and without spiked-in sgRNA) as well as separate ncRNA and sgRNA constructs. With given amounts of RT mRNA and Cas9 mRNA, ncRNA-sgRNA fusions with template lengths ranging from 25 to 100 bp, or ncRNA with a 25 bp insertion, were added. Precise insertion of 100 bp was observed with an efficiency of approximately 2%. With a 25 bp insertion, separate ncRNAs showed lower activity than ncRNA-sgRNA fusions. Precise editing depends on the presence of RT and ncRNA, as shown in the top right figure. In the bottom row of each figure, the indel frequencies under the respective conditions are shown.
Figure 72 Results of testing the R6083 retron inserting a 25 bp template into the AAVS1 locus in HEK293T cells. The retron ncRNA (as an ncRNA-sgRNA fusion), the retron RT, and the Cas9 component were delivered to HEK293T cells as an all-RNA system by lipofection. With additional sgRNA, an editing efficiency of 3% was observed, and the activity was completely dependent on the presence of RT and ncRNA. On the right, the indel frequencies under the respective conditions are shown.
Figure 73 demonstrates that, for Eco3 retron editing, separate Cas9 and RT enzymes show higher editing efficiency than a Cas9-RT fusion. Results of precise editing by Eco3 with fused versus separate Cas9 and RT in an all-RNA system delivered to HEK293T cells by Lipofectamine MessengerMAX. Separate Cas9 and RT mRNAs were used at the same molar concentration as the Cas9-RT fusion, and the editing efficiency of the fused or separate enzymes was compared using Eco3 ncRNA (left side on the left figure) or the Eco3 ncRNA-sgRNA fusion (right side on the right figure).
With both ncRNA and ncRNA-sgRNA, separate Cas9 and RT achieved higher editing efficiency than the fused enzymes. The right figure shows the indels under the respective conditions of the left figure.
Figure 74 demonstrates that combined cap and tail modification of the ncRNA significantly increased editing efficiency, by about 2-fold. Results of end protection of ncRNA-sgRNA fusions by capping and tailing in a 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX. The RNAs were transfected at fixed amounts: 100 ng RT mRNA, 400 ng ncRNA-sgRNA, 100 ng Cas9 mRNA, and 5 ng sgRNA. ncRNA-sgRNA fusions were capped (+cap/-tailing), poly-A-tailed (-cap/+tailing), or both capped and poly-A-tailed (+cap/+tailing). RNA without end protection (-cap/-tailing) gave approximately 4.5% precise editing, and editing was dependent on the retron, as omitting the RT abolished precise editing. Capping alone did not change editing efficiency and tailing alone slightly increased editing, whereas capping and tailing together significantly increased editing efficiency, by about 2-fold (left figure). In the right figure, indels under all conditions are comparable.
Figure 75 Schematic diagram of ncRNA circularization; as demonstrated in Figure 76, ncRNA circularization increases editing efficiency. Step 1: the ncRNA is flanked, moving outward toward each end, by an internal homology sequence, an intron, and another homology sequence. Step 2: the homology sequences bring the two ends close together, and exogenous GTP initiates a series of transesterification reactions. Step 3: the introns are removed and the ncRNA is joined into a circular form.
FIG. 76 Results of testing precise editing by modified ncRNAs (3' MS2-modified and circular ncRNA) in a 4-component all-RNA system (RT mRNA + Cas9 mRNA + ncRNA + sgRNA) delivered to HEK293T cells by Lipofectamine MessengerMAX.
The RNAs were transfected at fixed amounts: 100 ng RT mRNA, 400 ng ncRNA, 100 ng Cas9 mRNA, and 5 ng sgRNA. When the MS2 stem loop was added to the 3' end of the ncRNA, activity almost doubled relative to the unmodified ncRNA (approximately 8%), reaching an efficiency of 15%. Circularization of the ncRNA, performed as described in FIG. 75, achieved a further increase, resulting in an editing efficiency of 22%. As expected, neither precise editing nor indels were detected without sgRNA. The right figure shows indels under the respective conditions of the left figure.
Figure 77 shows that pairing an RT with its cognate ncRNA is required for precise editing. Results of pairing RT and ncRNA from the same or different retrons for precise editing of the EMX1 locus in HEK293T cells using the all-RNA system. The RNAs were transfected at fixed amounts: 100 ng RT mRNA, 400 ng ncRNA, 100 ng Cas9 mRNA, and 5 ng sgRNA. In the left figure, Aco1 RT is paired with Eco3 ncRNA or its cognate Aco1 ncRNA (left), and Eco3 RT is paired with Aco1 ncRNA or its cognate Eco3 ncRNA (right). The system supports precise editing only when the RT is paired with its cognate ncRNA. The right figure shows the indels under the respective conditions of the left figure.
Figure 78 Retron editing supports the insertion of exon-sized long inserts of up to 305 bp. Results of testing exon-sized long inserts at the EMX1 locus using the Aco1, Eco3 and R1262 retrons. Aco1 achieved precise insertions of 10, 100 and 205 bp with efficiencies of 8%, 4% and 1.3%, respectively (left figure). Eco3 achieved precise insertions of 10 and 205 bp with efficiencies of 12% and 2.3%, respectively. The novel retron R1262 achieved 25, 205 and 305 bp insertions with efficiencies of 20%, 15% and 8.8%, respectively. The bottom figure shows the indels under the respective conditions of the top figure.
Figure 79 demonstrates that the novel retron R1262 installs a 205 bp insertion at AAVS1 with over 11% efficiency. Results of testing a 205 bp insertion at the AAVS1 locus using the novel retron R1262. R1262 achieved 25 and 205 bp insertions with 25% and 11.1% efficiency, respectively. The controls show that precise editing depends on the presence of RT and sgRNA. The right figure shows the indels under the respective conditions of the left figure.
Figure 80 Optimization of the RT:ncRNA ratio for precise installation of 205-305 bp insertions by R1262. Results of testing the RT:ncRNA ratio for insertions longer than 200 bp at the EMX1 locus using the novel retron R1262. RT:ncRNA molar ratios of 1:12.5, 1:16.7, and 1:25 were tested. The 1:16.7 molar ratio corresponds to the 1:4 mass ratio used in the rest of the study. When using ncRNA (right side in the left figure), the best editing was observed at a 1:12.5 ratio for 205 and 305 bp insertions. When using ncRNA-sgRNA fusions (left side in the left figure), all ratios tested produced comparable editing, but a slight trend toward increased editing was seen as the amount of ncRNA-sgRNA increased. The right figure shows the indels under the respective conditions of the left figure.
Figure 81 NHEJ-based insertion at a target site of double-stranded DNA produced by a retron. A. Some retrons (e.g., Sen1) may use msR-msD hybrids as primers to generate double-stranded DNA. The desired sequence to be inserted into the genome, flanked by guide RNA recognition sequences, is incorporated into the msD of the ncRNA. The two ends of the double-stranded DNA generated by reverse transcription are trimmed by the nuclease-guide RNA complex, and the trimmed DNA is inserted into the target site cleaved by the nuclease-guide RNA complex. B. The ncRNA is designed to contain the sequence to be inserted into the target site, flanked by guide RNA recognition sequences as in A.
A second ncRNA includes the reverse complement of the insert sequence of the first ncRNA, so that the two products form a duplex. This double-stranded DNA is trimmed by the nuclease-guide RNA complex and inserted into the target site cleaved by the nuclease-guide RNA complex.
Figure 82A Non-coding RNA optimization A. The retron ncRNA contains multiple structural components: 1. the a1/a2 inverted repeat sequence; 2. the msR stem loop; 3. the msD stem loop; 4. msR DNA. Fine-tuning the strength of these structural elements by changing the GC content, length and number of the stem loops may enhance msD DNA production and subsequent gene editing.
Figure 82B Non-coding RNA optimization B. Separating the msR RT primer from the msD RT template allows chemical modification of the msR RT primer. Combined with end protection of the msD RT template by capping and tailing, this may enhance the stability of the ncRNA. Using reverse complementary sequences at the 3' end of the RT primer and the 5' end of the msD RT template (depicted in dotted lines) will stabilize the primer/template complex.
Figure 83A provides a schematic outline of a modified retron gene editing system comprising engineered variant reverse transcriptases (RTs) with improved fidelity and/or processivity. The engineered RTs are generated by introducing selected amino acid substitutions at residue positions in the retron RT sequence that are analogous to residues in the Moloney murine leukemia virus (MMLV) RT whose alteration improves the fidelity and processivity of the MMLV RT. In step (a), a structural alignment is constructed between the retron RT and an engineered MMLV RT having one or more amino acid substitutions associated with improved fidelity and processivity in the context of the MMLV RT. In step (b), the alignment is examined to identify analogous amino acid residues in the retron RT that correspond to the substituted MMLV RT amino acids.
Once these are identified, conventional recombinant engineering methods are used to construct an engineered retron RT comprising one or more of the corresponding analogous amino acid substitutions. In step (c), unnecessary residues may be deleted as appropriate, based on the structural alignment information. In one embodiment, the retron gene editing system may include, in (d), an ncRNA, a guide, a programmable nuclease (or an RNA encoding it), and an engineered RT (or an RNA encoding it). In step (e), the components (which may be in all-RNA form) are delivered to the cell (e.g., by LNP delivery), which results in (f) targeted single-strand (in the case of a nickase) or double-strand cleavage, followed by (g) targeted repair at the cleavage site using the reverse-transcribed product of the ncRNA. This results in (h) precisely edited DNA.
Figure 83B provides an overview of the mutations installed in RTX3_1262 RT, RTX3_6242 and Eco1 RT that are analogous to beneficial amino acid substitutions reported in the literature for MMLV. In addition, the table provides the reported phenotypic changes associated with the mutations and the mutated domains in MMLV.
Figure 83C provides a structural alignment between RTX3-6342 RT (yellow) and MMLV-RT (Protein Data Bank entry 4MH8) (green), which shows that the N-terminus of RTX3-6342 RT can be truncated, for example at the proposed truncation point.
Figure 83D shows the three-dimensional structure of MMLV RT near Q190 and the corresponding position Q161 in Eco1. The top picture shows that the glutamine is conserved in both MMLV and Eco1. The middle picture shows that the Q161 position in Eco1 is involved in nucleic acid contacts. The bottom picture shows that the engineered MMLV variant with the Q190F mutation has about 2-fold higher fidelity. The corresponding substitution at Q161 in Eco1 (i.e., the Q161F mutation) may likewise increase the fidelity of Eco1 RT.
Figure 83E Mutations that improve processivity in MMLV RT and Eco1. Provided are the three-dimensional structures of E302 of MMLV RT and the corresponding residue G283 in Eco1, both of which appear to be involved in binding to nucleic acids. Residues in the nucleic acid binding domain can be mutated to promote stronger interactions. MMLV E302R introduces a positive charge that interacts with the negatively charged template. Eco1 G283 is in a conserved DNA-binding alpha helix. Mutating G283 and the analogous residues of other retrons to positively charged amino acids may improve processivity.
Figure 83F Mutations that improve processivity in MMLV RT and Eco1. Provided are the three-dimensional structures of T306 of MMLV RT and the corresponding residue F287 in Eco1, both of which appear to be involved in binding to nucleic acids. Residues in the nucleic acid binding domain can be mutated to promote stronger interactions. MMLV T306K introduces a positive charge that interacts with the negatively charged template. Eco1 F287 is in a conserved DNA-binding alpha helix. Mutating F287 and the analogous residues of other retrons to positively charged amino acids may improve processivity.
Figure 84 describes the preparation of variant retron RTs by saturation mutagenesis of targeted domains involved in processivity and fidelity (e.g., the palm, finger, and thumb domains). Engineered RTs are generated by saturation mutagenesis of the palm, finger, and thumb domains of the retron RT, which are identified by structural alignment with the MMLV RT. In step (a), a structural alignment is constructed between the retron RT and an engineered MMLV RT that has one or more amino acid substitutions associated with improved fidelity and processivity in the context of the MMLV RT, particularly in the palm, finger, and thumb domains. In step (b), the palm, finger, and thumb domains are identified.
In step (c), the palm, finger and/or thumb domains are mutagenized by saturation mutagenesis, and the resulting mutants are screened in an activity assay to select one or more lead variants with increased processivity and/or fidelity. In step (e), components including the lead retron RT variant (which may be in all-RNA form) are delivered to the cell (e.g., by LNP delivery), which results in (f) targeted single-strand (in the case of a nickase) or double-strand cleavage, followed by (g) targeted repair at the cleavage site using the reverse-transcribed product of the ncRNA. This results in (h) precisely edited DNA.
FIG. 85A depicts the preparation of variant retron RTs by constructing chimeric RT proteins that fuse the "Y region" of a retron RT (i.e., the region reported to be responsible for binding to the msr region of a retron ncRNA) to another RT (e.g., MMLV RT), or that replace the "Y region" of one retron RT with the Y region of another retron RT. In this way, a chimeric protein may be engineered for any particular ncRNA to have a corresponding Y region that is expected to bind to that ncRNA.
FIG. 85B (SEQ ID NO: 19955-19968, top to bottom) provides a multiple sequence alignment of the amino acid sequences of various retron RTs (top). The boxed region indicates the location of the conserved VTG triplet, which marks the N-terminal side of the Y region (as reported in Anna J Simon, Andrew D Ellington, Ilya J Finkelstein, Retrons and their applications in genome engineering, Nucleic Acids Research, Volume 47, Issue 21, December 2, 2019, Pages 11007–11019, which is incorporated herein by reference), which is about 90 amino acids in length. The schematic diagram below is an enlarged version of the top figure.
FIG. 86 depicts the preparation of a variant retron RT by fusing one or more processivity enhancers (e.g., Sso7d and Sac7d) and/or one or more fidelity enhancers (e.g., 3'>5' exonuclease domains, such as those of POLE1, POLD1, POLG, Pfu, or KOD) to increase proofreading activity. For the selection of appropriate factors to enhance the processivity and/or fidelity of the retron RT, refer to Oscorbin et al., "The attachment of a DNA-binding Sso7d-like protein improves processivity and resistance to inhibitors of M-MuLV reverse transcriptase," FEBS Lett, 594: 4338-4356 and Yarnall et al., "Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases," Nature Biotechnol, 2022.
FIG. 87 Electroporation of plasmids into K562 cells. The appropriate plasmid mixtures were combined and electroporated into cells using the Neon electroporation system. After 72 hours of incubation, genomic DNA was extracted and the targeted regions were amplified into sequencing libraries. Sequencing data were analyzed with CRISPResso2, and precise edits and indels were calculated.
FIG. 88 Results of plasmid-based analysis in K562 cells demonstrate precise edits of up to approximately 1-10% and indels as low as 5-30% using RTX003_2042, 6083v1, 6943, 1262, 6342L, and 6342S. These are all novel retrons, and the precise editing activity observed in this experiment strongly supports that these novel retrons may generate msDNA in K562 cells. These data replicate the results previously seen in 293T cells, indicating that retron-mediated gene editing is not subject to cell type-specific effects.
Figure 89 Results of plasmid-based analysis comparing retrons annotated in the literature with novel retrons in K562 cells. These data demonstrate that the novel retrons RTX3_1262, RTX3_6342L and RTX3_6342S can generate approximately 15-19% precise editing and indels as low as 20-35%.
The novel retrons performed as well as or better than Aco1, and better than the other retrons validated in the literature: Eco1, Eco3, Sau1, and Sen1.
Figure 90 Results of plasmid-based analysis comparing RTX3_2781 to the leading retrons RTX3_1262 and RTX3_6342S/L in K562 cells. These data demonstrate that RTX3_2781 installs precise edits with efficiency comparable to RTX3_1262 and 6342S/L.
Figure 91 Results of plasmid-based analysis in K562 cells demonstrate up to approximately 0.6% precise editing and indels as low as 5% using Cas9 D10A nickase with RTX003_2042, 6083v1, 6943, 1262, 6342L, and 6342S. These are all novel retrons, and the precise editing activity observed in this experiment strongly supports that these novel retrons may generate msDNA in cells that can be used for nickase-mediated gene editing.
Figure 92 Results of plasmid-based analysis in K562 cells demonstrate up to 0.5% precise editing using LbCas12a nuclease. Precise editing in this system strongly suggests intracellular msDNA production and compatibility with Cas12a-like nucleases for gene editing.
Figure 93 Results of plasmid-based analysis in K562 cells demonstrate up to 0.5% precise editing using TnpB nuclease. Precise editing in this system suggests intracellular msDNA production and compatibility with TnpB-like nucleases for gene editing.
Figure 94 RTX3_6342 (structure predicted by AlphaFold) (yellow) aligned with MMLV-RT (PDB 4MH8) (green). The native RTX3_6342 RT is fused at its N-terminus to a domain unrelated to the RT, which may not be necessary for reverse transcription. Arrows mark potentially relevant truncation points. Jumper et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2.
Figure 95 Results of plasmid-based analysis in K562 cells demonstrate up to ~20% precise editing and ~20% indels using RTX003_6342S and truncation variants.
The precise editing activity observed in this experiment strongly supports that these truncations retain the ability to generate msDNA in cells.
Figure 96 Results of plasmid-based analysis in K562 cells demonstrate up to about 40% precise editing and about 60% indels using RTX003_6342S and RTX3_1262 with different insert sizes. The precise editing activity observed in this experiment shows that RTX3_6342S can insert sequences of up to 405 bp into the EMX1 locus without a reduction in precise editing. RTX3_1262 can insert up to 205 bp at the EMX1 locus before editing efficiency decreases. Substitutions in the reads were ignored during the CRISPResso2 analysis in this experiment.
Figure 97 Results of plasmid-based analysis in K562 cells demonstrate up to ~40% precise edits and ~40% indels when using RTX003_6342S and RTX3_1262 with different insert sizes, with substitutions either ignored or quantified during the CRISPResso2 analysis. The precise editing activity observed in this experiment suggests that RTX3_6342S can insert sequences of up to 405 bp into the EMX1 locus, but that RT fidelity may limit accurate installation of the intended edits (compare ignoring (circles) vs. counting (triangles) substitutions).
FIG. 98 Results of plasmid-based analysis in K562 cells screening mutations at RTX3_6342 reverse transcriptase residues predicted to interact with the ncRNA, based on structural alignment of the AlphaFold-predicted RTX3_6342 structure with the crystal structure of the retron Eco1 bound to its ncRNA. Most mutations have a modest effect on the installation of shorter 10 bp inserts, but only two (N465K and N465R) may increase the installation frequency of larger (305 bp) inserts.
FIG. 99 Results of plasmid-based analysis in K562 cells demonstrate that mutations in RTX3_6342S analogous to beneficial MMLV mutations do not consistently increase the installation frequency of >205 bp inserts.
The analogous positions are: MMLV (L139P) == RTX3_6342 (V265P), MMLV (T306K) == RTX3_6342 (M470K), MMLV (W313F) == RTX3_6342 (V477F).
Figure 100 Results of plasmid-based analysis in K562 cells screening mutations at RTX3_6342 reverse transcriptase residues predicted to interact with the ncRNA, based on structural alignment of the AlphaFold-predicted RTX3_6342 structure with the crystal structure of the retron Eco1 bound to its ncRNA. E238R, E479K, and K255P are mutations that may increase the installation frequency of the 305 bp insert at the EMX1 locus.
Figure 101 Predicted structures of wild-type RTX3_6342 and of RTX3_6342 with different N-terminal truncations. The predicted structures were generated with AlphaFold and aligned with the Eco1 cryo-EM structure (79VU.pdb) using PyMOL.
Figure 102 Results of plasmid-based analysis in K562 cells evaluating the effect of partial or complete deletion of the RTX3_6342 N-terminal helix on the installation frequency of longer inserts. Partial N-terminal truncations were observed to enhance the precise editing/indel ratio for longer inserts.
Figure 103 Results of plasmid-based analysis in K562 cells evaluating the effect of fusing a DNA binding domain to RTX3_6342 on the installation frequency of 405 bp inserts. Fusion of the Sac7d or Sso7d DNA binding domain to the N-terminus of the retron RT resulted in a two-fold (RTX3_6342S) or ten-fold (RTX3_1262) increase in the installation frequency of a 405 bp insert.
FIG. 104 Results of a plasmid-based assay in K562 cells screening retron family members similar to RTX3_6342, 6083, 6943, or 1262. The majority of retrons with robust gene editing activity were found to belong to the RTX3_6342 family.
FIG. 105 Results of testing the precise editing activity of the novel retron R2781 in an all-RNA system.
In Figure 16, R2781 demonstrated 10 bp insertion at the EMX1 locus in the plasmid system, with activity similar to that of the other hits R1262, R6342S, and 6342L. Here, two different homology arm (HA) configurations (30 bp on both arms, or a 49 bp left arm with a 65 bp right arm) were used to assess the activity of R2781 in inserting 25 bp, 205 bp and 405 bp at the EMX1 locus. Precise insertion of 25 bp with 30 bp homology arms was achieved with 11% efficiency. With the longer homology arms (49/65 bp), the efficiency for an insertion of the same length was halved, indicating that the reverse transcriptase of this retron may have limited processivity. In contrast, activity for inserting 205 or 405 bp was markedly lower. The right figure shows the indels under the respective conditions of the left figure.
FIG. 106 Results of testing the precise editing activity of the novel retron R6342S in an all-RNA system with Cas9 D10A. Retron R6342S was previously shown to achieve precise editing with Cas9 WT. Here, the activity of a 25 bp insertion at the EMX1 locus was assessed in 293T cells using Cas9 D10A or the Cas9 D10A R221K/N394K mutant and 10 or 50 ng sgRNA. Cas9 D10A demonstrated up to 0.4% editing, while Cas9 D10A R221K/N394K showed higher activity, up to 0.7% editing. These data indicate that retron-mediated genome insertion may be achieved using Cas9 nickase as well as Cas9 WT, which produces double-strand breaks. The right figure shows the indels under the respective conditions of the left figure.
Figure 107 Results of testing the precise deletion activity of the novel retron R1262 in the all-RNA system. The top figure shows two deletion strategies. Retron ncRNAs are designed to delete an intervening sequence by juxtaposing the left and right homology arm sequences that flank it.
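The deletion design described for Figure 107, in which the left and right homology arms are juxtaposed so that the intervening sequence is omitted, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, coordinates, arm length and toy sequence are hypothetical and are not taken from the disclosure.

```python
def deletion_template(reference: str, del_start: int, del_end: int, arm_len: int) -> str:
    """Juxtapose the arms flanking reference[del_start:del_end].

    Hypothetical helper: the template carries only the two homology arms,
    so a repair copying it omits the intervening sequence.
    """
    if not 0 <= del_start < del_end <= len(reference):
        raise ValueError("deletion interval lies outside the reference")
    if del_start < arm_len or len(reference) - del_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = reference[del_start - arm_len:del_start]
    right_arm = reference[del_end:del_end + arm_len]
    return left_arm + right_arm


def apply_deletion(reference: str, del_start: int, del_end: int) -> str:
    """Sequence of the allele after the precise deletion."""
    return reference[:del_start] + reference[del_end:]


# Toy locus: 8 nt flank, 6 nt to delete, 8 nt flank (all hypothetical).
locus = "AAAACCCC" + "GGGGGG" + "TTTTAAAA"
template = deletion_template(locus, del_start=8, del_end=14, arm_len=4)
edited = apply_deletion(locus, del_start=8, del_end=14)
```

The juxtaposed-arm template matches the edited allele across the new junction but not the unedited locus, which is what allows the deletion to be scored as a precise edit.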
Del1 deletes 214 bp upstream of the Cas9 cleavage site at the EMX1 locus, and del2 deletes 248 bp downstream of the Cas9 cleavage site. Retron-mediated deletions were compared with direct delivery of increasing amounts of single-stranded DNA (ssDNA) donor (150 and 300 ng). The bottom left panel shows that R1262 can delete 248 bp (del2) from the EMX1 locus with activity similar to that of the ssDNA donor. The right panel shows the indels under the corresponding conditions of the left panel. Notably, the unintended indel activity associated with retron-mediated deletion is about 6 times lower than that associated with the ssDNA donor. The 214 bp (del1) deletion activity of R1262 was not detected.
Figure 108 shows the results of testing the non-homologous end joining (NHEJ)-based insertion activity of the novel retrons R1262 and R6342 in the all-RNA system. On the left, the strategy for generating double-stranded DNA (dsDNA), which is the substrate of the NHEJ mechanism, is depicted:
1. The retron reverse transcriptase generates complementary single-stranded DNAs from two ncRNAs containing sense or antisense insert sequences.
2. The complementary single-stranded DNAs form a duplex.
3. The Cas9-sgRNA complex removes the Y-shaped single-stranded portions (msR and msD spacer sequences) at both ends.
4. The blunt-ended double-stranded DNA trimmed by Cas9 is inserted into the target site.
Including sgRNA recognition sequences (identical to the genomic target sequence) in tandem at both ends of the insert yields blunt-ended double-stranded DNA, and only one insertion orientation allows more stable integration by preventing Cas9 from re-cutting.
Using this strategy, R1262 with two complementary ncRNAs integrated an approximately 120 bp dsDNA at the EMX1 locus with an efficiency of approximately 1%, and the efficiency was even higher when the insert sequence was flanked by two tandem sgRNA sequences (right side of the top graph). The basal insertion activity of R6342 without the tandem sgRNA sequences is slightly higher than that of R1262. The right side of the bottom graph shows the indels under the corresponding conditions of the top graph.
Figure 109 shows the results of testing the immune response to retron RNA. Human peripheral blood mononuclear cells (hPBMCs) were used to assess immune responses because PBMCs contain innate and adaptive immune cells and are equipped with sensors that detect exogenous nucleic acids. Frozen hPBMCs were thawed and rested overnight. As indicated by the labels, RT mRNAs with U or m1Ψ and ncRNAs with or without cap0 (m7G) at the 5' end were electroporated individually or together, and unmodified GFP mRNA from TriLink was used as a control. After overnight culture, the supernatant was analyzed for cytokine production. Of the 25 interleukins and chemokines examined, those detected above the detection limit are shown. Type I interferons (a marker of the immune response to exogenous nucleic acids) were not detected in any retron RNA-transfected cells. Low levels of some inflammatory interleukins and chemokines were detected in retron RNA-transfected cells, but their levels were much lower than with the control GFP mRNA, except for the RT mRNA without U modification.
Figure 110 shows the results of testing retron-mediated gene editing in human stem progenitor cells (HSPCs) in the all-RNA system. Bone marrow-derived CD34+ HSPCs were used to evaluate the efficacy of retron-mediated gene editing in primary cells.
Frozen HSPCs were thawed and expanded for three days in the presence of the cytokines hSCF, hFLT3-L, and hTPO to prevent differentiation. Cas9 mRNA, guide RNA (gRNA), R6342 RT mRNA, and ncRNA were mixed at the mass ratio indicated in each label and electroporated into HSPCs using two different programs on a Lonza electroporator. After another three days of culture, genomic DNA was extracted and sequenced to measure the editing frequency. Using a total of 5 micrograms of RNA (split 1:0.3:4:1 = Cas9:gRNA:ncRNA:RT), precise insertion of 25 bp at the AAVS1 locus was observed at a frequency of 0.1% (left panel). The right panel shows the indels under each condition.
Figure 111 shows the results of testing retron-mediated gene editing in human T cells in the all-RNA system. Human pan T cells from peripheral blood were used to evaluate the efficacy of retron-mediated gene editing in primary cells. Frozen T cells were thawed and activated for two days in the presence of anti-CD3/anti-CD28-conjugated magnetic beads and the cytokine IL-2. Cas9 mRNA, guide RNA (gRNA), R6342 RT mRNA, and ncRNA were mixed at the mass ratio indicated in each label and electroporated into T cells with a Neon or Lonza electroporator. After another three days of culture, genomic DNA was extracted and sequenced to measure the editing frequency. Using a total of 3 micrograms of RNA (split 1:0.3:3:1 = Cas9:gRNA:ncRNA:RT), precise insertion of 25 bp at the AAVS1 locus was observed at a frequency of up to 1.7% using the Neon machine (left panel). The right panel shows the indels under the corresponding conditions.
Figure 112 provides a schematic diagram of the retron ncRNA library. Variants of different elements in the ncRNA (modified a1:a2 stem, msR, and msD regions) are associated with unique barcodes, and the library is synthesized as an oligo library.
By sequencing the barcodes inserted at the genomic locus, the efficiency of the associated ncRNA variants in human cells can be measured.
Figure 113 summarizes the variant library of Example 31.
Figure 114 provides a schematic diagram of the process for screening the ncRNA variant library described in Example 31.
Figure 115 provides a scatter plot summarizing the results of Example 31. In the scatter plot, each point represents a variant, with the relative genomic insertion level on the y-axis and the ssDNA production level on the x-axis. The dotted lines represent the genomic insertion and ssDNA production levels of the WT R6342S control. The ncRNA variants in the upper right region bounded by the dotted lines therefore outperform the WT R6342S control on both measures. Most of these variants involve the a1a2 or msD stem elements. We identified 13 such variants that showed >1.5-fold increases in both genomic insertion and ssDNA production levels compared to the WT R6342S control.
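The selection rule described for Figure 115 (variants exceeding the WT R6342S control by more than 1.5-fold on both genomic insertion and ssDNA production) can be illustrated with a minimal Python sketch. The data structure, function name, and all numeric values below are hypothetical placeholders for illustration only; they are not data from Example 31.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One ncRNA variant with its two measured levels (hypothetical units)."""
    name: str
    insertion: float  # relative genomic insertion level (y-axis)
    ssdna: float      # ssDNA production level (x-axis)


def superior_variants(variants, wt, fold=1.5):
    """Return names of variants whose insertion AND ssDNA levels both
    exceed `fold` times the wild-type control."""
    if wt.insertion <= 0 or wt.ssdna <= 0:
        raise ValueError("wild-type control levels must be positive")
    return [
        v.name
        for v in variants
        if v.insertion / wt.insertion > fold and v.ssdna / wt.ssdna > fold
    ]


# Made-up example values; the control is normalized to 1.0 on both axes.
wt_control = Variant("WT_R6342S", insertion=1.0, ssdna=1.0)
library = [
    Variant("a1a2_stem_v1", insertion=2.0, ssdna=1.8),  # passes both thresholds
    Variant("msD_stem_v1", insertion=1.6, ssdna=1.2),   # fails on ssDNA
    Variant("msR_v1", insertion=1.5, ssdna=1.5),        # exactly 1.5x, not > 1.5x
]
hits = superior_variants(library, wt_control)  # -> ['a1a2_stem_v1']
```

The strict inequality mirrors the ">1.5-fold" wording; whether the original analysis treated the boundary as inclusive is not stated in this section.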

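The RNA mass ratios quoted for Figures 110 and 111 (for example, 5 micrograms total split 1:0.3:4:1 = Cas9:gRNA:ncRNA:RT) imply a per-component mass that is simple to compute. The sketch below shows that arithmetic in Python; the function name is a hypothetical helper, and the calculation assumes the stated ratio is a mass ratio over the stated total, as the text indicates.

```python
def split_rna_mass(total_ug, ratio):
    """Split a total RNA mass (micrograms) across components by mass ratio.

    `ratio` maps component name -> ratio part, e.g. {"Cas9": 1, "gRNA": 0.3}.
    """
    parts_sum = sum(ratio.values())
    if total_ug <= 0 or parts_sum <= 0:
        raise ValueError("total mass and ratio parts must be positive")
    return {name: total_ug * part / parts_sum for name, part in ratio.items()}


# HSPC condition (Figure 110): 5 ug total, 1:0.3:4:1 = Cas9:gRNA:ncRNA:RT
hspc = split_rna_mass(5.0, {"Cas9": 1, "gRNA": 0.3, "ncRNA": 4, "RT": 1})

# T cell condition (Figure 111): 3 ug total, 1:0.3:3:1 = Cas9:gRNA:ncRNA:RT
tcell = split_rna_mass(3.0, {"Cas9": 1, "gRNA": 0.3, "ncRNA": 3, "RT": 1})

for label, masses in (("HSPC", hspc), ("T cell", tcell)):
    print(label, {name: round(ug, 3) for name, ug in masses.items()})
```

In both conditions the ncRNA is the largest component by mass, at four and three times the Cas9 mRNA mass respectively.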
TW202426060A_112132199_SEQL.xml

Claims (48)

1. A gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo, the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) a retron reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the nucleic acid programmable nuclease, each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c), whereby (a)(i), (a)(ii), (b), and (c) are delivered by one delivery vehicle or by more than one delivery vehicle.
2. The gene editing system of claim 1, wherein the engineered retron ncRNA comprises an HDR nucleotide sequence substituted into a retron ncRNA; wherein the retron reverse transcriptase has an amino acid sequence having at least 90% sequence identity to a retron reverse transcriptase of Table A; and wherein the retron ncRNA has about 85% to 98% sequence identity to a retron ncRNA of Table B.
3. The gene editing system of claim 2, wherein the retron ncRNA and the retron reverse transcriptase are from the same clade.
4. The gene editing system of claim 2, wherein the retron ncRNA nucleotide sequence has about 85% to 98% sequence identity to SEQ ID NO: 15327, and the retron reverse transcriptase has at least 90% sequence identity to a type I-C retron reverse transcriptase.
5. The gene editing system of claim 4, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO: 1262.
6. The gene editing system of claim 2, wherein the retron ncRNA nucleotide sequence has about 85% to 98% sequence identity to SEQ ID NO: 16411, and the retron reverse transcriptase has at least 90% sequence identity to a type III retron reverse transcriptase.
7. The gene editing system of claim 6, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO: 2781.
8. The gene editing system of claim 2, wherein the retron ncRNA nucleotide sequence has about 55% to 90% sequence identity to SEQ ID NO: 18731, and the retron reverse transcriptase has at least 90% sequence identity to a type XIII retron reverse transcriptase.
9. The gene editing system of claim 8, wherein the engineered retron ncRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 19927 and an HDR template inserted therein.
10. The gene editing system of claim 8, wherein the engineered retron ncRNA comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 19928 and an HDR template inserted therein.
11. The gene editing system of claim 8, wherein the retron reverse transcriptase comprises an amino acid sequence that is at least about 90% identical to SEQ ID NO: 6342.
12. The gene editing system of claim 1, wherein the retron reverse transcriptase comprises at least one amino acid substitution that increases processivity and/or fidelity.
13. The gene editing system of claim 12, wherein the retron reverse transcriptase comprises an amino acid substitution at an amino acid residue corresponding to one of the following amino acid residues in Eco1 RT: Q190, E302, or T306.
14. The gene editing system of claim 12, wherein the retron reverse transcriptase comprises an amino acid substitution at an amino acid residue corresponding to one of the following amino acid substitutions in Eco1 RT: Q190F, E302R, or T306K.
15. A gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo, the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) an engineered retron reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease, each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c), whereby (a)(i), (a)(ii), (b), and (c) are delivered by one delivery vehicle or by more than one delivery vehicle, and wherein the engineered retron reverse transcriptase comprises a processivity-enhancing domain or a fidelity-enhancing domain.
16. The gene editing system of claim 15, wherein the processivity-enhancing domain comprises Sso7d or Sac7d.
17. The gene editing system of claim 15, wherein the fidelity-enhancing domain comprises a 3' to 5' exonuclease domain.
18. The gene editing system of claim 17, wherein the exonuclease domain comprises POLE1, POLD1, POLG, Pfu, or KOD.
19. The gene editing system of claim 15, wherein the engineered retron ncRNA comprises an HDR nucleotide sequence substituted into a retron ncRNA; wherein the retron reverse transcriptase has an amino acid sequence having at least 90% sequence identity to a retron reverse transcriptase of Table A; and wherein the retron ncRNA has about 85% to 98% sequence identity to a retron ncRNA of Table B.
20. The gene editing system of claim 15, wherein the retron ncRNA and the retron reverse transcriptase are from the same clade.
21. A gene editing system comprising one or more delivery vehicles, wherein: the delivery vehicle(s) comprise an RNA cargo, the RNA cargo comprises (a) at least one mRNA molecule encoding (i) a nucleic acid programmable nuclease and (ii) an engineered reverse transcriptase, (b) an engineered retron ncRNA, and (c) a guide RNA for the programmable nuclease, each delivery vehicle contains (a)(i) and/or (a)(ii) and/or (b) and/or (c), whereby (a)(i), (a)(ii), (b), and (c) are delivered by one delivery vehicle or by more than one delivery vehicle, and wherein the engineered reverse transcriptase comprises a Y region domain from the retron RT corresponding to the engineered retron ncRNA.
22. The gene editing system of claim 21, wherein the engineered reverse transcriptase is a chimera comprising MMLV RT fused to the Y region of the retron RT.
23. The gene editing system of claim 1, wherein (a)(i) and (a)(ii) comprise a single mRNA molecule encoding the nucleic acid programmable nuclease and the retron reverse transcriptase.
24. The gene editing system of claim 23, wherein (a)(i) and (a)(ii) are encoded and expressed as a fusion protein.
25. The gene editing system of claim 24, wherein the fusion protein comprises the C-terminus of the nucleic acid programmable nuclease fused to the N-terminus of the retron reverse transcriptase (nuclease:RT fusion).
26. The gene editing system of claim 24, wherein the fusion protein comprises the N-terminus of the nucleic acid programmable nuclease fused to the C-terminus of the retron reverse transcriptase (RT:nuclease fusion).
27. The gene editing system of claim 1, wherein (a)(i) and (a)(ii) comprise a first mRNA molecule encoding the nucleic acid programmable nuclease and a second mRNA molecule encoding the retron reverse transcriptase.
28. The gene editing system of claim 1, wherein (c) is provided separately from, or in trans to, (a)(i), (a)(ii), and (b).
29. The gene editing system of claim 1, wherein (b) the engineered retron ncRNA and (c) the guide RNA are fused or provided in cis.
30. The gene editing system of claim 29, wherein the guide RNA is fused to the 5' end of the retron ncRNA.
31. The gene editing system of claim 29, wherein the guide RNA is fused to the 3' end of the retron ncRNA.
32. The gene editing system of claim 29, wherein the engineered ncRNA comprises a first guide RNA fused to the 5' end of the retron ncRNA and a second guide RNA fused to the 3' end of the retron ncRNA, and the first and second guide RNAs target different sequences.
33. The gene editing system of claim 1, wherein the one or more delivery vehicles comprise liposomes or lipid nanoparticles (LNPs).
34. The gene editing system of claim 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in the same delivery vehicle.
35. The gene editing system of claim 1, wherein (a) the at least one mRNA molecule encoding (i) the nucleic acid programmable nuclease and (ii) the retron reverse transcriptase and (b) the engineered retron ncRNA are in separate delivery vehicles.
36. The gene editing system of claim 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in the same delivery vehicle.
37. The gene editing system of claim 1, wherein the nucleic acid programmable nuclease and the retron reverse transcriptase are encoded on separate mRNA molecules, and those separate mRNA molecules of (a)(i) and (a)(ii) are contained in different delivery vehicles.
38. The gene editing system of claim 1, wherein the engineered retron ncRNA includes a sequence of interest encoding a donor polynucleotide, the donor polynucleotide comprising an intended edit to be integrated at a target sequence in a cell, and wherein the donor polynucleotide is flanked by a 5' homology arm that hybridizes to a sequence 5' of the target sequence and a 3' homology arm that hybridizes to a sequence 3' of the target sequence.
39. The gene editing system of claim 1, wherein the nucleic acid programmable nuclease comprises a Cas9 nuclease, a TnpB nuclease, or a Cas12a nuclease.
40. The gene editing system of claim 1, wherein the nucleic acid programmable nuclease comprises a Cas9 nuclease.
41. The gene editing system of claim 1, wherein the nucleic acid programmable nuclease comprises a Cas9 nickase.
42. An isolated cell comprising the gene editing system of claim 1.
43. The isolated cell of claim 42, wherein the isolated cell is a mammalian cell.
44. The isolated cell of claim 43, wherein the mammalian cell is a human cell.
45. A composition comprising: a) the gene editing system of claim 1; and b) a pharmaceutically or veterinarily acceptable carrier.
46. The composition of claim 45, wherein the delivery vehicle is a lipid nanoparticle comprising: a) one or more ionizable lipids; b) one or more structural lipids; c) one or more PEGylated lipids; and d) one or more phospholipids.
47. The composition of claim 46, wherein the one or more ionizable lipids comprise an ionizable lipid set forth in Table 2.
48. A method of genetically modifying a cell, the method comprising: contacting the cell with the gene editing system of claim 1, thereby delivering the RNA cargo to the cell, wherein: the nucleic acid programmable nuclease forms a complex with the guide RNA, wherein the guide RNA directs the complex to the target sequence, the nucleic acid programmable nuclease generates a double-strand break in the target sequence, the retron reverse transcriptase and the engineered retron ncRNA generate RT DNA comprising the donor polynucleotide, and the donor polynucleotide is integrated at the target sequence, whereby the cell is genetically modified.
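The packaging condition recited in the independent system claims (each delivery vehicle carries some of (a)(i), (a)(ii), (b), and (c), and the vehicles together deliver all four) can be read informally as a coverage check over sets. The Python sketch below illustrates that informal reading only; it is not a legal construction of the claims, and the labels and function name are hypothetical.

```python
# Hypothetical labels for the four cargo components named in the claims.
REQUIRED = frozenset({"a_i", "a_ii", "b", "c"})


def delivers_full_cargo(vehicles):
    """Return True if the vehicles, taken together, carry every component.

    `vehicles` is an iterable of sets; each set lists the component labels
    carried by one delivery vehicle.
    """
    vehicles = [frozenset(v) for v in vehicles]
    if not vehicles:
        return False
    for contents in vehicles:
        # Each vehicle must carry at least one component and nothing unknown.
        if not contents or not contents <= REQUIRED:
            raise ValueError(f"invalid vehicle contents: {sorted(contents)}")
    return frozenset().union(*vehicles) == REQUIRED


# One vehicle carrying everything, as in the "same delivery vehicle" variants.
single = delivers_full_cargo([{"a_i", "a_ii", "b", "c"}])

# mRNAs in one vehicle, ncRNA and guide RNA in separate vehicles.
split = delivers_full_cargo([{"a_i", "a_ii"}, {"b"}, {"c"}])

# Missing the RT mRNA: not a complete system under this reading.
incomplete = delivers_full_cargo([{"a_i"}, {"b", "c"}])
```

Dependent claims such as 34 to 37 then correspond to particular partitions of the same four labels across vehicles.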
TW112132199A 2022-08-25 2023-08-25 Engineered retrons and methods of use TW202426060A (en)

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
US202263373545P 2022-08-25 2022-08-25
US63/373,545 2022-08-25
US202263476900P 2022-12-22 2022-12-22
US18/087,673 US11866728B2 (en) 2022-01-21 2022-12-22 Engineered retrons and methods of use
US63/476,900 2022-12-22
US18/087,673 2022-12-22
WOPCT/US23/61038 2023-01-20
PCT/US2023/061038 WO2023141602A2 (en) 2022-01-21 2023-01-20 Engineered retrons and methods of use
US202363488317P 2023-03-03 2023-03-03
US63/488,317 2023-03-03
US202363491603P 2023-03-22 2023-03-22
US63/491,603 2023-03-22
US202363515783P 2023-07-26 2023-07-26
US63/515,783 2023-07-26
WOPCT/US23/72872 2023-08-24
PCT/US2023/072872 WO2024044723A1 (en) 2022-08-25 2023-08-24 Engineered retrons and methods of use

Publications (1)

Publication Number Publication Date
TW202426060A true TW202426060A (en) 2024-07-01

Family

ID=92929087

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112132199A TW202426060A (en) 2022-08-25 2023-08-25 Engineered retrons and methods of use

Country Status (1)

Country Link
TW (1) TW202426060A (en)
