TW202008302A - DNA-based data access by converting the input data into a set of nucleotide sequences and synthesizing a set of nucleic acids including the set of nucleotide sequences - Google Patents

Publication number
TW202008302A
TW202008302A TW107127162A TW107127162A TW202008302A TW 202008302 A TW202008302 A TW 202008302A TW 107127162 A TW107127162 A TW 107127162A TW 107127162 A TW107127162 A TW 107127162A TW 202008302 A TW202008302 A TW 202008302A
Authority
TW
Taiwan
Application number
TW107127162A
Other languages
Chinese (zh)
Other versions
TWI770247B (en)
Inventor
樊隆
Original Assignee
大陸商南京金斯瑞生物科技有限公司
Application filed by 大陸商南京金斯瑞生物科技有限公司
Priority to TW107127162A
Publication of TW202008302A
Application granted
Publication of TWI770247B

Abstract

The present invention relates to DNA-based data storage. The present invention provides an exemplary method for storing the input data on a nucleic acid including: converting the input data into a set of nucleotide sequences and synthesizing a set of nucleic acids including the set of nucleotide sequences. The aforementioned conversion includes a data processing step and a nucleotide encoding step. The aforementioned data processing step includes converting the aforementioned input data into a binary string. The aforementioned nucleotide encoding step includes converting a binary string by using a 5-bit transcoding frame to obtain the aforementioned set of nucleotide sequences.

Description

DNA-based data access

The present invention relates generally to data storage and retrieval and, more specifically, to techniques for reliable and efficient DNA-based data storage and retrieval.

The use of DNA as a medium for data storage and retrieval dates back to 1988, when Joe Davis and his collaborators created a synthetic DNA called "Microvenus" that encoded an icon and incorporated it into E. coli cells. Compared with traditional storage media such as magnetic tape and hard drives, DNA-based storage offers higher density (for example, ~1 mm³ for storing 1 EB of data), a longer storage lifetime (for example, more than 1 million years at -18°C), and lower maintenance cost. DNA storage is a forward-looking research field built on oligonucleotide synthesis for generating the DNA storage medium (in particular high-throughput synthesis platforms such as CustomArray) and on sequencing for information retrieval (in particular next-generation sequencing (NGS), such as Illumina HiSeq 2500 and MiSeq).

At present, however, DNA-based data storage has many limitations. For example, the production cost of DNA synthesis is quite high, and data retrieval can be slow because it depends on sequencing. DNA-based storage has therefore been considered better suited to large-scale archival storage, which involves relatively few reads and writes of the storage medium. Further, errors can be introduced at each stage of the process (for example encoding, writing, storage, decoding, reading, and retrieval), jeopardizing the input and output of the data flow. Exemplary errors include mutations, deletions, insertions, and loss of DNA fragments arising during synthesis and sequencing, as well as denaturation after long-term storage. In addition, when DNA is used to store large amounts of data, achieving random access to a portion of the data, rather than retrieving all of it, can be challenging.

The present invention relates to techniques for reliable and efficient DNA-based data storage and retrieval. Specifically, the present invention provides accurate, efficient, and reliable methods for storing input data on nucleic acids (for example, deoxyribonucleic acid ("DNA")). In particular, the present invention uses a novel 5-bit transcoding framework to convert one or more data files into nucleic acid sequences (for example, DNA sequences). The present invention further provides an integrated process comprising a compression algorithm, an error correction algorithm, and the transcoding framework for efficient and reliable data storage and retrieval. In addition, the present invention allows random data access, which is particularly beneficial when large-scale data are stored together but only part of the information needs to be viewed at a given time. Data that can be stored according to the methods of the present invention include any type of data that can be represented digitally (that is, as binary data), including, for example, text files, high-definition video, images, and/or audio.

In some embodiments, a method for storing input data on nucleic acids is provided, the method comprising: a) converting the input data into a set of nucleotide sequences, wherein the conversion comprises i) a data processing step comprising converting the input data into a binary string, and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, the data processing step comprises dividing the binary string into a sequence of non-overlapping 5-bit binary strings.

In some embodiments, the nucleotide encoding step comprises converting each 5-bit binary string into an integer in the range 0 to 31 to obtain an integer string.
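
The two steps above (splitting the binary string into non-overlapping 5-bit strings and mapping each to an integer from 0 to 31) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `bytes_to_quintets` is hypothetical, and zero-padding the tail to a multiple of 5 bits is an assumption, since the text does not say how a trailing partial group is handled.

```python
def bytes_to_quintets(data: bytes) -> list[int]:
    """Convert bytes to integers in 0..31 via non-overlapping 5-bit groups."""
    # Data processing step: express the input as one binary string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Assumption: pad the tail with zeros so the length is a multiple of 5.
    bits += "0" * (-len(bits) % 5)
    # Nucleotide encoding step (first part): each 5-bit group becomes an integer.
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]


print(bytes_to_quintets(b"Hi"))  # [9, 1, 20, 16]
```

Five bits give exactly 32 symbols, which is why every resulting integer falls in the range 0 to 31.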

In some embodiments, the nucleotide encoding step further comprises converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, the nucleotide encoding step further comprises dividing the integer string into a plurality of initial integer subsequences of a predetermined length.

In some embodiments, the length of each of the plurality of initial integer subsequences is determined based on the oligomer length of the selected synthesis platform, the required error tolerance, the size of the input data, the selected error correction code, or a combination thereof.

In some embodiments, the nucleotide encoding step further comprises adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, the index information added to each of the plurality of initial integer subsequences comprises an integer sequence, wherein the length of the integer sequence is based on the size of the input data.
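
The splitting and indexing steps can be sketched as below. The details are assumptions made for illustration only: the index is written as base-32 digits placed in front of each subsequence, its width is the smallest number of digits that can count every subsequence, and a short final subsequence is padded with zeros. The name `split_and_index` is hypothetical.

```python
def split_and_index(values: list[int], chunk_len: int) -> tuple[list[list[int]], int]:
    """Split an integer string into fixed-length subsequences and prepend an index."""
    chunks = [values[i:i + chunk_len] for i in range(0, len(values), chunk_len)]
    # Assumption: pad the last subsequence with zeros to the predetermined length.
    if chunks and len(chunks[-1]) < chunk_len:
        chunks[-1] = chunks[-1] + [0] * (chunk_len - len(chunks[-1]))

    # Index width grows with the input size: enough base-32 digits for all chunks.
    width = 1
    while 32 ** width < len(chunks):
        width += 1

    indexed = []
    for position, chunk in enumerate(chunks):
        digits = []
        remaining = position
        for _ in range(width):
            digits.append(remaining % 32)
            remaining //= 32
        indexed.append(digits[::-1] + chunk)  # most significant digit first
    return indexed, width


indexed, width = split_and_index(list(range(10)), chunk_len=4)
print(width, indexed)  # 1 [[0, 0, 1, 2, 3], [1, 4, 5, 6, 7], [2, 8, 9, 0, 0]]
```

Because the index digits are themselves integers from 0 to 31, they can pass through the same transcoding step as the payload.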

In some embodiments, the nucleotide encoding step comprises, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy.

In some embodiments, adding redundant data to the plurality of indexed integer subsequences comprises: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining the plurality of integer subsequences with redundancy based on the filled matrix.
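
The matrix layout described above can be sketched as follows. This shows only the data structure: the error correction coding itself is deliberately left out, and the cells it would fill are marked `None`. Placing each indexed subsequence in its own column, and the names `build_redundancy_matrix` and `read_columns`, are assumptions made for this illustration.

```python
def build_redundancy_matrix(indexed, extra_rows, extra_cols):
    """Lay indexed subsequences out as columns; redundancy cells stay None."""
    if extra_rows < 1 or extra_cols < 1:
        raise ValueError("the matrix must exceed the data in both dimensions")
    rows = len(indexed[0]) + extra_rows   # more rows than integers per subsequence
    cols = len(indexed) + extra_cols      # more columns than subsequences
    matrix = [[None] * cols for _ in range(rows)]
    for col, seq in enumerate(indexed):
        for row, value in enumerate(seq):
            matrix[row][col] = value
    return matrix


def read_columns(matrix):
    """Read the filled matrix back out as one subsequence per column."""
    return [list(column) for column in zip(*matrix)]


m = build_redundancy_matrix([[0, 9, 1], [1, 20, 16]], extra_rows=2, extra_cols=1)
print(len(m), len(m[0]))   # 5 3
print(read_columns(m)[0])  # [0, 9, 1, None, None]
```

In a complete encoder, the `None` cells would be replaced by symbols produced by the chosen error correction code before the columns are read out.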

In some embodiments, the number of columns of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the number of rows of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the error correction coding is Reed-Solomon ("RS") coding.

In some embodiments, the data generated by applying the error correction coding is generated by applying RS-coded string correction and/or RS-coded block correction.

In some embodiments, the 5-bit transcoding framework is based on Table 2.

In some embodiments, the selection of R and Y is based on: 1) being different from the nucleotide immediately preceding the R or Y; and/or 2) the estimated GC content of the nucleotide sequence.
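
The selection rule for R and Y can be sketched as below. Table 2 is not reproduced in this section, so the sketch rests on a labeled assumption: R and Y are taken to follow the standard IUPAC ambiguity codes (R = A or G, Y = C or T). The function name `resolve` and the 50% GC target used to break ties are likewise hypothetical.

```python
# Assumption: IUPAC ambiguity codes, R = purine (A/G), Y = pyrimidine (C/T).
CANDIDATES = {"R": ("A", "G"), "Y": ("C", "T")}


def resolve(symbol: str, prefix: str) -> str:
    """Pick a concrete base for R or Y given the sequence built so far."""
    # Rule 1: differ from the immediately preceding nucleotide.
    options = [b for b in CANDIDATES[symbol] if not prefix or b != prefix[-1]]
    if len(options) == 1:
        return options[0]
    # Rule 2: steer the running GC content toward the middle of the range.
    gc = sum(b in "GC" for b in prefix) / len(prefix) if prefix else 0.5
    want_gc = gc < 0.5
    for base in options:
        if (base in "GC") == want_gc:
            return base
    return options[0]


print(resolve("R", "CCG"))  # A  (G would repeat the previous base)
print(resolve("Y", "AAA"))  # C  (raises a low GC content)
```

Each candidate pair contains one G/C base and one A/T base, so the second rule always has a choice that moves the GC content in the desired direction.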

In some embodiments, the input data corresponds to a compressed file. In some embodiments, the input data corresponds to two or more files.

In some embodiments, the input data corresponds to a text file.

In some embodiments, the data processing step further comprises compressing the input data to obtain a compressed file and converting the compressed file into the binary string.

In some embodiments, the compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

In some embodiments, the data processing step further comprises grouping two or more files into a TAR file.

In some embodiments, the TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

In some embodiments, the nucleotide encoding step further comprises appending a primer sequence pair to the 5' and 3' ends of each nucleotide sequence of the set of nucleotide sequences.

In some embodiments, a primer pair is attached to the set of synthesized nucleic acids.

In some embodiments, a method for storing two or more sets of input data on nucleic acids is provided, the method comprising: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively, according to any one of the methods described herein; b) appending a primer sequence pair to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, respectively, wherein the primer pairs of the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing two or more sets of nucleic acids comprising the two or more corresponding sets of nucleotide sequences, respectively.

In some embodiments, each primer pair has a sequence that differs from any of the two or more corresponding sets of nucleotide sequences and from their complementary sequences.
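
Appending a primer pair, and checking that neither primer occurs in the payload sequences or their complements, can be sketched as follows. The function names are hypothetical, the complement check is done against the reverse complement, and the primers are treated simply as the literal sequences placed at each end; the example sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def primer_is_unique(primer: str, payloads: list[str]) -> bool:
    """True if the primer occurs in no payload and in no payload's reverse complement."""
    return all(
        primer not in p and primer not in reverse_complement(p) for p in payloads
    )


def add_primers(payloads: list[str], five_prime: str, three_prime: str) -> list[str]:
    """Append the same primer pair to the 5' and 3' ends of every sequence in a set."""
    for primer in (five_prime, three_prime):
        if not primer_is_unique(primer, payloads):
            raise ValueError(f"primer {primer} collides with a payload sequence")
    return [five_prime + p + three_prime for p in payloads]


print(add_primers(["ACGTAC"], "GGATCC", "CTCGAG"))  # ['GGATCCACGTACCTCGAG']
```

Giving each set of sequences its own primer pair is what later lets one set be amplified selectively out of a mixture.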

In some embodiments, the GC content of the set of synthesized nucleic acids ranges from 30% to 70%. In some embodiments, the GC content of the set of synthesized nucleic acids is less than about 70%.
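
A GC-content check against the 30% to 70% range stated above could look like the following minimal sketch; the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def within_gc_window(seq: str, low: float = 0.30, high: float = 0.70) -> bool:
    """True if the GC content lies in the inclusive range [low, high]."""
    return low <= gc_content(seq) <= high


print(gc_content("ACGT"))        # 0.5
print(within_gc_window("GGGC"))  # False
```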

In some embodiments, the set of synthesized nucleic acids is stored. In some embodiments, the set of synthesized nucleic acids is stored dry. In some embodiments, the set of synthesized nucleic acids is stored by lyophilization.

In some embodiments, the set of synthesized nucleic acids is immobilized on a support. In some embodiments, the support is a microarray.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the conversion comprises: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data, thereby obtaining the output data.

In some embodiments, the set of nucleic acids is amplified before the output data is retrieved.

In some embodiments, the set of nucleic acids is sequenced to generate a plurality of sequence reads.

In some embodiments, the plurality of sequence reads are paired, merged, and filtered to obtain the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data.

In some embodiments, the nucleotide decoding step comprises converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range 0 to 31.

In some embodiments, the nucleotide decoding step further comprises applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, applying the error correction coding comprises: i) applying RS-coded string correction to the plurality of integer subsequences to obtain a plurality of consensus integer subsequences; and ii) applying RS-coded block correction to the plurality of consensus integer subsequences to obtain the plurality of indexed integer subsequences.

In some embodiments, the nucleotide decoding step further comprises removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

In some embodiments, the nucleotide decoding step further comprises merging the core integer subsequences into an integer string.
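
Removing the index and merging the core subsequences can be sketched as below. The sketch assumes the index occupies the first `index_width` positions of each subsequence as base-32 digits, most significant first, and that subsequences may arrive in any order; neither detail is fixed by the text, and the name `merge_indexed` is hypothetical.

```python
def merge_indexed(indexed: list[list[int]], index_width: int) -> list[int]:
    """Order subsequences by their index, strip the index, and concatenate."""

    def index_value(seq: list[int]) -> int:
        value = 0
        for digit in seq[:index_width]:
            value = value * 32 + digit
        return value

    merged: list[int] = []
    for seq in sorted(indexed, key=index_value):
        merged.extend(seq[index_width:])  # keep only the core subsequence
    return merged


shuffled = [[1, 4, 5], [0, 2, 3], [2, 6, 7]]
print(merge_indexed(shuffled, index_width=1))  # [2, 3, 4, 5, 6, 7]
```

Sorting by index matters because sequencing returns reads in no particular order.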

In some embodiments, the nucleotide decoding step further comprises converting the integer string into a binary string.
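
Converting the integer string back into a binary string, and from there into bytes, can be sketched as follows. Dropping any trailing bits that do not fill a whole byte is an assumption that mirrors zero-padding on the encoding side; the name `quintets_to_bytes` is hypothetical.

```python
def quintets_to_bytes(values: list[int]) -> bytes:
    """Convert integers in 0..31 back to bytes via their 5-bit binary strings."""
    for value in values:
        if not 0 <= value <= 31:
            raise ValueError(f"{value} is outside the 5-bit range 0..31")
    bits = "".join(f"{value:05b}" for value in values)
    # Assumption: trailing bits that do not complete a byte are padding.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(quintets_to_bytes([9, 1, 20, 16]))  # b'Hi'
```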

In some embodiments, the output data is stored in a compressed file. In some embodiments, the data processing step further comprises decompressing the compressed file. In some embodiments, the decompression is performed using the LZMA algorithm.

In some embodiments, the output data corresponds to a plurality of files. In some embodiments, the plurality of files are extracted from the output data using the TAR algorithm.
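
The decompression and extraction steps can be sketched with Python's standard `lzma` and `tarfile` modules. This is one possible realization rather than the applicant's tooling; the function name `unpack_archive` is hypothetical, and the archive is handled entirely in memory.

```python
import io
import lzma
import tarfile


def unpack_archive(blob: bytes) -> dict[str, bytes]:
    """Decompress LZMA data and return the files of the contained TAR archive."""
    # lzma.decompress auto-detects the .xz and legacy .lzma containers.
    tar_bytes = lzma.decompress(blob)
    files: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

Given the bytes recovered from the decoded binary string, `unpack_archive` returns a mapping from file name to file content.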

In some embodiments, the 5-bit transcoding framework is based on Table 2.

In some embodiments, the set of nucleic acids comprises primer sequences at the 5' and 3' ends, and the method comprises removing the primer sequences before the nucleotide decoding step.

In some embodiments, a method for retrieving output data stored on a set of nucleic acids of interest is provided, wherein the set of nucleic acids of interest is one of multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at its 3' and 5' ends, the method comprising: a) amplifying the set of nucleic acids using the primer pair corresponding to the nucleic acids of interest; b) obtaining a set of nucleotide sequences of the amplified nucleic acids; and c) converting the set of nucleotide sequences into the output data according to the method of any of the embodiments above, thereby obtaining the output data.
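
The selection logic behind this random access can be illustrated in software. The sketch below is only an in-silico stand-in for primer-specific amplification: it keeps the sequences flanked by a chosen primer pair and trims the primers off. The function name and the example sequences are hypothetical.

```python
def select_by_primers(pool: list[str], five_prime: str, three_prime: str) -> list[str]:
    """Keep sequences flanked by the given primer pair and strip the primers."""
    selected = []
    minimum = len(five_prime) + len(three_prime)
    for seq in pool:
        if len(seq) >= minimum and seq.startswith(five_prime) and seq.endswith(three_prime):
            selected.append(seq[len(five_prime):len(seq) - len(three_prime)])
    return selected


# Two data sets mixed in one pool, each tagged with its own primer pair.
pool = ["GGATCC" + "ACGTAC" + "CTCGAG", "TTGACA" + "CATGCA" + "AAGCTT"]
print(select_by_primers(pool, "GGATCC", "CTCGAG"))  # ['ACGTAC']
```

Only the sequences belonging to the requested data set reach the decoding step, so the rest of the mixture never has to be decoded.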

In some embodiments, a method for retrieving two or more corresponding sets of output data stored on two or more sets of nucleic acids of interest is provided, wherein the two or more sets of nucleic acids of interest are among multiple nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at its 3' and 5' ends, the method comprising: a) amplifying (for example, separately or together) the two or more sets of nucleic acids of interest using the primer pairs corresponding to the two or more sets of nucleic acids of interest; b) obtaining two or more sets of nucleotide sequences of the amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into two or more sets of output data, respectively, according to any one of the methods described herein, thereby obtaining the two or more sets of output data.

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device, cause the electronic device to perform any one of the methods described herein.

The present invention further provides a system for providing nucleic acid-based data storage or retrieving data from nucleic acids, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods described herein.

The present invention further provides an electronic device for providing nucleic acid-based data storage or retrieving data from nucleic acids, the device comprising means for performing any one of the methods described herein.

The present invention provides accurate, efficient, and reliable methods for storing input data on nucleic acids (for example, deoxyribonucleic acid ("DNA")). Specifically, the present invention uses a novel 5-bit transcoding framework to convert one or more data files into nucleic acid sequences (for example, DNA sequences). This novel 5-bit transcoding framework allows efficient nucleic acid sequence design that strikes the right GC content, avoids certain homopolymers (for example, homopolymers 4 or more nucleotides in length), and reduces the error rate in nucleic acid synthesis and amplification. The present invention further provides an integrated process comprising a compression algorithm, an error correction algorithm, and the transcoding framework for efficient and reliable data storage and retrieval. The methods provided herein can be used to store data of any size, including large files. In addition, the present invention allows random data access, which is particularly beneficial when large-scale data are stored together but only part of the information needs to be viewed at a given time. Data that can be stored according to the methods of the present invention include any type of data that can be represented digitally (that is, as binary data), including, for example, text files, high-definition video, images, and/or audio.

FIG. 1 shows an exemplary process for providing DNA-based data storage and retrieval according to some embodiments.

FIG. 2 shows exemplary means for processing a compressed file for DNA-based data storage according to some embodiments.

FIG. 3A shows exemplary steps for adding an index and redundant data to digital content to be stored according to some embodiments.

FIG. 3B depicts exemplary steps for adding an index and redundant data to digital content to be stored according to some embodiments.

FIG. 3C depicts exemplary steps for adding an index and redundant data to digital content to be stored according to some embodiments.

FIG. 3D depicts exemplary steps for adding an index and redundant data to digital content to be stored according to some embodiments.

FIG. 4 shows exemplary means for processing a compressed file for DNA-based data storage according to some embodiments.

FIG. 5 shows an exemplary 5-bit transcoding framework according to some embodiments.

FIG. 6 shows an exemplary portion of text to be stored and retrieved according to some embodiments.

FIG. 7 shows an exemplary implementation of the DNA-based data storage and retrieval technique according to some embodiments.

FIG. 8 depicts an exemplary electronic device according to some embodiments.

FIG. 9A shows an exemplary process for providing DNA-based data storage according to some embodiments.

FIG. 9B shows an exemplary process for providing DNA-based data retrieval according to some embodiments.

The present invention provides accurate, efficient, and reliable methods for storing input data on nucleic acids (for example, deoxyribonucleic acid ("DNA")). Specifically, the present invention uses a novel 5-bit transcoding framework to convert one or more data files into nucleic acid sequences (for example, DNA sequences). This novel 5-bit transcoding framework allows efficient nucleic acid sequence design that strikes the right GC content, avoids certain homopolymers (for example, homopolymers 4 or more nucleotides in length), and reduces the error rate in nucleic acid synthesis and amplification. The present invention further provides an integrated process comprising a compression algorithm, an error correction algorithm, and the transcoding framework for efficient and reliable data storage and retrieval. The methods provided herein can be used to store data of any size, including large files. In addition, the present invention allows random data access, which is particularly beneficial when large-scale data are stored together but only part of the information needs to be viewed at a given time. Data that can be stored according to the methods of the present invention include any type of data that can be represented digitally (that is, as binary data), including, for example, text files, high-definition video, images, and/or audio.
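
The homopolymer constraint mentioned above (avoiding runs of 4 or more identical nucleotides) can be checked with a short sketch such as the following. The function names are hypothetical, and the check only detects such runs; how the transcoding framework avoids producing them is not shown here.

```python
import re


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated nucleotide."""
    runs = re.finditer(r"A+|C+|G+|T+", seq.upper())
    return max((len(run.group()) for run in runs), default=0)


def has_forbidden_homopolymer(seq: str, limit: int = 4) -> bool:
    """True if the sequence contains a run of `limit` or more identical bases."""
    return longest_homopolymer(seq) >= limit


print(longest_homopolymer("AACCCGT"))        # 3
print(has_forbidden_homopolymer("ACGGGGT"))  # True
```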

Accordingly, in one aspect the present invention provides a method for storing input data on a set of nucleic acids, and a method for converting input data into a set of nucleotide sequences. In another aspect, a method for retrieving output data stored on nucleic acids, and a method for converting a set of nucleotide sequences into output data, are provided. Further provided are a system and a non-transitory computer-readable storage medium storing one or more programs for performing any one or more steps of the methods described herein.

It is understood that the embodiments of the invention described herein include "consisting of" and/or "consisting essentially of" embodiments.

Reference herein to "about" a value or parameter includes (and describes) variations directed to that value or parameter per se. For example, a description referring to "about X" includes a description of "X".

As used herein, reference to "not" a value or parameter generally means and describes "other than" the value or parameter. For example, a statement that the method is not used to treat cancer of type X means that the method is used to treat cancers of types other than X.

As used herein and in the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise.

As used herein and in the appended claims, "a set" refers to one or more referents unless the context clearly dictates otherwise. A set of nucleic acids may be nucleic acids encoding the data of a single file or of a group of files compressed together. In some embodiments, the nucleic acids belonging to the same file may have the same primer pair appended to their 5' and 3' ends.

Methods of encoding data and data storage

In one aspect, the present invention provides a method (for example, a computer-implemented method) for converting input data into a set of nucleotide sequences. The method generally comprises a data processing step, which converts the input data into a binary string, and a nucleotide encoding step, which converts the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The method can be used to store input data on a set of nucleic acids, which involves first converting the input data into a set of nucleotide sequences and then synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

The input data can represent any number of files of any type, such as text files, image files, and video/audio files (for example, high-definition files). A file can be uncompressed or compressed. When a file is uncompressed, it can be compressed before being converted into a binary string. For example, the file can be compressed into an LZMA file (for example, A.lzma) using the Lempel-Ziv-Markov chain algorithm. In some embodiments, two or more files (for example, three, four, five, six, or more files) are first grouped together, for example into a TAR file (for example, A.tar), and the TAR file is further compressed into an LZMA file (for example, A.tar.lzma). In this way, the method allows multiple files (for example, 1-5, 5-10, 10-15, 15-25, 25-35, or 35-50) to be stored in a single nucleic acid composition.
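
The grouping and compression described above (A.tar, then A.tar.lzma, then a binary string) can be sketched with Python's standard library. This is one possible realization under stated assumptions: the function names are hypothetical, files are passed in memory as a name-to-bytes mapping, and `lzma.FORMAT_ALONE` is chosen because it produces the legacy .lzma container.

```python
import io
import lzma
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Group files into a TAR archive and compress it with LZMA."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    # FORMAT_ALONE writes the legacy .lzma container (as in A.tar.lzma).
    return lzma.compress(buffer.getvalue(), format=lzma.FORMAT_ALONE)


def to_binary_string(blob: bytes) -> str:
    """Express the compressed archive as a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in blob)


blob = pack_files({"chapter1.txt": b"first", "chapter2.txt": b"second"})
bits = to_binary_string(blob)
```

The resulting binary string is the input to the nucleotide encoding step.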

In some embodiments, to allow random access to locations within a single file, the file can be divided into multiple sets of data, each of which is compressed and processed as described below. For example, a digitized file corresponding to a book with 10 chapters can be divided into 10 files, each corresponding to a single chapter. The ten files are then compressed and processed so that any one chapter can be accessed independently.

The data processing step converts the input data into a binary string. The binary string can be converted directly into a set of nucleotide sequences, for example by following the 5-bit transcoding framework described herein. Alternatively, the binary string can be further converted into an integer string, which is then converted into a set of nucleotide sequences, for example by following the 5-bit transcoding framework. In some embodiments, the integer string is further subjected to error correction coding and/or other processing to generate a plurality of integer subsequences with redundancy, which are then converted into a set of nucleotide sequences, for example by following the 5-bit transcoding framework.

Thus, for example, in some embodiments, there is provided a method (for example, a computer-implemented method) for converting input data into a set of nucleotide sequences, wherein the conversion includes: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. In some embodiments, there is provided a method for storing input data on nucleic acids, the method comprising: a) converting the input data into a set of nucleotide sequences, wherein the conversion includes i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

In some embodiments, the data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings, each of which can be further converted into an integer in the range of 0 to 31 to obtain an integer string. The integer string can be converted directly into a set of nucleotide sequences, for example using the 5-bit transcoding framework. Alternatively, the integer string is subjected to further operations as described below.
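A minimal sketch of this step, assuming the input is a Python `bytes` object: the bits are cut into non-overlapping 5-bit groups and each group is read as an integer from 0 to 31. The function name `bytes_to_quints` is hypothetical, and padding the final group with zero bits (and returning the pad count) is an assumption, since the source does not say how a bit length that is not a multiple of 5 is handled.

```python
def bytes_to_quints(data: bytes) -> tuple[list[int], int]:
    """Convert bytes into a list of integers in 0..31 (one per 5 bits).

    Returns (integers, pad), where pad is the number of zero bits appended
    to complete the last 5-bit group (an assumption of this sketch).
    """
    bits = "".join(f"{byte:08b}" for byte in data)  # the binary string
    pad = (-len(bits)) % 5
    bits += "0" * pad
    ints = [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]
    return ints, pad
```

For example, the single byte `0xFF` gives the bit string `11111111`, which is padded to `1111111100` and read as the integers 31 and 28.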

Specifically, the integer string can be divided into a plurality of initial integer subsequences having a predetermined length. The predetermined length of the initial integer subsequences is calculated based on a number of factors, including the oligomer length of the synthesis platform, the selected error correction code, the required error tolerance, the synthesis error rate of the oligomers, and/or the total size of the encoded data, as discussed in detail below. For example, the integer string can be divided into a series of non-overlapping integer subsequences using a sliding window of fixed length (e.g., 22 integers). An index can then be added to each of the plurality of initial integer subsequences to generate a plurality of indexed integer subsequences. The index may contain several integers, also in the range of 0 to 31. The length of the index is flexible and depends on the yield of DNA synthesis and the size of the data.
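The windowing and indexing can be sketched as follows. The window length of 22 comes from the example above; everything else is an assumption made for illustration: the function name `split_and_index`, the index length of 2, writing the index as base-32 digits, placing it in front of the payload, and zero-padding the last window.

```python
def split_and_index(ints: list[int], payload_len: int = 22,
                    index_len: int = 2) -> list[list[int]]:
    """Cut an integer string into fixed-length windows and prepend an index.

    The index is the window number written as `index_len` base-32 digits,
    so every symbol stays in the range 0..31.
    """
    chunks = [ints[i:i + payload_len] for i in range(0, len(ints), payload_len)]
    if len(chunks) > 32 ** index_len:
        raise ValueError("index_len too small for this amount of data")
    indexed = []
    for number, chunk in enumerate(chunks):
        # Zero-pad the final window to full length (assumption of this sketch).
        chunk = chunk + [0] * (payload_len - len(chunk))
        index = [(number // 32 ** k) % 32 for k in reversed(range(index_len))]
        indexed.append(index + chunk)
    return indexed
```

With two index symbols this sketch can address up to 32 × 32 = 1024 subsequences; larger inputs would need a longer index, which is consistent with the index length depending on the data size.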

In some embodiments, redundant data is added to generate a plurality of integer subsequences with redundancy. For example, Reed-Solomon (RS) error correction coding is applied to the plurality of integer subsequences to generate a new series of integer subsequences with redundancy through RS-coded string correction and block correction. Redundancy refers to synthesizing excess oligomers to provide robustness to dropout. Redundancy in string correction helps correct transition and transversion errors in the oligomers. Redundancy in block correction enables correction of insertions, deletions, and complete loss of information.

In one exemplary embodiment, adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and data generated by applying error correction coding; and obtaining a plurality of integer subsequences with redundancy based on the filled matrix. The number of columns and/or rows of the empty matrix can be determined based on the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof. The error correction coding is Reed-Solomon ("RS") coding. In some embodiments, the data generated by applying the error correction coding is generated by applying RS-coded string correction and RS-coded block correction.
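One way to picture the filled matrix is the sketch below, which treats each integer 0-31 as a symbol of the finite field GF(32) and appends systematic Reed-Solomon parity in two directions. The source does not give the field, the generator polynomial, the parity counts, or the matrix orientation, so all of these are assumptions: the primitive polynomial x^5 + x^2 + 1, two parity symbols per direction, and the hypothetical names `rs_encode` and `add_redundancy`. A GF(32) codeword holds at most 31 symbols, which limits this toy to small blocks.

```python
# GF(2^5) log/antilog tables for the primitive polynomial x^5 + x^2 + 1.
PRIM_POLY = 0b100101
EXP = [0] * 62
LOG = [0] * 32
value = 1
for power in range(31):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0b100000:
        value ^= PRIM_POLY
for power in range(31, 62):
    EXP[power] = EXP[power - 31]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(32) elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(nsym: int) -> list[int]:
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= coef                      # coef * x
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # coef * a^i
        g = nxt
    return g


def rs_encode(msg: list[int], nsym: int) -> list[int]:
    """Systematic RS encoding: return msg followed by nsym parity symbols."""
    if len(msg) + nsym > 31:
        raise ValueError("GF(32) codewords hold at most 31 symbols")
    gen = generator_poly(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):  # polynomial long division by gen
        coef = rem[i]
        if coef:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + rem[len(msg):]


def poly_eval(poly: list[int], x: int) -> int:
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    y = 0
    for coef in poly:
        y = gf_mul(y, x) ^ coef
    return y


def add_redundancy(subseqs: list[list[int]], string_nsym: int = 2,
                   block_nsym: int = 2) -> list[list[int]]:
    """Fill a larger matrix with the subsequences plus RS parity.

    Block parity adds whole extra subsequences (computed position by position
    across the block); string parity then extends every subsequence.
    """
    length = len(subseqs[0])
    extra = [[0] * length for _ in range(block_nsym)]
    for pos in range(length):
        codeword = rs_encode([s[pos] for s in subseqs], block_nsym)
        for k in range(block_nsym):
            extra[k][pos] = codeword[len(subseqs) + k]
    return [rs_encode(s, string_nsym) for s in subseqs + extra]
```

A valid codeword evaluates to zero at the first `nsym` powers of the field generator, which is what a decoder would check before attempting correction; the decoding side is not shown here.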

In some embodiments, the nucleotide encoding step further includes appending a pair of primer sequences to the 5' and 3' ends of the set of nucleotide sequences. The primers can be used to amplify the set of nucleic acids, for example by PCR amplification. In some embodiments, the primer sequences are added to the set of nucleotide sequences before synthesis. Alternatively, the primers can be attached to the synthesized nucleic acids, for example by ligation.

The method described above can be used to store two or more sets of input data on nucleic acids. Specifically, the method includes: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, wherein the primers for each of the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing sets of nucleic acids comprising the two or more corresponding sets of nucleotide sequences, respectively. Each primer pair can have a sequence that differs from any of the two or more corresponding nucleotide sequences or their complementary sequences.

The synthesized nucleic acids can have a GC content of about 30% to about 70%. For example, the synthesized nucleic acids can have a GC content of any of about 40% to about 60%, about 30% to about 40%, about 40% to about 50%, about 50% to about 60%, or about 60% to about 70%. In some embodiments, the synthesized nucleic acids have no homopolymer longer than 3 nucleotides (for example, no homopolymer of 4, 5, 6, 7, 8, 9, or 10 nucleotides). In some embodiments, the nucleic acids are oligonucleotides, for example oligonucleotides of about any of 50, 150, 200, 300, or 400 nucleotides in length. In some embodiments, the set of nucleic acids includes about any of 1, 2, 3, 5, 10, 15, or more oligonucleotides.
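These two sequence constraints are easy to check in software before synthesis. The sketch below assumes sequences are plain strings over A, T, G, and C; the function names and default thresholds (30-70% GC, runs of at most 3) are illustrative choices taken from the ranges quoted above, not a prescribed acceptance test.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def passes_constraints(seq: str, gc_min: float = 0.30, gc_max: float = 0.70,
                       max_run: int = 3) -> bool:
    """True if seq meets both the GC-content and homopolymer limits."""
    return (gc_min <= gc_content(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)
```

For instance, `passes_constraints("AAAAGCGC")` is false because of the run of four A bases, even though the GC content is exactly 50%.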

In some embodiments, the method further includes storing the set of synthesized nucleic acids. In some embodiments, the set of nucleic acids is stored after drying, for example by lyophilization. The set of nucleic acids can be stored as a dry composition, including a lyophilized composition. In some embodiments, the set of nucleic acids is immobilized on a support, including a solid support such as a microarray. In some embodiments, the nucleic acids are stored on a microarray at a density of about 5 μg per 1 inch × 3 inch area (for example, on a CustomArray 12K chip). In some embodiments, the size of the input data is at least about 50 MB.

Methods of decoding nucleic acid sequences and data retrieval

In another aspect, the present invention provides a method (for example, a computer-implemented method) for converting a set of nucleotide sequences into output data. The method is essentially the reverse of the encoding procedure and generally includes a nucleotide decoding step, which converts the set of nucleotide sequences into a binary string, for example by using the 5-bit transcoding framework, and a data processing step, which converts the binary string into output data. The method can be used to retrieve output data stored on a set of nucleic acids, which involves obtaining the nucleotide sequences of the set of nucleic acids and then converting the set of nucleotide sequences into output data.

In some embodiments, the set of nucleic acids is first amplified, for example by using the primers present at the 3' and 5' ends of the set of nucleic acids. The amplified nucleic acids can then be sequenced, for example by next-generation sequencing. Next-generation sequencing techniques are known to those of ordinary skill in the art. For example, the nucleic acids are sequenced using the Illumina sequencing method. Sequences belonging to a specific file can be obtained by aligning the primer sequences. In some embodiments, the method includes NGS library preparation. When the set of nucleic acids is present in a mixture that includes different sets of nucleic acids encoding different data, the set of nucleic acids of interest can be specifically amplified using the primer pair unique to that set, thereby allowing random access to the data corresponding to the set of nucleic acids of interest. If several compressed files need to be read and decoded in a single next-generation sequencing run, all of their corresponding sets of nucleic acids are amplified by PCR, using all of the corresponding primer pairs.

In some embodiments, the method includes paired-end next-generation sequencing together with read pairing and merging, in which the forward and reverse reads from a single cluster are paired and merged into a single read, and all merged reads of irregular length are filtered out. In addition, based on the primer sequences, all reads can be grouped by their respective compressed files. The primers can then be removed, and the nucleotide sequences can be converted into a plurality of integer subsequences comprising integers in the range of 0-31, or directly into a binary string, which is subsequently converted into output data.
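The length filtering, grouping by primer, and primer removal can be sketched as below for reads that have already been paired and merged. This is an illustration under assumptions: the function name `assign_reads` is hypothetical, primers are matched exactly at both ends of the read (practical pipelines usually tolerate mismatches), and the second primer of each pair is given as it appears at the 3' end of the merged read.

```python
def assign_reads(reads: list[str], primer_pairs: dict[str, tuple[str, str]],
                 expected_len: int) -> dict[str, list[str]]:
    """Group merged reads by file using their flanking primers.

    primer_pairs maps a file label to (5' primer, 3' primer sequence as it
    appears on the read). Reads whose payload length differs from
    expected_len, or that match no primer pair, are discarded. The returned
    payloads have the primers removed.
    """
    grouped: dict[str, list[str]] = {name: [] for name in primer_pairs}
    for read in reads:
        for name, (fwd, rev) in primer_pairs.items():
            regular = len(read) == len(fwd) + expected_len + len(rev)
            if regular and read.startswith(fwd) and read.endswith(rev):
                grouped[name].append(read[len(fwd):len(read) - len(rev)])
                break
    return grouped
```

Each returned payload is then ready to be cut into trimers and converted back into integers in the range 0-31.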

In some embodiments, the method further includes applying error correction to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences. In one exemplary embodiment, the step of applying error correction coding includes: i) applying RS-coded string correction to the plurality of integer subsequences to obtain a plurality of consensus integer subsequences; and ii) applying RS-coded block correction to the plurality of consensus integer subsequences to obtain the plurality of indexed integer subsequences. Because a nucleic acid can be present as many molecular copies after synthesis and can be sequenced multiple times, many reads may represent a single nucleic acid. These reads may vary because of errors introduced during high-throughput synthesis and sequencing, but correct reads that exactly match the originally designed nucleic acid remain the most frequent. Using highest-frequency-based correction at each position of the integer string, all integer strings sharing the same index can be corrected and merged into a consensus integer string between string correction and block correction.
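The highest-frequency merge can be sketched as a per-position majority vote over all integer strings that share an index. The sketch assumes the strings are already string-corrected, have equal length, and carry their index in the first `index_len` symbols; the function name `consensus_by_index` and the default index length are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_index(seqs: list[list[int]],
                       index_len: int = 2) -> dict[tuple[int, ...], list[int]]:
    """Merge integer strings that share an index into one consensus string.

    At every position the most frequent symbol among the group is kept.
    """
    groups: dict[tuple[int, ...], list[list[int]]] = defaultdict(list)
    for seq in seqs:
        groups[tuple(seq[:index_len])].append(seq)
    return {
        index: [Counter(column).most_common(1)[0][0] for column in zip(*members)]
        for index, members in groups.items()
    }
```

The vote works because correct copies outnumber any single erroneous variant; positions where two symbols tie are resolved arbitrarily in this sketch and would be left to the subsequent block correction.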

The indexes can then be removed from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences. The core integer subsequences can then be concatenated into a complete integer string, which is converted into a binary string. The binary string can subsequently be written to a file, for example a compressed file. The compressed file can then be decompressed, for example using the LZMA algorithm. If the decompressed file includes data corresponding to multiple files, the decompressed file is further processed (e.g., extracted as a TAR archive) to obtain the multiple files.
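A minimal sketch of this final reassembly, assuming the index is a base-32 number stored in the first `index_len` symbols of each subsequence and that the decoder knows how many payload integers and padding bits were used at encoding time. The function names and that bookkeeping are assumptions of the sketch; the source does not describe how lengths are recorded.

```python
import lzma


def quints_to_bytes(ints: list[int], pad: int = 0) -> bytes:
    """Convert integers in 0..31 back into bytes, dropping `pad` trailing bits."""
    bits = "".join(f"{n:05b}" for n in ints)
    if pad:
        bits = bits[:-pad]
    if len(bits) % 8:
        raise ValueError("bit length is not a whole number of bytes")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def reassemble(indexed: list[list[int]], index_len: int,
               n_payload_ints: int, pad: int) -> bytes:
    """Order subsequences by index, strip the indexes, and rebuild the bytes."""
    def index_value(seq: list[int]) -> int:
        number = 0
        for digit in seq[:index_len]:
            number = number * 32 + digit
        return number

    ordered = sorted(indexed, key=index_value)
    ints = [n for seq in ordered for n in seq[index_len:]][:n_payload_ints]
    return quints_to_bytes(ints, pad)


def recover_file(indexed: list[list[int]], index_len: int,
                 n_payload_ints: int, pad: int) -> bytes:
    """Rebuild the compressed file and decompress it with LZMA."""
    return lzma.decompress(reassemble(indexed, index_len, n_payload_ints, pad))
```

If the decompressed bytes are a TAR archive, its members can then be extracted with the standard `tarfile` module.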

In some embodiments, the method can be used to retrieve output data stored on a set of nucleic acids of interest, wherein the set of nucleic acids of interest is one of multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different set of primer pairs at the 3' and 5' ends. The method includes: a) amplifying the set of nucleic acids of interest using the primer pair corresponding to it; b) obtaining the set of nucleotide sequences of the amplified set of nucleic acids; and c) converting the set of nucleotide sequences into output data according to the methods of the embodiments described above, thereby obtaining the output data.

In some embodiments, there is provided a method for retrieving two or more corresponding sets of output data stored on two or more sets of nucleic acids of interest, wherein the sets of nucleic acids of interest are among multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different set of primer pairs at the 3' and 5' ends, the method comprising: a) amplifying (for example, separately or together) the two or more sets of nucleic acids of interest using the primer pairs corresponding to them; b) obtaining two or more sets of nucleotide sequences of the two or more sets of amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into two or more sets of output data, respectively, thereby obtaining the two or more sets of output data.

5-bit transcoding framework

The methods of the present invention use a novel 5-bit transcoding framework to convert a binary string or an integer string into a set of nucleotide sequences. "5-bit transcoding framework" refers to the conversion according to Table 1 below. In general, every 5 consecutive bits of a binary string can be represented as an integer between 0 and 31 and then as 3 nucleotides (i.e., a 3-mer). For example, nucleic acids have four bases (e.g., A, T, G, and C), so there are 16 possible dimers (i.e., NN): AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, and CC. If a degenerate base R or Y is appended to each dimer, the resulting trimers (NNR/NNY) comprise 32 species, which map one-to-one onto the 32 integers in the range of 0 to 31 and allow the binary string to be converted into DNA sequences.

In some embodiments, R is selected from any two of A, T, G, and C, and Y is selected from the remaining two. In some embodiments, R is selected from A and G, and Y is selected from T and C. In some embodiments, R is selected from A and C, and Y is selected from T and G. In some embodiments, R is selected from T and G, and Y is selected from A and C. In some embodiments, R is selected from T and C, and Y is selected from A and G.

The choice of nucleotide for R and Y can depend on the preceding base, for example to maintain a desired GC content and/or to avoid homopolymers. For example, in a scheme in which R is selected from A and G and Y is selected from C and T, whether A or G is chosen for R, and whether C or T is chosen for Y, depends on the base that precedes it (i.e., the second base of the trimer). In some embodiments, R and Y are chosen so that the second and third bases are different. In some embodiments, R and Y are chosen to maintain a desired GC balance. As long as these rules are followed, R and Y can be chosen at random. The coding potential of this transcoding framework is 1.67 bits per nucleotide (i.e., 5 bits per 3 nt).
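The scheme can be sketched in code as follows. The integer-to-trimer assignment used here is hypothetical: the dimer is taken from the upper four bits in the order AA, AT, AG, AC, ..., CC and the lowest bit selects R (A or G) or Y (C or T). The actual assignment is defined by the framework's table and may differ. The tie-break toward 50% GC when both candidate bases are allowed is likewise an assumption, chosen so that the example is deterministic rather than random.

```python
BASES = "ATGC"
DIMERS = [a + b for a in BASES for b in BASES]  # AA, AT, AG, AC, TA, ...
R_SET, Y_SET = "AG", "CT"                       # one of the permitted schemes


def encode(ints: list[int]) -> str:
    """Map each integer 0..31 to a trimer NNR or NNY (hypothetical table)."""
    seq = ""
    for n in ints:
        if not 0 <= n <= 31:
            raise ValueError("integers must be in the range 0..31")
        dimer = DIMERS[n >> 1]
        seq += dimer
        pool = Y_SET if n & 1 else R_SET
        # Rule from the text: the third base must differ from the second.
        options = [base for base in pool if base != dimer[1]]
        if len(options) == 2:
            # Both are allowed: pick the one that moves GC content toward 50%.
            gc = sum(base in "GC" for base in seq)
            want_gc = 2 * gc < len(seq)
            third = options[0] if (options[0] in "GC") == want_gc else options[1]
        else:
            third = options[0]
        seq += third
    return seq


def decode(seq: str) -> list[int]:
    """Inverse of encode: read trimers back into integers 0..31."""
    ints = []
    for i in range(0, len(seq), 3):
        trimer = seq[i:i + 3]
        low_bit = 0 if trimer[2] in R_SET else 1
        ints.append(DIMERS.index(trimer[:2]) * 2 + low_bit)
    return ints
```

Because the third base of every trimer differs from the second, a run of identical bases can span at most the third base of one trimer and the first two bases of the next, so no homopolymer exceeds 3 nucleotides. Each trimer carries 5 bits, giving 5/3 ≈ 1.67 bits per nucleotide.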

[Table 1: 5-bit transcoding framework, provided as an image (Figure 107127162-A0202-12-0020-1) in the original publication]

Table 2 provides an exemplary 5-bit transcoding framework. In the specific embodiment depicted in Table 2, R is selected from A and G, and Y is selected from C and T. It will be understood that other transcoding frameworks following the same principle can be used.

Table 2

[Table 2 is provided as an image (Figure 107127162-A0202-12-0021-2) in the original publication]

Synthesis and storage of nucleic acids

Nucleic acids comprising the desired nucleotide sequences can be synthesized using any nucleic acid synthesis method. In some embodiments, the nucleic acids are synthesized by chemical synthesis. Methods of high-throughput nucleic acid synthesis are described in International Application No. WO 2002US40580, entitled "COMBINATORIAL SYNTHESIS ON ARRAYS", filed by Maurer et al. on February 17, 2002, with publication No. WO 03052383, and published in December 2016 under the title "ELECTROCHEMICALLY GENERATED ACID AND ITS CONTAINMENT TO 100 MICRON REACTION AREAS FOR THE PRODUCTION OF DNA MICROARRAYS", which is incorporated herein by reference in its entirety.

Once synthesized, the nucleic acids can be stored in different media. In some embodiments, the nucleic acids are dried (e.g., lyophilized) and stored in vials. In some embodiments, the nucleic acids are immobilized on a support, for example a solid support such as a microarray.

Computer-readable storage media and systems

The present invention further provides a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform one or more steps of any of the methods described herein.

In some embodiments, there is provided a system for providing nucleic acid-based data storage or for retrieving data from nucleic acids, the system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing one or more steps of any of the methods described herein.

In some embodiments, there is provided an electronic device for providing nucleic acid-based data storage or for retrieving data from nucleic acids, the device comprising means for performing any of the methods described herein.

Exemplary embodiments

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method comprising: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method comprising: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length.

In some embodiments, the length of each of the plurality of initial integer subsequences is determined based on the oligomer length of the selected synthesis platform, the required error tolerance, the size of the input data, the selected error correction code, or a combination thereof.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method comprising: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, the index information added to each of the plurality of initial integer subsequences includes a sequence of integers, wherein the length of the sequence of integers is based on the size of the input data.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method comprising: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method comprising: i) a data processing step, comprising converting the input data into a binary string; and ii) a nucleotide encoding step, comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy. Adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and data generated by applying error correction coding; and obtaining a plurality of integer subsequences with redundancy based on the filled matrix.

In some embodiments, the number of columns of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the number of rows of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the error correction coding is Reed-Solomon ("RS") coding.

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step comprises dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step comprises converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further comprises dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further comprises adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further comprises, after adding the index information, adding redundant data to the plurality of indexed integer subsequences, thereby obtaining a plurality of integer subsequences with redundancy. Adding redundant data to the plurality of indexed integer subsequences comprises: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining the plurality of integer subsequences with redundancy based on the filled matrix. The data generated by applying error correction coding is generated by applying RS-code string correction and/or RS-code block correction.
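The matrix-filling step may be illustrated, under stated assumptions, by the following sketch. It assumes that each indexed integer subsequence occupies one column of the matrix, that "string correction" appends RS parity symbols within each subsequence (the extra rows), and that "block correction" appends RS parity across subsequences (the extra columns). The field GF(2^5) with primitive polynomial x^5 + x^2 + 1, the systematic encoder, and all function names are choices made for this sketch and are not taken from the description. Only encoding is shown; error decoding is omitted.

```python
PRIM = 0b100101  # x^5 + x^2 + 1, primitive over GF(2) (assumed choice)

# Exponent and logarithm tables for GF(32); alpha = 2 generates the 31 nonzero elements.
EXP = [0] * 62
LOG = [0] * 32
_x = 1
for _i in range(31):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0b100000:
        _x ^= PRIM
for _i in range(31, 62):
    EXP[_i] = EXP[_i - 31]


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(32)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(poly: list[int], x: int) -> int:
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    y = 0
    for c in poly:
        y = gf_mul(y, x) ^ c
    return y


def rs_generator(nsym: int) -> list[int]:
    """Generator polynomial with roots alpha^0 .. alpha^(nsym-1)."""
    g = [1]
    for i in range(nsym):
        nxt = g + [0]                       # g(x) * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # + g(x) * alpha^i
        g = nxt
    return g


def rs_encode(msg: list[int], nsym: int) -> list[int]:
    """Systematic RS encoding: message followed by nsym parity symbols."""
    if len(msg) + nsym > 31:
        raise ValueError("codeword longer than 31 symbols in GF(32)")
    g = rs_generator(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):               # polynomial long division by g
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(msg) + rem[len(msg):]


def add_redundancy(indexed: list[list[int]], string_parity: int,
                   block_parity: int) -> list[list[int]]:
    """Fill the matrix: parity within each subsequence, then parity across them."""
    strands = [rs_encode(s, string_parity) for s in indexed]
    length = len(strands[0])
    extra = [[0] * length for _ in range(block_parity)]
    for pos in range(length):
        across = [s[pos] for s in strands]
        parity = rs_encode(across, block_parity)[len(across):]
        for k in range(block_parity):
            extra[k][pos] = parity[k]
    return strands + extra
```

Because all symbols remain in the range 0 to 31, the parity symbols can be transcoded to nucleotides in the same way as the data. With 5-bit symbols a single RS codeword is limited to 31 symbols, which bounds both matrix dimensions in this sketch.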

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences; and vi) after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy, thereby obtaining the set of nucleotide sequences.

In some embodiments, a method for storing input data on nucleic acids is provided, the method comprising: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences; vi) after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy, thereby obtaining a set of nucleotide sequences; and vii) synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences; vi) creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; vii) filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding (for example, by applying RS-code string correction and/or RS-code block correction); and viii) obtaining a plurality of integer subsequences with redundancy based on the filled matrix, thereby obtaining the set of nucleotide sequences.

In some embodiments, a method for storing input data on nucleic acids is provided, the method comprising: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences; vi) creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; vii) filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding (for example, by applying RS-code string correction and/or RS-code block correction); viii) obtaining a plurality of integer subsequences with redundancy based on the filled matrix, thereby obtaining the set of nucleotide sequences; and ix) synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: i) obtaining a set of nucleotide sequences of a set of nucleic acids; ii) converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31; iii) converting the set of nucleotide sequences into a binary string; and iv) converting the binary string into the output data, thereby obtaining the output data.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: i) sequencing a set of nucleic acids to generate a plurality of sequence reads; ii) pairing, merging, and/or filtering the sequence reads to obtain a set of nucleotide sequences; iii) converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31; iv) applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences; v) converting the plurality of indexed integer subsequences into a binary string; and vi) converting the binary string into the output data, thereby obtaining the output data.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: i) sequencing a set of nucleic acids to generate a plurality of sequence reads; ii) pairing, merging, and/or filtering the sequence reads to obtain a set of nucleotide sequences; iii) converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31; iv) applying RS-code string correction to the plurality of integer subsequences to obtain a plurality of consistent integer subsequences; v) applying RS-code block correction to the plurality of consistent integer subsequences to obtain a plurality of indexed integer subsequences; vi) converting the plurality of indexed integer subsequences into a binary string; and vii) converting the binary string into the output data, thereby obtaining the output data.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: i) sequencing a set of nucleic acids to generate a plurality of sequence reads; ii) pairing, merging, and/or filtering the sequence reads to obtain a set of nucleotide sequences; iii) converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31; iv) applying RS-code string correction to the plurality of integer subsequences to obtain a plurality of consistent integer subsequences; v) applying RS-code block correction to the plurality of consistent integer subsequences to obtain a plurality of indexed integer subsequences; vi) removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences; vii) merging the core integer subsequences into an integer string; viii) converting the integer string into a binary string; and ix) converting the binary string into the output data, thereby obtaining the output data.
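The final retrieval steps (removing the index, merging, and converting back to binary and then to output data) may be sketched as follows. The sketch assumes that error correction has already produced clean indexed subsequences, that the index is a big-endian base-32 prefix of known length, and that the original byte count is known so that padding can be discarded; none of these details is fixed by the description, and the function names are hypothetical.

```python
def index_value(seq: list[int], index_len: int) -> int:
    """Read the base-32 index stored in the first index_len integers."""
    value = 0
    for digit in seq[:index_len]:
        value = value * 32 + digit
    return value


def merge_core(indexed: list[list[int]], index_len: int) -> list[int]:
    """Order subsequences by index, strip the index, and merge into one integer string."""
    ordered = sorted(indexed, key=lambda s: index_value(s, index_len))
    merged: list[int] = []
    for seq in ordered:
        merged.extend(seq[index_len:])
    return merged


def integers_to_bytes(ints: list[int], n_bytes: int) -> bytes:
    """Convert integers (0 to 31) into a binary string, then into bytes."""
    bits = "".join(f"{n:05b}" for n in ints)
    bits = bits[:n_bytes * 8]            # drop padding (assumes the length is known)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    # Subsequences may arrive in any order after sequencing.
    reads = [[0, 1, 16, 0, 0], [0, 0, 9, 1, 20]]
    print(integers_to_bytes(merge_core(reads, index_len=2), n_bytes=2))
```

Sorting by index is what allows the pool of nucleic acids to be read in arbitrary order.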

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The 5-bit transcoding framework is based on Table 2.

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The 5-bit transcoding framework is based on Table 2. The selection of R and Y is based on: 1) being different from the nucleotide immediately preceding the R or Y; and/or 2) the estimated GC content of the nucleotide sequence.
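One possible way to resolve an R or Y position is sketched below. It assumes the standard IUPAC meanings (R is a purine, A or G; Y is a pyrimidine, C or T), applies the "differs from the preceding nucleotide" rule first, and uses the running GC fraction only as a tie-breaker with a 50% target. The priority order, the target value, and the function name are assumptions for this sketch; Table 2 itself is not reproduced here.

```python
PURINES = ("A", "G")      # candidates for R
PYRIMIDINES = ("C", "T")  # candidates for Y


def resolve(code: str, prev: str, gc_count: int, length: int) -> str:
    """Choose a concrete nucleotide for an 'R' or 'Y' position.

    prev      nucleotide immediately preceding this position ('' if none)
    gc_count  number of G/C nucleotides emitted so far
    length    number of nucleotides emitted so far
    """
    if code not in ("R", "Y"):
        raise ValueError("code must be 'R' or 'Y'")
    options = PURINES if code == "R" else PYRIMIDINES

    # Rule 1: avoid repeating the preceding nucleotide.
    candidates = [b for b in options if b != prev]
    if len(candidates) == 1:
        return candidates[0]

    # Rule 2: steer the estimated GC content toward the assumed 50% target.
    gc_fraction = gc_count / length if length else 0.5
    want_gc = gc_fraction < 0.5
    for base in candidates:
        if (base in "GC") == want_gc:
            return base
    return candidates[0]
```

Each pair contains exactly one G/C nucleotide and one A/T nucleotide, so either rule can always be satisfied at a single position.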

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The input data corresponds to a compressed file. The compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The input data corresponds to two or more files. The data processing step further comprises grouping the two or more files into a TAR file. The TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").
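As a minimal sketch of this grouping and compression, the Python standard library modules `tarfile` and `lzma` can be used as shown below. The function name `pack_files` and the in-memory handling are assumptions for illustration; note that `lzma.compress` by default writes the `.xz` container, which uses the LZMA2 variant of the algorithm.

```python
import io
import lzma
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Group several files into one TAR archive, then compress it with LZMA."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    # Default format is FORMAT_XZ (LZMA2 inside an .xz container).
    return lzma.compress(buffer.getvalue())


if __name__ == "__main__":
    blob = pack_files({"a.txt": b"hello", "b.txt": b"world"})
    print(len(blob), "compressed bytes ready for conversion into a binary string")
```

The resulting bytes are the input data that the data processing step converts into a binary string.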

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method comprising: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further comprises appending a pair of primer sequences to the 5' and 3' ends of each nucleotide sequence in the set of nucleotide sequences.

In some embodiments, a method for storing input data on nucleic acids is provided, the method comprising: a) converting the input data into a set of nucleotide sequences, wherein the converting comprises i) a data processing step comprising converting the input data into a binary string, and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids comprising the set of nucleotide sequences. The method further comprises attaching a primer pair to the synthesized set of nucleic acids.

In some embodiments, a method for storing two or more sets of input data on nucleic acids is provided, the method comprising: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively, according to any one of the methods described herein; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, respectively, wherein the primer pairs of the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing two or more sets of nucleic acids comprising the two or more corresponding sets of nucleotide sequences, respectively.

In some embodiments, a method for storing two or more sets of input data on nucleic acids is provided, the method comprising: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively, according to any one of the methods described herein; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, respectively, wherein the primer pairs of the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing two or more sets of nucleic acids comprising the two or more corresponding sets of nucleotide sequences, respectively. Each primer pair has a sequence that differs from any of the two or more corresponding sets of nucleotide sequences or their complementary sequences.
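The primer requirements above can be checked in software before synthesis. The sketch below is illustrative only: it treats "differs from the sequences or their complements" as "does not occur as a substring of any payload sequence or of its reverse complement", and it writes the 3' primer directly as the sequence to be appended. Both interpretations, and all function names, are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_is_unique(primer: str, payloads: list[str]) -> bool:
    """True if the primer occurs in no payload and in no payload's reverse complement."""
    rc = reverse_complement(primer)
    # primer in reverse_complement(p) is equivalent to rc in p
    return all(primer not in p and rc not in p for p in payloads)


def attach_primers(payloads: list[str], forward: str, reverse: str) -> list[str]:
    """Append the primer pair to the 5' and 3' ends of every payload sequence."""
    for primer in (forward, reverse):
        if not primer_is_unique(primer, payloads):
            raise ValueError(f"primer {primer} collides with a payload sequence")
    return [forward + p + reverse for p in payloads]
```

Giving each set of input data its own primer pair allows one set to be selectively amplified from a mixed pool.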

In some embodiments, the GC content of the synthesized set of nucleic acids ranges from 30% to 70%.
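A simple check of this range, given as a sketch with hypothetical function names, computes the GC fraction of each sequence and compares it with the 30% and 70% bounds stated above.

```python
def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def within_gc_window(seq: str, low: float = 0.30, high: float = 0.70) -> bool:
    """True if the GC content lies in the 30% to 70% range (inclusive)."""
    return low <= gc_content(seq) <= high
```

Sequences that fall outside the window could be flagged before synthesis.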

In some embodiments, a method for storing input data on nucleic acids is provided, the method comprising: a) converting the input data into a set of nucleotide sequences, wherein the converting comprises i) a data processing step comprising converting the input data into a binary string, and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids comprising the set of nucleotide sequences. The method further comprises storing the synthesized set of nucleic acids.

In some embodiments, the synthesized set of nucleic acids is stored by drying. In some embodiments, the synthesized set of nucleic acids is stored by lyophilization.

In some embodiments, the synthesized set of nucleic acids is immobilized on a support, which may be a microarray.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the converting comprises: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data, thereby obtaining the output data. The method comprises amplifying the set of nucleic acids before retrieving the output data.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method comprising: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the converting comprises: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data, thereby obtaining the output data. The method further comprises sequencing the set of nucleic acids to generate a plurality of sequence reads. The plurality of sequence reads is paired, merged, and filtered to obtain the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The nucleotide decoding step converts the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The nucleotide decoding step converts the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31. The nucleotide decoding step further comprises applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The nucleotide decoding step converts the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31. The nucleotide decoding step further comprises applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences. Applying the error correction coding comprises: i) applying RS-code string correction to the plurality of integer subsequences to obtain a plurality of consistent integer subsequences; and ii) applying RS-code block correction to the plurality of consistent integer subsequences to obtain the plurality of indexed integer subsequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The nucleotide decoding step converts the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31. The nucleotide decoding step further comprises applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide decoding step further comprises removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The output data is stored in a compressed file. The data processing step further comprises decompressing the compressed file, for example using the LZMA algorithm.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The output data corresponds to a plurality of files. The method further comprises extracting the plurality of files from the output data using TAR.
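On the retrieval side, decompression and file extraction can be sketched with the Python standard library as follows. The function name `unpack_files` and the in-memory handling are assumptions; the sketch expects the recovered bytes to be an LZMA-compressed TAR archive, as in the storage embodiments.

```python
import io
import lzma
import tarfile


def unpack_files(blob: bytes) -> dict[str, bytes]:
    """Decompress the recovered bytes with LZMA and extract the archived files."""
    raw = lzma.decompress(blob)
    files: dict[str, bytes] = {}
    # mode "r:" reads an uncompressed TAR stream only.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

If any residual error survives decoding, `lzma.decompress` will typically raise `lzma.LZMAError`, which gives a useful end-to-end integrity signal.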

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The nucleotide decoding step converts the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31. The nucleotide decoding step further comprises applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide decoding step further comprises removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences. The nucleotide decoding step further comprises merging the core integer subsequences into an integer string and converting the integer string into a binary string.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The 5-bit transcoding framework is based on Table 2.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method comprising: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data. The set of nucleic acids comprises primer sequences at the 5' and 3' ends, and the method comprises removing the primer sequences before the nucleotide decoding step.

In some embodiments, a computer-implemented method for DNA-based data storage is provided, the method comprising: converting a digitized file into a binary string; converting the binary string using a 5-bit transcoding framework to obtain an integer string; obtaining a plurality of integer subsequences from the integer string; and converting the plurality of integer subsequences into representations of a plurality of DNA oligomers for use in synthesizing DNA.

In some embodiments, converting the binary string using the 5-bit transcoding framework to obtain the integer string comprises: dividing the binary string into a sequence of non-overlapping 5-bit binary strings; and converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain the integer string. In some embodiments, the integer string is further divided into an initial plurality of integer subsequences having a predetermined length. In some embodiments, obtaining the plurality of integer subsequences to be converted comprises: adding index information to each subsequence of the initial plurality of integer subsequences; and, after adding the index information, adding redundant data to the initial plurality of integer subsequences to obtain the plurality of integer subsequences. In some embodiments, the index information added to each subsequence of the initial plurality comprises an integer string, wherein the length of the integer string corresponding to the index information is based on the size of the digitized file.
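One way in which the index length could follow from the file size is sketched below. It assumes that the index is a base-32 number (one integer from 0 to 31 per digit) and that it must be just long enough to give every initial subsequence a distinct value; the formula and the name `index_length` are assumptions for this sketch rather than the rule used in the description.

```python
def index_length(file_size_bytes: int, payload_len: int) -> int:
    """Number of base-32 digits needed to index every initial subsequence.

    file_size_bytes  size of the digitized file
    payload_len      predetermined number of integers per initial subsequence
    """
    total_ints = (file_size_bytes * 8 + 4) // 5            # ceil(bits / 5)
    n_chunks = max(1, -(-total_ints // payload_len))       # ceil division
    length = 1
    while 32 ** length < n_chunks:
        length += 1
    return length
```

For example, under these assumptions a 100,000-byte file with 20 integers per subsequence yields 8,000 subsequences, which need three base-32 digits because 32^2 = 1,024 is too small and 32^3 = 32,768 is sufficient.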

In some embodiments, the method includes adding redundant data to the plurality of integer subsequences, which may include, for example: obtaining a subset of the initial plurality of integer subsequences; selecting an empty matrix, where the number of columns in the empty matrix is greater than the number of subsequences in the subset and the number of rows in the empty matrix is greater than the number of integers in each subsequence of the subset; filling the empty matrix with the subset of the initial plurality of integer subsequences and with data corresponding to an error correction code; and obtaining the plurality of integer subsequences based on the filled matrix. In some embodiments, the number of columns of the empty matrix is selected based on the type of error correction code, a predetermined error tolerance value, the size of the subset, or a combination thereof. In some embodiments, the number of rows of the empty matrix is selected based on the type of error correction code, a predetermined error tolerance value, the size of the subset, or a combination thereof.

In some embodiments, the error correction code is a Reed-Solomon ("RS") code. In some embodiments, converting the plurality of integer subsequences into representations of a plurality of DNA oligomers includes converting each integer of the plurality of integer subsequences into a representation of three nucleotides, where: the first of the three nucleotides is selected from A, T, G, and C; the second of the three nucleotides is selected from A, T, G, and C; and the third of the three nucleotides is selected from one of two options.

In some embodiments, the digital file is a compressed file corresponding to a group of one or more files or directories. In some embodiments, the digital file includes an LZMA file, compressed using the Lempel-Ziv-Markov chain algorithm, that corresponds to a group of one or more files or directories.

In some embodiments according to any one of the embodiments above, the method further comprises: adding data representing a primer pair to each oligomer representation of the representations of the plurality of DNA oligomers; and, after adding the data representing the primer pair, performing DNA synthesis based on the representations of the plurality of DNA oligomers.

In some embodiments, the method further comprises: obtaining a second digital file; obtaining representations of a second plurality of DNA oligomers based on the second digital file; adding data representing a second primer pair to each oligomer representation of the representations of the second plurality of DNA oligomers, where the second primer pair differs from the first primer pair; and performing DNA synthesis based on the representations of the plurality of DNA oligomers and the representations of the second plurality of DNA oligomers.

In some embodiments, a computer-implemented method for DNA-based data retrieval is provided, the method comprising: obtaining a plurality of reads corresponding to a digital file; obtaining a plurality of integer subsequences based on the plurality of reads; converting the plurality of integer subsequences into an integer string; converting the integer string into a binary string using a 5-bit transcoding framework; and obtaining the digital file based on the binary string. In some embodiments, obtaining the plurality of reads corresponding to the digital file includes identifying primers pre-associated with the digital file. In some embodiments, obtaining the plurality of integer subsequences includes performing frequency-based error correction based on the plurality of reads. In some embodiments, converting the integer string into the binary string using the 5-bit transcoding framework includes converting each integer of the integer string into a 5-bit binary number.

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device, cause the electronic device to: convert a digital file into a binary string; convert the binary string using a 5-bit transcoding framework to obtain an integer string; obtain a plurality of integer subsequences from the integer string; and convert the plurality of integer subsequences into representations of a plurality of DNA oligomers for DNA synthesis.

In some embodiments, a system for providing DNA-based data storage is provided, the system comprising: one or more processors; memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including: instructions for converting a digital file into a binary string; instructions for converting the binary string using a 5-bit transcoding framework to obtain an integer string; instructions for obtaining a plurality of integer subsequences from the integer string; and instructions for converting the plurality of integer subsequences into representations of a plurality of DNA oligomers.

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device, cause the electronic device to: obtain a plurality of reads corresponding to a digital file; obtain a plurality of integer subsequences based on the plurality of reads; convert the plurality of integer subsequences into an integer string; convert the integer string into a binary string using a 5-bit transcoding framework; and obtain the digital file based on the binary string.

In some embodiments, a system for providing DNA-based data storage is provided, the system comprising: one or more processors; memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including: instructions for obtaining a plurality of reads corresponding to a digital file; instructions for obtaining a plurality of integer subsequences based on the plurality of reads; instructions for converting the plurality of integer subsequences into an integer string; instructions for converting the integer string into a binary string using a 5-bit transcoding framework; and instructions for obtaining the digital file based on the binary string.

According to an exemplary implementation, the different steps of the method are implemented by one or more computer software programs, the software programs comprising software instructions designed to be executed by a data processor of a relay module according to the invention and designed to control the execution of the different steps of the method.

Accordingly, one aspect of the invention also relates to a program that can be executed by a computer or by a data processor, the program comprising instructions to control the execution of the steps of the method described above.

The program can use any programming language and can take the form of source code, object code, or code intermediate between source code and object code, such as a partially compiled form, or any other desired form.

The invention also relates to an information medium that is readable by a data processor and that contains the instructions of a program as described above.

The information medium can be any entity or device capable of storing the program. For example, the medium may include a storage device such as a ROM (read-only memory), for example a CD-ROM (compact disc read-only memory) or a microelectronic circuit ROM, or a magnetic recording device, for example a floppy disk or a hard disk drive.

Furthermore, the information medium may be a transmissible carrier, such as an electrical or optical signal, that can be conveyed through an electrical or optical cable, by radio, or by other means. In particular, the program can be downloaded over an Internet-type network.

Alternatively, the information medium may be an integrated circuit in which the program is incorporated, the circuit being adapted to execute, or to be used in the execution of, the method in question.

According to one embodiment, embodiments of the invention are implemented by means of software and/or hardware components. From this viewpoint, the term "module" in this document can correspond to a software component, to a hardware component, or to a set of hardware and software components.

A software component corresponds to one or more computer programs, one or more subprograms of a program, or more generally to any element of a program or of software that is capable of implementing a function or a set of functions according to what is described below for the module concerned. Such a software component is executed by a data processor of a physical entity (terminal, server, etc.) and can access the hardware resources of that physical entity (memory, recording media, communication buses, input/output electronic boards, user interfaces, etc.).

Similarly, a hardware component corresponds to any element of a hardware unit that is capable of implementing a function or a set of functions according to what is described below for the module concerned. It can be a programmable hardware component or a component with an integrated circuit for executing software, for example an integrated circuit, a smart card, a memory card, or an electronic board for executing firmware. In a variant, the hardware component includes a processor that is an integrated circuit, such as a central processing unit, and/or a microprocessor, and/or an application-specific integrated circuit (ASIC), and/or an application-specific instruction-set processor (ASIP), and/or a graphics processing unit (GPU), and/or a physics processing unit (PPU), and/or a digital signal processor (DSP), and/or an image processor, and/or a coprocessor, and/or a floating-point unit, and/or a network processor, and/or an audio processor, and/or a multi-core processor. In addition, the hardware component can also include a baseband processor (including, for example, memory units and firmware) and/or radio electronic circuits (which may include wires) that receive or transmit radio signals. In one embodiment, the hardware component complies with one or more standards, such as ISO/IEC 18092/ECMA-340, ISO/IEC 21481/ECMA-352, GSMA, StoLPaN, ETSI/SCP (Smart Card Platform), and GlobalPlatform (i.e., a secure element). In a variant, the hardware component is a radio-frequency identification (RFID) tag. In one embodiment, the hardware component includes circuits that enable Bluetooth communication, and/or Wi-Fi communication, and/or Zigbee communication, and/or USB communication, and/or FireWire communication, and/or NFC (near-field communication).

It should be noted that a step of obtaining an element/value in the present invention can be viewed either as a step of reading that element/value from a memory unit of an electronic device or as a step of receiving that element/value from another electronic device via communication means.

Exemplary process

FIG. 1 shows an exemplary process for DNA-based data storage and retrieval according to some embodiments. Specifically, exemplary steps 102-110 relate to encoding digital data for storage, and exemplary steps 112-122 relate to decoding the stored information for retrieval. The exemplary steps of FIG. 1 are described in further detail below with reference to FIGS. 2-5.

1. Encoding

In step 102 ("data compression"), one or more files and/or directories are packed into a single file, which is then compressed into a compressed file. In some embodiments, the files and/or directories are packed into a TAR file (e.g., File.tar), which is then compressed into an LZMA file (e.g., File.tar.lzma) using the Lempel-Ziv-Markov chain algorithm (i.e., the LZMA algorithm). In some embodiments, one LZMA file acts as a single indivisible unit for data retrieval (e.g., during decoding). Therefore, if multiple files and directories are to be stored together but retrieved randomly and independently, they should be grouped into multiple TAR files and compressed into multiple corresponding LZMA files at this step.
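As a minimal sketch of step 102 and its inverse, the following uses the Python standard library to pack files into a TAR archive and compress it as a legacy .lzma stream. The function names are hypothetical, and working on in-memory byte strings rather than files on disk is an assumption made only to keep the example self-contained.

```python
import io
import lzma
import tarfile


def pack_and_compress(files):
    """Pack {name: bytes} into a TAR archive and compress it with LZMA."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # FORMAT_ALONE is the legacy ".lzma" container (as opposed to ".xz").
    return lzma.compress(buf.getvalue(), format=lzma.FORMAT_ALONE)


def decompress_and_unpack(blob):
    """Inverse of pack_and_compress: returns {name: bytes}."""
    raw = lzma.decompress(blob, format=lzma.FORMAT_ALONE)
    out = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out
```

Files that must be retrievable independently would each go through a separate `pack_and_compress` call, giving one LZMA unit per group.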

In step 104, a first round of data transcoding is performed. First, each LZMA file is converted into a binary string. As an example, referring to FIG. 2, the file named "File.tar.lzma" is converted into a binary string. The binary string is then converted into an integer string B ("0; 10; 25; ...; 4; 8; 31"). In the depicted embodiment, the conversion from the binary string to the integer string B uses a 5-bit transcoding framework. As shown, the binary string is divided into a series of non-overlapping 5-bit binary strings, such as "00000" and "01010". Each 5-bit binary string is then converted into an integer to form the integer string B. Those of ordinary skill in the art will appreciate that, under the 5-bit transcoding framework, each integer in the integer string ranges from 0 (corresponding to "00000") to 31 (corresponding to "11111").
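The 5-bit transcoding of step 104 can be sketched as follows. The function name is hypothetical, and padding the final group with zero bits when the bit count is not a multiple of 5 is an assumption; the text does not state how a trailing partial group is handled.

```python
def bytes_to_quints(data: bytes) -> list:
    """Split the bit string of `data` into non-overlapping 5-bit groups
    and return each group as an integer in the range 0..31."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Assumption: right-pad with zeros to a multiple of 5 bits.
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]
```

For example, the two bytes 0x02 0xB2 have the bit string 0000001010110010, which splits into 00000, 01010, 11001, and a padded 00000, that is, the integers 0, 10, 25, 0.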

As shown in FIG. 2, a fixed-length sliding window is then used to divide the integer string B into a plurality of non-overlapping integer subsequences (e.g., [A1, A2, ..., An]). In the embodiment depicted in FIG. 2, each integer subsequence (e.g., A1) consists of 22 integers. Finally, index information is prepended to each subsequence to form a new plurality of indexed integer subsequences (e.g., [B1, B2, ..., Bn]). In the depicted embodiment, the index information is a sequence of 3 integers, each in the range 0 to 31. The length of the index sequence can be selected based on various factors, such as the size of the compressed file and the throughput of DNA synthesis.
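The windowing and indexing of FIG. 2 might look like the sketch below. The function name is hypothetical, and two details are assumptions not stated in the text: the last subsequence is zero-padded to 22 integers, and the 3-integer index is simply the subsequence number written in base 32, most significant digit first.

```python
def split_with_index(ints, payload_len=22, index_len=3):
    """Cut `ints` into fixed-length subsequences (A1..An) and prepend a
    base-32 index of `index_len` integers to each one (B1..Bn)."""
    padded = list(ints) + [0] * (-len(ints) % payload_len)  # assumption
    chunks = [padded[i:i + payload_len]
              for i in range(0, len(padded), payload_len)]
    if len(chunks) > 32 ** index_len:
        raise ValueError("index too short for this many subsequences")
    indexed = []
    for n, chunk in enumerate(chunks):
        # Base-32 digits of n, most significant first (assumed layout).
        index = [(n >> (5 * k)) & 31 for k in reversed(range(index_len))]
        indexed.append(index + chunk)
    return indexed
```

With the depicted lengths, each indexed subsequence holds 3 + 22 = 25 integers before redundancy is added.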

Returning to FIG. 1, in step 106, the plurality of indexed integer subsequences (e.g., [B1, B2, ..., Bn] as shown in FIG. 2) is further converted into a plurality of integer subsequences with index and redundancy (e.g., [C1, C2, ..., Cm] as shown in FIG. 4). Various error-correcting coding algorithms, such as Reed-Solomon (RS) coding, fountain coding, and Hamming coding, can be used to add redundant data to the digital data to be stored. In a preferred embodiment, RS coding is used because of its robustness and ease of implementation.

FIGS. 3A-D show an exemplary process for adding index and redundancy to digital content (e.g., represented by the plurality of integer subsequences [A1, A2, ..., An]) to obtain [C1, C2, ..., Cm]. Specifically, FIGS. 3A-D show how the first five integer subsequences (i.e., A1, A2, A3, A4, and A5) are processed with RS coding to form [C1, C2, ..., C31]. The remaining integer subsequences (i.e., A6, ..., An) are processed in the same way, with every five consecutive integer subsequences treated as one unit. In this embodiment, the five integer subsequences are processed together in a 29×31 matrix, so the block-correction parity is 26 (i.e., 31-5=26). Accordingly, up to 13 (i.e., 26/2=13) of the 31 oligomers can be lost and still be recovered according to the principles of RS coding.

Referring to FIG. 3A, an empty 29×31 matrix is prepared and filled with the first five integer strings A1, A2, A3, A4, and A5 from [A1, A2, ..., An], which are shown occupying a 22×5 submatrix. This region is the core data block.

Turning to FIG. 3B, an index sequence consisting of three integers in the range 0 to 31 is prepended to each column as a unique index; the index strings can be sorted before they are prepended. As shown, the indices are sorted or assigned in ascending order, for example 0-0-0, 0-0-1, 0-0-2, ..., 0-0-31, 0-1-31, .... In FIG. 3B, the indexed integer strings are labeled B1, B2, B3, B4, and B5, respectively.

Referring to FIG. 3C, RS coding is used to fill, row by row, the blank region of each row occupied by the core data block. This step is called "block correction" and helps to handle, for example, oligomers that are lost and indels (insertions and deletions) that arise during synthesis and sequencing, as well as degradation during long-term storage.

Turning to FIG. 3D, RS coding is used to fill, column by column, the blank region of each column of the entire matrix. This step is called "string correction" and helps to correct, for example, point mutations introduced during synthesis, sequencing, and long-term storage. As shown in FIG. 3D, the matrix now contains 31 integer strings [C1, C2, ..., C31]. In other words, after block correction and string correction, the five integer subsequences A1-A5 have been converted into 31 integer subsequences C1-C31. Each of A1-A5 contains 22 integers, whereas each of C1-C31 contains 29 integers (including 3 additional index integers and 4 additional RS parity integers for error correction). It should be understood that the dimensions shown in FIGS. 3A-D are merely exemplary. The length of the index string (3 in FIGS. 3A-D), the size of the matrix (e.g., 29×31 in FIGS. 3A-D), and the number of integer strings processed as one unit (e.g., 5 in FIGS. 3A-D) can be selected based on a variety of factors, such as the type of error correction code used, the desired error tolerance, and the characteristics of the DNA synthesis platform.
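The two RS passes of FIGS. 3A-D can be illustrated with the encoder sketch below, written under several stated assumptions rather than as the patented implementation. The text does not specify the finite field, so the sketch assumes GF(2^5) with primitive polynomial x^5 + x^2 + 1 and generator roots α^0, α^1, .... It also assumes that consecutive indices are assigned to all 31 columns (so that every oligomer carries a unique index) and that block correction is applied to the 22 payload rows. All function names are hypothetical, and decoding is not shown.

```python
# GF(2^5) log/antilog tables; x^5 + x^2 + 1 is an assumed field polynomial.
PRIM = 0b100101
EXP = [0] * 62
LOG = [0] * 32
_x = 1
for _i in range(31):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0b100000:
        _x ^= PRIM
for _i in range(31, 62):
    EXP[_i] = EXP[_i - 31]


def gf_mul(a, b):
    """Multiply two GF(32) elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(nsym):
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= c                       # c * x
            nxt[j + 1] ^= gf_mul(c, EXP[i])   # c * a^i
        g = nxt
    return g


def rs_encode(msg, nsym):
    """Systematic RS encoding: returns msg followed by nsym parity symbols."""
    g = generator_poly(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(msg) + rem[len(msg):]


def poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    y = 0
    for c in poly:
        y = gf_mul(y, x) ^ c
    return y


def encode_block(payloads, first_index=0, n_cols=31, index_len=3, col_parity=4):
    """Turn five 22-integer payloads (A1..A5) into 31 columns of 29 integers."""
    rows = len(payloads[0])
    # Block correction: extend each payload row from 5 to 31 symbols.
    coded_rows = [rs_encode([p[r] for p in payloads], n_cols - len(payloads))
                  for r in range(rows)]
    columns = []
    for c in range(n_cols):
        n = first_index + c
        index = [(n >> (5 * k)) & 31 for k in reversed(range(index_len))]
        body = index + [coded_rows[r][c] for r in range(rows)]
        # String correction: append 4 parity symbols to each column.
        columns.append(rs_encode(body, col_parity))
    return columns
```

Because the encoding is systematic, the first five output columns still contain A1-A5 unchanged between the index and the four parity integers. Every row and column is a valid RS codeword, which can be checked by evaluating it at the generator roots with `poly_eval`.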

As shown in FIG. 4, one round of RS string correction and one round of RS block correction, performed according to the techniques described with reference to FIGS. 3A-D, convert the plurality of indexed integer subsequences [B1, B2, ..., Bn] into a plurality of integer subsequences with redundancy [C1, C2, ..., Cm], where m is greater than n. Each integer in the subsequences [C1, C2, ..., Cm] is in the range 0 to 31.

In the embodiment depicted in FIGS. 3A-D, the length of an initial integer subsequence such as A1 (22 in the depicted embodiment) is calculated from several factors. Specifically, the length of an integer string with index and redundancy (denoted L; 29 in the depicted embodiment) is calculated from the oligomer length of the synthesis platform. The parities of both the string correction (denoted X; 4 in the depicted embodiment) and the block correction are determined by the oligomer synthesis error rate, the error correction code used, and the desired error tolerance. The index length (denoted Y; 3 in the depicted embodiment) is determined by the total size of the encoded data. The length of the initial integer string (denoted Z) is therefore Z=L-X-Y.
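The length bookkeeping can be checked with a few lines of arithmetic. The helper name is hypothetical. The derived quantities (number of distinct indices, nucleotides per oligomer before primers) follow from the 5-bit framework and the three-nucleotides-per-integer mapping described in this document.

```python
def payload_length(total_len, string_parity, index_len):
    """Z = L - X - Y: integers left for payload in each subsequence."""
    return total_len - string_parity - index_len


L, X, Y = 29, 4, 3            # values of the depicted embodiment
Z = payload_length(L, X, Y)   # 29 - 4 - 3 = 22 payload integers
distinct_indices = 32 ** Y    # 32768 subsequences addressable by a 3-integer index
oligo_nt = 3 * L              # 87 nucleotides per oligomer, excluding primers
payload_bits = 5 * Z          # 110 payload bits per subsequence
```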

Returning to FIG. 1, in step 108, a second round of transcoding is performed to convert the list of integer strings with redundancy (e.g., [C1, C2, ..., Cm]) into representations of a plurality of DNA oligomers (e.g., [D1, D2, ..., Dm]). Each representation of a DNA oligomer is composed of the four bases A, T, G, and C used for synthesis. In particular, the 5-bit transcoding framework can be used again. Here, each integer in the integer strings [C1, C2, ..., Cm] is in the range 0 to 31 and can therefore be mapped uniquely to one of 32 trinucleotides (i.e., trimers of the forms NNY and NNR, where N represents A, T, G, or C; Y represents C or T; and R represents A or G). For example, as shown in FIG. 5, the integer 6 corresponds to the 5-bit binary string "00110" and can be translated into "AGR" under a particular strategy. In some embodiments, the 5-bit transcoding framework can provide direct conversion between integers and representations of DNA oligomers without any intermediate step (e.g., without first converting the integers into binary strings).
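One way to realize the integer-to-trimer mapping is sketched below. Only the pair 6 → "AGR" comes from the text. The rest of the table (two bits per N position, one bit selecting R or Y, and the specific bit-to-base assignment) is a hypothetical strategy chosen to be consistent with that single example; the actual table of FIG. 5 may differ.

```python
# Hypothetical bit-to-base assignment; only 6 -> "AGR" is from the source.
TWO_BIT = {0: "A", 1: "T", 2: "C", 3: "G"}
BASE_TO_BITS = {base: bits for bits, base in TWO_BIT.items()}
LAST_BIT = {0: "R", 1: "Y"}
# R stands for A or G, Y stands for C or T.
THIRD_TO_BIT = {"R": 0, "Y": 1, "A": 0, "G": 0, "C": 1, "T": 1}


def int_to_trimer(value):
    """Map an integer in 0..31 to a degenerate trimer of the form NNR or NNY."""
    if not 0 <= value <= 31:
        raise ValueError("value must be in 0..31")
    return TWO_BIT[value >> 3] + TWO_BIT[(value >> 1) & 3] + LAST_BIT[value & 1]


def trimer_to_int(trimer):
    """Inverse mapping; accepts degenerate (R/Y) or resolved third bases."""
    first, second, third = trimer
    return (BASE_TO_BITS[first] << 3) | (BASE_TO_BITS[second] << 1) | THIRD_TO_BIT[third]
```

Because the third position only records whether the base is a purine (R) or a pyrimidine (Y), the same integer is recovered whichever concrete base is later substituted, for example both "AGA" and "AGR" decode to 6.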

Accordingly, each of the 29 integers in each integer subsequence (e.g., C1) can be mapped to a trinucleotide. After all of [C1, C2, ..., Cm] have been converted, and before DNA synthesis, each Y is replaced with C or T and each R is replaced with A or G. The replacement is chosen so that the third base of each trimer differs from its second base, which helps avoid three consecutive identical bases (e.g., AAA, GGG, TTT, CCC). In addition, through the choice of Y and R, the GC content of each oligomer should be kept between 30% and 70%. The replacement step both reduces errors caused by oligomer synthesis and is important for improving the proportion of correctly synthesized oligomers.
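A possible replacement strategy is sketched here under stated assumptions. The rule that the third base must differ from the second comes from the text. The greedy GC balancing (choose G/C while the running GC fraction is below 50%, otherwise A/T) is an assumption, and all names are hypothetical. The sketch enforces only the within-trimer rule, so a caller would still verify the 30% to 70% GC window with `gc_fraction`.

```python
CHOICES = {"Y": ("C", "T"), "R": ("A", "G")}


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def resolve(trimers):
    """Replace the degenerate third base of each trimer with a concrete base."""
    seq = []
    gc = 0
    for first, second, cls in trimers:
        seq += [first, second]
        gc += (first in "GC") + (second in "GC")
        # Rule from the text: the third base must differ from the second.
        options = [base for base in CHOICES[cls] if base != second]
        # Assumed heuristic: steer the running GC fraction toward 50%.
        want_gc = 2 * gc < len(seq)
        options.sort(key=lambda base: (base in "GC") != want_gc)
        third = options[0]
        seq.append(third)
        gc += third in "GC"
    return "".join(seq)
```

When the second base already belongs to the degenerate class (for example "AGR"), only one choice remains ("AGA"); otherwise the heuristic is free to pick either base.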

According to the principles of RS coding, the tolerable errors in the exemplary scheme shown in FIGS. 3A-D include two mutations per oligomer (i.e., half of the string-correction parity of 4) and 13 lost oligomers out of the 31 oligomers from the same matrix (i.e., half of the block-correction parity of 26), where lost oligomers include both completely lost oligomers and oligomers with indels.

Referring to FIG. 1, in step 110, primer pairs are appended and DNA synthesis is performed. In some embodiments, a single compressed file (e.g., File.tar.lzma of FIG. 4) is converted into representations of a plurality of DNA oligomers (e.g., [D1, D2, ..., Dn] in FIG. 4), and representations of the same primer sequence pair are added to the two ends of each oligomer corresponding to the compressed file. For multiple compressed files that are to be stored and synthesized together but must be randomly accessible during subsequent reading and decoding, a unique orthogonal primer pair is selected for, and associated with, each compressed file. For example, if 3 compressed files are to be stored and synthesized together but must be randomly accessible during subsequent reading and decoding, 3 unique orthogonal primer pairs are selected and associated with the 3 compressed files, respectively. For each compressed file, the selected primer pair is appended to each oligomer of the plurality of oligomers corresponding to that compressed file. All oligomers corresponding to the multiple compressed files can then be pooled and synthesized together as the storage medium.

A variety of criteria can be used to select primer pairs. For example, primer pairs can be selected to avoid homodimers, heterodimers, and hairpin structures and to have sufficient specificity (e.g., no binding sites within the encoding nucleic acid sequences). In some examples, multiplex PCR primer design criteria are used.

2. Decoding

The decoding procedure is essentially the reverse of the encoding procedure. Referring to FIG. 1, in step 112, PCR is performed with a primer pair to amplify the list of oligomers (e.g., [D1, D2, ..., Dn] in FIG. 4) of the corresponding compressed file (e.g., File.tar.lzma of FIG. 4). If multiple compressed files need to be read and decoded in a single NGS run, all of their corresponding oligomer lists should be amplified by PCR using all of the corresponding primer pairs. This step is also called "NGS library preparation".

In step 114, paired-end next-generation sequencing is performed (e.g., on an Illumina sequencing system), followed by read pairing and merging. Specifically, the forward and reverse reads from the same cluster are paired and merged into a single read, and all merged reads with irregular lengths (e.g., reads with indels) are filtered out. In addition, all reads can be grouped by compressed file according to their primer sequences. In subsequent steps, reads corresponding to the same compressed file (i.e., reads sharing the same primers) are analyzed together.
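The length filter and primer-based grouping can be sketched as below, assuming already merged reads. The function name and the primer sequences in the usage note are hypothetical. Exact primer matching is an assumption made for brevity, since a practical pipeline would likely tolerate mismatches. The sketch also trims the primers, which the text assigns to step 116.

```python
def group_reads(reads, primer_pairs, payload_nt=87):
    """Group merged reads by primer pair, dropping irregular-length reads.

    primer_pairs maps a file label to (forward, reverse) primer strings,
    both written as they appear in the merged read.
    """
    groups = {name: [] for name in primer_pairs}
    for read in reads:
        for name, (fwd, rev) in primer_pairs.items():
            expected = len(fwd) + payload_nt + len(rev)
            if (len(read) == expected
                    and read.startswith(fwd) and read.endswith(rev)):
                # Keep only the payload between the two primers.
                groups[name].append(read[len(fwd):len(read) - len(rev)])
                break
    return groups
```

For instance, with the made-up pair `{"file1": ("ACGT", "TTGA")}` and `payload_nt=6`, the read `ACGTAAGCTCTTGA` is kept as `AAGCTC`, whereas a read with one missing payload base is discarded as irregular. The default of 87 nucleotides corresponds to 29 integers at three nucleotides each.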

In step 116, RS decoding is performed. In some embodiments, a 29×31 zero matrix, rather than an empty matrix, is used. Specifically, the PCR primers are removed from both ends of each read from a single compressed file, and the read is then converted into an integer subsequence and passed through the RS string correction in order to correct mutations. Because one oligomer may have many molecular copies during synthesis and may be sequenced many times, many reads may derive from the same oligomer. These reads may vary because of errors introduced during high-throughput synthesis and sequencing, but the correct reads should predominate. Through highest-frequency-based correction at each position of the integer subsequences, all integer subsequences that share the same index can be corrected and merged into one consensus integer subsequence. For example, for a group of reads sharing the same index, each position of the consensus integer subsequence is determined by the integer that occurs most frequently at that position.
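The highest-frequency-based correction can be sketched as a per-position majority vote over subsequences that share an index. The function name is hypothetical, the RS string correction that precedes this step is not shown, and ties are resolved in favor of the value seen first, which is an arbitrary choice not specified in the text.

```python
from collections import Counter, defaultdict


def consensus_by_index(subsequences, index_len=3):
    """Merge integer subsequences sharing an index into one consensus each."""
    groups = defaultdict(list)
    for sub in subsequences:
        groups[tuple(sub[:index_len])].append(sub)
    consensus = {}
    for index, members in groups.items():
        # At every position, keep the most frequent integer.
        consensus[index] = [Counter(column).most_common(1)[0][0]
                            for column in zip(*members)]
    return consensus
```

Given three reads with index 0-0-1 whose fourth integer is 5, 5, and 7, the consensus keeps 5 at that position.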

In step 118, the list of integer strings can be fully decoded through the block correction of the RS code, which recovers lost oligomers as well as oligomers with insertions and deletions. Because an oligomer may be synthesized in many molecular copies and sequenced many times, many reads may represent a single oligomer. These reads may differ because of errors introduced during high-throughput synthesis and sequencing, but the correct reads, which match the originally designed oligomer, still outnumber the others. By applying the highest-frequency correction at each position of the integer string, all integer strings sharing the same index can be corrected and merged into one consensus integer string; this takes place between string correction and block correction. Because oligomers with insertions or deletions have irregular lengths and are discarded during error correction, their data is effectively missing and must be recovered. Based on the index information, the columns of the matrix are filled in after the highest-frequency correction.
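
The matrix-filling part of step 118 can be pictured with the sketch below. It covers only the bookkeeping that precedes block correction, and the RS block decoder itself is not shown. The function name `fill_matrix`, the one-integer-string-per-column layout, and the use of a dictionary keyed by index are assumptions made for illustration; the matrix dimensions are passed in as parameters.

```python
def fill_matrix(consensus_by_index, n_rows, n_cols):
    """Place each consensus integer string into the column named by its index.

    consensus_by_index: dict mapping a column index to a list of n_rows integers.
    Returns (matrix, missing): matrix is a list of n_rows rows, pre-filled with
    zeros, and missing lists the column indices that had no consensus string.
    """
    matrix = [[0] * n_cols for _ in range(n_rows)]
    missing = []
    for col in range(n_cols):
        values = consensus_by_index.get(col)
        if values is None:
            missing.append(col)  # lost oligomer: left as zeros for block correction
            continue
        if len(values) != n_rows:
            raise ValueError(f"column {col} has {len(values)} values, expected {n_rows}")
        for row, value in enumerate(values):
            matrix[row][col] = value
    return matrix, missing
```

The list of missing columns is the kind of information a block-level decoder could treat as known erasure positions, which is one plausible reason for starting from a zero matrix of fixed size.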

In step 120, transcoding is performed. The reads are stored in index order, and the index is then removed from each integer subsequence. All integer subsequences can then be concatenated into a single integer string, which is converted into a binary string through the 5-bit transcoding framework.
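
A minimal sketch of step 120 follows, assuming that each integer subsequence is a list of values in the range 0 to 31 whose first `index_len` entries hold the index. Both the name `integers_to_bits` and the `index_len` parameter are hypothetical; the actual index layout is not given in this passage.

```python
def integers_to_bits(subsequences, index_len=1):
    """Order subsequences by index, strip the index, and emit 5 bits per integer."""
    ordered = sorted(subsequences, key=lambda s: s[:index_len])
    payload = [value for s in ordered for value in s[index_len:]]
    if any(not 0 <= value < 32 for value in payload):
        raise ValueError("every payload integer must fit in 5 bits (0..31)")
    # format(value, "05b") gives the zero-padded 5-bit binary form of value.
    return "".join(format(value, "05b") for value in payload)
```

With `index_len=1`, the subsequences `[1, 31, 0]` and `[0, 1, 2]` are reordered by their leading index and yield the payload integers 1, 2, 31, 0.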

In step 122, decompression is performed. Specifically, the system writes the binary string to a compressed file and then decompresses that file using the LZMA algorithm followed by the TAR algorithm. For random access to multiple compressed files, steps 116 to 122 should be performed independently for each compressed file. A pool can store multiple compressed files, and each compressed file has its own PCR primers. During decoding, it is not necessary to sequence the entire pool. Instead, the corresponding PCR primers are used to amplify the oligomers of a given compressed file, and the amplified oligomers are then sequenced to decode that compressed file rather than the entire pool.
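
The decompression in step 122 can be sketched with Python's standard `lzma` and `tarfile` modules. This is an illustration under stated assumptions: the recovered binary string is taken to be a whole number of bytes, the LZMA container is whatever `lzma.decompress` can detect automatically, and the helper name `unpack_bits` is hypothetical. The members are returned in memory instead of being written to disk.

```python
import io
import lzma
import tarfile


def unpack_bits(bits):
    """Turn a string of '0'/'1' characters into {member name: file contents}."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    tar_bytes = lzma.decompress(raw)  # undo the LZMA layer first
    files = {}
    # Then read the uncompressed TAR archive from memory.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files
```

Reading members into a dictionary avoids extracting untrusted paths to the file system; writing them out would be a separate, explicit step.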

As discussed above, a 5-bit transcoding framework is used. Specifically, every 5 consecutive bits of the binary string can be represented as an integer from 0 to 31 and subsequently as 3 nucleotides (nt), that is, a trimer. A DNA oligomer is composed of four bases (A, T, G, and C), so there are 16 possible dimers (NN): AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, and CC. If a degenerate base R or Y is appended to each dimer, the resulting trimers (NNR/NNY) comprise 32 kinds, which correspond one-to-one to the 32 integers from 0 to 31 and allow the binary string to be converted cleanly into DNA sequence. During oligomer synthesis, whether A or G is chosen to represent R, and whether C or T is chosen to represent Y, depends on the preceding base (the second base of the trimer): the system makes the second and third bases differ while keeping the GC content balanced. Provided these conditions are met, the exact base is chosen at random from the candidate bases. In summary, the coding potential of this transcoding framework is 1.67 bits per nucleotide (5 bits per 3 nt).
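
The 5-bit transcoding framework can be illustrated with the Python sketch below. The assignment of integers to trimers used here (dimers in the order listed above, with even integers taking R and odd integers taking Y) is an assumption for the example and is not taken from the patent's own table. The names `int_to_trimer`, `trimer_to_int`, `bits_to_dna`, and `dna_to_bits` are hypothetical, and GC balancing is omitted: the sketch enforces only that the third base differs from the second.

```python
import random

BASES = "ATGC"
# 16 dimers in the order AA, AT, AG, AC, TA, ..., CC.
DIMERS = [a + b for a in BASES for b in BASES]
DEGENERATE = {"R": "AG", "Y": "CT"}  # IUPAC codes: R = A or G, Y = C or T


def int_to_trimer(value, rng=random):
    """Map an integer 0..31 to a concrete trimer (hypothetical assignment)."""
    if not 0 <= value < 32:
        raise ValueError("value must be in 0..31")
    dimer = DIMERS[value // 2]
    candidates = DEGENERATE["RY"[value % 2]]
    # The third base must differ from the second base of the trimer.
    allowed = [base for base in candidates if base != dimer[1]]
    return dimer + rng.choice(allowed)


def trimer_to_int(trimer):
    """Invert int_to_trimer: only the class (R or Y) of the third base matters."""
    dimer, third = trimer[:2], trimer[2]
    return DIMERS.index(dimer) * 2 + (0 if third in DEGENERATE["R"] else 1)


def bits_to_dna(bits, rng=random):
    if len(bits) % 5:
        raise ValueError("bit string length must be a multiple of 5")
    return "".join(int_to_trimer(int(bits[i:i + 5], 2), rng)
                   for i in range(0, len(bits), 5))


def dna_to_bits(dna):
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(format(trimer_to_int(dna[i:i + 3]), "05b")
                   for i in range(0, len(dna), 3))
```

Because decoding looks only at whether the third base is a purine or a pyrimidine, the random choice made during encoding never affects the recovered bits. Every 5 input bits produce 3 nt, which matches the stated 5/3 ≈ 1.67 bits per nucleotide.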

FIG. 7 shows an exemplary implementation of the DNA-based data storage and retrieval technique. Here, a text file containing Chinese characters (data size: 1.16 kb), shown in FIG. 6, is stored in DNA according to the process described herein.

During encoding, the text file is compressed into a single compressed file, which is then stored through the DNA storage framework using 403 oligomers, each 87 nt long. To simulate random access, six copies of the compressed file are used and six primer pairs are selected. Each primer pair is added to the two ends of each of the 403 oligomers. The six primer pairs (each primer 20 nt) are orthogonal, meaning that any two of them are separated by a sufficient Hamming distance and that each has little similarity to any of the 403 oligomers. The sequence listing in the ASCII text file submitted herewith includes SEQ ID NO. 1 to SEQ ID NO. 403 and, as SEQ ID NO. 404 to SEQ ID NO. 415, primer pairs PP NO. 1 to PP NO. 6.
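
The Hamming-distance criterion for orthogonal primers can be checked with a few lines of Python. This sketch covers only the pairwise distance between primers of equal length; the function names are hypothetical, and the threshold that counts as "sufficient" is not stated in the text, so it is left to the caller.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(primers):
    """Smallest Hamming distance over all pairs in a list of primers."""
    if len(primers) < 2:
        raise ValueError("need at least two primers")
    return min(hamming(a, b) for a, b in combinations(primers, 2))
```

A candidate primer set could then be accepted when `min_pairwise_distance(primers)` is at least some chosen threshold; checking similarity against the payload oligomers would need a separate comparison, for example over sliding windows.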

The oligomer pool is then synthesized. A total of 2418 (that is, 403 × 6) oligomers are synthesized on the CustomArray platform developed by CustomArray, Inc. Each oligomer is 127 nt long, including 40 nt of primers (20 nt at each end).

PCR amplification and NGS are then performed. Six PCR reactions are performed on the copies of the compressed file. After six samples are prepared using the TruSeq DNA PCR-free HT library preparation kit (96 indexes, 96 samples, plate format) and six library indexes, the pooled samples are sequenced together using the MiSeq Reagent Kit v3 (150 cycles), chosen because of the 127 nt oligomer length. The Q30 of the NGS data is 94% (official specification: > 85%), and the cluster density is 1,301 K/mm² (official specification: 1,200 to 1,400 K/mm²).

Finally, decoding is performed. After each copy of the compressed file is decoded independently, all copies are successfully retrieved by random access and decompressed without any errors.

FIG. 8 presents a device that can be used to perform one or more steps of the method of the present invention. This device, labeled 800, includes a computing unit (for example, a central processing unit, CPU), labeled 801, and one or more memory units, labeled 802. A memory unit may be, for example, a random-access memory (RAM) block, in which intermediate results can be stored temporarily during execution of the instructions of a computer program; a read-only memory (ROM) block, in which, among other things, computer programs are stored; an electrically erasable programmable read-only memory (EEPROM) block; or a flash memory block. A computer program consists of instructions that can be executed by the computing unit. The device 800 may also include a dedicated unit, labeled 803, which constitutes an input-output interface that allows the device 800 to communicate with other devices. In particular, the dedicated unit 803 can be connected to an antenna (for contactless communication) or to a serial port (for contact-based communication). Note that these units can exchange data with one another via, for example, a bus.

In an alternative embodiment, some or all of the steps of the previously described method can be implemented in hardware, in a field-programmable gate array (FPGA) component or an application-specific integrated circuit (ASIC) component.

In an alternative embodiment, some or all of the steps of the previously described method can be executed on an electronic device comprising a memory unit and a processing unit, such as the one disclosed in FIG. 8. Such a device 800 can be used in combination with a high-throughput synthesis platform (for example, CustomArray) and a DNA sequencer (for example, a MiSeq sequencer).

FIG. 9A depicts an exemplary method 900 for storing input data on nucleic acids. At block 902, the input data is converted into a set of nucleotide sequences. At block 904, the input data is converted into a binary string. At block 906, the binary string is converted using the 5-bit transcoding framework to obtain the set of nucleotide sequences. At block 908, a set of nucleic acids comprising the set of nucleotide sequences is synthesized.

FIG. 9B depicts an exemplary method 950 for retrieving output data stored on nucleic acids. At block 952, a set of nucleotide sequences of a set of nucleic acids is obtained. At block 954, the set of nucleotide sequences is converted into the output data. Specifically, at block 956, the set of nucleotide sequences is converted into a binary string using the 5-bit transcoding framework. At block 958, the binary string is converted into the output data.

Although the present invention and its embodiments have been fully described with reference to the accompanying drawings, it should be noted that various changes and modifications will be apparent to those of ordinary skill in the art. Such changes and modifications are to be understood as falling within the scope of the disclosure and embodiments as defined by the claims.

For purposes of explanation, the foregoing description has been given with reference to specific embodiments. However, the illustrative discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications, thereby enabling those of ordinary skill in the art to best utilize the techniques and the various embodiments, with various modifications as suited to the particular use contemplated.

This application claims the benefit of Chinese Patent Application No. 201710611123.2, filed on July 25, 2017, the entire contents of which are incorporated herein by reference for all purposes.

The content of the following submission in an ASCII text file is incorporated herein by reference in its entirety: a computer-readable form (CRF) of the sequence listing (file name: 申請號107127162序列表電子資料.TXT; date recorded: November 30, 2018; size: 179 KB).

<110> Nanjing GenScript Biotech Co., Ltd. (Mainland China)
<120> DNA-based data access
<130> 75989-20003.40
<140> 107127162
<141> 2018-08-03
<150> CN201710611123.2
<151> 2018-07-25
<160> 415
<170> FastSEQ for Windows Version 4.0

<210> 1
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 1
Figure 107127162-A0202-12-0053-3

<210> 2
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 2
Figure 107127162-A0202-12-0053-4

<210> 3
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 3
Figure 107127162-A0202-12-0054-5

<210> 4
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 4
Figure 107127162-A0202-12-0054-6

<210> 5
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 5
Figure 107127162-A0202-12-0054-7

<210> 6
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 6
Figure 107127162-A0202-12-0055-8

<210> 7
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 7
Figure 107127162-A0202-12-0055-9

<210> 8
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 8
Figure 107127162-A0202-12-0055-10

<210> 9
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 9
Figure 107127162-A0202-12-0055-11

<210> 10
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 10
Figure 107127162-A0202-12-0056-12

<210> 11
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 11
Figure 107127162-A0202-12-0056-13

<210> 12
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 12
Figure 107127162-A0202-12-0056-14

<210> 13
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 13
Figure 107127162-A0202-12-0056-15

<210> 14
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 14
Figure 107127162-A0202-12-0057-16

<210> 15
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 15
Figure 107127162-A0202-12-0057-17

<210> 16
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 16
Figure 107127162-A0202-12-0057-18

<210> 17
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 17
Figure 107127162-A0202-12-0058-19

<210> 18
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 18
Figure 107127162-A0202-12-0058-20

<210> 19
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 19
Figure 107127162-A0202-12-0058-21

<210> 20
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 20
Figure 107127162-A0202-12-0058-22

<210> 21
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 21
Figure 107127162-A0202-12-0059-23

<210> 22
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 22
Figure 107127162-A0202-12-0059-24

<210> 23
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 23
Figure 107127162-A0202-12-0059-25

<210> 24
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 24
Figure 107127162-A0202-12-0059-26

<210> 25
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 25
Figure 107127162-A0202-12-0060-27

<210> 26
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 26
Figure 107127162-A0202-12-0060-28

<210> 27
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 27
Figure 107127162-A0202-12-0060-29

<210> 28
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 28
Figure 107127162-A0202-12-0061-30

<210> 29
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 29
Figure 107127162-A0202-12-0061-31

<210> 30
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 30
Figure 107127162-A0202-12-0061-32

<210> 31
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 31
Figure 107127162-A0202-12-0061-33

<210> 32
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 32
Figure 107127162-A0202-12-0062-34

<210> 33
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 33
Figure 107127162-A0202-12-0062-35

<210> 34
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 34
Figure 107127162-A0202-12-0062-36

<210> 35
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 35
Figure 107127162-A0202-12-0062-37

<210> 36
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 36
Figure 107127162-A0202-12-0063-38

<210> 37
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 37
Figure 107127162-A0202-12-0063-39

<210> 38
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 38
Figure 107127162-A0202-12-0063-40

<210> 39
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 39
Figure 107127162-A0202-12-0064-41

<210> 40
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 40
Figure 107127162-A0202-12-0064-42

<210> 41
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 41
Figure 107127162-A0202-12-0064-43

<210> 42
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 42
Figure 107127162-A0202-12-0064-44

<210> 43
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 43
Figure 107127162-A0202-12-0065-45

<210> 44
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 44
Figure 107127162-A0202-12-0065-46

<210> 45
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 45
Figure 107127162-A0202-12-0065-47

<210> 46
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 46
Figure 107127162-A0202-12-0065-48

<210> 47
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 47
Figure 107127162-A0202-12-0066-49

<210> 48
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 48
Figure 107127162-A0202-12-0066-50

<210> 49
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 49
Figure 107127162-A0202-12-0066-51

<210> 50
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 50
Figure 107127162-A0202-12-0067-52

<210> 51
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 51
Figure 107127162-A0202-12-0067-53

<210> 52
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 52
Figure 107127162-A0202-12-0067-54

<210> 53
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 53
Figure 107127162-A0202-12-0067-55

<210> 54
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 54
Figure 107127162-A0202-12-0068-56

<210> 55
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 55
Figure 107127162-A0202-12-0068-57

<210> 56
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 56
Figure 107127162-A0202-12-0068-58

<210> 57
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 57
Figure 107127162-A0202-12-0068-59

<210> 58
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 58
Figure 107127162-A0202-12-0069-60

<210> 59
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 59
Figure 107127162-A0202-12-0069-62

<210> 60
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 60
Figure 107127162-A0202-12-0069-63

<210> 61
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 61
Figure 107127162-A0202-12-0070-64

<210> 62
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 62
Figure 107127162-A0202-12-0070-65

<210> 63
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 63
Figure 107127162-A0202-12-0070-66

<210> 64
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 64
Figure 107127162-A0202-12-0070-67

<210> 65
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 65
Figure 107127162-A0202-12-0071-68

<210> 66
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 66
Figure 107127162-A0202-12-0071-69

<210> 67
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 67
Figure 107127162-A0202-12-0071-70

<210> 68
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 68
Figure 107127162-A0202-12-0071-71

<210> 69
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 69
Figure 107127162-A0202-12-0072-72

<210> 70
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 70
Figure 107127162-A0202-12-0072-73

<210> 71
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 71
Figure 107127162-A0202-12-0072-74

<210> 72
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 72
Figure 107127162-A0202-12-0073-75

<210> 73
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 73
Figure 107127162-A0202-12-0073-76

<210> 74
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 74
Figure 107127162-A0202-12-0073-77

<210> 75
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 75
Figure 107127162-A0202-12-0073-78

<210> 76
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 76
Figure 107127162-A0202-12-0074-79

<210> 77
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 77
Figure 107127162-A0202-12-0074-80

<210> 78
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 78
Figure 107127162-A0202-12-0074-81

<210> 79
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 79
Figure 107127162-A0202-12-0074-82

<210> 80
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 80
Figure 107127162-A0202-12-0075-83

<210> 81
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 81
Figure 107127162-A0202-12-0075-84

<210> 82
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 82
Figure 107127162-A0202-12-0075-85

<210> 83

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 83

Figure 107127162-A0202-12-0076-86

<210> 84

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 84

Figure 107127162-A0202-12-0076-87

<210> 85

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 85

Figure 107127162-A0202-12-0076-88

<210> 86

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 86

Figure 107127162-A0202-12-0076-89

<210> 87

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 87

Figure 107127162-A0202-12-0077-90

<210> 88

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 88

Figure 107127162-A0202-12-0077-91

<210> 89

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 89

Figure 107127162-A0202-12-0077-92

<210> 90

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 90

Figure 107127162-A0202-12-0077-93

<210> 91

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 91

Figure 107127162-A0202-12-0078-94

<210> 92

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 92

Figure 107127162-A0202-12-0078-95

<210> 93

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 93

Figure 107127162-A0202-12-0078-96

<210> 94

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 94

Figure 107127162-A0202-12-0079-97

<210> 95

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 95

Figure 107127162-A0202-12-0079-98

<210> 96

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 96

Figure 107127162-A0202-12-0079-99

<210> 97

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 97

Figure 107127162-A0202-12-0079-100

<210> 98

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 98

Figure 107127162-A0202-12-0080-101

<210> 99

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 99

Figure 107127162-A0202-12-0080-102

<210> 100

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 100

Figure 107127162-A0202-12-0080-103

<210> 101

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 101

Figure 107127162-A0202-12-0080-104

<210> 102

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 102

Figure 107127162-A0202-12-0081-105

<210> 103

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 103

Figure 107127162-A0202-12-0081-106

<210> 104

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 104

Figure 107127162-A0202-12-0081-107

<210> 105

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 105

Figure 107127162-A0202-12-0082-108

<210> 106

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 106

Figure 107127162-A0202-12-0082-109

<210> 107

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 107

Figure 107127162-A0202-12-0082-110

<210> 108

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 108

Figure 107127162-A0202-12-0082-111

<210> 109

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 109

Figure 107127162-A0202-12-0083-112

<210> 110

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 110

Figure 107127162-A0202-12-0083-113

<210> 111

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 111

Figure 107127162-A0202-12-0083-114

<210> 112

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 112

Figure 107127162-A0202-12-0083-115

<210> 113

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 113

Figure 107127162-A0202-12-0084-116

<210> 114

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 114

Figure 107127162-A0202-12-0084-117

<210> 115

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 115

Figure 107127162-A0202-12-0084-119

<210> 116

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 116

Figure 107127162-A0202-12-0085-120

<210> 117

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 117

Figure 107127162-A0202-12-0085-121

<210> 118

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 118

Figure 107127162-A0202-12-0085-122

<210> 119

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 119

Figure 107127162-A0202-12-0085-123

<210> 120

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 120

Figure 107127162-A0202-12-0086-124

<210> 121

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 121

Figure 107127162-A0202-12-0086-125

<210> 122

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 122

Figure 107127162-A0202-12-0086-126

<210> 123

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 123

Figure 107127162-A0202-12-0086-127

<210> 124

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 124

Figure 107127162-A0202-12-0087-128

<210> 125

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 125

Figure 107127162-A0202-12-0087-129

<210> 126

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 126

Figure 107127162-A0202-12-0087-130

<210> 127

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 127

Figure 107127162-A0202-12-0088-131

<210> 128

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 128

Figure 107127162-A0202-12-0088-132

<210> 129

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 129

Figure 107127162-A0202-12-0088-133

<210> 130

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 130

Figure 107127162-A0202-12-0088-134

<210> 131

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 131

Figure 107127162-A0202-12-0089-135

<210> 132

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 132

Figure 107127162-A0202-12-0089-136

<210> 133

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 133

Figure 107127162-A0202-12-0089-137

<210> 134

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 134

Figure 107127162-A0202-12-0089-138

<210> 135

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 135

Figure 107127162-A0202-12-0090-139

<210> 136

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 136

Figure 107127162-A0202-12-0090-140

<210> 137

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 137

Figure 107127162-A0202-12-0090-141

<210> 138

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 138

Figure 107127162-A0202-12-0091-142

<210> 139

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 139

Figure 107127162-A0202-12-0091-143

<210> 140

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 140

Figure 107127162-A0202-12-0091-144

<210> 141

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 141

Figure 107127162-A0202-12-0091-145

<210> 142

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 142

Figure 107127162-A0202-12-0092-146

<210> 143

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 143

Figure 107127162-A0202-12-0092-147

<210> 144

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 144

Figure 107127162-A0202-12-0092-148

<210> 145

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 145

Figure 107127162-A0202-12-0092-149

<210> 146

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 146

Figure 107127162-A0202-12-0093-150

<210> 147

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 147

Figure 107127162-A0202-12-0093-151

<210> 148

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 148

Figure 107127162-A0202-12-0093-152

<210> 149

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 149

Figure 107127162-A0202-12-0094-153

<210> 150

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 150

Figure 107127162-A0202-12-0094-154

<210> 151

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 151

Figure 107127162-A0202-12-0094-155

<210> 152

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 152

Figure 107127162-A0202-12-0094-156

<210> 153

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 153

Figure 107127162-A0202-12-0095-157

<210> 154

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 154

Figure 107127162-A0202-12-0095-158

<210> 155

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 155

Figure 107127162-A0202-12-0095-159

<210> 156

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 156

Figure 107127162-A0202-12-0095-160

<210> 157

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 157

Figure 107127162-A0202-12-0096-161

<210> 158

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 158

Figure 107127162-A0202-12-0096-162

<210> 159

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 159

Figure 107127162-A0202-12-0096-163

<210> 160

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 160

Figure 107127162-A0202-12-0097-164

<210> 161

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 161

Figure 107127162-A0202-12-0097-165

<210> 162

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 162

Figure 107127162-A0202-12-0097-166

<210> 163

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 163

Figure 107127162-A0202-12-0097-167

<210> 164

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 164

Figure 107127162-A0202-12-0098-168

<210> 165

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 165

Figure 107127162-A0202-12-0098-169

<210> 166

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 166

Figure 107127162-A0202-12-0098-170

<210> 167

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 167

Figure 107127162-A0202-12-0098-171

<210> 168

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 168

Figure 107127162-A0202-12-0099-172

<210> 169

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 169

Figure 107127162-A0202-12-0099-173

<210> 170

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 170

Figure 107127162-A0202-12-0099-174

<210> 171

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 171

Figure 107127162-A0202-12-0100-175

<210> 172

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 172

Figure 107127162-A0202-12-0100-176

<210> 173

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 173

Figure 107127162-A0202-12-0100-177

<210> 174

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 174

Figure 107127162-A0202-12-0100-178

<210> 175

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 175

Figure 107127162-A0202-12-0101-179

<210> 176

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 176

Figure 107127162-A0202-12-0101-180

<210> 177

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 177

Figure 107127162-A0202-12-0101-181

<210> 178

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 178

Figure 107127162-A0202-12-0101-182

<210> 179

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 179

Figure 107127162-A0202-12-0102-183

<210> 180

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 180

Figure 107127162-A0202-12-0102-184

<210> 181

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 181

Figure 107127162-A0202-12-0102-185

<210> 182

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 182

Figure 107127162-A0202-12-0103-186

<210> 183

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 183

Figure 107127162-A0202-12-0103-187

<210> 184

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 184

Figure 107127162-A0202-12-0103-188

<210> 185

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 185

Figure 107127162-A0202-12-0103-189

<210> 186

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 186

Figure 107127162-A0202-12-0104-190

<210> 187

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 187

Figure 107127162-A0202-12-0104-191

<210> 188

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 188

Figure 107127162-A0202-12-0104-192

<210> 189

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 189

Figure 107127162-A0202-12-0104-193

<210> 190

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 190

Figure 107127162-A0202-12-0105-194

<210> 191

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 191

Figure 107127162-A0202-12-0105-195

<210> 192

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 192

Figure 107127162-A0202-12-0105-196

<210> 193

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 193

Figure 107127162-A0202-12-0106-197

<210> 194

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 194

Figure 107127162-A0202-12-0106-198

<210> 195

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 195

Figure 107127162-A0202-12-0106-199

<210> 196

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 196

Figure 107127162-A0202-12-0106-200

<210> 197

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 197

Figure 107127162-A0202-12-0107-201

<210> 198

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 198

Figure 107127162-A0202-12-0107-202

<210> 199

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 199

Figure 107127162-A0202-12-0107-203

<210> 200

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 200

Figure 107127162-A0202-12-0107-204

<210> 201

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 201

Figure 107127162-A0202-12-0108-205

<210> 202

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 202

Figure 107127162-A0202-12-0108-206

<210> 203

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 203

Figure 107127162-A0202-12-0108-207

<210> 204

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 204

Figure 107127162-A0202-12-0109-208

<210> 205

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 205

Figure 107127162-A0202-12-0109-209

<210> 206

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 206

Figure 107127162-A0202-12-0109-210

<210> 207

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 207

Figure 107127162-A0202-12-0109-211

<210> 208

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 208

Figure 107127162-A0202-12-0110-212

<210> 209

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 209

Figure 107127162-A0202-12-0110-213

<210> 210

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 210

Figure 107127162-A0202-12-0110-214

<210> 211

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 211

Figure 107127162-A0202-12-0110-215

<210> 212

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 212

Figure 107127162-A0202-12-0111-216

<210> 213

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 213

Figure 107127162-A0202-12-0111-217

<210> 214

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 214

Figure 107127162-A0202-12-0111-218

<210> 215

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 215

Figure 107127162-A0202-12-0112-219

<210> 216

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 216

Figure 107127162-A0202-12-0112-220

<210> 217

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 217

Figure 107127162-A0202-12-0112-221

<210> 218

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 218

Figure 107127162-A0202-12-0112-222

<210> 219

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 219

Figure 107127162-A0202-12-0113-223

<210> 220

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 220

Figure 107127162-A0202-12-0113-224

<210> 221

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 221

Figure 107127162-A0202-12-0113-225

<210> 222

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 222

Figure 107127162-A0202-12-0113-226

<210> 223

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 223

Figure 107127162-A0202-12-0114-227

<210> 224

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 224

Figure 107127162-A0202-12-0114-228

<210> 225

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 225

Figure 107127162-A0202-12-0114-230

<210> 226

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 226

Figure 107127162-A0202-12-0115-231

<210> 227

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 227

Figure 107127162-A0202-12-0115-232

<210> 228

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 228

Figure 107127162-A0202-12-0115-233

<210> 229

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 229

Figure 107127162-A0202-12-0115-234

<210> 230

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 230

Figure 107127162-A0202-12-0116-235

<210> 231

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 231

Figure 107127162-A0202-12-0116-236

<210> 232

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 232

Figure 107127162-A0202-12-0116-237

<210> 233

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 233

Figure 107127162-A0202-12-0116-238

<210> 234

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 234

Figure 107127162-A0202-12-0117-239

<210> 235

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 235

Figure 107127162-A0202-12-0117-240

<210> 236

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 236

Figure 107127162-A0202-12-0117-241

<210> 237

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 237

Figure 107127162-A0202-12-0118-242

<210> 238

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 238

Figure 107127162-A0202-12-0118-243

<210> 239

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 239

Figure 107127162-A0202-12-0118-244

<210> 240

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 240

Figure 107127162-A0202-12-0118-245

<210> 241

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 241

Figure 107127162-A0202-12-0119-246

<210> 242

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 242

Figure 107127162-A0202-12-0119-247

<210> 243

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 243

Figure 107127162-A0202-12-0119-248

<210> 244

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 244

Figure 107127162-A0202-12-0119-249

<210> 245

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 245

Figure 107127162-A0202-12-0120-250

<210> 246

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 246

Figure 107127162-A0202-12-0120-251

<210> 247

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 247

Figure 107127162-A0202-12-0120-252

<210> 248

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 248

Figure 107127162-A0202-12-0121-253

<210> 249

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 249

Figure 107127162-A0202-12-0121-254

<210> 250

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 250

Figure 107127162-A0202-12-0121-255

<210> 251

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 251

Figure 107127162-A0202-12-0121-256

<210> 252

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 252

Figure 107127162-A0202-12-0122-257

<210> 253

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 253

Figure 107127162-A0202-12-0122-258

<210> 254

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 254

Figure 107127162-A0202-12-0122-259

<210> 255

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 255

Figure 107127162-A0202-12-0122-260

<210> 256

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 256

Figure 107127162-A0202-12-0123-261

<210> 257

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 257

Figure 107127162-A0202-12-0123-262

<210> 258

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 258

Figure 107127162-A0202-12-0123-263

<210> 259

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 259

Figure 107127162-A0202-12-0124-264

<210> 260

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 260

Figure 107127162-A0202-12-0124-265

<210> 261

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 261

Figure 107127162-A0202-12-0124-266

<210> 262

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 262

Figure 107127162-A0202-12-0124-267

<210> 263

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 263

Figure 107127162-A0202-12-0125-268

<210> 264

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 264

Figure 107127162-A0202-12-0125-269

<210> 265

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 265

Figure 107127162-A0202-12-0125-270

<210> 266

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 266

Figure 107127162-A0202-12-0125-271

<210> 267

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 267

Figure 107127162-A0202-12-0126-273

<210> 268

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 268

Figure 107127162-A0202-12-0126-274

<210> 269

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 269

Figure 107127162-A0202-12-0126-275

<210> 270

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 270

Figure 107127162-A0202-12-0127-276

<210> 271

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 271

Figure 107127162-A0202-12-0127-277

<210> 272

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 272

Figure 107127162-A0202-12-0127-278

<210> 273

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 273

Figure 107127162-A0202-12-0127-279

<210> 274

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 274

Figure 107127162-A0202-12-0128-280

<210> 275

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 275

Figure 107127162-A0202-12-0128-281

<210> 276

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 276

Figure 107127162-A0202-12-0128-282

<210> 277

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 277

Figure 107127162-A0202-12-0128-283

<210> 278

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 278

Figure 107127162-A0202-12-0129-284

<210> 279

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 279

Figure 107127162-A0202-12-0129-285

<210> 280

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 280

Figure 107127162-A0202-12-0129-286

<210> 281

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 281

Figure 107127162-A0202-12-0130-287

<210> 282

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 282

Figure 107127162-A0202-12-0130-288

<210> 283

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 283

Figure 107127162-A0202-12-0130-289

<210> 284

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 284

Figure 107127162-A0202-12-0130-290

<210> 285

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 285

Figure 107127162-A0202-12-0131-291

<210> 286

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 286

Figure 107127162-A0202-12-0131-292

<210> 287

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 287

Figure 107127162-A0202-12-0131-293

<210> 288

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 288

Figure 107127162-A0202-12-0131-294

<210> 289

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 289

Figure 107127162-A0202-12-0132-295

<210> 290

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 290

Figure 107127162-A0202-12-0132-296

<210> 291

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 291

Figure 107127162-A0202-12-0132-297

<210> 292

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 292

Figure 107127162-A0202-12-0133-298

<210> 293

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 293

Figure 107127162-A0202-12-0133-299

<210> 294

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 294

Figure 107127162-A0202-12-0133-301

<210> 295

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 295

Figure 107127162-A0202-12-0133-302

<210> 296

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 296

Figure 107127162-A0202-12-0134-303

<210> 297

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 297

Figure 107127162-A0202-12-0134-304

<210> 298

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 298

Figure 107127162-A0202-12-0134-305

<210> 299

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 299

Figure 107127162-A0202-12-0134-306

<210> 300

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 300

Figure 107127162-A0202-12-0135-307

<210> 301

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 301

Figure 107127162-A0202-12-0135-308

<210> 302

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 302

Figure 107127162-A0202-12-0135-309

<210> 303

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 303

Figure 107127162-A0202-12-0136-310

<210> 304

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 304

Figure 107127162-A0202-12-0136-311

<210> 305

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 305

Figure 107127162-A0202-12-0136-312

<210> 306

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 306

Figure 107127162-A0202-12-0136-313

<210> 307

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 307

Figure 107127162-A0202-12-0137-314

<210> 308

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 308

Figure 107127162-A0202-12-0137-315

<210> 309

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 309

Figure 107127162-A0202-12-0137-316

<210> 310

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 310

Figure 107127162-A0202-12-0137-317

<210> 311

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 311

Figure 107127162-A0202-12-0138-318

<210> 312

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 312

Figure 107127162-A0202-12-0138-319

<210> 313

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 313

Figure 107127162-A0202-12-0138-320

<210> 314

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 314

Figure 107127162-A0202-12-0139-321

<210> 315

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 315

Figure 107127162-A0202-12-0139-322

<210> 316

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 316

Figure 107127162-A0202-12-0139-323

<210> 317

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 317

Figure 107127162-A0202-12-0139-324

<210> 318

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 318

Figure 107127162-A0202-12-0140-325

<210> 319

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 319

Figure 107127162-A0202-12-0140-326

<210> 320

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 320

Figure 107127162-A0202-12-0140-327

<210> 321

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 321

Figure 107127162-A0202-12-0140-328

<210> 322

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 322

Figure 107127162-A0202-12-0141-329

<210> 323

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 323

Figure 107127162-A0202-12-0141-330

<210> 324

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 324

Figure 107127162-A0202-12-0141-331

<210> 325

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 325

Figure 107127162-A0202-12-0142-332

<210> 326

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 326

Figure 107127162-A0202-12-0142-333

<210> 327

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 327

Figure 107127162-A0202-12-0142-334

<210> 328

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 328

Figure 107127162-A0202-12-0142-335

<210> 329

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 329

Figure 107127162-A0202-12-0143-336

<210> 330

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 330

Figure 107127162-A0202-12-0143-337

<210> 331

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 331

Figure 107127162-A0202-12-0143-338

<210> 332

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 332

Figure 107127162-A0202-12-0143-339

<210> 333

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 333

Figure 107127162-A0202-12-0144-340

<210> 334

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 334

Figure 107127162-A0202-12-0144-341

<210> 335

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 335

Figure 107127162-A0202-12-0144-342

<210> 336

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 336

Figure 107127162-A0202-12-0145-343

<210> 337

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 337

Figure 107127162-A0202-12-0145-344

<210> 338

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 338

Figure 107127162-A0202-12-0145-345

<210> 339

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 339

Figure 107127162-A0202-12-0145-346

<210> 340

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 340

Figure 107127162-A0202-12-0146-347

<210> 341

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 341

Figure 107127162-A0202-12-0146-348

<210> 342

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 342

Figure 107127162-A0202-12-0146-349

<210> 343

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 343

Figure 107127162-A0202-12-0146-350

<210> 344

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 344

Figure 107127162-A0202-12-0147-351

<210> 345

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 345

Figure 107127162-A0202-12-0147-352

<210> 346

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 346

Figure 107127162-A0202-12-0147-353

<210> 347
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 347
Figure 107127162-A0202-12-0148-354

<210> 348
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 348
Figure 107127162-A0202-12-0148-355

<210> 349
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 349
Figure 107127162-A0202-12-0148-356

<210> 350
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 350
Figure 107127162-A0202-12-0148-357

<210> 351
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 351
Figure 107127162-A0202-12-0149-358

<210> 352
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 352
Figure 107127162-A0202-12-0149-359

<210> 353
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 353
Figure 107127162-A0202-12-0149-360

<210> 354
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 354
Figure 107127162-A0202-12-0149-362

<210> 355
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 355
Figure 107127162-A0202-12-0150-363

<210> 356
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 356
Figure 107127162-A0202-12-0150-364

<210> 357
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 357
Figure 107127162-A0202-12-0150-365

<210> 358
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 358
Figure 107127162-A0202-12-0151-366

<210> 359
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 359
Figure 107127162-A0202-12-0151-367

<210> 360
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 360
Figure 107127162-A0202-12-0151-368

<210> 361
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 361
Figure 107127162-A0202-12-0151-369

<210> 362
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 362
Figure 107127162-A0202-12-0152-370

<210> 363
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 363
Figure 107127162-A0202-12-0152-371

<210> 364
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 364
Figure 107127162-A0202-12-0152-372

<210> 365
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 365
Figure 107127162-A0202-12-0152-373

<210> 366
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 366
Figure 107127162-A0202-12-0153-374

<210> 367
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 367
Figure 107127162-A0202-12-0153-375

<210> 368
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 368
Figure 107127162-A0202-12-0153-376

<210> 369
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 369
Figure 107127162-A0202-12-0154-377

<210> 370
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 370
Figure 107127162-A0202-12-0154-378

<210> 371
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 371
Figure 107127162-A0202-12-0154-379

<210> 372
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 372
Figure 107127162-A0202-12-0154-380

<210> 373
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 373
Figure 107127162-A0202-12-0155-381

<210> 374
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 374
Figure 107127162-A0202-12-0155-382

<210> 375
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 375
Figure 107127162-A0202-12-0155-383

<210> 376
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 376
Figure 107127162-A0202-12-0155-384

<210> 377
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 377
Figure 107127162-A0202-12-0156-385

<210> 378
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 378
Figure 107127162-A0202-12-0156-386

<210> 379
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 379
Figure 107127162-A0202-12-0156-387

<210> 380
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 380
Figure 107127162-A0202-12-0157-388

<210> 381
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 381
Figure 107127162-A0202-12-0157-389

<210> 382
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 382
Figure 107127162-A0202-12-0157-390

<210> 383
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 383
Figure 107127162-A0202-12-0157-391

<210> 384
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 384
Figure 107127162-A0202-12-0158-392

<210> 385
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 385
Figure 107127162-A0202-12-0158-393

<210> 386
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 386
Figure 107127162-A0202-12-0158-394

<210> 387
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 387
Figure 107127162-A0202-12-0158-395

<210> 388
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 388
Figure 107127162-A0202-12-0159-396

<210> 389
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 389
Figure 107127162-A0202-12-0159-397

<210> 390
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 390
Figure 107127162-A0202-12-0159-398

<210> 391
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 391
Figure 107127162-A0202-12-0160-399

<210> 392
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 392
Figure 107127162-A0202-12-0160-400

<210> 393
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 393
Figure 107127162-A0202-12-0160-401

<210> 394
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 394
Figure 107127162-A0202-12-0160-402

<210> 395
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 395
Figure 107127162-A0202-12-0161-403

<210> 396
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 396
Figure 107127162-A0202-12-0161-404

<210> 397
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 397
Figure 107127162-A0202-12-0161-405

<210> 398
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 398
Figure 107127162-A0202-12-0161-406

<210> 399
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 399
Figure 107127162-A0202-12-0162-407

<210> 400
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 400
Figure 107127162-A0202-12-0162-408

<210> 401
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 401
Figure 107127162-A0202-12-0162-409

<210> 402
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 402
Figure 107127162-A0202-12-0163-410

<210> 403
<211> 87
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 403
Figure 107127162-A0202-12-0163-411

<210> 404
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 404
Figure 107127162-A0202-12-0163-412

<210> 405
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 405
Figure 107127162-A0202-12-0163-413

<210> 406
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 406
Figure 107127162-A0202-12-0164-414

<210> 407
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 407
Figure 107127162-A0202-12-0164-415

<210> 408
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 408
Figure 107127162-A0202-12-0164-416

<210> 409
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 409
Figure 107127162-A0202-12-0164-417

<210> 410
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 410
Figure 107127162-A0202-12-0165-418

<210> 411
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 411
Figure 107127162-A0202-12-0165-419

<210> 412
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 412
Figure 107127162-A0202-12-0165-420

<210> 413
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 413
Figure 107127162-A0202-12-0165-421

<210> 414
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 414
Figure 107127162-A0202-12-0166-422

<210> 415
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic construct
<400> 415
Figure 107127162-A0202-12-0166-423

Claims (58)

A method for storing input data on nucleic acids, characterized in that it comprises: a) converting the input data into a set of nucleotide sequences, wherein the converting comprises i) a data processing step comprising converting the input data into a binary string, and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids comprising the set of nucleotide sequences.

A computer-implemented method for converting input data into a set of nucleotide sequences, characterized in that it comprises: i) a data processing step comprising converting the input data into a binary string; and ii) a nucleotide encoding step comprising converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences.

The method of claim 1 or 2, wherein the data processing step comprises dividing the binary string into a sequence of non-overlapping 5-bit binary strings.

The method of claim 3, wherein the nucleotide encoding step comprises converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string.
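As an illustration only, the conversion recited above (input data to a binary string, then non-overlapping 5-bit strings, then integers from 0 to 31) might be sketched as follows. The function names are hypothetical, and the right-padding with zero bits is an assumption, since the claims do not say how a binary string whose length is not a multiple of 5 is handled. The sketch stops at the integer string because the 5-bit transcoding framework (Table 2) is not reproduced in this section.

```python
from typing import List


def bytes_to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_integers(bits: str, width: int = 5) -> List[int]:
    """Split a bit string into non-overlapping `width`-bit chunks and
    read each chunk as an integer in the range 0 .. 2**width - 1."""
    # Assumption (not from the claims): right-pad with zero bits so that
    # the length becomes a multiple of `width`.
    padding = (-len(bits)) % width
    bits = bits + "0" * padding
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


if __name__ == "__main__":
    # b"Hi" is 16 bits; it is padded to 20 bits and read as four integers.
    print(bits_to_integers(bytes_to_bits(b"Hi")))
```

Each resulting integer would then be mapped to nucleotides by the transcoding framework, which this sketch deliberately leaves out.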
The method of claim 4, wherein the nucleotide encoding step further comprises converting the integer string with the 5-bit transcoding framework to obtain the set of nucleotide sequences.

The method of claim 4, wherein the nucleotide encoding step further comprises dividing the integer string into a plurality of initial integer subsequences having a predetermined length.

The method of claim 6, wherein the length of each of the plurality of initial integer subsequences is determined based on the oligomer length of the selected synthesis platform, the required error tolerance, the size of the input data, the selected error-correcting code, or a combination thereof.

The method of claim 6 or 7, wherein the nucleotide encoding step further comprises adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

The method of claim 8, wherein the index information added to each of the plurality of initial integer subsequences comprises an integer sequence, wherein the length of the integer sequence is based on the size of the input data.
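The splitting and indexing steps recited above could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the function name `split_and_index`, placing the index at the front of each subsequence, writing the index as base-32 digits, and zero-padding the last subsequence. The claims only state that the subsequences have a predetermined length and that the index is an integer sequence whose length depends on the size of the input data.

```python
from typing import List


def split_and_index(integers: List[int], length: int) -> List[List[int]]:
    """Split an integer string into fixed-length subsequences and prepend
    a positional index to each one (hypothetical layout)."""
    if not integers:
        return []
    chunks = [integers[i:i + length] for i in range(0, len(integers), length)]
    # Assumption: pad the final chunk with zeros up to the fixed length.
    chunks[-1] = chunks[-1] + [0] * (length - len(chunks[-1]))

    # The index width grows with the number of chunks, i.e. with input size.
    index_width = 1
    while 32 ** index_width < len(chunks):
        index_width += 1

    indexed = []
    for position, chunk in enumerate(chunks):
        digits = []
        remaining = position
        for _ in range(index_width):
            digits.append(remaining % 32)  # least significant digit first
            remaining //= 32
        indexed.append(digits[::-1] + chunk)  # most significant digit first
    return indexed


if __name__ == "__main__":
    for subsequence in split_and_index([9, 1, 20, 16, 3], length=2):
        print(subsequence)
```

Writing the index in the same 0 to 31 alphabet as the payload means that the index could pass through the same transcoding step as the data, which is one plausible reading of the claims rather than a confirmed design.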
The method of claim 8 or 9, wherein the nucleotide encoding step comprises, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy.

The method of claim 10, wherein adding redundant data to the plurality of indexed integer subsequences comprises: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error-correction coding; and obtaining the plurality of integer subsequences with redundancy based on the filled matrix.

The method of claim 11, wherein the number of columns of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error-correcting code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.
The method of claim 11 or 12, wherein the number of rows of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error-correcting code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

The method of any one of claims 11-13, wherein the error-correction coding is Reed-Solomon ("RS") coding.

The method of claim 14, wherein the data generated by applying error-correction coding are generated by applying string correction of the RS coding and/or block correction of the RS coding.

The method of any one of claims 1-15, wherein the 5-bit transcoding framework is based on Table 2.

The method of claim 16, wherein R and Y are selected based on: 1) being different from the nucleotide immediately preceding R or Y; and/or 2) the estimated GC content of the nucleotide sequence.

The method of any one of claims 1-17, wherein the input data correspond to a compressed file.

The method of any one of claims 1-18, wherein the input data correspond to two or more files.
The method of any one of claims 1-17 or 19, wherein the input data correspond to a text file.

The method of any one of claims 1-20, wherein the data processing step further comprises compressing the input data to obtain a compressed file and converting the compressed file into a binary string.

The method of claim 18 or 21, wherein the compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

The method of claim 19, wherein the data processing step further comprises grouping the two or more files into a TAR file.

The method of claim 23, wherein the TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

The method of any one of claims 1-24, wherein the nucleotide encoding step further comprises appending a pair of primer sequences to the 5' and 3' ends of each nucleotide sequence of the set of nucleotide sequences.

The method of claim 1, further comprising attaching a primer pair to the synthesized set of nucleic acids.
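For readers who want a concrete picture of the TAR and LZMA steps recited above, here is a minimal sketch using only Python's standard `tarfile` and `lzma` modules. The helper names `pack_files` and `unpack_files` are hypothetical, and the sketch uses the `lzma` module's default container format; the claims name only the algorithms, not a file format or any particular software.

```python
import io
import lzma
import tarfile
from typing import Dict


def pack_files(files: Dict[str, bytes]) -> bytes:
    """Group several in-memory files into one TAR archive, then compress
    the archive with LZMA. Returns the compressed bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return lzma.compress(buffer.getvalue())


def unpack_files(blob: bytes) -> Dict[str, bytes]:
    """Reverse of pack_files: decompress, then extract every regular file."""
    buffer = io.BytesIO(lzma.decompress(blob))
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    packed = pack_files({"a.txt": b"hello", "b.txt": b"world"})
    print(len(packed), sorted(unpack_files(packed)))
```

The compressed bytes returned by `pack_files` would be the input that is subsequently converted into a binary string.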
A method for storing two or more sets of input data on nucleic acids, characterized in that it comprises: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively, by the method of any one of claims 2-19; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, respectively, wherein the primer pairs used for the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing two or more sets of nucleic acids comprising the two or more corresponding sets of nucleotide sequences, respectively.

The method of claim 27, wherein each primer pair has a sequence different from any of the two or more corresponding sets of nucleotide sequences or their complementary sequences.

The method of any one of claims 1 or 3-28, wherein the GC content of the synthesized set of nucleic acids ranges from 30% to 70%.

The method of any one of claims 1 or 3-29, wherein the GC content of the synthesized set of nucleic acids is less than about 70%.

The method of claim 1, further comprising storing the synthesized set of nucleic acids.
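The GC-content limits recited above can be checked with a few lines of code. The sketch below computes the fraction of G and C bases in a single sequence and tests it against the 30% to 70% window; applying the check per sequence, rather than to the synthesized set as a whole, is an assumption, and the function names are hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in `sequence` that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def within_gc_range(sequence: str, low: float = 0.30, high: float = 0.70) -> bool:
    """True when the GC fraction lies inside the closed interval [low, high]."""
    return low <= gc_content(sequence) <= high


if __name__ == "__main__":
    for candidate in ("ATGCATGCAT", "GGGGCCCCAT", "ATATATATGC"):
        print(candidate, gc_content(candidate), within_gc_range(candidate))
```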
The method of claim 31, wherein the synthesized set of nucleic acids is stored by drying.

The method of claim 32, wherein the synthesized set of nucleic acids is stored by lyophilization.

The method of claim 31, wherein the synthesized set of nucleic acids is immobilized on a carrier.

The method of claim 34, wherein the carrier is a microarray.

A method for retrieving output data stored on nucleic acids, characterized in that it comprises: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the converting comprises: i) a nucleic acid decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data; thereby obtaining the output data.

The method of claim 36, wherein the method comprises amplifying the set of nucleic acids before retrieving the output data.

The method of any one of claims 36-37, further comprising sequencing the set of nucleic acids to generate a plurality of sequence reads.
The method of claim 38, wherein the plurality of sequence reads are paired, merged, and filtered to obtain the set of nucleotide sequences.

A computer-implemented method for converting a set of nucleotide sequences into output data, characterized in that it comprises: i) a nucleotide decoding step comprising converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step comprising converting the binary string into the output data.

The method of any one of claims 36-40, wherein the nucleotide decoding step comprises converting the set of nucleotide sequences into a plurality of integer subsequences comprising integers in the range of 0 to 31.

The method of claim 41, wherein the nucleotide decoding step further comprises applying error-correction coding to the plurality of integer subsequences to obtain the plurality of indexed integer subsequences.

The method of claim 42, wherein the step of applying error-correction coding comprises: i) applying RS-coding string correction to the plurality of integer subsequences to obtain a plurality of consistent integer subsequences; and ii) applying RS-coding block correction to the plurality of consistent integer subsequences to obtain the plurality of indexed integer subsequences.
The method of claim 42 or 43, wherein the nucleotide decoding step further comprises removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

The method of claim 44, wherein the nucleotide decoding step further comprises merging the core integer subsequences into an integer string.

The method of claim 45, wherein the nucleotide decoding step further comprises converting the integer string into a binary string.

The method of claim 46, wherein the output data are stored in a compressed file.

The method of claim 47, wherein the data processing step further comprises decompressing the compressed file.

The method of claim 48, wherein the decompression is performed with the LZMA algorithm.

The method of claim 46, wherein the output data correspond to a plurality of files.

The method of claim 50, further comprising extracting the plurality of files from the output data using the TAR algorithm.

The method of any one of claims 36-51, wherein the 5-bit transcoding framework is based on Table 2.
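The final decoding steps recited above (removing the index, merging the core subsequences into one integer string, and converting that string back to binary) might be sketched as follows. This is a sketch under stated assumptions, not the applicant's implementation: the index is assumed to sit at the front of each subsequence as base-32 digits that give its position, and trailing bits that do not fill a whole byte are assumed to be padding and are dropped. Error correction and the nucleotide-to-integer mapping of Table 2 are outside its scope, and all function names are hypothetical.

```python
from typing import List


def strip_index_and_merge(indexed: List[List[int]], index_width: int) -> List[int]:
    """Order subsequences by their leading index digits, remove the index,
    and concatenate the remaining core integers."""
    def index_value(subsequence: List[int]) -> int:
        value = 0
        for digit in subsequence[:index_width]:
            value = value * 32 + digit  # base-32, most significant first
        return value

    merged: List[int] = []
    for subsequence in sorted(indexed, key=index_value):
        merged.extend(subsequence[index_width:])
    return merged


def integers_to_bits(integers: List[int]) -> str:
    """Write each integer in 0..31 as exactly five binary digits."""
    return "".join(f"{number:05b}" for number in integers)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string into bytes, ignoring any incomplete final byte."""
    usable = len(bits) - len(bits) % 8  # assumption: leftover bits are padding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


if __name__ == "__main__":
    # Subsequences arrive out of order, as sequencing reads typically would.
    reads = [[1, 20, 16], [0, 9, 1]]
    integers = strip_index_and_merge(reads, index_width=1)
    print(bits_to_bytes(integers_to_bits(integers)))
```

A production scheme would need to record the original data length, because padding added at the subsequence level can otherwise survive as extra zero bytes.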
The method according to any one of claims 36-53, wherein the set of nucleic acids comprises primer sequences at the 3' and 5' ends, and the method comprises removing the primer sequences before the nucleotide decoding step.
A method for retrieving output data stored on a set of nucleic acids of interest, characterized in that the set of nucleic acids of interest is one of a plurality of sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different set of primer pairs at the 3' and 5' ends, the method comprising: a) amplifying the set of nucleic acids using a primer pair corresponding to the nucleic acids of interest; b) obtaining a set of nucleotide sequences of the amplified nucleic acids; and c) converting the set of nucleotide sequences into the output data by the method according to any one of claims 40-52; thereby obtaining the output data.
A method for retrieving two or more corresponding sets of output data stored on two or more sets of nucleic acids of interest, characterized in that the two or more sets of nucleic acids of interest are among a plurality of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different set of primer pairs at the 3' and 5' ends, the method comprising: a) amplifying the two or more sets of nucleic acids of interest using primer pairs corresponding to the two or more sets of nucleic acids of interest; b) obtaining two or more sets of nucleotide sequences of the amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into the two or more sets of output data, respectively, by the method according to any one of claims 40-52; thereby obtaining the two or more sets of output data.
A non-transitory computer-readable storage medium storing one or more programs, characterized in that the one or more programs comprise instructions which, when executed by one or more processors of an electronic device, cause the electronic device to perform the method according to any one of claims 2-36 or 40-52.
A system for providing nucleic acid-based data storage or for retrieving data from nucleic acids, characterized by comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method according to any one of claims 2-36 or 40-52.
An electronic device for providing nucleic acid-based data storage or for retrieving data from nucleic acids, characterized in that the device comprises means for performing the method according to any one of claims 2-36 or 40-52.
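The primer handling recited in the claims above (removing primer sequences before decoding, and distinguishing data sets in a mixture by their primer pairs) has a simple computational counterpart once sequence reads are available. The Python sketch below keeps only reads flanked by a given primer pair and strips those primers. It assumes exact, error-free primer matches and reads already oriented on the same strand, with the 3' primer site given as it appears in the read; the function names and the primer sequences in the usage note are hypothetical. The amplification itself is a laboratory step and is not modeled.

```python
from typing import Iterable, List, Optional


def trim_primers(read: str, forward: str, reverse: str) -> Optional[str]:
    """Return the payload between the two primer sites, or None if the read lacks them."""
    long_enough = len(read) >= len(forward) + len(reverse)  # primer sites must not overlap
    if long_enough and read.startswith(forward) and read.endswith(reverse):
        return read[len(forward):len(read) - len(reverse)]
    return None


def select_payloads(reads: Iterable[str], forward: str, reverse: str) -> List[str]:
    """Keep only reads carrying the requested primer pair, with primers removed."""
    payloads = []
    for read in reads:
        payload = trim_primers(read, forward, reverse)
        if payload is not None:
            payloads.append(payload)
    return payloads
```

With the made-up primers `ACGT` and `TTGA`, calling `select_payloads(["ACGTGGCCTTGA", "CCCCGGCCAAAA"], "ACGT", "TTGA")` returns `["GGCC"]`: the second read carries a different primer pair and is skipped. Real reads contain sequencing errors, so a practical version would likely need approximate matching.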
TW107127162A 2018-08-03 2018-08-03 Nucleic acid method for data storage, and non-transitory computer-readable storage medium, system, and electronic device TWI770247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107127162A TWI770247B (en) 2018-08-03 2018-08-03 Nucleic acid method for data storage, and non-transitory computer-readable storage medium, system, and electronic device

Publications (2)

Publication Number Publication Date
TW202008302A TW202008302A (en) 2020-02-16
TWI770247B TWI770247B (en) 2022-07-11

Family

ID=70413093

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107127162A TWI770247B (en) 2018-08-03 2018-08-03 Nucleic acid method for data storage, and non-transitory computer-readable storage medium, system, and electronic device

Country Status (1)

Country Link
TW (1) TWI770247B (en)
