Disclosure of Invention
The present application aims to provide a novel library, a reagent, and uses thereof for evaluating the quality of next-generation sequencing.
One aspect of the present application discloses a library for quality evaluation of next-generation sequencing. The library is a single-stranded DNA library of known sequences with different base characteristics, to which a linker sequence and an index sequence are ligated. The single-stranded DNA library of known sequences with different base characteristics comprises at least one of: a high-AT-content single-stranded DNA, a high-GC-content single-stranded DNA, a poly-structure single-stranded DNA, and a hairpin-structure single-stranded DNA.
It should be noted that, when unknown sequences are actually sequenced, various base characteristics may affect sequencing accuracy, such as high AT content, high GC content, poly structures and hairpin structures. The present application synthesizes single-stranded DNA libraries whose known sequences carry these different base characteristics. By comparing the sequencing result with the known sequence, the sequencing deviation of the sequencing platform used can be determined and the quality of second-generation sequencing can be evaluated. The deviation can then be corrected in a targeted manner, thereby improving sequencing accuracy.
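For illustration, the comparison step can be sketched in Python as below. This is a minimal sketch, not the application's own software: it assumes each read is already aligned to position 0 of the known reference and differs from it by substitutions only (no insertions or deletions), and the function name `substitution_profile` is hypothetical. It tallies mismatches per reference position and per (expected, called) base pair, which is the kind of deviation profile described above.

```python
from collections import Counter


def substitution_profile(reads, reference):
    """Tally base-call deviations from a known reference sequence.

    Assumes every read is aligned to position 0 of `reference` and differs
    from it by substitutions only (no insertions or deletions).
    Returns per-position mismatch counts, counts per (expected, called)
    pair, and the overall mismatch rate.
    """
    per_position = [0] * len(reference)
    per_pair = Counter()
    total_bases = 0
    for read in reads:
        # zip() stops at the shorter sequence, so overhanging bases are ignored.
        for pos, (expected, called) in enumerate(zip(reference, read)):
            total_bases += 1
            if called != expected:
                per_position[pos] += 1
                per_pair[(expected, called)] += 1
    rate = sum(per_position) / total_bases if total_bases else 0.0
    return per_position, per_pair, rate


# Toy reads against the 5' universal primer binding sequence; the second
# read carries one hypothetical C->A substitution at position 8 (0-based).
reference = "GATATCTGCAGGCAT"
reads = ["GATATCTGCAGGCAT", "GATATCTGAAGGCAT", "GATATCTGCAGGCAT"]
per_position, per_pair, rate = substitution_profile(reads, reference)
print(per_position[8], per_pair[("C", "A")], round(rate, 4))  # 1 1 0.0222
```

Grouping the same counts by base characteristic (for example, positions inside a high-GC stretch versus a poly structure) would give the per-characteristic deviation statistics the method relies on.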
It is understood that, in addition to evaluating second-generation sequencing quality, the library of the present application can, as mentioned above, be used to further correct and optimize second-generation sequencing so as to improve sequencing accuracy or sequencing quality.
It should be noted that, for convenience of use, and to shorten the library construction process and reduce the base errors it introduces, the linker sequence and the index sequence are preferably ligated to the library in advance; that is, they are synthesized together with the library sequence. This avoids a separate reaction step for adding linker and index sequences to the library. The specific linker and index sequences can be taken from an existing sequencing platform and are not limited herein.
Preferably, the library also has universal primer binding sequences at both ends.
It should be noted that the universal primer binding sequences allow all libraries of different sequences to be amplified with the same pair of primers. For example, because the six libraries of the present application share the same universal primer binding sequences, a single primer pair is sufficient to amplify all six, rather than one primer pair per library.
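A quick check of this shared-ends property is sketched below, assuming a library can use the common primer pair when it begins with the 5' universal primer binding sequence and ends with the 3' one (both given later in this example). The helper `has_universal_ends` is hypothetical; the two library sequences are SEQ ID NO.7 and SEQ ID NO.9, copied from the sequence listing of this document in its 10-nt groups.

```python
FIVE_PRIME_BINDING = "GATATCTGCAGGCAT"
THREE_PRIME_BINDING = "TCAGCCTGTGATATC"


def from_listing(text):
    """Join the whitespace-separated 10-nt groups of a sequence listing."""
    return "".join(text.split()).upper()


LIBRARIES = {
    "SEQ ID NO.7": from_listing("""
        gatatctgca ggcatagaat gaatattatt gaatcaataa ttaaagtcgg aggccaagcg
        gtcttaggaa gacaaactag tacgtcaact ccttggctca cagaacgaca tggctacgat
        ccgactttac aactacagat aatgggctgg atacatggaa tgattataga tatattaagg
        aataatgtta attaatgcct aaattaatta atctaagggg gttaatactt cagcctgtga
        tatc"""),
    "SEQ ID NO.9": from_listing("""
        gatatctgca ggcatcggcc gcggcgtcca gtgcgcggcg ctagagccgg caagtcggag
        gccaagcggt cttaggaaga caacgctatg taccaactcc ttggctcaca gaacgacatg
        gctacgatcc gacttccgcc gcggtcgctt gtccggccgc cggtccggcg ccggcggcgc
        aaagtgccag gccgagccgg cgaaccagcg gtccgaaaaa cacggacact cagcctgtga
        tatc"""),
}


def has_universal_ends(seq):
    """True if seq starts with the 5' and ends with the 3' universal binding sequence."""
    seq = seq.upper()
    return seq.startswith(FIVE_PRIME_BINDING) and seq.endswith(THREE_PRIME_BINDING)


one_primer_pair_suffices = all(has_universal_ends(s) for s in LIBRARIES.values())
print(one_primer_pair_suffices)  # True
```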
Preferably, the library of the present application consists of at least one of the sequences shown in SEQ ID NO.7, SEQ ID NO.8, SEQ ID NO.9, SEQ ID NO.10, SEQ ID NO.11 and SEQ ID NO.12.
It should be noted that the libraries shown in SEQ ID NOs. 7 to 12 are merely six libraries verified, in one implementation of the present application, to be effective for evaluating and optimizing second-generation sequencing quality; following the guidance of the present application, one skilled in the art can synthesize further libraries for quality evaluation or optimization of second-generation sequencing.
In yet another aspect of the present application, a cloning vector is disclosed, the cloning vector comprising a plasmid and an insert, wherein the insert comprises a library of the present application.
Preferably, the plasmid is pMD18-T or pMD19-T.
In a preferred embodiment of the present invention, the synthetic library sequence is inserted into a plasmid, so that the library needs to be synthesized only once and can thereafter be obtained without limit by replicating the plasmid.
In another aspect, the present invention discloses an engineered bacterium comprising a recipient bacterium and the cloning vector of the present invention, which is introduced into and maintained in the recipient bacterium.
Preferably, the recipient bacterium employed herein is E. coli.
It should be noted that, once the library is cloned into a plasmid, the single-stranded DNA library needs to be synthesized only once and can then be used indefinitely; no re-synthesis is needed, which reduces sequence synthesis cost. In subsequent use, the required library is obtained simply by culturing the engineered bacteria and extracting the plasmids. Because the library sequence already contains the sequencing adapter (linker) used by the corresponding sequencing platform, it can be sequenced after simple library construction. The whole process is simple and convenient, and its stability is high.
In yet another aspect, the present application discloses a reagent for quality evaluation of second generation sequencing, the reagent comprising the library of the present application, the cloning vector of the present application, or the engineered bacterium of the present application.
The library, the cloning vector and the engineered bacterium can each be used to evaluate the quality of second-generation sequencing, or to correct and optimize second-generation sequencing so as to improve sequencing accuracy or quality; therefore, any of them can be prepared into a kit for convenient use.
Preferably, the reagent of the present application further comprises universal primers, wherein the upstream primer is the sequence shown in SEQ ID NO.13 and the downstream primer is the sequence shown in SEQ ID NO.14.
It should be noted that the universal primers are designed against the universal primer binding sequences at both ends of the library, so that the library sequences can be obtained by amplifying the library or the cloning vector. For ease of use, the kit of the present application includes the universal primers as a separately packaged component.
It should also be noted that some cloning vectors, such as pMD18-T or pMD19-T, have their own plasmid amplification primers, or primers can be designed against the plasmid to amplify different inserts simultaneously. In that case, there is no need to add universal primer binding sequences to both ends of the library: the plasmid amplification primers can be used directly for library amplification or sequencing, and the separate universal primers shown in SEQ ID NO.13 and SEQ ID NO.14 are not needed. The specific manner in which this is done is not limited herein.
More preferably, the reagent of the present application further comprises a splint oligo having the sequence shown in SEQ ID NO.15.
It should be noted that the splint oligo is used to circularize the library DNA. In one implementation of the present application, sequencing is performed using DNA nanoball (DNB) technology, which requires a circularized library. It is understood that the splint oligo may be omitted if DNA nanoball technology is not used; this is not particularly limited herein.
The present application also discloses uses of the library, the cloning vector, the engineered bacterium or the reagent for: evaluating the relationship between bases and sequencing quality; evaluating the base preference and accuracy of an amplification enzyme; evaluating the accuracy of a sequencing enzyme; evaluating or improving base signal extraction; detecting the accuracy of second-generation sequencing; detecting the error rate of each step from library construction to sequencing; or optimizing each step from library construction to sequencing.
The library of the present application, together with the cloning vector, engineered bacterium and reagent based on it, can be used for quality evaluation of second-generation sequencing. The principle is to compare the sequencing result with the known library sequence and analyze the deviation between them; this deviation can be used to evaluate sequencing quality, to evaluate the accuracy of the amplification enzyme and the sequencing enzyme, or as a basis for optimization. It is understood that, on the same principle, the library, cloning vector, engineered bacterium, reagent, etc. of the present application can be used to evaluate, detect and optimize each step of the second-generation sequencing process, which is not limited herein.
The present application also discloses a method for improving the accuracy of nucleic acid sequencing, comprising: sequencing a single-stranded DNA library of known sequences with different base characteristics; comparing the sequencing result with the known sequences; statistically analyzing the sequencing deviations associated with the different base characteristics; and correcting the sequencing software algorithm according to those deviations, thereby improving the accuracy of nucleic acid sequencing. The single-stranded DNA library of known sequences with different base characteristics comprises at least one of: a high-AT-content single-stranded DNA, a high-GC-content single-stranded DNA, a poly-structure single-stranded DNA, and a hairpin-structure single-stranded DNA.
Preferably, the poly-structure single-stranded DNA comprises at least one of poly-A, poly-T, poly-G and poly-C structure single-stranded DNA.
Preferably, the single-stranded DNA library is the library of the present application.
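Poly structures of the kind listed above can be located in a known sequence with a short scan for single-base runs. The sketch below is illustrative only: the function name and the minimum run length of 5 are assumptions, not values from the application. The two test strings are short stretches copied from SEQ ID NO.3 and SEQ ID NO.1 of this document.

```python
import re

# One alternative per base, so "CCG" yields the runs "CC" and "G".
RUN_PATTERN = re.compile(r"A+|C+|G+|T+")


def homopolymer_runs(seq, min_len=5):
    """Return (base, 0-based start, length) for each single-base run >= min_len.

    min_len=5 is an illustrative threshold, not one taken from the application.
    """
    runs = []
    for match in RUN_PATTERN.finditer(seq.upper()):
        if len(match.group()) >= min_len:
            runs.append((match.group()[0], match.start(), len(match.group())))
    return runs


print(homopolymer_runs("ccgaaaaacacgg"))    # [('A', 3, 5)]  (from SEQ ID NO.3)
print(homopolymer_runs("ctaagggggttaata"))  # [('G', 4, 5)]  (from SEQ ID NO.1)
```

Run coordinates found this way could then be used to group sequencing deviations by poly-A, poly-T, poly-G or poly-C context.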
It should be noted that the method for improving the accuracy of nucleic acid sequencing is based on the library of the present application: following the principle of the present application, second-generation sequencing is quality-evaluated, and the results are used to optimize and improve sequencing accuracy. On the same principle, related methods can be provided on the basis of this method, including methods for evaluating nucleic acid sequencing quality; evaluating the relationship between bases and sequencing quality; evaluating the base preference and accuracy of an amplification enzyme; evaluating the accuracy of a sequencing enzyme; extracting, evaluating or improving base signals; detecting the accuracy of second-generation sequencing; detecting the error rate of each step from library construction to sequencing; and optimizing each step from library construction to sequencing. These are not particularly limited herein.
It should be noted that, besides improving the accuracy of nucleic acid sequencing, the method of the present application can be used to evaluate the base bias and accuracy of an amplification enzyme. For example, by comparing the sequencing results of the single-stranded DNA library before and after amplification with the enzyme, the enzyme's contribution to sequencing bias can be analyzed, which evaluates its accuracy; analyzing the specific types of bias then reveals the enzyme's base preference. The accuracy of a sequencing enzyme can be evaluated on a similar principle. Furthermore, the key step in improving sequencing accuracy is correcting the sequencing software algorithm after comparing the sequencing result with the known sequence; because this algorithm includes base signal extraction, the method can also be applied to improving or evaluating base signal extraction.
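The before/after comparison for an amplification enzyme can be sketched as a difference of per-type substitution rates. This is a minimal illustration under stated assumptions: substitution counts keyed by (expected, called) base are assumed to be available for the library sequenced without and with the amplification step, the function names are hypothetical, and the counts below are invented for illustration and are not data from this application.

```python
from collections import Counter


def substitution_rates(counts, total_bases):
    """Convert (expected, called) substitution counts into per-base rates."""
    return {pair: n / total_bases for pair, n in counts.items()}


def amplification_bias(before, before_bases, after, after_bases):
    """Per substitution type: rate after amplification minus rate before it."""
    r_before = substitution_rates(before, before_bases)
    r_after = substitution_rates(after, after_bases)
    pairs = set(r_before) | set(r_after)
    return {pair: r_after.get(pair, 0.0) - r_before.get(pair, 0.0) for pair in pairs}


# Hypothetical counts for illustration only; not data from this application.
before = Counter({("G", "A"): 10, ("C", "T"): 8})
after = Counter({("G", "A"): 40, ("C", "T"): 9, ("A", "G"): 2})
bias = amplification_bias(before, 100_000, after, 100_000)
worst = max(bias, key=bias.get)
print(worst, f"{bias[worst]:.2e}")  # ('G', 'A') 3.00e-04
```

The substitution type with the largest increase points to the enzyme's base preference; the summed increase over all types gives an overall accuracy figure for the amplification step.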
Due to the adoption of the technical scheme, the beneficial effects of the application are as follows:
The library is designed with various base characteristics in controllable structures. Sequencing sequences with known base characteristics makes it possible to evaluate the influence and deviation of different base characteristics on second-generation sequencing, thereby realizing quality evaluation of second-generation sequencing; the deviations can then be corrected in a targeted manner to optimize second-generation sequencing. The method for improving nucleic acid sequencing accuracy uses the base characteristics of the library and obtains the sequencing deviations for different base characteristics by comparing the sequencing result with the known library sequence; these deviations guide improvement of the sequencing software algorithm and thereby improve sequencing accuracy. The method can effectively reduce sequencing deviation and provides a simple and effective way to improve sequencing accuracy.
Detailed Description
Extensive experiments and research have shown that, in actual sequencing, the base composition complexity of the sequencing target is an important factor affecting the quality of next-generation sequencing. For example, for a sequence with a uniform AT and GC distribution and few poly and hairpin structures, both Illumina and Proton can reach 99.9% accuracy; however, for sequences with high AT content, high GC content, or many poly and hairpin structures, sequencing accuracy drops greatly and may fail to meet the requirement for accurate sequencing in precision medicine.
For this reason, the present application proposes and develops single-stranded DNA libraries of known sequences with different base characteristics, in which the sequences are specially designed to include various base characteristics such as high AT content, high GC content, poly structures and hairpin structures. In one implementation of the present application, these are the six single-stranded DNAs shown in SEQ ID NO.7 to SEQ ID NO.12. Using the libraries designed in the present application, known sequences with specific base characteristics are subjected to second-generation sequencing, and the deviation between the sequencing result and the known library sequence is analyzed. This reveals the accuracy or quality of second-generation sequencing under the various base characteristics; the deviations found can then be corrected to optimize second-generation sequencing.
Before the library of the present application is constructed, a set of standard nucleic acids containing the various base characteristics required by the library is designed, and some or all of these standard nucleic acid sequences are then selected for library construction. In one implementation of the present application, the standard nucleic acids consist of at least one of six single-stranded DNAs, whose sequences are shown in SEQ ID NO.1 to SEQ ID NO.6, respectively. The libraries shown in SEQ ID NO.7 to SEQ ID NO.12 correspond in order to the standard nucleic acids shown in SEQ ID NO.1 to SEQ ID NO.6.
The present application is described in further detail below with reference to specific examples. The following examples are intended to be illustrative of the present application only and should not be construed as limiting the present application.
Examples
In this example, a set of standard nucleic acid sequences containing base characteristics such as high AT content, high GC content, poly structures and hairpin structures is first designed. Library sequences are then designed from the standard nucleic acid sequences, with a BGISEQ linker sequence, an index sequence and universal primer binding sequences added. The designed library sequences are chemically synthesized and inserted into the pMD19-T plasmid, and the plasmid is introduced into Escherichia coli to prepare the engineered bacteria. Plasmids extracted from the engineered bacteria provide the library sequences used for next-generation sequencing and sequencing quality evaluation. The details are as follows:
First, design of standard nucleic acids
In this example, six standard nucleic acid sequences were designed based on base characteristics commonly encountered in actual sequencing, such as high AT content, high GC content, poly structures and hairpin structures, and a different index sequence was used for each standard nucleic acid sequence. Details are shown in Table 1.
TABLE 1 sequences of standard nucleic acids
The six standard nucleic acid sequences of this example include two high-GC sequences, two high-AT sequences and two random sequences; the random sequences are ordinary sequences with similar A, C, G and T content, included for comparative analysis. Each standard nucleic acid sequence corresponds to one index sequence (i.e., a barcode sequence) used to distinguish the sequences. The two high-GC sequences and the two high-AT sequences contain hairpin structures and poly structures.
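The high-GC / high-AT / random grouping can be checked from base composition. The sketch below computes the GC fraction and applies illustrative cut-offs; the 60% and 40% thresholds and the function names are assumptions, not criteria stated in the application. The inputs are the first 60 nt of SEQ ID NO.1 and SEQ ID NO.3 as printed in the sequence listing.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def classify_composition(seq, high_gc=0.60, low_gc=0.40):
    """Label a sequence by GC fraction; the 60%/40% cut-offs are illustrative assumptions."""
    gc = gc_fraction(seq)
    if gc >= high_gc:
        return "high GC"
    if gc <= low_gc:
        return "high AT"
    return "balanced"


# First 60 nt of SEQ ID NO.1 and SEQ ID NO.3 from the sequence listing.
seq1_head = "tacaactacagataatgggctggatacatggaatgattatagatatattaaggaataatg"
seq3_head = "ccgccgcggtcgcttgtccggccgccggtccggcgccggcggcgcaaagtgccaggccga"
for name, seq in [("SEQ ID NO.1", seq1_head), ("SEQ ID NO.3", seq3_head)]:
    print(name, f"{gc_fraction(seq):.1%}", classify_composition(seq))
```

In practice a sliding window over the whole sequence would show local GC extremes, which matter more for sequencing quality than a single global value.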
Second, library sequence design and construction
A large portion of each of the six standard nucleic acid sequences designed in this example was used to construct a library: a linker sequence suitable for BGISEQ was inserted into each sequence, and the same universal primer binding sequences were added to both ends of all six. The library sequences designed for the six single-stranded DNA standard nucleic acids shown in SEQ ID NO.1 to SEQ ID NO.6 are, in order, the sequences shown in SEQ ID NO.7 to SEQ ID NO.12.
SEQ ID NO.7:
5’-GATATCTGCAGGCATAGAATGAATATTATTGAATCAATAATTAAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAAACTAGTACGTCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTTACAACTACAGATAATGGGCTGGATACATGGAATGATTATAGATATATTAAGGAATAATGTTAATTAATGCCTAAATTAATTAATCTAAGGGGGTTAATACTTCAGCCTGTGATATC-3’;
SEQ ID NO.8:
5’-GATATCTGCAGGCATGAATAATAATGGAATAGCAATAATTAAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAACGATCAGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTATATAATGTAATACATAATATTAATATATTAATTATTGTATGATTGTTATCTATTACAGTCTAGTACTGACCCGTAGACATATATGCCCCCGATTAATTACTTATCAGCCTGTGATATC-3’;
SEQ ID NO.9:
5’-GATATCTGCAGGCATCGGCCGCGGCGTCCAGTGCGCGGCGCTAGAGCCGGCAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAACGCTATGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTCCGCCGCGGTCGCTTGTCCGGCCGCCGGTCCGGCGCCGGCGGCGCAAAGTGCCAGGCCGAGCCGGCGAACCAGCGGTCCGAAAAACACGGACACTCAGCCTGTGATATC-3’;
SEQ ID NO.10:
5’-GATATCTGCAGGCATCACCGCCGAGGCCGCGGCGGAGACCGCCGGCGCAGGAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAACAGAGTGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTCAAACTACCGGCGCGGCGCTCCTCCGGCCGTCCGCCGCCGACCGGCGGCGGCGTTCCGGTGTGGCACTCCAGGTGGCCGGTTCTCTGCCAAGCGTCAGCCTGTGATATC-3’;
SEQ ID NO.11:
5’-GATATCTGCAGGCATGAAGAACAACCCCGCACGACGCCTACCAACCAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAACTGTATCGTACAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTGCTGTTCGCGGCCGATGTTCGTATAAGATATAAGTTTGGGTATATTCCAGTTTATCGATCGTATCGAAATGTATGAGTTTATACAGGTCCTACTTCAACTCAGCCTGTGATATC-3’;
SEQ ID NO.12:
5’-GATATCTGCAGGCATACTAGACCAGTTCATTATTATAGTGCTAGCCAAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAAACATCAACGTCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTGACGGATTCCCTCGCTTTCTATTGGCTGACAGTACAAGTAACATAGGTTGGGTCGGTTAACCCTGCCGTCACAAGTGGAACGATGTTAATAGTTGCGGTCAGCCTGTGATATC-3’;
In the six library sequences above, "GATATCTGCAGGCAT" is the universal primer binding sequence at the 5' end and "TCAGCCTGTGATATC" is the universal primer binding sequence at the 3' end; the universal primers are designed against these two sequences. "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNNCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT" is the linker sequence containing the index sequence, where "NNNNNNNNNN" is the 10 bp index sequence. The index sequences of the libraries shown in SEQ ID NO.7 to SEQ ID NO.12 are, respectively, "ACTAGTACGT", "CGATCAGTAC", "CGCTATGTAC", "CAGAGTGTAC", "CTGTATCGTA" and "ACATCAACGT".
Of the universal primers, the upstream primer is the sequence shown in SEQ ID NO.13 and the downstream primer is the sequence shown in SEQ ID NO.14:
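The layout just described (5' binding sequence, insert, linker with a 10 bp index, insert, 3' binding sequence) can be parsed programmatically. The sketch below uses the linker flanks exactly as given above; the function `split_library` and its error messages are hypothetical. It is applied to SEQ ID NO.7, written out from the sequence listing, and compares the two insert segments with SEQ ID NO.1.

```python
FIVE_PRIME_BINDING = "GATATCTGCAGGCAT"
THREE_PRIME_BINDING = "TCAGCCTGTGATATC"
LINKER_LEFT = "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"
LINKER_RIGHT = "CAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT"
INDEX_LEN = 10


def from_listing(text):
    """Join the whitespace-separated 10-nt groups of a sequence listing."""
    return "".join(text.split()).upper()


def split_library(seq):
    """Split a library into insert segment A, the 10 bp index and insert segment B."""
    seq = seq.upper()
    if not (seq.startswith(FIVE_PRIME_BINDING) and seq.endswith(THREE_PRIME_BINDING)):
        raise ValueError("universal primer binding sequences not found at the ends")
    core = seq[len(FIVE_PRIME_BINDING):len(seq) - len(THREE_PRIME_BINDING)]
    left = core.find(LINKER_LEFT)
    if left < 0:
        raise ValueError("linker left flank not found")
    index_start = left + len(LINKER_LEFT)
    right_start = index_start + INDEX_LEN
    if core[right_start:right_start + len(LINKER_RIGHT)] != LINKER_RIGHT:
        raise ValueError("linker right flank does not follow a 10 bp index")
    return {
        "insert_a": core[:left],
        "index": core[index_start:right_start],
        "insert_b": core[right_start + len(LINKER_RIGHT):],
    }


SEQ_ID_NO_7 = from_listing("""
    gatatctgca ggcatagaat gaatattatt gaatcaataa ttaaagtcgg aggccaagcg
    gtcttaggaa gacaaactag tacgtcaact ccttggctca cagaacgaca tggctacgat
    ccgactttac aactacagat aatgggctgg atacatggaa tgattataga tatattaagg
    aataatgtta attaatgcct aaattaatta atctaagggg gttaatactt cagcctgtga
    tatc""")
SEQ_ID_NO_1 = from_listing("""
    tacaactaca gataatgggc tggatacatg gaatgattat agatatatta aggaataatg
    ttaattaatg cctaaattaa ttaatctaag ggggttaata ctatgtgtta attaatctta
    ttagaatgaa tattattgaa tcaataatta""")

parts = split_library(SEQ_ID_NO_7)
print(parts["index"], len(parts["insert_a"]), len(parts["insert_b"]))  # ACTAGTACGT 28 102
print(parts["insert_a"] == SEQ_ID_NO_1[-28:], parts["insert_b"] == SEQ_ID_NO_1[:102])  # True True
```

With the sequences as printed, the 28 nt segment before the linker matches the last 28 nt of SEQ ID NO.1 and the 102 nt segment after it matches the first 102 nt, so 130 of the 150 standard nucleotides are carried by this library.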
SEQ ID NO.13:5’-GATATCTGCAGGCAT-3’;
SEQ ID NO.14:5’-GATATCACAGGCTGA-3’.
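The relationship between the primers and the binding sequences can be verified directly: the upstream primer is identical to the 5' binding sequence, and the downstream primer is the reverse complement of the 3' binding sequence. A minimal sketch (the helper name `reverse_complement` is our own):

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing A/C/G/T only."""
    return seq.translate(COMPLEMENT)[::-1]


FIVE_PRIME_BINDING = "GATATCTGCAGGCAT"   # 5' universal primer binding sequence
THREE_PRIME_BINDING = "TCAGCCTGTGATATC"  # 3' universal primer binding sequence
UPSTREAM_PRIMER = "GATATCTGCAGGCAT"      # SEQ ID NO.13
DOWNSTREAM_PRIMER = "GATATCACAGGCTGA"    # SEQ ID NO.14

print(UPSTREAM_PRIMER == FIVE_PRIME_BINDING)                        # True
print(DOWNSTREAM_PRIMER == reverse_complement(THREE_PRIME_BINDING))  # True
```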
In this example, sequencing uses DNA nanoball (DNB) technology, which requires the library DNA to be circularized; therefore, a splint oligo having the sequence shown in SEQ ID NO.15 was designed:
SEQ ID NO.15:5’-ATGCCTGCAGATATCGATATCACAGGCTGA-3’.
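The splint sequence follows from the library ends: for the library strand written 5' to 3' in SEQ ID NO.7 to NO.12, joining its 3' end (the 3' binding sequence) to its own 5' end (the 5' binding sequence) creates a junction, and SEQ ID NO.15 is the reverse complement of that junction, so it can bridge both ends for ligation. The check below is a sketch; the helper and variable names are our own.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing A/C/G/T only."""
    return seq.translate(COMPLEMENT)[::-1]


FIVE_PRIME_BINDING = "GATATCTGCAGGCAT"
THREE_PRIME_BINDING = "TCAGCCTGTGATATC"
SPLINT = "ATGCCTGCAGATATCGATATCACAGGCTGA"  # SEQ ID NO.15

# Junction formed when the strand's 3' end is ligated to its own 5' end.
junction = THREE_PRIME_BINDING + FIVE_PRIME_BINDING
print(SPLINT == reverse_complement(junction))  # True
# First half pairs with the strand's 5' end, second half with its 3' end.
print(SPLINT[:15] == reverse_complement(FIVE_PRIME_BINDING),
      SPLINT[15:] == reverse_complement(THREE_PRIME_BINDING))  # True True
```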
The libraries shown in SEQ ID NO.7 to SEQ ID NO.12 of this example, as well as the universal primers and the splint oligo, were all commercially synthesized in Shanghai.
Third, construction of the cloning vector and engineered bacteria
The synthesized library sequences were cloned into a vector, and the cloning vector was introduced into E. coli. Construction of the cloning vector and the engineered bacteria was carried out by Nanjing Kinsley.
Fourth, library acquisition
The preserved engineered bacteria were cultured overnight in LB medium at 37 °C, and the plasmids were extracted with a Thermo Fisher plasmid extraction kit according to the kit instructions. The extracted plasmids were PCR-amplified with the universal primers, and the PCR products could be used directly for sequencing after circularization.
1. Plasmid extraction
Plasmid extraction in this example used a Thermo Fisher plasmid extraction kit; the extraction procedure follows the kit instructions and is not repeated here.
2. PCR amplification
The 100 μL PCR amplification system comprised:
5× Hi-Fi enzyme reaction solution, 20 μL;
dNTP mix (10 mM each), 5 μL;
Hi-Fi enzyme (1 U/μL), 1 μL;
upstream primer (20 μM), 6 μL;
downstream primer (20 μM), 6 μL;
extracted plasmid template, 1 μL;
ddH2O, 61 μL;
total, 100 μL.
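The recipe can be sanity-checked with the dilution relation C_final = C_stock × V_added / V_total. The sketch below is illustrative (the dictionary layout and function name are our own); the plasmid template is skipped because its concentration is not stated.

```python
from math import isclose

REACTION_VOLUME_UL = 100.0

# name: (volume added in uL, stock concentration, unit); values from the recipe above.
COMPONENTS = {
    "Hi-Fi reaction solution": (20.0, 5.0, "x"),
    "dNTPs (each)": (5.0, 10.0, "mM"),
    "Hi-Fi enzyme": (1.0, 1.0, "U/uL"),
    "upstream primer": (6.0, 20.0, "uM"),
    "downstream primer": (6.0, 20.0, "uM"),
    "plasmid template": (1.0, None, None),  # concentration not stated
    "ddH2O": (61.0, None, None),
}


def final_concentrations(components, total_ul):
    """Check that volumes add up, then return C_final = C_stock * V_added / V_total."""
    added = sum(volume for volume, _, _ in components.values())
    if not isclose(added, total_ul):
        raise ValueError(f"components sum to {added} uL, expected {total_ul} uL")
    return {
        name: (stock * volume / total_ul, unit)
        for name, (volume, stock, unit) in components.items()
        if stock is not None
    }


final = final_concentrations(COMPONENTS, REACTION_VOLUME_UL)
for name, (value, unit) in final.items():
    print(f"{name}: {value:g} {unit}")
```

This gives 1× reaction solution, 0.5 mM of each dNTP, 0.01 U/μL enzyme (1 U in total) and 1.2 μM of each primer.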
The PCR amplification conditions were 98 °C for 3 min, followed by 33 cycles of 98 °C for 20 s, 60 °C for 15 s and 72 °C for 30 s; after cycling, a final extension at 72 °C for 5 min was performed, followed by a hold at 4 °C.
3. Circularization of the PCR amplification product
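The programmed run time of this profile is simple arithmetic: 180 s + 33 × (20 + 15 + 30) s + 300 s. The sketch below encodes the steps as data; it counts hold times only and ignores ramping and the open-ended 4 °C hold, so the real instrument time will be longer.

```python
# (step, temperature in deg C, hold in seconds, repeats), from the conditions above.
PROFILE = [
    ("initial denaturation", 98, 180, 1),
    ("denaturation", 98, 20, 33),
    ("annealing", 60, 15, 33),
    ("extension", 72, 30, 33),
    ("final extension", 72, 300, 1),
]


def total_hold_seconds(profile):
    """Sum of programmed hold times; ignores ramping and the final 4 deg C hold."""
    return sum(seconds * repeats for _, _, seconds, repeats in profile)


seconds = total_hold_seconds(PROFILE)
print(seconds, seconds / 60)  # 2625 43.75
```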
In this example, the PCR amplification products were purified with magnetic beads and then circularized using the BGIseq500SE50 circularization library construction kit according to its procedure. The specific circularization steps are given in the kit instructions and are not repeated here.
Fifth, library sequencing detection and sequencing accuracy detection
To verify that the synthesized libraries of known sequence are compatible with sequencing on the BGISEQ platform, the six libraries shown in SEQ ID NO.7 to SEQ ID NO.12 obtained in this example were sequenced in SE50+10 mode using the BGISEQ500 SE50 kit.
The circularization products of the six libraries were used for DNB preparation following the BGISEQ 500 operating procedure. Then 15 μL of each prepared DNB was taken and pooled into a 90 μL DNB system; the chip was loaded according to the standard procedure, and the SE50+10 sequencing mode was selected.
The sequencing reads of the six libraries shown in SEQ ID NO.7 to SEQ ID NO.12 were separated according to their index sequences. For all six libraries, the first 50 bp of the sequencing results were identical to the corresponding standard nucleic acid sequences, as shown in Figures 1 to 6, which correspond in order to the libraries shown in SEQ ID NO.7 to SEQ ID NO.12. This indicates that library construction was successful and that the base calling algorithm was accurate.
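Separating reads by index can be sketched as a Hamming-distance lookup against the six 10 bp indexes listed above. Tolerating one mismatch is an assumption made for illustration, not a setting from the application; it is safe here because the smallest pairwise distance between these indexes is 3, so no read can fall within one mismatch of two different indexes.

```python
from itertools import combinations

# Index sequences of the libraries shown in SEQ ID NO.7 to SEQ ID NO.12.
INDEXES = {
    "SEQ ID NO.7": "ACTAGTACGT",
    "SEQ ID NO.8": "CGATCAGTAC",
    "SEQ ID NO.9": "CGCTATGTAC",
    "SEQ ID NO.10": "CAGAGTGTAC",
    "SEQ ID NO.11": "CTGTATCGTA",
    "SEQ ID NO.12": "ACATCAACGT",
}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("index read and index must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    return min(hamming(a, b) for a, b in combinations(indexes.values(), 2))


def assign_library(index_read, indexes=INDEXES, max_mismatches=1):
    """Return the single library within max_mismatches, or None if zero or several match."""
    hits = [name for name, idx in indexes.items() if hamming(index_read, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


print(min_pairwise_distance(INDEXES))  # 3
print(assign_library("ACTAGTACGA"))    # SEQ ID NO.7
print(assign_library("NNNNNNNNNN"))    # None
```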
Sixth, evaluation of sequencing quality
To compare the relationship between sequencing quality and bases, the library shown in SEQ ID NO.7, which has a high AT content (the high AT library), and the library shown in SEQ ID NO.9, which has a high GC content (the high GC library), were sequenced in SE100 mode using the BGISEQ500 SE100+10 sequencing kit.
DNB preparation and chip loading were the same as in "Fifth, library sequencing detection and sequencing accuracy detection". In this experiment, only the libraries shown in SEQ ID NO.7 and SEQ ID NO.9 were prepared and sequenced on the instrument in SE100 mode.
The sequencing quality of the two libraries was analyzed and compared. As shown in Table 2, the high GC library (SEQ ID NO.9) had a lower Q30 and a higher error rate than the high AT library (SEQ ID NO.7). Subsequent improvements to the sequencing technology can therefore be targeted at GC-rich libraries.
TABLE 2 comparison of sequencing quality of two libraries
Name | PredQual | GC content (%) | Q10 (%) | Q20 (%) | Q30 (%) | EsErr (%)
High AT library | 33 | 27.05 | 99.16 | 98.02 | 91.44 | 0.23
High GC library | 33 | 75.47 | 98.16 | 94.18 | 85.05 | 0.68
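The quality metrics in Table 2 rest on the Phred scale, Q = -10 log10(p). The sketch below shows the conversions and a Q30 fraction computed from per-base quality scores. It assumes EsErr is a per-base error rate expressed in percent; the exact definitions of PredQual and EsErr in the BGISEQ report are not given in this text, and the function names are our own.

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def fraction_at_least(qualities, threshold=30):
    """Share of base calls with quality >= threshold (e.g. the Q30 fraction)."""
    return sum(q >= threshold for q in qualities) / len(qualities)


# Phred-equivalent of the EsErr values in Table 2, assuming they are per-base
# error rates in percent (0.23% and 0.68%).
for label, err_percent in [("High AT library", 0.23), ("High GC library", 0.68)]:
    print(label, round(error_to_phred(err_percent / 100), 1))
```

Under that assumption the two libraries correspond to roughly Q26.4 and Q21.7, while Q30 itself implies an error probability of 0.1%.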
In addition, the relationship between bases and quality values was analyzed further. FIG. 7 is the Q30 distribution of the high GC library: the Q30 curve drops markedly at 60 bp, 68 bp, 81 bp, 91 bp and 97 bp, and the corresponding sequence positions share a common feature, namely that where base G is followed by A, the sequencing quality of the A deteriorates. This points to a direction for optimizing the sequencing technology.
Therefore, the standard nucleic acids and the libraries based on them can be used to evaluate base preference and accuracy in second-generation sequencing, to detect its accuracy and to evaluate its quality; targeted optimization based on the sequencing results and the analysis of base characteristics can then improve the accuracy of nucleic acid sequencing.
The foregoing is a detailed description of the present application with reference to specific embodiments, and the present application is not limited to these embodiments. Those skilled in the art may make simple deductions or substitutions without departing from the spirit of the present application.
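The position-wise analysis behind FIG. 7 can be sketched as: compute the Q30 fraction at each cycle, flag cycles that fall well below the typical value, and check whether the reference base there is an A preceded by G. The data below are toy values, not the FIG. 7 data; the 0.1 margin below the median and all function names are illustrative assumptions, and positions are 0-based, unlike the 1-based bp positions in the text.

```python
from statistics import median


def q30_fraction_by_position(quality_rows, threshold=30):
    """Fraction of reads with quality >= threshold at each cycle (0-based)."""
    n_cycles = min(len(row) for row in quality_rows)
    return [
        sum(row[i] >= threshold for row in quality_rows) / len(quality_rows)
        for i in range(n_cycles)
    ]


def low_quality_positions(fractions, margin=0.1):
    """Cycles whose Q30 fraction falls more than `margin` below the median."""
    baseline = median(fractions)
    return [i for i, f in enumerate(fractions) if f < baseline - margin]


def a_after_g(reference, positions):
    """For each position, report whether the reference has A there preceded by G."""
    return {i: i > 0 and reference[i - 1:i + 1] == "GA" for i in positions}


# Toy data: four reads over an 8-nt reference, with poorer quality at the As.
reference = "CCGACCGA"
quality_rows = [
    [35, 35, 35, 20, 35, 35, 35, 20],
    [35, 35, 35, 25, 35, 35, 35, 35],
    [35, 35, 35, 20, 35, 35, 35, 22],
    [35, 35, 35, 35, 35, 35, 35, 20],
]
fractions = q30_fraction_by_position(quality_rows)
dips = low_quality_positions(fractions)
print(dips, a_after_g(reference, dips))  # [3, 7] {3: True, 7: True}
```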
SEQUENCE LISTING
<110> Shenzhen Huashengshengsciences institute
<120> library, reagent and application for second-generation sequencing quality evaluation
<130> 17I25566-A23542
<160> 15
<170> PatentIn version 3.3
<210> 1
<211> 150
<212> DNA
<213> Artificial sequence
<400> 1
tacaactaca gataatgggc tggatacatg gaatgattat agatatatta aggaataatg 60
ttaattaatg cctaaattaa ttaatctaag ggggttaata ctatgtgtta attaatctta 120
ttagaatgaa tattattgaa tcaataatta 150
<210> 2
<211> 150
<212> DNA
<213> Artificial sequence
<400> 2
atataatgta atacataata ttaatatatt aattattgta tgattgatat ctattacagt 60
ctagtactga cccgtagaca tatatgcccc cgattaatta cttaggctta ttaataatat 120
ataggaataa taatggaata gcaataatta 150
<210> 3
<211> 150
<212> DNA
<213> Artificial sequence
<400> 3
ccgccgcggt cgcttgtccg gccgccggtc cggcgccggc ggcgcaaagt gccaggccga 60
gccggcgaac cagcggtccg aaaaacacgg acacggtaac ctcaccacga tggccggccg 120
cggcgtccag tgcgcggcgc tagagccggc 150
<210> 4
<211> 150
<212> DNA
<213> Artificial sequence
<400> 4
caaactaccg gcgcggcgct cctccggccg tccgccgccg accggcggcg gcgttccggt 60
gtggcactcc aggtggccgg ttctctgcca agcggcaggc gaaaaatcga cggccaccgc 120
cgaggccgcg gcggagaccg ccggcgcagg 150
<210> 5
<211> 150
<212> DNA
<213> Artificial sequence
<400> 5
gctgttcgcg gccgatgttc gtataagata taagtttggg tatattccag tttatcgatc 60
gtatcgaaat gtatgagttt atacaggtcc tacttcaaca agcggcactt tactaccgtg 120
aagaacaacc ccgcacgacg cctaccaacc 150
<210> 6
<211> 150
<212> DNA
<213> Artificial sequence
<400> 6
gacggattcc ctcgctttct attggctgac agtacaagta acataggttg ggtcggttaa 60
ccctgccgtc acaagtggaa cgatgttaat agttgcggaa ccctatgttc ggcggaatac 120
tagaccagtt cattattata gtgctagcca 150
<210> 7
<211> 244
<212> DNA
<213> Artificial sequence
<400> 7
gatatctgca ggcatagaat gaatattatt gaatcaataa ttaaagtcgg aggccaagcg 60
gtcttaggaa gacaaactag tacgtcaact ccttggctca cagaacgaca tggctacgat 120
ccgactttac aactacagat aatgggctgg atacatggaa tgattataga tatattaagg 180
aataatgtta attaatgcct aaattaatta atctaagggg gttaatactt cagcctgtga 240
tatc 244
<210> 8
<211> 244
<212> DNA
<213> Artificial sequence
<400> 8
gatatctgca ggcatgaata ataatggaat agcaataatt aaagtcggag gccaagcggt 60
cttaggaaga caacgatcag taccaactcc ttggctcaca gaacgacatg gctacgatcc 120
gacttatata atgtaataca taatattaat atattaatta ttgtatgatt gttatctatt 180
acagtctagt actgacccgt agacatatat gcccccgatt aattacttat cagcctgtga 240
tatc 244
<210> 9
<211> 244
<212> DNA
<213> Artificial sequence
<400> 9
gatatctgca ggcatcggcc gcggcgtcca gtgcgcggcg ctagagccgg caagtcggag 60
gccaagcggt cttaggaaga caacgctatg taccaactcc ttggctcaca gaacgacatg 120
gctacgatcc gacttccgcc gcggtcgctt gtccggccgc cggtccggcg ccggcggcgc 180
aaagtgccag gccgagccgg cgaaccagcg gtccgaaaaa cacggacact cagcctgtga 240
tatc 244
<210> 10
<211> 244
<212> DNA
<213> Artificial sequence
<400> 10
gatatctgca ggcatcaccg ccgaggccgc ggcggagacc gccggcgcag gaagtcggag 60
gccaagcggt cttaggaaga caacagagtg taccaactcc ttggctcaca gaacgacatg 120
gctacgatcc gacttcaaac taccggcgcg gcgctcctcc ggccgtccgc cgccgaccgg 180
cggcggcgtt ccggtgtggc actccaggtg gccggttctc tgccaagcgt cagcctgtga 240
tatc 244
<210> 11
<211> 244
<212> DNA
<213> Artificial sequence
<400> 11
gatatctgca ggcatgaaga acaaccccgc acgacgccta ccaaccaagt cggaggccaa 60
gcggtcttag gaagacaact gtatcgtaca actccttggc tcacagaacg acatggctac 120
gatccgactt gctgttcgcg gccgatgttc gtataagata taagtttggg tatattccag 180
tttatcgatc gtatcgaaat gtatgagttt atacaggtcc tacttcaact cagcctgtga 240
tatc 244
<210> 12
<211> 244
<212> DNA
<213> Artificial sequence
<400> 12
gatatctgca ggcatactag accagttcat tattatagtg ctagccaaag tcggaggcca 60
agcggtctta ggaagacaaa catcaacgtc aactccttgg ctcacagaac gacatggcta 120
cgatccgact tgacggattc cctcgctttc tattggctga cagtacaagt aacataggtt 180
gggtcggtta accctgccgt cacaagtgga acgatgttaa tagttgcggt cagcctgtga 240
tatc 244
<210> 13
<211> 15
<212> DNA
<213> Artificial sequence
<400> 13
gatatctgca ggcat 15
<210> 14
<211> 15
<212> DNA
<213> Artificial sequence
<400> 14
gatatcacag gctga 15
<210> 15
<211> 30
<212> DNA
<213> Artificial sequence
<400> 15
atgcctgcag atatcgatat cacaggctga 30