CN104232626A - Barcode object in reduced-representation genome sequencing library and design method thereof - Google Patents

Publication number
CN104232626A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310233343.8A
Other languages
Chinese (zh)
Inventor
方东明
郭钰
原辉
刘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd

Abstract

The invention belongs to the field of genomics, in particular the field of reduced-representation genome sequencing. It provides a method for designing barcodes in a reduced-representation genome sequencing library, together with a barcode matrix M (N×L) obtained by the method. The designed barcode matrix M (N×L) satisfies the criteria set out in the specification of the application.

Description

Barcodes in a reduced-representation genome sequencing library and design method thereof
Technical field
The invention belongs to the field of genomics, in particular the field of reduced-representation genome sequencing.
Background
Reduced-representation sequencing is the general term for a series of techniques developed in recent years on the basis of second-generation sequencing. It mainly includes RAD (restriction-site associated DNA), GBS (genotyping-by-sequencing), 2b-RAD (type IIB restriction-site associated DNA), double-digest GBS (two-enzyme genotyping-by-sequencing) and double-digest RAD (double digest restriction-site associated DNA).
The basic principle of these techniques is to digest sample DNA with restriction enzymes and then sequence it. Sequencing covers only the regions flanking the restriction sites rather than the whole genome, so the output per sample is only 0.05-0.4× (varying between techniques). Because the data volume is small yet evenly distributed across the whole genome, these techniques are increasingly used for low-cost genotyping, and the resulting genotypes are widely used for genetic linkage map construction, assessment of population genetic diversity and analysis of population evolution.
In the published literature, reduced-representation sequencing is mainly performed on the Illumina HiSeq platform. Because the data volume per sample is small and the capacity of a single HiSeq lane is large (for example, a single lane yields about 7.5 G of raw data for SE50 and about 15 G for PE50), a single lane can hold dozens or even hundreds of reduced-representation samples. To handle so many samples, the published protocols all use pooled library construction: each sample's DNA is first digested, an adapter containing a barcode is ligated to the sticky ends (see Fig. 1 and Fig. 2), the adapter-ligated DNAs are pooled into a single sample (the mixture), and the remaining library steps (end repair, A-tailing, PCR, gel purification and so on) are carried out on the pool. With this approach, only digestion and adapter ligation are performed per sample (one library per sample); after pooling, the work is equivalent to handling a single sample (one library for many samples). Compared with conventional Illumina HiSeq library construction, which builds one library per sample, this greatly saves labor and time.
In reduced-representation sequencing, the adapter carries a barcode whose main purpose is to distinguish the pooled samples. In the second-generation sequencing output, the barcode is at the leftmost end of each read. Different samples carry different barcodes, so samples can still be told apart under pooled library construction. Barcodes therefore play a very important role in reduced-representation sequencing.
Most previously published papers propose a barcode design scheme and list the designed barcodes, as shown in Table 1. Among them, the barcode design reported in the double-digest GBS paper collects the schemes reported in other papers and adds new criteria; its results are better optimized, and it is the best scheme reported in the published literature so far.
Table 1: Barcode designs in the published literature
As noted above, a single HiSeq lane has a large capacity, so other library types on the Illumina HiSeq platform, such as small-fragment libraries and transcriptome libraries, likewise face the problem of fitting multiple samples into one lane. The current solution is to add an index to the PCR primers at the PCR step of library construction (as shown in Fig. 3), pool the samples for sequencing, and then separate the samples by index. This uses the full capacity of the lane and reduces sequencing cost.
Summary of the invention
The inventors studied the barcode designs for reduced-representation sequencing libraries proposed in the published literature and optimized the design on the basis of the following three points:
1) In the published schemes, the number of barcodes designed is large. Each adapter carries a different barcode, so many adapters must be synthesized, which raises adapter cost. In particular, adapters for RAD, double-digest RAD and similar techniques require 5' phosphorylation, which greatly increases synthesis cost. How to use fewer adapters while still fitting many samples into one lane is therefore a problem.
To address this, the inventors combine the indexing used in ordinary small-fragment library construction with the pooled library construction of reduced-representation sequencing, so that a small number of barcodes suffices to run many samples on a single lane. This greatly reduces cost. For example, only 12 barcodes need be synthesized, with indexed PCR primers used at the PCR step. With 24 samples, say, the samples are divided into groups of 12, and each group uses the same 12 barcodes. This gives two groups, denoted group 1 and group 2. After each group is pooled into one sample, group 1 receives one index at the PCR stage of library construction and group 2 receives another. In the sequencing output, the two groups are first separated by index, and the 12 samples within each group are then separated by barcode. Compared with the 48, 96 and 384 barcodes used in the literature, the cost of adapter synthesis falls to 0.25, 0.125 and 0.03125 of the original, respectively.
2) On the Illumina HiSeq platform the base error rate is higher at the start of a read, and in reduced-representation sequencing the start of the read is the barcode. Barcode bases may therefore be miscalled. In the published schemes, any two barcodes differ by at least 2 mismatches. If 2 base errors occur in a barcode and happen to fall on the mismatched positions, two similar barcodes are confused and cannot be told apart, and such reads can only be discarded.
When generating barcodes, the invention strictly requires at least 3 mismatches between any two barcodes. This ensures that even if 2 base errors occur at the start of a read, the read can still be assigned to its sample by barcode instead of being discarded, which retains as much data as possible.
3) The longer the barcode, the more possible sequences there are, and the easier it is to generate sequences with at least 3 mismatches. The inventors therefore set the length to 5-9 bp, which speeds up computation compared with the 4-9 bp reported in the literature.
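The pooling arithmetic in point 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_samples` and `relative_adapter_cost` are hypothetical, samples are assumed to be numbered from 1 and filled into groups in order, and cost is assumed to scale with the number of distinct adapters.

```python
from fractions import Fraction


def assign_samples(n_samples, n_barcodes=12):
    """Map each 1-based sample number to an (index number, barcode number) pair.

    Assumption: samples fill group 1 first, then group 2, and so on.
    """
    layout = {}
    for sample in range(1, n_samples + 1):
        group, slot = divmod(sample - 1, n_barcodes)
        layout[sample] = (group + 1, slot + 1)
    return layout


def relative_adapter_cost(n_barcodes, literature_barcodes):
    """Adapters to synthesize, as a fraction of a larger published barcode set."""
    return Fraction(n_barcodes, literature_barcodes)


layout = assign_samples(24)
print(layout[1], layout[12], layout[13], layout[24])
# (1, 1) (1, 12) (2, 1) (2, 12)

for published in (48, 96, 384):
    print(published, float(relative_adapter_cost(12, published)))
# 48 0.25
# 96 0.125
# 384 0.03125
```

Each sample is identified by the pair (index, barcode), so the number of samples per lane is the product of the two counts while the number of adapters to synthesize stays at 12.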
Based on these three points, the inventors propose an optimized method for designing barcodes in a reduced-representation sequencing library. The method comprises forming a barcode matrix M (N × L) (M has N rows and L columns) according to the required number of barcodes (N) and the required barcode length (L).
The barcode lengths L differ from one another: the longest is max and the shortest is min. The number of distinct lengths (max-min+1) is computed, and N is allocated in turn to max, max-1, ..., min+1, min. Arranging the rows from longest to shortest according to the number of barcodes of each length gives an incomplete matrix M, which represents N barcodes of different lengths, with the lengths uniformly distributed. Each row of this matrix represents one barcode. The barcodes satisfy the following conditions:
A) in the incomplete matrix, the overall base composition (A, T, C, G) of every column (i.e. at the same position across all barcodes) is balanced: when N is even, every column must satisfy A+C=T+G; when N is odd, every column must satisfy A+C±1=T+G. In this calculation, the positions left empty in M because of the length differences must not be filled. That is, across all barcodes obtained, the four bases are evenly distributed at every position;
B) any two barcodes differ by at least 3 mismatches;
C) no barcode contains 3 consecutive identical bases;
D) no barcode contains the base sequence of the corresponding restriction site.
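Conditions B), C) and D) lend themselves to a simple check. The sketch below is not the patent's program; the function names are hypothetical, and it assumes that when two barcodes differ in length, each position present in only one of them counts as a mismatch (the text does not say how mismatches are counted across lengths). The example barcodes are made up for illustration.

```python
from itertools import combinations, zip_longest


def mismatches(a, b):
    """Count positions at which two barcodes differ.

    Assumption: a position present in only one barcode counts as a mismatch.
    """
    return sum(x != y for x, y in zip_longest(a, b))


def has_run(barcode, run_length=3):
    """True if the barcode contains run_length consecutive identical bases."""
    return any(base * run_length in barcode for base in "ACGT")


def violations(barcodes, site, min_mismatches=3):
    """List violations of conditions B), C) and D) for a set of barcodes."""
    problems = []
    for a, b in combinations(barcodes, 2):
        if mismatches(a, b) < min_mismatches:
            problems.append(f"B: {a}/{b} differ at only {mismatches(a, b)} position(s)")
    for barcode in barcodes:
        if has_run(barcode):
            problems.append(f"C: {barcode} has 3 consecutive identical bases")
        if site in barcode:
            problems.append(f"D: {barcode} contains {site}")
    return problems


# Hypothetical barcodes, checked against the EcoRI sequence AATTC
print(violations(["ACGTA", "TGCAT", "CATGC"], site="AATTC"))  # []
for problem in violations(["ACGTA", "ACGTT", "GGGAA"], site="AATTC"):
    print(problem)
```

The second call reports one B) violation (ACGTA and ACGTT differ at a single position) and one C) violation (GGGAA).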
The invention also provides a design matrix M (N × L) of barcodes in a reduced-representation sequencing library:
The matrix M has N rows and L columns, representing N barcodes of length L. The barcode lengths L differ from one another: the longest is max and the shortest is min. The number of distinct lengths (max-min+1) is computed, and N is allocated in turn to max, max-1, ..., min+1, min. Arranging the rows from longest to shortest according to the number of barcodes of each length gives an incomplete matrix M, which represents N barcodes of different lengths, with the lengths uniformly distributed. Each row of this matrix represents one barcode. The barcodes satisfy the following conditions:
A) in the incomplete matrix, the overall base composition (A, T, C, G) of every column (i.e. at the same position across all barcodes) is balanced: when N is even, every column must satisfy A+C=T+G; when N is odd, every column must satisfy A+C±1=T+G. In this calculation, the positions left empty in M because of the length differences must not be filled. That is, across all barcodes obtained, the four bases are evenly distributed at every position;
B) any two barcodes differ by at least 3 mismatches;
C) no barcode contains 3 consecutive identical bases;
D) no barcode contains the base sequence of the corresponding restriction site.
The invention also provides a method of reduced-representation sequencing, comprising the steps of:
A) extracting DNA from two or more samples;
B) breaking each sample DNA into fragments, for example of 100-1000 bp, more preferably 150-500 bp, most preferably 200-300 bp, in particular 200-250 bp;
C) ligating an adapter to each fragmented DNA, the adapter containing a barcode designed by the method of the invention or a barcode from the barcode design matrix M of the invention, with different samples receiving different adapters;
D) pooling the adapter-ligated DNA samples, and optionally subjecting the pooled sample to end repair, A-tailing, PCR and gel purification;
E) performing reduced-representation sequencing on the pooled sample.
The barcode design of the invention retains all the advantages of the published designs while greatly reducing cost, yielding more usable data and computing faster; it is a more optimized barcode scheme.
Description of the drawings
Fig. 1: Sequencing adapter carrying a barcode in double-digest RAD.
Fig. 2: General steps of library construction for reduced-representation sequencing.
Fig. 3: General workflow of small-fragment library construction.
Detailed description
In this text, barcode, adapter and index have the following meanings. The adapter is ligated to the end sequence produced by enzyme digestion, and the barcode is located on the adapter and serves for identification. The index is located on a PCR primer and is incorporated into the PCR product during the PCR reaction. The index is used for first-level identification and the barcode for second-level identification. In this application, the adapters as synthesized are single-stranded and must be annealed to form double strands; methods for forming double strands from single strands are known to those skilled in the art.
By studying the barcode designs for reduced-representation sequencing libraries reported in the published literature, the inventors propose a method for designing barcodes in a reduced-representation sequencing library. The method comprises forming a barcode matrix M (N × L) (M has N rows and L columns) according to the required number of barcodes (N) and the required barcode length (L).
The barcode lengths L differ from one another: the longest is max and the shortest is min. The number of distinct lengths (max-min+1) is computed, and N is allocated in turn to max, max-1, ..., min+1, min. Arranging the rows from longest to shortest according to the number of barcodes of each length gives an incomplete matrix M, which represents N barcodes of different lengths, with the lengths uniformly distributed. Each row of this matrix represents one barcode,
wherein preferably N is 3-36, preferably 6-24, preferably 8-20, more preferably 12;
preferably, the lower limit of the length L is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 bp, and the upper limit of L is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 bp, for example 5-9 bp, so that the signal (absorbance) is balanced during sequencing.
The barcodes satisfy the following conditions:
A) in the incomplete matrix, the overall base composition (A, T, C, G) of every column (i.e. at the same position across all barcodes) is balanced: when N is even, every column must satisfy A+C=T+G; when N is odd, every column must satisfy A+C±1=T+G. In this calculation, the positions left empty in M because of the length differences must not be filled. That is, across all barcodes obtained, the four bases are evenly distributed at every position;
B) any two barcodes differ by at least 3 mismatches;
C) no barcode contains 3 consecutive identical bases;
D) no barcode contains the base sequence of the corresponding restriction site.
In the invention, the endonuclease is a restriction endonuclease. It catalyzes cleavage of polynucleotide chains, acting only at certain positions within specific base sequences of deoxyribonucleic acid and cutting the strands at those positions, as shown in Table 2. In a preferred embodiment, the restriction endonuclease is EcoRI.
Table 2: Restriction endonuclease recognition sequences and cleavage sequences
Note: in the table, W represents A or T; N represents any one of the bases A, T, C and G.
The invention also provides a design matrix M (N × L) of barcodes in a reduced-representation sequencing library:
The matrix M has N rows and L columns, representing N barcodes of length L. The barcode lengths L differ from one another: the longest is max and the shortest is min. The number of distinct lengths (max-min+1) is computed, and N is allocated in turn to max, max-1, ..., min+1, min. Arranging the rows from longest to shortest according to the number of barcodes of each length gives an incomplete matrix M, which represents N barcodes of different lengths, with the lengths uniformly distributed. Each row of this matrix represents one barcode,
wherein preferably N is 3-36, preferably 6-24, preferably 8-20, more preferably 12;
preferably, the lower limit of the length L is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 bp, and the upper limit of L is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 bp, for example 5-9 bp, so that the signal (absorbance) is balanced during sequencing.
The barcodes satisfy the following conditions:
A) in the incomplete matrix, the overall base composition (A, T, C, G) of every column (i.e. at the same position across all barcodes) is balanced: when N is even, every column must satisfy A+C=T+G; when N is odd, every column must satisfy A+C±1=T+G. In this calculation, the positions left empty in M because of the length differences must not be filled. That is, across all barcodes obtained, the four bases are evenly distributed at every position;
B) any two barcodes differ by at least 3 mismatches;
C) no barcode contains 3 consecutive identical bases;
D) no barcode contains the base sequence of the corresponding restriction site.
In a preferred embodiment, the inventors provide the following more detailed preferred design:
1) According to the required number of barcodes (N), the N cells of a column are allocated in turn to A, T, C and G, and the bases are then randomly ordered down the column according to the number of times each base was allocated. Combined with the required barcode length (L), this gives the matrix M (N × L) (M has N rows and L columns), which represents all the barcodes;
For the matrix M (N × L), the overall base composition (A, T, C, G) of every column (i.e. at the same position across all barcodes) is balanced: when N is even, every column must satisfy A+C=T+G; when N is odd, every column must satisfy A+C±1=T+G;
This guarantees that, across all barcodes obtained, the four bases are evenly distributed at every position. Experiments, however, require some difference in length among the barcodes, so the length of each barcode is constrained as follows:
2) Suppose the longest L is max and the shortest is min, and the generated barcode lengths are to be uniformly distributed. The number of distinct lengths (max-min+1) is computed, and N is allocated in turn to max, max-1, ..., min+1, min. Arranging the rows from longest to shortest according to the number of barcodes of each length gives an incomplete matrix M. Step 1) is applied to this M with the missing positions left unfilled, so that M represents N barcodes of different lengths, with the lengths uniformly distributed.
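The column rule from steps 1) and 2) can be checked as follows. This is a minimal sketch, not the patent's program: the function names are hypothetical, and for the incomplete matrix it assumes that the even/odd case is decided by the number of bases actually present in a column, since empty cells are left unfilled. The example barcodes are made up.

```python
from collections import Counter


def column_is_balanced(column):
    """Check A+C = T+G for an even column, or A+C±1 = T+G for an odd one."""
    counts = Counter(column)
    diff = (counts["A"] + counts["C"]) - (counts["T"] + counts["G"])
    if len(column) % 2 == 0:
        return diff == 0
    return abs(diff) == 1


def matrix_is_balanced(barcodes):
    """Apply the column rule at every position, skipping unfilled cells."""
    longest = max(len(barcode) for barcode in barcodes)
    for position in range(longest):
        column = [b[position] for b in barcodes if position < len(b)]
        if not column_is_balanced(column):
            return False
    return True


# Hypothetical incomplete matrix: two rows of length 5, two of length 4
print(matrix_is_balanced(["ACGTA", "TGCAT", "CATG", "GTAC"]))  # True
print(matrix_is_balanced(["AAAA", "CCCC"]))                    # False
```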
If N is too small, not enough samples can be accommodated; if N is too large, barcode combinations are generated more slowly and cost increases. Taking experimental needs into account, the defaults are therefore N = 12 and L = 5-9, with the bases evenly distributed at every position. If more or longer barcodes are needed, the corresponding parameters can be set.
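With the default parameters, the length allocation of step 2) might look like the sketch below. The function name `allocate_lengths` is hypothetical, and the tie-breaking rule is an assumption: the text only says that N is allocated in turn to max, max-1, ..., min, so here any leftover barcodes are given to the longest lengths first.

```python
def allocate_lengths(n, max_len, min_len):
    """Spread n barcodes over the lengths max_len..min_len as evenly as possible.

    Assumption: when n is not a multiple of the number of lengths,
    the leftover barcodes go to the longest lengths first.
    """
    lengths = list(range(max_len, min_len - 1, -1))
    base, extra = divmod(n, len(lengths))
    counts = {
        length: base + (1 if i < extra else 0)
        for i, length in enumerate(lengths)
    }
    # Row lengths of the incomplete matrix, arranged from longest to shortest
    rows = [length for length in lengths for _ in range(counts[length])]
    return counts, rows


counts, rows = allocate_lengths(12, 9, 5)
print(counts)  # {9: 3, 8: 3, 7: 2, 6: 2, 5: 2}
print(rows)    # [9, 9, 9, 8, 8, 8, 7, 7, 6, 6, 5, 5]
```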
The barcodes designed above must also satisfy the following conditions:
a) To prevent data from being confused because of sequencing errors in the barcode, any two barcodes must differ by at least 3 mismatches.
b) If a barcode contains too many consecutive identical bases, the sequencing result at the next position can be affected; therefore no barcode may contain more than 2 identical bases in a row.
c) The barcode is joined to the restriction site. To prevent a restriction endonuclease recognition sequence from re-forming after the barcode is joined to the restriction site, which would cause the position to be cut again, the generated barcodes must not contain the base sequence of the corresponding restriction site.
For a), b) and c), the program applies the checks in parallel: each barcode is checked as it is produced, and if any one of a), b) or c) is not met, the program returns to step 2).
d) Multiple qualifying barcode combinations are generated so that a choice can be made. Multithreading is used, with the number of threads (C) set by a parameter, so that the result contains C matrices M that satisfy the conditions, no two of which are identical.
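The generate-and-check loop described above can be illustrated with a small single-threaded sketch. It simplifies the described program in two labeled ways: it redraws the whole matrix when any check fails rather than checking each barcode as it is produced, and it does not use multithreading. All names are hypothetical, and the mismatch count across different lengths (extra positions count as mismatches) is an assumption.

```python
import random
from itertools import combinations, zip_longest


def balanced_column(size, rng):
    """Allocate A, T, C, G in turn to `size` cells, then shuffle the column."""
    column = [("A", "T", "C", "G")[i % 4] for i in range(size)]
    rng.shuffle(column)
    return column


def draw_matrix(row_lengths, rng):
    """Fill the incomplete matrix column by column; short rows stay unfilled."""
    rows = [[] for _ in row_lengths]
    for position in range(max(row_lengths)):
        members = [i for i, n in enumerate(row_lengths) if position < n]
        for i, base in zip(members, balanced_column(len(members), rng)):
            rows[i].append(base)
    return ["".join(row) for row in rows]


def acceptable(barcodes, site, min_mismatches=3):
    """Conditions a) to c): mismatches, no run of 3, no restriction site."""
    for barcode in barcodes:
        if site in barcode or any(b * 3 in barcode for b in "ACGT"):
            return False
    return all(
        sum(x != y for x, y in zip_longest(a, b)) >= min_mismatches
        for a, b in combinations(barcodes, 2)
    )


def design(row_lengths, site, seed=0, max_attempts=100_000):
    """Redraw the matrix until every condition holds (simple rejection loop)."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        candidate = draw_matrix(row_lengths, rng)
        if acceptable(candidate, site):
            return candidate
    raise RuntimeError("no acceptable barcode set found")


barcodes = design([9, 9, 9, 8, 8, 8, 7, 7, 6, 6, 5, 5], site="AATTC")
for barcode in barcodes:
    print(barcode)
```

Because every column is built from bases allocated in turn to A, T, C and G, the column balance rule holds by construction; only the row-level conditions need to be checked after each draw. Running `design` with C different seeds would be one way to obtain several candidate matrices.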
The barcodes designed in this embodiment thus have the following features:
(1) the number of barcodes is 12;
(2) the length is 5-9 bp, so that the signal (absorbance) is balanced during sequencing;
(3) any two barcodes differ by at least 3 mismatches;
(4) no barcode contains 3 consecutive identical bases;
(5) no barcode contains the corresponding restriction site.
In addition, the inventors use indexes during library construction, so that 12 barcodes suffice to build libraries for, and sequence, multiple samples simultaneously.
In summary, the barcode design of the invention retains all the advantages of the published designs while greatly reducing cost, yielding more usable data and computing faster; it is a more optimized barcode scheme.
To make the purpose, technical solution and advantages of the invention clearer, the invention is further described below with reference to the drawings and examples. It should be understood that the specific examples described here serve only to explain the invention and are not intended to limit it.
Example 1: generating a barcode combination of differing lengths
The required number and length of barcodes are 12 and 5-9 bp, respectively; the barcode lengths differ in steps of 1 and are uniformly distributed; the minimum barcode length is 5; any two barcodes differ by at least 3 mismatches; no barcode contains 3 or more consecutive identical bases; 1 qualifying barcode combination is generated; the barcodes must not contain the required restriction endonuclease recognition sequence, here AATTC for EcoRI. The result is shown in Table 3.
Table 3: One combination in M (12 × 9)
Barcode  Sequence
1  GCATAATA (SEQ ID No.194)
2  GAGATTCT (SEQ ID No.195)
3  TCACACG (SEQ ID No.196)
4  CTAGGGA (SEQ ID No.197)
5  AATCGA (SEQ ID No.198)
6  AGCACT (SEQ ID No.199)
7  TGCGC (SEQ ID No.200)
8  CTTAT (SEQ ID No.201)
9  TGGCA (SEQ ID No.202)
10  CAGG (SEQ ID No.203)
11  GTCT (SEQ ID No.204)
12  ACTT (SEQ ID No.205)
Example 2: generating multiple barcode combinations with the same length settings
The required number and length of barcodes are 12 and 5-9 bp, respectively; the barcode lengths differ in steps of 1 and are uniformly distributed; the minimum barcode length is 5; any two barcodes differ by at least 3 mismatches; no barcode contains 3 or more consecutive identical bases; 5 qualifying barcode combinations are generated; the barcodes must not contain the required restriction endonuclease sequence, here AATTC for EcoRI. The result is shown in Table 4.
Table 4: Five combinations in M (12 × 9)
The examples above show that the barcode design is technically feasible. The designed barcodes meet strict requirements on length and composition and, once synthesized, can be reused in experiments.
The barcodes designed by the program were also applied in experiments, with different barcodes arranged for the different techniques as required, including RAD, GBS, 2b-RAD, double-digest GBS and double-digest RAD.
The preparation of the samples used in the examples is summarized as follows:
Twenty-four rice samples were sequenced on the Illumina HiSeq 2000 platform. The experimental procedure for each technique followed the method given in the corresponding reference (RAD: reference 1, 2008; GBS: reference 2, 2011; double-digest GBS: reference 5, 2012; double-digest RAD: reference 3, 2012; 2b-RAD: reference 6, 2012). The adapters and PCR primers used in the experiments were synthesized by Invitrogen, and reagents were purchased from the same companies as in the references. After synthesis, adapters P1 and P2 (see the examples in Table 5; the barcodes used in the examples are given below) were annealed into double strands before use. For example, the forward and reverse strands of each adapter were each dissolved in distilled water to 100 μM and then annealed in a PCR instrument with the following program: 95 °C for 10 minutes, 70 °C for 10 minutes, 65 °C for 10 minutes, 60 °C for 10 minutes, 55 °C for 10 minutes, 50 °C for 10 minutes, 45 °C for 10 minutes, 40 °C for 10 minutes, 35 °C for 10 minutes, 30 °C for 10 minutes, 25 °C for 30 minutes.
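For reference, the annealing program above can be written down as data and its total run time computed. This is only a convenience sketch; the variable names are hypothetical, and the steps are exactly those listed in the text.

```python
# (temperature in °C, hold time in minutes), in the order given in the text
ANNEALING_PROGRAM = (
    [(95, 10)]
    + [(temperature, 10) for temperature in range(70, 25, -5)]  # 70 down to 30
    + [(25, 30)]
)

total_minutes = sum(minutes for _, minutes in ANNEALING_PROGRAM)
print(len(ANNEALING_PROGRAM), total_minutes)  # 11 130
```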
Table 5: Adapters used in the published literature
Note: "xxxx" denotes the barcode in the adapter (the barcodes used in the examples are given below).
Table 6: PCR primers used for each technique. After samples 1-12 are pooled, the universal primer and the index primer containing index 1 are used; after samples 13-24 are pooled, the universal primer and the index primer containing index 2 are used. The index is marked in lowercase.
The barcoded adapters of the invention can be combined with the universal primer and index primers of the invention into a kit; for example, the barcoded adapters shown in Tables 7, 9, 11, 13 and 15 below together with the universal primer and index primers of Table 6.
The test procedure and results for each technique are as follows:
1) RAD test: DNA from 24 rice samples was digested with EcoRI and ligated to barcoded adapters according to Table 7; the DNA was then sheared to fragments of about 500 bp; samples 1-12 were pooled into one sample and samples 13-24 into another; gel recovery and purification; end repair and A-tailing; the pool of samples 1-12 was amplified with the universal primer (SEQ ID No.11) and the index 1 PCR primer (SEQ ID No.12), and the pool of samples 13-24 with the universal primer (SEQ ID No.11) and the index 2 PCR primer (SEQ ID No.13); fragments of 350-550 bp were recovered from the gel; quality control and sequencing. For the detailed procedure see reference 1. The results are shown in Table 8.
Table 7: Barcoded adapters used for RAD; barcodes are shown in lowercase
Table 8: Read counts of the different individuals with RAD
Note: 12 barcodes and 2 indexes (for first-level identification: index 1 is AAGCAATG, index 2 is AATCCGAA) were applied to RAD sequencing (on the Illumina HiSeq platform) to distinguish 24 rice individuals. The required number and length of barcodes are 12 and 5-9 bp, respectively; the barcode lengths are uniformly distributed; the minimum barcode length is 5; any two barcodes differ by at least 3 mismatches; no barcode contains 3 or more consecutive identical bases; 1 qualifying barcode combination is generated; the barcodes must not contain the required restriction endonuclease sequence, here AATTC for EcoRI (recognition site GAATTC), i.e. AATTC must not occur in a barcode. In Table 8, the count column gives the number of reads in the sequencing output for the sample corresponding to each barcode. Using the barcodes from the experiment, the individuals in the sequencing output were separated and the matching reads counted (a read whose leading portion is identical to one of the barcodes is assigned to the corresponding individual; a read that matches none is attributed to sequencing error and is not counted; the same applies below). According to the table, among the read counts obtained per sample, the minimum is 6,367,801 and the maximum is 13,286,102; the ratio of maximum to minimum is 2.09, which meets the requirement for data uniformity.
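The counting rule in the note (exact match between the start of a read and a barcode, with unmatched reads left out) can be sketched as follows. The barcodes and reads below are made up for illustration, the function names are hypothetical, and trying longer barcodes first is an assumption added so that a short barcode cannot shadow a longer one. The split by index is assumed to have been done beforehand.

```python
from collections import Counter


def assign_read(read, barcode_to_sample):
    """Return the sample whose barcode is an exact prefix of the read, or None."""
    # Assumption: longer barcodes are tried first.
    for barcode in sorted(barcode_to_sample, key=len, reverse=True):
        if read.startswith(barcode):
            return barcode_to_sample[barcode]
    return None


def count_reads(reads, barcode_to_sample):
    """Count reads per sample; reads matching no barcode are not counted."""
    counts = Counter()
    for read in reads:
        sample = assign_read(read, barcode_to_sample)
        if sample is not None:
            counts[sample] += 1
    return counts


# Hypothetical barcodes and reads (not taken from Table 7)
barcode_to_sample = {"ACGTA": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTAAATTCGG", "TGCATGAATTCA", "ACGTAAATTCTT", "GGGGGAATTCAA"]
print(dict(count_reads(reads, barcode_to_sample)))
# {'sample_1': 2, 'sample_2': 1}
```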
2) GBS test: DNA from 24 rice samples was digested with ApeKI and ligated to barcoded adapters according to Table 9; samples 1-12 were pooled into one sample and samples 13-24 into another; purification; the pool of samples 1-12 was amplified with the universal primer (SEQ ID No.11) and the index 1 PCR primer (SEQ ID No.12), and the pool of samples 13-24 with the universal primer (SEQ ID No.11) and the index 2 PCR primer (SEQ ID No.13); product purification, quality control and sequencing. For the detailed procedure see reference 2. The results are shown in Table 10.
Table 9: Barcoded adapters used for GBS; barcodes are shown in lowercase
Table 10: Read counts of the different individuals with GBS
Note: 12 barcodes and 2 indexes (for first-level identification: index 1 is AAGCAATG, index 2 is AATCCGAA) were applied to GBS sequencing (on the Illumina HiSeq platform) to distinguish 24 rice individuals. The required number and length of barcodes are 12 and 5-9 bp, respectively; the barcode lengths differ in steps of 1 and are uniformly distributed; the minimum barcode length is 5; any two barcodes differ by at least 3 mismatches; no barcode contains 3 or more consecutive identical bases; 1 qualifying barcode combination is generated; the barcodes must not contain the required restriction endonuclease sequence, here CAGC and CTGC for ApeKI (recognition site GCWGC, where W is A or T). According to the table, among the read counts obtained per sample, the minimum is 7,125,489 and the maximum is 12,545,796; the ratio of maximum to minimum is 1.76, which meets the requirement for data uniformity.
3) 2b-RAD technical test: DNA from 24 rice samples was digested with BsaXI and then ligated to the barcoded adapters listed in Table 11. Samples 1-12 were pooled into one sample and samples 13-24 into another, and the pools were purified. The pool of samples 1-12 was amplified by PCR with the universal primer (SEQ ID No. 11) and the index 1 PCR primer (SEQ ID No. 12); the pool of samples 13-24 was amplified with the universal primer (SEQ ID No. 11) and the index 2 PCR primer (SEQ ID No. 13). Fragments of 140-180 bp were recovered from the gel. The products were purified, quality-checked, and sequenced. For the detailed procedure see reference 6. The results are shown in Table 12.
Table 11. Barcoded adapters used for 2b-RAD; barcodes are shown in lowercase
Table 12. Read count statistics for the individuals in the 2b-RAD test
Note: 12 barcodes were applied to 2b-RAD sequencing (performed on the Illumina HiSeq platform), together with 2 indexes used for the primary level of identification (index 1 is AAGCAATG, index 2 is AATCCGAA), to distinguish the 24 rice individuals. The design parameters were: 12 barcodes of 5-9 bp, distributed uniformly over the lengths; a minimum barcode length of 5; at least 3 mismatches between any two barcodes; no run of 3 or more consecutive identical bases within a barcode; 1 qualifying barcode combination generated; and no barcode containing the sequence of the enzyme site used. For BsaXI (recognition site ACNNNNNCTCC) that sequence is NNN, where N is any base, so the base composition of the barcodes is not restricted. According to the table above, the minimum number of reads obtained per sample was 6,481,245 and the maximum was 12,458,655; the ratio of maximum to minimum is 1.92, which meets the requirement for uniform data.
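The notes state that the 12 barcodes span lengths of 5-9 bp and that the barcodes are distributed uniformly over those lengths. One possible reading is a round-robin assignment that starts from the longest length, sketched below in Python. The function name is hypothetical, and the round-robin rule is an assumption about how the barcodes are spread when N is not a multiple of the number of lengths.

```python
def distribute_lengths(n, min_len, max_len):
    """Spread n barcodes over the lengths max_len..min_len as evenly as possible.

    Assumption: any remainder is given to the longest lengths first.
    Returns a dict mapping length -> number of barcodes of that length.
    """
    lengths = list(range(max_len, min_len - 1, -1))  # longest first
    base, extra = divmod(n, len(lengths))
    return {
        length: base + (1 if i < extra else 0)
        for i, length in enumerate(lengths)
    }


# 12 barcodes over the 5 lengths 9, 8, 7, 6, 5 bp.
counts = distribute_lengths(12, 5, 9)
print(counts)
```

Under this reading, 12 barcodes over 5 lengths give three barcodes each of 9 and 8 bp and two each of 7, 6 and 5 bp.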
4) Double-digest GBS technical test: DNA from 24 rice samples was digested with PstI and MspI and then ligated to the barcoded adapters listed in Table 13. Samples 1-12 were pooled into one sample and samples 13-24 into another, and the pools were purified. The pool of samples 1-12 was amplified by PCR with the universal primer (SEQ ID No. 11) and the index 1 PCR primer (SEQ ID No. 12); the pool of samples 13-24 was amplified with the universal primer (SEQ ID No. 11) and the index 2 PCR primer (SEQ ID No. 13). Fragments of 350-550 bp were recovered from the gel. The products were purified, quality-checked, and sequenced. For the detailed procedure see reference 5. The results are shown in Table 14.
Table 13. Barcoded adapters used for double-digest GBS; barcodes are shown in lowercase
Table 14. Read count statistics for the individuals in the double-digest GBS test
Note: 12 barcodes were applied to double-digest GBS sequencing (performed on the Illumina HiSeq platform), together with 2 indexes used for the primary level of identification (index 1 is AAGCAATG, index 2 is AATCCGAA), to distinguish the 24 rice individuals. The design parameters were: 12 barcodes of 5-9 bp, distributed uniformly over the lengths; a minimum barcode length of 5; at least 3 mismatches between any two barcodes; no run of 3 or more consecutive identical bases within a barcode; 1 qualifying barcode combination generated; and no barcode containing the sequence of the enzyme site used. For PstI (recognition site CTGCAG) that sequence is TGCAG, so TGCAG must not occur in any barcode. According to the table above, the minimum number of reads obtained per sample was 6,487,589 and the maximum was 13,324,953; the ratio of maximum to minimum is 2.05, which meets the requirement for uniform data.
5) Double-digest RAD technical test: DNA from 24 rice samples was digested with EcoRI and MspI and then ligated to the barcoded adapters listed in Table 15. Samples 1-12 were pooled into one sample and samples 13-24 into another, and the pools were purified. The pool of samples 1-12 was amplified by PCR with the universal primer (SEQ ID No. 11) and the index 1 PCR primer (SEQ ID No. 12); the pool of samples 13-24 was amplified with the universal primer (SEQ ID No. 11) and the index 2 PCR primer (SEQ ID No. 13). Fragments of 350-550 bp were recovered from the gel. The products were purified, quality-checked, and sequenced. For the detailed procedure see reference 3. The results are shown in Table 16.
Table 15. Barcoded adapters used for double-digest RAD; barcodes are shown in lowercase
Table 16. Read count statistics for the individuals in the double-digest RAD test
Note: 12 barcodes were applied to double-digest RAD sequencing (performed on the Illumina HiSeq platform), together with 2 indexes used for the primary level of identification (index 1 is AAGCAATG, index 2 is AATCCGAA), to distinguish the 24 rice individuals. No barcode may contain the sequence of the enzyme site used. For EcoRI (recognition site GAATTC) that sequence is AATTC, so AATTC must not occur in any barcode. According to the table above, the minimum number of reads obtained per sample was 6,244,682 and the maximum was 12,834,134; the ratio of maximum to minimum is 2.06, which meets the requirement for uniform data.
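The excluded sequences given in the notes for ApeKI (GCWGC: CAGC and CTGC), PstI (CTGCAG: TGCAG) and EcoRI (GAATTC: AATTC) are all consistent with taking the recognition site, dropping its first base, and expanding any degenerate base. The Python sketch below applies that pattern. It is an inference from these three examples, not a rule stated in the text, and the function name and the `skip` parameter are hypothetical. BsaXI is left out because the note states that it places no restriction on the barcodes.

```python
from itertools import product

# IUPAC codes needed for the recognition sites quoted in the notes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def forbidden_motifs(recognition_site, skip=1):
    """Expand a recognition site into the concrete sequences a barcode must avoid.

    Assumption: the first `skip` bases of the site are dropped, matching the
    three examples in the notes.
    """
    tail = recognition_site[skip:]
    choices = [IUPAC[base] for base in tail]
    return sorted("".join(combo) for combo in product(*choices))


for enzyme, site in [("ApeKI", "GCWGC"), ("PstI", "CTGCAG"), ("EcoRI", "GAATTC")]:
    print(enzyme, forbidden_motifs(site))
```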
As the tables above show, for the barcode combinations used under the different technical routes, the ratio of the maximum to the minimum number of reads per individual in the sequencing output lies in the range 1-3. The data volumes of the individuals tagged with these barcode combinations are therefore considered substantially uniform, with no large deviation, and the data can be used by bioinformatics analysts. The experiments show that the barcode optimization of this scheme is feasible.
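The uniformity figures quoted in the four notes can be recomputed directly from the reported minimum and maximum read counts. The short Python sketch below does so; the variable and function names are hypothetical, and the 1-3 acceptance range is taken from the paragraph above.

```python
# (minimum reads, maximum reads) per individual, as reported in the notes.
read_counts = {
    "GBS": (7_125_489, 12_545_796),
    "2b-RAD": (6_481_245, 12_458_655),
    "double-digest GBS": (6_487_589, 13_324_953),
    "double-digest RAD": (6_244_682, 12_834_134),
}


def max_min_ratio(minimum, maximum):
    """Ratio of the largest to the smallest per-individual read count."""
    return maximum / minimum


def is_uniform(ratio, low=1.0, high=3.0):
    """Apply the 1-3 range used in the text as the uniformity criterion."""
    return low <= ratio <= high


ratios = {
    name: round(max_min_ratio(lo, hi), 2) for name, (lo, hi) in read_counts.items()
}
for name, ratio in ratios.items():
    print(f"{name}: ratio {ratio}, uniform: {is_uniform(ratio)}")
```

The recomputed ratios are 1.76, 1.92, 2.05 and 2.06, matching the values stated in the notes.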
References:
1. Baird NA, et al. Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE, 2008, 3:e3376
2. Elshire RJ, et al. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE, 2011, 6:e19379
3. Peterson BK, et al. Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE, 2012, 7:e37135
4. Pfender WF, et al. Mapping with RAD (restriction-site associated DNA) markers to rapidly identify QTL for stem rust resistance in Lolium perenne. Theor Appl Genet, 2011 May, 122:1467-80
5. Poland JA, et al. Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach. PLoS ONE, 2012, 7:e32253
6. Wang S, et al. 2b-RAD: a simple and flexible method for genome-wide genotyping. Nature Methods, 2012 May 20, 9:808-10

Claims (9)

1. A method for designing barcodes for a reduced-representation genome sequencing library, the method comprising forming a barcode matrix M (N × L) according to the required number of barcodes N and the required barcode length L,
wherein the barcode lengths L differ, the longest being max and the shortest being min; the number of distinct lengths, max-min+1, is calculated, and the N barcodes are assigned in turn to the lengths max, max-1, ..., min+1, min; the lengths are arranged from longest to shortest according to the number of barcodes of each length, giving an incomplete matrix M that represents N barcodes of different lengths with the lengths uniformly distributed, each row of the matrix representing one barcode;
and the barcodes satisfy the following conditions:
A) in said incomplete matrix, the overall composition of the bases A, T, C and G in any column is balanced: when N is even, any column must satisfy A+C=T+G; when N is odd, any column must satisfy A+C±1=T+G, wherein the positions missing from M because of the differences in length L must not be filled when calculating; that is, among all the barcodes obtained, the four bases are uniformly distributed at every position;
B) any two barcodes differ by 3 or more mismatches;
C) the base composition of a barcode must not contain 3 consecutive identical bases;
D) no barcode contains a base sequence identical to that of the corresponding enzyme site.
2. A designed barcode matrix M (N × L) for a reduced-representation genome sequencing library,
wherein the matrix M has N rows and L columns, representing N barcodes of length L in M; the barcode lengths L differ, the longest being max and the shortest being min; the number of distinct lengths, max-min+1, is calculated, and the N barcodes are assigned in turn to the lengths max, max-1, ..., min+1, min; the lengths are arranged from longest to shortest according to the number of barcodes of each length, giving an incomplete matrix M that represents N barcodes of different lengths with the lengths uniformly distributed, each row of the matrix representing one barcode;
and the barcodes satisfy the following conditions:
A) in said incomplete matrix, the overall composition of the bases A, T, C and G in any column is balanced: when N is even, any column must satisfy A+C=T+G; when N is odd, any column must satisfy A+C±1=T+G, wherein the positions missing from M because of the differences in length L must not be filled when calculating; that is, among all the barcodes obtained, the four bases are uniformly distributed at every position;
B) any two barcodes differ by 3 or more mismatches;
C) the base composition of a barcode must not contain 3 consecutive identical bases;
D) no barcode contains a base sequence identical to that of the corresponding enzyme site.
3. the method for claim 1 or the design bar code thing matrix M of claim 2, wherein N is 3-36, preferred 6-24, preferred 8-20, more preferably 12.
4. the method for claim 1 or the design bar code thing matrix M of claim 2, wherein the lower limit of length L be 2,3,4,5,6,7,8,9,10,11,12,13,14, the upper limit of 15bp, L 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20bp; Such as between 5-9bp.
5. the method for claim 1 or the design bar code thing matrix M of claim 2, restriction endonuclease is EcoRI, ApeKI, BsaXI, PstI etc.
6. the method for claim 1 or the design bar code thing matrix M of claim 2, wherein said simplification gene order-checking is RAD, 2b-RAD, double digestion GBS, double digestion RAD.
7. the method for claim 1 or the design bar code thing matrix M of claim 2, wherein said design bar code thing is SEQ ID No.38-49, SEQ ID No.74-85, SEQ ID No.110-121, SEQ ID No.146-157, SEQ ID No.182-193, SEQ ID No.194-205, SEQ ID No.206-217, SEQ ID No.218-229, SEQ ID No.230-241, SEQ ID No.242-253 or SEQ ID No.254-265.
8. the bar code thing joint of the barcode of the design bar code thing matrix M of the design bar code thing matrix M that the method containing any one of claim 1-7 obtains or any one of claim 2-7.
9. the bar code thing joint of claim 8, described joint is SEQ ID No.14-37, SEQ ID No.50-73, SEQ ID No.86-109, SEQ ID No.122-145 or SEQ ID No.158-181.
CN201310233343.8A 2013-06-13 2013-06-13 Barcode object in reduced-representation genome sequencing library and design method thereof Pending CN104232626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310233343.8A CN104232626A (en) 2013-06-13 2013-06-13 Barcode object in reduced-representation genome sequencing library and design method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310233343.8A CN104232626A (en) 2013-06-13 2013-06-13 Barcode object in reduced-representation genome sequencing library and design method thereof

Publications (1)

Publication Number Publication Date
CN104232626A true CN104232626A (en) 2014-12-24

Family

ID=52221524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310233343.8A Pending CN104232626A (en) 2013-06-13 2013-06-13 Barcode object in reduced-representation genome sequencing library and design method thereof

Country Status (1)

Country Link
CN (1) CN104232626A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104480217A (en) * 2014-12-26 2015-04-01 上海派森诺生物科技有限公司 Simplified genome sequencing method
CN104562214A (en) * 2014-12-26 2015-04-29 上海派森诺生物科技有限公司 Reduced-representation genome library building method based on type IIB restriction enzyme digestion
CN105696088A (en) * 2015-11-11 2016-06-22 中国科学院昆明植物研究所 Construction method for double enzyme digestion simplified genome next generation sequencing library and matched kit
CN108018607A (en) * 2016-10-28 2018-05-11 深圳华大基因股份有限公司 A kind of sequence label for lifting microarray dataset library fractionation rate mixes storehouse method and apparatus
CN108179174A (en) * 2018-01-15 2018-06-19 武汉爱基百客生物科技有限公司 A kind of high-throughput construction method for simplifying gene order-checking library
CN111961707A (en) * 2020-10-14 2020-11-20 苏州贝康医疗器械有限公司 Nucleic acid library construction method and application thereof in analysis of embryo chromosome structural abnormality before implantation
US11708574B2 (en) * 2016-06-10 2023-07-25 Myriad Women's Health, Inc. Nucleic acid sequencing adapters and uses thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010039991A2 (en) * 2008-10-02 2010-04-08 The Texas A&M University System Method of generating informative dna templates for high-throughput sequencing applications
CN102115789A (en) * 2010-12-15 2011-07-06 厦门大学 Nucleic acid label for second-generation high-flux sequencing and design method thereof
CN102409048A (en) * 2010-09-21 2012-04-11 深圳华大基因科技有限公司 DNA index library building method based on high throughput sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JESSE A. POLAND, ET AL.: "Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach", 《PLOS ONE》 *
ROBERT J. ELSHIRE, ET AL.: "A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species", 《PLOS ONE》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104480217A (en) * 2014-12-26 2015-04-01 上海派森诺生物科技有限公司 Simplified genome sequencing method
CN104562214A (en) * 2014-12-26 2015-04-29 上海派森诺生物科技有限公司 Reduced-representation genome library building method based on type IIB restriction enzyme digestion
CN105696088A (en) * 2015-11-11 2016-06-22 中国科学院昆明植物研究所 Construction method for double enzyme digestion simplified genome next generation sequencing library and matched kit
CN105696088B (en) * 2015-11-11 2018-05-01 中国科学院昆明植物研究所 A kind of double digestion simplifies genome two generations sequencing library construction method and matched reagent box
US11708574B2 (en) * 2016-06-10 2023-07-25 Myriad Women's Health, Inc. Nucleic acid sequencing adapters and uses thereof
CN108018607A (en) * 2016-10-28 2018-05-11 深圳华大基因股份有限公司 A kind of sequence label for lifting microarray dataset library fractionation rate mixes storehouse method and apparatus
CN108018607B (en) * 2016-10-28 2021-04-27 深圳华大基因股份有限公司 Tag sequence library mixing method and device for improving sequencing platform library resolution rate
CN108179174A (en) * 2018-01-15 2018-06-19 武汉爱基百客生物科技有限公司 A kind of high-throughput construction method for simplifying gene order-checking library
CN111961707A (en) * 2020-10-14 2020-11-20 苏州贝康医疗器械有限公司 Nucleic acid library construction method and application thereof in analysis of embryo chromosome structural abnormality before implantation
CN111961707B (en) * 2020-10-14 2021-01-15 苏州贝康医疗器械有限公司 Nucleic acid library construction method and application thereof in analysis of embryo chromosome structural abnormality before implantation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141224