CN110042148A

CN110042148A - Method for efficiently obtaining chloroplast DNA sequencing data and application thereof

Info

Publication number: CN110042148A
Application number: CN201810040063.8A
Authority: CN
Inventors: 宋跃; 宋波; 符渊; 刘欢; 程时锋; 刘心
Original assignee: Shenzhen BGI Life Science Research Institute
Current assignee: Shenzhen Huada Sansheng Garden Technology Co ltd
Priority date: 2018-01-16
Filing date: 2018-01-16
Publication date: 2019-07-23
Anticipated expiration: 2038-01-16
Also published as: CN110042148B

Abstract

This application discloses a method for efficiently obtaining chloroplast DNA sequencing data and its application. The method comprises constructing a non-redundant chloroplast sequence collection from a set of chloroplast nucleic acid sequences; designing probes according to the non-redundant chloroplast sequence collection; capturing chloroplast DNA fragments from the whole genome of a sample to be tested; and sequencing the captured fragments to obtain chloroplast DNA sequencing data. The method captures chloroplast DNA fragments directly from the whole genome of the sample, is simple and convenient to operate, and places relatively low demands on sample quality. The method is broadly applicable: it can capture the chloroplast DNA of plants from different evolutionary branches, the captured regions show no bias, and broad coverage of the sequencing output is ensured. It is especially suitable for obtaining chloroplast DNA sequencing data on a large scale across wide evolutionary branches, and lays a foundation for in-depth, large-scale research on plant evolution and genetics.

Description

Method for efficiently obtaining chloroplast DNA sequencing data and application thereof

Technical field

This application relates to the field of chloroplast sequencing and detection, and in particular to a method for efficiently obtaining chloroplast DNA sequencing data and its application.

Background art

The chloroplast, as an organelle in plants that carries genetic information, has become an ideal carrier of genetic information in plant genome research. Its simple structure, small genome and highly conserved gene regions make it very well suited to sequencing and assembly with second-generation sequencing technology (NGS). Second-generation sequencing can comprehensively decode the genetic information carried by the chloroplast, which is of great significance for resolving plant classification, studying plant genetic evolution, and carrying out genetic and phytogeographic research.

At present there are two main methods for obtaining chloroplast DNA sequencing data. The first is the chloroplast enrichment method, which already existed before second-generation sequencing technology: the chloroplasts themselves are first enriched, chloroplast DNA is then extracted, and the DNA is finally sequenced to obtain chloroplast DNA sequencing data. This method requires a relatively large amount of sample and also places high demands on sample quality, so it is not suitable for rare species or herbarium specimens. In addition, the chloroplast enrichment procedure is cumbersome and difficult to master, and it consumes a great deal of manpower, material resources and time, so it is not suitable for chloroplast sequencing in large batches. The second is the data separation method based on second-generation sequencing technology. With the maturation of second-generation sequencing, it is now possible to generate data amounting to several times the size of the whole plant genome (abbreviated gDNA), which contains a large amount of nuclear genome data and chloroplast genome data; the sequencing data are separated into chloroplast and nuclear genome data to obtain the chloroplast DNA sequencing data. This method places high demands on bioinformatics skills. Moreover, chloroplast DNA generally accounts for only about 0.5-13% of the whole genome (gDNA), and the chloroplast DNA sequencing data obtained by data separation account for only a small part of the total chloroplast DNA data. This wastes a large amount of sequencing data, a reasonable amount of chloroplast sequencing data cannot be obtained on demand, and the method is likewise unsuitable for chloroplast sequencing analysis in large batches.

As plant research deepens, systematic genetic and evolutionary studies increasingly require chloroplast genome information from different evolutionary backgrounds, and large-scale chloroplast genome sequencing has become a necessary choice. However, the existing methods for obtaining chloroplast DNA sequencing data can no longer meet the demands of large-scale chloroplast genome sequencing, which greatly limits and hinders in-depth research on plant genetics and evolution.

Summary of the invention

The purpose of this application is to provide a new method for efficiently obtaining chloroplast DNA sequencing data and its application.

This application adopts the following technical solutions:

One aspect of this application discloses a method for efficiently obtaining chloroplast DNA sequencing data, comprising: constructing a non-redundant chloroplast sequence collection from a set of chloroplast nucleic acid sequences; designing probes according to the constructed non-redundant chloroplast sequence collection; performing hybrid capture on the whole genome of a sample to be tested using the designed probes to obtain enriched chloroplast DNA fragments; and sequencing the enriched chloroplast DNA fragments to obtain the chloroplast DNA sequencing data of this application. Here, the set of chloroplast nucleic acid sequences is the set formed by collecting all currently disclosed chloroplast nucleic acid sequences, and constructing the non-redundant chloroplast sequence collection mainly means removing the repeated, redundant sequences from all the chloroplast nucleic acid sequences to arrive at the final sequence collection.

It should be noted that the method of this application inventively constructs a non-redundant chloroplast sequence collection in advance and then designs probes against that collection. On the one hand, the method can capture chloroplast DNA fragments directly from the whole genome of the sample to be tested; compared with methods that enrich the chloroplasts and then extract their DNA, it is simpler and more convenient, and its requirements on sample quality are relatively low. On the other hand, the probes designed in this method are broadly applicable and can capture the chloroplast DNA of plants from different evolutionary branches; the captured regions show no bias, which ensures the randomness and broad coverage of the sequencing output. Compared with existing data separation methods, the method of this application covers more and is more efficient, and can capture more than 90% of the gene sequences and data of chloroplast DNA. Furthermore, the method is especially suitable for obtaining chloroplast DNA sequencing data on a large scale across wide evolutionary branches, and lays a foundation for in-depth, large-scale research on plant evolution and genetics.

Preferably, constructing the non-redundant chloroplast sequence collection specifically comprises the following steps:

(1) using a public database, obtaining all disclosed chloroplast nucleic acid sequences to obtain the set of chloroplast nucleic acid sequences;

(2) according to the species information of each chloroplast nucleic acid sequence and the evolutionary relationships among species, constructing a species phylogenetic tree based on the obtained set of chloroplast nucleic acid sequences, and screening all chloroplast nucleic acid sequences against the constructed tree, ensuring that each evolutionary branch of the tree retains the chloroplast nucleic acid sequences of 1-2 species with good assembly results, to obtain an initial chloroplast nucleic acid sequence collection;

(3) in the initial chloroplast nucleic acid sequence collection, selecting one species as the reference species according to its evolutionary position in the species phylogenetic tree and marking the remaining species as non-reference species; aligning the sequence of the reference species pairwise with the sequences of the non-reference species; recording, from the alignment results, the positions of the high-similarity homologous regions in the genomes of the non-reference species; and annotating the nucleic acid sequences of those high-similarity homologous regions as N; at the same time, aligning the sequence of the reference species against itself, retaining the longest sequence among its high-similarity homologous regions, and annotating the nucleic acid sequences of the remaining high-similarity homologous regions as N;

Here, the pairwise alignment of the reference species sequence against the non-reference species sequences removes the redundant sequences that are highly similar to the reference species sequence, while the self-alignment of the reference species sequence addresses the case where the reference species contains sequences of multiple origins or multiple repeats, so that the redundancy within the reference species sequence itself is removed. In this application, redundant sequences are removed by annotating the nucleic acid sequences of the high-similarity homologous regions as N. That is, the sequence is not deleted outright; its nucleotides are annotated as N, and regions annotated as N are not analyzed in subsequent analysis. In this way, the positional information of the non-redundant chloroplast sequences does not change before and after redundancy removal;

(4) based on the redundancy-removed collection obtained in step (3), replacing the reference species one by one and performing iterative alignment according to the method of step (3) until no new high-similarity homologous region is identified; likewise, according to the method of step (3), annotating the nucleic acid sequences of the high-similarity homologous regions as N, or annotating as N the nucleic acid sequences of the shorter high-similarity homologous regions found in the self-alignment of the reference species; the non-redundant chloroplast sequence collection is thereby obtained.

Here, replacing the reference species one by one means that each species in the initial chloroplast nucleic acid sequence collection is taken in turn as the reference species and aligned iteratively, until all species have been aligned and the redundant sequences in the high-similarity homologous regions of all species have been removed. Iterative alignment means that each round of redundancy-removal alignment is carried out on the result of the previous round. Annotating as N the nucleic acid sequences of the shorter high-similarity homologous regions found in the self-alignment of the reference species means retaining the longest sequence among the high-similarity homologous regions and annotating the nucleic acid sequences of all the remaining high-similarity homologous regions as N.

Preferably, a region is judged to be a high-similarity homologous region when its similarity is greater than 90% and the length of the aligned sequence is greater than 90 bp.
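The point that annotating a region as N, rather than deleting it, leaves all coordinates unchanged can be illustrated with a minimal sketch. The function name `mask_regions`, the toy sequence and the masked interval are illustrative assumptions and are not taken from this application.

```python
def mask_regions(seq, regions):
    """Return seq with every 1-based inclusive (start, end) region set to 'N'.

    The sequence length is unchanged, so coordinates of unmasked bases
    remain valid in the original genome.
    """
    chars = list(seq)
    for start, end in regions:
        for i in range(start - 1, end):
            chars[i] = "N"
    return "".join(chars)


genome = "ACGTACGTACGTACGTACGT"            # toy 20 bp sequence (hypothetical)
masked = mask_regions(genome, [(5, 12)])   # hypothetical redundant block
print(masked)                              # ACGTNNNNNNNNACGTACGT
print(len(masked) == len(genome))          # True: positions are preserved
```

Had the block been deleted instead, every base downstream of position 12 would shift by 8 positions and would no longer map directly onto the original genome coordinates.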

Preferably, designing probes according to the constructed non-redundant chloroplast sequence collection specifically comprises the following steps:

(1) according to the obtained non-redundant chloroplast sequence collection, extending each nucleic acid sequence segment by 30-45 bp both upstream and downstream to obtain the position coordinates of the probe design regions; if the upstream or downstream region of a nucleic acid sequence segment is shorter than 30 bp, the position of that segment itself is used directly as the position coordinates of the probe design region;

(2) according to the position coordinates obtained in step (1), designing, within the probe design regions marked by those coordinates, specific hybrid capture probes for each nucleic acid sequence in the non-redundant chloroplast sequence collection.

It should be noted that, in this application, because the nucleotides of the redundant sequences are annotated as N, all position coordinates are in fact coordinates in the original genome sequence. Extending each nucleic acid sequence segment by 30-45 bp both upstream and downstream means that, in the non-redundant chloroplast sequence collection, the coordinates of each nucleic acid sequence segment are extended by 30-45 bp in each direction to give the final coordinates of the probe design region. When probes are designed, the 30-45 bp upstream and downstream of the segment that were annotated as N are restored to the original sequence within the probe design region. The 30-45 bp extension is based on two considerations. First, it ensures that enough additional sequence is available at the edges of a probe design region for probe design, so that the optimal probe sequences can be selected. Second, the designed probes are about 90 bp long, and an extension of 30-45 bp ensures that at least 50% of the bases of each designed probe cover the region for which probes are required. It will be understood that the 30-45 bp extension is a reference value for practical operation; it may be adjusted appropriately in a specific design and is not specifically limited here.
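The "at least 50%" figure follows from simple arithmetic, sketched below. The sketch assumes a probe of exactly 90 bp placed entirely inside the design region with one end at the outermost edge of the extension, and a target segment long enough to hold the rest of the probe; the function name is illustrative.

```python
def min_target_overlap(probe_len, flank):
    """Bases of a probe that still lie on the target segment when the probe
    starts at the outer edge of a flank-bp extension (worst case)."""
    return probe_len - flank


for flank in (30, 40, 45):
    overlap = min_target_overlap(90, flank)
    print(f"flank={flank} bp: {overlap} bp on target ({overlap / 90:.0%})")
```

With the maximum extension of 45 bp, a 90 bp probe still has 45 bp (50%) on the target segment; with 30 bp the worst case is 60 bp (about 67%).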

Another aspect of this application discloses the use of the method of this application for efficiently obtaining chloroplast DNA sequencing data in chloroplast DNA enrichment, chloroplast library construction, and large-scale plant evolution research or genetic research based on chloroplast information.

It should be noted that the key to the method of this application lies in constructing a non-redundant chloroplast sequence collection, designing probes according to that collection, and enriching chloroplast DNA fragments, so that chloroplast DNA sequencing data are obtained efficiently. Therefore, the chloroplast DNA fragment enrichment part of the method can be taken out and used on its own for chloroplast DNA enrichment or chloroplast library construction. In addition, because the method is especially suitable for obtaining chloroplast DNA sequencing data on a large scale across wide evolutionary branches, it can also be used for large-scale plant evolution research or genetic research based on chloroplast information.

A further aspect of this application discloses a method for preparing chloroplast DNA fragment hybrid capture probes, comprising the following steps: (I) constructing a non-redundant chloroplast sequence collection from a set of chloroplast nucleic acid sequences; (II) designing probes according to the constructed non-redundant chloroplast sequence collection to obtain the chloroplast DNA fragment hybrid capture probes.

Preferably, in step (I) of the method for preparing chloroplast DNA fragment hybrid capture probes, constructing the non-redundant chloroplast sequence collection specifically comprises:

(1) obtaining an initial chloroplast nucleic acid sequence collection: according to the species information of each chloroplast nucleic acid sequence and the evolutionary relationships among species, constructing a species phylogenetic tree based on the set of chloroplast nucleic acid sequences, and screening all chloroplast nucleic acid sequences against the constructed tree, ensuring that each evolutionary branch of the tree retains the chloroplast nucleic acid sequences of 1-2 species with good assembly results, to obtain the initial chloroplast nucleic acid sequence collection;

(2) removing redundancy based on a reference species: in the initial chloroplast nucleic acid sequence collection, selecting one species as the reference species according to its evolutionary position in the species phylogenetic tree and marking the remaining species as non-reference species; aligning the sequence of the reference species pairwise with the sequences of the non-reference species; recording, from the alignment results, the positions of the high-similarity homologous regions in the genomes of the non-reference species; and annotating the nucleic acid sequences of those high-similarity homologous regions as N; at the same time, aligning the sequence of the reference species against itself, retaining the longest sequence among its high-similarity homologous regions, and annotating the nucleic acid sequences of the remaining high-similarity homologous regions as N;

(3) obtaining the non-redundant chloroplast sequence collection by iterative alignment: based on the redundancy-removed collection obtained in step (2), replacing the reference species one by one and performing iterative alignment according to the method of step (2) until no new high-similarity homologous region is identified; likewise, according to the method of step (2), annotating the nucleic acid sequences of the high-similarity homologous regions as N, or annotating as N the nucleic acid sequences of the shorter high-similarity homologous regions found in the self-alignment of the reference species; the non-redundant chloroplast sequence collection is thereby obtained.

Preferably, in step (II) of the method for preparing chloroplast DNA fragment hybrid capture probes, designing probes according to the constructed non-redundant chloroplast sequence collection specifically comprises:

(1) determining the position coordinates of the probe design regions: according to the obtained non-redundant chloroplast sequence collection, extending each nucleic acid sequence segment by 30-45 bp both upstream and downstream to obtain the position coordinates of the probe design regions; if the upstream or downstream region of a nucleic acid sequence segment is shorter than 30 bp, the position of that segment itself is used directly as the position coordinates of the probe design region;

(2) designing the probes: according to the position coordinates obtained in step (1), designing, within the probe design regions marked by those coordinates, specific hybrid capture probes for each nucleic acid sequence in the non-redundant chloroplast sequence collection, namely the chloroplast DNA fragment hybrid capture probes.

Preferably, in step (I), the set of chloroplast nucleic acid sequences is the set of all disclosed chloroplast nucleic acid sequences obtained from a public database.

It will be understood that, as set out above, the method of this application for preparing chloroplast DNA fragment hybrid capture probes in fact consists of part of the steps of the method of this application for efficiently obtaining chloroplast DNA sequencing data.

A further aspect of this application discloses chloroplast DNA fragment hybrid capture probes prepared by the method of this application for preparing chloroplast DNA fragment hybrid capture probes.

It will be understood that the chloroplast DNA fragment hybrid capture probes prepared by the method of this application can not only capture chloroplast DNA fragments of plants from different evolutionary branches, but can also essentially achieve whole-genome-level fragment capture of the chloroplast genome: in angiosperms, gymnosperms and pteridophytes, more than 90% of the genome region can be captured.

A further aspect of this application discloses a method for chloroplast DNA enrichment, comprising performing hybrid capture on the whole genome of a sample to be tested using the chloroplast DNA fragment hybrid capture probes of this application, thereby enriching chloroplast DNA.

A further aspect of this application discloses a method for constructing a chloroplast DNA library, comprising performing hybrid capture on the whole genome of a sample to be tested using the chloroplast DNA fragment hybrid capture probes of this application to enrich chloroplast DNA, and then constructing a library from the enriched chloroplast DNA to obtain the chloroplast DNA library.

The beneficial effects of the present application are as follows:

The method of this application for efficiently obtaining chloroplast DNA sequencing data can capture chloroplast DNA fragments directly from the whole genome of the sample to be tested and is simple and convenient to operate; moreover, its requirements on sample quality are relatively low, and it is sufficient that the whole genome of the sample can be extracted. In addition, the method is broadly applicable and can capture the chloroplast DNA of plants from different evolutionary branches; the captured regions show no bias, which ensures the randomness and full coverage of the sequencing output. Compared with existing data separation methods, the method of this application covers more and is more efficient. The method is especially suitable for obtaining chloroplast DNA sequencing data on a large scale across wide evolutionary branches, and lays a foundation for in-depth, large-scale research on plant evolution and genetics.

Detailed description

As plant research deepens, systematic genetic and evolutionary studies increasingly require chloroplast genome information from different evolutionary backgrounds, and large-scale chloroplast genome sequencing has become a necessary choice. However, the existing methods for obtaining chloroplast genome sequencing data are not suitable for large-scale chloroplast sequencing. Existing capture probe technology has not designed probes across the whole chloroplast genome, so the whole chloroplast genome cannot be studied comprehensively, and there is no probe set that can effectively capture whole-genome chloroplast DNA fragments from different evolutionary branches.

To this end, this application has developed a new strategy for probe capture of chloroplast genomic DNA fragments. Hybrid capture probes developed with this strategy efficiently enrich chloroplast DNA fragments, which are then sequenced, so that chloroplast DNA sequencing data are obtained efficiently. This both meets the data volume required in scientific research and effectively reduces sequencing cost, making large-scale chloroplast sequencing with second-generation sequencing technology feasible.

Based on the above research, this application proposes a method for efficiently obtaining chloroplast DNA sequencing data, a method for preparing chloroplast DNA fragment hybrid capture probes, a method for chloroplast DNA enrichment, and a method for constructing a chloroplast DNA library. The core of each of these methods is essentially the efficient enrichment of chloroplast DNA fragments by hybrid capture probes developed with the strategy of this application: a non-redundant chloroplast sequence collection is first constructed, and probes are then designed according to the constructed collection to obtain chloroplast DNA fragment hybrid capture probes; the probes may further be used to enrich chloroplast DNA, to construct a library from the enriched chloroplast DNA, or to sequence the enriched chloroplast DNA to obtain chloroplast DNA sequencing data.

It will be understood that the basic inventive concept of this application, namely first constructing a non-redundant sequence collection and then designing probes according to it, is not limited to chloroplasts and also applies to other organelles or target genes.

This application is described in further detail below by way of a specific embodiment. The following embodiment only further illustrates this application and should not be construed as limiting it.

Embodiment

The method of this example for obtaining chloroplast DNA sequencing data comprises constructing a non-redundant chloroplast sequence collection from the chloroplast nucleic acid sequences disclosed in a public database. The public database mainly used is NCBI, whose collection of nucleic acid data is relatively comprehensive; this example collected all plant chloroplast genome data on NCBI to construct the non-redundant chloroplast sequence collection. Probes were then designed according to the constructed collection; hybrid capture was performed on the whole genome of the sample to be tested using the designed probes to obtain enriched chloroplast DNA fragments; and the enriched chloroplast DNA fragments were sequenced to obtain chloroplast DNA sequencing data. The details are as follows:

1. Constructing the non-redundant chloroplast sequence collection

(1) Using a public database, all disclosed chloroplast nucleic acid sequences are obtained to form the set of chloroplast nucleic acid sequences. In this example, all published chloroplast genome sequences were downloaded from the NCBI public database, and for each species the sequences with the best assembly quality were selected; the set of chloroplast nucleic acid sequences of this example contains 567 sequences from 544 species in total.

(2) According to the species information of each chloroplast nucleic acid sequence and the evolutionary relationships among species, a species phylogenetic tree is constructed based on the set of chloroplast nucleic acid sequences, and all chloroplast nucleic acid sequences are screened against the constructed tree, ensuring that each evolutionary branch of the tree retains the chloroplast nucleic acid sequences of 1-2 species with good assembly results, to obtain the initial chloroplast nucleic acid sequence collection. For the initial chloroplast nucleic acid sequence collection of this example, the chloroplast genome nucleic acid sequences of 99 plant species were finally selected; the specific sequence information is shown in Table 1.

Table 1. The 99 selected plant species and their chloroplast genome information

(3) The nucleic acid sequences in the initial chloroplast nucleic acid sequence collection are aligned pairwise to obtain the high-similarity homologous blocks between sequences, and redundancy is removed from those blocks. Specifically, based on assembly results and evolutionary position, Arabidopsis was selected from the genome sequence collection as the reference species, and the remaining species were marked as non-reference species. A sequence index was built from the genome sequences of the non-reference species, and the Arabidopsis chloroplast genome sequence was used as the query sequence. The Arabidopsis chloroplast genome sequence was aligned against the non-reference species genome sequences with blastn, and the alignment results were filtered at e-value < 1e-5. Among the filtered results, alignments with sequence similarity > 90% and aligned length > 90 bp were defined as high-similarity homologous regions between sequences, and the positions of these regions in the genomes of the non-reference species were recorded from the alignment results. The nucleic acid information of the Arabidopsis chloroplast genome was retained, and the nucleic acid sequences of the high-similarity homologous regions in the non-reference species genome sequences were annotated as N. At the same time, the Arabidopsis chloroplast genome sequence was aligned against itself with blastn, applying the same filtering conditions and the same definition of high-similarity homologous regions; according to the sequence lengths of the identified high-similarity homologous regions, the longest homologous sequence was retained and the sequences of the remaining high-similarity homologous regions were annotated as N.
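The filtering criteria of step (3) can be sketched as follows. The sketch assumes that blastn was run with its standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the output format actually used in this example is not stated, and the sequence names and hit lines below are invented purely for illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity
    length: int       # alignment length in bp
    s_start: int      # 1-based start on the subject (non-reference) genome
    s_end: int        # 1-based end on the subject genome
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse blastn tabular (-outfmt 6) lines into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Minus-strand hits report sstart > send, so normalise the order.
        s_start, s_end = sorted((int(f[8]), int(f[9])))
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        s_start, s_end, float(f[10])))
    return hits


def high_similarity(hits, max_evalue=1e-5, min_identity=90.0, min_length=90):
    """Keep hits with e-value < 1e-5, similarity > 90% and length > 90 bp."""
    return [h for h in hits
            if h.evalue < max_evalue
            and h.identity > min_identity
            and h.length > min_length]


example = [  # hypothetical hits, not data from this application
    "ref\tsp1\t95.20\t150\t7\t0\t1\t150\t2001\t2150\t1e-60\t230",
    "ref\tsp1\t88.00\t200\t24\t0\t300\t499\t5000\t5199\t1e-40\t180",
    "ref\tsp2\t97.00\t60\t2\t0\t10\t69\t900\t841\t1e-20\t100",
]
kept = high_similarity(parse_hits(example))
print([(h.subject, h.s_start, h.s_end) for h in kept])  # [('sp1', 2001, 2150)]
```

The retained subject intervals are the positions that would then be annotated as N in the non-reference genomes.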

(4) For the sequences de-redundified in step (3), the reference species is replaced and the result of step (3) is aligned iteratively to identify all high-similarity homologous regions in the collection, and redundancy is removed from the identified regions. The sequence homology alignment of step (3) yields the high-similarity homologous regions between the non-reference species genome sequences and the Arabidopsis chloroplast sequence, and within the Arabidopsis chloroplast sequence itself, and removes redundancy from those homologous blocks. The purpose of step (4) is to further remove redundancy from the high-similarity homologous regions among the 99 chloroplast genome sequences. The sequence collection obtained after redundancy removal in step (3) is aligned against itself iteratively with blastn. After each iteration, the alignment results are checked for any remaining high-similarity homologous regions between sequences (similarity > 90% and aligned length > 90 bp), and, following the method of step (3), the shorter nucleic acid sequence in each high-similarity homologous region is annotated as N. Iterative alignment continues until no new high-similarity homologous region is identified. In the new sequence collection, namely the non-redundant chloroplast sequence collection of this example, no two sequences share a high-similarity homologous region, while the sequence diversity among the chloroplast sequences of different species is retained, making the collection suitable for the probe design of the next step.
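The control flow of the iterative redundancy removal can be sketched as below. This is only an illustration under strong simplifying assumptions: exact shared k-mers stand in for blastn alignments, the similarity and length thresholds are replaced by the k-mer size `k`, and the self-alignment of each reference is omitted. The function names and toy sequences are hypothetical.

```python
def shared_blocks(ref, seq, k):
    """Toy stand-in for an aligner: 0-based half-open spans of seq built from
    k-mers that also occur in ref. k-mers containing N never match."""
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)
                 if "N" not in ref[i:i + k]}
    spans = []
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in ref_kmers:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + k)   # merge overlapping spans
            else:
                spans.append((i, i + k))
    return spans


def mask(seq, spans):
    """Annotate the given spans as N without changing the sequence length."""
    chars = list(seq)
    for start, end in spans:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


def remove_redundancy(seqs, order, k):
    """Take each species in turn as the reference, mask regions of the other
    sequences shared with it, and repeat until a full pass finds nothing new."""
    seqs = dict(seqs)
    changed = True
    while changed:
        changed = False
        for ref in order:
            for name in order:
                if name == ref:
                    continue
                spans = shared_blocks(seqs[ref], seqs[name], k)
                if spans:
                    seqs[name] = mask(seqs[name], spans)
                    changed = True
    return seqs


toy = {"A": "ACGTTGCAAGCT", "B": "GTTGCAAGTCATCGAT", "C": "CCTCATCGATCC"}
result = remove_redundancy(toy, ["A", "B", "C"], k=6)
for name, seq in result.items():
    print(name, seq)
# A ACGTTGCAAGCT
# B NNNNNNNNTCATCGAT
# C CCNNNNNNNNCC
```

Each shared block survives in exactly one sequence (the one that served as reference first), and because masked regions no longer match anything, the loop terminates once a full pass identifies no new shared region.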

2. Designing probes according to the non-redundant chloroplast sequence collection

(1) From the non-redundant chloroplast sequence collection, the coordinates of the probe design regions are obtained from the nucleic acid sequences. In the re-annotated sequence collection, the nucleic acid sequences of the high-similarity homologous blocks between sequences have been annotated as N, forming the non-redundant sequence collection; the regions of each sequence in the collection that are not N are the regions for which probes are designed. Specifically, each nucleic acid sequence segment is extended by 40 bp both upstream and downstream to obtain the position coordinates of the probe design region; if the upstream or downstream region of a nucleic acid sequence segment is shorter than 40 bp, the position of that segment itself is used directly as the position coordinates of the probe design region.
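A minimal sketch of how the design-region coordinates could be derived from an N-masked sequence is given below. It follows one reading of the rule above: if fewer than 40 bp are available on either side of a segment, the segment's own coordinates are used unchanged. The function name and the toy sequence are illustrative assumptions.

```python
import re


def design_regions(masked_seq, flank=40):
    """Return 1-based inclusive (start, end) probe design regions.

    Each run of non-N bases is extended by `flank` bp on both sides; if either
    side has fewer than `flank` bases available, the run is used as is.
    """
    regions = []
    total = len(masked_seq)
    for match in re.finditer(r"[^N]+", masked_seq):
        start, end = match.start() + 1, match.end()   # 1-based inclusive
        upstream, downstream = start - 1, total - end
        if upstream >= flank and downstream >= flank:
            regions.append((start - flank, end + flank))
        else:
            regions.append((start, end))
    return regions


# Toy 150 bp masked sequence: two retained segments at 51-70 and 121-140.
toy = "N" * 50 + "ACGT" * 5 + "N" * 50 + "ACGT" * 5 + "N" * 10
print(design_regions(toy))   # [(11, 110), (121, 140)]
```

Because masking preserves sequence length, the returned coordinates can be applied directly to the original, unmasked genome sequence when the flanking bases are restored for probe design.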

(2) According to the position coordinates obtained, design, within the probe design regions they mark, specific hybrid capture probes for each nucleic acid sequence in the non-redundant chloroplast sequence collection. Because the nucleic acid sequence regions in the non-redundant collection map directly onto the original sequences, the position coordinates of all sequences are the coordinates in the original genome sequences.

In this example, the position coordinates used for probe design in the Arabidopsis sequence are shown in Table 2.

Table 2. Position coordinates for probe design in the Arabidopsis sequence

Probe design in this example was carried out with NimbleDesign, the sequence capture probe design software from Roche; a total of 180,519 hybrid capture probes were designed and synthesized.

This example comprehensively collects plant chloroplast genome sequences from different evolutionary branches and screens the collected genome sequences using the species phylogenetic tree in combination with chloroplast genome assembly quality. This ensures that the genome sequences used for probe design all have high assembly quality and that each evolutionary branch is represented by chloroplast sequences of corresponding species, so that the designed probes are broadly applicable to the chloroplasts of different plants. The chloroplast genome sequences are aligned pairwise using a dynamic programming algorithm; homologous blocks between sequences are identified using sequence similarity and alignment length as criteria, and the nucleic acid sequences of the identified homologous blocks are annotated as N to remove redundancy. Setting Arabidopsis as the reference sequence makes de-redundancy of the sequence collection more tractable. Iterative alignment minimizes the large amount of redundant sequence present in the collected sequences as a result of the highly conserved nature of chloroplast genomes. The de-redundant sequence collection, whose nucleic acid regions map by coordinate onto the original genome sequences, contains the sequence polymorphism among different species while effectively reducing redundancy between sequences. The probes designed in this example can completely cover the chloroplast genome sequences of different species and capture chloroplast genome fragments well; moreover, the captured chloroplast DNA fragments can be used directly for library construction and sequencing.

To test the chloroplast capture performance of the probes designed in this example on plants from different evolutionary branches, the synthesized probes were used for hybrid capture on the whole genomes of three plant species, tomato, ginkgo and lotus throne fern, to obtain their respective chloroplast DNA fragments; the captured chloroplast DNA fragments were then sequenced separately to obtain chloroplast DNA sequencing data.

It should be noted that tomato, ginkgo and lotus throne fern were selected because complete chloroplast genome sequences of all three species are available in NCBI and the three belong to different evolutionary branches: tomato is an angiosperm, ginkgo is a gymnosperm, and lotus throne fern is a pteridophyte. Testing with plants from different evolutionary branches makes it possible to assess the ability of the probes designed in this example to capture chloroplast DNA fragments from plants of different evolutionary branches.

Chloroplast genome fragments were captured from the whole genomes of tomato, ginkgo and lotus throne fern using the probes designed in this example. The probe capture method follows Qiao, Xian et al., "Genome-wide Target Enrichment-aided Chip Design: a 66K SNP Chip for Cashmere Goat." Scientific Reports 7, no. 1 (2017): 8621, and is as follows:

(1) Fragmentation of plant genomic DNA (gDNA)

gDNA was extracted from the tomato, ginkgo and lotus throne fern samples using the plant genome extraction kit supplied by TIANGEN. The gDNA obtained was sheared into 170 bp fragments using a Covaris LE220 ultrasonicator. Fragments were then selected by magnetic bead adsorption: 1 volume of Ampure XP beads is added to the sheared DNA, mixed and allowed to bind for 5 minutes, and the supernatant is kept while the beads are discarded; XP beads at 0.5 times the DNA sample volume are then added to the supernatant, mixed and allowed to bind for 10-15 minutes, and the supernatant is discarded; the beads are washed twice with 75% ethanol and the DNA is eluted with 42 μL of TE.

(2) End repair of the fragmented plant gDNA

The XP beads used for purification are left at room temperature for 30 min before use.

42 μL of DNA fragments from each sample is added to 58 μL of End Repair Master Mix for reaction. The 58 μL End Repair Master Mix consists of: 10× End Repair Buffer 10 μL, dNTPs 1.6 μL, T4 DNA Polymerase 1 μL, Klenow DNA Polymerase 2 μL, T4 Polynucleotide Kinase 2.2 μL, and ddH₂O to 58 μL.

The reaction solution is mixed and placed in a PCR instrument under the following conditions: 20 °C for 30 min, then hold at 4 °C.

180 μL of XP beads is added to each reaction tube and left to stand for 5 min;

The supernatant is removed, 200 μL of 80% ethanol is added, and the tube is left on the magnetic rack for 30 s;

The supernatant is removed and 32 μL of ddH₂O is added; after standing on the magnetic rack for 5 min, 30 μL of liquid is transferred to a new 1.5 mL centrifuge tube to give 30 μL of purified DNA fragments.

20 μL of Adenylation Master Mix is added to the 30 μL of purified DNA fragments for reaction. The 20 μL Adenylation Master Mix consists of: 10× Klenow Polymerase Buffer 5 μL, dATP 1 μL, Exo(-) Klenow 3 μL, and ddH₂O to 20 μL.

The reaction solution is mixed and placed in a PCR instrument under the following conditions: 37 °C for 30 min, then hold at 4 °C.

90 μL of XP beads is added to each sample and left to stand for 5 min;

The supernatant is removed and 15 μL of ddH₂O is added; after standing on the magnetic rack for 5 min, 13 μL of liquid is transferred to a new 1.5 mL centrifuge tube to give 13 μL of purified, end-repaired DNA fragments.

(3) Ligation of specific sequencing adapters to both ends of the gDNA fragments

To each end-repaired, A-tailed DNA fragment sample from step (2), i.e. 13 μL of DNA fragments, 37 μL of Ligation Master Mix is added for reaction. The 37 μL Ligation Master Mix consists of: 5× T4 DNA Ligase Buffer 10 μL, SureSelect Adapter Oligo Mix 10 μL, T4 DNA Ligase 1.5 μL, and ddH₂O to 37 μL.

The reaction system is placed in a PCR instrument under the following conditions: 20 °C for 15 min, then hold at 4 °C.

90 μL of XP beads is added to each sample and left to stand for 5 min;

The supernatant is removed and 32 μL of ddH₂O is added; after standing on the magnetic rack for 5 min, 30 μL of liquid is transferred to a new 1.5 mL centrifuge tube, giving 30 μL of adapter-ligated DNA fragments.

(4) PCR amplification and purification of the adapter-ligated gDNA fragments

All library construction and hybridization in this example were performed using the Agilent SureSelect Reagent Kit; the PCR amplification primers were also supplied with the kit.

15 μL of the DNA fragments finally obtained in step (3) is added to 35 μL of PCR Reaction Mix for reaction. The 35 μL PCR Reaction Mix consists of: SureSelect Primer 1.25 μL, SureSelect ILM Indexing Pre-Capture PCR Reverse Primer 1.25 μL, 5× Herculase II Reaction Buffer 10 μL, 100 mmol/L dNTP Mix 0.5 μL, Herculase II Fusion DNA Polymerase 1 μL, and ddH₂O to 35 μL.
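The master mixes in steps (2) to (4) are each topped up with ddH₂O to a stated final volume. As a convenience sketch, the water volume for each mix can be computed from the component volumes listed above. The helper name `water_to_volume` and the dictionary layout are illustrative assumptions, and the dNTP volume of 1.6 μL in the end repair mix is read from the component list as given in this example.

```python
def water_to_volume(components_ul, final_volume_ul):
    """Return the ddH2O volume (in microlitres) needed to reach the final volume."""
    used = sum(components_ul.values())
    if used > final_volume_ul:
        raise ValueError("components exceed the final volume")
    return round(final_volume_ul - used, 2)


# Component volumes (microlitres) and final volumes as listed in steps (2) to (4).
MIXES = {
    "End Repair Master Mix": ({
        "10x End Repair Buffer": 10, "dNTPs": 1.6, "T4 DNA Polymerase": 1,
        "Klenow DNA Polymerase": 2, "T4 Polynucleotide Kinase": 2.2}, 58),
    "Adenylation Master Mix": ({
        "10x Klenow Polymerase Buffer": 5, "dATP": 1, "Exo(-) Klenow": 3}, 20),
    "Ligation Master Mix": ({
        "5x T4 DNA Ligase Buffer": 10, "SureSelect Adapter Oligo Mix": 10,
        "T4 DNA Ligase": 1.5}, 37),
    "PCR Reaction Mix": ({
        "SureSelect Primer": 1.25, "Pre-Capture PCR Reverse Primer": 1.25,
        "5x Herculase II Reaction Buffer": 10, "100 mmol/L dNTP Mix": 0.5,
        "Herculase II Fusion DNA Polymerase": 1}, 35),
}

for name, (components, final_volume) in MIXES.items():
    print(f"{name}: add {water_to_volume(components, final_volume)} uL ddH2O")
```

Under these readings the water volumes come to 41.2 μL, 11 μL, 15.5 μL and 21 μL respectively.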

The PCR reaction solution is mixed and run in a PCR instrument under the following conditions: 95 °C for 2 min; then 10 cycles of 95 °C for 30 s, 65 °C for 30 s and 72 °C for 60 s; after cycling, 72 °C for 10 min; hold at 4 °C.

90 μL of XP beads is added to the PCR product of each sample and left to stand for 5 min;

The supernatant is removed and 30 μL of ddH₂O is added; after standing on the magnetic rack for 5 min, 28 μL of liquid is transferred to a new 1.5 mL centrifuge tube; this 28 μL is the purified PCR amplification product, i.e. the purified gDNA library carrying the specific adapters.

(5) The probes designed in this example are hybridized with the purified adapter-carrying gDNA library in hybridization buffer, as follows:

a) The gDNA library finally obtained in step (4) is diluted to a concentration of 221 ng/μL, and 3.4 μL of the gDNA library, about 750 ng in total, is transferred to a new 1.5 mL centrifuge tube. 5.6 μL of SureSelect Block Mix is added to the gDNA library of each sample, mixed, and reacted in a PCR instrument under the following conditions: 95 °C for 5 min, then hold at 65 °C. The 5.6 μL SureSelect Block Mix consists of: 2.5 μL of SureSelect Indexing Block 1, 2.5 μL of SureSelect Block 2 and 0.6 μL of SureSelect ILM Indexing Block 3.

b) To the gDNA library finally obtained in step a), 20 μL of Capture Library Hybridization Mix is added per sample; the sample is mixed and placed in a PCR instrument for hybridization at 65 °C for 24 h. The 20 μL Capture Library Hybridization Mix consists of: 6.63 μL of SureSelect Hyb 1, 0.27 μL of SureSelect Hyb 2, 2.65 μL of SureSelect Hyb 3, 3.45 μL of SureSelect Hyb 4, 5 μL of 10% RNase Block and 2 μL of Capture Library (probes).

(6) The hybrid-captured target DNA fragments are separated and purified using magnetic beads to obtain the plant chloroplast genome DNA library, as follows:

SureSelect Wash Buffer 2 is preheated to 65 °C in advance. The MyOne Streptavidin T1 magnetic beads are resuspended by vigorous vortexing until homogeneous; 200 μL of SureSelect Binding Buffer is added to the beads, which are left on the magnetic rack at room temperature for 5 min, and the supernatant is aspirated and discarded.

200 μL of SureSelect Binding Buffer is added to the beads, which are resuspended by vigorous vortexing for 5 seconds.

The entire hybridization reaction is added to the 200 μL of beads and mixed, then mixed on a Nutator mixer at room temperature for 30 min.

The sample is left on the magnetic rack at room temperature for 5 min and the supernatant is removed; 200 μL of SureSelect Wash Buffer 2 preheated to 65 °C is added to resuspend the beads, which are then incubated at 65 °C for 10 min in a PCR instrument.

The sample incubated at 65 °C for 10 min is placed on the magnetic rack and left at room temperature for 5 min;

The supernatant is removed and 30 μL of ddH₂O is added; after standing on the magnetic rack for 5 min, the liquid is transferred to a new 1.5 mL centrifuge tube, giving the captured chloroplast DNA fragment library.

(7) Sequencing

The chloroplast genome library obtained is sequenced with PE100 reads on the Illumina HiSeq 4000 sequencing platform.

(8) Analysis of sequencing results

Adapter sequences in the raw sequencing data are filtered out, and reads containing a high proportion of low-quality bases are also removed; the filter condition set in this example is a low-quality base proportion ≥ 10%. The adapter sequences are those added during library construction.
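The low-quality filter can be sketched as below. The 10% proportion is the condition stated above; the Phred cutoff that defines a "low-quality" base is not given in the text, so `LOW_QUALITY_PHRED = 5` is an assumed placeholder, as is the Phred+33 quality encoding. Adapter removal is not covered by this sketch.

```python
MAX_LOW_QUALITY_FRACTION = 0.10  # from the text: reads with >= 10% low-quality bases are removed
LOW_QUALITY_PHRED = 5            # ASSUMPTION: cutoff for a low-quality base is not stated
PHRED_OFFSET = 33                # ASSUMPTION: Phred+33 encoded quality strings


def low_quality_fraction(quality, cutoff=LOW_QUALITY_PHRED, offset=PHRED_OFFSET):
    """Fraction of bases whose Phred score is at or below the cutoff."""
    if not quality:
        return 1.0  # treat an empty read as entirely low quality
    low = sum(1 for ch in quality if ord(ch) - offset <= cutoff)
    return low / len(quality)


def keep_read(quality, max_fraction=MAX_LOW_QUALITY_FRACTION):
    """Keep a read only if its low-quality fraction is below the limit."""
    return low_quality_fraction(quality) < max_fraction
```

For paired-end data one would typically drop a pair when either mate fails the check, although the text does not specify how pairs are handled.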

The filtered sequencing data are aligned to the template sequences using bwa, and the coverage of the sequencing data on the template sequences is calculated; the results are shown in Table 3. The template sequences are the complete chloroplast genome sequences of tomato, ginkgo and lotus throne fern published in NCBI. The sequencing data of each species are aligned to the complete chloroplast genome sequence of the same species from NCBI (tomato to tomato, ginkgo to ginkgo, lotus throne fern to lotus throne fern), and the coverage is calculated for each.

Table 3. Sequencing data and coverage statistics

In Table 3, the NCBI number is the NCBI accession of the chloroplast sequence used for each of the three test plants; the template sequence length is the length of the chloroplast genome reference sequence of the test plant, taken from NCBI; the sequencing data volume is the raw output generated by sequencing the probe-captured chloroplast fragments; the average sequencing depth is the raw data volume divided by the template sequence length; the covered template sequence length is the number of template bases to which the raw data can be aligned, reflecting how many bases of the template region were sequenced; and the sequencing data coverage rate is the covered template sequence length divided by the template sequence length, reflecting the proportion of template bases covered by sequencing data.
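The Table 3 quantities defined above can be computed as in the following sketch. The function name `coverage_stats` and its inputs are illustrative assumptions: `per_base_depth` is taken to be a list holding the aligned read depth at every template position (for example, as reported by a tool such as `samtools depth -a`), and `total_bases` is the raw sequencing data volume in bases.

```python
def coverage_stats(total_bases, per_base_depth):
    """Compute the coverage statistics defined for Table 3.

    average_depth  = raw data volume / template sequence length
    covered_length = number of template positions with depth > 0
    coverage_rate  = covered_length / template sequence length
    """
    template_length = len(per_base_depth)
    if template_length == 0:
        raise ValueError("template sequence is empty")
    covered_length = sum(1 for depth in per_base_depth if depth > 0)
    return {
        "template_length": template_length,
        "average_depth": total_bases / template_length,
        "covered_length": covered_length,
        "coverage_rate": covered_length / template_length,
    }
```

Note that, following the definition in the text, the average depth is based on the raw data volume rather than on the aligned bases only, so it can exceed the mean of the per-base depths.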

The results in Table 3 show that capturing chloroplast DNA fragments with the probes designed in this example and then sequencing them yields chloroplast DNA sequencing data efficiently: coverage of the whole chloroplast genome sequence is greater than 90%, and for tomato it reaches 98.5%. Moreover, the probes and the chloroplast DNA sequencing data acquisition method of this example can effectively obtain chloroplast DNA sequencing data from plant species of different evolutionary branches without bias and can ensure broad coverage of the sequencing output. The method is particularly suitable for obtaining chloroplast DNA sequencing data on a large scale from species of different evolutionary branches, laying a foundation for in-depth research on large-scale plant evolution and genetics.

The foregoing is a further detailed description of the present application in conjunction with specific embodiments, and the specific implementation of the present application shall not be considered limited to these descriptions. Those of ordinary skill in the art to which this application belongs may make a number of simple deductions or substitutions without departing from the concept of the present application.

Claims

1. A method for effectively obtaining chloroplast DNA sequencing data, characterized by comprising: constructing a non-redundant chloroplast sequence collection from a set of chloroplast nucleic acid sequences; designing probes according to the constructed non-redundant chloroplast sequence collection; performing hybrid capture on the whole genome of a sample to be tested using the designed probes to obtain enriched chloroplast DNA fragments; and sequencing the enriched chloroplast DNA fragments to obtain the chloroplast DNA sequencing data.

2. The method according to claim 1, characterized in that constructing the non-redundant chloroplast sequence collection specifically comprises the following steps:

(1) using public databases, obtaining all published chloroplast nucleic acid sequences to obtain the set of chloroplast nucleic acid sequences;

(2) according to the species information of each chloroplast nucleic acid sequence and the phylogenetic relationships among species, constructing a species phylogenetic tree based on the set of chloroplast nucleic acid sequences, and screening all chloroplast nucleic acid sequences according to the constructed tree, ensuring that the chloroplast nucleic acid sequences of 1-2 species with good assembly results are retained in each evolutionary branch of the species phylogenetic tree, to obtain an initial chloroplast nucleic acid sequence collection;

(3) from the initial chloroplast nucleic acid sequence collection, selecting one species according to its degree of evolution in the species phylogenetic tree and marking it as the reference species, the rest being marked as non-reference species; aligning the sequence of the reference species pairwise with the sequences of the non-reference species; recording, according to the alignment results, the positions of high-similarity homologous regions in the non-reference species genomes, and annotating the nucleic acid sequences of the high-similarity homologous regions as N; meanwhile, aligning the sequence of the reference species against itself, retaining the longest sequence among the high-similarity homologous regions and annotating the nucleic acid sequences of the remaining high-similarity homologous regions as N;

(4) based on the collection obtained in step (3) after removal of redundant sequences, replacing the reference species one by one according to the method of step (3) and performing iterative alignment until no new high-similarity homologous region is identified; likewise, according to the method of step (3), annotating the nucleic acid sequences of high-similarity homologous regions as N, or annotating as N the nucleic acid sequences of the shorter high-similarity homologous regions in the self-alignment result of the reference species; thereby obtaining the non-redundant chloroplast sequence collection.

3. The method according to claim 2, characterized in that a high-similarity homologous region is identified by a similarity greater than 90% and an alignment length greater than 90 bp.

4. The method according to any one of claims 1-3, characterized in that designing probes according to the constructed non-redundant chloroplast sequence collection specifically comprises the following steps:

(1) according to the obtained non-redundant chloroplast sequence collection, extending each segment of nucleic acid sequence by 30-45 bp upstream and downstream to obtain the position coordinates of the probe design regions; if the upstream or downstream region of a segment is shorter than 30 bp, directly using the position of that segment as the position coordinates of the probe design region;

(2) according to the position coordinates obtained in step (1), designing, within the probe design regions marked by the position coordinates, specific hybrid capture probes for each nucleic acid sequence in the non-redundant chloroplast sequence collection.

5. Use of the method according to any one of claims 1-4 in chloroplast DNA enrichment, chloroplast library construction, or large-scale plant evolution or genetics research based on chloroplast information.

6. A method for preparing chloroplast DNA fragment hybrid capture probes, characterized by comprising the following steps:

(1) constructing a non-redundant chloroplast sequence collection from a set of chloroplast nucleic acid sequences;

(2) designing probes according to the constructed non-redundant chloroplast sequence collection to obtain the chloroplast DNA fragment hybrid capture probes.

7. The method according to claim 6, characterized in that constructing the non-redundant chloroplast sequence collection in step (1) specifically comprises:

(1) obtaining an initial chloroplast nucleic acid sequence collection: according to the species information of each chloroplast nucleic acid sequence and the phylogenetic relationships among species, constructing a species phylogenetic tree based on the set of chloroplast nucleic acid sequences, and screening all chloroplast nucleic acid sequences according to the constructed tree, ensuring that the chloroplast nucleic acid sequences of 1-2 species with good assembly results are retained in each evolutionary branch of the species phylogenetic tree, to obtain the initial chloroplast nucleic acid sequence collection;

(2) removing redundancy based on a reference species: from the initial chloroplast nucleic acid sequence collection, selecting one species according to its degree of evolution in the species phylogenetic tree and marking it as the reference species, the rest being marked as non-reference species; aligning the sequence of the reference species pairwise with the sequences of the non-reference species; recording, according to the alignment results, the positions of high-similarity homologous regions in the non-reference species genomes, and annotating the nucleic acid sequences of the high-similarity homologous regions as N; meanwhile, aligning the sequence of the reference species against itself, retaining the longest sequence among the high-similarity homologous regions and annotating the nucleic acid sequences of the remaining high-similarity homologous regions as N;

(3) obtaining the non-redundant chloroplast sequence collection by iterative alignment: based on the collection obtained in step (2) after removal of redundant sequences, replacing the reference species one by one according to the method of step (2) and performing iterative alignment until no new high-similarity homologous region is identified; likewise, according to the method of step (2), annotating the nucleic acid sequences of high-similarity homologous regions as N, or annotating as N the nucleic acid sequences of the shorter high-similarity homologous regions in the self-alignment result of the reference species; thereby obtaining the non-redundant chloroplast sequence collection;

Preferably, in step (2), designing probes according to the constructed non-redundant chloroplast sequence collection specifically comprises:

(1) determining the position coordinates of the probe design regions: according to the obtained non-redundant chloroplast sequence collection, extending each segment of nucleic acid sequence by 30-45 bp upstream and downstream to obtain the position coordinates of the probe design regions; if the upstream or downstream region of a segment is shorter than 30 bp, directly using the position of that segment as the position coordinates of the probe design region;

(2) probe design: according to the position coordinates obtained in step (1), designing, within the probe design regions marked by the position coordinates, specific hybrid capture probes for each nucleic acid sequence in the non-redundant chloroplast sequence collection, i.e. the chloroplast DNA fragment hybrid capture probes;

Preferably, in step (1), the set of chloroplast nucleic acid sequences is the set of all published chloroplast nucleic acid sequences obtained from public databases.

8. A chloroplast DNA fragment hybrid capture probe prepared by the method according to claim 6 or 7.

9. A method for chloroplast DNA enrichment, characterized by comprising performing hybrid capture on the whole genome of a sample to be tested using the chloroplast DNA fragment hybrid capture probe according to claim 8, thereby enriching chloroplast DNA.

10. A method for constructing a chloroplast DNA library, characterized by comprising performing hybrid capture on the whole genome of a sample to be tested using the chloroplast DNA fragment hybrid capture probe according to claim 8 to enrich chloroplast DNA, and then constructing a library from the enriched chloroplast DNA to obtain the chloroplast DNA library.