CN109338011A

CN109338011A - Method for high-throughput screening of genes with differential allelic expression in plant genomes

Info

Publication number: CN109338011A
Application number: CN201811563126.4A
Authority: CN
Inventors: 张德强; 轩安然; 宋跃朋
Original assignee: Beijing Forestry University
Current assignee: Beijing Forestry University
Priority date: 2018-12-20
Filing date: 2018-12-20
Publication date: 2019-02-15
Anticipated expiration: 2038-12-20
Also published as: CN109338011B

Abstract

The present invention provides a method for high-throughput screening of heterozygous variant sites with allelic imbalanced expression in a plant genome, and belongs to the field of genetics research. The method comprises the following steps: 1) treating plant material with an exogenous regulatory factor to obtain a treated sample; 2) extracting total RNA from the treated sample and the control sample, constructing strand-specific sequencing libraries and performing high-throughput sequencing; 3) filtering the raw sequencing data to obtain clean reads; 4) aligning the clean reads to a reference genome to obtain alignment results; 5) deleting redundant duplicate reads from the alignment results and realigning the sequences around INDEL sites to obtain the final alignment data; 6) performing allelic variation screening on the final alignment data to obtain the heterozygous variant sites with allelic imbalanced expression. The method solves the problem of analyzing, at the allelic level, how a genome responds to the external environment.

Description

A method for high-throughput screening of genes with differential allelic expression in plant genomes

Technical field

The invention belongs to the field of genetics research, and in particular relates to a method for high-throughput screening of genes with differential allelic expression in plant genomes.

Background art

With the development of sequencing technology, high-throughput sequencing is used more and more widely. It provides important information for biological research such as the decoding of genetic information and the regulation of gene expression, and has become the most commonly used experimental technique in genomics research. Since 1973, Sanger sequencing has developed rapidly thanks to its convenience, simplicity, reliability, accuracy and long read length; it has been widely used in scientific research and has contributed significantly to scientific progress. However, Sanger sequencing cannot be further parallelized or miniaturized, which limits its use in further research. High-throughput sequencing emerged in response: it can sequence millions to a billion DNA molecules in parallel in a single run, which makes a detailed, in-depth, global analysis of the transcriptome and genome of a species possible. It is therefore also called deep sequencing.

Organisms evolve along with environmental change. Their genome sequences vary, which gives rise to "gene polymorphism", including polymorphic alleles. On the one hand, allelic variation in a coding region may change protein function and thereby affect the phenotype; on the other hand, variation in a non-coding region may affect gene expression and eventually lead to different expression levels of the alleles, which also affects the phenotype.

Under external environmental stimuli, the expression of some genes shows no significant difference, so it is difficult to identify and screen out the genes that respond to the treatment for further functional study. Differential allelic expression can directly and sensitively reflect the influence of various biotic and abiotic regulatory factors on expression. Screening out genes with differential allelic expression therefore allows an in-depth analysis of the mechanisms by which external environmental stimuli regulate plant growth.

Summary of the invention

In view of this, the purpose of the present invention is to provide a method for high-throughput screening of genes with differential allelic expression in plant genomes. The method can quickly and accurately screen out the genes with differential allelic expression in a plant genome after an exogenous environmental stimulus.

To achieve the above purpose, the present invention provides the following technical solution:

A method for high-throughput screening of heterozygous variant sites with allelic imbalanced expression in a plant genome, comprising the following steps:

1) treating plant material with an exogenous regulatory factor to obtain a treated sample, a sample not treated with the exogenous regulatory factor being the control sample;

2) extracting total RNA from the treated sample and the control sample of step 1), and constructing strand-specific sequencing libraries for the treated sample and the control sample; performing high-throughput sequencing on the strand-specific sequencing libraries of the treated sample and the control sample with an Illumina HiSeq 2500 to obtain the raw sequencing data of the treated sample and the control sample;

3) filtering the raw sequencing data of the treated sample and the control sample to obtain the clean reads of the treated sample and the control sample;

4) aligning the clean reads of the treated sample and the control sample to a reference genome, respectively, to obtain a treated-sample alignment result and a control-sample alignment result;

5) deleting from the treated-sample alignment result and the control-sample alignment result, respectively, the identical redundant duplicate reads produced by PCR amplification, then realigning to the reference genome the sequence spanning a total of 500 bp upstream and downstream of each INDEL site, to obtain the final alignment data of the treated sample and the control sample;

6) performing allelic variation screening on the final alignment data of the treated sample and the control sample, and screening for heterozygous sites whose genotype is the same in the treated sample and the control sample but whose expression level differs, to obtain the heterozygous variant sites with allelic imbalanced expression.

Preferably, the exogenous regulatory factor includes exogenous hormones and environmental stress.

Preferably, the environmental stress includes drought, salinity-alkalinity, freezing and flooding.

Preferably, the plant is a forest tree.

Preferably, the forest tree is poplar.

Preferably, in step 3) the raw sequencing data of the treated sample and the control sample are filtered using fastx (version: 0.0.13).

Preferably, the allelic variation screening in step 6) is carried out by SNP calling.

Preferably, after step 6) yields the heterozygous variant sites with allelic imbalanced expression, the method further comprises annotating these sites on the genome and mining the allele-specific expression pattern of the genes or genomic elements in which they are located.

Preferably, when the plant is poplar, the reference genome is Populus trichocarpa v3.0.

Preferably, the high-throughput sequencing in step 2) is paired-end sequencing with a read length of 100 nt.

Beneficial effects of the present invention: the method extracts RNA from samples treated with an exogenous regulatory factor and from untreated control samples, constructs strand-specific sequencing libraries, performs high-throughput sequencing, and uses the transcriptome data to resolve the expression pattern of allelic sites; it can accurately screen the variant sites that are differentially expressed at the allelic level before and after treatment with the exogenous regulatory factor. The method solves the problem of analyzing, at the allelic level, how the genome responds to the external environment: the genome is genotyped using the transcriptome sequencing data, and differences in the expression of allelic sites are screened systematically at the allelic level. The present invention also uses second-generation high-throughput sequencing technology to screen, in a high-throughput manner, the imbalanced expression occurring at heterozygous sites before and after treatment.

Description of the drawings

Fig. 1 is a schematic flow chart of the method for screening allelic differentially expressed genes that respond to hormone treatment in forest trees, as provided in Example 1 of the present invention;

Fig. 2 is a comparative statistical chart of the allelic variant sites of the IAA-treated group and the control sample.

Detailed description of the embodiments

The present invention provides a method for high-throughput screening of heterozygous variant sites with allelic imbalanced expression in a plant genome, comprising the following steps:

3) filtering the raw sequencing data of the treated sample and the control sample to obtain the clean reads (clean data) of the treated sample and the control sample;

4) aligning the clean reads (clean data) of the treated sample and the control sample to a reference genome, respectively, to obtain a treated-sample alignment result and a control-sample alignment result;

5) deleting from the treated-sample alignment result and the control-sample alignment result, respectively, the identical redundant duplicate reads produced by PCR amplification, then realigning to the reference genome a total of 500 bp upstream and downstream of each INDEL (insertion-deletion) site, to obtain the final alignment data of the treated sample and the control sample;

In the present invention, the plant is preferably a forest tree, more preferably poplar; in the specific implementation of the present invention, the poplar is preferably Populus tomentosa (Chinese white poplar) or Populus simonii. The plant material is preferably leaves of a clone. The exogenous regulatory factor preferably includes exogenous hormones and environmental stress. The exogenous hormone preferably includes auxin, cytokinin, gibberellin and abscisic acid; in the specific implementation of the present invention, the exogenous hormone is preferably the auxin IAA. The environmental stress preferably includes drought, salinity-alkalinity, freezing and flooding; in the specific implementation of the present invention, the environmental stress is drought.

In the present invention, when the exogenous regulatory factor is an exogenous hormone, the treatment preferably consists of spraying the leaves of the plant clone with an exogenous hormone solution until liquid drips from the leaves; for the control sample, the leaves of the clone plants are sprayed with pure water until liquid drips from the leaves. The present invention places no particular requirement on the concentration of the exogenous hormone solution; in the specific implementation of the present invention, a 100 μM IAA solution is used. RNA is extracted from the leaves of the treated sample and the control sample 4 to 8 h after the exogenous hormone treatment. When the exogenous regulatory factor is drought stress, the treatment preferably consists of withholding water for 8 to 15 d, and the untreated control sample is preferably watered normally. After the treatment, RNA is extracted from the leaves of the treated sample and the control sample.

In the present invention, the total RNA of the treated sample and the control sample is preferably extracted by the CTAB method. The present invention places no particular requirement on the specific parameters and steps of the CTAB method; a CTAB method conventional in the art may be used.

After obtaining the total RNA, the present invention preferably purifies it and checks its integrity. The purification is preferably carried out with the RNase-Free DNase Set (Qiagen). The integrity check preferably consists of agarose electrophoresis of the purified total RNA and observation of the bands: if the electrophoresis shows three bands that are bright and clear with sharp edges, the integrity of the total RNA is good. The two upper bands are the brightest and represent 28S and 18S rRNA respectively, and the first band (28S rRNA) should be twice as bright as the second band (18S rRNA). After obtaining the purified total RNA, the present invention further measures its purity and total amount. Specifically, with RNase-free water as the blank control, the A230, A260 and A280 values of the total RNA of each sample are measured with a spectrophotometer, the purity of the RNA sample is determined and its total amount is calculated; samples of acceptable purity are used for subsequent operations, and RNA must be extracted again if the purity is unacceptable. A260/A280 and A260/A230 are indicators of RNA purity: at pH 7 to 8.5, an A260/A280 ratio of 1.8 to 2.0 indicates good RNA purity, and the A260/A230 ratio of a pure RNA sample should be greater than 2.0. An A260/A230 ratio below 2.0 indicates contamination by protein or phenolic substances, in which case the total RNA of the sample must be extracted again. The total amount of RNA is calculated from the measured OD values by the method conventional in the art.
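The purity criteria described above can be expressed as a small check. This is only an illustrative sketch: the function name `rna_purity_ok` is hypothetical, and the thresholds are the ones stated in the text (A260/A280 between 1.8 and 2.0, A260/A230 greater than 2.0).

```python
def rna_purity_ok(a230: float, a260: float, a280: float) -> bool:
    """Return True when both absorbance ratios meet the stated criteria.

    Hypothetical helper: A260/A280 must lie in [1.8, 2.0] and
    A260/A230 must exceed 2.0, as described in the text.
    """
    if a230 <= 0 or a280 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    return 1.8 <= ratio_260_280 <= 2.0 and ratio_260_230 > 2.0


if __name__ == "__main__":
    # A sample with A260/A280 = 2.0 and A260/A230 = 2.5 passes the check.
    print(rna_purity_ok(a230=0.4, a260=1.0, a280=0.5))
```

A sample that fails the check would be re-extracted, as the text requires.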

After obtaining the total RNA of the treated sample and the control sample, the present invention constructs strand-specific sequencing libraries for the treated sample and the control sample, and sequences these libraries on an Illumina HiSeq 2500 to obtain the raw sequencing data of the treated sample and the control sample. The construction of the strand-specific sequencing libraries and their high-throughput sequencing are preferably entrusted to a sequencing company; in the specific implementation of the present invention they were carried out by Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.). The high-throughput sequencing is preferably paired-end sequencing, with a read length of preferably 100 nt. The quality control standard of the high-throughput sequencing is preferably: no less than 10 G of raw data per sample after sequencing; a sequencing read length of more than 90 nt; and a proportion of bases with quality greater than 20 (Q20) of no less than 85%. The construction of the strand-specific sequencing library specifically comprises the following steps: rRNA is removed from the extracted total RNA of the treated and control sample plants with the Ribo-Zero™ rRNA Removal Kit (Plant); poly(A) RNA is then bound with magnetic beads to obtain a Poly(A)- RNA sample; linear RNA is digested with RNase Rd to obtain a Poly(A)-/Ribo- RNA sample; finally, the strand-specific sequencing libraries of the treated sample and the control sample are constructed respectively.

After obtaining the raw sequencing data of the treated sample and the control sample, the present invention filters these data to obtain the clean reads (clean data) of the treated sample and the control sample. The raw reads obtained by sequencing may include unqualified reads with low overall quality, sequencing primer contamination or low end quality; such reads are likely to affect the quality of the analysis and must therefore be filtered out. In the present invention, the filtering is preferably carried out with fastx (version: 0.0.13; website: http://hannonlab.cshl.edu/fastx_toolkit/index.html) and comprises the following steps:

Remove reads with low overall quality, i.e. reads in which bases with Q greater than 20 account for less than 50%;

Remove bases at the 3' end with quality Q below 10, so that the retained bases have an error rate below 0.1, where Q = -10 × log10(error_ratio);

Remove adapter sequences contained in the reads;

Remove ambiguous N bases contained in the reads; N bases are bases that the instrument could not identify because the sequencing signal intensity was insufficient;

Remove reads shorter than 20 nt;

Remove ribosomal RNA reads: all rRNA of the treated plant material is used as the alignment template, and reads that align to the template are ribosomal RNA reads.
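To make the quality thresholds concrete, the following is a minimal sketch of the per-read quality rules in plain Python. It is not the fastx toolkit itself. It assumes Phred+33 quality encoding, treats a read that still contains N after trimming as discarded (one possible reading of the N-base rule), and leaves out adapter and rRNA removal, which need reference sequences. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def phred_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string into integer Q values."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(error_ratio)."""
    return 10 ** (-q / 10)


def filter_read(seq: str, quality: str) -> Optional[Tuple[str, str]]:
    """Apply the quality rules to one read; return None if it is discarded."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred_scores(quality)
    if not scores:
        return None
    # Discard reads in which bases with Q > 20 make up less than 50%.
    if sum(q > 20 for q in scores) / len(scores) < 0.5:
        return None
    # Trim bases with Q < 10 from the 3' end.
    end = len(scores)
    while end > 0 and scores[end - 1] < 10:
        end -= 1
    seq, quality = seq[:end], quality[:end]
    # Assumption: reads still containing an ambiguous N are dropped.
    if "N" in seq.upper():
        return None
    # Discard reads shorter than 20 nt.
    if len(seq) < 20:
        return None
    return seq, quality


if __name__ == "__main__":
    # Two low-quality bases ('#', Q = 2) are trimmed from the 3' end.
    print(filter_read("ACGT" * 6, "I" * 22 + "##"))
```

Under the formula above, Q = 10 corresponds to an error ratio of 0.1 and Q = 20 to 0.01, which is why the two thresholds are used for end trimming and whole-read filtering respectively.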

After obtaining the clean reads, the present invention aligns the clean reads of the treated sample and the control sample to the reference genome, respectively, to obtain the treated-sample alignment result and the control-sample alignment result. The alignment is preferably performed with the spliced mapping algorithm of the tophat software (version: 2.0.9), which maps the filtered reads to the genome. The spliced mapping algorithm allows reads that cannot be matched over their full length to be split and mapped in segments, which makes it well suited to transcriptome sequencing data from eukaryotes (whose genes contain introns). The alignment allows at most 3 mismatches, and each read is allowed at most 1 hit (multi hits ≤ 1); the alignment results produced by the genome mapping are written to a BAM file.
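The two alignment thresholds (at most 3 mismatches, at most 1 hit per read) can be illustrated as a filter over alignment records. This is a sketch only: it does not call tophat, and the `Alignment` record and its fields are hypothetical stand-ins for whatever an alignment parser would provide.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alignment:
    read_id: str
    mismatches: int  # hypothetical field: mismatches against the reference
    num_hits: int    # hypothetical field: number of genomic locations hit


def keep_alignments(
    alignments: Iterable[Alignment],
    max_mismatches: int = 3,
    max_hits: int = 1,
) -> List[Alignment]:
    """Keep alignments that satisfy both thresholds stated in the text."""
    return [
        a
        for a in alignments
        if a.mismatches <= max_mismatches and a.num_hits <= max_hits
    ]


if __name__ == "__main__":
    records = [
        Alignment("r1", mismatches=0, num_hits=1),
        Alignment("r2", mismatches=4, num_hits=1),  # too many mismatches
        Alignment("r3", mismatches=1, num_hits=2),  # maps to two places
    ]
    print([a.read_id for a in keep_alignments(records)])
```

Restricting the analysis to uniquely mapped reads matters here because a read that maps to several locations cannot be assigned to one allelic site.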

After obtaining the alignment results, the present invention deletes the identical redundant duplicate reads produced by PCR amplification from the treated-sample alignment result and the control-sample alignment result, respectively. Because PCR amplification introduces some bias during preparation of the strand-specific sequencing library, two reads in the treated sample or the control sample that have the same length and align to the same position of the reference genome are considered to originate from PCR amplification, and such reads are deleted. After deleting the identical redundant duplicate reads produced by PCR amplification, the present invention realigns to the reference genome a total of 500 bp upstream and downstream of each INDEL site to obtain the final alignment data of the treated sample and the control sample; preferably, the 250 bp upstream and the 250 bp downstream of the INDEL site are realigned to the reference genome. Alignment near INDELs is usually inaccurate, so realignment using known INDEL information is required. In the present invention, the above operations specifically comprise the following steps:

Input files: the alignment result file (BAM format) generated by the tophat software (version: 2.0.9) and the reference genome sequence file (FASTA format, e.g. genome.fa). The genome.fa file is indexed with the samtools software. The SortSam algorithm in the picard-tools software is used to sort the entries of each chromosome in the BAM file by coordinate in ascending order. Next, picard-tools is used to identify the duplicates produced by PCR amplification, and a flag is set on these sequences to mark them. Then RealignerTargetCreator is run, which outputs a file containing the possible indels, and IndelRealigner uses this file to perform the realignment. The parameter is set to VALIDATION_STRINGENCY=LENIENT.

Finally, the format of the processed alignment result of each sample is adjusted by adding header information (sequencing platform information, sample name, etc.; parameter settings: PL=illumina, PU=pgPU, SM=sample name) for the subsequent variant site detection.
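The duplicate criterion and the realignment window described above can be sketched in a few lines of Python. This is an illustration of the stated rules only (same length and same aligned position means duplicate; 250 bp on each side of an INDEL), not a reimplementation of picard-tools or of the realignment tools. The `AlignedRead` record, the 1-based coordinates and the clamping to chromosome bounds are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class AlignedRead:
    name: str
    chrom: str
    start: int   # assumption: 1-based leftmost aligned position
    length: int


def mark_duplicates(reads: Iterable[AlignedRead]) -> List[bool]:
    """Flag every read whose (chrom, start, length) was already seen."""
    seen: Set[Tuple[str, int, int]] = set()
    flags: List[bool] = []
    for read in reads:
        key = (read.chrom, read.start, read.length)
        flags.append(key in seen)  # the first occurrence is kept
        seen.add(key)
    return flags


def realignment_window(
    indel_pos: int, chrom_length: int, flank: int = 250
) -> Tuple[int, int]:
    """Return the interval covering `flank` bp on each side of an INDEL."""
    start = max(1, indel_pos - flank)
    end = min(chrom_length, indel_pos + flank)
    return start, end


if __name__ == "__main__":
    reads = [
        AlignedRead("a", "chr1", 100, 100),
        AlignedRead("b", "chr1", 100, 100),  # same position and length as "a"
        AlignedRead("c", "chr1", 100, 90),
    ]
    print(mark_duplicates(reads))
    print(realignment_window(1000, 5000))
```

Flagging rather than deleting the duplicates mirrors the description above, in which a flag is set on the duplicated sequences before they are excluded.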

The present invention performs allelic variation screening on the final alignment data of the treated sample and the control sample, and screens for heterozygous variant sites whose genotype is the same in the treated sample and the control sample but whose expression level differs, to obtain the heterozygous variant sites with allelic imbalanced expression. The allelic variation screening is preferably carried out by SNP calling, and the SNP calling is preferably performed with the GATK software (version: 4.0.1.0), with the following specific steps:

Variant detection is performed for each sample with the HaplotypeCaller tool of the software, with --pair-hmm-gap-continuation-penalty set to 10 and --emit-ref-confidence GVCF, all other parameters at their default values, to obtain the variant information of each sample;

The variant files of the samples are merged with the CombineGVCFs tool of the GATK software;

The GenotypeGVCFs tool of the GATK software is then used to detect allelic variation across the samples and generate a VCF file, which contains the variant sites and genotype information of all samples;

Finally, the VariantFiltration tool of the GATK software is used for filtering to reduce false positives: a 35 bp sliding window is set, and any 35 bp window containing 3 or more SNPs is marked as an SNP cluster and filtered out, finally yielding the allelic variant site information of the transcriptome.
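The SNP-cluster rule in the last step can be illustrated independently of GATK. The sketch below flags every SNP that belongs to a group of 3 or more SNPs fitting inside a 35 bp window on one chromosome. The exact boundary convention (here: last position minus first position plus 1 is at most 35) is an assumption, and `flag_snp_clusters` is a hypothetical name.

```python
from typing import Iterable, Set


def flag_snp_clusters(
    positions: Iterable[int], window: int = 35, cluster_size: int = 3
) -> Set[int]:
    """Return the SNP positions that fall inside a dense cluster.

    A cluster is any run of `cluster_size` consecutive SNPs (in sorted
    order) that spans at most `window` bp. Assumption: the span is
    counted inclusively, i.e. last - first + 1 <= window.
    """
    ordered = sorted(set(positions))
    clustered: Set[int] = set()
    for i in range(len(ordered) - cluster_size + 1):
        first = ordered[i]
        last = ordered[i + cluster_size - 1]
        if last - first + 1 <= window:
            clustered.update(ordered[i : i + cluster_size])
    return clustered


if __name__ == "__main__":
    snps = [100, 110, 120, 500, 534, 600]
    # Positions 100, 110 and 120 lie within one 35 bp window.
    print(sorted(flag_snp_clusters(snps)))
```

Dense SNP clusters in RNA-seq data often come from misalignment around splice junctions, which is why removing them reduces false positives.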

The present invention first screens out the sites at which the treated sample and the control sample show allelic variation relative to the reference genome, then screens from these allelic variant sites the heterozygous variant sites that have the same genotype in both samples, and then screens from the resulting heterozygous variant sites those whose expression level differs before and after treatment, to obtain the heterozygous variant sites with allelic imbalanced expression. In the present invention, the expression level is calculated by counting the reads aligned to the specific allelic site.
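The screening logic can be sketched as follows. The sketch keeps sites that are heterozygous with the same genotype in both samples and then compares the alternative/reference read-count ratio before and after treatment. The record layout, the VCF-style genotype strings and the log2 fold-change threshold are hypothetical choices made for illustration; the text itself does not specify a numeric cut-off at this point.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SiteCounts:
    site: str
    genotype_control: str   # assumption: VCF-style genotype, e.g. "0/1"
    genotype_treated: str
    ref_control: int        # reads supporting the reference allele
    alt_control: int        # reads supporting the alternative allele
    ref_treated: int
    alt_treated: int


def is_heterozygous(genotype: str) -> bool:
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def screen_imbalanced_sites(
    sites: Iterable[SiteCounts], min_abs_log2_change: float = 1.0
) -> List[Tuple[str, float]]:
    """Return (site, log2 change of the alt/ref ratio) for retained sites.

    `min_abs_log2_change` is a hypothetical threshold, not a value from
    the text.
    """
    selected: List[Tuple[str, float]] = []
    for s in sites:
        if s.genotype_control != s.genotype_treated:
            continue
        if not is_heterozygous(s.genotype_control):
            continue
        counts = (s.ref_control, s.alt_control, s.ref_treated, s.alt_treated)
        if min(counts) <= 0:
            continue  # both alleles must be observed in both samples
        ratio_control = s.alt_control / s.ref_control
        ratio_treated = s.alt_treated / s.ref_treated
        change = math.log2(ratio_treated / ratio_control)
        if abs(change) >= min_abs_log2_change:
            selected.append((s.site, change))
    return selected


if __name__ == "__main__":
    example = [SiteCounts("chr1:1000", "0/1", "0/1", 50, 50, 25, 75)]
    print(screen_imbalanced_sites(example))
```

In practice the comparison would use replicate samples and a statistical test rather than a single fixed threshold.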

After obtaining the heterozygous variant sites with allelic imbalanced expression, the present invention further annotates these sites on the genome and mines the allele-specific expression pattern of the genes or genomic elements in which they are located.

The present invention annotates the VCF file of the heterozygous variant sites with allelic imbalanced expression using the ANNOVAR tool. ANNOVAR supports three different forms of annotation: gene-based annotation, region-based annotation and filter-based annotation. These three annotations address different aspects of each variant site: gene-based annotation reveals the direct relationship between a variant site and known genes and its functional impact; region-based annotation reveals the relationship between a variant and specific segments of the genome, for example whether it falls in a known conserved genomic region; filter-based annotation provides a range of information about the variant, for example its frequency in different populations. Gene-based annotation is preferred in the present invention and comprises the following steps:

Database preparation: download the corresponding annotation files to a specified directory, e.g. /data2/disk1/ptrdbAnnovar/. For gene-based annotation, ANNOVAR requires a gene annotation file in genePred format (which can be obtained by converting a gff3 file) and a transcript sequence file in FASTA format;

Data format conversion: use the convert2annovar tool of the ANNOVAR software to convert the filtered VCF file (sample_filter.vcf) obtained in step S106 into the format required by ANNOVAR;

Annotate using the table_annovar.pl tool of the ANNOVAR software.

The method of the present invention can screen out, rapidly and in a high-throughput manner, stably heterozygous allelic sites that show differential imbalanced expression, and can determine the degree to which the allelic imbalanced expression changes before and after treatment with the exogenous regulatory factor.

The technical solution provided by the present invention is described in detail below with reference to an example, which should not be construed as limiting the scope of protection of the present invention.

Example 1

Source of material: the superior Populus tomentosa clone (Populus tomentosa '1316') was obtained from the national Populus tomentosa germplasm resource bank;

The reagents used in the CTAB method were all commercial products;

The specific steps were as follows:

Step S101: annual plants of the superior Populus tomentosa clone were selected. The clones used in the experiment were planted in pots with a substrate of soil, peat and perlite (volume ratio 1:1:1) in the greenhouse of Beijing Forestry University (40°0'N, 116°20'E), with a daily photoperiod of 16 hours and a temperature of 20 °C. The leaves of the experimental Populus tomentosa clone plants were sprayed with a 100 μM IAA solution (IAA powder dissolved in 95% ethanol, then brought to 100 μM with distilled water) until liquid dripped from the leaves; control samples were sprayed with water in the same way. Leaves of the control and experimental samples collected 6 h after the IAA treatment were used for transcriptome sequencing: mature leaves were taken from the same position on the control and experimental plants, and the experimental and control samples each comprised three biological replicates.

Total RNA of the collected samples was extracted by the conventional CTAB method. The extracted RNA was then purified with the RNase-Free DNase Set (Qiagen). The integrity of the RNA samples was checked by agarose gel electrophoresis. Integrity criterion: agarose electrophoresis of intact RNA shows three bands; the two upper bands are the brightest and represent 28S and 18S rRNA respectively, and the first band (28S rRNA) should be twice as bright as the second band (18S rRNA); the bands are bright and clear with sharp edges. With RNase-free water as the blank control, the A230, A260 and A280 values of each RNA sample were measured with a spectrophotometer to determine the purity of the RNA sample and calculate its total amount. Concentration: 0.093 μg/μL; mass: 8.83 μg; A260/A280: 3.5.

Following the above steps, rRNA was removed from the extracted total RNA of the treated and control sample plants with the Ribo-Zero™ rRNA Removal Kit (Plant); poly(A) RNA was then bound with magnetic beads to obtain a Poly(A)- RNA sample; linear RNA was digested with RNase Rd to obtain a Poly(A)-/Ribo- RNA sample; and bidirectional strand-specific libraries were constructed for each.

Step S102: the strand-specific libraries constructed in the previous step were sequenced on a HiSeq 2500 ultra-high-throughput sequencer in paired-end mode with a read length of 100 nt. Library construction and sequencing were carried out by Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.).

S103: the raw reads obtained by sequencing may include unqualified reads with low overall quality, sequencing primer contamination or low end quality. These unqualified reads are likely to affect the quality of the analysis, so they must be filtered out to obtain clean reads usable for data analysis, which are then aligned to the reference genome.

First, the clean reads were obtained using fastx (version: 0.0.13). The main filtering steps were as follows:

S1031: remove reads with low overall quality, i.e. reads in which bases with quality greater than 20 account for less than 50%;

S1032: remove bases at the 3' end with quality Q below 10, so that the retained bases have an error rate below 0.1, where Q = -10 × log10(error_ratio);

S1033: remove adapter sequences contained in the reads;

S1034: remove ambiguous N bases contained in the reads, i.e. bases that the instrument could not identify because the sequencing signal intensity was insufficient;

S1035: remove sequencing fragments (reads) shorter than 20 nt;

S1036: remove ribosomal RNA: all rRNA of poplar is used as the alignment template, and reads that align to the template are ribosomal RNA reads.

The sequencing and filtering results are shown in Table 1.

Table 1 Summary of Populus tomentosa RNA-seq data

S104: the main operating steps were as follows:

S1041: input files: the alignment result file (BAM format) generated by the Tophat2 software and the reference genome sequence file in FASTA format (e.g. genome.fa).

S1042: index the genome.fa file with the samtools software.

S1043: use the SortSam algorithm in the picard-tools software to sort the entries of each chromosome in the BAM file by coordinate in ascending order. During library preparation, PCR amplification introduces bias; if two reads have the same length and align to the same position of the genome, they are judged to originate from PCR amplification. In the next step, picard-tools is therefore used to identify the duplicates produced by PCR amplification, and a flag is set on these sequences to mark them.

S1044: alignment near INDELs is usually inaccurate, so realignment using known indel information is required. This is done in two steps: first, RealignerTargetCreator is run, which outputs a file containing the possible indels; second, IndelRealigner uses this file to perform the realignment. The parameter is set to VALIDATION_STRINGENCY=LENIENT.

S1045: adjust the format of the processed alignment result of each sample by adding header information (sequencing platform information, sample name, etc.; parameter settings: PL=illumina, PU=pgPU, SM=sample name) for the subsequent variant site detection.

S105: based on the result of step S104, SNP calling was performed with the GATK software (version: 4.0.1.0) as follows:

S1051: first, variant detection was performed for each sample with the HaplotypeCaller tool of the software, with --pair-hmm-gap-continuation-penalty set to 10 and --emit-ref-confidence GVCF, all other parameters at their default values, to obtain the variant information of each sample.

S1052: the variant files of the samples were merged with the CombineGVCFs tool of the GATK software.

S1053: the GenotypeGVCFs tool of the GATK software was then used to detect allelic variation across the samples and generate a VCF file, which contains the variant sites and genotype information of all samples.

S1054: finally, the VariantFiltration tool of the GATK software was used for filtering to reduce false positives: a 35 bp sliding window was set, and any 35 bp window containing 3 SNPs was marked as an SNP cluster and filtered out, finally yielding the allelic variant site information of the transcriptome.

S1055: the expression pattern of each allelic site was obtained by counting the reads aligned to the allelic site; stably heterozygous allelic sites showing differential imbalanced expression were screened out, and the change in the degree of allelic imbalanced expression before and after treatment was quantified.

Table 2 Types and numbers of allelic variant sites of Populus tomentosa in response to IAA treatment

Poplar is diploid: of each pair of homologous chromosomes, one comes from the male parent and one from the female parent. Normally the bases at the same position on the two chromosomes are identical, but in the course of genetic evolution the bases at the same position of homologous chromosomes can come to differ, which creates an allelic site. By assembling the transcriptome sequencing reads and aligning them to the reference genome, one can find sites that are covered both by reads consistent with the reference genome and by reads inconsistent with it; such a site is marked as a heterozygous allelic site. A site may also show only one base that differs from the reference genome, which is homozygous 2; a site that shows only one base, consistent with the reference base at that site, is homozygous 1. Sites were identified separately in the transcriptome sequencing results before and after treatment. The identified variation was classified into six types: heterozygous-homozygous 1; heterozygous-homozygous 2; homozygous-homozygous; homozygous 1-heterozygous; homozygous 2-heterozygous; and heterozygous-heterozygous. The sites screened for are those that are heterozygous before treatment and still heterozygous after treatment, i.e. the stably expressed allelic sites.
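The six-way classification above can be sketched as a small function over allele read counts. This is a simplified illustration: it calls a site from the mere presence of reference and alternative reads, with a hypothetical `min_reads` threshold, whereas the actual genotypes in the workflow come from the SNP calling step.

```python
from typing import Optional, Tuple


def site_state(ref_reads: int, alt_reads: int, min_reads: int = 1) -> str:
    """Classify one site in one sample from its allele read counts.

    `min_reads` is a hypothetical support threshold for calling an allele.
    """
    has_ref = ref_reads >= min_reads
    has_alt = alt_reads >= min_reads
    if has_ref and has_alt:
        return "heterozygous"
    if has_ref:
        return "homozygous 1"   # only the reference base is observed
    if has_alt:
        return "homozygous 2"   # only a non-reference base is observed
    return "uncovered"


def transition_type(
    before: Tuple[int, int], after: Tuple[int, int]
) -> Optional[str]:
    """Return one of the six before-after categories, or None if uncovered."""
    state_before = site_state(*before)
    state_after = site_state(*after)
    if "uncovered" in (state_before, state_after):
        return None
    if state_before != "heterozygous" and state_after != "heterozygous":
        return "homozygous-homozygous"
    return f"{state_before}-{state_after}"


if __name__ == "__main__":
    # (ref reads, alt reads) before and after treatment
    print(transition_type((10, 12), (8, 9)))
    print(transition_type((10, 12), (20, 0)))
```

Only sites returned as "heterozygous-heterozygous" would go on to the allelic imbalance comparison.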

Table 2 shows the differences in the types and numbers of variant sites with allele-specific expression in Chinese white poplar before and after IAA treatment. The heterozygous-heterozygous SNPs are the informative sites on which allelic imbalanced expression can be analyzed further, and this type accounts for the vast majority of the variant sites.

Step S106: According to the types and numbers of the variant sites, the heterozygous variant sites with allelic imbalanced expression before and after treatment are screened out, so as to find the genes in which differential allelic expression occurs.

According to Table 2, 108152 heterozygous allelic sites were screened out in which allelic imbalanced expression occurred after hormone treatment. Analysis of variance was performed on the ratio of mutant genotype expression to reference genotype expression in the experimental samples and the control samples. The results, shown in Table 3, indicate that the imbalanced expression of the allelic sites differs significantly before and after treatment.
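For illustration, the F statistic of a one-way analysis of variance on mutant/reference expression ratios can be computed as in the sketch below. The function name and the ratio values are hypothetical and are not the data behind Table 3; the sketch returns only the F statistic, and a p-value would additionally require the F distribution (for example from a statistics library).

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # n_total - k degrees of freedom
    return ms_between / ms_within


# Hypothetical mutant/reference ratios at the same sites.
control_ratios = [1.0, 2.0, 3.0]
treated_ratios = [2.0, 3.0, 4.0]
f_value = one_way_anova_f([control_ratios, treated_ratios])
print(f_value)  # 1.5
```

A larger F value indicates that the difference between the control and treated ratios is large relative to the variation within each group.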

Table 3. One-way analysis of variance and F test of the mutant genotype / reference genotype ratio in the treatment group before and after treatment

According to the differences in the ratio of expression between the genotypes, a total of 47 sites with significant differential allelic expression were screened out and annotated on the Populus trichocarpa genome, as shown in Table 4.

Table 4. Allele-specific expressed genes in Chinese white poplar in response to IAA treatment

Table 4 lists the names of the genes with significant differential expression at allelic sites after IAA treatment of Chinese white poplar, together with their differences. As Table 4 shows, the ratio of mutant genotype to reference genotype generally increases after IAA treatment, the proportion of allelic imbalanced expression rises, and the sites are located mainly in the exon regions of genes.

Embodiment 2

The screening method provided by the present invention is further illustrated below, taking as an example the screening of differential allelic expression in Populus simonii (obtained from the national Populus simonii germplasm resource bank) before and after drought stress treatment.

Acquisition of raw material: individuals with the same or similar growth vigor within a Populus simonii clone were selected and treated as follows. Drought treatment: no watering for 10 days. Control samples: watered on the normal cultivation schedule (once every two days, until the soil was thoroughly soaked).

The various reagents used in the CTAB method are commercial products.

The specific steps are as follows:

Step S101: Superior Populus simonii plants were selected, and the clonal plants used for the experiment were planted in pots with a substrate of soil, peat and perlite (volume ratio 1:1:1) in the greenhouse of Beijing Forestry University (40°0'N, 116°20'E), with a daily light period of 16 hours. The treatment group was subjected to drought stress by withholding water for 10 days, and the control samples were watered on the normal cultivation schedule. After treatment, leaves of the control samples and the experimental samples were taken for transcriptome sequencing: mature leaves were collected from the same position on the control and experimental plants, and the experimental samples and control samples each included three biological replicates.

Total RNA was extracted from a small amount of leaf tissue using the conventional CTAB method.

The extracted RNA was then purified using the RNase-Free DNase Set (Qiagen). The integrity of the RNA samples was checked by agarose gel electrophoresis. With RNase-free water as the blank control, the A230, A260 and A280 values of each RNA sample were measured with a spectrophotometer to determine the purity of the RNA samples and to calculate their total amount.
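The purity and yield calculation from the absorbance readings might look like the following sketch. The text does not state the formulas used; the sketch assumes the common convention that one A260 unit corresponds to about 40 µg/mL of RNA at a 1 cm path length, and uses the A260/A280 and A260/A230 ratios as purity indicators. The function name, readings and elution volume are hypothetical.

```python
from typing import Dict, Optional

RNA_UG_PER_ML_PER_A260 = 40.0  # assumed conversion factor, not from the text


def rna_quality(a230: float, a260: float, a280: float,
                dilution_factor: float = 1.0,
                volume_ul: Optional[float] = None) -> Dict[str, float]:
    """Purity ratios, concentration and (optionally) total RNA yield."""
    # ug/mL is numerically equal to ng/uL.
    conc_ng_per_ul = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    result = {
        "a260_a280": a260 / a280,   # protein contamination indicator
        "a260_a230": a260 / a230,   # salt / polysaccharide indicator
        "conc_ng_per_ul": conc_ng_per_ul,
    }
    if volume_ul is not None:
        result["total_ug"] = conc_ng_per_ul * volume_ul / 1000.0
    return result


# Hypothetical blank-corrected readings for one undiluted sample of 50 uL.
q = rna_quality(a230=0.25, a260=0.5, a280=0.25, volume_ul=50.0)
print(q["a260_a280"], q["a260_a230"])     # 2.0 2.0
print(q["conc_ng_per_ul"], q["total_ug"])  # 20.0 1.0
```

Ratios close to 2.0 are commonly taken to indicate RNA that is largely free of protein and of salt or polysaccharide carry-over, which is a particular concern for CTAB extractions from leaf tissue.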

Strand-specific cDNA libraries of the control samples and of the experimental samples were built from the total RNA extracted from the control samples and the experimental samples. Construction and sequencing of the strand-specific libraries were carried out by Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.).

First, clean reads were obtained using fastx (version: 0.0.13). The main filtering steps are as follows:

S1031: remove reads of low overall quality, i.e. reads in which the bases with quality greater than 20 account for less than 50% of the read;

S1033: remove the adapter sequences contained in the reads;

S1035: remove sequencing fragments (reads) shorter than 20 in length;

S1036: remove ribosomal RNA, using all rRNA of poplar as the alignment template; reads that align to the template are ribosomal RNA reads.
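The quality and length criteria of S1031 and S1035 can be sketched in a few lines. This is an illustration only, not the fastx implementation: the function name is hypothetical, Phred+33 quality encoding is assumed, and adapter removal (S1033) and rRNA removal (S1036) are not covered.

```python
def keep_read(seq: str, qual: str, min_q: int = 20,
              min_fraction: float = 0.5, min_len: int = 20,
              offset: int = 33) -> bool:
    """Return True if a read passes the length and quality criteria.

    A read is kept when it is at least `min_len` long and the bases with
    quality greater than `min_q` make up at least `min_fraction` of it.
    `offset` is the ASCII offset of the quality encoding (Phred+33 assumed).
    """
    if len(seq) < min_len:
        return False
    high_quality = sum(1 for c in qual if ord(c) - offset > min_q)
    return high_quality / len(qual) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(keep_read("A" * 25, "I" * 25))             # True
print(keep_read("A" * 25, "I" * 12 + "#" * 13))  # False (48% high quality)
print(keep_read("A" * 10, "I" * 10))             # False (too short)
```

The sketch follows the wording of S1031 and counts only bases with quality strictly greater than 20; a tool that counts bases with quality of 20 or more would keep slightly more reads.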

The sequencing and filtering results are shown in Table 5.

Table 5. Summary of Populus simonii RNA-seq data

S104: the main operating steps are as follows:

S1042: build an index for the genome.fa file using samtools;

S1044: alignments near INDELs are usually inaccurate, so realignment using known indel information is required. This is done in two steps: the first step runs RealignerTargetCreator, which outputs a file containing the possible indels; the second step, IndelRealigner, performs the realignment using this file. The parameter is set to VALIDATION_STRINGENCY=LENIENT.

For S105, SNP calling is carried out on the results of step S104 using GATK (version: 4.0.1.0). The steps are as follows:

S1052: merge the variant files of the individual samples using the CombineGVCFs tool in GATK.

Table 6. Types and numbers of allelic variant sites in Populus simonii in response to drought stress treatment

Table 6 shows the differences in the types and numbers of variant sites with allele-specific expression in Populus simonii before and after drought stress treatment. The heterozygous-heterozygous SNPs are the informative sites on which allelic imbalanced expression can be analyzed further, and this type also accounts for the majority of the variant sites.

Step S106: According to the types and numbers of the variant sites found in step S105, the heterozygous variant sites with allelic imbalanced expression before and after treatment can be screened out, so as to find the genes in which differential allelic expression occurs.

As can be seen from the experimental data of the two embodiments above, the screening method provided by the present invention has the following advantages:

1) The method solves the problem of analyzing the response of the genome to the external environment at the allelic level: transcriptome sequencing is used to genotype the genome, and differential expression at allelic sites is screened systematically at the allelic level;

2) The method makes full use of second-generation high-throughput sequencing technology and enables high-throughput quantitative screening of SNP sites that differ between alleles;

3) Through the screening and annotation of differentially expressed allelic sites, the method allows the gene models showing differential expression at the allelic level to be analyzed further in order to verify the function of their SNPs.

The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for high-throughput screening of heterozygous variant sites with allelic imbalanced expression in a plant genome, comprising the following steps:

1) treating plant material with an exogenous regulatory factor to obtain a treated sample, a sample not treated with the exogenous regulatory factor serving as a control sample;

2) extracting total RNA from the treated sample and the control sample of step 1), and constructing strand-specific sequencing libraries of the treated sample and the control sample; performing high-throughput sequencing of the strand-specific sequencing libraries of the treated sample and the control sample on an Illumina Hiseq2500 to obtain raw sequencing data of the treated sample and the control sample;

3) filtering the raw sequencing data of the treated sample and the control sample to obtain clean reads of the treated sample and the control sample;

4) aligning the clean reads of the treated sample and of the control sample to a reference genome respectively to obtain a treated-sample alignment result and a control-sample alignment result;

5) deleting from the treated-sample alignment result and the control-sample alignment result, respectively, the identical redundant duplicate reads formed by PCR amplification, and then realigning to the reference genome the sequences within a total of 500 bp upstream and downstream of the sites where INDELs occur, to obtain final alignment result data of the treated sample and the control sample;

6) performing allelic site variant screening on the final alignment result data of the treated sample and the control sample, and screening for heterozygous sites whose genotype is the same in the treated sample and the control sample but whose expression level differs, to obtain the heterozygous variant sites with allelic imbalanced expression.

2. The method according to claim 1, characterized in that the exogenous regulatory factor comprises exogenous hormones and environmental stress.

3. The method according to claim 2, characterized in that the environmental stress comprises drought, salinity-alkalinity, freezing and flooding.

4. The method according to claim 1, characterized in that the plant is a forest tree.

5. The method according to claim 4, characterized in that the forest tree is poplar.

6. The method according to claim 1, characterized in that the filtering of the raw sequencing data of the treated sample and the control sample in step 3) is carried out using fastx (version: 0.0.13).

7. The method according to claim 1, characterized in that the allelic site variant screening in step 6) is carried out by SNP calling.

8. The method according to claim 1, characterized in that, after the heterozygous variant sites with allelic imbalanced expression are obtained in step 6), the method further comprises annotating the heterozygous variant sites with imbalanced expression on the genome, and mining the allele-specific expression patterns of the genes or genomic elements in which the heterozygous variant sites with imbalanced expression are located.

9. The method according to claim 1 or 5, characterized in that, when the plant is poplar, the reference genome is Populus trichocarpa v3.0 Poplar.

10. The method according to claim 1, characterized in that the high-throughput sequencing in step 2) is paired-end sequencing, and the read length of the high-throughput sequencing is 100 nt.