CN109338011A - Method for high-throughput screening of genes with differential allelic expression in plant genomes - Google Patents
- Publication number
- CN109338011A (application number CN201811563126.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- expression
- processing
- control sample
- comparison result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Immunology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Botany (AREA)
- Mycology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention provides a method for high-throughput screening of heterozygous variant sites with allelic imbalance in expression in a plant genome, belonging to the field of genetics research. The method comprises the following steps: 1) treating plant material with an exogenous regulatory factor to obtain a treated sample; 2) extracting total RNA from the treated sample and a control sample, constructing strand-specific sequencing libraries, and performing high-throughput sequencing; 3) filtering the raw sequencing data to obtain clean reads; 4) aligning the clean reads to a reference genome to obtain alignment results; 5) removing redundant duplicate reads from the alignment results and realigning the sequences around INDEL sites to obtain final alignment data; 6) screening the final alignment data for site variation to obtain heterozygous variant sites with allelic imbalance in expression. The method solves the problem of analyzing, at the allelic level, how a genome responds to the external environment.
Description
Technical field
The invention belongs to the field of genetics research, and in particular relates to a method for high-throughput screening of genes with differential allelic expression in plant genomes.
Background art
With the development of sequencing technology, high-throughput sequencing has become increasingly widely used. It provides important information for biological studies such as the deciphering of genetic information and the regulation of gene expression, and has become the most commonly used experimental technique in genomics research. Since 1973, Sanger sequencing has developed rapidly thanks to its convenience, simplicity, reliability, accuracy, and long read length; it has been widely used in scientific research and has made significant contributions to scientific development. However, shortcomings such as the inability to further scale up parallelization and miniaturization limit its further use. High-throughput sequencing emerged in response: it can sequence millions to a billion DNA molecules in parallel in a single run, making it possible to analyze the transcriptome and genome of a species in depth, in detail, and as a whole, and it is therefore also called deep sequencing.
Organisms evolve along with environmental change, and the resulting changes in genome sequence give rise to "gene polymorphism", which includes polymorphic alleles. On the one hand, allelic variation in a coding region may change protein function and thereby affect the phenotype; on the other hand, variation in a non-coding region may affect gene expression and ultimately cause the alleles to be expressed at different levels, which also affects the phenotype.
Under external environmental stimuli, the expression of some genes shows no significant difference, so it is difficult to identify and screen out the genes that respond to the treatment for further functional study. Differential allelic expression can directly and sensitively reflect the influence of various biotic and abiotic regulatory factors on expression. Screening out genes with differential allelic expression therefore allows an in-depth analysis of the regulatory mechanisms by which external environmental stimuli affect plant growth.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method for high-throughput screening of genes with differential allelic expression in plant genomes. The method can quickly and accurately screen out the genes in a plant genome that show differential allelic expression after an exogenous environmental stimulus.
To achieve the above purpose, the present invention provides the following technical solution:
A method for high-throughput screening of heterozygous variant sites with allelic imbalance in expression in a plant genome, comprising the following steps:
1) treating plant material with an exogenous regulatory factor to obtain a treated sample, the sample that does not receive the exogenous regulatory factor treatment being the control sample;
2) extracting total RNA from the treated sample and the control sample of step 1), and constructing strand-specific sequencing libraries for the treated sample and the control sample; performing high-throughput sequencing of the strand-specific sequencing libraries on an Illumina HiSeq 2500 to obtain the raw sequencing data of the treated sample and the control sample;
3) filtering the raw sequencing data of the treated sample and the control sample to obtain the clean reads of the treated sample and the control sample;
4) aligning the clean reads of the treated sample and the control sample to a reference genome separately to obtain a treated-sample alignment result and a control-sample alignment result;
5) removing, from each of the treated-sample and control-sample alignment results, the identical redundant duplicate reads formed by PCR amplification, and then realigning to the reference genome the sequences spanning a total of 500 bp upstream and downstream of each site where an INDEL occurs, to obtain the final alignment data of the treated sample and the control sample;
6) screening the final alignment data of the treated sample and the control sample for site variation, and selecting the heterozygous sites at which the genotype is the same in the treated sample and the control sample but the expression level differs, to obtain the heterozygous variant sites with allelic imbalance in expression.
Preferably, the exogenous regulatory factor includes exogenous hormones and environmental stress.
Preferably, the environmental stress includes drought, salinity-alkalinity, freezing, and flooding.
Preferably, the plant is a forest tree.
Preferably, the forest tree is poplar.
Preferably, in step 3), the raw sequencing data of the treated sample and the control sample are filtered using fastx (version: 0.0.13).
Preferably, the site variation screening in step 6) is performed by SNP calling.
Preferably, after the heterozygous variant sites with allelic imbalance in expression are obtained in step 6), the method further comprises annotating these sites on the genome and mining the allele-specific expression patterns of the genes or genomic elements in which they are located.
Preferably, when the plant is poplar, the reference genome is Populus trichocarpa v3.0 (poplar).
Preferably, the high-throughput sequencing in step 2) is paired-end sequencing with a read length of 100 nt.
Beneficial effects of the present invention: the method extracts RNA from a sample treated with an exogenous regulatory factor and from an untreated control sample, constructs strand-specific sequencing libraries, performs high-throughput sequencing, and uses the transcriptome data to resolve the expression patterns of allelic sites, so that variant sites whose expression differs at the allelic level before and after the exogenous regulatory factor treatment can be accurately screened. The method solves the problem of analyzing, at the allelic level, how a genome responds to the external environment: the transcriptome sequencing data are used to genotype the genome, and differences in the expression of allelic sites are screened systematically at the allelic level. The present invention also uses second-generation high-throughput sequencing technology to screen, in high throughput, the imbalanced expression occurring at heterozygous sites before and after treatment.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method for screening genes with differential allelic expression that respond to hormone treatment in forest trees, as provided in Embodiment 1 of the present invention;
Fig. 2 is a chart of comparative statistics of the allelic variant sites in the IAA treatment group and the control sample.
Detailed description of the embodiments
The present invention provides a method for high-throughput screening of heterozygous variant sites with allelic imbalance in expression in a plant genome, comprising the following steps:
1) treating plant material with an exogenous regulatory factor to obtain a treated sample, the sample that does not receive the exogenous regulatory factor treatment being the control sample;
2) extracting total RNA from the treated sample and the control sample of step 1), and constructing strand-specific sequencing libraries for the treated sample and the control sample; performing high-throughput sequencing of the strand-specific sequencing libraries on an Illumina HiSeq 2500 to obtain the raw sequencing data of the treated sample and the control sample;
3) filtering the raw sequencing data of the treated sample and the control sample to obtain the clean reads (clean data) of the treated sample and the control sample;
4) aligning the clean reads (clean data) of the treated sample and the control sample to a reference genome separately to obtain a treated-sample alignment result and a control-sample alignment result;
5) removing, from each of the treated-sample and control-sample alignment results, the identical redundant duplicate reads formed by PCR amplification, and then realigning to the reference genome the sequences spanning a total of 500 bp upstream and downstream of each site where an INDEL (insertion or deletion) occurs, to obtain the final alignment data of the treated sample and the control sample;
6) screening the final alignment data of the treated sample and the control sample for site variation, and selecting the heterozygous sites at which the genotype is the same in the treated sample and the control sample but the expression level differs, to obtain the heterozygous variant sites with allelic imbalance in expression.
In the present invention, the plant is preferably a forest tree, more preferably poplar; in the specific implementation of the present invention, the poplar is preferably Populus tomentosa (Chinese white poplar) or Populus simonii. In the present invention, the plant material is preferably leaves of a clone. In the present invention, the exogenous regulatory factor preferably includes exogenous hormones and environmental stress. The exogenous hormones preferably include auxin, cytokinin, gibberellin, and abscisic acid; in the specific implementation of the present invention, the exogenous hormone is preferably the auxin IAA. The environmental stress preferably includes drought, salinity-alkalinity, freezing, and flooding; in the specific implementation of the present invention, the environmental stress is drought.
In the present invention, when the exogenous regulatory factor is an exogenous hormone, the treatment preferably consists of spraying the leaves of the clonal plants with an exogenous hormone solution until liquid drips from the leaves, while the control sample is obtained by spraying the leaves of the clonal plants with pure water until liquid drips from the leaves. The present invention places no special requirement on the concentration of the exogenous hormone solution; in the specific implementation of the present invention, a 100 μM IAA solution is used. In the present invention, RNA is extracted from the leaves of the treated sample and the control sample 4 to 8 h after the exogenous hormone treatment. When the exogenous regulatory factor is drought stress, the treatment preferably consists of withholding water for 8 to 15 d, while the untreated control sample is preferably watered normally. After the treatment, RNA is extracted from the leaves of the treated sample and the control sample.
In the present invention, the total RNA of the treated sample and the control sample is preferably extracted by the CTAB method. The present invention places no special requirement on the specific parameters and steps of the CTAB method; the CTAB method conventional in the art is used.
After obtaining the total RNA, the present invention preferably purifies it and checks its integrity. The purification is preferably performed with the RNase-Free DNase Set (Qiagen). The integrity check preferably consists of running the purified total RNA on an agarose gel and observing the bands: if the gel shows three bands that are bright and clear with sharp edges, the integrity of the total RNA is good. The upper two bands are the brightest and represent the 28S and 18S rRNA respectively, and the first band (28S rRNA) should be twice as bright as the second (18S rRNA). After obtaining the purified total RNA, the present invention further measures its purity and total amount. Specifically, with RNase-free water as the blank control, the A230, A260, and A280 values of the total RNA of each sample are measured with a spectrophotometer to determine the purity of the RNA sample and calculate its total amount; samples of acceptable purity are used for the subsequent steps, and samples of unacceptable purity are extracted again. A260/A280 and A260/A230 are indicators of RNA purity: at pH 7 to 8.5, an A260/A280 ratio of 1.8 to 2.0 indicates good RNA purity, and the A260/A230 ratio of a pure RNA sample should be greater than 2.0. A ratio below 2.0 indicates contamination by protein or phenolic substances, and the total RNA of the sample needs to be extracted again. The total amount of RNA is calculated from the measured OD value by the method conventional in the art.
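The purity criteria above can be expressed as a simple check. The following is a minimal sketch in Python, not part of the patented method; the function name `rna_purity_ok` is hypothetical, and the thresholds are taken directly from the text (A260/A280 between 1.8 and 2.0, A260/A230 greater than 2.0).

```python
def rna_purity_ok(a230: float, a260: float, a280: float) -> bool:
    """Return True if both absorbance ratios meet the criteria quoted in the text.

    Hypothetical helper: the thresholds come from the description above,
    the function itself is only an illustration.
    """
    ratio_260_280 = a260 / a280  # expected to lie between 1.8 and 2.0
    ratio_260_230 = a260 / a230  # expected to be greater than 2.0
    return 1.8 <= ratio_260_280 <= 2.0 and ratio_260_230 > 2.0


# A sample passing both criteria, and one failing the A260/A230 criterion.
print(rna_purity_ok(a230=0.40, a260=1.00, a280=0.52))
print(rna_purity_ok(a230=0.60, a260=1.00, a280=0.52))
```

A sample that fails the check would be extracted again, as the description requires.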
After obtaining the total RNA of the treated sample and the control sample, the present invention constructs strand-specific sequencing libraries for the treated sample and the control sample, and performs high-throughput sequencing of these libraries on an Illumina HiSeq 2500 to obtain the raw sequencing data of the treated sample and the control sample. In the present invention, the construction of the strand-specific sequencing libraries and the high-throughput sequencing of the libraries are preferably entrusted to a sequencing company; in the specific implementation of the present invention, this work was entrusted to Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.). In the present invention, the high-throughput sequencing is preferably paired-end sequencing, and the read length is preferably 100 nt. The quality control standard of the high-throughput sequencing is preferably as follows: each sample yields no less than 10 G of raw data after sequencing; the sequencing read length exceeds 90 nt; and for each sample the proportion of bases with a quality greater than 20 (Q20) is not less than 85%. In the present invention, the construction of the strand-specific sequencing library specifically comprises the following steps: rRNA is removed from the total RNA extracted from the treated and control plants using the Ribo-Zero™ rRNA Removal Kit (Plant); poly(A) RNA is then bound by the magnetic bead method to obtain a Poly(A)- RNA sample; linear RNA is digested with RNase R to obtain a Poly(A)-/Ribo- RNA sample; and finally the strand-specific sequencing libraries of the treated sample and the control sample are constructed separately.
After obtaining the raw sequencing data of the treated sample and the control sample, the present invention filters the raw sequencing data to obtain the clean reads (clean data) of the treated sample and the control sample. The raw reads obtained by sequencing may include unqualified reads, such as reads of low overall quality, reads containing sequencing primers, and reads of low quality at the ends; such reads are likely to affect the quality of the analysis and must therefore be filtered out. In the present invention, the filtering is preferably performed with fastx (version: 0.0.13; website: http://hannonlab.cshl.edu/fastx_toolkit/index.html) and comprises the following steps:
Remove reads of low overall quality: reads in which bases with Q greater than 20 account for less than 50% are removed;
Remove bases at the 3' end with quality Q below 10, so that the retained bases have an error rate below 0.1, where Q = -10 log10(error_ratio);
Remove adapter sequences contained in the reads;
Remove ambiguous N bases contained in the reads; an N base is a base that the instrument could not identify because the sequencing signal intensity was insufficient;
Remove reads shorter than 20 nt;
Remove ribosomal RNA reads: all rRNA sequences of the treated plant material are used as the alignment template, and reads that align to the template are ribosomal RNA reads.
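The quality-based filtering steps listed above can be sketched in a few lines of Python. This is an illustration only, not the fastx implementation: the names `phred_to_error` and `filter_read` are hypothetical, adapter and rRNA removal are omitted because they require alignment, and dropping N bases individually is an assumed literal reading of the step.

```python
from typing import List, Optional, Tuple


def phred_to_error(q: float) -> float:
    """Invert Q = -10 * log10(error) to get the per-base error probability."""
    return 10 ** (-q / 10)


def filter_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Apply the quality steps to one read; return None if the read is discarded."""
    # Discard reads in which fewer than 50% of the bases have Q > 20.
    if not seq or sum(q > 20 for q in quals) / len(quals) < 0.5:
        return None
    # Trim bases with Q < 10 (error rate above 0.1) from the 3' end.
    end = len(seq)
    while end > 0 and quals[end - 1] < 10:
        end -= 1
    seq, quals = seq[:end], quals[:end]
    # Drop ambiguous N bases (assumed interpretation of the N-removal step).
    kept = [(b, q) for b, q in zip(seq, quals) if b.upper() != "N"]
    seq = "".join(b for b, _ in kept)
    quals = [q for _, q in kept]
    # Discard reads shorter than 20 nt.
    if len(seq) < 20:
        return None
    return seq, quals


# A 30 nt read whose last five bases are of low quality is trimmed to 25 nt.
print(filter_read("A" * 30, [30] * 25 + [5] * 5))
```

The thresholds (Q > 20 for at least 50% of bases, Q < 10 for 3' trimming, 20 nt minimum length) are the ones stated in the description.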
After obtaining the clean reads, the present invention aligns the clean reads of the treated sample and the control sample to the reference genome separately to obtain the treated-sample alignment result and the control-sample alignment result. The alignment is preferably performed with the spliced mapping algorithm of the tophat (version: 2.0.9) software, which maps the filtered reads to the genome. The spliced mapping algorithm allows reads that cannot be matched over their full length to be split into segments for mapping, which makes it well suited to eukaryotic transcriptome sequencing data (where genes contain introns). The alignment allows at most 3 mismatches and at most 1 multi-hit per read (multi hits ≤ 1), and the result of the genome mapping is output as a BAM file.
After obtaining the alignment results, the present invention removes, from each of the treated-sample and control-sample alignment results, the identical redundant duplicate reads formed by PCR amplification. PCR amplification introduces some bias during the preparation of strand-specific sequencing libraries: if two reads in the treated sample or the control sample have the same length and align to the same position in the reference genome, they are considered to originate from PCR amplification and the redundant copy is removed. After removing the duplicate reads formed by PCR amplification, the present invention realigns to the reference genome the sequences spanning a total of 500 bp upstream and downstream of each site where an INDEL occurs, to obtain the final alignment data of the treated sample and the control sample; preferably, the 250 bp upstream and the 250 bp downstream of the INDEL site are realigned to the reference genome. Alignments near an INDEL are often inaccurate, so realignment using known INDEL information is required. In the present invention, the above operations specifically comprise the following steps:
Input files: the alignment result file (BAM format) generated by the tophat (version: 2.0.9) software and the reference genome sequence file (in FASTA format, for example genome.fa). The genome.fa file is indexed with the samtools software. The SortSam tool in the picard-tools software is used to sort the entries of each chromosome in the BAM file by coordinate in ascending order. Next, picard-tools is used to identify the duplicates formed by PCR amplification and to set a flag on these sequences to mark them. RealignerTargetCreator is then run to output a file containing the possible indels, and IndelRealigner uses this file to perform the realignment. The parameter VALIDATION_STRINGENCY is set to LENIENT.
Finally, the alignment result of each processed sample is reformatted by adding header information (sequencing platform, sample name, etc., with the parameters set as PL=illumina, PU=pgPU, SM=sample name) before the subsequent variant site detection.
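The duplicate criterion stated above (same length, same position in the reference genome) can be illustrated with a short Python sketch. This is a simplified illustration under that stated criterion only; the `Alignment` type and the `mark_duplicates` function are hypothetical, and the criteria applied by picard-tools itself are more elaborate.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record used only for this illustration."""
    name: str
    chrom: str
    pos: int     # leftmost aligned coordinate on the reference
    length: int  # read length


def mark_duplicates(alignments: Iterable[Alignment]) -> List[bool]:
    """Flag every read whose (chromosome, position, length) was already seen."""
    seen: Set[Tuple[str, int, int]] = set()
    flags: List[bool] = []
    for aln in alignments:
        key = (aln.chrom, aln.pos, aln.length)
        flags.append(key in seen)  # the first copy is kept, later copies are flagged
        seen.add(key)
    return flags


reads = [
    Alignment("r1", "chr1", 100, 50),
    Alignment("r2", "chr1", 100, 50),  # same position and length as r1
    Alignment("r3", "chr1", 100, 48),
    Alignment("r4", "chr2", 100, 50),
]
print(mark_duplicates(reads))
```

As in the description, duplicates are flagged rather than silently dropped, so that downstream tools can decide how to treat them.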
The present invention screens the final alignment data of the treated sample and the control sample for site variation, and selects the heterozygous variant sites at which the genotype is the same in the treated sample and the control sample but the expression level differs, to obtain the heterozygous variant sites with allelic imbalance in expression. In the present invention, the site variation screening is preferably performed by SNP calling, and the SNP calling is preferably performed with the GATK (version: 4.0.1.0) software, with the following specific steps:
The HaplotypeCaller tool of the software is used to detect variants in each sample, with the --pair-hmm-gap-continuation-penalty parameter set to 10, --emit-ref-confidence set to GVCF, and all other parameters at their default values, to obtain the variant information of each sample;
The CombineGVCFs tool of the GATK software is used to merge the variant files of the samples;
The GenotypeGVCFs tool of the GATK software is then used to detect allelic variation across the samples and generate a VCF file containing the variant sites and genotype information of all samples;
Finally, the VariantFiltration tool of the GATK software is used for filtering to reduce false positives: a 35 bp sliding window is applied, and any 35 bp window containing 3 or more SNPs is marked as an SNP cluster and filtered out, finally yielding the allelic variant site information of the transcriptome.
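The SNP cluster rule in the last step (3 or more SNPs within a 35 bp window) can be illustrated with a sliding-window sketch in Python. This is not the GATK implementation; the function name `flag_snp_clusters` is hypothetical, and the sketch assumes that a window covers 35 consecutive bases, so that two SNPs share a window when their coordinates differ by less than 35.

```python
from typing import List, Set


def flag_snp_clusters(positions: List[int], window: int = 35,
                      cluster_size: int = 3) -> Set[int]:
    """Return the SNP positions that belong to a cluster.

    `positions` must hold the sorted SNP coordinates of one chromosome.
    Assumption: SNPs share a window when their coordinates differ by
    less than `window` bases.
    """
    clustered: Set[int] = set()
    start = 0
    for end in range(len(positions)):
        # Advance the left edge until the SNPs from start to end fit in one window.
        while positions[end] - positions[start] >= window:
            start += 1
        if end - start + 1 >= cluster_size:
            clustered.update(positions[start:end + 1])
    return clustered


snps = [100, 110, 120, 200, 300, 310, 334]
print(sorted(flag_snp_clusters(snps)))
```

In this example the SNPs at 100, 110, 120 and at 300, 310, 334 form clusters and would be filtered out, while the isolated SNP at 200 is retained.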
The present invention first screens out the sites at which the treated sample and the control sample show variation relative to the reference genome, then selects from these allelic variant sites the heterozygous variant sites with the same genotype, and finally selects from the heterozygous variant sites those whose expression level differs before and after treatment, to obtain the heterozygous variant sites with allelic imbalance in expression. In the present invention, the expression level is calculated from the number of reads aligned to the specific site.
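The screening logic described above can be sketched as follows. The description does not state which statistical test is used to decide that the expression level differs, so this Python sketch makes a loudly labeled assumption: it compares the reference and alternative allele read counts of the treated and control samples with a two-sided Fisher's exact test at a hypothetical threshold of 0.05, without multiple-testing correction. All function names and dictionary keys are illustrative.

```python
from math import comb
from typing import Dict, List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def is_heterozygous(genotype: str) -> bool:
    """True for diploid genotypes with two different called alleles, e.g. 0/1."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def screen_sites(sites: List[Dict], alpha: float = 0.05) -> List[str]:
    """Keep heterozygous sites with identical genotype but different allelic ratio."""
    hits = []
    for s in sites:
        if s["gt_treated"] != s["gt_control"] or not is_heterozygous(s["gt_treated"]):
            continue
        p = fisher_exact_two_sided(s["treated_ref"], s["treated_alt"],
                                   s["control_ref"], s["control_alt"])
        if p < alpha:
            hits.append(s["id"])
    return hits


sites = [
    {"id": "site1", "gt_treated": "0/1", "gt_control": "0/1",
     "treated_ref": 90, "treated_alt": 10, "control_ref": 50, "control_alt": 50},
    {"id": "site2", "gt_treated": "0/1", "gt_control": "0/1",
     "treated_ref": 48, "treated_alt": 52, "control_ref": 50, "control_alt": 50},
    {"id": "site3", "gt_treated": "1/1", "gt_control": "1/1",
     "treated_ref": 0, "treated_alt": 100, "control_ref": 0, "control_alt": 100},
]
print(screen_sites(sites))
```

Only the first site is heterozygous with the same genotype in both samples and shows a clear shift in allelic read counts after treatment; the other two are excluded by the balance of their counts or by homozygosity.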
After obtaining the heterozygous variant sites with allelic imbalance in expression, the present invention further annotates these sites on the genome and mines the allele-specific expression patterns of the genes or genomic elements in which they are located.
The present invention annotates the VCF file of the heterozygous variant sites with allelic imbalance in expression using the ANNOVAR tool. ANNOVAR supports three types of annotation: gene-based annotation, region-based annotation, and filter-based annotation. These three types address different aspects of each variant site: gene-based annotation reveals the direct relationship between a variant site and known genes and its functional impact; region-based annotation reveals the relationship between a variant and specific segments of the genome, for example whether it falls within a known conserved genomic region; filter-based annotation provides a range of further information about the variant, for example its frequency in different populations. In the present invention, gene-based annotation is preferably performed, and comprises the following steps:
Database preparation: download the corresponding annotation files to a specified directory, for example /data2/disk1/ptrdbAnnovar/. For gene-based annotation, ANNOVAR requires a gene annotation file in genePred format (which can be converted from gff3) and a transcript sequence file in FASTA format;
Data format conversion: use the convert2annovar tool of the ANNOVAR software to convert the filtered VCF file obtained in step S106 (sample_filter.vcf) into the format required by ANNOVAR;
Annotate using the table_annovar.pl tool of the ANNOVAR software.
The method of the present invention can screen out, quickly and in high throughput, the stable heterozygous allelic sites that show differential imbalanced expression, and can determine the degree to which the imbalanced expression of a site changes before and after the exogenous regulatory factor treatment.
The technical solution provided by the present invention is described in detail below with reference to an embodiment, which should not be construed as limiting the scope of protection of the present invention.
Embodiment 1
Source of raw materials: the superior Populus tomentosa clone (Populus tomentosa '1316') was obtained from the National Populus tomentosa Germplasm Resource Bank;
All reagents used in the CTAB method were commercial products;
The specific steps are as follows:
Step S101: One-year-old plants of the superior Populus tomentosa clone were selected. The clones used for the experiment were planted in pots containing a substrate of soil, peat, and perlite (volume ratio 1:1:1) in a greenhouse at Beijing Forestry University (40°0'N, 116°20'E), with 16 hours of light per day and a temperature of 20 °C. The leaves of the experimental Populus tomentosa clone plants were sprayed with a 100 μM IAA solution (IAA powder dissolved in 95% ethanol and then brought to 100 μM with distilled water) until liquid dripped from the leaves; the control sample was sprayed with water in the same way. Leaves of the control and experimental samples were taken 6 h after the IAA treatment for transcriptome sequencing: mature leaves were collected from the same position on the control and experimental plants, with three biological replicates each for the experimental sample and the control sample.
Total RNA was extracted from the collected samples using the conventional CTAB method, and the extracted RNA was then purified with the RNase-Free DNase Set (Qiagen). The integrity of the RNA samples was checked by agarose gel electrophoresis. Integrity criteria: intact RNA shows three bands on an agarose gel; the upper two bands are the brightest and represent the 28S and 18S rRNA respectively, and the first band (28S rRNA) should be twice as bright as the second (18S rRNA); the bands should be bright and clear with sharp edges. With RNase-free water as the blank control, the A230, A260, and A280 values of each RNA sample were measured with a spectrophotometer to determine the purity of the RNA sample and calculate its total amount. Concentration: 0.093 μg/μL; mass: 8.83 μg; A260/A280: 3.5.
From the total RNA of the treated and control plants extracted as described above, rRNA was removed using the Ribo-Zero™ rRNA Removal Kit (Plant); poly(A) RNA was then bound by the magnetic bead method to obtain a Poly(A)- RNA sample; linear RNA was digested with RNase R to obtain a Poly(A)-/Ribo- RNA sample; and bidirectional strand-specific libraries were constructed separately for each treatment.
Step S102: The strand-specific libraries constructed in the previous step were subjected to paired-end sequencing on a HiSeq 2500 ultra-high-throughput sequencer with a read length of 100 nt. Library construction and sequencing were completed by Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.).
S103, sequencing obtains may be lower containing overall quality, inclined containing sequencing primer, end mass in Raw Reads
Low underproof Reads, these underproof Reads probably affect to analysis quality, so necessary
It is filtered, obtains the clean Reads that can be used for data analysis, then carry out with reference to genome alignment.
First, the reads are cleaned using fastx (version: 0.0.13). The main filtering steps are as follows:
S1031 Remove reads of low overall quality: a read is removed if bases with quality greater than 20 make up less than 50% of it;
S1032 Remove bases with quality Q below 10 from the 3' end, so that the retained bases have an error rate below 0.1, where Q = -10 × log10(error_ratio);
S1033 Remove adapter sequences contained in the reads;
S1034 Remove ambiguous N bases contained in the reads, i.e. bases the sequencer could not identify because the sequencing signal intensity was insufficient;
S1035 Remove sequencing fragments (reads) shorter than 20 nt;
S1036 Remove ribosomal RNA: all poplar rRNA sequences are used as the alignment template, and reads that align to the template are treated as ribosomal RNA reads.
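The quality-based steps above (S1031, S1032, S1034 and S1035) can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the fastx implementation; the function names are hypothetical, adapter and rRNA removal (S1033, S1036) are omitted, and discarding the whole read when an N base remains is an assumption, since the text does not say whether N bases are trimmed or the read is dropped.

```python
def phred_to_error(q):
    """Base-call error probability implied by Q = -10 * log10(error_ratio)."""
    return 10 ** (-q / 10)


def clean_read(seq, quals, min_frac_q20=0.5, min_q_3prime=10, min_len=20):
    """Apply steps S1031, S1032, S1034 and S1035 to one read.

    seq is the base string, quals the per-base Phred scores (same length).
    Returns the cleaned (seq, quals) pair, or None if the read is discarded.
    """
    if not seq:
        return None
    # S1031: discard reads in which bases with Q > 20 make up less than 50%.
    if sum(q > 20 for q in quals) / len(quals) < min_frac_q20:
        return None
    # S1032: trim bases with Q < 10 from the 3' end.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q_3prime:
        end -= 1
    seq, quals = seq[:end], quals[:end]
    # S1034 (assumption): discard reads that still contain ambiguous N bases.
    if "N" in seq:
        return None
    # S1035: discard reads shorter than 20 bases.
    if len(seq) < min_len:
        return None
    return seq, quals
```

A Phred score of 10 corresponds to an error probability of 0.1 and a score of 20 to 0.01, which is why the two thresholds in S1031 and S1032 differ by a factor of ten in expected error rate.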
The sequencing and filtering results are shown in Table 1.
Table 1 Summary of Chinese white poplar (P. tomentosa) RNA-seq data
S104: the main operating steps are as follows:
S1041 Input files: the alignment result file (BAM format) generated by the Tophat2 software, and the reference genome sequence file in FASTA format (e.g. genome.fa).
S1042 Build an index of the genome.fa file using the samtools software.
S1043 Use the SortSam tool in the picard-tools software to sort the entries of the BAM file by coordinate, from smallest to largest within each chromosome. PCR amplification during library preparation introduces bias: if two reads have the same length and align to the same position of the genome, they are judged to derive from PCR amplification. In the next step, picard-tools is therefore used to identify the duplicates formed by PCR amplification, and a flag is set on these sequences to mark them.
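The sorting and duplicate-flagging logic of S1043 can be illustrated with the following Python sketch. It applies only the criterion stated above (same aligned position and same length) to simplified read records; the dictionary keys and the function name are hypothetical, and picard-tools itself works on BAM records and applies its own, more detailed rules.

```python
def sort_and_flag_duplicates(reads):
    """Sort reads by coordinate and flag likely PCR duplicates.

    Each read is a dict with hypothetical keys 'chrom', 'pos' and 'length'.
    Returns a list of (read, is_duplicate) pairs in coordinate order.
    """
    # Coordinate sort: by chromosome, then by position from small to large.
    ordered = sorted(reads, key=lambda r: (r["chrom"], r["pos"]))
    seen = set()
    flagged = []
    for read in ordered:
        key = (read["chrom"], read["pos"], read["length"])
        # The first read with a given position and length is kept as the
        # original; later reads with the same key are flagged, not deleted.
        flagged.append((read, key in seen))
        seen.add(key)
    return flagged
```

Flagging rather than deleting mirrors the description above, where a flag is set on the duplicate sequences so that later steps can ignore them.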
S1044 Alignments near INDELs are usually inaccurate and must be realigned using known indel information. This is done in two steps: first, RealignerTargetCreator is run and outputs a file of possible indels; second, IndelRealigner performs the realignment using this file. The parameter is set to VALIDATION_STRINGENCY=LENIENT.
S1045 Reformat the processed alignment result of each sample by adding header information (sequencing platform, sample name, etc.; parameters: PL=illumina, PU=pgPU, SM=sample name) in preparation for subsequent variant site detection.
S105: based on the result of step S104, SNP calling is performed using the GATK (version: 4.0.1.0) software, as follows:
S1051 First, the HaplotypeCaller tool is used to detect variants in each sample, with --pair-hmm-gap-continuation-penalty set to 10 and --emit-ref-confidence GVCF; all other parameters are left at their default values. This yields the variant information of each sample.
S1052 The CombineGVCFs tool in GATK is used to merge the variant files of the individual samples.
S1053 The GenotypeGVCFs tool in GATK is then used to detect allelic variation across the samples, generating a single VCF file that contains the variant sites and genotype information of all samples.
S1054 Finally, the VariantFiltration tool in GATK is used for filtering to reduce false positives: a 35 bp sliding window is used, and any window containing 3 SNPs within 35 bp is marked as a SNP cluster and filtered out. This finally yields the allelic variant site information of the transcriptome.
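The SNP-cluster rule in S1054 (3 SNPs within a 35 bp window) can be illustrated with a short Python sketch. This is not the GATK implementation; the function name is hypothetical, and the boundary convention (a window covers 35 consecutive positions, end points included) is an assumption.

```python
def flag_snp_clusters(positions, window=35, cluster_size=3):
    """Return the set of SNP positions that fall in a SNP cluster.

    positions: sorted SNP coordinates on a single chromosome.
    A cluster is any run of `cluster_size` consecutive SNPs that fits
    inside `window` consecutive bases (assumed inclusive of both ends).
    """
    clustered = set()
    for i in range(len(positions) - cluster_size + 1):
        group = positions[i:i + cluster_size]
        if group[-1] - group[0] + 1 <= window:
            clustered.update(group)
    return clustered
```

For example, SNPs at positions 100, 110 and 120 span 21 bases and are flagged together, while an isolated SNP at position 500 is retained.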
S1055 The expression pattern of each allelic site is obtained by calculating the ratio of the numbers of reads aligned to that site. Stably heterozygous allelic sites showing differential imbalanced expression are screened out, and the change in the degree of allelic imbalanced expression before and after treatment is quantified.
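The ratio calculation in S1055 can be sketched as follows. The sketch assumes that the expression pattern of a site is summarised as the ratio of reads carrying the mutant (non-reference) base to reads carrying the reference base, and that the change is expressed as the difference between the treated and control ratios; both the function names and the choice of a simple difference are illustrative assumptions.

```python
def allelic_ratio(ref_reads, alt_reads):
    """Mutant-allele reads divided by reference-allele reads at one site.

    Returns None when there are no reference reads, since the ratio is
    undefined in that case.
    """
    if ref_reads == 0:
        return None
    return alt_reads / ref_reads


def imbalance_shift(control_counts, treated_counts):
    """Change in allelic ratio from control to treated sample.

    Each argument is a (ref_reads, alt_reads) pair for the same site.
    """
    control = allelic_ratio(*control_counts)
    treated = allelic_ratio(*treated_counts)
    if control is None or treated is None:
        return None
    return treated - control
```

A site with 50 reference and 50 mutant reads in the control (ratio 1.0) and 40 reference and 60 mutant reads after treatment (ratio 1.5) would give a shift of 0.5 towards the mutant allele.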
Table 2 Types and numbers of allelic variant sites in Chinese white poplar in response to IAA treatment
Poplar is diploid: of its two homologous chromosomes, one comes from the male parent and one from the female parent. Normally the bases at the same position on the two chromosomes are identical, but in the course of genetic evolution the bases at the same position of the homologous chromosomes can come to differ, which forms an allelic site. When the transcriptome sequencing reads are aligned to the reference genome, some sites are found to have both reads consistent with the reference genome and reads inconsistent with it; such a site is marked as a heterozygous allelic site. A site may also show only one base that differs from the reference genome, which is homozygous 2, or only one base that matches the reference genome at that site, which is homozygous 1. Sites are identified separately in the transcriptome sequencing results before and after treatment. The identified changes fall into six types: heterozygous-homozygous 1; heterozygous-homozygous 2; homozygous-homozygous; homozygous 1-heterozygous; homozygous 2-heterozygous; and heterozygous-heterozygous. Sites that are heterozygous before treatment and still heterozygous after treatment are selected as stably expressed sites, i.e. the allelic sites used for further analysis.
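The site classification described above can be illustrated with a minimal Python sketch. It labels a site purely from the presence or absence of reference and non-reference reads, which is a simplification; in the described workflow the genotypes come from the SNP calling step. The function names are hypothetical.

```python
def call_site(ref_reads, alt_reads):
    """Classify one site from its reference and non-reference read counts."""
    if ref_reads > 0 and alt_reads > 0:
        return "heterozygous"
    if alt_reads > 0:
        return "homozygous 2"   # only a base that differs from the reference
    if ref_reads > 0:
        return "homozygous 1"   # only the reference base
    return None                 # no coverage at this site


def is_stable_heterozygous(before, after):
    """True if the site is heterozygous both before and after treatment.

    before and after are (ref_reads, alt_reads) pairs for the same site.
    """
    return (call_site(*before) == "heterozygous"
            and call_site(*after) == "heterozygous")
```

Only sites for which `is_stable_heterozygous` returns True correspond to the heterozygous-heterozygous type that is carried forward.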
Table 2 shows how the types and numbers of allele-specific expression variant sites in Chinese white poplar differ before and after IAA treatment. Heterozygous-heterozygous SNPs are the informative sites for further analysis of allelic imbalanced expression, and this type accounts for the vast majority of the variant sites.
Step S106: according to the types and numbers of the variant sites, the heterozygous variant sites showing allelic imbalanced expression before and after treatment are screened out, in order to find the genes in which allelic differential expression occurs.
According to Table 2, 108152 heterozygous allelic sites showing allelic imbalanced expression after hormone treatment were screened out. An analysis of variance was performed on the ratio of mutant genotype expression to reference genotype expression in the experimental and control samples. As shown in Table 3, the imbalanced expression of the allelic sites differs significantly before and after treatment.
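The F statistic of a one-way analysis of variance on the mutant/reference ratios can be computed as in the following Python sketch. It is a textbook calculation written out for illustration; the function name is hypothetical, and the software actually used for the analysis is not stated in the text.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: list of lists, one list of ratios per group (e.g. control
    and treated). Requires at least two groups and some within-group
    variation, otherwise the statistic is undefined.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

With two groups, as in a before/after comparison, the F test is equivalent to a two-sample t test with pooled variance; a large F indicates that the ratios differ between the groups more than they vary within them.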
Table 3 One-way analysis of variance and F test of the mutant genotype/reference genotype ratio before and after treatment
Based on the difference in the ratio of expression between the genotypes, a total of 47 sites with significant allelic differential expression were screened out and annotated on the Populus trichocarpa genome, as shown in Table 4.
Table 4 Allele-specific expressed genes in Chinese white poplar in response to IAA treatment
Table 4 lists the names of the genes with significant allelic differential expression after IAA treatment of Chinese white poplar, together with the details of the differences. As Table 4 shows, after IAA treatment the ratio of mutant genotype to reference genotype generally increases, the proportion of allelic imbalanced expression rises, and the sites are mainly located in the exon regions of genes.
Embodiment 2
The screening method provided by the present invention is further illustrated below, taking as an example the screening of differential allelic expression in Populus simonii (obtained from the national Populus simonii germplasm resource bank) before and after drought stress treatment.
Material: individuals with the same or similar growth vigour within a Populus simonii clone are selected and treated as follows: for the drought treatment, no watering for 10 days; control samples are watered on the normal cultivation schedule (once every two days, until the soil is thoroughly soaked).
All reagents used in the CTAB method are commercial products.
The specific steps are as follows:
Step S101: elite Populus simonii plants are selected, and the plants chosen for the experiment are grown in pots with a substrate of soil, peat and perlite (volume ratio 1:1:1) in a greenhouse at Beijing Forestry University (40°0'N, 116°20'E) with 16 hours of light per day. The treatment group is subjected to drought stress by withholding water for 10 days, while the control samples are watered on the normal cultivation schedule. After treatment, leaves of the control and experimental samples are taken for transcriptome sequencing: mature leaves are collected from the same position on the control and experimental plants, and the experimental and control samples each include three biological replicates. Total RNA is extracted from a small amount of leaf tissue using the conventional CTAB method.
The extracted RNA is then purified with the RNase-Free DNase Set (Qiagen). The integrity of the RNA samples is checked by agarose gel electrophoresis. With RNase-free water as the blank control, the A230, A260 and A280 values of each RNA sample are measured with a spectrophotometer to determine the purity of the RNA sample and calculate its total amount.
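The purity and yield calculation from the absorbance readings can be sketched as follows. The sketch assumes the common spectrophotometric convention that an A260 of 1.0 corresponds to about 40 μg/mL of RNA at a 1 cm path length; this conversion factor, the function names and the parameters are not taken from the text.

```python
def rna_concentration_ug_per_ul(a260, dilution_factor=1.0):
    """RNA concentration in ug/uL, assuming A260 = 1.0 equals 40 ug/mL."""
    return a260 * 40.0 * dilution_factor / 1000.0


def purity_ratios(a230, a260, a280):
    """Absorbance ratios commonly used to judge RNA purity."""
    return {"A260/A280": a260 / a280, "A260/A230": a260 / a230}


def total_mass_ug(concentration_ug_per_ul, volume_ul):
    """Total RNA mass from concentration and sample volume."""
    return concentration_ug_per_ul * volume_ul
```

The total amount then follows from concentration multiplied by the eluted volume, which is how a concentration and a mass can both be reported for the same sample.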
A control-sample strand-specific cDNA library and an experimental-sample strand-specific cDNA library are constructed from the extracted total RNA of the control and experimental samples. Construction and sequencing of the strand-specific libraries are performed by Shanghai Bohao Biotechnology Co., Ltd. (Shanghai Bioarray Co. Ltd.).
S103: the raw reads obtained by sequencing may include unqualified reads with low overall quality, residual sequencing primers, or low-quality ends. Such reads are likely to degrade the analysis, so they must be filtered out to obtain clean reads suitable for data analysis before alignment to the reference genome.
First, the reads are cleaned using fastx (version: 0.0.13). The main filtering steps are as follows:
S1031 Remove reads of low overall quality: a read is removed if bases with quality greater than 20 make up less than 50% of it;
S1032 Remove bases with quality Q below 10 from the 3' end, so that the retained bases have an error rate below 0.1, where Q = -10 × log10(error_ratio);
S1033 Remove adapter sequences contained in the reads;
S1034 Remove ambiguous N bases contained in the reads, i.e. bases the sequencer could not identify because the sequencing signal intensity was insufficient;
S1035 Remove sequencing fragments (reads) shorter than 20 nt;
S1036 Remove ribosomal RNA: all poplar rRNA sequences are used as the alignment template, and reads that align to the template are treated as ribosomal RNA reads.
The sequencing and filtering results are shown in Table 5.
Table 5 Summary of Populus simonii (P. simonii) RNA-seq data
S104: the main operating steps are as follows:
S1041 Input files: the alignment result file (BAM format) generated by the Tophat2 software, and the reference genome sequence file in FASTA format (e.g. genome.fa).
S1042 Build an index of the genome.fa file using the samtools software.
S1043 Use the SortSam tool in the picard-tools software to sort the entries of the BAM file by coordinate, from smallest to largest within each chromosome. PCR amplification during library preparation introduces bias: if two reads have the same length and align to the same position of the genome, they are judged to derive from PCR amplification. In the next step, picard-tools is therefore used to identify the duplicates formed by PCR amplification, and a flag is set on these sequences to mark them.
S1044 Alignments near INDELs are usually inaccurate and must be realigned using known indel information. This is done in two steps: first, RealignerTargetCreator is run and outputs a file of possible indels; second, IndelRealigner performs the realignment using this file. The parameter is set to VALIDATION_STRINGENCY=LENIENT.
S1045 Reformat the processed alignment result of each sample by adding header information (sequencing platform, sample name, etc.; parameters: PL=illumina, PU=pgPU, SM=sample name) in preparation for subsequent variant site detection.
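The header information added in S1045 corresponds to a read-group (@RG) line in the SAM/BAM header. The following Python sketch builds such a line from the parameters given above; using the sample name as the read-group ID is an assumption, as is the function name, and the text does not say which tool writes the header.

```python
def read_group_header(sample, platform="illumina", unit="pgPU", rg_id=None):
    """Build a tab-separated SAM @RG header line for one sample.

    PL, PU and SM follow the parameter values given in S1045; the ID
    field is required by the SAM format and defaults here (assumption)
    to the sample name.
    """
    if rg_id is None:
        rg_id = sample
    fields = ["@RG", f"ID:{rg_id}", f"PL:{platform}", f"PU:{unit}", f"SM:{sample}"]
    return "\t".join(fields)
```

The SM field matters most for the next step, because variant callers use it to decide which reads belong to which sample.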
S105: based on the result of step S104, SNP calling is performed using the GATK (version: 4.0.1.0) software, as follows:
S1051 First, the HaplotypeCaller tool is used to detect variants in each sample, with --pair-hmm-gap-continuation-penalty set to 10 and --emit-ref-confidence GVCF; all other parameters are left at their default values. This yields the variant information of each sample.
S1052 The CombineGVCFs tool in GATK is used to merge the variant files of the individual samples.
S1053 The GenotypeGVCFs tool in GATK is then used to detect allelic variation across the samples, generating a single VCF file that contains the variant sites and genotype information of all samples.
S1054 Finally, the VariantFiltration tool in GATK is used for filtering to reduce false positives: a 35 bp sliding window is used, and any window containing 3 SNPs within 35 bp is marked as a SNP cluster and filtered out. This finally yields the allelic variant site information of the transcriptome.
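The sequence of GATK calls in S1051 to S1054 can be laid out as command lines, as in the Python sketch below, which only builds the argument lists and does not run anything. All file names are hypothetical. The two HaplotypeCaller options are those given in the text; mapping the 35 bp window and the 3-SNP threshold onto the VariantFiltration options --cluster-window-size and --cluster-size is an assumption about how the filter was configured.

```python
def gatk_commands(samples, reference="genome.fa"):
    """Return the GATK command lines for steps S1051 to S1054.

    samples: list of sample names; each is assumed to have a prepared
    BAM file named '<sample>.bam' (hypothetical naming).
    """
    commands = []
    gvcfs = []
    # S1051: per-sample variant detection in GVCF mode.
    for sample in samples:
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        commands.append([
            "gatk", "HaplotypeCaller", "-R", reference,
            "-I", f"{sample}.bam", "-O", gvcf,
            "--emit-ref-confidence", "GVCF",
            "--pair-hmm-gap-continuation-penalty", "10",
        ])
    # S1052: merge the per-sample GVCF files.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", "combined.g.vcf.gz"]
    commands.append(combine)
    # S1053: joint genotyping into a single VCF.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference,
                     "-V", "combined.g.vcf.gz", "-O", "joint.vcf.gz"])
    # S1054: mark SNP clusters (3 SNPs within 35 bp) for filtering.
    commands.append(["gatk", "VariantFiltration", "-R", reference,
                     "-V", "joint.vcf.gz", "-O", "filtered.vcf.gz",
                     "--cluster-window-size", "35", "--cluster-size", "3"])
    return commands
```

Each returned list could be passed to `subprocess.run` once the input files exist; keeping the commands as data first makes it easy to print and inspect them before anything is executed.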
S1055 The expression pattern of each allelic site is obtained by calculating the ratio of the numbers of reads aligned to that site. Stably heterozygous allelic sites showing differential imbalanced expression are screened out, and the change in the degree of allelic imbalanced expression before and after treatment is quantified.
Table 6 Types and numbers of allelic variant sites in Populus simonii in response to drought stress treatment
Table 6 shows how the types and numbers of allele-specific expression variant sites in Populus simonii differ before and after drought stress treatment. Heterozygous-heterozygous SNPs are the informative sites for further analysis of allelic imbalanced expression, and this type also accounts for most of the variant sites.
Step S106: according to the types and numbers of the variant sites found in step S105, the heterozygous variant sites showing allelic imbalanced expression before and after treatment can be screened out, in order to find the genes in which allelic differential expression occurs.
As can be seen from the experimental data of the two embodiments above, the screening method provided by the present invention has the following advantages:
1) The method addresses the problem of analysing, at the allelic level, how a genome responds to the external environment: transcriptome sequencing is used to genotype the genome, and differential expression at allelic sites is screened systematically at the allelic level.
2) The method makes full use of second-generation high-throughput sequencing technology and allows high-throughput quantitative screening of SNP sites that differ between alleles.
3) By screening and annotating sites with allelic differential expression, the method allows the genetic models of genes that are differentially expressed at the allelic level to be analysed further in order to verify the function of their SNPs.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for high-throughput screening of heterozygous variant sites with allelic imbalanced expression in a plant genome, comprising the following steps:
1) treating plant material with an exogenous regulatory factor to obtain a treated sample, and using a sample not treated with the exogenous regulatory factor as the control sample;
2) extracting total RNA from the treated sample and the control sample of step 1), and constructing strand-specific sequencing libraries of the treated sample and the control sample; performing high-throughput sequencing of the strand-specific sequencing libraries of the treated sample and the control sample with an Illumina HiSeq 2500 to obtain raw sequencing data of the treated sample and the control sample;
3) filtering the raw sequencing data of the treated sample and the control sample to obtain clean reads of the treated sample and the control sample;
4) aligning the clean reads of the treated sample and the control sample to a reference genome respectively to obtain a treated-sample alignment result and a control-sample alignment result;
5) removing from the treated-sample alignment result and the control-sample alignment result, respectively, the redundant identical duplicate reads formed by PCR amplification, and then realigning to the reference genome a total of 500 bp upstream and downstream of the sites where INDELs occur, to obtain final alignment result data of the treated sample and the control sample;
6) performing allelic variant screening on the final alignment result data of the treated sample and the control sample, and selecting the heterozygous sites that have the same genotype but different expression levels in the treated sample and the control sample, to obtain the heterozygous variant sites with allelic imbalanced expression.
2. The method according to claim 1, characterized in that the exogenous regulatory factor includes exogenous hormones and environmental stress.
3. The method according to claim 2, characterized in that the environmental stress includes drought, salinity-alkalinity, freezing and flooding.
4. The method according to claim 1, characterized in that the plant is a forest tree.
5. The method according to claim 4, characterized in that the forest tree is poplar.
6. The method according to claim 1, characterized in that the filtering of the raw sequencing data of the treated sample and the control sample in step 3) is performed using fastx (version: 0.0.13).
7. The method according to claim 1, characterized in that the allelic variant screening in step 6) is performed by SNP calling.
8. The method according to claim 1, characterized in that, after the heterozygous variant sites with allelic imbalanced expression are obtained in step 6), the method further comprises annotating the heterozygous variant sites with imbalanced expression on the genome, and mining the allele-specific expression pattern of the gene or genomic element in which the heterozygous variant sites with imbalanced expression are located.
9. The method according to claim 1 or 5, characterized in that, when the plant is poplar, the reference genome is Populus trichocarpa v3.0.
10. The method according to claim 1, characterized in that the high-throughput sequencing in step 2) is paired-end sequencing, and the read length of the high-throughput sequencing is 100 nt.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811563126.4A CN109338011B (en) | 2018-12-20 | 2018-12-20 | Method for high-throughput screening of plant genome differential allelic expression genes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109338011A true CN109338011A (en) | 2019-02-15 |
CN109338011B CN109338011B (en) | 2021-09-24 |
Family
ID=65304676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811563126.4A Active CN109338011B (en) | 2018-12-20 | 2018-12-20 | Method for high-throughput screening of plant genome differential allelic expression genes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109338011B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113005215A (en) * | 2021-02-26 | 2021-06-22 | 北京林业大学 | Haplotype molecular marker related to poplar wood yield and application thereof |
CN114974425A (en) * | 2022-04-22 | 2022-08-30 | 深圳市仙湖植物园(深圳市园林研究中心) | Method for detecting plant RNA editing sites |
CN116312776A (en) * | 2022-12-08 | 2023-06-23 | 上海生物制品研究所有限责任公司 | Method for detecting differentiated RNA editing sites |
CN116312776B (en) * | 2022-12-08 | 2024-01-19 | 上海生物制品研究所有限责任公司 | Method for detecting differentiated RNA editing sites |
Also Published As
Publication number | Publication date |
---|---|
CN109338011B (en) | 2021-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||