CN110379462A - Method for assembling the chloroplast genome sequence of Chrysosplenium sinicum based on Illumina technology - Google Patents
Method for assembling the chloroplast genome sequence of Chrysosplenium sinicum based on Illumina technology
- Publication number
- CN110379462A (application CN201910546474.9A)
- Authority
- CN
- China
- Prior art keywords
- chloroplast gene
- scaffold
- genome
- waist
- chloroplast
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Zoology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Immunology (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Genetics & Genomics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
- Peptides Or Proteins (AREA)
Abstract
The present invention discloses a method for assembling the chloroplast genome sequence of Chrysosplenium sinicum from Illumina data. The method is divided into four steps. First, the whole genome of C. sinicum is sequenced using Illumina technology, and the raw data are aligned to a chloroplast reference genome to obtain the aligned reads. Second, the aligned reads are broken into k-mers, the k-mers are assembled into contigs based on the De Bruijn algorithm, and the contigs are joined through their overlap relationships into longer scaffolds. Third, the scaffolds are extended further using the original Illumina sequencing data. Finally, the resulting scaffolds are mapped directly onto the chloroplast genome of the most closely related species, the IR regions are located, and manual joining yields the complete plant chloroplast genome. The greatest advantage of the invention is that the chloroplast genome and its complete map can be obtained directly by bioinformatics methods, without isolating chloroplasts.
Description
Technical field
The invention belongs to the technical field of bioinformatics, and specifically relates to a method for assembling the chloroplast genome sequence of Chrysosplenium sinicum based on Illumina technology.
Background
The genus Chrysosplenium comprises about 70 species worldwide, distributed across Asia, Europe, Africa, and the Americas, mainly in temperate Asia. About 36 Chrysosplenium species have been found in China so far, widely distributed across more than 20 provinces and regions, including Yunnan, Tibet, Sichuan, Guizhou, Hubei, Hunan, and the Northeast.
Chrysosplenium plants are rich in flavonoids and therefore have high medicinal value. "Flora of China", "Illustrated Handbook of Chinese Medicinal Plants", and "National Compilation of Chinese Herbal Medicine" record the efficacy of plants of this genus, which are mostly used for clearing heat and detoxifying and for treating liver and gallbladder diseases. They have also been widely used throughout the history of traditional Chinese medicine. In Tibetan medicine the plant is called Ya Jima; the Tibetan medical scholar Dimaer Danzeng Pengcuo recorded in his work "Jingzhubencao": "Ya Jima grows in rock crevices on high mountains; it is bitter in taste and cool in nature, relieves vomiting and diarrhoea, and treats gallbladder disease." Chrysosplenium plants are also included in the traditional Mongolian medicine work "Unerring Mirror of Mongolian Medicines". Recent studies show that Chrysosplenium plants generally contain high levels of flavonoids and triterpenoids and have good antitumour and antiviral activity. A pentacyclic triterpene isolated from Chrysosplenium nudicaule shows relatively strong inhibition of malignant melanoma (A375), four gastric cancers (ST-KM, KaTo-III, NKPS, KKLS), and bladder cancer (KK-47), and chrysosplenol B and chrysosplenol C, which are characteristic of and widespread in the genus, have significant antiviral activity. These physiological activities show that Chrysosplenium plants merit further research and development.
Second-generation genome sequencing has been widely applied in every field of the life sciences. Compared with Sanger sequencing, second-generation sequencing costs far less, is fast, and has good accuracy. Widely used platforms include the Roche 454 sequencing system, the SOLiD platform of ABI (Applied Biosystems), and the Illumina Solexa platform. Of these, Illumina has quickly become the most widely used of all sequencing technologies because of its low sequencing cost and high speed.
Plant cells have three genetic systems, the nuclear genome, the mitochondrial genome, and the chloroplast genome, each inherited relatively independently. Chloroplasts replicate semi-autonomously and are found throughout algae and green plants. They supply energy for the life activities of photosynthetic organisms, became an energy source driving the evolution of early life, and have played an important role in the long course of evolution. The chloroplast genome is more conserved than the nuclear genome and is often used as one of the bases for investigating relationships among species, their origins, and their evolution.
The chloroplast genome is small. Sequencing the chloroplast genome directly yields only chloroplast data, whereas chloroplast data can now equally be extracted from whole-genome sequencing, which saves time, lets the data serve several purposes, and greatly reduces cost. However, the prior art does not clearly document the structure and composition of the Chrysosplenium chloroplast genome, nor how to obtain that structure through sequencing and assembly of the sequenced fragments so as to lay a foundation for subsequent development of Chrysosplenium germplasm resources. In addition, because genome sequence composition differs between plants, chloroplast sequence composition differs too, and different algorithms are suitable for assembly; examples include ABySS, SOAPdenovo-Trans, Oases, IDBA-Tran, BinPacker, Bridger, and Trinity. Trinity is the most widely used and most highly regarded de novo transcriptome assembler, and was the first software developed specifically for transcriptome assembly. Finding the algorithm best suited to assembling the Chrysosplenium chloroplast genome is a technical problem urgently awaiting a solution in this field.
Summary of the invention
To solve the problems in the prior art, the present invention provides a method for assembling the chloroplast genome sequence of Chrysosplenium sinicum based on Illumina technology. The assembly can be performed directly from C. sinicum whole-genome sequence data. The method is suited to assembling the Chrysosplenium chloroplast genome and yields the structure of the C. sinicum chloroplast genome, laying a foundation for subsequent development of Chrysosplenium germplasm resources.
In one embodiment, the invention provides a chloroplast genome structure map of Chrysosplenium sinicum, characterized as shown in Figure 1.
In one embodiment, the invention provides the use of a C. sinicum chloroplast genome structure map in developing C. sinicum germplasm resources.
In one embodiment, the invention provides a method for assembling the chloroplast genome sequence of C. sinicum. The specific steps of the method are:
(1) Roughly estimate the genome size of the sample and sequence the sample using Illumina technology;
(2) Align the sequencing data to a chloroplast reference genome and extract the aligned data;
(3) Assemble the chloroplast genome based on the De Bruijn algorithm: break the sequences into k-mers, with k between 21 and 127, and select the k value best suited to assembly;
(4) After simplifying the De Bruijn graph, find an optimal Eulerian path in the graph or its subgraphs; the base sequences corresponding to the path are the contigs;
(5) Use the overlap relationships between contigs to join them further into scaffolds;
(6) Further extend the scaffolds using the Illumina sequencing data;
(7) Map the scaffolds obtained in the previous step directly onto the chloroplast genome of the most closely related species and locate the four chloroplast regions (LSC, SSC, IRa, IRb); because IRa and IRb are inverted repeats, the scaffolds can be adjusted and joined manually to obtain the complete chloroplast genome.
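Steps (3) and (4) can be illustrated with a toy sketch. The reads, the value k=4, and the function names below are hypothetical and chosen only for illustration. The walk shown simply follows unambiguous edges; it is not the optimal Eulerian path search the method describes, and it ignores reverse-complement strands and sequencing errors, which real assemblers must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads sampled from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```

A larger k resolves more repeats but needs deeper coverage, which is one reason a range of k values is tested before assembly.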
In one embodiment, in step (1) the sample does not require chloroplast isolation; the whole genome can be sequenced directly.
In one embodiment, in step (2) the chloroplast reference genome is the complete chloroplast genome of a closely related species.
In one embodiment, in step (3) all odd numbers from 21 to 127 are tested as k-mer values.
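The candidate k values can be enumerated as in the sketch below. The function names are hypothetical. The source does not say why only odd values are used; a commonly given reason, added here as background, is that an odd-length k-mer can never equal its own reverse complement, which avoids ambiguous nodes in the De Bruijn graph.

```python
def candidate_k_values(k_min=21, k_max=127):
    """List every odd k in the closed interval [k_min, k_max]."""
    first = k_min if k_min % 2 == 1 else k_min + 1
    return list(range(first, k_max + 1, 2))


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


ks = candidate_k_values()
print(len(ks), ks[0], ks[-1])  # 54 21 127

# An even-length k-mer can be its own reverse complement.
print(reverse_complement("ACGT") == "ACGT")  # True
```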
In one embodiment, in step (4) the overlap relationship means that the last several bases of one contig are identical or nearly identical to the first several bases of the next contig, so that the two contigs can be joined.
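The overlap relationship can be sketched as a suffix-prefix comparison. This is a minimal illustration with hypothetical contig strings and a hypothetical `min_overlap` parameter; it accepts exact matches only, whereas the text also allows nearly identical ends, which would need an alignment step.

```python
def longest_overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_contigs(left, right, min_overlap=3):
    """Join two contigs on their exact overlap, or return None if there is none."""
    n = longest_overlap(left, right, min_overlap)
    if n == 0:
        return None
    return left + right[n:]


# Hypothetical contigs sharing the 4-base overlap "GTGC".
merged = merge_contigs("ATGGCGTGC", "GTGCAATTC")
print(merged)  # ATGGCGTGCAATTC
```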
In one embodiment, in step (6) the original data are used as reference sequences to extend the scaffolds.
In one embodiment, the step are as follows:
1) it, estimates sample size and is sequenced
The Genome Size of golden waist is substantially estimated using flow cytometry, and then the genome of Chinese golden waist is mentioned
It takes, and is sequenced using the Hiseq PE150 of Illumina company, finally obtain the data volume of 2.02G, it is then right
RawData carries out connector, and Quality Control obtains CleanData.
2) it, compares and extracts
Choose China gold waist sibling species purple bergenia herb Bergenia purpurascens Engl. (NC_036061.1
Bergenia scopulosa chloroplast, complete genome) it is that chloroplaset refers to genome, utilize bwa software
CleanData is compared with purple bergenia herb, recycles samtools to extract the data in comparison, at this time substantially from complete
Chloroplast number evidence is isolated in genome.
3) most suitable kmer value, is chosen
The file of extraction is finally bam format, and the file of bam format is converted to two using bam2fastq software
Fastq format.It is analyzed using the kmer that kmergenie software carries out 21-127mer, it is reversed due to existing in Chloroplast gene
Repetitive sequence, therefore two peak values should be presented in kmer figure, wherein previous peak value is small, the latter peak value is big, and previous peak value
Size is about the half of latter peak value, and therefrom choosing and assembling best kmer value is 81.
4) it, is assembled
Assembling splicing is carried out using ABYSS algorithm, as a result, Contigs Number 49, Min Contig
Number 81,Max Contig Number 20389Contig N50 7702;
5) it, is attached using overlap
There to be being attached for overlap between two contig using Sequencher5.4.6 software, obtains longer
scaffold。
6), further scaffolding
Using original CleanData data, scaffold is further extended by SSPACE software, is finally obtained
The number of scaffolds is 19, wherein the smallest scaffold length is 192bp, longest scaffold length is
53813bp, scaffold N50 length are 38067bp.
7) it, finds the region IR and is spliced
On finally obtained scaffold direct map to the Chloroplast gene of purple bergenia herb, practical that uses will be found
There is first three scaffold, finds the region IR of Chinese golden waist, connect and Chloroplast gene can be obtained, and be based on
Sequence information is analyzed, and Chloroplast gene structure chart is constructed, as shown in claim 1.
In one embodiment, the invention provides the use of a method for assembling the Chrysosplenium sinicum chloroplast genome sequence in preparing a Chrysosplenium chloroplast genome structure map.
In one embodiment, the invention provides the use of a method for assembling the Chrysosplenium sinicum chloroplast genome sequence in developing Chrysosplenium germplasm resources.
Compared with the prior art, the invention achieves the following beneficial effects:
The invention obtains the chloroplast genome structure map of C. sinicum for the first time, laying a foundation for later genetic research on C. sinicum and for use of its germplasm resources. In addition, by testing different assembly algorithms and conditions, the invention establishes that ABySS is the most suitable assembly method for the Chrysosplenium chloroplast genome sequence, laying a foundation for obtaining the structure and composition of the Chrysosplenium chloroplast genome quickly and accurately.
Description of the drawings
Fig. 1 is the chloroplast genome structure map of Chrysosplenium sinicum;
Fig. 2 is the k-mer plot from the Chrysosplenium sinicum chloroplast genome sequence assembly method.
Detailed description of the embodiments
For a better understanding of the technical solution of the invention, the technical solution is described in detail below with reference to examples.
Example 1 Method for assembling the chloroplast genome sequence of Chrysosplenium sinicum
1. Estimate the sample size and sequence
The genome size of Chrysosplenium sinicum is roughly estimated by flow cytometry. Its genome is then extracted and sequenced on the Illumina HiSeq PE150 platform, yielding 2.02G of data. Adapters are removed from the RawData and quality control is applied to obtain the CleanData.
2. Align and extract
Bergenia purpurascens Engl. (NC_036061.1 Bergenia scopulosa chloroplast, complete genome), a close relative of C. sinicum, is chosen as the chloroplast reference genome. The C. sinicum CleanData are aligned to the Bergenia purpurascens reference with the bwa software, and the aligned data are then extracted with samtools; at this point the chloroplast data have essentially been separated from the whole-genome data.
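The alignment and extraction step might be scripted roughly as follows. This sketch only builds the command lines and does not run them. The file names and the function are hypothetical, and the source does not state which bwa sub-command or samtools options were used; `bwa mem` and the `-F 4` filter (drop reads flagged as unmapped) are assumptions chosen as one common way to keep only aligned reads.

```python
import shlex


def build_extraction_commands(reference, reads_1, reads_2, prefix="cp_mapped"):
    """Return argument lists for indexing, aligning, and keeping mapped reads."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        # Index the chloroplast reference genome.
        ["bwa", "index", reference],
        # Align the paired-end clean reads to the reference.
        ["bwa", "mem", "-o", sam, reference, reads_1, reads_2],
        # Keep only reads that are not flagged as unmapped (flag bit 4).
        ["samtools", "view", "-b", "-F", "4", "-o", bam, sam],
    ]


# Hypothetical file names, for illustration only.
cmds = build_extraction_commands("reference_cp.fasta", "clean_1.fq", "clean_2.fq")
for cmd in cmds:
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where bwa and samtools are installed.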
3. Choose the most suitable k-mer value
The extracted file ends up in bam format; the bam2fastq software converts it into two sequence files in fastq format. The kmergenie software is used for k-mer analysis over 21-127mer, from which the best k-mer value for this sequence is found to be 81. Because the chloroplast genome contains inverted repeats, the k-mer plot should show two peaks, as shown in Fig. 2: the first peak is smaller, with a k-mer depth of 157, and the second is larger, with a k-mer depth of 319, so the k-mer depth of the first peak is about half that of the second.
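The two depth levels follow from copy number: k-mers inside the inverted repeat occur twice per genome copy, while single-copy k-mers occur once, which matches the reported ratio 157 / 319 ≈ 0.49. The toy sketch below shows the effect on a made-up sequence. The region strings, k=7, and the function names are hypothetical, and counting is done on a genome string rather than on reads, so the counts stand in for sequencing depth.

```python
from collections import Counter


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmer_counts(seq, k):
    """Count k-mers, treating a k-mer and its reverse complement as the same key."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


# Hypothetical single-copy regions and one inverted-repeat unit.
lsc = "ACCGTTAGGCATCGA"
ir = "GGATCCTTAAGCAC"
ssc = "TTGACCAGTA"
genome = lsc + ir + ssc + reverse_complement(ir)

counts = canonical_kmer_counts(genome, k=7)
# Histogram: how many distinct k-mers were seen once, twice, and so on.
histogram = Counter(counts.values())
for multiplicity, n_kmers in sorted(histogram.items()):
    print(multiplicity, n_kmers)
```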
4. Assemble
Assembly is performed using a typical algorithm among the De Bruijn graph assembly algorithms, and redundant sequences are removed with the cd-hit software.
5. Join using overlaps
Contigs that overlap one another are joined using the Sequencher 5.4.6 software to obtain longer scaffolds.
6. Further scaffolding
Using the original CleanData, the scaffolds are extended further with the SSPACE software, finally giving the number of scaffolds and the length of each scaffold.
7. Locate the IR regions and join
The final scaffolds are mapped directly onto the Bergenia purpurascens chloroplast genome, which shows that only the first three scaffolds are actually used. The IR regions of Chrysosplenium sinicum are located and the scaffolds are joined to obtain the chloroplast genome.
Example 2 Influence of different assembly algorithms on assembly of the Chrysosplenium sinicum chloroplast genome sequence
To investigate how different assembly algorithms affect assembly of the Chrysosplenium sinicum chloroplast genome sequence, and to find the algorithm best suited to it, this example uses the following experimental design:
The genome of Chrysosplenium sinicum Maxim. is only 300M, which makes it one of the species with smaller genomes in the genus Chrysosplenium. This experiment therefore uses Chrysosplenium sinicum Maxim. as the material for chloroplast genome assembly.
Giemsa staining group: conventional Giemsa solution is used;
ABySS group: the ABySS algorithm is used for assembly;
Velvet group: the Velvet algorithm is used for assembly;
SPAdes group: the SPAdes algorithm is used for assembly;
SOAPdenovo group: the SOAPdenovo algorithm is used for assembly;
The specific experiment is as follows:
1. Estimate the sample size and sequence
The genome size of Chrysosplenium sinicum is roughly estimated by flow cytometry to be 300M. The genome of C. sinicum is then extracted and sequenced on the Illumina HiSeq PE150 platform, yielding 2.02G of data. Adapters are removed from the RawData and quality control is applied to obtain the CleanData.
2. Align and extract
Bergenia purpurascens Engl. (NC_036061.1 Bergenia scopulosa chloroplast, complete genome), a close relative of C. sinicum, is chosen as the chloroplast reference genome. The CleanData are aligned to the Bergenia purpurascens reference with the bwa software, and the aligned data are then extracted with samtools; at this point the chloroplast data have essentially been separated from the whole-genome data.
3. Choose the most suitable k-mer value
The extracted file ends up in bam format; the bam2fastq software converts it into two sequence files in fastq format. The kmergenie software is used for k-mer analysis over 21-127mer, from which the best k-mer value for this sequence is found to be 81. Because the chloroplast genome contains inverted repeats, the k-mer plot should show two peaks, as shown in Fig. 2: the first peak is smaller, with a k-mer depth of 157, and the second is larger, with a k-mer depth of 319, so the k-mer depth of the first peak is about half that of the second.
4. Assemble
Typical De Bruijn graph assembly algorithms include ABySS, Velvet, SPAdes, and SOAPdenovo. Assembly is performed with each of the corresponding programs, and redundant sequences are removed with the cd-hit software. The results for the four assemblers are as follows:
Table 1. Assembly information for the four assemblers
Table 1 shows that, of the four assembly algorithms, ABySS gives the best result and is best suited to the later experimental workflow. ABySS is therefore the most suitable assembly method for the Chrysosplenium sinicum chloroplast genome sequence.
Example 3 Assembly of the Chrysosplenium sinicum chloroplast genome sequence and the chloroplast genome structure
1. Estimate the sample size and sequence
The genome size of Chrysosplenium is roughly estimated by flow cytometry. The genome of Chrysosplenium sinicum is then extracted and sequenced on the Illumina HiSeq PE150 platform, yielding 2.02G of data. Adapters are removed from the RawData and quality control is applied to obtain the CleanData.
2. Align and extract
Bergenia purpurascens Engl. (NC_036061.1 Bergenia scopulosa chloroplast, complete genome), a close relative of C. sinicum, is chosen as the chloroplast reference genome. The CleanData are aligned to the Bergenia purpurascens reference with the bwa software, and the aligned data are then extracted with samtools; at this point the chloroplast data have essentially been separated from the whole-genome data.
3. Choose the most suitable k-mer value
The extracted file ends up in bam format; the bam2fastq software converts it into two sequence files in fastq format. The kmergenie software is used for k-mer analysis over 21-127mer, from which the best k-mer value for this sequence is found to be 81. Because the chloroplast genome contains inverted repeats, the k-mer plot should show two peaks, as shown in Fig. 2: the first peak is smaller, with a k-mer depth of 157, and the second is larger, with a k-mer depth of 319, so the k-mer depth of the first peak is about half that of the second.
4. Assemble
Assembly is performed with the ABySS algorithm. Result: number of contigs 49, minimum contig length 81, maximum contig length 20389, contig N50 7702;
5. Join using overlaps
Contigs that overlap one another are joined using the Sequencher 5.4.6 software to obtain longer scaffolds.
6. Further scaffolding
Using the original CleanData, the scaffolds are extended further with the SSPACE software. The final number of scaffolds is 19; the shortest scaffold is 192 bp, the longest is 53813 bp, and the scaffold N50 is 38067 bp.
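The N50 figures quoted for contigs and scaffolds use the standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the summed length. A minimal sketch follows; the example lengths are hypothetical and are not the lengths from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical scaffold lengths in bp, for illustration only.
example_lengths = [10, 20, 30, 40, 50]
print(n50(example_lengths))  # 40
```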
7. Locate the IR regions and join
The final scaffolds are mapped directly onto the Bergenia purpurascens chloroplast genome, which shows that only the first three scaffolds are actually used. The IR regions of Chrysosplenium sinicum are located and the scaffolds are joined to obtain the chloroplast genome. The sequence information is then analysed and a chloroplast genome structure map is constructed, as shown in Fig. 1.
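Because IRa and IRb are reverse complements of each other, one assembled IR unit is enough to reconstruct both copies when the regions are joined. The sketch below illustrates that idea on made-up sequences. The region strings, the function names, and the region order LSC, IRb, SSC, IRa are assumptions for illustration; in the method itself the joining is guided by the mapping to the reference and done manually.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def join_quadripartite(lsc, ir, ssc):
    """Concatenate LSC + IRb + SSC + IRa, where IRa is the reverse complement of IRb."""
    return lsc + ir + ssc + reverse_complement(ir)


def find_inverted_repeat(genome, ir_length):
    """Return (start_b, start_a) of the first window whose reverse complement occurs downstream."""
    for start in range(len(genome) - ir_length + 1):
        window = genome[start:start + ir_length]
        other = genome.find(reverse_complement(window), start + ir_length)
        if other != -1:
            return start, other
    return None


# Hypothetical region sequences, far shorter than real chloroplast regions.
genome = join_quadripartite("ACCGTTAGGCATCGA", "GGATCCTTAAGCAC", "TTGACCAGTA")
print(find_inverted_repeat(genome, ir_length=14))  # (15, 39)
```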
The invention has been described in detail above by way of a general description and specific embodiments, but modifications or improvements can be made on the basis of the invention, as will be apparent to those skilled in the art. Such modifications or improvements, made without departing from the spirit of the invention, therefore fall within the scope of protection claimed.
Claims (10)
1. A chloroplast genome structure map of Chrysosplenium sinicum, characterized in that the map is as shown in Fig. 1.
2. Use of the Chrysosplenium sinicum chloroplast genome structure map of claim 1 in developing Chrysosplenium sinicum germplasm resources.
3. A method for assembling a Chrysosplenium chloroplast genome sequence, characterized in that the specific steps of the method are:
(1) Roughly estimate the genome size of the sample and sequence the sample using Illumina technology;
(2) Align the sequencing data to a chloroplast reference genome and extract the aligned data;
(3) Assemble the chloroplast genome based on the De Bruijn algorithm: break the sequences into k-mers, with k between 21 and 127, and select the k value best suited to assembly;
(4) After simplifying the De Bruijn graph, find an optimal Eulerian path in the graph or its subgraphs; the base sequences corresponding to the path are the contigs;
(5) Use the overlap relationships between contigs to join them further into scaffolds;
(6) Further extend the scaffolds using the Illumina sequencing data;
(7) Map the scaffolds obtained in the previous step directly onto the chloroplast genome of the most closely related species and locate the four chloroplast regions (LSC, SSC, IRa, IRb); because IRa and IRb are inverted repeats, the scaffolds can be adjusted and joined manually to obtain the complete chloroplast genome.
4. The chloroplast genome sequence method of claim 3, characterized in that in step (1) the sample does not require chloroplast isolation; the whole genome can be sequenced directly to obtain the complete map.
5. The chloroplast genome sequence method of claim 3, characterized in that in step (2) the chloroplast reference genome is the complete chloroplast genome of a closely related species.
6. The chloroplast genome sequence method of claim 3, characterized in that in step (3) all odd numbers from 21 to 127 are tested as k-mer values.
7. The chloroplast genome sequence method of claim 3, characterized in that in step (4) the overlap relationship means that the last several bases of one contig are identical or nearly identical to the first several bases of the next contig, so that the two contigs can be joined.
8. The Chrysosplenium chromosome preparation method of claim 3, characterized in that in step (6) the original data are used as reference sequences to extend the scaffolds.
9. The Chrysosplenium chromosome preparation method of claim 3, characterized in that the steps are:
1) Estimate the sample size and sequence
The genome size of Chrysosplenium is roughly estimated by flow cytometry. The genome of Chrysosplenium sinicum is then extracted and sequenced on the Illumina HiSeq PE150 platform, yielding 2.02G of data. Adapters are removed from the RawData and quality control is applied to obtain the CleanData;
2) Align and extract
Bergenia purpurascens Engl. (NC_036061.1 Bergenia scopulosa chloroplast, complete genome), a species closely related to Chrysosplenium sinicum, is chosen as the chloroplast reference genome. The CleanData are aligned to the Bergenia purpurascens reference with the bwa software, and the aligned data are then extracted with samtools; at this point the chloroplast data have essentially been separated from the whole-genome data;
3) Choose the most suitable k-mer value
The extracted file ends up in bam format; the bam2fastq software converts it into two fastq files. The kmergenie software is used for k-mer analysis over 21-127mer. Because the chloroplast genome contains inverted repeats, the k-mer plot should show two peaks, the first smaller and the second larger, with the first peak about half the size of the second. From this analysis, the best k-mer value for assembly is chosen as 81;
4) Assemble
Assembly is performed with the ABySS algorithm. Result: number of contigs 49, minimum contig length 81, maximum contig length 20389, contig N50 7702;
5) Join using overlaps
Contigs that overlap one another are joined using the Sequencher 5.4.6 software to obtain longer scaffolds;
6) Further scaffolding
Using the original CleanData, the scaffolds are extended further with the SSPACE software. The final number of scaffolds is 19; the shortest scaffold is 192 bp, the longest is 53813 bp, and the scaffold N50 is 38067 bp;
7) Locate the IR regions and join
The final scaffolds are mapped directly onto the Bergenia purpurascens chloroplast genome, which shows that only the first three scaffolds are actually used. The IR regions of Chrysosplenium sinicum are located and the scaffolds are joined to obtain the chloroplast genome. The sequence information is then analysed and a chloroplast genome structure map is constructed, as described in claim 1.
10. Use of the method of any of claims 3-8 in preparing the Chrysosplenium sinicum chloroplast genome structure map of claim 1; or use of the method of any of claims 3-8 in developing Chrysosplenium sinicum germplasm resources.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910546474.9A CN110379462B (en) | 2019-06-21 | 2019-06-21 | Method for assembling Chinese Jinyao chloroplast genome sequence based on Illumina technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110379462A (en) | 2019-10-25 |
CN110379462B (en) | 2021-11-26 |
Family
ID=68250575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910546474.9A Active CN110379462B (en) | 2019-06-21 | 2019-06-21 | Method for assembling Chinese Jinyao chloroplast genome sequence based on Illumina technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110379462B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093121A (en) * | 2012-12-28 | 2013-05-08 | 深圳先进技术研究院 | Compressed storage and construction method of two-way multi-step de Bruijn graph |
CN104951672A (en) * | 2015-06-19 | 2015-09-30 | 中国科学院计算技术研究所 | Splicing method and system of second generation and third generation genomic sequencing data combination |
WO2019000254A1 (en) * | 2017-06-28 | 2019-01-03 | 中国医学科学院药用植物研究所 | Method for quality control of chinese patent medicine based on metagenome |
CN108897986A (en) * | 2018-05-29 | 2018-11-27 | 中南大学 | A genome sequence joining method based on protein information |
CN109411014A (en) * | 2018-10-09 | 2019-03-01 | 中国科学院昆明植物研究所 | A circularization method for plant chloroplast whole-genome assembly based on second-generation sequencing |
Non-Patent Citations (1)
Title |
---|
Cai Jie: "Chloroplast phylogenomics of the genus Tilia L. (Malvaceae)", China Doctoral Dissertations Full-text Database, Basic Sciences * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112259169A (en) * | 2020-11-18 | 2021-01-22 | 东北农业大学 | Method for rapidly acquiring chloroplast genome from transcriptome data |
CN112259169B (en) * | 2020-11-18 | 2024-01-30 | 东北农业大学 | Method for rapidly obtaining chloroplast genome from transcriptome data |
CN118298914A (en) * | 2024-01-25 | 2024-07-05 | 南京林业大学 | Method for assembling plant cell organelle pan structure and counting different image frequency |
Also Published As
Publication number | Publication date |
---|---|
CN110379462B (en) | 2021-11-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Zeng et al. | Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim | |
Ang et al. | Proteogenomics: from next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine | |
Gai et al. | Transcriptome analysis of tree peony during chilling requirement fulfillment: assembling, annotation and markers discovering | |
CN110117653A (en) | The detection method and kit of the mutation rate in lung cancer mutational site | |
CN110379462A (en) | Method for assembling Chinese Jinyao chloroplast genome sequence based on Illumina technology | |
Zhang et al. | De novo characterization of Panax japonicus CA Mey transcriptome and genes related to triterpenoid saponin biosynthesis | |
US11398294B2 (en) | Method for controlling the quality of traditional Chinese patent medicines based on metagenomics | |
Zhang et al. | Complete chloroplast genomes of Leptodermis scabrida complex: Comparative genomic analyses and phylogenetic relationships | |
CN108796075A (en) | Detect application and the kit of circRNF13 and LOC284454 reagents | |
CN108315393A (en) | The quantitatively method of detection dissociative DNA, application and the kit for detecting dissociative DNA | |
Jia et al. | A chromosome-level reference genome of Chinese balloon flower (Platycodon grandiflorus) | |
Huang et al. | De novo transcriptome analysis of a medicinal fungi Phellinus linteus and identification of SSR markers | |
Yang et al. | Functional genome of medicinal plants | |
CN104293963B (en) | A kind of method of 145 SNP judgment of Wuzhishan minipig inbred line of application | |
Han et al. | Transcriptomic landscape of Dendrobium huoshanense and its genes related to polysaccharide biosynthesis | |
CN104293889B (en) | The screening technique of the microRNA that male and female flowering organs differences is expressed in dioecian plant | |
CN112972459A (en) | Application of radix astragali monomer in liver protection | |
Zhao et al. | Transcriptome profiling and digital gene expression analysis of Fallopia multiflora to discover putative genes involved in the biosynthesis of 2, 3, 5, 4′-tetrahydroxy stilbene-2-O-β-d-glucoside | |
Wang et al. | De novo characterization of the root transcriptome and development of EST-SSR markers in Paris polyphylla Smith var. yunnanensis, an endangered medical plant | |
CN107177676A (en) | Long-chain non-coding RNA NONHSAT113026 is used for the purposes of Diagnosis of Renal Cell Carcinoma molecular marker | |
Yang et al. | Genome-wide survey and genetic characteristics of Ophichthus evermanni based on Illumina sequencing platform | |
Wang et al. | Genome-wide transcriptional excavation of Dipsacus asperoides unmasked both cryptic asperosaponin biosynthetic genes and SSR markers | |
CN116875721A (en) | Application of cfDNA of cryptococcus in diagnosis of cryptococcus infection | |
CN108660213A (en) | The application of three kinds of non-coding RNA reagents of detection and kit | |
Yang et al. | The Houttuynia cordata genome provides insights into the regulatory mechanism of flavonoid biosynthesis in Yuxingcao |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||