CN112349350B - Method for strain identification based on Dunaliella core genome sequence - Google Patents


Info

Publication number
CN112349350B
CN112349350B CN202011238521.2A CN202011238521A CN112349350B CN 112349350 B CN112349350 B CN 112349350B CN 202011238521 A CN202011238521 A CN 202011238521A CN 112349350 B CN112349350 B CN 112349350B
Authority
CN
China
Prior art keywords
dunaliella
genome
strain
sequencing
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011238521.2A
Other languages
Chinese (zh)
Other versions
CN112349350A (en
Inventor
高帆
宋韡
南芳茹
冯佳
谢树莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Aixin Biotechnology Co.,Ltd.
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University
Priority to CN202011238521.2A
Publication of CN112349350A
Application granted
Publication of CN112349350B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Abstract

The invention belongs to the technical field of plant molecular identification, and particularly relates to a method for strain identification based on a Dunaliella core genome sequence. The method mainly comprises the following steps: collecting, purifying and culturing samples; extracting whole genome DNA; constructing DNA sequencing libraries; obtaining whole genome sequencing data of the algal strain to be tested and of Dunaliella quartolecta (D. quartolecta); screening and de novo assembling the core genome sequencing fragments of D. quartolecta, and performing gene component analysis, protein function annotation and genome contig collinearity analysis on the assembled core genome sequence; and constructing a phylogenetic tree from single nucleotide polymorphisms. When the algal strain to be tested clusters with D. quartolecta with a branch support of 0.99-1.00 and a genetic similarity percentage of 99% or more, the algal strain to be tested is identified as D. quartolecta.

Description

Method for strain identification based on Dunaliella core genome sequence
Technical Field
The invention belongs to the technical field of plant molecular identification, and particularly relates to a method for strain identification based on a Dunaliella core genome sequence.
Background
Dunaliella quartolecta (D. quartolecta) is a eukaryotic unicellular microalga living in oceans, salt lakes and other extreme environments. It belongs to Chlorophyta, Chlorophyceae, Volvocales, genus Dunaliella; it has strong stress resistance, lacks a cell wall, contains a chromatophore and a pyrenoid, and bears flagella at the apex of the cell. D. quartolecta is rich in bioactive substances such as glycerol, beta-carotene and algal polysaccharides, and is a characteristic economic microalga. Using characteristic D. quartolecta strains as bioreactors to extract active substances for industrial production has important application prospects in food processing, medical care, biodiesel and other fields. However, the 23 Dunaliella species identified so far at home and abroad have similar morphology and broad-spectrum salt tolerance, so D. quartolecta is difficult to identify on morphological grounds. Although DNA markers, gene markers and protein markers have improved the efficiency of algal strain identification, accuracy is still limited by factors such as the marker technique, the conservation of the fragments, and the lack of universality of the amplification or experimental procedures. Conventional molecular identification of closely related algal strains usually suffers from few candidate amplification fragments, poor specificity of universal markers, long development cycles for new markers and specific primers, and the need to optimize PCR (polymerase chain reaction) amplification procedures, and the identification results often include false positives. Because D. quartolecta is an important characteristic strain with high added value within the genus Dunaliella, molecular identification of D. quartolecta resources is critical.
Therefore, there is a need to develop a more accurate, rapid and universal method for the molecular identification of D. quartolecta within Dunaliella.
The rapid development of next-generation DNA sequencing technologies has made molecular identification at the whole-genome level possible. Compared with traditional molecular identification, identification at the whole-genome level draws on more genetic information, covers a wider detection range, discriminates related species more effectively, and yields richer genetic variation information. Whole genome sequencing data for many model species have now been published. Although reference genome sequencing data of Dunaliella salina (D. salina) were published in 2017 (Dunsal1 v.2), no whole genome sequencing of D. quartolecta, another typical Dunaliella species, has been reported. The currently popular combination of second-generation and third-generation sequencing can recover the complete genetic information of a species, but it still has the following defects: (1) all sequencing fragments must be fully aligned, so the run time is long and the data output is huge, which consumes a large amount of computer time and resources and hinders timely molecular identification; (2) genome assembly and bioinformatic analysis depend heavily on the second-generation and third-generation high-throughput platforms of domestic and foreign sequencing companies, such as Illumina, Nanopore and PacBio, and are also limited by the genome size of the species and the computing capability of the platform, so the turnaround is long and the cost is high, often beyond the means of an ordinary laboratory; (3) molecular identification of related species depends heavily on the quality of their whole-genome re-sequencing, which in turn is closely related to the genome quality of the reference species: if the reference genome has insufficient sequencing depth or low assembly quality, the re-sequencing result for the species to be tested is affected and the species identification is biased.
Therefore, how to provide an accurate, efficient and economical method for identifying D. quartolecta among strains to be tested is an urgent technical problem in the field.
Disclosure of Invention
The invention provides a method for strain identification based on a Dunaliella core genome sequence.
In order to achieve the purpose, the invention adopts the following technical scheme:
The method for strain identification based on the Dunaliella core genome sequence comprises the following steps:
(1) collecting, purifying and culturing samples: collecting the algal strain to be tested and Dunaliella quartolecta (D. quartolecta), purifying the algal strain to be tested, and then carrying out indoor expanded culture;
(2) extracting whole genome DNA: separately extracting the whole genome DNA of the algal strain to be tested and of D. quartolecta by an improved CTAB method, and storing it frozen;
(3) fragmenting and purifying the whole genome DNA of the algal strain to be tested and of D. quartolecta from step (2), and constructing a DNA sequencing library for each;
(4) sequencing each DNA sequencing library from step (3) by a high-throughput sequencing method to obtain second-generation whole genome sequencing data of the algal strain to be tested and of D. quartolecta;
(5) taking the Dunaliella salina whole genome data published at NCBI as the reference, aligning the D. quartolecta whole genome sequencing data obtained in step (4) to it, and obtaining the D. quartolecta core genome sequence through screening, de novo assembly and quality evaluation, wherein the size of the core genome sequence is 6592916 bp, the number of contigs is 3000, the maximum contig length is 1133322 bp, the average contig length is 2197.64 bp, the contig N50 length is 15270 bp, the proportion of complete genes is 23.65%, the proportion of single-copy genes is 15.18%, the proportion of multi-copy genes is 13.76%, the proportion of gaps/deletions is 1.89%, and the proportion of incomplete fragments is 17.45%; constructing a circular map of the de novo assembled D. quartolecta core genome; and then performing gene component analysis, protein function annotation and genome contig collinearity analysis on the D. quartolecta core genome sequence;
(6) taking the D. quartolecta core genome sequence constructed in step (5) as the reference, aligning to it the whole genome sequencing data of the algal strain to be tested obtained in step (4) and the published genome sequencing data of representative algae, detecting single nucleotide polymorphisms and insertion/deletion sites among the species, and constructing a phylogenetic tree from the single nucleotide polymorphisms, wherein when the algal strain to be tested clusters with D. quartolecta with a branch support of 0.99-1.00 and a genetic similarity percentage of 99% or more, the algal strain to be tested is D. quartolecta.
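The decision rule at the end of step (6) can be written down as a small check. The following is a minimal sketch, not the patented procedure itself: the function name and its three inputs are hypothetical, and it assumes the clustering result, branch support and genetic similarity have already been read off the phylogenetic tree.

```python
def is_d_quartolecta(clusters_with_reference: bool,
                     branch_support: float,
                     similarity_pct: float) -> bool:
    """Apply the identification criteria stated in step (6).

    clusters_with_reference: True if the test strain and the D. quartolecta
        reference fall in the same cluster of the SNP-based tree.
    branch_support: support value of that branch, on a 0-1 scale.
    similarity_pct: genetic similarity percentage, on a 0-100 scale.
    (All three parameter names are illustrative assumptions.)
    """
    return (
        clusters_with_reference
        and 0.99 <= branch_support <= 1.00
        and similarity_pct >= 99.0
    )
```

A strain that clusters with the reference but has a branch support of 0.95 would not be called D. quartolecta under this rule.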
Further, the indoor expanded culture in step (1) comprises the following specific steps: picking single clones of the algal cells of the strain to be tested under aseptic conditions and, after they pass microscopic examination, carrying out indoor expanded culture under aseptic conditions. The indoor expanded culture conditions are: a photoperiod of 18 h : 6 h, a light intensity of 19000 lx, a temperature of 23 ± 3 ℃ and a sterile ventilated environment; the culture dish is shaken every 5 days to prevent the algal cells from adhering to the walls, and 0.5-1 mL of the algal suspension is taken for microscopic examination. The algal strain to be tested is cultured in a medium of the following formula:
30 g/L NaCl, 1.5 g/L NaNO3, 1.4 g/L K2HPO4, 1.75 g/L MgSO4·7H2O, 1.36 g/L CaCl2·7H2O, 1.2 g/L Na2CO3, 0.006 g/L FeC6H5O7, 0.005 g/L NaH2PO4·2H2O, 0.5 g/L Co(NO3)2·6H2O, 0.8 g/L CuSO4·5H2O, 2.3 g/L ZnSO4·7H2O, 0.03 g/L H3BO3, 4.0 g/L Na2MoO4·2H2O, 0.02 g/L MnCl2·4H2O, 0.5 g/L VB1, 0.5 g/L VB12 and 0.5 g/L VH, made up to a constant volume of 1 L with ultrapure water.
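Because the formula is given per litre, preparing any other volume is a simple proportional scaling. The sketch below illustrates this for a subset of the components; the dictionary keys, the function name and the choice of components are illustrative assumptions, while the concentrations are copied from the formula above.

```python
# Subset of the medium formula above, in g/L (concentrations from the text;
# the ASCII key names are an assumption made for this example).
MEDIUM_G_PER_L = {
    "NaCl": 30.0,
    "NaNO3": 1.5,
    "K2HPO4": 1.4,
    "MgSO4_7H2O": 1.75,
}


def grams_for_volume(recipe_g_per_l: dict, volume_l: float) -> dict:
    """Return the mass (g) of each component needed for volume_l litres."""
    if volume_l <= 0:
        raise ValueError("volume_l must be positive")
    return {name: conc * volume_l for name, conc in recipe_g_per_l.items()}
```

For example, `grams_for_volume(MEDIUM_G_PER_L, 0.5)` gives the masses to weigh out for 500 mL of medium.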
Further, the improved CTAB method in step (2) comprises the following specific steps: taking 600-800 mg of the algae to be tested, washing with ultrapure water 2-3 times, centrifuging at 4 ℃ and 8000 r/min for 1.5 min, adding liquid nitrogen and grinding for 15 sec, adding 800 μL of 2% W/V CTAB solution preheated at 20 ℃ and 1 μL of 1% V/V beta-mercaptoethanol, mixing uniformly, incubating in a 60 ℃ water bath for 1.5 h with shaking once every 20 min, adding 800 μL of Tris-saturated phenol, centrifuging at 4 ℃ and 12000 r/min for 2.5 min, taking the supernatant, adding a mixture of Tris-saturated phenol, chloroform and isoamyl alcohol in a volume ratio of 25:24:2, vortexing, standing at 4 ℃ for 10 min, mixing 2-3 times, adding 800 μL of ddH2O treated with 0.1% V/V DEPC, incubating in a 60 ℃ water bath for 30 min, centrifuging at 4 ℃ and 12000 r/min for 4 min, taking the supernatant, adding 150 mL of 3 mol/L sodium acetate and 250 mL of absolute ethanol precooled to 4-5 ℃, precipitating at -20 ℃ for 50 min, centrifuging at 4 ℃ and 10000 r/min for 3 min, discarding the supernatant, adding 1 mL of 70% V/V ethanol solution precooled to 4-5 ℃, vortexing for 20 sec, discarding the supernatant, evaporating the remaining liquid in a nucleic acid vacuum drying system, and adding 100 xTE buffer solution to dissolve the precipitate. The genomic DNA is checked by 1% W/V agarose gel electrophoresis combined with a fluorescence quantifier to ensure a DNA concentration of 150 ng/μL or more, a bright electrophoresis band without degradation, an OD260/OD280 of 1.8 to 1.9, and no contamination.
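The two numeric acceptance criteria at the end of the extraction protocol (concentration and OD260/OD280 ratio) lend themselves to an automated check. This is a minimal sketch under the assumption that the concentration and absorbance readings are already available as numbers; the function name is hypothetical, and the visual criteria (bright band, no degradation, no contamination) are not covered.

```python
def dna_passes_qc(conc_ng_per_ul: float, od260: float, od280: float) -> bool:
    """Check the numeric DNA quality criteria from the improved CTAB method.

    Passes when the concentration is at least 150 ng/uL and the
    OD260/OD280 ratio lies between 1.8 and 1.9 (inclusive).
    """
    if od280 <= 0:
        # A non-positive OD280 reading cannot give a meaningful ratio.
        return False
    ratio = od260 / od280
    return conc_ng_per_ul >= 150.0 and 1.8 <= ratio <= 1.9
```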
Further, the specific steps of constructing the DNA sequencing library in step (3) are as follows: fragmenting the whole genome DNA with high-intensity ultrasound at 80-100 W for 6 sec per pulse, with a 3 sec interval between pulses and 5 pulses in total, with the fragmentation parameters set to 300-400 bp; running the fragments on an agarose gel and recovering the 300-400 bp target fragments from the gel; adsorbing and recovering the target fragments with silica-based magnetic beads, and checking the quality of the recovered target fragments with a fluorescence quantifier; repairing the DNA ends and adding an A at the 3' end; adding adapters for the ligation reaction, and purifying, converting and PCR-verifying the ligation product; and denaturing the positive product at 95 ℃ for 20 sec, carrying out the single-stranded DNA circularization reaction, and purifying the product to obtain a whole genome DNA sequencing library ready for loading on the sequencer.
Further, the specific steps of obtaining the D. quartolecta core genome sequence through screening, assembly and quality evaluation in step (5) are as follows: screening high-quality sequences from the sequencing platform, and taking fragments with a sequencing depth of 50-80x, an average length of 12-15K and an N50 length greater than 18K as query sequences; mapping the query sequences back to the reported Dunaliella salina reference genome (Dunsal1 v.2) with SOAPaligner or BWA software, and further screening the sequencing fragments with a sequence identity of 90% or more and an alignment E value of less than 1E-10 as candidate data for the D. quartolecta core genome sequence; aligning all remaining sequencing fragments to the candidate data set to obtain the overlapping regions between the aligned data; performing error correction on the alignment result with Falcon or Pilon software, and assembling contigs with SOAPdenovo 2.04, Mecat, HERA or Canu software; determining the order of the contigs with BySS 2.2.3, Velvet 1.2.10 or ABySS 2.2.3 software; calculating whole genome coverage with BAMStats or GATK DepthOfCoverage software, and screening core sequences with a reference genome coverage of not less than 50% and not less than 2000 consecutively arranged contigs; evaluating the assembly quality of the screened contigs with BUSCO 2.0 or QUAST software, and selecting as the D. quartolecta core genome sequence an assembly with a complete gene ratio of 20% or more, a single-copy gene ratio of 15%, a multi-copy gene ratio of 12% or more and a deletion/gap ratio of 3% or less; and constructing the circular map of the core genome of this species with Circos software.
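Two of the quantities used in this step are easy to make concrete: the screening thresholds applied to each alignment hit, and the contig N50 reported for the assembly. The sketch below is illustrative only; it assumes alignment identity and E value have already been parsed from the aligner output, uses the common definition of N50 (the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total), and both function names are hypothetical.

```python
def keep_hit(identity_pct: float, e_value: float) -> bool:
    """Screening rule from the text: identity >= 90% and E value < 1E-10."""
    return identity_pct >= 90.0 and e_value < 1e-10


def n50(contig_lengths: list) -> int:
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# The average contig length quoted in step (5) follows from size / count.
mean_contig_length = round(6592916 / 3000, 2)
```

The last line reproduces the 2197.64 bp average contig length stated in step (5) from the reported assembly size and contig count.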
Further, the gene component analysis, protein function annotation and genome contig collinearity analysis of the D. quartolecta core genome sequence in step (5) comprise the following specific steps: predicting CDS in the assembly data with Augustus 3.3.3, ESTScan 3.0.1, TransDecoder 2.0.1 or Prodigal 2.6.1 software; analyzing repeated sequences in the assembly data with RepeatMasker 4.0.9, RepeatProteinMask 3.2.2, LTR_FINDER, PILER 1.0.6 or RepeatScout 1.0.5 software; aligning the protein sequences encoded by the CDS to the NR database with Diamond 0.9.14 or BLASTX software and annotating their functions; and, after aligning the predicted protein sequences with BLASTSc, MCScanX, LAST, Mugsy, Spines or progressiveMauve software, carrying out the collinearity analysis of the genome.
Further, the specific steps of constructing the phylogenetic tree from single nucleotide polymorphisms in step (6) are as follows: aligning the genome data of the algal strain to be tested and of 5-6 representative algae reported in the NCBI database to the D. quartolecta core genome sequence assembled in step (5) with LASTZ 1.02.00 or Mauve 2.3.1 software; extracting, from the resulting collinear blocks, the genotype of each species corresponding to the D. quartolecta core genome; merging, extracting and filtering the genotype information of all species with the D. quartolecta core genome as the template; and detecting the single nucleotide polymorphism data and insertion/deletion site data with BWA 0.7.17 software. Based on the single nucleotide polymorphism data, a phylogenetic tree is constructed with the maximum likelihood algorithm in EasySpeciesTree 1.0, MEGA 5.0, TreeBeST 1.9.2, PHYLIP, Puzzle 5.2 or PHYLO_WIN software, and the genetic relationship between the algal strain to be tested and D. quartolecta is then determined.
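The text does not state how the genetic similarity percentage is computed. One simple possibility, shown here purely as an assumption and not as the method used by the software packages listed above, is the share of identical genotypes among the SNP sites at which both strains have a call. The function name and the use of "N" for a missing genotype are likewise hypothetical.

```python
def genetic_similarity(genotypes_a: str, genotypes_b: str,
                       missing: str = "N") -> float:
    """Percentage of identical genotypes at sites called in both strains.

    genotypes_a / genotypes_b: one character per SNP site, in the same
    site order; `missing` marks a site without a genotype call.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype strings must cover the same sites")
    compared = 0
    matches = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a == missing or b == missing:
            continue  # skip sites missing in either strain
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Under this definition, a test strain would need to match the reference at 99 of every 100 jointly called sites to meet the similarity threshold in step (6).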
Further, the missing rate allowed in the filtering is not higher than 20%.
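A filter of this kind can be sketched as follows, assuming the merged genotype information is held as one string per SNP site with one character per species and "N" marking a missing call. The data layout, the function name and the missing-call symbol are assumptions made for illustration.

```python
def filter_sites(site_columns: list, max_missing: float = 0.20,
                 missing: str = "N") -> list:
    """Keep SNP sites whose fraction of missing genotypes is <= max_missing.

    site_columns: list of strings; each string holds the genotype of every
    species at one site (one character per species).
    """
    kept = []
    for column in site_columns:
        if not column:
            continue  # ignore empty columns
        missing_rate = column.count(missing) / len(column)
        if missing_rate <= max_missing:
            kept.append(column)
    return kept
```

With five species, a site with one missing genotype (20%) is retained and a site with two missing genotypes (40%) is dropped.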
The method provided by the invention does not depend entirely on the known Dunaliella whole-genome sequencing results. It sequences the genome of D. quartolecta, a related strain for which no genome sequencing data have been published, and uses an optimized data alignment method and sequence assembly strategy to avoid the drawbacks of conventional genome sequencing, such as long run times, heavy dependence on advanced sequencing platforms and high cost. After obtaining second-generation sequencing data from a domestic sequencing company, an operator can process, assemble and analyze the sequencing data according to the core genome sequence and the program commands constructed by the invention. Each step can be carried out with a wide choice of software, the program settings in the examples are rigorous, and the computer operations are easy to perform, so the method has wide application prospects in Dunaliella strain molecular identification, variation detection, phylogenetic analysis and the like.
Given that no whole genome sequencing data of D. quartolecta have been published at home or abroad, the invention constructs the D. quartolecta core genome assembly sequence for the first time. The sequence contains the most abundant genetic information currently available for D. quartolecta, with comparatively high assembly quality, and provides theoretical and informational support for the directed genetic improvement and industrial application of algal strains with D. quartolecta as the reference.
Compared with the prior art, the invention has the following advantages:
1. The invention constructs the D. quartolecta core genome sequence for the first time by combining second-generation sequencing with de novo genome assembly. The sequence contains the most abundant genetic information currently available for D. quartolecta, with comparatively high assembly quality, and fills the gap in genome information for this species.
2. The D. quartolecta core genome sequence constructed by the invention can be applied to the molecular identification of this algal strain. It greatly improves the efficiency of accurate Dunaliella strain identification and can also serve as a theoretical and technical basis for phylogenetic and evolutionary research on, and identification of, Dunaliella at home and abroad.
3. Compared with the published Dunaliella salina (D. salina) whole genome sequence, the D. quartolecta core genome constructed by the invention has a smaller data volume. Using it as the reference sequence for analyzing the genome sequencing data of the strain to be tested greatly shortens the alignment time and improves the efficiency of obtaining valid single nucleotide polymorphism (SNP) data for that strain. It is an important reference for genome-level analysis of genetic variation among Dunaliella-related strains and provides a rich data basis for systematic research on the origin and evolution of lower algae, particularly green algae.
4. With the D. quartolecta core genome sequence constructed by the invention as the reference, researchers can set up experimental and control groups to suit their experimental purposes, or compare the algal strain with its closely related strains to mine differential or characteristic genes. This lays a foundation for improving and studying the quality of the algal strain at the molecular level and for promoting its industrial application.
5. The methods of the invention for indoor expanded culture of D. quartolecta and the algal strains to be tested, the improved CTAB method, the screening of core genome sequencing data and the de novo assembly of sequencing fragments can be widely applied to algae, particularly to the artificial culture of green algae, high-quality whole genome DNA extraction, and optimized processing of genome sequencing data. Compared with conventional methods they have a shorter experimental cycle, higher efficiency and easier operation, and together form a reproducible set of technical methods.
Drawings
FIG. 1 is a circular map of the de novo assembled D. quartolecta core genome; the outermost layer of the map is the nucleotide sequence size coordinate (unit: Mbp), inside it are the de novo assembled fragments arranged by sequence identity (relative to the reference genome Dunsal1 v.2), the lines inside the genome fragments represent the gene loci of each type, the innermost layer is the corresponding contig sequencing abundance map, and the center of the circular map gives the basic information of the core genome of the alga;
FIG. 2 shows the morphological observation of the algal strain to be identified (tentatively named Dunaliella sp.) after 30 days of indoor expanded culture, wherein the upper part is the macroscopic view, the lower part is the microscopic view (scale bar: 50 μm), and samples No. 1-4 of the alga are arranged from left to right;
FIG. 3 is a schematic diagram of the 1% agarose gel electrophoresis detection of the whole genome DNA of the samples to be identified; M1 and M2 represent DNA ladders;
FIG. 4 is a collinearity analysis scatter plot between the D. quartolecta core genome and the genome sequencing fragments of the strain to be identified; the dots in the plot represent collinear blocks between the genomes of the two species, and A and B mark the 2 densely distributed collinear regions between D. quartolecta and the genome of the strain to be identified;
FIG. 5 is a phylogenetic tree of 7 different algae constructed from single nucleotide polymorphism (SNP) data; the tree construction algorithm is the maximum likelihood method, the bootstrap value is set to 1000, and the data at each branch node represent the support rate and the genetic similarity percentage respectively;
FIG. 6 is a collinearity analysis circle plot within the core genome of the identified Dunaliella strain Dq_SX; the connecting lines between segments inside the circle represent possible duplication events during the evolution of the species' genome, and the numbers on the circle are core genome contig numbers;
FIG. 7 is a frequency distribution histogram of the Ka/Ks values of the identified Dunaliella strain Dq_SX, where the data on the histogram are the frequency values in the different intervals, Ka is the nucleotide non-synonymous substitution rate, and Ks is the nucleotide synonymous substitution rate;
FIG. 8 is a histogram of the protein COG (orthologous protein database) annotation statistics for the core genome of the identified Dunaliella strain Dq_SX; the histogram covers the top 20 functional categories of the homologous protein annotation information;
FIG. 9 is a transmembrane domain prediction diagram of a transcription regulatory factor in the identified Dunaliella strain Dq_SX, in which the different lines represent the transmembrane region, the intramembrane region and the extramembrane region respectively, the vertical axis is the predicted probability for the region, and the horizontal axis is the amino acid position;
FIG. 10 is a signal peptide structure prediction diagram of a transcription regulatory factor in the identified Dunaliella strain Dq_SX, wherein C-score, S-score and Y-score represent the cleavage site score, the signal peptide score and the combined score respectively;
FIG. 11 is a Venn diagram of the metabolic pathways of D. quartolecta and Dq_SX; the intersection is the set of metabolic pathways shared by the two algal strains, and the metabolic pathway prediction for both strains is based on the KEGG database, i.e. the Kyoto Encyclopedia of Genes and Genomes;
FIG. 12 is an enrichment bubble plot of the top 20 metabolic pathways unique to D. quartolecta; the metabolic pathway information comes from the KEGG database (Kyoto Encyclopedia of Genes and Genomes), a larger bubble represents more genes involved in the pathway, a darker bubble represents a higher confidence for the pathway (a lower Q value), and the degree of enrichment (significance) is expressed as the enrichment ratio, which is the number of genes divided by the total number of genes annotated by KEGG pathways;
FIG. 13 is an enrichment bubble plot of the top 20 metabolic pathways unique to the identified strain Dq_SX; the metabolic pathway information comes from the KEGG database (Kyoto Encyclopedia of Genes and Genomes), a larger bubble represents more genes involved in the pathway, a darker bubble represents a higher confidence for the pathway (a lower Q value), and the degree of enrichment (significance) is expressed as the enrichment ratio, which is the number of genes divided by the total number of genes annotated by KEGG pathways;
FIG. 14 is a GO enrichment analysis histogram of the top 20 significantly enriched metabolic pathways of D. quartolecta; GO is the database established by the Gene Ontology Consortium, and the more GO entries and the higher the corresponding -log10(Q value) (the higher the confidence), the greater the involvement of the genes in the biological function;
FIG. 15 is a GO enrichment analysis histogram of the top 20 significantly enriched metabolic pathways of the identified strain Dq_SX; GO is the database established by the Gene Ontology Consortium, and the more GO entries and the higher the corresponding -log10(Q value) (the higher the confidence), the greater the involvement of the genes in the biological function;
FIG. 16 is a phylogenetic tree constructed from the ITS genes of 21 Dunaliella; the tree construction algorithm is the maximum likelihood method, the bootstrap value is set to 1000, and the data at the branch nodes represent the support rate and the genetic similarity percentage respectively;
FIG. 17 is a phylogenetic tree constructed from the SSR markers of 21 Dunaliella; the tree construction algorithm is the maximum likelihood method, the bootstrap value is set to 1000, and the data at the branch nodes represent the support rate and the genetic similarity percentage respectively;
FIG. 18 is a phylogenetic tree constructed from the genome SNPs of 21 Dunaliella; the tree construction algorithm is the maximum likelihood method, the bootstrap value is set to 1000, and the data at the branch nodes represent the support rate and the genetic similarity percentage respectively.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A method for whole genome sequencing of Dunaliella D.quartolecta and de novo assembly of core genome sequence fragments thereof comprises the following steps:
Step 1, picking single clones of algal cells of Dunaliella D. quartolecta under aseptic conditions and, after they pass microscopic examination, performing indoor expanded culture under aseptic conditions, wherein the indoor expanded culture conditions are as follows: photoperiod 18 h : 6 h, light intensity 19000 lx, temperature 23 ± 3 °C, in a sterile ventilated environment; the culture dish is shaken every 5 days to prevent the algal cells from adhering to the wall, and 0.5-1 mL of algal solution is taken for microscopic examination; the following culture medium is prepared for the indoor expanded culture of the algal strain, with the formula:
30 g/L NaCl, 1.5 g/L NaNO3, 1.4 g/L K2HPO4, 1.75 g/L MgSO4·7H2O, 1.36 g/L CaCl2·7H2O, 1.2 g/L Na2CO3, 0.006 g/L FeC6H5O7, 0.005 g/L NaH2PO4·2H2O, 0.5 g/L Co(NO3)2·6H2O, 0.8 g/L CuSO4·5H2O, 2.3 g/L ZnSO4·7H2O, 0.03 g/L H3BO3, 4.0 g/L Na2MoO4·2H2O, 0.02 g/L MnCl2·4H2O, 0.5 g/L VB1, 0.5 g/L VB12 and 0.5 g/L VH, made up to 1 L with ultrapure water;
Step 2, extracting the whole genome DNA of Dunaliella D. quartolecta by the improved CTAB method of the invention, ensuring that the DNA concentration is not lower than 150 ng/μL, that OD260/OD280 is between 1.8 and 1.9, and that the DNA is free of protein, salt ion and RNA contamination; the specific procedure is as follows: taking 600-800 mg of algal cells from the indoor expanded culture, centrifuging at 8000 r/min at 4 °C for 1.5 min, adding liquid nitrogen and grinding for 15 sec, adding 800 μL of 2% (W/V) CTAB solution preheated at 20 °C and 1 μL of 1% (V/V) β-mercaptoethanol, mixing uniformly, incubating in a 60 °C water bath for 1.5 h with shaking once every 20 min, adding 800 μL of Tris-saturated phenol, mixing uniformly, centrifuging at 12000 r/min at 4 °C for 2.5 min, taking the supernatant, adding a mixture of Tris-saturated phenol, chloroform and isoamyl alcohol at a volume ratio of 25:24:2, vortexing, standing at 4 °C for 10 min, mixing 2-3 times, adding 800 μL of ddH2O treated with 0.1% (V/V) DEPC, incubating in a 60 °C water bath for 30 min, centrifuging at 12000 r/min at 4 °C for 4 min, taking the supernatant, adding 150 mL of 3 mol/L sodium acetate and 250 mL of absolute ethanol at 4-5 °C, precipitating at -20 °C for 50 min, centrifuging at 10000 r/min at 4 °C for 3 min, discarding the supernatant, adding 1 mL of 70% (V/V) ethanol precooled to 4-5 °C, vortexing for 20 sec, removing the supernatant, evaporating the residual liquid in a nucleic acid vacuum drying system, and adding an appropriate amount of 100× TE buffer (10 mmol/L Tris-HCl, 1 mmol/L EDTA) to dissolve the precipitate;
Step 3, fragmenting the whole genome DNA 5 times at high energy (80-100 W) with a non-contact ultrasonic disruptor (6 sec/time, On/6 s Off, once every 3 sec) to obtain short DNA fragments meeting the length requirement (300-400 bp);
Step 4, subjecting the DNA fragments to recovery from a 1.5% TBE agarose gel and to magnetic bead purification and size selection (Agencourt AMPure XP beads are used in the invention), further screening to obtain a sample of 300-400 bp, and checking the sample quality to ensure that the genomic DNA meets the quality standard of step (1);
Step 5, end-repairing the qualified DNA sample with T4 DNA polymerase and Klenow polymerase to produce blunt ends, and adding an A base to the 3' end; preparing a ligation reaction system: 1 μL of T4 DNA ligase, 1 μL of T vector, 5 μL of 1× ligation reaction buffer, 5 μL of adapter (10 μmol/L) and 5 μL of DNA sample, made up to 20 μL with sterile water; incubating in a 16 °C water bath overnight to obtain the ligation product, and purifying the product according to the instructions of the Agencourt AMPure XP kit; after competent cell transformation and blue-white screening, subjecting the purified product to bacterial-liquid PCR verification and sequencing (this step can be performed by a sequencing company), selecting the positive clones, and checking the amplification products with an Agilent 2100 Bioanalyzer; denaturing the positive amplification product at 96 °C for 30 sec, then preparing a DNA circularization amplification system: 2 μL of DNA sample, 4 μL of 5× Rapid ligation buffer and 1 μL of ligase, made up to 20 μL with double distilled water; incubating the system in a 25 °C water bath for 15 min, then adding linear DNA digestion enzyme and digesting for 10 min to obtain the DNA sequencing library; measuring the library concentration with an Agilent SureSelect QXT WGS instrument, ensuring that the library concentration does not exceed 2 nmol/L and the volume is not less than 12 μL;
Step 6, performing gradient PCR on the sequencing library obtained in step 5, with the following amplification system: μL of the library sample to be tested, 1 μL of each primer of the pair (a second-generation sequencing adapter primer kit may optionally be used), 0.5 μL of DNA polymerase, 2.5 μL of dNTPs, 1.5 μL of MgCl2 and 2.5 μL of buffer, made up to 25 μL with ddH2O; the PCR amplification program is: 96 °C for 3 min; 40 cycles of 96 °C for 30 sec, annealing with the temperature decreasing by 1 °C every 0.5 sec down to 56 °C, and 72 °C for 45 sec; 72 °C for 8 min; storage at 4 °C; the amplified fragments are subjected to high-throughput sequencing by combinatorial probe-anchor synthesis (cPAS), a step performed by a sequencing company with the relevant technical qualifications;
Step 7, filtering the raw sequencing data of Dunaliella D. quartolecta obtained in step 6: low-quality sequencing data (short sequences less than 5 kb in length, sequences with an average quality below 8, and adapter sequences) are removed with NGS QC Toolkit 2.3.3, the resulting high-quality sequencing data are saved in FASTQ format under the file name Dq.fq, and core fragment screening and assembly are then performed on the D. quartolecta whole genome sequencing data (Dq.fq).
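The filtering rule of step 7 can be illustrated with a minimal Python sketch. It is not the NGS QC Toolkit itself: the function names, the Phred+33 quality encoding and the adapter string are assumptions made for illustration, while the default thresholds (length below 5 kb, average quality below 8) are the ones stated in the step.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(seq: str, qual: str, min_len: int = 5000,
              min_mean_q: float = 8.0, adapter: str = "") -> bool:
    """Apply the three rejection rules named in step 7."""
    if len(seq) < min_len:              # short sequence
        return False
    if mean_phred(qual) < min_mean_q:   # low average quality
        return False
    if adapter and adapter in seq:      # adapter sequence (hypothetical string)
        return False
    return True


def filter_fastq(lines: List[str], **thresholds) -> List[Record]:
    """Return the records that survive all filters."""
    return [rec for rec in parse_fastq(lines)
            if keep_read(rec[1], rec[2], **thresholds)]
```

For example, `filter_fastq(lines, min_len=5)` on a toy file keeps only reads of at least 5 bases with a mean Phred score of 8 or more; in practice the thresholds would be passed exactly as configured for the real filtering tool.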
Step 8, the core genome fragments are screened and assembled as follows: a sequencing data set with a sequencing depth of 50-80X, an average length of 12-15 kb and an N50 length greater than 18 kb is selected from the D. quartolecta sequencing data (Dq.fq) and mapped back to the Dunaliella salina (D. salina) reference genome Dunsal1 v.2; the mapping result is quality-controlled with Picard software, with the alignment rate set to ≥ 90% and the alignment parameter set to 1e-10, and the sequences meeting these conditions are taken as candidate data for the D. quartolecta genome core sequence; the remaining D. quartolecta genome sequencing data are aligned (BLASTn) against the core sequence candidate data with Burrows-Wheeler Aligner (BWA) software, with the alignment parameter set to 1e-8; errors are corrected with Falcon software, the overlapping regions among the aligned data are obtained, and contigs are assembled with SOAPdenovo 2.04 software, the program commands being set as follows:
# maximal read length
max_rd_len=100
[LIB]
# average insert size
avg_ins=300
# if sequence needs to be reversed
reverse_seq=0
# in which part(s) the reads are used
asm_flags=3
# use only first 100 bps of each read
rd_len_cutoff=100
# in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
# minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
# a pair of fastq file, read 1 file should always be followed by read 2 file
q1=/vol3/agis/xiaoyutao_group/wangjingchun/yanzao/Dq_1.fq
# SOAPdenovo-63mer all -s config.txt -p 10 -K 55 -M 3 -F -u -o
# SOAPdenovo-63mer all -s config.txt -p 40 -K 27 -D 1 -N 500m -o ./result/MDCZ_27 > MDCZ_27.log
SOAPdenovo-63mer all -s /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/soapdenovo/config.txt -p 10 -K 55 -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/soapdenovo/test
qsub -l nodes=1 -q queue8 ./soap.sh
Step 9, reassembling the contigs with ABySS 2.2.3 software, the program commands being set as follows:
conda install -c conda-forge -c bioconda -c defaults ABySS
ABYSS -k 31 -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/ABySS/31_contigs.fa /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/Dq.fq
qsub -l nodes=1 -q queue6 ./ABySS.sh
Step 10, evaluating the quality of the Dunaliella D. quartolecta genome assembly with BUSCO 2.0 software, and selecting as the Dunaliella D. quartolecta core genome sequence the assembly in which the proportion of complete genes is ≥ 20%, the proportion of single-copy genes is 15%, the proportion of multi-copy genes is ≥ 12% and the proportion of missing genes/gaps is ≤ 3%. The program command is set as follows:
python /public/home/wangjingchun/miniconda2/bin/run_BUSCO.py -i /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/02busco/Dq_contig.fa -m geno -l /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/02busco/eukaryota_odb10 -o results_Dq
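The selection rule of step 10 can be written as a small predicate. The following Python sketch is illustrative only: the function name is hypothetical, and the single-copy criterion, which the step gives simply as 15%, is treated here as a lower bound, which is an assumption.

```python
def passes_core_criteria(complete_pct: float, single_copy_pct: float,
                         multi_copy_pct: float, missing_pct: float) -> bool:
    """Return True if BUSCO percentages satisfy the step 10 thresholds.

    Assumption: the single-copy criterion (stated as 15%) is read as >= 15%.
    """
    return (complete_pct >= 20.0
            and single_copy_pct >= 15.0
            and multi_copy_pct >= 12.0
            and missing_pct <= 3.0)


# Values reported for the D. quartolecta assembly at the end of this example.
print(passes_core_criteria(23.65, 15.18, 13.76, 1.89))
```

With the percentages reported for the assembly in this example (23.65%, 15.18%, 13.76% and 1.89%), the predicate evaluates to true under that reading.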
Step 11, predicting the functional gene CDS of the screened core genome assembly data with Augustus 3.3.3 software, the program command being set as follows:
augustus --strand=both --genemodel=partial --singlestrand=false --protein=on --introns=on --start=on --stop=on --cds=on --codingseq=on --alternatives-from-evidence=true --gff3=on --UTR=false --outfile=/vol3/agis/xiaoyutao_group/wangjingchun/yanzao/04gene/Dqaugustus/out.gff --species=volvox /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/04gene/Dq/Dq_masked.fa
Step 12, constructing a circular map of the core genome of the alga with Circos software, the program configuration being set as follows:
# circos.conf
karyotype = data/karyotype/karyotype.Dq.txt
<ideogram>
<spacing>
default = 0.005r
</spacing>
radius = 0.9r
thickness = 20p
fill = yes
</ideogram>
# The remaining content is standard and required. It is imported
# from default files in the Circos distribution.
# These should be present in every Circos configuration file and
# overridden as required. To see the content of these files,
# look in etc/ in the Circos distribution.
<image>
# Included from Circos distribution.
<<include etc/image.conf>>
</image>
# RGB/HSV color definitions, color lists, location of fonts, fill patterns.
# Included from Circos distribution.
<<include etc/colors_fonts_patterns.conf>>
# Debugging, I/O and other system parameters
# Included from Circos distribution.
<<include etc/housekeeping.conf>>
According to the genome assembly quality evaluation results, a core genome sequence can be screened from Dunaliella D. quartolecta: its size is 6592916 bp, the number of contigs is 3000, the maximum contig length is 1133322 bp, the average contig length is 2197.64 bp, the contig N50 is 15270 bp, the proportion of complete genes is 23.65%, the proportion of single-copy genes is 15.18%, the proportion of multi-copy genes is 13.76%, the proportion of gaps/deletions is 1.89%, and the predicted CDS proportion is 38.03%; the circular map of the core genome is shown in FIG. 1.
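The contig statistics quoted above follow standard definitions, which the Python sketch below makes explicit. The helper names are hypothetical and the toy contig lengths are invented for illustration; only the total size and contig count in the last line come from the text.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def mean_length(total_bp: int, n_contigs: int) -> float:
    """Average contig length, rounded to two decimals."""
    return round(total_bp / n_contigs, 2)


toy_contigs = [2, 3, 4, 5, 6]          # illustrative lengths only
print(n50(toy_contigs))                # N50 of the toy set
print(mean_length(6592916, 3000))      # total size and contig count from the text
```

The average contig length computed from the reported total size and contig count agrees with the reported 2197.64 bp. The N50 of the real assembly cannot be recomputed here because the individual contig lengths are not listed.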
Example 2
A method for strain identification using the core genome sequence of Dunaliella D. quartolecta, comprising the following steps:
Step 1, sample collection, purification and culture: collecting the algal strain to be tested (tentatively named Dunaliella sp.), purifying it, and then performing indoor expanded culture, with the following specific steps: picking single clones of algal cells of the strain to be tested under aseptic conditions and, after they pass microscopic examination, performing indoor expanded culture under aseptic conditions, wherein the indoor expanded culture conditions are as follows: photoperiod 18 h : 6 h, light intensity 19000 lx, temperature 23 ± 3 °C, in a sterile ventilated environment; the culture dish is shaken every 5 days to prevent the algal cells from adhering to the wall, and 0.5-1 mL of algal solution is taken for microscopic examination; the following culture medium is prepared for the indoor expanded culture of the strain to be tested, with the formula:
30 g/L NaCl, 1.5 g/L NaNO3, 1.4 g/L K2HPO4, 1.75 g/L MgSO4·7H2O, 1.36 g/L CaCl2·7H2O, 1.2 g/L Na2CO3, 0.006 g/L FeC6H5O7, 0.005 g/L NaH2PO4·2H2O, 0.5 g/L Co(NO3)2·6H2O, 0.8 g/L CuSO4·5H2O, 2.3 g/L ZnSO4·7H2O, 0.03 g/L H3BO3, 4.0 g/L Na2MoO4·2H2O, 0.02 g/L MnCl2·4H2O, 0.5 g/L VB1, 0.5 g/L VB12 and 0.5 g/L VH, made up to 1 L with ultrapure water; the algal strain obtained by the expanded culture is divided into 4 samples (Nos. 1 to 4).
Step 2, extracting whole genome DNA: taking algal solution of each sample at the mature stage (about 30 days) (FIG. 2), centrifuging at 4 °C for 1.5 min (8000 r/min) to enrich the algal cells, snap-freezing with liquid nitrogen and rapidly grinding for 15 sec, and extracting the whole genome DNA of each sample by the improved CTAB method, with the following procedure: adding 800 μL of 2% (W/V) CTAB solution preheated at 20 °C and 1 μL of 1% (V/V) β-mercaptoethanol to the ground powder, mixing gently, incubating in a 60 °C water bath for 1.5 h, adding 800 μL of Tris-saturated phenol, mixing gently, centrifuging at 12000 r/min at 4 °C for 2.5 min, taking the supernatant, adding a mixture of Tris-saturated phenol, chloroform and isoamyl alcohol at a volume ratio of 25:24:2, vortexing, standing at 4 °C for 10 min, mixing gently 2-3 times, adding 800 μL of ddH2O treated with 0.1% (V/V) DEPC, incubating in a 60 °C water bath for 30 min, centrifuging at 12000 r/min at 4 °C for 4 min, taking the supernatant, adding 150 mL of 3 mol/L sodium acetate and 250 mL of absolute ethanol precooled to 4-5 °C, precipitating at -20 °C for 50 min, centrifuging at 10000 r/min at 4 °C for 3 min, discarding the supernatant, adding 1 mL of 70% (V/V) ethanol precooled to 4-5 °C, vortexing for 20 sec, removing the supernatant, evaporating the residual liquid in a nucleic acid vacuum drying system, and adding 100 μL of 100× TE buffer (10 mmol/L Tris-HCl, 1 mmol/L EDTA) to dissolve the precipitate; the quality of the genomic DNA is checked by 1% (W/V) agarose gel electrophoresis combined with a fluorescence quantifier, ensuring that the DNA concentration is not lower than 150 ng/μL, that OD260/OD280 is between 1.8 and 1.9, and that the DNA is free of protein, salt ion and RNA contamination. The agarose gel electrophoresis results (FIG. 3) show that samples No. 1 and No. 4 have higher DNA concentrations and better integrity; the fluorescence quantification results (Table 1) likewise show that samples No. 1 and No. 4 have higher DNA concentrations and less contamination, making them suitable candidate samples for the subsequent library construction.
TABLE 1 Fluorescence quantification of the whole genome DNA quality of the algal samples to be identified
Sample No.    Dilution factor (X)    Sample volume (μL)    Measured concentration (ng/μL)    OD260/OD280
1             1                      1                     204.6                             1.85
2             1                      1                     152.0                             1.69
3             1                      1                     72.2                              1.62
4             1                      1                     384.1                             1.89
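The DNA quality criteria of step 2 (concentration not lower than 150 ng/μL and OD260/OD280 between 1.8 and 1.9) can be checked against the Table 1 values with a short Python sketch. The function and variable names are hypothetical; the numbers are copied from Table 1.

```python
from typing import Dict, List, Tuple

# sample number -> (concentration in ng/uL, OD260/OD280), copied from Table 1
TABLE_1: Dict[int, Tuple[float, float]] = {
    1: (204.6, 1.85),
    2: (152.0, 1.69),
    3: (72.2, 1.62),
    4: (384.1, 1.89),
}


def dna_passes(concentration: float, od_ratio: float) -> bool:
    """Apply the concentration and purity thresholds stated in step 2."""
    return concentration >= 150.0 and 1.8 <= od_ratio <= 1.9


def qualified_samples(table: Dict[int, Tuple[float, float]]) -> List[int]:
    """Return the sample numbers that meet both criteria, in ascending order."""
    return sorted(n for n, (conc, ratio) in table.items() if dna_passes(conc, ratio))


print(qualified_samples(TABLE_1))
```

Sample 2 meets the concentration threshold but not the purity ratio, and sample 3 meets neither, which is consistent with samples 1 and 4 being carried forward.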
Step 3, constructing a DNA sequencing library: taking about 2.0 μg of whole genome DNA and fragmenting it 5 times at high energy with an 80-100 W non-contact ultrasonic disruptor (6 sec/time, On/6 s Off, once every 3 sec) to obtain short DNA fragments meeting the length requirement (300-400 bp); running agarose gel electrophoresis (1% agarose gel, 150 V), staining with EB after 30 min of electrophoresis, and excising and recovering the fragments of about 300-400 bp under an ultraviolet lamp; adding 10 μL of silica-based magnetic beads with an adsorption range of 300-400 bp (Agencourt AMPure XP beads are used in the invention) to the dissolved gel recovery solution, mixing uniformly, separating on a magnetic rack, washing the separated beads 2-3 times with 150 μL of 80% ethanol, adding 15 μL of 0.1× TE, mixing, standing at room temperature for 10 min, placing the centrifuge tube on the magnetic rack, and collecting the supernatant after about 8 min. After passing fluorescence quantification, the qualified DNA sample is end-repaired with T4 DNA polymerase and Klenow polymerase to produce blunt ends, and an A base is added to the 3' end; a ligation reaction system is prepared: 1 μL of T4 DNA ligase, 1 μL of T vector, 5 μL of 1× ligation reaction buffer, 5 μL of adapter (10 μmol/L) and 5 μL of DNA sample, made up to 20 μL with sterile water.
The ligation product is obtained after overnight incubation in a 16 °C water bath and is purified according to the instructions of the Agencourt AMPure XP kit; after transformation and screening, the purified product is subjected to bacterial-liquid PCR verification and sequencing (this step can be performed by a sequencing company), the positive clones are selected, and the amplification products are checked with an Agilent 2100 Bioanalyzer; the amplification product is denatured at 96 °C for 30 sec and then placed on ice, and a DNA circularization amplification system is prepared: 2 μL of DNA sample, 4 μL of 5× Rapid ligation buffer and 1 μL of ligase, made up to 20 μL with double distilled water. After the system has been incubated in a 25 °C water bath for 15 min, linear DNA digestion enzyme is added and digestion is carried out at room temperature for 10 min to obtain the DNA sequencing library; the library concentration is measured with an Agilent SureSelect QXT WGS instrument, ensuring that the concentration of a single library does not exceed 2 nmol/L and the volume is not less than 12 μL.
Step 4, performing gradient PCR on the sequencing library obtained in step 3, with the following amplification system: μL of the library sample to be tested, 1 μL of each primer of the pair (a second-generation sequencing adapter primer kit may optionally be used), 0.5 μL of DNA polymerase, 2.5 μL of dNTPs, 1.5 μL of MgCl2 and 2.5 μL of buffer, made up to 25 μL with ddH2O; the PCR amplification program is: 96 °C for 3 min; 40 cycles of 96 °C for 30 sec, annealing with the temperature decreasing by 1 °C every 0.5 sec down to 56 °C, and 72 °C for 45 sec; 72 °C for 8 min; storage at 4 °C; the amplified fragments are subjected to high-throughput sequencing by combinatorial probe-anchor synthesis (cPAS) to obtain the whole genome sequencing data of the strain to be tested (this step can be performed by a sequencing company with the relevant technical qualifications).
Step 5, performing quality control on the raw sequencing data of the strain to be tested obtained in step 4 (Q20 greater than 96% and GC content greater than 45%) and filtering the data of each sample: low-quality sequencing data (short sequences less than 5 kb in length, sequences with an average quality below 8, and adapter sequences) are removed with NGS QC Toolkit 2.3.3 software, with the filtering parameters set to '-l 20 -Q 0.5 -n 0.03 -A 0.28'; the resulting high-quality sequencing data (Table 2) are saved in FASTQ format under the file name Dsp.fq.
TABLE 2 Sequencing statistics of the strain to be identified after filtering
Sample No.    Reads after filtering    Bases after filtering    Read length    Q20 (%)    GC (%)
1             238,959                  23,895,898               100            97.90      49.11
4             155,286                  15,528,625               100            95.36      47.47
As can be seen from Table 2, quality control shows that sample No. 1 of the strain to be identified has the better sequencing quality (higher Q20 and GC content) and can be used for the data alignment and analysis in the next step.
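The quality-control thresholds of step 5 (Q20 greater than 96% and GC content greater than 45%) can be applied to the Table 2 values as follows. This is a minimal Python sketch with hypothetical names; the figures are copied from Table 2.

```python
from typing import Dict, List, Tuple

# sample number -> (Q20 in %, GC content in %), copied from Table 2
TABLE_2: Dict[int, Tuple[float, float]] = {
    1: (97.90, 49.11),
    4: (95.36, 47.47),
}


def sequencing_passes(q20: float, gc: float) -> bool:
    """Apply the step 5 thresholds: Q20 > 96% and GC content > 45%."""
    return q20 > 96.0 and gc > 45.0


def samples_passing_qc(table: Dict[int, Tuple[float, float]]) -> List[int]:
    """Return the sample numbers that satisfy both thresholds."""
    return sorted(n for n, (q20, gc) in table.items() if sequencing_passes(q20, gc))


print(samples_passing_qc(TABLE_2))
```

Under these thresholds sample 4 falls short on Q20 (95.36%), so only sample 1 remains, in line with the choice stated in the text.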
Step 6, collecting the genome sequencing data (Dsp.fq) of the strain to be identified obtained in step 5 and the genome sequencing data of 5 representative algae published in the NCBI database, namely stonewort (Chara braunii), Chlamydomonas eustigma, Microcystis aeruginosa, Microcystis panniformis and Volvox carteri; with the Dunaliella D. quartolecta core genome sequence assembled in Example 1 as the reference, aligning the genome data of these algae against the D. quartolecta core genome data with LASTZ 1.02.00 software, extracting the genotype of each species corresponding to D. quartolecta from the resulting collinearity blocks (A and B in FIG. 4), and merging, extracting and filtering the genotype information (sites with a missing rate ≤ 20% are retained).
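The missing-rate filter mentioned at the end of step 6 can be sketched in Python as below. The data layout (one list of genotype calls per site, with `None` marking a missing call), the site labels and the function names are all assumptions for illustration; only the 20% threshold comes from the text.

```python
from typing import Dict, List, Optional

Calls = List[Optional[str]]  # one genotype call per species; None = missing


def missing_rate(calls: Calls) -> float:
    """Fraction of species with no genotype call at a site."""
    return sum(1 for call in calls if call is None) / len(calls)


def filter_sites(genotypes: Dict[str, Calls],
                 max_missing: float = 0.20) -> Dict[str, Calls]:
    """Keep only the sites whose missing rate does not exceed the threshold."""
    return {site: calls for site, calls in genotypes.items()
            if calls and missing_rate(calls) <= max_missing}


# Hypothetical sites and calls, for illustration only.
example = {
    "contig1:100": ["A", "A", "G", None, "A"],    # 1 of 5 missing
    "contig1:250": [None, None, "T", "T", "T"],   # 2 of 5 missing
}
print(sorted(filter_sites(example)))
```

A site with one missing call out of five is retained at the 20% threshold, while a site with two missing calls is dropped.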
Step 7, detecting single nucleotide polymorphisms (SNPs) and insertion/deletion sites (InDels) among the species in step 5 with BWA 0.7.17 software, using the core genome sequence of Dunaliella D. quartolecta as the reference; the program commands for the data of the strain to be tested are as follows:
# build the index
bwa index -a bwtsw Dq
bwa aln -t 2 -f /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp_R1.sai /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dq.fna /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/00data/Dsp_1.fq
bwa aln -t 2 -f /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp_R2.sai /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dq.fna /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/00data/Dsp_2.fq
bwa sampe -f /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.sam /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dq.fna /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp_R1.sai /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp_R2.sai /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/00data/Dsp_1.fq /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/00data/Dsp_2.fq
samtools view -@ 20 -b -S /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.sam -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.bam
samtools sort -@ 20 -m 150G /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.bam -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.sort.bam
samtools rmdup -S /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.sort.bam /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.rmdup.bam
samtools index /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.rmdup.bam /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.rmdup.bam.bai
samtools mpileup -gf /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dq.fna /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.rmdup.bam > /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/08snp/Dsp_results/Dsp.bcf
bcftools view -A ./Dsp.bcf > Dsp.vcf
Step 8, when the sequencing read length is greater than 100, the BWA mem algorithm is used in place of aln, and the program commands are as follows:
samtools view -@ 20 -b -S ./result/SRR2602391.sam -o ./result/Dsp.fq.bam
samtools sort -@ 20 -m 150G ./result/Dsp.fq.bam -o ./result/Dsp.fq.sort.bam
samtools rmdup -S ./result/Dsp.fq.sort.bam ./result/Dsp.fq.rmdup.bam
samtools index ./result/SRR2602391.rmdup.bam ./result/Dsp.fq.rmdup.bam.bai
samtools mpileup -gf ./database/grape.fa ./result/*.rmdup.bam > Vitis_2.bcf
bcftools call -Avm Vitis.bcf > Vitis.vcf
Step 9, SNP and InDel detection for the genomes of the other representative algae uses the same programs and algorithms as for the strain to be identified.
Step 10, detecting valid single nucleotide polymorphism (SNP) and insertion/deletion (InDel) data with BWA 0.7.17 software: when SNPs and InDels are detected, duplicated fragments are first marked and ignored, the regions near InDels are then realigned, and the SNPs and InDels are finally obtained by screening. As can be seen from the statistics for the strain to be identified, Dq_SX (Table 3), nucleotide transitions are the predominant SNP type in the genome of this strain, and transversions occur mainly between adenine (A) and thymine (T).
TABLE 3 SNP and InDel statistics of the strain to be identified
Item                Value
Species             Strain to be identified (Dunaliella sp.)
Number of SNPs      968,450
Number of InDels    61,140
SNP type 1          TC transition (number: 167,620)
SNP type 2          AG transition (number: 167,120)
SNP type 3          GA transition (number: 167,060)
SNP type 4          CT transition (number: 266,320)
SNP type 5          AT transversion (number: 200,330)
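The SNP classes in Table 3 follow the usual definition: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any other substitution is a transversion. The Python sketch below applies that definition to the Table 3 counts. The function names are hypothetical, and reading "TC" as a T-to-C substitution is an assumption.

```python
from typing import Dict, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# (ref, alt) -> count, copied from Table 3 (direction of each pair is assumed)
TABLE_3: Dict[Tuple[str, str], int] = {
    ("T", "C"): 167620,
    ("A", "G"): 167120,
    ("G", "A"): 167060,
    ("C", "T"): 266320,
    ("A", "T"): 200330,
}


def class_totals(counts: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum the counts per substitution class."""
    totals = {"transition": 0, "transversion": 0}
    for (ref, alt), n in counts.items():
        totals[substitution_class(ref, alt)] += n
    return totals


totals = class_totals(TABLE_3)
print(totals, sum(totals.values()))
```

The five per-type counts add up to the reported total of 968,450 SNPs, of which 768,120 are transitions and 200,330 are transversions.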
Step 11, constructing a phylogenetic tree (FIG. 5) from the valid SNP data obtained, using EasySpeciesTree 1.0 software, to further determine the genetic relationship between the strain to be tested and Dunaliella D. quartolecta; the maximum likelihood algorithm is adopted with a bootstrap value of 1000, and the program commands are set as follows:
orthofinder -f orthsp1 -M msa -S diamond -t 16 -a 16
orthofinder -f /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/06tree -M msa -S diamond -t 10 -a 10 -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/06tree/results
# Input file -in1: second column of OrthoFinder/Results_Sep25/WorkingDirectory/SpeciesIDs.txt; -in4: cat
# Input files -in2 and -in3 come from xxx/orthsp1/OrthoFinder/Results_Sep25/Orthologs
# -in2: cp Orthogroups_SingleCopyOrthologues.txt ../../../easy/SingleCopyOrthologues.txt
# -in3: cp Orthogroups.tsv ../../../easy/Orthogroups.csv
python2.7 /vol1/agis/xiaoyutao_group/wangjingchun/software/EasySpeciesTree/EasySpeciesTree.py -in1 SpeciesIDs.txt -in2 SingleCopyOrthologues.txt -in3 Orthogroups.csv -in4 all.pep.fa -t 2
Step 12, determining whether the algal strain to be tested belongs to D. quartolecta from the support rate and the percentage of genetic similarity between branches of the constructed phylogenetic tree: when the support rate between the strain to be tested and D. quartolecta is 0.99-1.00, the similarity percentage is ≥ 99% and the genome coverage is ≥ 55%, the strain is determined to be D. quartolecta. As can be seen from FIG. 4, the support rate between the strain to be identified (Dunaliella sp.) and Dunaliella D. quartolecta is 1.00, the similarity percentage is 100% and the genome coverage is 56.8%, so the strain can be identified as Dunaliella D. quartolecta.
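The decision rule of step 12 combines three thresholds, which the following Python sketch states explicitly. The function name is hypothetical; the thresholds and the example values are those given in the step.

```python
def is_d_quartolecta(support: float, similarity_pct: float,
                     coverage_pct: float) -> bool:
    """Step 12 rule: support 0.99-1.00, similarity >= 99%, coverage >= 55%."""
    return (0.99 <= support <= 1.00
            and similarity_pct >= 99.0
            and coverage_pct >= 55.0)


# Values reported for the strain to be identified (Dunaliella sp.).
print(is_d_quartolecta(support=1.00, similarity_pct=100.0, coverage_pct=56.8))
```

All three conditions must hold together: a strain with a high support rate but a genome coverage below 55% would not be assigned to D. quartolecta under this rule.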
Example 3
Analysis of the genetic variation and evolutionary characteristics of the genome of an identified algal strain Dq_SX, with the Dunaliella D. quartolecta core genome data as the reference, comprising the following steps:
Step 1, following the method constructed by the invention for screening and assembling the Dunaliella D. quartolecta core genome sequencing data (see Example 1), core fragment screening and de novo assembly are performed with SOAPdenovo 2.04 software on the whole genome sequencing data of an identified Dunaliella strain (tentatively named Dq_SX) (for the method, see Examples 1 and 2); the main indexes of the screened and assembled core genome sequence of the strain are shown in Table 4.
Step 2, performing collinearity analysis with LASTZ 1.02.00 software on the Dunaliella Dq_SX core genome assembly data constructed in step 1, to obtain the duplicated segments arising from duplication events between different regions of the genome of the species (FIG. 6).
Step 3, taking the Dunaliella D. quartolecta core genome sequence constructed by the invention as the reference template, aligning it with the Dq_SX core genome data assembled in step 1 using TBtools software, and screening the homologous genes between the two from the alignment result with OrthoFinder 2.3.11 software, the screening conditions being set as: p-value < 1e-50 and score > 80.
Step 4, taking the homologous gene information screened in step 3 as the data analysis set, detecting synonymous and non-synonymous mutation sites with PAML 4.8 software, calculating the non-synonymous substitution rate (Ka) and the synonymous substitution rate (Ks), and estimating the evolutionary selection pressure on the identified strain Dq_SX from the Ka/Ks value (FIG. 7).
TABLE 4 Core genome assembly data of the identified Dunaliella strain Dq_SX and its quality evaluation
Figure BDA0002767618840000241
As can be seen from Table 4, the core genome assembly of the identified Dunaliella strain Dq_SX is relatively complete, with incomplete fragments accounting for only 16.12% and gaps or deletions for only 1.54%. As can be seen from FIG. 6, a large number of duplication events may have occurred in different regions of the genome of the identified strain Dq_SX during evolution, involving 1007 pairs of segments, which suggests that the evolution of the species was complex. As can be seen from FIG. 7, relative to the D. quartolecta core genome constructed according to the present invention, 80.52% of the genes in the core genome of the identified strain Dq_SX have a Ka/Ks ratio below 1.0 (the mean Ka/Ks is 0.47; the frequency peaks at 0.108 for Ka/Ks ratios in the range 0.35-0.45), suggesting that most genes of the strain were under purifying selection during evolution.
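The interpretation of Ka/Ks used above follows the standard convention: a ratio below 1 indicates purifying selection, a ratio of about 1 neutral evolution, and a ratio above 1 positive selection. The Python sketch below applies that convention to per-gene (Ka, Ks) pairs. The function names, the tolerance and the example values are hypothetical; they are not the output of PAML.

```python
from typing import List, Tuple


def selection_regime(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify a gene by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


def fraction_purifying(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of genes whose Ka/Ks ratio is below 1."""
    if not pairs:
        return 0.0
    hits = sum(1 for ka, ks in pairs if selection_regime(ka, ks) == "purifying")
    return hits / len(pairs)


# Hypothetical (Ka, Ks) values for four genes, for illustration only.
genes = [(0.047, 0.10), (0.04, 0.10), (0.30, 0.10), (0.02, 0.08)]
print(fraction_purifying(genes))
```

Applied to the real per-gene values, the same fraction would correspond to the 80.52% reported for Dq_SX.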
Example 4
Repeat prediction, functional annotation of predicted proteins and structural feature analysis of the core genome of the identified Dunaliella strain Dq_SX, comprising the following steps:
Step 1, performing repeat sequence analysis on the Dq_SX core genome assembly data of the identified Dunaliella strain from Example 3 using RepeatMasker 4.0.9 software; first constructing the database of sequences to be analysed (BuildDatabase -name Dq_SX_contig.fa), the program commands being set as follows:
1) RepeatModeler -pa 10 -database /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/03repeat/Dq_SX/Dq_SX -engine ncbi -recoverDir /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/03repeat/Dq_SX
2) qsub -l nodes=1 -q queue8 ./repeatmodeler.sh
Step 2, obtaining the files consensi.fa.masked and families.stk in the result directory.
Step 3, consensi.fa.masked is a FASTA file of the consensus repeat sequences obtained by training; the family of each sequence is given after the '#' that follows the sequence ID, and a family that cannot be classified is labelled 'Unknown'. families.stk is the seed alignment file in Dfam-compatible Stockholm format, and can be uploaded to the Dfam_consensus database using the tool 'RepeatModeler/util/dfamConsensusTool.pl' shipped in the RepeatModeler installation path.
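The header convention described in step 3 (family label after '#', 'Unknown' when unclassified) can be tallied with a short Python sketch. The header strings below are hypothetical examples written in that convention, and the function name is an assumption made for illustration.

```python
from collections import Counter


def count_repeat_classes(fasta_lines):
    """Count top-level repeat classes from FASTA headers of the form >id#Class/Family."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no classification
        parts = line[1:].split()
        if not parts:
            continue  # empty header, nothing to classify
        _, sep, family = parts[0].partition("#")
        top_level = family.split("/")[0] if sep and family else "Unknown"
        counts[top_level] += 1
    return counts


# Hypothetical consensus file content.
example_lines = [
    ">rnd-1_family-0#LTR/Gypsy ( example description )",
    "ACGTACGT",
    ">rnd-1_family-1#Unknown",
    "GGCCGGCC",
    ">rnd-1_family-2#LINE/L1",
    "TTAATTAA",
]
print(count_repeat_classes(example_lines))
```

A tally of this kind gives the per-class counts that a repeat classification table summarises.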
Step 4, searching for repetitive sequences in the Dunaliella Dq_SX core genome, the program commands being set as follows:
1) RepeatMasker -pa 4 -gff -lib /public/home/wangjingchun/RM_Dq_SX/consensi.fa -dir /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/03repeat/Dq_SX/Repeatmasker/lib_result /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/03repeat/Dq_SX/Dq_SX_contig.fa
2) qsub -l nodes=1 -q queue8 ./repeatmasker2.sh
Step 5, performing BLASTp alignment of the CDS-encoded protein sequences of the core genome of the identified strain against the non-redundant protein database (NR) using Diamond 0.9.14 software to obtain functional annotations of the proteins, with the alignment threshold set to e-value ≤ 10^-5; the program commands are set as follows:
1) $ diamond makedb --in nr_eukaryon.fasta -d nr_eukaryon_20200805
2) $ diamond blastx --db nr_eukaryon_20200805 --query reads.fq.gz --out reads.tab
3) $ diamond blastp --db nr_eukaryon_20200805 --query proteins.fasta --out nr.tab --outfmt 6 --sensitive --max-target-seqs 20 --evalue 1e-5 --id 30 --block-size 20.0 --tmpdir /dev/shm --index-chunks 1
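The tabular output requested with --outfmt 6 uses the standard 12-column BLAST layout, so the e-value and identity thresholds of step 5 can be re-applied or tightened afterwards. The following Python sketch shows one way to keep the best-scoring hit per query; the function name and the example rows are hypothetical.

```python
import csv
import io

# Standard column order of BLAST/DIAMOND tabular format 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5, min_identity=30.0):
    """Return {query: subject} for the highest bit-score hit passing both thresholds."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(FIELDS):
            continue  # skip blank or malformed lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        identity = float(hit["pident"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue or identity < min_identity:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical alignment rows; gene and protein names are placeholders.
example_table = io.StringIO(
    "g1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350.0\n"
    "g1\tprotB\t40.0\t180\t100\t2\t5\t185\t3\t182\t1e-8\t90.0\n"
    "g2\tprotC\t25.0\t150\t110\t3\t1\t150\t1\t150\t1e-6\t60.0\n"
    "g3\tprotD\t55.0\t120\t50\t1\t1\t120\t1\t120\t0.01\t35.0\n"
)
print(best_hits(example_table))
```

In the example, g2 fails the identity threshold and g3 fails the e-value threshold, so only g1 is retained.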
Step 6, performing collinearity analysis of the duplicated segments of the core genome of the identified strain using MCScanX software, the program commands being set as follows:
1) makeblastdb -in Dq_SX.fa -dbtype prot -out Dq_SX
2) blastp -query /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/07circos/Dq_SX.fa -db /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/07circos/Dq_SX -num_threads 10 -evalue 1e -outfmt 6 -out /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/07circos/Dq_SX.blastp
3) MCScanX ./Dq_SX
Step 7, because repetitive sequences are relatively poorly conserved between species, predicting the repetitive sequences of a specific species requires a species-specific repeat database. In view of this, the sequencing assembly data of the core genome of the identified strain Dq_SX were aligned against RepBase using RepeatMasker v4.0.6 software to find possible interspersed repeats in the strain. The core genome data of the identified strain Dq_SX were also annotated with RepeatModeler, LTR-Finder and RepeatScout software to obtain tandem repeats (including microsatellite sequences, etc.).
Step 8, filtering the redundant parts of the results to obtain the final non-redundant repeat annotation (Table 5).
Step 9, aligning the core genome data of the identified strain Dq_SX against the NR database and screening the alignment results (e-value < 10^-5).
Step 10, performing COG functional annotation of the screened homologous protein sequences using eggNOG software, annotating the protein sequences with the emapper.py script in eggNOG, and compiling classification statistics for the top 20 protein clusters in the annotation results (FIG. 8).
Step 11, running eggNOG software to perform COG functional annotation on homologous protein encoded by the gene; the program commands are set as follows:
python /public/home/wangjingchun/miniconda2/envs/qiime1/bin/emapper.py -i /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/new/04cog/Dq_SX_protein.fa --output /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/new/04cog/out -m diamond --data_dir /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/new/04cog/database --cpu 20
Step 12, performing transmembrane domain prediction on the top-ranked protein in the top 20 protein clusters using the online TMHMM 2.0 analysis tool (FIG. 9); predicting the signal peptide of the protein using the online SignalP 4.1 analysis tool, with each protein sequence limited to no more than 6000 amino acids (FIG. 10); the output format is set to 'Extensive, with graphics', and all other parameters are left at their defaults.
Table 5. Classification statistics of repetitive sequences in the core genome of the identified Dunaliella strain Dq_SX
Repeat type    Repeat size (bp)    Proportion of genome (%)
LINE           165380              0.26
LTR            118737              0.19
SINE           984126              1.57
Others         1007445             1.60
Total          2275688             3.62
As can be seen from Table 5, repetitive sequences with a total length of 2275688 bp were found in the core genome of the identified Dunaliella strain Dq_SX, accounting for about 3.62% of the whole genome. As can be seen from FIG. 8, among the functional proteins annotated in the Dq_SX core genome, transcriptional regulators (88) and dynein heavy chains (87) are the most numerous classes. The transmembrane domain prediction for the transcriptional regulator shows that amino acids 60-110 are probably located outside the membrane (probability about 0.8), the region after amino acid 130 is probably inside the membrane (probability 0.82), and the probability of a membrane-spanning segment does not exceed 0.4 (FIG. 9). The signal peptide prediction for this factor (FIG. 10) shows that around amino acids 25-26 the C value is at its maximum, the S value changes steeply and the Y value is highest, suggesting a signal peptide cleavage site at this position.
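The figures in Table 5 can be checked for internal consistency with a few lines of Python. The denominator is not stated in the table, so the sketch below back-calculates it from the reported total (2275688 bp at 3.62%); that implied size is an assumption of this example, not a value given in the source.

```python
# Repeat sizes in bp as listed in Table 5.
REPEATS_BP = {"LINE": 165380, "LTR": 118737, "SINE": 984126, "Others": 1007445}
TOTAL_BP = 2275688
TOTAL_PERCENT = 3.62

# The four classes should add up to the reported total.
summed_bp = sum(REPEATS_BP.values())

# Sequence length implied by "2275688 bp is 3.62%" (an inference, not a source value).
implied_length_bp = TOTAL_BP / (TOTAL_PERCENT / 100)

# Per-class share recomputed against the implied length, rounded as in the table.
shares = {name: round(bp / implied_length_bp * 100, 2) for name, bp in REPEATS_BP.items()}

print(summed_bp == TOTAL_BP)
print(round(implied_length_bp))
print(shares)
```

The recomputed shares agree with the percentages listed in the table, which indicates that all rows use the same denominator.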
Example 5
Comparative analysis of differential metabolic pathways and mining of characteristic genes based on the core genome data of Dunaliella D. quartolecta and the identified strain Dq_SX, comprising the following steps:
Step 1, performing BLASTp alignment of the protein sequences predicted from the core genomes of Dunaliella D. quartolecta and of the identified strain Dq_SX from Example 3 (for the acquisition of the Dq_SX core genome sequencing assembly data see Example 3, and for the acquisition of the protein sequences see Example 4) against the KEGG database (Kyoto Encyclopedia of Genes and Genomes, Japan) to obtain the metabolic pathways in which the gene products may participate, the program commands being set as follows:
1) diamond makedb --in /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/ko.pep.fasta -d /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/kegg
2) diamond blastp -d /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/kegg --query /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/Dq_protein.fa -f 6 -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/Dq.blastp -p 30 -e 0.00005
3) diamond blastp -d /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/kegg --query /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/Dq_SX_protein.fa -f 6 -o /vol3/agis/xiaoyutao_group/wangjingchun/yanzao/09kegg/Dq_SX.blastp -p 30 -e 0.00005
Step 2, according to the KO numbers assigned to each metabolic pathway in step 1, performing intersection analysis on the predicted KEGG pathways of D. quartolecta and of the identified strain Dq_SX, constructing a Venn diagram (FIG. 11), and screening the metabolic pathways unique to each.
Step 3, from the top 20 unique metabolic pathways of Dunaliella D. quartolecta and of the identified strain Dq_SX obtained in step 2 (that is, the 20 highest-ranked KEGG pathways outside the intersection in the statistics of step 2), screening the characteristic genes with the highest enrichment, i.e. the highest enrichment ratio (FIGS. 12 and 13).
Step 4, querying the most highly (significantly) enriched metabolic pathway genes of Dunaliella D. quartolecta and of the identified strain Dq_SX in the GO (Gene Ontology) database to obtain the top 20 GO functional annotation enrichment results for each (FIGS. 14 and 15), and screening characteristic genes of interest to researchers from the gene sets with higher enrichment, i.e. a higher number of GO terms and a higher corresponding -log10(Q value) (confidence).
As can be seen from FIG. 11, based on the core genome sequencing data of Dunaliella D. quartolecta and Dq_SX, a total of 608 KEGG pathways were predicted, comprising 141 shared metabolic pathways and 467 unique metabolic pathways (85 in D. quartolecta and 382 in Dq_SX). As can be seen from FIGS. 12 and 13, the most enriched unique metabolic pathway of D. quartolecta is spliceosome-associated metabolism, and the most enriched unique pathway of Dq_SX is cellular component synthesis-associated metabolism. As can be seen from FIGS. 14 and 15, most of the genes involved in the spliceosome pathway of D. quartolecta have functions closely related to RNA transport, processing and synthesis, whereas most of the genes involved in the anabolism of membrane components in Dq_SX have functions related to protein structure and processing.
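The intersection analysis behind the Venn diagram amounts to set operations on pathway identifiers. A minimal Python sketch follows; the pathway identifiers are hypothetical placeholders, while the counts checked at the end are the ones reported for FIG. 11.

```python
def compare_pathways(reference_ids, query_ids):
    """Split two collections of pathway identifiers into shared and unique sets."""
    reference, query = set(reference_ids), set(query_ids)
    return {
        "shared": reference & query,
        "reference_only": reference - query,
        "query_only": query - reference,
    }


# Hypothetical pathway identifiers, for illustration only.
d_quartolecta = ["map03040", "map00010", "map00020"]
dq_sx = ["map00010", "map00020", "map00500", "map04144"]
result = compare_pathways(d_quartolecta, dq_sx)
print({name: sorted(ids) for name, ids in result.items()})

# Consistency of the counts reported in the text.
shared_count, reference_only_count, query_only_count = 141, 85, 382
unique_count = reference_only_count + query_only_count
total_count = shared_count + unique_count
print(unique_count, total_count)
```

The reported figures are mutually consistent: 85 + 382 gives the 467 unique pathways, and adding the 141 shared pathways gives 608.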
Example 6
Comparative analysis of three different molecular identification techniques for Dunaliella D. quartolecta:
collecting 20 algal strains to be tested together with the identified Dunaliella D. quartolecta of Example 1, and performing molecular identification of the strains to be tested using the ITS gene, SSR molecular markers and genome sequencing data, specifically comprising the following steps:
Step 1, extracting the genomic DNA of each strain by the improved CTAB method (see Example 1 for details), and designing and synthesising the Dunaliella ITS gene amplification primers shown in SEQ ID NO.1 and SEQ ID NO.2:
SEQ ID NO.1:5'-GAAGGAGAAGTCGTAACAAG-3';
SEQ ID NO.2:5'-CCTCCCTTATTGATATGC-3';
preparing the ITS gene PCR amplification system: 2.0 μL dNTPs (2 mmol/L), 1.0 μL Mg2+ (25 mmol/L), 1.0 μL DNA, 0.3 μL Taq enzyme (5 U/μL), 2.5 μL 10× Buffer, 1.0 μL of each of the above primers, and ddH2O to 25 μL; setting the PCR program: 95 °C for 3 min; 35 cycles of 95 °C for 30 sec, 52 °C for 40 sec and 72 °C for 1 min; then a final extension at 72 °C for 10 min; detecting by 1.2% agarose gel electrophoresis, recovering the specific amplification product of 800-1000 bp, and sending it to a sequencing company for sequencing.
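The volume of ddH2O needed to bring the ITS reaction to 25 μL follows directly from the listed component volumes. The Python sketch below works it out, assuming that "each of the above primers" means one forward and one reverse primer at 1.0 μL each; the function and dictionary names are hypothetical.

```python
def water_volume(components_ul, total_ul=25.0):
    """Volume of ddH2O (in microlitres) needed to reach the total reaction volume."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return round(total_ul - used, 2)


# Component volumes in microlitres, taken from the ITS PCR system described above.
ITS_MIX = {
    "dNTPs": 2.0,
    "Mg2+": 1.0,
    "DNA template": 1.0,
    "Taq enzyme": 0.3,
    "10x buffer": 2.5,
    "forward primer": 1.0,  # assumption: two primers at 1.0 each
    "reverse primer": 1.0,
}
print(water_volume(ITS_MIX))
```

The same function can be reused for other reaction mixes by passing a different dictionary of volumes.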
Step 2, based on the sequencing results returned by the sequencing company, constructing an ITS gene phylogenetic tree of the 21 algal strains by the maximum likelihood method using MEGA 5.0 software with a bootstrap value of 1000, and identifying D. quartolecta among the strains to be tested according to the support rate and the percentage genetic similarity of each branch node in the tree (FIG. 16).
Step 3, based on 9 sets of Dunaliella tertiolecta transcriptome sequencing data obtained by the inventors (NCBI accession numbers: SRR8393723, SRR8393722, SRR8393725, SRR8393724, SRR8393727, SRR8393726, SRR8393729, SRR8393728 and SRR8393721), 15 specific markers were screened from 24311 SSR markers, and 10 pairs of polymorphic amplification primers were designed from the marker information; the primer information is as follows:
CL1007:SEQ ID NO.3:5'-CTAAATCCATGCGTTCTTCTTTC-3';
SEQ ID NO.4:5'-ACAGTACAACCAGAGGCTTTGAA-3';
CL1008:SEQ ID NO.5:5'-AACAATGTCACCTCTCATTTGCT-3';
SEQ ID NO.6:5'-TCGTTTTGTTGTTGTTCTTCAAA-3';
CL102:SEQ ID NO.7:5'-GCCAATTCCAAAAAGTTAAAATCT-3';
SEQ ID NO.8:5'-ATTGTGGTTTTCTTCCTGGTTTT-3';
CL1041:SEQ ID NO.9:5'-AGGCAAGCAGTGCATTTGTA-3';
SEQ ID NO.10:5'-GGCTCTCTATGAGTCGATGTGTC-3';
CL1047:SEQ ID NO.11:5'-GCAGTGGAAACACACTTCCTTAC-3';
SEQ ID NO.12:5'-TCTCTCAAATCAAAGGTGCTTTC-3';
CL1157:SEQ ID NO.13:5'-GAGATCGAACTTGAGGCTTAGAA-3';
SEQ ID NO.14:5'-AAAATAGAAGCCATCATGAAACG-3';
CL1160:SEQ ID NO.15:5'-GGATACAGATTTCCACACTGCTC-3';
SEQ ID NO.16:5'-CTATCTGGCTGAAGGTCATGTTT-3';
CL1168:SEQ ID NO.17:5'-CGTTTTTGGAACTGATTTCTTTG-3';
SEQ ID NO.18:5'-TTCTTGTAATACATCGCAGGAAG-3';
CL1322:SEQ ID NO.19:5'-AACAGAGGAAATTCTGATGATGC-3';
SEQ ID NO.20:5'-CTTGCAAGAAGGAACAACTCACT-3';
CL1627:SEQ ID NO.21:5'-GTGGTCACCAGGAAGAGACAG-3';
SEQ ID NO.22:5'-ACGGTACTGACAGTGGAAACAAT-3';
the sizes of the amplified products are 155bp, 131bp, 139bp, 121bp, 158bp, 136bp, 118bp, 149bp, 160bp and 127bp in sequence;
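Basic properties of the listed primers (length, GC content and a rough melting temperature) can be computed with a short Python sketch. The Wallace rule used here, Tm = 2(A+T) + 4(G+C), is a common rule of thumb for short oligonucleotides and is an assumption of this example; the source does not state how the primers were evaluated.

```python
def primer_stats(sequence):
    """Length, GC percentage and Wallace-rule Tm (degrees C) of a DNA primer."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq) or not seq:
        raise ValueError("primer must be a non-empty string of A, C, G and T")
    return {
        "length": len(seq),
        "gc_percent": round(100 * gc / len(seq), 1),
        "tm_wallace": 2 * at + 4 * gc,
    }


# ITS primers given as SEQ ID NO.1 and SEQ ID NO.2.
ITS_F = "GAAGGAGAAGTCGTAACAAG"
ITS_R = "CCTCCCTTATTGATATGC"
print(primer_stats(ITS_F))
print(primer_stats(ITS_R))
```

Any of the SSR primers listed above can be passed to the same function to compare the two members of a primer pair.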
Step 4, sending the SSR primers to a biotechnology company for synthesis, and preparing the SSR-PCR amplification system: 2.5 μL dNTPs (2 mmol/L), 1.2 μL Mg2+ (25 mmol/L), 1.0 μL DNA (obtained in step 1), 0.4 μL Taq enzyme (5 U/μL), 2.5 μL 10× Buffer, 0.8 μL of each of the above primers, and ddH2O to 25 μL; the SSR-PCR program is: 94 °C for 5 min; 35 cycles (94 °C for 45 sec, 57 °C for 35 sec, 72 °C for 1 min); 72 °C for 8 min; separating the amplified SSR products by electrophoresis on 4% denaturing polyacrylamide, silver staining for 30 min, developing for 15 min and fixing for 20 min, then scoring the electrophoresis pattern as '1' (band present) or '0' (band absent); performing cluster analysis of the strains to be tested by the UPGMA method using NTSYSpc 2.2 software, and constructing the SSR marker phylogenetic tree (FIG. 17).
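The 1/0 band scoring and UPGMA clustering described in step 4 can be sketched in plain Python. This example uses the simple matching coefficient as the similarity measure, which is an assumption (the source does not say which coefficient was selected in NTSYSpc), and the band profiles are hypothetical.

```python
from itertools import combinations


def simple_matching(profile_a, profile_b):
    """Fraction of marker positions at which two 1/0 band profiles agree."""
    return sum(x == y for x, y in zip(profile_a, profile_b)) / len(profile_a)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    members = {name: [name] for name in profiles}
    trees = {name: name for name in profiles}

    def distance(c1, c2):
        pairs = [(m, n) for m in members[c1] for n in members[c2]]
        total = sum(1 - simple_matching(profiles[m], profiles[n]) for m, n in pairs)
        return total / len(pairs)

    while len(members) > 1:
        # Merge the closest pair of clusters (ties broken by sorted label order).
        a, b = min(combinations(sorted(members), 2), key=lambda pair: distance(*pair))
        label = a + "+" + b
        members[label] = members.pop(a) + members.pop(b)
        trees[label] = (trees.pop(a), trees.pop(b))
    return next(iter(trees.values()))


# Hypothetical band profiles over six SSR loci.
band_profiles = {
    "D.quartolecta": [1, 0, 1, 1, 0, 1],
    "Dsp4": [1, 0, 1, 1, 0, 1],
    "Dsp7": [0, 1, 0, 1, 1, 0],
    "Dsp9": [0, 1, 0, 0, 1, 0],
}
print(upgma(band_profiles))
```

The cluster-to-cluster distance is the mean over all pairs of original strains, which is what distinguishes UPGMA from single or complete linkage.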
Step 5, taking the core genome assembly data of Dunaliella D. quartolecta constructed by the invention as reference, constructing sequencing libraries from the whole genome DNA obtained in step 1, the library construction method following Example 1; sequencing the genomes of the strains to be tested, without de novo assembly of the sequencing fragments; this step may be carried out by a qualified sequencing company.
Step 6, taking the D. quartolecta core genome data constructed by the invention as reference, detecting single nucleotide polymorphism (SNP) and insertion/deletion (InDel) data among the strains to be tested using BWA 0.7.17 software; when detecting SNPs and InDels, repetitive segments are first marked and ignored, the regions near InDels are then realigned, and the final SNPs and InDels are obtained by screening; the program commands follow Example 2.
Step 7, constructing a phylogenetic tree (FIG. 18) from the obtained SNP data using EasySpeciesTree 1.0 software with the maximum likelihood algorithm and a bootstrap value of 1000; the program commands follow Example 2.
Step 8, comparing and analysing the results of the three molecular identification methods; their technical advantages and disadvantages are shown in Table 6.
As can be seen from FIG. 16, strain Dsp11 clusters with Dunaliella D. quartolecta with a support rate of 0.99 and a genetic similarity of 99%, and can be identified as D. quartolecta. As can be seen from FIG. 17, strain Dsp4 clusters with D. quartolecta with a support rate of 0.99 and a genetic similarity of 99%, and strain Dsp11 clusters with Dsp4 and D. quartolecta with a support rate of 1.00 and a genetic similarity of 99%, so these strains can also be identified as D. quartolecta. As can be seen from FIG. 18, Dsp11 and Dsp4 cluster together with D. quartolecta with a support rate of 1.00 and a genetic similarity of 100%, and can be identified as D. quartolecta. As can be seen from Table 6, compared with the other two molecular identification methods, performing simplified genome sequencing of the strain to be tested and obtaining SNP data with the Dunaliella core genome data constructed by the invention as reference allows D. quartolecta to be identified accurately within a short period (7-10 days) at low cost, and provides abundant bioinformatic data for later in-depth research.
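The identification criterion applied above (clustering with the reference, support rate of 0.99-1.00, genetic similarity of at least 99%) can be written as a small decision function. In the Python sketch below, the similarity is computed as the percentage of identical genotype calls at shared SNP sites; that formula, the 'N' code for missing calls and the example genotypes are assumptions made for illustration.

```python
def genetic_similarity(genotype_a, genotype_b, missing="N"):
    """Percentage of sites with identical calls, ignoring sites missing in either sample."""
    compared = [(a, b) for a, b in zip(genotype_a, genotype_b)
                if a != missing and b != missing]
    if not compared:
        raise ValueError("no sites shared between the two samples")
    return 100.0 * sum(a == b for a, b in compared) / len(compared)


def is_d_quartolecta(clusters_with_reference, support, similarity_percent):
    """Apply the thresholds stated in the text to one strain."""
    return bool(clusters_with_reference
                and 0.99 <= support <= 1.00
                and similarity_percent >= 99.0)


# Hypothetical genotype strings at ten SNP sites.
reference_calls = "ACGTACGTAC"
sample_calls = "ACGTACGTAC"
similarity = genetic_similarity(reference_calls, sample_calls)
print(similarity, is_d_quartolecta(True, 1.00, similarity))
```

A strain that fails any one of the three conditions is not called as D. quartolecta by this function.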
Table 6. Comparison of three molecular identification methods for Dunaliella D. quartolecta
Figure BDA0002767618840000331
Example 7
Comparison of the Dunaliella D. quartolecta core genome sequencing and assembly technology established by the invention with conventional genome sequencing technologies, comprising the following steps:
Step 1, extracting genomic DNA of the identified Dunaliella D. quartolecta from Example 3; the extraction may be performed as described in Example 1.
Step 2, sending DNA samples that pass quality control (DNA concentration ≥ 150 ng/μL, a bright electrophoresis band without degradation, OD260/OD280 = 1.8-1.9) to a sequencing company for DNA sequencing library construction, sequencing, core fragment screening and de novo assembly, with Nanopore, PacBio and HiSeq respectively selected as the sequencing platforms (this step may be entrusted to a company with the relevant sequencing platforms).
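The numerical part of the quality control criteria in step 2 can be expressed as a simple check. The Python sketch below covers only the concentration and the OD260/OD280 ratio; band brightness and degradation are judged on the gel and are not modelled. The function name and the example readings are hypothetical.

```python
def passes_dna_qc(concentration_ng_per_ul, od260, od280):
    """True when concentration is at least 150 ng/uL and OD260/OD280 lies in 1.8-1.9."""
    if od280 <= 0:
        return False  # ratio undefined, treat as a failed measurement
    ratio = od260 / od280
    return concentration_ng_per_ul >= 150 and 1.8 <= ratio <= 1.9


# Hypothetical spectrophotometer readings for three samples.
samples = {
    "sample_1": (180.0, 0.925, 0.5),
    "sample_2": (120.0, 0.925, 0.5),
    "sample_3": (200.0, 1.000, 0.5),
}
for name, reading in samples.items():
    print(name, passes_dna_qc(*reading))
```

In the example, sample_2 fails on concentration and sample_3 fails on the absorbance ratio.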
Step 3, comparing the key indexes of the independently constructed Dunaliella D. quartolecta core genome sequencing fragment assembly data (see the operating steps of Example 1) with those of the assembly data from each sequencing platform obtained in step 2.
Step 4, aligning the D. quartolecta core genome sequencing data obtained from each technical platform against the reference Dunsal1 v.2 published by NCBI, and analysing the differences between the technologies from the alignment results (Table 7).
Table 7. Analysis of alignment results during core genome sequencing data assembly
Figure BDA0002767618840000341
Step 5, detecting single nucleotide polymorphisms (SNPs) in the uniquely aligned sequencing fragments obtained in step 4 using SOAPsnp software, filtering out repetitive fragments during detection, realigning the regions near insertion/deletion (InDel) sites, and screening for valid high-quality SNPs; aligning and clustering the short sequences in the sequencing data against the reference genome to detect InDels, with the gap length set to 1-10 bases. The mean numbers of valid SNPs and InDels obtained by the four techniques were compared (Table 8).
Table 8. Comparative analysis of SNP and InDel statistics
Figure BDA0002767618840000342
Step 6, using the sequencing fragments obtained in step 4 together with the repeat prediction method of Example 4, calculating the proportion of repetitive sequences in the total sequencing fragments of the algal strain under the different technical platforms (Table 9).
Table 9. Comparative analysis of repeat sequence proportions
Technique               Proportion of repeat sequences in total sequencing fragments (%)
Autonomous technique    1.45
Nanopore                15.27
PacBio                  12.99
HiSeq                   3.58
As can be seen from Table 7, under the technical conditions established by the invention, the genome coverage, the aligned sequences and the identity ratio of the sequencing fragments of the tested strain are all higher than with the other three sequencing technologies. As can be seen from Table 8, the numbers of valid SNPs and InDels detected under the technical conditions of the invention are higher than with the other three technologies, and the error rate is the lowest. As can be seen from Table 9, the proportion of repeat sequences detected under the conditions of the invention is lower than with the other three technologies. In conclusion, the overall performance of the Dunaliella D. quartolecta core genome sequencing fragment assembly technology created by the invention is superior to that of Nanopore, PacBio and HiSeq.
While there have been shown and described what are at present considered to be the basic principles and essential features of the invention and advantages thereof, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organised by embodiments, an embodiment does not necessarily contain only a single independent technical solution; the description is written this way for clarity only, and those skilled in the art should take the description as a whole, as the technical solutions in the embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
Sequence listing
<110> Shanxi University
<120> method for strain identification based on Dunaliella core genome sequence
<160> 22
<170> SIPOSequenceListing 1.0
<210> 16
<211> 20
<212> DNA
<213> ITS Gene upstream primer (ITS-F)
<400> 16
gaaggagaag tcgtaacaag 20
<210> 17
<211> 18
<212> DNA
<213> ITS Gene downstream primer (ITS-R)
<400> 17
cctcccttat tgatatgc 18
<210> 18
<211> 23
<212> DNA
<213> CL1007 upstream primer (CL1007-F)
<400> 18
ctaaatccat gcgttcttct ttc 23
<210> 19
<211> 23
<212> DNA
<213> CL1007 downstream primer (CL1007-R)
<400> 19
acagtacaac cagaggcttt gaa 23
<210> 20
<211> 23
<212> DNA
<213> CL1008 upstream primer (CL1008-F)
<400> 20
aacaatgtca cctctcattt gct 23
<210> 21
<211> 23
<212> DNA
<213> CL1008 downstream primer (CL1008-R)
<400> 21
tcgttttgtt gttgttcttc aaa 23
<210> 22
<211> 24
<212> DNA
<213> upstream primer of CL102 (CL102-F)
<400> 22
gccaattcca aaaagttaaa atct 24
<210> 23
<211> 23
<212> DNA
<213> CL102 downstream primer (CL102-R)
<400> 23
attgtggttt tcttcctggt ttt 23
<210> 24
<211> 20
<212> DNA
<213> upstream primer of CL1041 (CL1041-F)
<400> 24
aggcaagcag tgcatttgta 20
<210> 25
<211> 23
<212> DNA
<213> downstream primer of CL1041 (CL1041-R)
<400> 25
ggctctctat gagtcgatgt gtc 23
<210> 26
<211> 23
<212> DNA
<213> upstream primer of CL1047 (CL1047-F)
<400> 26
gcagtggaaa cacacttcct tac 23
<210> 27
<211> 23
<212> DNA
<213> downstream primer of CL1047 (CL1047-R)
<400> 27
tctctcaaat caaaggtgct ttc 23
<210> 28
<211> 23
<212> DNA
<213> upstream primer of CL1157 (CL1157-F)
<400> 28
gagatcgaac ttgaggctta gaa 23
<210> 29
<211> 23
<212> DNA
<213> CL1157 downstream primer (CL1157-R)
<400> 29
aaaatagaag ccatcatgaa acg 23
<210> 30
<211> 23
<212> DNA
<213> CL1160 upstream primer (CL1160-F)
<400> 30
ggatacagat ttccacactg ctc 23
<210> 31
<211> 23
<212> DNA
<213> CL1160 downstream primer (CL1160-R)
<400> 31
ctatctggct gaaggtcatg ttt 23
<210> 32
<211> 23
<212> DNA
<213> CL1168 upstream primer (CL1168-F)
<400> 32
cgtttttgga actgatttct ttg 23
<210> 33
<211> 23
<212> DNA
<213> CL1168 downstream primer (CL1168-R)
<400> 33
ttcttgtaat acatcgcagg aag 23
<210> 34
<211> 23
<212> DNA
<213> CL1322 upstream primer (CL1322-F)
<400> 34
aacagaggaa attctgatga tgc 23
<210> 35
<211> 23
<212> DNA
<213> CL1322 downstream primer (CL1322-R)
<400> 35
cttgcaagaa ggaacaactc act 23
<210> 36
<211> 21
<212> DNA
<213> upstream primer of CL1627 (CL1627-F)
<400> 36
gtggtcacca ggaagagaca g 21
<210> 37
<211> 23
<212> DNA
<213> downstream primer of CL1627 (CL1627-R)
<400> 37
acggtactga cagtggaaac aat 23

Claims (6)

1. The method for strain identification based on the core genome sequence of the dunaliella is characterized by comprising the following steps:
(1) collecting, purifying and culturing a sample: collecting an alga strain to be detected and a Dunaliella quartolecta strain of Dunaliella, purifying the alga strain to be detected, and then carrying out indoor expanded culture, wherein the method comprises the following specific steps: performing monoclonal picking on algal cells of an algal strain to be detected under an aseptic condition, performing indoor expanded culture under the aseptic condition after passing microscopic examination, wherein the indoor expanded culture condition is as follows: the photoperiod is 18 h: 6h, illumination intensity 19000lx, temperature: keeping the aseptic ventilation environment at 23 +/-3 ℃, shaking the culture dish every 5 days to prevent the algal cells from adhering to the walls, performing microscopic examination on 0.5-1 mL of algal solution, and preparing the following culture medium solutions to perform indoor expanded culture on the algal strains to be detected, wherein the formula of the culture medium is as follows:
30 g/L NaCl, 1.5 g/L NaNO3, 1.4 g/L K2HPO4, 1.75 g/L MgSO4·7H2O, 1.36 g/L CaCl2·7H2O, 1.2 g/L Na2CO3, 0.006 g/L FeC6H5O7, 0.005 g/L NaH2PO4·2H2O, 0.5 g/L Co(NO3)2·6H2O, 0.8 g/L CuSO4·5H2O, 2.3 g/L ZnSO4·7H2O, 0.03 g/L H3BO3, 4.0 g/L Na2MoO4·2H2O, 0.02 g/L MnCl2·4H2O, 0.5 g/L VB1, 0.5 g/L VB12 and 0.5 g/L VH, made up to 1 L with ultrapure water;
(2) extracting whole genome DNA: respectively extracting the whole genome DNA of the algal strain to be tested and of the D. quartolecta strain by an improved CTAB method, and storing frozen; the improved CTAB method comprises the following specific steps: taking 600-800 mg of the algae to be tested, washing 2-3 times with ultrapure water, centrifuging at 8000 r/min for 1.5 min at 4 °C, adding liquid nitrogen and grinding for 15 sec, adding 800 μL of 2% (w/v) CTAB solution preheated at 20 °C and 1 μL of 1% (v/v) β-mercaptoethanol, mixing well, incubating in a 60 °C water bath for 1.5 h with shaking once every 20 min, adding 800 μL of Tris-saturated phenol, centrifuging at 12000 r/min for 2.5 min at 4 °C, taking the supernatant, adding a mixture of Tris-saturated phenol, chloroform and isoamyl alcohol in a volume ratio of 25:24:2, vortexing, standing at 4 °C for 10 min, mixing 2-3 times, adding 800 μL of ddH2O treated with 0.1% (v/v) DEPC, incubating in a 60 °C water bath for 30 min, centrifuging at 12000 r/min for 4 min at 4 °C, taking the supernatant, adding 150 mL of 3 mol/L sodium acetate and 250 mL of absolute ethanol pre-cooled to 4-5 °C, precipitating at -20 °C for 50 min, centrifuging at 10000 r/min for 3 min at 4 °C, discarding the supernatant, adding 1 mL of 70% (v/v) ethanol solution pre-cooled to 4-5 °C, vortexing for 20 sec, discarding the supernatant, evaporating the residual liquid in a nucleic acid vacuum drying system, and adding 100 xTE buffer to dissolve the precipitate; the genomic DNA is checked by 1% (w/v) agarose gel electrophoresis combined with a fluorescence quantifier to ensure a DNA concentration ≥ 150 ng/μL, a bright electrophoresis band without degradation, OD260/OD280 = 1.8-1.9, and no contamination;
(3) fragmenting and purifying the whole genome DNA of the algal strain to be tested and of Dunaliella D. quartolecta from step (2), and then constructing a DNA sequencing library for each;
(4) sequencing the DNA sequencing libraries in the step (3) by adopting a high-throughput sequencing method respectively to obtain second-generation sequencing data of the to-be-detected alga strain and the D.quartolecta whole genome;
(5) taking the whole genome data of Dunaliella salina (D. salina) published by NCBI as reference, aligning the whole genome sequencing data of Dunaliella D. quartolecta obtained in step (4) against it, and obtaining the core genome sequence of Dunaliella D. quartolecta after screening, de novo assembly and quality evaluation, wherein the size of the core genome sequence is 6592916 bp, the number of contigs is 3000, the maximum contig length is 1133322 bp, the average contig length is 2197.64 bp, the contig N50 is 15270, the proportion of complete genes is 23.65%, the proportion of single-copy genes is 15.18%, the proportion of multi-copy genes is 13.76%, the proportion of gaps/deletions is 1.89%, and the proportion of incomplete fragments is 17.45%; constructing a circular map of the de novo assembled core genome of Dunaliella D. quartolecta, and then performing gene component, protein functional annotation and genome contig collinearity analysis on the core genome sequence of Dunaliella D. quartolecta;
(6) taking the core genome sequence of Dunaliella D. quartolecta constructed in step (5) as reference, aligning against it the whole genome sequencing data of the algal strain to be tested obtained in step (4) and the published genome sequencing data of representative algae, detecting single nucleotide polymorphisms and insertion/deletion sites between species, and constructing a phylogenetic tree from the single nucleotide polymorphisms, wherein when the algal strain to be tested clusters with Dunaliella D. quartolecta with a branch support rate of 0.99-1.00 and a genetic similarity percentage of ≥ 99%, the algal strain to be tested is Dunaliella D. quartolecta.
2. The method for strain identification based on a Dunaliella alga core genome sequence according to claim 1, wherein the specific steps of constructing the DNA sequencing library in the step (3) are as follows: breaking the whole genome DNA by using a strong-grade ultrasonic wave band of 80-100W for 6sec, repeating the breaking for 1 time every 3sec, carrying out ultrasonic treatment for 5 times in total, and setting breaking parameters to be 300-400 bp; carrying out agarose gel electrophoresis on the fragments, and recovering 300-400 bp target fragments by using the agarose gel; adsorbing and recovering the target fragments by using silicon-based magnetic beads, and detecting the quality of the adsorbed and recovered target fragments by using a fluorescence quantitative instrument; DNA end repair, adding A at the 3' end; adding a joint for a connection reaction, and purifying, converting and PCR verifying a connection product; and (3) carrying out single-stranded DNA cyclization reaction on the positive product after the positive product is denatured at 95 ℃ for 20sec, and purifying the product to construct a whole genome DNA sequencing library for use in the computer.
3. The method for strain identification based on a Dunaliella core genome sequence according to claim 1, wherein the specific steps of obtaining the Dunaliella core genome sequence after screening, assembly and quality evaluation in step (5) are as follows: screening high-quality sequences from the sequencing platform, taking reads with a sequencing depth of 50-80X, an average length of 12-15 kb and an N50 length greater than 18 kb as query sequences; mapping the query sequences back onto a reported Dunaliella salina reference genome using SOAPaligner or BWA software, and further screening the reads with sequence identity of at least 90% and an alignment E-value below 1E-10 as candidate data for the core sequence of the Dunaliella (D. quartolecta) genome; aligning all remaining reads against the candidate data set to obtain the overlapping regions between the aligned data; error-correcting and polishing the alignment results using Falcon or Pilon software, and assembling contigs using SOAPdenovo 2.04, MECAT, HERA or Canu software; determining the order of the contigs using BySS 2.2.3, Velvet 1.2.10 or ABySS 2.2.3 software; calculating whole-genome coverage using BAMStats or GATK DepthOfCoverage software, and screening contigs with reference genome coverage of at least 50% and a continuous arrangement number of at least 2000; evaluating the assembly quality of the screened contigs using BUSCO 2.0 or QUAST software, and selecting an assembly with a complete gene ratio of at least 20%, a single-copy gene ratio of 15%, a multi-copy gene ratio of at least 12% and a missing/gap ratio of at most 3% as the core genome sequence of Dunaliella (D. quartolecta); and constructing a circular map of the core genome of this species using Circos software.
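The numeric screens in claim 3 can be illustrated with a minimal Python sketch. It is not the patented pipeline: the `Alignment` record and the function names are hypothetical, the inputs are assumed to have been parsed already from the aligner and assessment tool outputs, and the single-copy threshold is treated as "at least 15%" although the claim states 15% without a comparator.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one read aligned to the reference genome."""
    read_id: str
    identity: float  # percent identity to the reference
    evalue: float    # alignment E-value


def select_candidates(alignments, min_identity=90.0, max_evalue=1e-10):
    """Keep reads with identity >= 90% and E-value < 1E-10 (claim 3 thresholds)."""
    return [a.read_id for a in alignments
            if a.identity >= min_identity and a.evalue < max_evalue]


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def passes_quality(complete, single_copy, multi_copy, missing):
    """Check assembly percentages against the claim 3 cut-offs.

    Assumption: single_copy is compared with >= 15.0; the claim gives no comparator.
    """
    return (complete >= 20.0 and single_copy >= 15.0
            and multi_copy >= 12.0 and missing <= 3.0)
```

For example, `n50` applied to the read lengths can be compared with the 18 kb cut-off before mapping, and `passes_quality` can be applied to the percentages reported by the assessment software.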
4. The method for strain identification based on a Dunaliella core genome sequence according to claim 1, wherein the gene composition analysis, protein function annotation and genome contig collinearity analysis performed on the Dunaliella core genome sequence in step (5) comprise the following steps: predicting CDS in the assembly data using AUGUSTUS 3.3.3, ESTScan 3.0.1, TransDecoder 2.0.1 or Prodigal 2.6.1 software; analyzing repeat sequences in the assembly data using RepeatMasker 4.0.9, RepeatProteinMask 3.2.2, LTR_FINDER, PILER 1.0.6 or RepeatScout 1.0.5 software; aligning the protein sequences encoded by the CDS to the NR database using DIAMOND 0.9.14 or BLASTX software and annotating their functions; and, after aligning the predicted protein sequences with BLASTp, carrying out genome collinearity analysis using MCScanX, LAST, Mugsy, Spines or progressive analytical software.
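As an illustration of the functional annotation step in claim 4, the sketch below picks one best database hit per predicted protein. It assumes the alignments were exported in the standard 12-column tab-separated format (BLAST `-outfmt 6`, which DIAMOND also writes by default). The function name and the `max_evalue` cut-off are hypothetical; the claim does not specify an E-value for annotation.

```python
import csv
import io

# Column order of the standard 12-column tabular alignment format.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query_id: subject_id} keeping the highest-bitscore hit per query.

    max_evalue is an illustrative assumption, not a value from the claims.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

The subject identifiers returned here would then be looked up in the NR database descriptions to assign a putative function to each CDS.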
5. The method for strain identification based on a Dunaliella core genome sequence according to claim 1, wherein the specific steps of constructing the phylogenetic tree using single nucleotide polymorphisms in step (6) are as follows: aligning the genome data of the algal strain to be identified and of 5-6 representative algae reported in the NCBI database against the Dunaliella (D. quartolecta) core genome sequence assembled in step (5), using LASTZ 1.02.00 or Mauve 2.3.1 software; extracting, from the aligned collinear blocks, the genotype of each species corresponding to the D. quartolecta genome; merging, extracting and filtering the genotype information of all species with the D. quartolecta core genome as the template, and detecting single nucleotide polymorphism data and insertion/deletion site data using BWA 0.7.17 software; and, based on the single nucleotide polymorphism data, constructing a phylogenetic tree using the maximum likelihood algorithm in EasySpeciesTree 1.0, MEGA 5.0, TreeBeST 1.9.2, PHYLIP, Puzzle 5.2 or PHYLO-WIN software, thereby determining the phylogenetic relationship between the algal strain to be identified and Dunaliella (D. quartolecta).
6. The method of claim 5, wherein the missing-data rate permitted by the filtering is no greater than 20%.
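The site filtering referred to in claims 5 and 6 can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes the genotypes have already been merged into equal-length aligned strings keyed by species name, reads the 20% limit as a per-site missing-data rate, and treats `-`, `N` and `?` as missing. The function name is hypothetical.

```python
def filter_snp_sites(alignment, max_missing=0.20, missing_chars="-N?"):
    """Keep variable sites whose missing-data rate is <= max_missing.

    alignment: dict mapping species name -> aligned genotype string
               (assumed to be of equal length for all species).
    Returns a dict with the same keys and only the retained SNP columns.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    kept = []
    for i in range(length):
        column = [seq[i] for seq in seqs]
        missing = sum(base in missing_chars for base in column)
        if missing / len(column) > max_missing:
            continue  # too much missing data at this site
        alleles = {base for base in column if base not in missing_chars}
        if len(alleles) >= 2:
            kept.append(i)  # at least two observed alleles: a SNP site

    return {name: "".join(seq[i] for i in kept)
            for name, seq in zip(names, seqs)}
```

The resulting SNP matrix could then be exported in whatever format the chosen tree-building software accepts for maximum likelihood analysis.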
CN202011238521.2A 2020-11-09 2020-11-09 Method for strain identification based on Dunaliella core genome sequence Active CN112349350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011238521.2A CN112349350B (en) 2020-11-09 2020-11-09 Method for strain identification based on Dunaliella core genome sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011238521.2A CN112349350B (en) 2020-11-09 2020-11-09 Method for strain identification based on Dunaliella core genome sequence

Publications (2)

Publication Number Publication Date
CN112349350A CN112349350A (en) 2021-02-09
CN112349350B true CN112349350B (en) 2022-07-19

Family

ID=74428639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011238521.2A Active CN112349350B (en) 2020-11-09 2020-11-09 Method for strain identification based on Dunaliella core genome sequence

Country Status (1)

Country Link
CN (1) CN112349350B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160893B (en) * 2021-06-09 2022-08-19 中国科学院昆明植物研究所 Mining plant ITSs sequence from second generation sequencing data and using the same for identifying variety families
CN113549620B (en) * 2021-07-13 2022-09-23 山西大学 Multi-type Dunaliella salt stress response miRNAs and application thereof
CN114664379A (en) * 2022-04-12 2022-06-24 桂林电子科技大学 Third generation sequencing data self-correction error correction method based on deep learning
CN115810393B (en) * 2022-12-22 2023-08-25 南京普恩瑞生物科技有限公司 Sequencing sample homology detection method and system based on SNPs library of construction crowd
CN116705155A (en) * 2023-08-03 2023-09-05 海南大学三亚南繁研究院 Definition method of whole-gene DNA data

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2013177615A1 (en) * 2012-06-01 2013-12-05 Agriculture Victoria Services Pty Ltd Selection of symbiota by screening multiple host-symbiont associations
CN106282330A (en) * 2015-12-02 2017-01-04 香港中文大学深圳研究院 A kind of method developing Caulis et Folium Ammopiptanthi Mongolici Plant Genome simple repeated sequence molecular marker
WO2018190170A1 (en) * 2017-04-12 2018-10-18 花王株式会社 Method for improving resistance to nitrate substrate analogue in microalga

Family Cites Families (19)

Publication number Priority date Publication date Assignee Title
CN101504697B (en) * 2008-12-12 2010-09-08 深圳华大基因研究院 Construction method and system for genome sequencing equipment and its fragment connection stand
WO2011143231A2 (en) * 2010-05-10 2011-11-17 The Broad Institute High throughput paired-end sequencing of large-insert clone libraries
US9506167B2 (en) * 2011-07-29 2016-11-29 Ginkgo Bioworks, Inc. Methods and systems for cell state quantification
WO2013170235A1 (en) * 2012-05-11 2013-11-14 University Of Hawaii Ultrasound mediated delivery of substances to algae
US10777301B2 (en) * 2012-07-13 2020-09-15 Pacific Biosciences For California, Inc. Hierarchical genome assembly method using single long insert library
WO2016192772A1 (en) * 2015-06-02 2016-12-08 Siemens Healthcare Gmbh Genetic testing for predicting resistance of shigella species against antimicrobial agents
WO2017012659A1 (en) * 2015-07-22 2017-01-26 Curetis Gmbh Genetic testing for predicting resistance of salmonella species against antimicrobial agents
WO2017016600A1 (en) * 2015-07-29 2017-02-02 Curetis Gmbh Genetic testing for predicting resistance of enterobacter species against antimicrobial agents
WO2017117633A1 (en) * 2016-01-07 2017-07-13 Commonwealth Scientific And Industrial Research Organisation Plants with modified traits
CN107190003A (en) * 2017-06-09 2017-09-22 武汉天问生物科技有限公司 A kind of method of efficient quick separating T DNA insertion point flanking sequences and application thereof
CN111052250A (en) * 2017-06-28 2020-04-21 西奈山伊坎医学院 High resolution microbiological analysis method
CN110042148B (en) * 2018-01-16 2023-01-31 深圳华大基因科技有限公司 Method for efficiently acquiring chloroplast DNA sequencing data and application thereof
CN108034706B (en) * 2018-01-16 2021-03-26 浙江大学 Method for rapidly determining insertion site of transgenic strain by using re-sequencing technology
US11913006B2 (en) * 2018-03-16 2024-02-27 Nuseed Global Innovation Ltd. Plants producing modified levels of medium chain fatty acids
CN109295185B (en) * 2018-09-05 2022-03-22 暨南大学 Method for determining genome size of unicellular eukaryotic algae
CN109355410A (en) * 2018-10-30 2019-02-19 厦门极元科技有限公司 A method of identification and parting are carried out to the salmonella in macro genome based on the analysis of two generation sequencing datas
CN111276185B (en) * 2020-02-18 2023-11-03 上海桑格信息技术有限公司 Microorganism identification analysis system and device based on second-generation high-throughput sequencing
CN111363706A (en) * 2020-04-13 2020-07-03 天津中医药大学 Ecliptae herba endophytic bacteria, eclipta alba composition and application thereof
CN111647680A (en) * 2020-06-18 2020-09-11 北京市园林科学研究院 Method for rapidly identifying and tracing sedge variety at whole genome level based on second-generation high-throughput sequencing

Also Published As

Publication number Publication date
CN112349350A (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231108

Address after: No. 9 Fulong Road, Shinan District, Qingdao, Shandong Province, 266000, 317

Patentee after: Qingdao Aixin Biotechnology Co.,Ltd.

Address before: 030006 No. 92, Wucheng Road, Taiyuan, Shanxi

Patentee before: SHANXI University