CN113593645A - cDNA library gene sequence frame shift judgment method - Google Patents

Publication number
CN113593645A
Authority
CN
China
Prior art keywords
sequence
cdna
input
input sequence
initial position
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110878793.7A
Other languages
Chinese (zh)
Inventor
张萍萍
公光业
肖云平
李晖
林博
殷昊
赵仕兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Oe Biotech Co ltd
Original Assignee
Shanghai Oe Biotech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-02
Filing date
2021-08-02
Publication date
2021-11-02
Application filed by Shanghai Oe Biotech Co ltd
Priority to CN202110878793.7A
Publication of CN113593645A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for judging frameshifts in cDNA library gene sequences, belonging to the technical field of gene analysis. The method obtains, in batch, the target sequences that match the cDNA sequences to be aligned and detects from the alignment position number whether each cDNA is frameshifted in the vector. Because the alignment runs against a local database, the analysis is not limited by maintenance of the source database, network speed and similar factors, which greatly improves the efficiency of gene alignment analysis.

Description

cDNA library gene sequence frame shift judgment method
Technical Field
The invention relates to the technical field of gene analysis, in particular to a method for judging cDNA library gene sequence frameshifting.
Background
A cDNA library is a type of gene library: a collection of clones formed by reverse-transcribing all of the mRNA expressed by an organism at a given developmental stage into cDNA, ligating the cDNA fragments into a vector, and transferring them into recipient cells. Unlike genomic DNA, which contains introns and is therefore difficult to express correctly, cDNA is convenient for cloning and large-scale amplification; a desired target gene can be screened from a cDNA library and used directly for expression and transgenic studies. The construction and screening of cDNA libraries has become an important method in functional genomics and is one of the basic tools for discovering new genes and studying gene function.
cDNA library construction generally exploits the poly(A) tail of mRNA: an oligo(dT) primer is used for reverse transcription, which proceeds from the 3' end toward the 5' end of the mRNA, to obtain single-stranded cDNA (sscDNA). After second-strand synthesis and linker ligation, the cDNA is inserted into the corresponding vector. Because the position at which reverse transcription terminates cannot be precisely controlled, there is no guarantee that the cDNA inserted into the vector encodes the protein in the correct reading frame. Consequently, in gene screening experiments that use a cDNA library (such as yeast two-hybrid cDNA library screening or subtractive library screening), it cannot be determined directly whether the cDNA of a positive clone correctly encodes the protein in the vector. The cDNA sequence must first be confirmed, for example by first-generation sequencing, and the sequencing results must then be analyzed one by one to determine whether the insert is frameshifted. This process is tedious, labor-intensive and error-prone.
Disclosure of Invention
The invention aims to provide a cDNA library gene sequence frameshift judgment method, which is simple in process and can greatly improve the working efficiency.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides a method for judging frameshifts in cDNA library gene sequences, comprising the following steps:
1) converting the cDNA sequences to be aligned into FASTA format, identifying and removing the linker sequence, and extracting the cDNA sequence to obtain an input sequence;
2) constructing a local database comprising candidate proteins;
3) aligning the input sequence from step 1) against the local database from step 2) using blastx, and taking the gene with the highest matching rate as the target sequence;
4) obtaining the frameshift status of the cDNA from the alignment position information of the input sequence from step 1) and the target sequence from step 3);
the frameshift status of the cDNA is judged as follows:
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus one, the cDNA is not frameshifted;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus two, the cDNA is frameshifted by one base;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three, the cDNA is frameshifted by two bases;
steps 1) and 2) may be performed in either order.
Preferably, step 4) further comprises determining the degree of match between the input sequence and the target sequence by comparing the start and end positions of the input sequence, the start and end positions of the target sequence, and the gap and mismatch information;
the criteria for the degree of match are as follows: when the input sequence starts at position 1 and ends at its full length, and the target sequence starts at position 1 and ends at its full length, with 0 mismatches and 0 gaps, the input sequence matches the target sequence completely;
when the input sequence starts at position 1 and ends before its full length, and the target sequence starts at position 1 and ends before its full length, with 0 mismatches and 0 gaps, the 5' end of the input sequence matches the 5' end of the target sequence completely;
when the input sequence starts at position 1 and ends at its full length, and the target sequence does not start at position 1 but ends at its full length, with 0 mismatches and 0 gaps, the 3' end of the input sequence matches the 3' end of the target sequence completely;
when the input sequence does not start at position 1 and ends before its full length, and the target sequence does not start at position 1 and ends before its full length, with N mismatches and N gaps (N being an integer greater than or equal to 0), the input sequence and the target sequence match incompletely.
Preferably, in step 3), the threshold value of the alignment is 1e-5 or 1e-10.
Preferably, in step 1), the software used to convert the cDNA sequences to be aligned into FASTA format includes the sequence processing software seqtk.
Preferably, in step 1), the software used to identify and remove the linker sequence includes the substr function of awk.
Preferably, step 1) further comprises, before converting the cDNA sequences to be aligned into FASTA format, reformatting them so that each sequence is displayed on a single line.
The invention provides a method for judging frameshifts in cDNA library gene sequences, comprising the following steps: converting the cDNA sequences to be aligned into FASTA format, identifying and removing the linker sequence, and extracting the cDNA sequence to obtain an input sequence; constructing a local database comprising candidate proteins; aligning the input sequence against the local database and taking the gene with the highest matching rate as the target sequence; and judging the frameshift status of the cDNA from the alignment position information of the input sequence and the target sequence. The frameshift status is judged as follows: when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus one, the cDNA is not frameshifted; when it is a multiple of three plus two, the cDNA is frameshifted by one base; when it is a multiple of three, the cDNA is frameshifted by two bases. The method obtains, in batch, the target sequences that match the cDNA sequences to be aligned and detects from the position number whether each cDNA is frameshifted in the vector. Because the alignment runs against a local database, the analysis is not limited by maintenance of the source database, network speed and similar factors, which greatly improves the efficiency of gene alignment analysis.
Drawings
FIG. 1 shows the results of rapid batch alignment analysis of the best matched gene in cDNA database;
FIG. 2 is a diagram of the overall framework of the method for performing rapid alignment of gene data and analyzing whether a gene is frameshifted according to example 2 of the present invention;
FIG. 3 shows the results of rapid batch alignment analysis of the best matched gene in the database and analysis of the frameshift of cDNA in the library vector.
Detailed Description
The invention provides a method for judging frameshifts in cDNA library gene sequences, comprising the following steps:
1) converting the cDNA sequences to be aligned into FASTA format, identifying and removing the linker sequence, and extracting the cDNA sequence to obtain an input sequence;
2) constructing a local database comprising candidate proteins;
3) aligning the input sequence from step 1) against the local database from step 2) using blastx, and taking the gene with the highest matching rate as the target sequence;
4) obtaining the frameshift status of the cDNA from the alignment position information of the input sequence from step 1) and the target sequence from step 3);
the frameshift status of the cDNA is judged as follows:
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus one, the cDNA is not frameshifted;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus two, the cDNA is frameshifted by one base;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three, the cDNA is frameshifted by two bases;
steps 1) and 2) may be performed in either order.
The invention converts the cDNA sequences to be aligned into FASTA format, identifies and removes the linker sequence, and extracts the cDNA sequence to obtain the input sequence.
In the present invention, the software used to convert the cDNA sequences to be aligned into FASTA format preferably includes the sequence processing software seqtk.
In the present invention, the software used to identify and remove the linker sequence preferably includes the substr function of awk.
In the present invention, before the cDNA sequences to be aligned are converted into FASTA format, they are preferably reformatted so that each sequence is displayed on a single line. The software used for this reformatting includes the sequence processing software seqtk.
In a specific implementation of the invention, seqtk is used to reformat all cDNA sequences to be aligned so that each sequence occupies one line, which prevents a linker that spans a line break from failing to match, and the gene sequences to be processed are converted in batch from seq format into FASTA format.
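As a rough illustration of the one-line-per-sequence step, the sketch below joins wrapped FASTA records so that each sequence occupies a single line. It is a plain-Python stand-in, not the seqtk invocation used in the invention; the function name `linearize_fasta` and the sample records are hypothetical.

```python
def linearize_fasta(text):
    """Return FASTA text in which every sequence occupies exactly one line.

    Hypothetical helper: a stand-in for the seqtk reformatting step.
    """
    records = []  # list of (header, [sequence chunks])
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)  # wrapped sequence line
    body = "\n".join(f"{header}\n{''.join(chunks)}" for header, chunks in records)
    return body + "\n"


# Made-up records whose sequences are wrapped over two lines each.
wrapped = ">clone_1\nACAAGTTT\nTGTACAAA\n>clone_2\nGGCC\nTTAA\n"
print(linearize_fasta(wrapped), end="")
```

With every sequence on one line, a linker that would otherwise straddle a line break can be found with a single substring search.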
The method can identify any linker sequence and extract the sequence downstream of the linker in batch. Specifically, given a specified linker sequence, the substr function of awk extracts the entire sequence starting from (and including) the first base after the linker; that is, only the linker sequence and the sequence upstream of it are removed, leaving the complete cDNA sequence. The extraction start position is Y = X + L, where X is the position of the first base of the linker sequence and L is the length of the linker sequence.
For example:
1) the linker sequence is ACAAGTTTTGTACAAAAAGTTGGX (SEQ ID NO.1, where X is a variable base that may be any one of A, T, C or G), and the length of the linker sequence is 24 (including X);
2) if the linker position (i.e., the position of the first base of the linker sequence within the entire sequence) obtained by the substr function of awk is 80, the start position of the sequence to be extracted is 80 + 24 = 104. Position 104 of the entire sequence is the first base of the CDS sequence actually required.
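The position arithmetic Y = X + L in this example can be sketched in a few lines. This is a minimal Python illustration and not the awk script of the invention; the function name `trim_linker`, the use of a regular expression for the variable base X, and the toy sequence are assumptions made for the example.

```python
import re


def trim_linker(sequence, linker="ACAAGTTTTGTACAAAAAGTTGGX"):
    """Return (Y, insert), where Y = X + L is the 1-based start of the cDNA insert.

    'X' in the linker stands for any one of A, T, C or G.
    Returns None when the linker is not found.
    """
    pattern = linker.replace("X", "[ATCG]")
    match = re.search(pattern, sequence.upper())
    if match is None:
        return None
    x = match.start() + 1       # X: 1-based position of the first linker base
    y = x + len(linker)         # Y = X + L
    return y, sequence[y - 1:]  # everything from position Y onward


# Toy read: 79 upstream bases, the 24-base linker (X = A here), then the insert.
read = "G" * 79 + "ACAAGTTTTGTACAAAAAGTTGGA" + "ATGGCC"
print(trim_linker(read))  # (104, 'ATGGCC')
```

The linker begins at position 80, so the insert begins at position 80 + 24 = 104, matching the worked numbers in the text.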
The present invention constructs a local database comprising candidate proteins.
In the present invention, the local database is preferably constructed with makeblastdb from BLAST. The amino acid sequences of the candidate proteins are selected according to the alignment requirements. The candidate proteins are not particularly limited and may be the proteins of a species, a protein family, or a single protein of interest. The data source of the local database is likewise not particularly limited: the sequences may come from NCBI, UniProt or other public protein databases, or may be custom protein sequences or corrected and modified protein sequences. In a specific implementation of the invention, a customized local database can be constructed for accurate alignment.
After the local database and the input sequence are obtained, the invention aligns the input sequence against the local database using blastx; the output sequence is the target sequence.
In a specific implementation of the invention, the E-value threshold is set to 1e-5 when the species under study is uncommon and few of its genes are in the database, and to 1e-10 when the species is common, its genome and transcriptome have been deeply sequenced, and many of its genes are in the database. A looser threshold yields more alignment results; a stricter threshold yields fewer. Hits that do not meet the threshold do not appear in the output.
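The database construction and alignment steps might be driven from a script along the following lines. This sketch only assembles the command lines; the file names, the helper names and the choice of tabular output (`-outfmt 6`) are assumptions for illustration, and the commands would only run on a machine with NCBI BLAST+ installed.

```python
def makeblastdb_command(protein_fasta, db_name):
    """Command line for building a local protein database with makeblastdb."""
    return ["makeblastdb", "-in", protein_fasta, "-dbtype", "prot", "-out", db_name]


def blastx_command(query_fasta, db_name, out_path, well_studied_species):
    """Command line for blastx with the threshold chosen as described above.

    1e-10 for a well-studied species with many database entries, else 1e-5.
    """
    evalue = "1e-10" if well_studied_species else "1e-5"
    return ["blastx", "-query", query_fasta, "-db", db_name,
            "-evalue", evalue, "-outfmt", "6", "-out", out_path]


# Hypothetical file names, shown only to illustrate the call shape.
print(" ".join(makeblastdb_command("candidates.faa", "candidates_db")))
print(" ".join(blastx_command("inputs.fasta", "candidates_db", "hits.tsv", False)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once BLAST+ is available on the PATH.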
In the present invention, the alignment parameters include Identity, Gap, Align_length and E_value. Hits are scored according to these parameters, and the sequence with the highest score is the output sequence, i.e., the target sequence.
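Selecting the top-scoring hit per input sequence could look like the sketch below. It assumes BLAST tabular output with the default twelve columns of `-outfmt 6` and uses the bit score as a stand-in for the combined score; the text does not specify how Identity, Gap, Align_length and E_value are weighted, so that choice, the function name and the sample rows are all assumptions.

```python
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def best_hits(tabular_text):
    """Map each query ID to its highest-bitscore hit (assumed ranking rule)."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Two made-up hits for the same input sequence.
sample = (
    "clone_1\tP1\t98.5\t66\t1\t0\t1\t198\t1\t66\t1e-40\t130\n"
    "clone_1\tP2\t80.0\t60\t12\t0\t4\t183\t5\t64\t1e-20\t85.1\n"
)
top = best_hits(sample)["clone_1"]
print(top["sseqid"], top["qstart"])  # P1 1
```

The `qstart` value of the retained hit is the position number that the frameshift rule operates on.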
After the target sequence is obtained, the frameshift status of the cDNA is judged from the alignment position information of the input sequence and the target sequence;
the frameshift status of the cDNA is judged as follows:
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus one, the cDNA is not frameshifted;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus two, the cDNA is frameshifted by one base;
and when the start position of the input sequence in its alignment with the target sequence is a multiple of three, the cDNA is frameshifted by two bases.
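The three cases above reduce to a single modulo operation on the 1-based start position. The following is a minimal sketch; the function name is hypothetical, and it assumes the hit lies on the forward strand of the input sequence (reverse-strand hits are not addressed in the text).

```python
def frameshift(query_start):
    """Return 0, 1 or 2 for a 1-based alignment start on the input sequence.

    3k + 1 -> 0 (in frame), 3k + 2 -> 1 (shifted one base), 3k -> 2 (shifted two).
    """
    if query_start < 1:
        raise ValueError("alignment start positions are 1-based")
    return (query_start - 1) % 3


for start in (1, 2, 3, 4, 104):
    print(start, frameshift(start))
```

A result of 0 corresponds to "not frameshifted", and 1 or 2 to a shift of one or two bases.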
In the present invention, the degree of match between the input sequence and the target sequence is preferably also determined from the alignment position information of the two sequences. The criteria for determining the degree of match are as follows:
the start and end positions of the input sequence, the start and end positions of the target sequence, and the gap and mismatch information are compared;
when the input sequence starts at position 1 and ends at its full length, and the target sequence starts at position 1 and ends at its full length, with 0 mismatches and 0 gaps, the input sequence is judged to match the target sequence completely;
when the input sequence starts at position 1 and ends before its full length, and the target sequence starts at position 1 and ends before its full length, with 0 mismatches and 0 gaps, the 5' end of the input sequence is judged to match the target sequence completely;
when the input sequence starts at position 1 and ends at its full length, and the target sequence does not start at position 1 but ends at its full length, with 0 mismatches and 0 gaps, the 3' end of the input sequence is judged to match the target sequence completely;
when the input sequence does not start at position 1 and ends before its full length, and the target sequence does not start at position 1 and ends before its full length, with N mismatches and N gaps (N being an integer greater than or equal to 0), the input sequence and the target sequence match incompletely, and the specific matching rate is output according to a sequence similarity algorithm.
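The four criteria can be written as a small decision function. This is a sketch only: the function and label names are hypothetical, and any coordinate combination that the text does not enumerate is assumed here to fall into the "incomplete match" category.

```python
def classify_match(q_start, q_end, q_len, s_start, s_end, s_len, mismatches, gaps):
    """Classify a hit from input (q) and target (s) coordinates, 1-based."""
    clean = mismatches == 0 and gaps == 0
    if clean and q_start == 1 and q_end == q_len and s_start == 1 and s_end == s_len:
        return "complete match"
    if clean and q_start == 1 and q_end < q_len and s_start == 1 and s_end < s_len:
        return "5' end complete match"
    if clean and q_start == 1 and q_end == q_len and s_start != 1 and s_end == s_len:
        return "3' end complete match"
    # Assumption: everything else is reported as an incomplete match.
    return "incomplete match"


print(classify_match(1, 300, 300, 1, 100, 100, 0, 0))   # complete match
print(classify_match(1, 150, 300, 1, 50, 100, 0, 0))    # 5' end complete match
print(classify_match(1, 300, 300, 21, 120, 120, 0, 0))  # 3' end complete match
print(classify_match(4, 150, 300, 5, 60, 100, 3, 1))    # incomplete match
```

In practice the two lengths would come from the input FASTA and the local database, since the default tabular output does not include them.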
The technical solution of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The cDNA sequences were aligned as follows:
(1) uploading the sequences to be aligned and the gene database to a host;
(2) converting the sequences to be aligned into FASTA format;
(3) aligning the sequences to be aligned against the reference gene or protein sequences in the local database;
(4) outputting the target gene that matches each gene to be aligned, according to the matching scores of the candidate genes against the gene to be aligned. The alignment results are shown in FIG. 1.
Example 2
The cDNA sequences were aligned according to the following procedure; the flow chart is shown in FIG. 2:
(1) uploading the sequences to be aligned and the gene database to a host;
(2) converting the sequences to be aligned into FASTA format;
(3) setting the display mode of the sequences to be aligned to one line per sequence;
(4) setting the library linker sequence;
(5) searching for the linker in each sequence to be aligned and deleting the linker sequence together with the sequence upstream of it;
(6) translating the linker-free cDNA sequence into an amino acid sequence, aligning it against the local protein database, and analyzing the matching rate;
(7) outputting the gene with the highest alignment score as the optimal result, the optimal result being the target gene;
(8) calculating, according to the preset frameshift judgment rule, whether the sequence to be aligned is frameshifted when expressed in the vector;
(9) outputting the frameshift status of each cDNA to be aligned in the vector, together with detailed information on the aligned target gene.
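Step (6) of the procedure above relies on translating nucleotides into amino acids, which is also why the alignment start position modulo three identifies the reading frame. The sketch below translates a sequence from a chosen 1-based start using the standard genetic code; it illustrates the idea only and is not the translation performed internally by blastx. The function name and toy sequences are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start=1):
    """Translate dna from the 1-based position start; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(start - 1, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCC", 1))   # MA
print(translate("AATGGCC", 2))  # MA (one extra upstream base shifts the frame)
print(translate("AATGGCC", 1))  # NG (read in the vector frame instead)
```

In the second sequence the protein-coding frame starts at position 2, a multiple of three plus two, which the judgment rule reports as a shift of one base.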
Comparative example 1
The procedure was consistent with Example 2 except with respect to the following steps: setting the sequence display format, setting the library linker sequence, removing the linker, and calculating the position number of the first matched base between the sequence to be aligned and the target gene.
Comparative example 2
By adding the steps of setting the sequence display format, setting the library linker and removing the linker sequence, and calculating the position number of the first matched base between the sequence to be aligned and the target gene, the method analyzes whether the cDNA is frameshifted in the vector at the same time as performing the rapid batch alignment of Example 1. For a different cDNA library vector, the same effect is achieved simply by replacing the linker sequence in the script, which is simple and fast. The alignment results are shown in FIG. 3, where the frameshift column (highlighted in yellow) gives the frameshift judgment. A value of 0 in the frameshift column indicates that the cDNA is not frameshifted in the vector, i.e., the inserted gene is expressed correctly; a value of 1 or 2 indicates that the cDNA is frameshifted in the vector and the construct needs to be rebuilt.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for judging frameshifts in cDNA library gene sequences, comprising the following steps:
1) converting the cDNA sequences to be aligned into FASTA format, identifying and removing the linker sequence, and extracting the cDNA sequence to obtain an input sequence;
2) constructing a local database comprising candidate proteins;
3) aligning the input sequence from step 1) against the local database from step 2) using blastx, and taking the gene with the highest matching rate as the target sequence;
4) obtaining the frameshift status of the cDNA from the alignment position information of the input sequence from step 1) and the target sequence from step 3);
the frameshift status of the cDNA is judged as follows:
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus one, the cDNA is not frameshifted;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three plus two, the cDNA is frameshifted by one base;
when the start position of the input sequence in its alignment with the target sequence is a multiple of three, the cDNA is frameshifted by two bases;
steps 1) and 2) may be performed in either order.
2. The method of claim 1, wherein step 4) further comprises determining the degree of match between the input sequence and the target sequence by comparing the start and end positions of the input sequence, the start and end positions of the target sequence, and the gap and mismatch information;
the criteria for the degree of match are as follows: when the input sequence starts at position 1 and ends at its full length, and the target sequence starts at position 1 and ends at its full length, with 0 mismatches and 0 gaps, the input sequence matches the target sequence completely;
when the input sequence starts at position 1 and ends before its full length, and the target sequence starts at position 1 and ends before its full length, with 0 mismatches and 0 gaps, the 5' end of the input sequence matches the 5' end of the target sequence completely;
when the input sequence starts at position 1 and ends at its full length, and the target sequence does not start at position 1 but ends at its full length, with 0 mismatches and 0 gaps, the 3' end of the input sequence matches the 3' end of the target sequence completely;
when the input sequence does not start at position 1 and ends before its full length, and the target sequence does not start at position 1 and ends before its full length, with N mismatches and N gaps (N being an integer greater than or equal to 0), the input sequence and the target sequence match incompletely.
3. The method of claim 1, wherein in step 3), the threshold value of the alignment is 1e-5 or 1e-10.
4. The method of claim 1, wherein in step 1), the software used to convert the cDNA sequences to be aligned into FASTA format comprises the sequence processing software seqtk.
5. The method of claim 1, wherein in step 1), the software used to identify and remove the linker sequence comprises the substr function of awk.
6. The method of claim 1, wherein step 1) further comprises, before converting the cDNA sequences to be aligned into FASTA format, reformatting them so that each sequence is displayed on a single line.
CN202110878793.7A 2021-08-02 2021-08-02 cDNA library gene sequence frame shift judgment method Pending CN113593645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110878793.7A CN113593645A (en) 2021-08-02 2021-08-02 cDNA library gene sequence frame shift judgment method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110878793.7A CN113593645A (en) 2021-08-02 2021-08-02 cDNA library gene sequence frame shift judgment method

Publications (1)

Publication Number Publication Date
CN113593645A true CN113593645A (en) 2021-11-02

Family

ID=78253471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110878793.7A Pending CN113593645A (en) 2021-08-02 2021-08-02 cDNA library gene sequence frame shift judgment method

Country Status (1)

Country Link
CN (1) CN113593645A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149743A (en) * 2007-11-09 2008-03-26 中国水产科学研究院黑龙江水产研究所 DNA sequencing polluted sequence batch treating tool
CN110541001A (en) * 2019-09-20 2019-12-06 福建上源生物科学技术有限公司 Gene knock-out method combining precise large-fragment gene deletion with stop codon insertion
CN110993023A (en) * 2019-11-29 2020-04-10 北京优迅医学检验实验室有限公司 Detection method and detection device for complex mutation
CN111653313A (en) * 2020-05-25 2020-09-11 中国人民解放军海军军医大学第三附属医院 Variant sequence annotation method
CN112017729A (en) * 2020-08-10 2020-12-01 浙江大学 Method and device for quickly annotating bacterial DNA sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211102