CN114627967A - Method for accurately annotating three-generation full-length transcript - Google Patents

Publication number
CN114627967A
Authority
CN
China
Prior art keywords
transcript
information
annotation
transcripts
classification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210252816.8A
Other languages
Chinese (zh)
Inventor
张函槊
张成胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genex Health Co Ltd
Original Assignee
Genex Health Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-03-15
Filing date
2022-03-15
Publication date
2022-06-14
Application filed by Genex Health Co Ltd
Priority to CN202210252816.8A
Publication of CN114627967A

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 - Sequence alignment; Homology search
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 - Ontologies; Annotations
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 - Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for accurately annotating third-generation full-length transcripts. The method comprises the following steps: obtaining corrected transcript structure information; parsing a human reference genome annotation file and extracting specific gene-transcript information as reference information; analyzing the corrected transcript structure information to obtain preliminary classification information, and subjecting abnormal-type transcripts to conversion processing to obtain normal-type data; annotating the normal-type data to obtain specific gene-transcript annotations and final transcript classification information; and collating the annotation information and the classification information to obtain accurate annotations of the third-generation full-length transcripts. Experiments show that annotation accuracy is significantly improved when the method provided by the invention is used to annotate third-generation full-length transcripts. The invention has important application value.

Description

Method for accurately annotating third-generation full-length transcript
Technical Field
The invention belongs to the field of bioinformatics, and particularly relates to a method for accurately annotating a third-generation full-length transcript.
Background
Full-length transcript sequencing is a technique for obtaining the full-length sequence of mRNA using third-generation sequencing. Compared with second-generation sequencing, third-generation sequencing offers long read lengths: a single read can span the entire length of most transcripts, yielding complete transcript sequence information and avoiding the errors introduced by assembling short second-generation reads. Full-length transcript sequencing therefore has clear advantages.
After the sequencing data are obtained, transcript structure data can be derived using alignment software and structure analysis software, and the transcripts are then annotated. Because third-generation sequencing technology is relatively new, little software is available for annotating full-length transcript sequencing data, and existing software (such as SQANT) can annotate transcripts but with low accuracy.
Disclosure of Invention
The purpose of the invention is to accurately annotate third-generation full-length transcripts.
The invention first provides a method for accurately annotating third-generation full-length transcripts, comprising the following steps:
(1) obtaining corrected transcript structure information;
(2) parsing a human reference genome annotation file and extracting specific gene-transcript information as reference information;
(3) analyzing the corrected transcript structure information obtained in step (1) to obtain preliminary classification information; subjecting abnormal-type transcripts to conversion processing; and obtaining normal-type data;
(4) annotating the normal-type data obtained in step (3) to obtain specific gene-transcript annotations and final transcript classification information;
(5) collating the annotation information and the classification information to obtain accurate annotations of the third-generation full-length transcripts.
In the above method, in step (1), the corrected transcript structure information is obtained by correcting the structure information of the third-generation full-length transcripts against the original sequence information.
In step (1), the specific steps for obtaining the corrected transcript structure information may be as follows:
(1-1) checking the length consistency between the alignment position information in the structure information and the corresponding positions in the original sequence information;
(1-2) checking the base consistency between the sequence at the aligned positions in the structure information and the original sequence;
(1-3) checking the structural integrity consistency between the original sequence and the transcript;
(1-4) combining the consistency information from steps (1-1), (1-2) and (1-3) by weighting to assess structural accuracy, and optimizing and adjusting abnormal regions according to the consistency data to obtain corrected transcript structure information.
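Steps (1-1) to (1-4) can be illustrated with a minimal Python sketch. The patent does not disclose how each consistency measure is computed, nor the weights or threshold, so the three scoring functions, `WEIGHTS` and `THRESHOLD` below are hypothetical choices made only to show how a weighted judgment might be assembled.

```python
# Hypothetical weights for length, base and integrity consistency (not from the patent).
WEIGHTS = (0.3, 0.4, 0.3)
# Hypothetical cut-off below which a region would be flagged for adjustment.
THRESHOLD = 0.9


def length_consistency(block_len: int, read_len: int) -> float:
    """(1-1) Ratio of the shorter span to the longer span; 1.0 means equal lengths."""
    longest = max(block_len, read_len)
    return 1.0 if longest == 0 else min(block_len, read_len) / longest


def base_consistency(aligned: str, original: str) -> float:
    """(1-2) Fraction of positions with identical bases, relative to the longer sequence."""
    longest = max(len(aligned), len(original))
    if longest == 0:
        return 1.0
    matches = sum(a == b for a, b in zip(aligned.upper(), original.upper()))
    return matches / longest


def integrity_consistency(covered_bases: int, read_len: int) -> float:
    """(1-3) Fraction of the original read covered by the reported structure."""
    return min(covered_bases / read_len, 1.0) if read_len else 0.0


def structure_score(length_c: float, base_c: float, integrity_c: float,
                    weights=WEIGHTS) -> float:
    """(1-4) Weighted combination of the three consistency values."""
    return sum(w * c for w, c in zip(weights, (length_c, base_c, integrity_c)))


def needs_adjustment(score: float, threshold: float = THRESHOLD) -> bool:
    """Flag a region whose combined score falls below the (assumed) threshold."""
    return score < threshold
```

A region flagged by `needs_adjustment` would then be passed to whatever optimization procedure the implementer chooses; the patent leaves that procedure unspecified.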
In the above method, in step (2), the reference genome annotation file may be obtained from a public database.
In the above method, in step (2), the specific gene-transcript information may be used as the reference information after conversion of its data format.
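As an illustration of step (2), the following sketch converts exon records into a per-transcript lookup table. It assumes the annotation file is in GTF format (nine tab-separated columns with `gene_id` and `transcript_id` attributes); the patent does not name the file format, and the function name `parse_gtf_exons` and the output layout are hypothetical.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_exons(lines):
    """Collect exon coordinates per (gene_id, transcript_id) from GTF lines."""
    reference = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon records carry the structure needed here
        chrom, _, _, start, end, _, strand, _, attributes = fields
        attrs = dict(ATTR_RE.findall(attributes))
        key = (attrs.get("gene_id"), attrs.get("transcript_id"))
        if None in key:
            continue  # record lacks the identifiers needed for a gene-transcript key
        entry = reference.setdefault(
            key, {"chrom": chrom, "strand": strand, "exons": []}
        )
        entry["exons"].append((int(start), int(end)))
    for entry in reference.values():
        entry["exons"].sort()  # order exons by genomic start
    return reference
```

The function accepts any iterable of lines, so it can be fed an open file handle directly. Keeping exons sorted per transcript makes later structure comparisons a simple list equality check.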
In the above method, in step (3), the specific steps for obtaining the normal-type data may be as follows:
(3-1) calculating, from the structure information of each transcript, data such as its strand specificity, structural continuity and integrity, and obtaining a classification value by weighted calculation;
(3-2) classifying the transcripts into four classes according to the classification values: normal transcripts, fusion transcripts, structural-variation transcripts and abnormal transcripts;
(3-3) for each fusion transcript from step (3-2), calculating the number of fused genes and their fusion breakpoints, cutting the transcript into several fragments at the fusion breakpoints, and repeating step (3-1) on each fragment until fragments classified as normal are obtained; for each structural-variation transcript from step (3-2), calculating the structural-variation region and its type, removing the structural-variation region, and repeating step (3-1) on the remaining fragments taken as a whole until fragments classified as normal are obtained;
(3-4) treating each normal transcript obtained in step (3-2) as a single whole and integrating it with the normal fragments obtained in step (3-3) to obtain the normal-type data.
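One possible reading of steps (3-1) and (3-2) is sketched below in Python. The patent gives neither the weights nor the mapping from classification value to class, so everything here is an assumption: each failed check contributes a distinct power-of-two weight, which lets the weighted sum identify which checks failed, and the mapping of failed checks to classes is purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical weights (not from the patent): distinct powers of two, so the
# weighted sum records exactly which checks failed.
W_DISCONTINUOUS = 1  # gap or rearrangement inside a single locus
W_MIXED_STRAND = 2   # alignment blocks disagree on strand or locus
W_INCOMPLETE = 4     # too little of the read is explained by the alignment


@dataclass
class TranscriptMetrics:
    strand_consistent: bool  # outcome of the strand-specificity check
    continuous: bool         # outcome of the structural-continuity check
    complete: bool           # outcome of the integrity check


def classification_value(m: TranscriptMetrics) -> int:
    """(3-1) Weighted sum over the failed checks."""
    return (W_DISCONTINUOUS * (not m.continuous)
            + W_MIXED_STRAND * (not m.strand_consistent)
            + W_INCOMPLETE * (not m.complete))


def classify(value: int) -> str:
    """(3-2) Map the classification value to one of the four classes (assumed mapping)."""
    if value == 0:
        return "normal"
    if value >= W_INCOMPLETE:
        return "abnormal"
    if value >= W_MIXED_STRAND:
        return "fusion"
    return "structural_variation"
```

A real implementation would likely use continuous metrics and empirically tuned thresholds; the boolean checks are used here only to keep the weighting idea visible.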
In the above method, in step (4), the specific steps for obtaining the transcript classification information may be as follows:
(4-1) importing the reference information obtained in step (2) and determining whether each fragment in the normal-type data intersects the reference information: if so, extracting the intersecting annotation content; if not, classifying the fragment as a new transcript;
(4-2) for each transcript with intersecting annotation extracted in step (4-1), determining whether the intersecting annotation contains a single known transcript whose structure is consistent with that of the transcript: if so, the transcript is a known transcript and the corresponding annotation information is retained; if not, determining whether each exon region has unique annotation information; if no exon region has unique annotation information, the transcript is judged to be a new transcript, the similarity coefficient between each annotated gene and the transcript is calculated, and the gene with the highest coefficient is taken as the annotation of the transcript; if unique annotation information exists, all uniquely annotated exons are collected and their consistency is checked; if the uniquely annotated exons are fully consistent, the transcript is judged to be a new transcript; if they are not consistent, it is judged to be a fusion transcript; meanwhile, the annotation information to be output for the relevant classification is calculated;
(4-3) merging multi-fragment transcripts: if the annotations of the fragments are inconsistent, the transcript is judged to be a fusion transcript; if the final classifications of the fragments are inconsistent, the transcript is classified in the priority order fusion transcript, new transcript, known transcript; the annotation information and classification information are finally obtained.
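The decision sequence in step (4-2) can be sketched as follows. The data layout, the function name `classify_fragment` and, in particular, the similarity coefficient (here simply the fraction of the fragment's exons that overlap a gene) are hypothetical; the patent does not define how the coefficient is computed.

```python
from collections import Counter


def classify_fragment(exons, known_transcripts, exon_genes):
    """Return (label, annotation) for one normal-type fragment.

    exons:             list of (start, end) tuples for the fragment
    known_transcripts: {transcript_id: (gene_id, exon_list)} intersecting the fragment
    exon_genes:        one set of overlapping gene IDs per fragment exon
    """
    if not known_transcripts:
        return "new", None  # (4-1): no intersection with the reference

    # A single known transcript with an identical exon structure.
    exact = [tid for tid, (_, ref_exons) in known_transcripts.items()
             if list(ref_exons) == list(exons)]
    if len(exact) == 1:
        return "known", exact[0]

    # Exons annotated by exactly one gene carry "unique annotation information".
    unique = [next(iter(genes)) for genes in exon_genes if len(genes) == 1]
    if not unique:
        counts = Counter(g for genes in exon_genes for g in genes)
        if not counts:
            return "new", None
        # Assumed similarity coefficient: share of fragment exons overlapping the gene.
        best = max(sorted(counts), key=lambda g: counts[g] / len(exon_genes))
        return "new", best

    if len(set(unique)) == 1:
        return "new", unique[0]  # all uniquely annotated exons agree on one gene
    return "fusion", "--".join(sorted(set(unique)))  # exons point to different genes
```

Sorting the gene names before taking the maximum makes tie-breaking deterministic, which keeps repeated runs on the same data reproducible.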
The use of any of the above methods for the accurate annotation of third-generation full-length transcripts also falls within the scope of protection of the invention.
The invention mainly provides a novel method for accurate annotation of full-length transcripts. The method builds on high-precision transcript structures. Although it imposes no requirement on the quality of the transcript structure data, higher precision markedly improves annotation accuracy, and the results produced by most alignment and structure analysis software still leave considerable room for improvement. The invention therefore also provides a method for correcting transcript structure data of ordinary quality, which can be used when needed. This correction method takes complete transcript structure information as its starting point and compares the alignment position information with the original transcript sequence information. By comparing the two in terms of length consistency, base consistency, integrity consistency and the like, it verifies the accuracy of the variations recorded in the transcript structure information, such as single-base variations, insertions and deletions. The accuracy of the transcript structure information is then assessed from the accuracy of these variations. Where a difference fails to meet the standard, it is optimized and adjusted according to the consistency check results until it either meets the standard and is retained, or is judged to be entirely erroneous and discarded.
On the premise that complete, accurate information meeting the standard is left unchanged, this correction method optimizes and adjusts only the lower-quality information that fails to meet the standard. The low-quality portion of the information is thereby markedly improved, which improves the accuracy of subsequent annotation.
The method provided by the invention is also a novel method for accurate annotation of full-length transcripts, comprising a type determination method for the reference-free stage and a transcript annotation method for the reference-based stage.
The type determination method of the reference-free stage comprises the following steps: analyzing the structure information of the input full-length transcripts and preliminarily dividing the structures into four types (normal, fusion, structural variation and abnormal) according to structural continuity, strand specificity of the alignment results and the like; analyzing the fusion breakpoints of fusion-type transcripts and then dividing each fusion transcript at its breakpoints into several individual normal transcript fragments; analyzing the variation regions of structural-variation transcripts, determining the type of the normal regions remaining after the variation regions are removed, retaining normal fragments and discarding abnormal fragments; marking abnormal-type transcripts with their classification only, without annotating them; and encoding the normal transcripts together with the normal fragment information of the other processed transcript types for subsequent annotation.
The transcript annotation method of the reference-based stage comprises the following steps: parsing the annotation file to obtain the structure information annotated for each gene-transcript as reference information; comparing the encoded information with the reference information and determining the annotation region to which the encoded information corresponds; determining whether a known transcript in the annotation region matches the target transcript; determining whether the transcript is a fusion transcript of adjacent genes; splicing multi-fragment transcripts together; and collating and outputting the final matched annotation information and structure classification information.
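The comparison of encoded fragment information with the reference information amounts to an interval intersection test. The sketch below is a minimal linear scan under several assumptions that are not stated in the patent: coordinates are closed intervals, a hit requires the same chromosome and strand, and the reference is a dictionary keyed by hypothetical `(gene_id, transcript_id)` pairs.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def intersecting_annotations(fragment, reference):
    """Return the reference keys whose exons intersect any exon of the fragment.

    fragment:  (chrom, strand, [(start, end), ...])
    reference: {(gene_id, transcript_id): {"chrom": ..., "strand": ..., "exons": [...]}}
    """
    chrom, strand, exons = fragment
    hits = []
    for key, entry in reference.items():
        if entry["chrom"] != chrom or entry["strand"] != strand:
            continue  # assumed: annotation must lie on the same chromosome and strand
        if any(overlaps(s, e, rs, re_end)
               for s, e in exons
               for rs, re_end in entry["exons"]):
            hits.append(key)
    return hits
```

An empty result corresponds to a fragment with no intersection, which the method classifies as a new transcript. For genome-scale references an interval tree or a sorted sweep would be a more practical substitute for the linear scan.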
Experiments were carried out in which the method provided by the invention and SQANT software were each used to annotate third-generation full-length transcript sequencing data from the melanoma cell line COLO829, and the annotation accuracy was calculated. The annotation accuracy of the method provided by the invention was 99%, whereas that of SQANT software was 94%. The method provided by the invention therefore annotates third-generation full-length transcript sequencing data with significantly improved accuracy. The invention has important application value.
Drawings
FIG. 1 is a schematic flow chart of the accurate annotation of third-generation full-length transcripts.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The examples provided below serve as a guide for further modifications by a person skilled in the art and do not constitute a limitation of the invention in any way.
The experimental procedures in the following examples, unless otherwise indicated, are conventional and are carried out according to the techniques or conditions described in the literature in the field or according to the instructions of the products. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
Example 1. Establishment of a method for accurately annotating third-generation full-length transcript sequencing data
Through extensive experimentation, the inventors established a method for accurately annotating third-generation full-length transcript sequencing data.
The method comprises the following specific steps:
1. Obtaining corrected transcript structure information
(1) The third-generation full-length transcript sequencing data are analyzed bioinformatically to obtain structure information.
(2) The structure information and the original sequence information (that is, the third-generation full-length transcript sequencing data) are input and corrected to obtain corrected transcript structure information. The specific steps are as follows:
(1-1) checking the length consistency between the alignment position information in the structure information and the corresponding positions in the original sequence information;
(1-2) checking the base consistency between the sequence at the aligned positions in the structure information and the original sequence;
(1-3) checking the structural integrity consistency between the original sequence and the transcript;
(1-4) combining the consistency information from steps (1-1), (1-2) and (1-3) by weighting to assess structural accuracy, and optimizing and adjusting abnormal regions according to the consistency data to obtain corrected transcript structure information.
2. The human reference genome annotation file obtained from a public database is parsed, and the specific gene-transcript information is extracted and converted into a data format convenient for subsequent use as reference information.
3. The input corrected transcript structure information is analyzed to obtain preliminary classification information; abnormal-type transcripts undergo conversion processing; and normal-type data are finally obtained. The specific steps are as follows:
(3-1) From the structure information of each transcript, data such as its strand specificity, structural continuity and integrity are calculated, and a classification value is obtained by weighted calculation.
(3-2) The transcripts are classified into four classes according to the classification values: normal transcripts, fusion transcripts, structural-variation transcripts and abnormal transcripts. Abnormal transcripts are discarded.
(3-3) For each fusion transcript from step (3-2), the number of fused genes and their fusion breakpoints are calculated, the transcript is cut into several fragments at the fusion breakpoints, and step (3-1) is repeated on each fragment until fragments classified as normal are obtained. For each structural-variation transcript from step (3-2), the structural-variation region and its type are calculated, the structural-variation region is removed, and step (3-1) is repeated on the remaining fragments taken as a whole until fragments classified as normal are obtained.
(3-4) Each normal transcript obtained in step (3-2) is treated as a single whole and integrated with the normal fragments obtained in step (3-3) to obtain the normal-type data.
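The fragment handling in step (3-3) can be sketched with two small interval helpers. Both operate on read coordinates expressed as half-open `[start, end)` intervals; that convention and the function names are assumptions made for illustration, since the patent does not specify a coordinate system.

```python
def split_at_breakpoints(start, end, breakpoints):
    """Cut [start, end) into consecutive fragments at each breakpoint inside it."""
    cuts = sorted(b for b in set(breakpoints) if start < b < end)
    edges = [start, *cuts, end]
    return list(zip(edges[:-1], edges[1:]))


def remove_regions(start, end, regions):
    """Return the parts of [start, end) that remain after removing variant regions."""
    kept = []
    cursor = start
    for r_start, r_end in sorted(regions):
        # Clip each region to the transcript span and skip empty ones.
        r_start, r_end = max(r_start, start), min(r_end, end)
        if r_start >= r_end:
            continue
        if r_start > cursor:
            kept.append((cursor, r_start))
        cursor = max(cursor, r_end)
    if cursor < end:
        kept.append((cursor, end))
    return kept
```

Fragments returned by `split_at_breakpoints` would each be re-classified individually, whereas the pieces returned by `remove_regions` are re-classified together as one unit, matching the distinction the step draws between fusion and structural-variation transcripts.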
4. The normal-type data obtained in step 3 are annotated to obtain specific gene-transcript annotations and final transcript classification information. The specific steps are as follows:
(4-1) The reference information obtained in step 2 is imported, and each fragment in the normal-type data is checked for intersection with the reference information: if there is an intersection, the intersecting annotation content is extracted; if not, the fragment is classified as a new transcript.
(4-2) For each transcript with intersecting annotation extracted in step (4-1), it is determined whether the intersecting annotation contains a single known transcript whose structure is consistent with that of the transcript: if so, the transcript is a known transcript and the corresponding annotation information is retained; if not, it is determined whether each exon region has unique annotation information. If no exon region has unique annotation information, the transcript is judged to be a new transcript, the similarity coefficient between each annotated gene and the transcript is calculated, and the gene with the highest coefficient is taken as the annotation of the transcript. If unique annotation information exists, all uniquely annotated exons are collected and their consistency is checked: if they are fully consistent, the transcript is judged to be a new transcript; if they are not consistent, it is judged to be a fusion transcript. Meanwhile, the annotation information to be output for the relevant classification is calculated.
(4-3) Multi-fragment transcripts are merged: if the annotations of the fragments are inconsistent, the transcript is judged to be a fusion transcript; if the final classifications of the fragments are inconsistent, the transcript is classified in the priority order fusion transcript, new transcript, known transcript. The annotation information and the classification information are then output.
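The merging rule in step (4-3) can be expressed compactly. In the sketch below each fragment is represented as a hypothetical `(label, gene)` pair; the priority order fusion, new, known is taken from the text, while the representation and the function name are assumptions.

```python
# Priority order stated in step (4-3): fusion first, then new, then known.
PRIORITY = {"fusion": 0, "new": 1, "known": 2}


def combine_fragments(fragments):
    """Merge per-fragment results for one multi-fragment transcript.

    fragments: list of (label, gene) pairs, where label is "fusion", "new" or
    "known" and gene is a gene identifier or None.
    """
    genes = {gene for _, gene in fragments if gene is not None}
    if len(genes) > 1:
        # Fragments annotated to different genes: the transcript is a fusion.
        return "fusion", sorted(genes)
    # Otherwise the highest-priority label among the fragments wins.
    label = min((lab for lab, _ in fragments), key=PRIORITY.__getitem__)
    return label, sorted(genes)
```

For example, two fragments labelled "known" and "new" for the same gene yield a "new" transcript, because "new" ranks ahead of "known" in the stated order.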
5. The annotation information and the classification information are collated to obtain accurately annotated third-generation full-length transcript sequencing data.
The flow chart of the method established by the invention for accurately annotating third-generation full-length transcripts is shown in FIG. 1.
Example 2. Comparison of the annotation method established in Example 1 with existing full-length transcript annotation software for annotating third-generation full-length transcript sequencing data
The third-generation full-length transcript sequencing data used in this example are from the melanoma cell line COLO829.
1. The third-generation full-length transcript sequencing data were annotated using the annotation method established in Example 1, and the annotation accuracy was calculated.
The annotation accuracy of the method established in Example 1 on the third-generation full-length transcript sequencing data was 99%.
2. The third-generation full-length transcript sequencing data were annotated using SQANT software, and the annotation accuracy was calculated.
The annotation accuracy of SQANT software on the third-generation full-length transcript sequencing data was 94%.
These results show that annotation accuracy is significantly improved when the method provided by the invention is used to annotate third-generation full-length transcript sequencing data.
The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced over a wide range of equivalent parameters, concentrations and conditions without departing from the spirit and scope of the invention and without undue experimentation. Although the invention has been described with reference to specific embodiments, it will be appreciated that it can be further modified. In general, this application is intended to cover any variations, uses or adaptations of the invention that follow its general principles, including departures from the present disclosure that fall within known or customary practice in the art to which the invention pertains. Some of the essential features may be applied within the scope of the appended claims.

Claims (8)

1. A method for accurately annotating third-generation full-length transcripts, comprising the following steps:
(1) obtaining corrected transcript structure information;
(2) parsing a human reference genome annotation file and extracting specific gene-transcript information as reference information;
(3) analyzing the corrected transcript structure information obtained in step (1) to obtain preliminary classification information; subjecting abnormal-type transcripts to conversion processing; and obtaining normal-type data;
(4) annotating the normal-type data obtained in step (3) to obtain specific gene-transcript annotations and final transcript classification information;
(5) collating the annotation information and the classification information to obtain accurate annotations of the third-generation full-length transcripts.
2. The method of claim 1, wherein: in step (1), the corrected transcript structure information is obtained by correcting the structure information of the third-generation full-length transcripts against the original sequence information.
3. The method of claim 2, wherein: in step (1), the specific steps for obtaining the corrected transcript structure information are as follows:
(1-1) checking the length consistency between the alignment position information in the structure information and the corresponding positions in the original sequence information;
(1-2) checking the base consistency between the sequence at the aligned positions in the structure information and the original sequence;
(1-3) checking the structural integrity consistency between the original sequence and the transcript;
(1-4) combining the consistency information from steps (1-1), (1-2) and (1-3) by weighting to assess structural accuracy, and optimizing and adjusting abnormal regions according to the consistency data to obtain corrected transcript structure information.
4. The method of claim 1, wherein: in step (2), the human reference genome annotation file is obtained from a public database.
5. The method of claim 1, wherein: in step (2), the specific gene-transcript information is used as the reference information after conversion of its data format.
6. The method of claim 1, wherein: in step (3), the specific steps for obtaining the normal-type data are as follows:
(3-1) calculating, from the structure information of each transcript, data such as its strand specificity, structural continuity and integrity, and obtaining a classification value by weighted calculation;
(3-2) classifying the transcripts into four classes according to the classification values: normal transcripts, fusion transcripts, structural-variation transcripts and abnormal transcripts;
(3-3) for each fusion transcript from step (3-2), calculating the number of fused genes and their fusion breakpoints, cutting the transcript into several fragments at the fusion breakpoints, and repeating step (3-1) on each fragment until fragments classified as normal are obtained; for each structural-variation transcript from step (3-2), calculating the structural-variation region and its type, removing the structural-variation region, and repeating step (3-1) on the remaining fragments taken as a whole until fragments classified as normal are obtained;
(3-4) treating each normal transcript obtained in step (3-2) as a single whole and integrating it with the normal fragments obtained in step (3-3) to obtain the normal-type data.
7. The method of claim 1, wherein: in step (4), the specific steps for obtaining the transcript classification information are as follows:
(4-1) importing the reference information obtained in step (2) and determining whether each fragment in the normal-type data intersects the reference information: if so, extracting the intersecting annotation content; if not, classifying the fragment as a new transcript;
(4-2) for each transcript with intersecting annotation extracted in step (4-1), determining whether the intersecting annotation contains a single known transcript whose structure is consistent with that of the transcript: if so, the transcript is a known transcript and the corresponding annotation information is retained; if not, determining whether each exon region has unique annotation information; if no exon region has unique annotation information, the transcript is judged to be a new transcript, the similarity coefficient between each annotated gene and the transcript is calculated, and the gene with the highest coefficient is taken as the annotation of the transcript; if unique annotation information exists, all uniquely annotated exons are collected and their consistency is checked; if the uniquely annotated exons are fully consistent, the transcript is judged to be a new transcript; if they are not consistent, it is judged to be a fusion transcript; meanwhile, the annotation information to be output for the relevant classification is calculated;
(4-3) merging multi-fragment transcripts: if the annotations of the fragments are inconsistent, the transcript is judged to be a fusion transcript; if the final classifications of the fragments are inconsistent, the transcript is classified in the priority order fusion transcript, new transcript, known transcript; the annotation information and classification information are finally obtained.
8. Use of the method of any one of claims 1 to 7 for the accurate annotation of third-generation full-length transcripts.
CN202210252816.8A 2022-03-15 2022-03-15 Method for accurately annotating three-generation full-length transcript Pending CN114627967A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252816.8A CN114627967A (en) 2022-03-15 2022-03-15 Method for accurately annotating three-generation full-length transcript

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210252816.8A CN114627967A (en) 2022-03-15 2022-03-15 Method for accurately annotating three-generation full-length transcript

Publications (1)

Publication Number Publication Date
CN114627967A true CN114627967A (en) 2022-06-14

Family

ID=81902881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210252816.8A Pending CN114627967A (en) 2022-03-15 2022-03-15 Method for accurately annotating three-generation full-length transcript

Country Status (1)

Country Link
CN (1) CN114627967A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117174166A (en) * 2023-10-26 2023-12-05 北京基石京准诊断科技有限公司 Tumor neoantigen prediction method and system based on third-generation sequencing data

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140280327A1 (en) * 2013-03-15 2014-09-18 Cypher Genomics Systems and methods for genomic variant annotation
CN105389481A (en) * 2015-12-22 2016-03-09 武汉菲沙基因信息有限公司 Method for detecting variable spliceosome in third generation full-length transcriptome
CN107563149A (en) * 2017-08-21 2018-01-09 上海派森诺生物科技股份有限公司 The structure annotation and comparison result appraisal procedure of total length transcript
CN107688727A (en) * 2016-08-05 2018-02-13 深圳华大基因股份有限公司 Biological sequence clusters and the recognition methods of transcript hypotype and device in total length transcript profile
CN111312331A (en) * 2020-03-27 2020-06-19 武汉古奥基因科技有限公司 Genome annotation method using second-generation and third-generation transcriptome sequencing data
CN111326212A (en) * 2020-02-18 2020-06-23 福建和瑞基因科技有限公司 Detection method of structural variation
CN111916147A (en) * 2019-05-10 2020-11-10 武汉未来组生物科技有限公司 Transcript classification method
CN112086128A (en) * 2020-08-14 2020-12-15 南京派森诺基因科技有限公司 Third-generation full-length transcriptome sequencing result analysis method suitable for sequence sequencing
CN112397149A (en) * 2020-11-11 2021-02-23 天津现代创新中药科技有限公司 Transcriptome analysis method and system without reference genome sequence
CN113362889A (en) * 2021-06-25 2021-09-07 广州燃石医学检验所有限公司 Genome structure variation annotation method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140280327A1 (en) * 2013-03-15 2014-09-18 Cypher Genomics Systems and methods for genomic variant annotation
CN105389481A (en) * 2015-12-22 2016-03-09 武汉菲沙基因信息有限公司 Method for detecting variable spliceosome in third generation full-length transcriptome
CN107688727A (en) * 2016-08-05 2018-02-13 深圳华大基因股份有限公司 Biological sequence clusters and the recognition methods of transcript hypotype and device in total length transcript profile
CN107563149A (en) * 2017-08-21 2018-01-09 上海派森诺生物科技股份有限公司 The structure annotation and comparison result appraisal procedure of total length transcript
CN111916147A (en) * 2019-05-10 2020-11-10 武汉未来组生物科技有限公司 Transcript classification method
CN111326212A (en) * 2020-02-18 2020-06-23 福建和瑞基因科技有限公司 Detection method of structural variation
CN111312331A (en) * 2020-03-27 2020-06-19 武汉古奥基因科技有限公司 Genome annotation method using second-generation and third-generation transcriptome sequencing data
CN112086128A (en) * 2020-08-14 2020-12-15 南京派森诺基因科技有限公司 Third-generation full-length transcriptome sequencing result analysis method suitable for sequence sequencing
CN112397149A (en) * 2020-11-11 2021-02-23 天津现代创新中药科技有限公司 Transcriptome analysis method and system without reference genome sequence
CN113362889A (en) * 2021-06-25 2021-09-07 广州燃石医学检验所有限公司 Genome structure variation annotation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
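周卫星; 石海鹤: "Research progress of sequence assembly algorithms in high-throughput sequencing", Computer Science (计算机科学), no. 05, 15 May 2019 (2019-05-15) *
李玉梅; 李书娴; 李向上; 李川昀: "Application of third-generation sequencing technology in transcriptomics research", Life Science Instruments (生命科学仪器), no. 1, 25 October 2018 (2018-10-25) *
钟伟民; 张兴坦; 赵茜; 马东娜; 唐海宝: "Application of third-generation PacBio sequencing in transcriptome research", Journal of Fujian Agriculture and Forestry University (Natural Science Edition) (福建农林大学学报(自然科学版)), no. 05, 18 September 2018 (2018-09-18) *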

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117174166A (en) * 2023-10-26 2023-12-05 北京基石京准诊断科技有限公司 Tumor neoantigen prediction method and system based on third-generation sequencing data
CN117174166B (en) * 2023-10-26 2024-03-26 北京基石生命科技有限公司 Tumor neoantigen prediction method and system based on third-generation sequencing data

Similar Documents

Publication Publication Date Title
CN109033749B (en) Tumor mutation load detection method, device and storage medium
CN106202991B (en) The detection method of abrupt information in product is sequenced in a kind of genome multiplex amplification
CN111326212B (en) Structural variation detection method
CN107391965A (en) A kind of lung cancer somatic mutation determination method based on high throughput sequencing technologies
CN100356392C (en) Post-processing approach of character recognition
WO2018218788A1 (en) Third-generation sequencing sequence alignment method based on global seed scoring optimization
CN106529171A (en) Detection analysis method for breast cancer susceptibility gene heritable variation point
WO2018218787A1 (en) Third-generation sequencing sequence correction method based on local graph
CN110993023B (en) Detection method and detection device for complex mutation
CN110692101A (en) Method for aligning targeted nucleic acid sequencing data
CN115631789B (en) Group joint variation detection method based on pan genome
CN113035273A (en) Rapid and ultrahigh-sensitivity DNA fusion gene detection method
CN110021346A (en) Gene Fusion and mutation detection methods and system based on RNAseq data
CN114627967A (en) Method for accurately annotating three-generation full-length transcript
US20040142347A1 (en) Mitochondrial DNA autoscoring system
CN111696622B (en) Method for correcting and evaluating detection result of mutation detection software
CN110164504B (en) Method and device for processing next-generation sequencing data and electronic equipment
CN109960707B (en) College recruitment data acquisition method and system based on artificial intelligence
CN112397148A (en) Sequence comparison method, sequence correction method and device thereof
CN114067908B (en) Method, device and storage medium for evaluating single-sample homologous recombination defects
CN113536759B (en) Text duplicate checking method, device and equipment
CN114530200A (en) Mixed sample identification method based on calculation of SNP entropy
Savriama et al. Testing the accuracy of 3D automatic landmarking via genome-wide association studies
CN111653312B (en) Method for exploring disease subtype affinity by using genome data
CN117746989B (en) Method and device for processing variation description information and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination