CN112951329A - High-throughput sequencing variation risk grouping screening method - Google Patents


Info

Publication number
CN112951329A
Authority
CN
China
Prior art keywords
variation
screening
risk
pathogenicity
setting
Prior art date
Legal status
Pending
Application number
CN202110275446.5A
Other languages
Chinese (zh)
Inventor
刘洪洲
喻长顺
李冬梅
陈建春
贾晓冬
李行
Current Assignee
Tianjin Jinyu Medical Laboratory Co ltd
Original Assignee
Tianjin Jinyu Medical Laboratory Co ltd
Priority date
2021-03-15
Filing date
2021-03-15
Publication date
Application filed by Tianjin Jinyu Medical Laboratory Co ltd
Priority to CN202110275446.5A
Publication of CN112951329A
Priority to CN202111212516.9A (CN113793642B)
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a high-throughput sequencing variation risk grouping screening method, comprising the following steps: setting screening conditions for variants with a high pathogenicity risk, and screening the gene data according to these conditions; setting screening conditions for variants with a high pathogenicity risk and for medium-risk variants with an extremely low population frequency, and screening the gene data; setting screening conditions for variants with a medium-to-high pathogenicity risk, and screening the gene data; setting screening conditions for autosomal dominant variants with a medium-to-high pathogenicity risk, and screening the gene data; setting screening conditions for autosomal recessive variants with a medium-to-high pathogenicity risk in which the same gene carries at least two variants or a single variant is homozygous, and screening the gene data; setting screening conditions for sex-linked variants with a medium-to-high pathogenicity risk, and screening the gene data; and setting screening conditions for variants with a medium-to-high pathogenicity risk that are considered pathogenic or suspected pathogenic, and screening the gene data.

Description

High-throughput sequencing variation risk grouping screening method
Technical Field
The invention relates to the technical field of high-throughput sequencing, in particular to a high-throughput sequencing variation risk grouping screening method.
Background
According to their mode of inheritance, diseases caused by human nuclear genes can be classified as autosomal dominant (AD), autosomal recessive (AR) or sex-linked. Gene sequencing is an important means of identifying the genetic cause of disease.
High-throughput sequencing (also known as next-generation sequencing) is a gene sequencing technology that can examine thousands of genes in a single run. Compared with first-generation (Sanger) sequencing, which reads only a few hundred bases at a time, it greatly increases the number of bases sequenced and therefore generates a large amount of data. The data are annotated by bioinformatics methods to produce a variant annotation table. High-throughput sequencing is now widely used for whole-exome sequencing, which covers about 20,000 genes and yields about 60,000 rows of variant annotation data per sample.
With this volume of data, manual site-by-site analysis is practically impossible. Independent laboratories commonly define risk levels themselves based on frequencies in specific databases, variant categories and similar criteria. Either the data volume is still very large after screening, which places high demands on staff and creates a heavy workload, or the screening conditions are so strict that higher-risk variants are filtered out and the detection rate falls. Patent CN202010035599.8 discloses phenotype-based screening of specific samples; that method applies only to those specific samples and gains specificity at the cost of sensitivity.
Existing screening methods filter in a single way only. Variants removed by that filter are never examined under other criteria, which greatly increases the chance that a high-risk variant is missed, and because high-risk variants are not grouped, other screening logic cannot be taken into account.
Disclosure of Invention
The object of the present invention is to overcome at least one of the technical drawbacks mentioned above.
To this end, the invention provides a high-throughput sequencing variation risk grouping screening method.
To achieve this object, an embodiment of the present invention provides a high-throughput sequencing variation risk grouping screening method, comprising:
step S1, setting screening conditions for variants with a high pathogenicity risk, and screening the gene data according to these conditions;
step S2, setting screening conditions for variants with a high pathogenicity risk and for medium-risk variants with an extremely low population frequency, and screening the gene data;
step S3, setting screening conditions for variants with a medium-to-high pathogenicity risk, and screening the gene data;
step S4, setting screening conditions for autosomal dominant variants with a medium-to-high pathogenicity risk, and screening the gene data;
step S5, setting screening conditions for autosomal recessive variants with a medium-to-high pathogenicity risk where the same gene carries at least 2 different variants or a single variant is homozygous, and screening the gene data;
step S6, setting screening conditions for sex-linked variants with a medium-to-high pathogenicity risk, and screening the gene data;
step S7, setting screening conditions for variants with a medium-to-high pathogenicity risk that are considered pathogenic or suspected pathogenic, or deleterious by other reliable databases, and screening the gene data;
step S8, labeling the screened high-risk genes.
Further, in step S7, the condition is a medium-to-high pathogenicity risk together with either a record in the HGMD database or a ClinVar classification of pathogenic or suspected (likely) pathogenic.
In the high-throughput sequencing variation risk grouping screening method according to embodiments of the invention, the screening logic systematically considers the overall deleteriousness risk of high-throughput sequencing variants, the deleteriousness risk of variants under different modes of inheritance, and the deleteriousness risk assessed by existing databases. Risk screening is thereby refined and is carried out in groups according to different conditions. Compared with existing screening, the conditions are more targeted, combine high specificity with high sensitivity, reduce the analyst's workload and improve efficiency. Because the groups screen for high-risk variants from different angles and complement one another, high-risk variants within each group are found quickly while the risk of filtering out a high-risk variant is greatly reduced. After grouping, the logic of each group is clear, so an analyst can readily judge whether a variant is relevant to the tested sample. The method can quickly and accurately label high-risk variant genes among about 60,000 rows of annotation data within 2 minutes, which greatly improves efficiency.
The invention screens high-risk variants group by group, taking the rows of the high-throughput sequencing variant annotation table as the unit. It jointly considers several complementary factors: variants the laboratory classifies as high-risk, variants in actionable genes (Actionable Variants), high-risk variants under different modes of inheritance, and variants that reliable databases consider deleterious. This makes risk screening more targeted and prevents high-risk variants from being missed. For whole-exome data, each group selects about 50 variants on average; because the groups may overlap, about 300 variants are selected in total, combining high specificity with high sensitivity and improving screening efficiency. During analysis an analyst only needs to consider one group at a time, which greatly reduces the workload. Very high-risk variants can appear in several groups, which greatly reduces the likelihood that they are missed.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of the high-throughput sequencing variation risk grouping screening method according to an embodiment of the present invention;
FIGS. 2 to 8 are operating-interface diagrams of the high-throughput sequencing variation risk grouping screening method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.
The invention provides a high-throughput sequencing variation risk grouping form and a grouping screening method based on it. First, the grouping form means that high-throughput sequencing variants are screened for risk in groups; second, the variants are divided into seven groups.
As shown in fig. 1, the method for high throughput sequencing variation risk grouping screening according to the embodiment of the present invention includes the following steps:
Step S1, setting screening conditions for variants with a high pathogenicity risk, and screening the gene data according to these conditions.
Specifically, the screening condition is: variants in the 59 actionable genes recommended by the American College of Medical Genetics and Genomics (ACMG), or in additional custom genes, that are considered highly pathogenic. (Abbreviated ACT.)
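Each group can be treated as a row-level predicate over the variant annotation table. A minimal sketch of group 1 (ACT), assuming each table row is a Python dict keyed by the column names used in the group conditions of this description (Panel, Risk) and that Risk is stored as text, because the table also uses the label "1-potential". The function name `is_act` and the sample rows are hypothetical.

```python
def is_act(row: dict) -> bool:
    """Group 1 (ACT): Panel = ACT and Risk = 2 (hypothetical helper)."""
    # Risk is compared as a string because one of the risk labels is "1-potential".
    return row.get("Panel") == "ACT" and row.get("Risk") == "2"


# Hypothetical annotation rows, for illustration only.
rows = [
    {"Gene": "GENE_A", "Panel": "ACT", "Risk": "2"},
    {"Gene": "GENE_B", "Panel": "ACT", "Risk": "1"},
    {"Gene": "GENE_C", "Panel": ".", "Risk": "2"},
]
act_genes = [row["Gene"] for row in rows if is_act(row)]
print(act_genes)  # ['GENE_A']
```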
Step S2, setting screening conditions for variants with a high pathogenicity risk and for medium-risk variants with an extremely low population frequency, and screening the gene data.
Specifically, the screening condition combines variants that the laboratory's own criteria define as high pathogenicity risk with medium-risk variants that have an extremely low population frequency. (Abbreviated H.) The "extremely low" population-frequency threshold can be set with reference to common genetic diseases; 0 is preferred.
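Group 2 (H) joins its two conditions with a logical OR. A minimal sketch, assuming the same dict-per-row representation, a numeric fmax column, and that an empty OMIM annotation is written as "" or "."; the helper names and the reconstructed variant-class spellings (for example D-stoploss and P-nonframeshift) are assumptions.

```python
# Risk labels treated as medium-to-high risk (from the group conditions).
RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}

# Variant classes listed for groups 2-5 (spellings reconstructed, an assumption).
CLASSES = {
    "missense", "P-splicing Candidate", "D-splicing", "N-ncRNA_exonic",
    "N-exonic", "D-stopgain", "P-scSNV", "D-stoploss",
    "N-ncRNA_exonic;splicing", "N-exonic;splicing",
    "P-nonframeshift", "D-frameshift",
}


def non_empty(value) -> bool:
    # Assumption: a missing annotation is written as "" or "." (ANNOVAR-style).
    return value not in (None, "", ".")


def is_h(row: dict) -> bool:
    """Group 2 (H): union of two conditions."""
    # (a) laboratory high risk with the relaxed frequency cap of 0.4
    high_risk = row["Risk"] == "2" and row["fmax"] <= 0.4
    # (b) medium-to-high risk, OMIM gene, absent from population data (fmax = 0)
    absent = (
        row["Risk"] in RISK_MID_HIGH
        and non_empty(row["OMIM"])
        and row["fmax"] == 0
        and row["class"] in CLASSES
    )
    return high_risk or absent


rows = [
    {"Risk": "2", "fmax": 0.3, "OMIM": ".", "class": "synonymous"},
    {"Risk": "1", "fmax": 0.0, "OMIM": "AD", "class": "D-stopgain"},
    {"Risk": "1", "fmax": 0.0001, "OMIM": "AD", "class": "D-stopgain"},
]
print([is_h(r) for r in rows])  # [True, True, False]
```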
Step S3, setting screening conditions for variants with a medium-to-high pathogenicity risk, and screening the gene data.
Specifically, the screening condition is: variants that the laboratory's own criteria define as having a medium-to-high pathogenicity risk. (Abbreviated M.)
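Group 3 (M) targets rare but not absent variants (0 < fmax ≤ 0.001). A minimal sketch under the same assumptions as the other group sketches; because the full CLNSIG value list is set by the laboratory, it is passed in as a parameter, and the subset used below is illustrative only.

```python
RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}
# Variant classes listed for groups 2-5 (spellings reconstructed, an assumption).
CLASSES = {
    "missense", "P-splicing Candidate", "D-splicing", "N-ncRNA_exonic",
    "N-exonic", "D-stopgain", "P-scSNV", "D-stoploss",
    "N-ncRNA_exonic;splicing", "N-exonic;splicing",
    "P-nonframeshift", "D-frameshift",
}


def non_empty(value) -> bool:
    # Assumption: a missing annotation is written as "" or ".".
    return value not in (None, "", ".")


def is_m(row: dict, allowed_clnsig: set) -> bool:
    """Group 3 (M): medium-to-high risk, OMIM gene, rare (0 < fmax <= 0.001)."""
    return (
        row["Risk"] in RISK_MID_HIGH
        and non_empty(row["OMIM"])
        and 0 < row["fmax"] <= 0.001
        and row["class"] in CLASSES
        and row["CLNSIG"] in allowed_clnsig
    )


# Illustrative subset of ClinVar significance values; not the full list.
ALLOWED = {".", "Uncertain_significance", "Conflicting_interpretations_of_pathogenicity"}
row = {"Risk": "1.5", "OMIM": "AR", "fmax": 0.0004, "class": "missense", "CLNSIG": "."}
print(is_m(row, ALLOWED))  # True
print(is_m({**row, "fmax": 0.0}, ALLOWED))  # False: fmax = 0 belongs to group 2
```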
Step S4, setting screening conditions for autosomal dominant variants with a medium-to-high pathogenicity risk, and screening the gene data.
Specifically, the screening condition, defined by the laboratory, is: variants in genes with autosomal dominant inheritance that have a medium-to-high pathogenicity risk. (Abbreviated AD.)
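The group conditions write the AD requirement as OMIM = AD. A minimal sketch that reads it as "AD is among the gene's OMIM inheritance modes", assuming the OMIM column holds tokens such as "AD" or "AD,AR" separated by commas, semicolons, slashes or spaces; that column format and the helper names are assumptions.

```python
import re

RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}
# Variant classes listed for groups 2-5 (spellings reconstructed, an assumption).
CLASSES = {
    "missense", "P-splicing Candidate", "D-splicing", "N-ncRNA_exonic",
    "N-exonic", "D-stopgain", "P-scSNV", "D-stoploss",
    "N-ncRNA_exonic;splicing", "N-exonic;splicing",
    "P-nonframeshift", "D-frameshift",
}


def inheritance_modes(omim: str) -> set:
    # Assumed format: inheritance tokens such as "AD" or "AD,AR".
    return {token for token in re.split(r"[,;/\s]+", omim) if token}


def is_ad(row: dict) -> bool:
    """Group 4 (AD): medium-to-high risk, AD gene, fmax <= 0.05, listed class."""
    return (
        row["Risk"] in RISK_MID_HIGH
        and "AD" in inheritance_modes(row["OMIM"])
        and row["fmax"] <= 0.05
        and row["class"] in CLASSES
    )


print(is_ad({"Risk": "2", "OMIM": "AD,AR", "fmax": 0.01, "class": "D-frameshift"}))  # True
print(is_ad({"Risk": "2", "OMIM": "AR", "fmax": 0.01, "class": "D-frameshift"}))  # False
```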
Step S5, setting screening conditions for autosomal recessive variants with a medium-to-high pathogenicity risk where the same gene carries at least 2 different variants or a single variant is homozygous, and screening the gene data.
Specifically, the screening condition, defined by the laboratory, is: autosomal recessive inheritance with a medium-to-high pathogenicity risk, where the same gene carries at least 2 different variants or a single variant is homozygous. (Abbreviated AR.)
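Unlike the other groups, group 5 (AR) needs a table-level step: its first branch keeps a variant only if its gene carries at least two qualifying variants, which requires counting per gene. A minimal sketch, assuming each row is a distinct variant, the same hypothetical dict-per-row layout, and comma-separated OMIM inheritance tokens; it returns the indices of the selected rows.

```python
import re
from collections import Counter

RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}
# Variant classes listed for groups 2-5 (spellings reconstructed, an assumption).
CLASSES = {
    "missense", "P-splicing Candidate", "D-splicing", "N-ncRNA_exonic",
    "N-exonic", "D-stopgain", "P-scSNV", "D-stoploss",
    "N-ncRNA_exonic;splicing", "N-exonic;splicing",
    "P-nonframeshift", "D-frameshift",
}


def inheritance_modes(omim: str) -> set:
    # Assumed format: inheritance tokens such as "AR" or "AD,AR".
    return {token for token in re.split(r"[,;/\s]+", omim) if token}


def ar_group(rows: list) -> list:
    """Return the indices of rows selected by group 5 (AR)."""
    base = [
        i for i, r in enumerate(rows)
        if r["Risk"] in RISK_MID_HIGH
        and "AR" in inheritance_modes(r["OMIM"])
        and r["fmax"] <= 0.05
    ]
    # (a) listed-class variants, kept only if the gene has >= 2 of them
    listed = [i for i in base if rows[i]["class"] in CLASSES]
    per_gene = Counter(rows[i]["Gene"] for i in listed)
    multi_hit = {i for i in listed if per_gene[rows[i]["Gene"]] >= 2}
    # (b) a single homozygous variant on an autosome
    homozygous = {
        i for i in base
        if rows[i]["Het/Hom"] == "Hom" and rows[i]["Chr"] not in ("chrX", "chrY")
    }
    return sorted(multi_hit | homozygous)


rows = [
    {"Gene": "GENE_A", "Chr": "chr13", "Risk": "1", "OMIM": "AR", "fmax": 0.01, "class": "missense", "Het/Hom": "Het"},
    {"Gene": "GENE_A", "Chr": "chr13", "Risk": "2", "OMIM": "AR", "fmax": 0.0, "class": "D-frameshift", "Het/Hom": "Het"},
    {"Gene": "GENE_B", "Chr": "chr7", "Risk": "1.5", "OMIM": "AR", "fmax": 0.001, "class": "missense", "Het/Hom": "Het"},
    {"Gene": "GENE_C", "Chr": "chr12", "Risk": "1", "OMIM": "AR", "fmax": 0.002, "class": "missense", "Het/Hom": "Hom"},
]
print(ar_group(rows))  # [0, 1, 3]
```

GENE_B is dropped because it has only one heterozygous qualifying variant.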
Step S6, setting screening conditions for sex-linked variants with a medium-to-high pathogenicity risk, and screening the gene data.
Specifically, the screening condition, defined by the laboratory, is: sex-linked variants with a medium-to-high pathogenicity risk. (Abbreviated XY.)
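A minimal sketch of group 6 (XY) as a row predicate, assuming the Het/Hom and Chr columns hold values such as "Hom" and "chrX" as written in the group conditions, and assuming (not stated in the text) that a hemizygous X call in a male sample is reported as Hom. The OMIM value "XLR" in the sample rows is illustrative.

```python
RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}


def non_empty(value) -> bool:
    # Assumption: a missing annotation is written as "" or ".".
    return value not in (None, "", ".")


def is_xy(row: dict) -> bool:
    """Group 6 (XY): medium-to-high risk, OMIM gene, fmax <= 0.05, Hom on chrX/chrY."""
    return (
        row["Risk"] in RISK_MID_HIGH
        and non_empty(row["OMIM"])
        and row["fmax"] <= 0.05
        and row["Het/Hom"] == "Hom"  # assumed to include hemizygous male calls
        and row["Chr"] in ("chrX", "chrY")
    )


print(is_xy({"Risk": "1", "OMIM": "XLR", "fmax": 0.0, "Het/Hom": "Hom", "Chr": "chrX"}))  # True
print(is_xy({"Risk": "1", "OMIM": "XLR", "fmax": 0.0, "Het/Hom": "Het", "Chr": "chrX"}))  # False
```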
Step S7, setting screening conditions for variants with a medium-to-high pathogenicity risk that are considered pathogenic or suspected pathogenic, or deleterious by other reliable databases, and screening the gene data.
Specifically, the screening condition, defined by the laboratory, is a medium-to-high pathogenicity risk together with a record in the HGMD database, or a ClinVar classification of pathogenic or suspected (likely) pathogenic, or a deleterious classification in another reliable database. (Abbreviated CD.)
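A minimal sketch of group 7 (CD) with its two branches. It follows the group conditions literally, including OMIM = null on the ClinVar branch. The set of ClinVar values counted as "pathogenic or likely pathogenic" is an assumption based on the wording of step S7, not the exact CLNSIG list used by the laboratory, and "DM" in the sample rows is only an illustrative HGMD value.

```python
RISK_MID_HIGH = {"1", "1.5", "2", "1-potential"}
# Assumed ClinVar values meaning "pathogenic or likely pathogenic".
CLINVAR_PLP = {"Pathogenic", "Likely_pathogenic", "Pathogenic/Likely_pathogenic"}


def non_empty(value) -> bool:
    # Assumption: a missing annotation is written as "" or ".".
    return value not in (None, "", ".")


def is_cd(row: dict) -> bool:
    """Group 7 (CD): medium-to-high risk, fmax <= 0.2, plus an HGMD or ClinVar hit."""
    if row["Risk"] not in RISK_MID_HIGH or row["fmax"] > 0.2:
        return False
    # (a) OMIM gene with an HGMD record
    hgmd_hit = non_empty(row["OMIM"]) and non_empty(row["HGMD"])
    # (b) ClinVar pathogenic/likely pathogenic, with OMIM = null taken literally
    clinvar_hit = not non_empty(row["OMIM"]) and row["CLNSIG"] in CLINVAR_PLP
    return hgmd_hit or clinvar_hit


rows = [
    {"Risk": "1", "fmax": 0.1, "OMIM": "AD", "HGMD": "DM", "CLNSIG": "."},
    {"Risk": "2", "fmax": 0.0, "OMIM": ".", "HGMD": ".", "CLNSIG": "Likely_pathogenic"},
    {"Risk": "2", "fmax": 0.3, "OMIM": "AD", "HGMD": "DM", "CLNSIG": "Pathogenic"},
]
print([is_cd(r) for r in rows])  # [True, True, False]
```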
Step S8, labeling the screened high-risk genes.
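The grouping idea is that every group is applied to the full table, so a variant rejected by one group can still be caught by another, and overlapping selections are expected. A minimal sketch of the labeling step, assuming each group is a function over the whole table that returns row indices (which also fits table-level conditions such as group 5's per-gene count). The semicolon tag format and the two simplified group filters are hypothetical; the text does not describe how redot encodes its output columns.

```python
from typing import Callable, Dict, List, Set

# A group filter takes the whole table and returns the indices of the rows it selects.
GroupFilter = Callable[[List[dict]], Set[int]]


def label_groups(rows: List[dict], groups: Dict[str, GroupFilter]) -> List[str]:
    """Tag every row with the names of all groups that select it."""
    tags: List[List[str]] = [[] for _ in rows]
    for name, select in groups.items():
        # Each group sees the full table, not the output of the previous group.
        for i in select(rows):
            tags[i].append(name)
    return [";".join(t) for t in tags]


# Simplified stand-ins for two of the seven groups (illustration only).
groups = {
    "H": lambda rs: {i for i, r in enumerate(rs) if r["Risk"] == "2" and r["fmax"] <= 0.4},
    "M": lambda rs: {
        i for i, r in enumerate(rs)
        if r["Risk"] in {"1", "1.5", "2", "1-potential"} and 0 < r["fmax"] <= 0.001
    },
}
rows = [
    {"Risk": "2", "fmax": 0.0005},
    {"Risk": "1", "fmax": 0.0005},
    {"Risk": "0.5", "fmax": 0.3},
]
labels = label_groups(rows, groups)
print(labels)  # ['H;M', 'M', '']
flagged = [i for i, tag in enumerate(labels) if tag]
print(flagged)  # [0, 1]
```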
The data are screened according to the scheme of the invention as follows:
Group 1 (ACT): Panel = ACT; Risk = 2.
Group 2 (H): either of two conditions. (a) Risk = 2 and fmax ≤ 0.4 [2] (the threshold from reference [2] is adjusted to 0.4 to avoid missed detections). (b) Risk is one of 1, 1.5, 2 or 1-potential; OMIM is non-empty; fmax = 0; and class is one of missense, P-splicing Candidate, D-splicing, N-ncRNA_exonic, N-exonic, D-stopgain, P-scSNV, D-stoploss, N-ncRNA_exonic;splicing, N-exonic;splicing, P-nonframeshift or D-frameshift.
Group 3 (M): Risk is one of 1, 1.5, 2 or 1-potential; OMIM is non-empty; 0 < fmax ≤ 0.001; class is in the variant-class list given for group 2; and CLNSIG is one of a list of ClinVar clinical-significance values, including "." (no ClinVar record), Conflicting_interpretations_of_pathogenicity, other, association, drug_response, Uncertain_significance, risk_factor, protective and combinations of these joined by \x2c_.
Group 4 (AD): Risk is one of 1, 1.5, 2 or 1-potential; OMIM = AD; fmax ≤ 0.05 [1]; and class is in the variant-class list given for group 2.
Group 5 (AR): either of two conditions. (a) Risk is one of 1, 1.5, 2 or 1-potential; OMIM = AR; fmax ≤ 0.05; class is in the variant-class list given for group 2; and genes with only one such variant are removed. (b) Risk is one of 1, 1.5, 2 or 1-potential; OMIM = AR; fmax ≤ 0.05; Het/Hom = Hom; and Chr is neither chrX nor chrY.
Group 6 (XY): Risk is one of 1, 1.5, 2 or 1-potential; OMIM is non-empty; fmax ≤ 0.05; Het/Hom = Hom; and Chr is chrX or chrY.
Group 7 (CD): either of two conditions. (a) Risk is one of 1, 1.5, 2 or 1-potential; OMIM is non-empty; fmax ≤ 0.2; and HGMD is non-empty. (b) Risk is one of 1, 1.5, 2 or 1-potential; OMIM = null; fmax ≤ 0.2; and CLNSIG is Likely_pathogenic or Pathogenic\x2c_protective.
The screening results are shown in table 1 below:
TABLE 1
For comparison, a control screen that differs from the scheme of the invention and does not group by risk is run as follows:
Screening condition: Risk is one of 0.5, 1, 1.5, 2 or 1-potential, and fmax ≤ 0.4. Because there is no grouping, the other screening conditions cannot be applied.
For these screening conditions, the scheme of the invention and the control scheme each cover the high-risk variants and find similar numbers of true positives, so their sensitivity is similar, but their specificity differs greatly. If the control scheme were given additional screening conditions, the probability of false negatives would rise and its sensitivity would fall.
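For contrast with the grouped filters, the control is a single flat filter over the whole table, so every row with Risk 0.5 or above and fmax ≤ 0.4 passes regardless of inheritance mode or database evidence. A minimal sketch, assuming the same hypothetical dict-per-row layout with Risk stored as text:

```python
CONTROL_RISK = {"0.5", "1", "1.5", "2", "1-potential"}


def control_screen(rows: list) -> list:
    """Ungrouped control filter: Risk in CONTROL_RISK and fmax <= 0.4."""
    return [i for i, r in enumerate(rows) if r["Risk"] in CONTROL_RISK and r["fmax"] <= 0.4]


rows = [
    {"Risk": "0.5", "fmax": 0.35},
    {"Risk": "2", "fmax": 0.45},
    {"Risk": "0", "fmax": 0.0},
]
print(control_screen(rows))  # [0]
```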
Specific example 1:
In one whole-exome sequencing sample from this laboratory, 59559 variant sites were detected. The first row of the annotation table, i.e. the column identifiers (header), is described as follows:
The data were screened with the seven group conditions (groups 1 to 7) described above.
The screening results are shown in table 2 below:
TABLE 2
For comparison, a control screen that differs from the scheme of the invention and does not group by risk is run as follows:
Screening condition: Risk is one of 0.5, 1, 1.5, 2 or 1-potential, and fmax ≤ 0.4. Because there is no grouping, the other screening conditions cannot be applied.
The screening results are shown in table 3 below:
| Original rows | Control-method screening | Rows selected | Total rows selected |
|---|---|---|---|
| 59559 | Screening condition | 1719 | 1719 |
TABLE 3
For these screening conditions, the scheme of the invention and the control scheme each cover the high-risk variants and find similar numbers of true positives, so their sensitivity is similar, but their specificity differs greatly. An approximate comparison follows.
Scheme of the invention: false positives ≈ 0; true negatives = original rows - rows selected by the patent scheme = 59269; specificity = true negatives / (false positives + true negatives) = 1.
Control scheme: because the two schemes find similar numbers of true positives, the control scheme's false positives ≈ 0 + (rows selected by the control scheme - rows selected by the patent scheme) = 1429; true negatives = original rows - rows selected by the control scheme = 57840; specificity = true negatives / (false positives + true negatives) = 97.59%.
The specificity of the scheme of the invention is 2.41 percentage points higher than that of the control scheme.
If the control scheme were given additional screening conditions, the probability of false negatives would rise and its sensitivity would fall.
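The specificity comparison is plain arithmetic and can be reproduced as follows. The 290 rows selected by the grouped scheme are not stated directly; they are implied by the stated true-negative count (59559 - 59269). The zero-false-positive figure for the grouped scheme is the text's own approximation.

```python
def specificity(true_negative: int, false_positive: int) -> float:
    """Specificity = TN / (FP + TN)."""
    return true_negative / (false_positive + true_negative)


total_rows = 59559
grouped_selected = total_rows - 59269  # 290 rows, implied by the stated TN count
control_selected = 1719  # Table 3

# Approximation used in the text: the grouped scheme has ~0 false positives and
# both schemes find similar true positives, so each extra control row is a false positive.
grouped_fp = 0
control_fp = grouped_fp + (control_selected - grouped_selected)
grouped_tn = total_rows - grouped_selected
control_tn = total_rows - control_selected

grouped_spec = specificity(grouped_tn, grouped_fp)
control_spec = specificity(control_tn, control_fp)
print(control_fp, control_tn)  # 1429 57840
print(f"{grouped_spec:.2%} vs {control_spec:.2%}")  # 100.00% vs 97.59%
```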
Double-click the .exe executable file in the dist folder to open it, as shown in FIG. 2. The first line of the window describes the software (note: redot is general-purpose software developed by Jinyu for detecting high-risk variant genes). The second line prompts for the file path; the user can type or paste the path and file name of the annotation file, as shown in FIG. 3. Pressing Enter prints the input and the next prompt, as shown in FIG. 4. Enter the name of the gene annotation file (note: the file name must match the one entered in the previous step), as shown in FIG. 5. Pressing Enter prints the entered file name, and the program starts computing the result, as shown in FIG. 6. When the program finishes, the window closes automatically and the generated file is placed in the dist folder, as shown in FIG. 7. The generated file is named file name + "." + "redot" + "." + date + "." + suffix, for example "np23fw0151.xx.redot.20210125.xlsx". The file generated by redot gains two columns, "Fast_analysis" and "Element_analysis"; in this example, 322 high-risk variant genes were labeled among 59238 genes, as shown in FIG. 8.
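The output naming pattern can be sketched as follows. This assumes the input annotation file in the example was named np23fw0151.xx.xlsx (the input name is not given in the text; it is inferred from the example output) and that the date is the run date. The function name is hypothetical and is not part of redot.

```python
from datetime import date
from pathlib import Path


def redot_output_name(input_name: str, run_date: date) -> str:
    """Build '<file name>.redot.<YYYYMMDD><suffix>' from an input file name."""
    path = Path(input_name)
    # Path.stem drops only the last suffix: "np23fw0151.xx.xlsx" -> "np23fw0151.xx".
    return f"{path.stem}.redot.{run_date:%Y%m%d}{path.suffix}"


print(redot_output_name("np23fw0151.xx.xlsx", date(2021, 1, 25)))
# np23fw0151.xx.redot.20210125.xlsx
```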
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (2)

1. A high-throughput sequencing variation risk grouping screening method, characterized by comprising the following steps:
step S1, setting screening conditions for variants with a high pathogenicity risk, and screening the gene data according to these conditions;
step S2, setting screening conditions for pathogenicity high-risk and medium-risk variants with an extremely low population frequency, and screening the gene data;
step S3, setting screening conditions for variants at high risk in pathogenicity, and screening the gene data;
step S4, setting screening conditions for autosomal dominant variants at high risk in pathogenicity, and screening the gene data;
step S5, setting screening conditions for autosomal recessive variants at high risk in pathogenicity for which the same gene carries at least two different variants or a single variant is homozygous, and screening the gene data;
step S6, setting screening conditions for sex-linked variants at high risk in pathogenicity, and screening the gene data;
step S7, setting screening conditions for variants at high risk in pathogenicity that are regarded as pathogenic or likely pathogenic, or that other reliable databases regard as harmful, and screening the gene data;
step S8, labeling the screened high-risk genes.
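The steps of claim 1 can be sketched as a set of per-group filters over annotation-table rows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Variant` fields, the tier names, the reading of "high risk in pathogenicity" as the tier set `PATHO_HIGH_TIERS`, and the `RARE_FREQ` cut-off are all hypothetical, since the claims define none of them. Step S8 is modeled as attaching to each variant the list of groups that selected it.

```python
from collections import Counter
from dataclasses import dataclass

# All tier names and thresholds below are hypothetical; the claims do not define them.
HIGH_TIERS = {"high"}                       # S1: high pathogenicity risk
HIGH_OR_MEDIUM_TIERS = {"high", "medium"}   # S2: high- and medium-risk variants
PATHO_HIGH_TIERS = {"high", "medium_high"}  # assumed reading of "high risk in pathogenicity"
RARE_FREQ = 0.001                           # assumed "extremely low population frequency"


@dataclass
class Variant:
    vid: str           # variant identifier
    gene: str
    risk: str          # laboratory-defined risk tier
    pop_freq: float    # population allele frequency
    inheritance: str   # "AD", "AR" or "XL" (hypothetical codes)
    zygosity: str      # "het" or "hom"
    db_harmful: bool   # S7 flag: (likely) pathogenic or harmful in a trusted database


def screen(variants):
    """Apply groups S1-S7 and return {vid: [group labels]} (the S8 labelling)."""
    # S5: count AR candidate records per gene (each record assumed to be a distinct variant).
    ar_per_gene = Counter(
        v.gene for v in variants
        if v.risk in PATHO_HIGH_TIERS and v.inheritance == "AR"
    )

    def patho_high(v):
        return v.risk in PATHO_HIGH_TIERS

    rules = {
        "S1": lambda v: v.risk in HIGH_TIERS,
        "S2": lambda v: v.risk in HIGH_OR_MEDIUM_TIERS and v.pop_freq < RARE_FREQ,
        "S3": patho_high,
        "S4": lambda v: patho_high(v) and v.inheritance == "AD",
        "S5": lambda v: (patho_high(v) and v.inheritance == "AR"
                         and (ar_per_gene[v.gene] >= 2 or v.zygosity == "hom")),
        "S6": lambda v: patho_high(v) and v.inheritance == "XL",
        "S7": lambda v: patho_high(v) and v.db_harmful,
    }

    labels = {}
    for v in variants:
        hits = [name for name, rule in rules.items() if rule(v)]
        if hits:  # S8: label only variants selected by at least one group
            labels[v.vid] = hits
    return labels
```

Because every group is evaluated independently, a variant can carry several labels, which matches the overlap between groups noted in the description.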
2. The high-throughput sequencing variation risk grouping screening method of claim 1, wherein in step S7 the condition is set as high risk in pathogenicity together with the variant having a record in the HGMD database, or being classified as pathogenic or likely pathogenic in the ClinVar database.
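Claim 2 narrows the database condition of step S7. A minimal sketch of that predicate follows, assuming the annotation table carries an HGMD record column (empty when the variant is absent from HGMD) and a ClinVar clinical-significance string such as `Likely_pathogenic`. The column semantics and the function name `s7_claim2` are hypothetical, and reading the HGMD condition as "the variant has an HGMD record" is an interpretation of the claim.

```python
# Hypothetical normalized ClinVar significance values counted as (likely) pathogenic.
CLINVAR_PLP = {"pathogenic", "likely pathogenic", "pathogenic/likely pathogenic"}


def normalize_clnsig(value):
    """Normalize a ClinVar significance string, e.g. 'Likely_pathogenic'."""
    return value.strip().lower().replace("_", " ") if value else ""


def s7_claim2(risk_ok, hgmd_record, clinvar_sig):
    """Assumed reading of step S7 as narrowed by claim 2.

    risk_ok     -- True if the variant is in the 'high risk in pathogenicity' tier(s)
    hgmd_record -- the variant's HGMD entry, or None/"" if absent (hypothetical column)
    clinvar_sig -- ClinVar clinical significance string, or None
    """
    in_hgmd = bool(hgmd_record)
    clinvar_plp = normalize_clnsig(clinvar_sig) in CLINVAR_PLP
    return risk_ok and (in_hgmd or clinvar_plp)
```

Either database hit is sufficient, but the pathogenicity-risk condition is always required.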
CN202110275446.5A 2021-03-15 2021-03-15 High-throughput sequencing variation risk grouping screening method Pending CN112951329A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110275446.5A CN112951329A (en) 2021-03-15 2021-03-15 High-throughput sequencing variation risk grouping screening method
CN202111212516.9A CN113793642B (en) 2021-03-15 2021-10-19 High-throughput sequencing variation risk grouping screening method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110275446.5A CN112951329A (en) 2021-03-15 2021-03-15 High-throughput sequencing variation risk grouping screening method

Publications (1)

Publication Number Publication Date
CN112951329A true CN112951329A (en) 2021-06-11

Family

ID=76229788

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110275446.5A Pending CN112951329A (en) 2021-03-15 2021-03-15 High-throughput sequencing variation risk grouping screening method
CN202111212516.9A Active CN113793642B (en) 2021-03-15 2021-10-19 High-throughput sequencing variation risk grouping screening method and system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111212516.9A Active CN113793642B (en) 2021-03-15 2021-10-19 High-throughput sequencing variation risk grouping screening method and system

Country Status (1)

Country Link
CN (2) CN112951329A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255821A (en) * 2021-12-31 2022-03-29 天津金域医学检验实验室有限公司 Family three-sample high-throughput sequencing risk grouping screening method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101302563A (en) * 2008-07-08 2008-11-12 上海中优医药高科技有限公司 Comprehensive evaluation method of polygenic diseases genetic risk
US20170329893A1 (en) * 2016-05-09 2017-11-16 Human Longevity, Inc. Methods of determining genomic health risk
GB2564848A (en) * 2017-07-18 2019-01-30 Congenica Ltd Prenatal screening and diagnostic system and method
WO2019029807A1 (en) * 2017-08-09 2019-02-14 King Faisal Specialist Hospital & Research Centre Gene panel for identifying a predisposition for inherited cancer
KR102010899B1 (en) * 2018-07-02 2019-08-14 연세대학교 산학협력단 Method for providing the information for predicting or diagnosing of inflammatory bowel disease using single nucleotide polymorphism to be identified from next generation sequencing screening
CN109754856B (en) * 2018-12-07 2021-06-22 荣联科技集团股份有限公司 Method and device for automatically generating gene detection report and electronic equipment
CN110648722B (en) * 2019-09-19 2022-05-31 首都医科大学附属北京儿童医院 Device for evaluating neonatal genetic disease risk

Also Published As

Publication number Publication date
CN113793642B (en) 2024-05-07
CN113793642A (en) 2021-12-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication Application publication date: 20210611