WO2021133351A1 - Prioritization and scoring method - Google Patents
- Publication number
- WO2021133351A1 (application PCT/TR2020/051374)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- features
- feature
- variant
- score
- new
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
Definitions
- The main goal of the invention is to shorten the time required for genetic diagnosis, compared with existing systems, by determining the candidate variants that could be associated with a disease.
- The invention provides an algorithm, based on machine learning methods, that calculates pathogenicity scores for single nucleotide variants (SNVs). Novel features that have not previously been used in the literature for variant scoring, together with some of the existing scoring models (for example FATHMM, M-CAP, CERENKOV2, SIFT, PolyPhen, ClinPred, CADD, DANN, MutationTaster), are used to develop a variant scoring system for SNV-type variants.
- SNVs: single nucleotide variants
- The workload on the user (usually a medical geneticist) required for the diagnosis is significantly reduced by automatically scoring SNV-type variants.
- The invention is a prioritization and scoring method which facilitates the interpretation of the genetic variants (in the VCF file formed as a result of the bioinformatics pipeline), using machine learning for the analysis of next-generation sequencing (NGS) data. It comprises the following process steps:
- Figure 3 illustrates the structure that shows the complete system.
- Figure 4 illustrates the position of the invention within the system.
- In Figure 3, the system is shown as a complete structure.
- In Figure 4, the position of the invention within the system is shown.
- The complete system starts with a physician examining a patient who exhibits various symptoms.
- The physician requests a genetic test if he or she finds it appropriate.
- The blood sample taken from the patient is prepared for DNA sequencing by the laboratory.
- The prepared sample is processed in the laboratory by the sequencing device, and the digital DNA data (raw data) of the patient is obtained. Since variant information regarding the disease cannot be obtained directly from the raw data, this data must be processed on a computer with bioinformatics tools to reach the variant information.
- A variant report that shows the relation of the variant to the disease is created.
- A new feature space (30) is created by the feature construction model (20) from the original features (10) in the data set, for use by the variant scoring model (40).
- The features (10) are taken as input to the feature construction model (20).
- The new feature creation module (21) creates new features from the received features (10) via mathematical operators. The feature ranking module (22) ranks the new features and the original features (10) according to criteria such as consistency and information gain. The feature selection module (23) then selects a predetermined number of the ranked features.
- A new feature space (30) is created from the selected features, to be used by the variant scoring model (40).
- The new feature space (30), obtained after all the stages in the feature construction model (20) are carried out, is used as the input parameters of the variant scoring model (40).
- The variant scoring model (40) is trained with machine learning methods using the variant data set containing the new feature space (30).
- The scoring model (40) generates a variant score (51) by scoring the variant according to the input values.
- the features from the same feature space
- their weights may be different. This is also valid for the variant scoring model.
- When a user evaluates the variant score (51), he or she may wish to state an expert opinion by referring to information on which features were considered and to what extent.
- To present such information, the feature coefficients of the score (53) are calculated using SHAP values and the LIME method. The user can thus see how the underlying process that produces the score is carried out.
- After a variant is applied as an input to the scoring model (40), the score (51), the feature coefficients of the score (53) and the scoring model summary (52) are displayed to the user via an interface. The user can therefore see the underlying decision process specific to the variant, along with the variant score (51).
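The feature construction and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the operators (`+` and `*`), the median-split information gain, the toy feature names `conservation` and `noise`, and every function name are hypothetical. The scoring model is a plain logistic regression, and the per-feature contributions are simple weight-times-value terms standing in for the SHAP and LIME coefficients named in the text.

```python
import math
from itertools import combinations

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(values, labels):
    """Information gain of splitting one numeric feature at its upper median."""
    threshold = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder

def construct_features(features):
    """Module 21 (assumed form): combine each feature pair with + and *."""
    built = dict(features)
    for a, b in combinations(sorted(features), 2):
        built[f"{a}+{b}"] = [x + y for x, y in zip(features[a], features[b])]
        built[f"{a}*{b}"] = [x * y for x, y in zip(features[a], features[b])]
    return built

def rank_features(features, labels):
    """Module 22 (assumed form): feature names ordered by information gain."""
    return sorted(features,
                  key=lambda name: information_gain(features[name], labels),
                  reverse=True)

def select_features(features, labels, k):
    """Module 23 (assumed form): keep the k best-ranked features."""
    return {name: features[name] for name in rank_features(features, labels)[:k]}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_scoring_model(rows, labels, epochs=2000, lr=0.5):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n = len(rows)
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        errors = [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
                  for x, y in zip(rows, labels)]
        w = [wi - lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
             for j, wi in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

def score_variant(w, b, x):
    """Return the score in [0, 1] and each feature's contribution to the logit."""
    contributions = [wi * xi for wi, xi in zip(w, x)]
    return sigmoid(b + sum(contributions)), contributions

# Toy data: 8 variants, label 1 = pathogenic (all values are made up).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
original = {
    "conservation": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "noise": [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6],
}
space = select_features(construct_features(original), labels, k=2)
names = list(space)
rows = list(zip(*space.values()))
w, b = train_scoring_model(rows, labels)
score, contributions = score_variant(w, b, rows[-1])
print(names, round(score, 3), dict(zip(names, contributions)))
```

Printing the contributions next to the score mirrors the idea of showing the user which features drove a given variant score. A real system would more likely rely on an established library for the model and for SHAP or LIME explanations.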
Abstract
The invention relates to a prioritization and scoring method which facilitates the interpretation of genetic variants (in a VCF file formed as a result of the bioinformatics pipeline) by the user, using machine learning for the analysis of next-generation sequencing (NGS) data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20907928.4A EP4022646A4 (fr) | 2019-12-25 | 2020-12-24 | Prioritization and scoring method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TR2019/21589 | 2019-12-25 | ||
TR201921589 | 2019-12-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021133351A1 (fr) | 2021-07-01 |
Family
ID=76576076
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/TR2020/051374 WO2021133351A1 (fr) | 2019-12-25 | 2020-12-24 | Prioritization and scoring method |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4022646A4 (fr) |
WO (1) | WO2021133351A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013052913A2 (fr) * | 2011-10-06 | 2013-04-11 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
CN109295198A (zh) * | 2018-09-03 | 2019-02-01 | 安吉康尔(深圳)科技有限公司 | Method, device and terminal equipment for detecting gene variation of hereditary diseases |
WO2019148141A1 (fr) * | 2018-01-26 | 2019-08-01 | The Trustees Of Princeton University | Methods for analyzing genetic data for the classification of multifactorial traits including complex pathologies |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2875892T3 (es) * | 2013-09-20 | 2021-11-11 | Spraying Systems Co | Spray nozzle for fluidized catalytic cracking |
-
2020
- 2020-12-24 WO PCT/TR2020/051374 patent/WO2021133351A1/fr unknown
- 2020-12-24 EP EP20907928.4A patent/EP4022646A4/fr active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4022646A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4022646A4 (fr) | 2022-11-02 |
EP4022646A1 (fr) | 2022-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11037685B2 (en) | Method and process for predicting and analyzing patient cohort response, progression, and survival | |
Manni et al. | BUSCO: assessing genomic data quality and beyond | |
Moreau et al. | Computational tools for prioritizing candidate genes: boosting disease gene discovery | |
Baele et al. | Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency | |
US7324928B2 (en) | Method and system for determining phenotype from genotype | |
JP7041614B6 (ja) | Multi-level architecture for pattern recognition in biological data | |
Palacio et al. | Smart data for genomic information systems: the SILE method | |
CN114424287A (zh) | Single-cell RNA-seq data processing | |
Zhang et al. | MaLAdapt reveals novel targets of adaptive introgression from Neanderthals and Denisovans in worldwide human populations | |
US20240087747A1 (en) | Method and process for predicting and analyzing patient cohort response, progression, and survival | |
Hawinkel et al. | Model-based joint visualization of multiple compositional omics datasets | |
JP5067417B2 (ja) | Molecular network analysis support program, molecular network analysis support device, and molecular network analysis support method | |
Bernstein et al. | Jupyter notebook-based tools for building structured datasets from the Sequence Read Archive | |
Wen et al. | OmicsEV: a tool for comprehensive quality evaluation of omics data tables | |
Steinbiss et al. | LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons | |
Zhang et al. | VEF: a variant filtering tool based on ensemble methods | |
WO2021133351A1 (fr) | Prioritization and scoring method | |
Reimand et al. | Pathway enrichment analysis of-omics data | |
Salazar et al. | Computational tools for parsimony phylogenetic analysis of omics data | |
CN113010783A (zh) | Medical recommendation method, system and medium based on multimodal cardiovascular disease information | |
Ahmad et al. | A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer | |
Prytuliak et al. | SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence-and time-series data | |
KR102483880B1 (ko) | System and method for providing disease profiling information based on information from multiple databases | |
Albrecht et al. | Machine Learning in Quality Assessment of Early Stage Next-Generation Sequencing Data | |
JP2001178463A (ja) | Method for extracting similar expression patterns and method for extracting related biopolymers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20907928; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2020907928; Country of ref document: EP; Effective date: 20220328 |
 | NENP | Non-entry into the national phase | Ref country code: DE |