WO2021133351A1 - Prioritization and scoring method - Google Patents

Prioritization and scoring method

Info

Publication number
WO2021133351A1
Authority
WO
WIPO (PCT)
Prior art keywords
features
feature
variant
score
new
Prior art date
Application number
PCT/TR2020/051374
Other languages
English (en)
Inventor
Kazim Kivanç EREN
Yağmur Ceren DARDAĞAN
Orçun TAŞAR
Muhammed AKTOLUN
Esra ÇINAR
Irmak TÜRKOĞLU ÖZTORUN
Cüneyt Öksüz
Bahadir ONAY
Hüseyin ONAY
Original Assignee
İdea Teknoloji̇ Çözümleri̇ Bi̇lgi̇sayar Sanayi̇ Ve Ti̇caret Anoni̇m Şi̇rketi̇
Priority date
2019-12-25
Filing date
2020-12-24
Publication date
2021-07-01
Application filed by İdea Teknoloji̇ Çözümleri̇ Bi̇lgi̇sayar Sanayi̇ Ve Ti̇caret Anoni̇m Şi̇rketi̇
Priority to EP20907928.4A (EP4022646A4)
Publication of WO2021133351A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20: ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • The main goal of the invention is to shorten the time required for genetic diagnosis, relative to existing systems, by determining the candidate variants that could be associated with a disease.
  • The invention provides an algorithm, based on machine learning methods, that calculates pathogenicity scores for single nucleotide variants (SNVs). Novel features that have not previously been used in the literature for variant scoring, together with some of the existing scoring models (for example FATHMM, M-CAP, CERENKOV2, SIFT, PolyPhen, ClinPred, CADD, DANN, MutationTaster), are used to develop a variant scoring system for SNV-type variants.
  • SNVs: single nucleotide variants
  • The workload that diagnosis places on the user (usually a medical geneticist) is significantly reduced by automatically scoring SNP-type variants.
  • The invention is a prioritization and scoring method that facilitates the interpretation of genetic variants (in the VCF file produced by the bioinformatics pipeline), using machine learning for the analysis of next-generation sequencing (NGS) data. It comprises the following process steps:
  • Figure 3 illustrates the structure that shows the complete system.
  • Figure 4 illustrates the position of the invention within the system.
  • In Figure 3, a view of the system as a complete structure is given.
  • In Figure 4, the position of the invention within the system is shown.
  • The complete system starts when a physician examines a patient who exhibits various symptoms.
  • The physician requests a genetic test if he or she finds it appropriate.
  • The blood sample taken from the patient is prepared for DNA sequencing by the laboratory.
  • The prepared sample is processed in the laboratory by the sequencing device, and the digital DNA data (raw data) of the patient is obtained. Since variant information regarding the disease cannot be obtained directly from the raw data, the data must be processed computationally with bioinformatics tools to reach the variant information.
  • A variant report that shows the relation of the variant to the disease is created.
  • A new feature space (30) is created by the feature construction model (20) from the original features (10) in the data set, for use by the variant scoring model (40).
  • The features (10) are taken as input to the feature construction model (20).
  • The new feature creation module (21) creates new features from the received features (10) via mathematical operators. The new features and the original features (10) are ranked by the feature ranking module (22) according to criteria such as consistency and information gain. A predetermined number of features is then selected from the ranked features by the feature selection module (23).
  • A new feature space (30) is created from the selected features, to be used by the variant scoring model (40).
  • The new feature space (30), obtained after all the stages in the feature construction model (20) have been carried out, supplies the input parameters of the variant scoring model (40).
  • The variant scoring model (40) is trained with machine learning methods on the variant data set containing the new feature space (30).
  • The scoring model (40) generates the variant score (51) by scoring the variant according to the input values.
  • Even for features from the same feature space, the weights may be different. This is also valid for the variant scoring model.
  • When a user evaluates the variant score (51), he or she may wish to state an expert opinion by referring to information on which features were considered and to what extent.
  • To present this information, the feature coefficients of the score (53) are calculated using SHAP values and the LIME method. The user can thus see how the underlying process that produces the score is carried out.
  • After a variant is given as input to the scoring model (40), the score (51), the feature coefficients of the score (53) and the scoring model summary (52) are displayed to the user via an interface. The user can therefore see the underlying decision process specific to the variant, along with the variant score (51).
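
The feature construction stages described above (creation, ranking, selection) can be illustrated with a minimal Python sketch. It is not the patented implementation: the choice of `+` and `*` as the mathematical operators, the median-split information gain used as the only ranking criterion, the feature names and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values, labels):
    """Information gain of splitting one numeric feature at its median."""
    threshold = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def construct_features(features):
    """Stand-in for module (21): derive new features from pairs of originals."""
    new = {}
    for a, b in combinations(sorted(features), 2):
        new[f"{a}*{b}"] = [x * y for x, y in zip(features[a], features[b])]
        new[f"{a}+{b}"] = [x + y for x, y in zip(features[a], features[b])]
    return new


def build_feature_space(features, labels, k):
    """Stand-in for modules (22) and (23): rank all candidates, keep the top k."""
    candidates = {**features, **construct_features(features)}
    ranked = sorted(
        candidates,
        key=lambda name: information_gain(candidates[name], labels),
        reverse=True,
    )
    return {name: candidates[name] for name in ranked[:k]}, ranked


# Hypothetical toy data: two original features (10) for six variants,
# label 1 = pathogenic, 0 = benign.
features = {
    "conservation": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],
    "allele_freq": [0.01, 0.30, 0.40, 0.02, 0.03, 0.35],
}
labels = [1, 1, 0, 0, 1, 0]

feature_space, ranking = build_feature_space(features, labels, k=2)
```

Here `feature_space` plays the role of the new feature space (30). A real system would likely combine several ranking criteria (the text also names consistency) and a richer operator set.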
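
The relationship between the variant score (51) and the feature coefficients of the score (53) can be sketched for the simplest case. The example below assumes a linear-logistic scoring model with hand-picked, hypothetical weights; it does not use the SHAP or LIME libraries. Instead it relies on the closed form that holds for a linear model with independent features, where the Shapley value of feature i is w_i * (x_i - mean_i), measured in log-odds units. The feature names, weights and background means are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_variant(x, weights, bias):
    """Variant score (51) from a linear-logistic stand-in for model (40)."""
    z = bias + sum(weights[name] * x[name] for name in weights)
    return sigmoid(z)


def linear_attributions(x, weights, background_mean):
    """Per-feature contributions (53) in log-odds units.

    For a linear model with independent features, the Shapley value of
    feature i is w_i * (x_i - mean_i).
    """
    return {
        name: weights[name] * (x[name] - background_mean[name])
        for name in weights
    }


# Hypothetical parameters standing in for a trained scoring model.
weights = {"conservation": 4.0, "allele_freq": -6.0}
bias = -1.0
# Hypothetical feature means over a background data set.
background_mean = {"conservation": 0.5, "allele_freq": 0.2}

variant = {"conservation": 0.9, "allele_freq": 0.01}
score = score_variant(variant, weights, bias)
contrib = linear_attributions(variant, weights, background_mean)

# The baseline log-odds plus the contributions reproduce the model output,
# which is the additivity property an explanation interface can display.
base = bias + sum(weights[n] * background_mean[n] for n in weights)
logit = base + sum(contrib.values())
```

For non-linear models the contributions no longer have this closed form, which is where general-purpose methods such as SHAP and LIME, named in the text, come in.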

Abstract

The invention relates to a prioritization and scoring method that facilitates the user's interpretation of genetic variants (in a VCF file produced by the bioinformatics pipeline), using machine learning for the analysis of next-generation sequencing (NGS) data.
PCT/TR2020/051374 2019-12-25 2020-12-24 Prioritization and scoring method WO2021133351A1

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20907928.4A (EP4022646A4) 2019-12-25 2020-12-24 Prioritization and scoring method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR2019/21589 2019-12-25
TR201921589 2019-12-25

Publications (1)

Publication Number Publication Date
WO2021133351A1 2021-07-01

Family

ID=76576076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2020/051374 (WO2021133351A1) Prioritization and scoring method 2019-12-25 2020-12-24

Country Status (2)

Country Link
EP (1) EP4022646A4 (fr)
WO (1) WO2021133351A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013052913A2 * 2011-10-06 2013-04-11 Sequenom, Inc. Methods and processes for non-invasive assessment of genetic variations
CN109295198A * 2018-09-03 2019-02-01 安吉康尔(深圳)科技有限公司 Method, device and terminal equipment for detecting genetic variation of hereditary diseases
WO2019148141A1 * 2018-01-26 2019-08-01 The Trustees Of Princeton University Methods for analyzing genetic data for classification of multifactorial traits including complex medical disorders

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2875892T3 * 2013-09-20 2021-11-11 Spraying Systems Co Spray nozzle for fluidized catalytic cracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4022646A4 *

Also Published As

Publication number Publication date
EP4022646A4 (fr) 2022-11-02
EP4022646A1 (fr) 2022-07-06

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20907928; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2020907928; Country of ref document: EP; Effective date: 20220328
NENP Non-entry into the national phase. Ref country code: DE