EP4022646A1 - A prioritization and scoring method - Google Patents

A prioritization and scoring method

Info

Publication number
EP4022646A1
Authority
EP
European Patent Office
Prior art keywords
features
feature
variant
score
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20907928.4A
Other languages
German (de)
French (fr)
Other versions
EP4022646A4 (en)
Inventor
Kazim Kivanç EREN
Yağmur Ceren DARDAĞAN
Orçun TAŞAR
Muhammed AKTOLUN
Esra ÇINAR
Irmak TÜRKOĞLU ÖZTORUN
Cüneyt Öksüz
Bahadir ONAY
Hüseyin ONAY
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Idea Teknoloji Coezuemleri Bilgisayar Sanayi Ve Ticaret AS
Original Assignee
Idea Teknoloji Coezuemleri Bilgisayar Sanayi Ve Ticaret AS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-25
Filing date
2020-12-24
Publication date
2022-07-06
Application filed by Idea Teknoloji Coezuemleri Bilgisayar Sanayi Ve Ticaret AS

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Business, Economics & Management (AREA)
  • Biotechnology (AREA)
  • Public Health (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a prioritization and scoring method that uses machine learning to help the user interpret genetic variants (in the VCF file produced by the bioinformatics pipeline) in the analysis of Next-Generation Sequencing (NGS) data.

Description

DESCRIPTION
A PRIORITIZATION AND SCORING METHOD
Technical Field
The invention relates to a variant prioritization and scoring method that uses machine learning to facilitate the interpretation of genetic variants (in the VCF file produced by the bioinformatics pipeline) in the analysis of Next-Generation Sequencing (NGS) data.
State of the Art
Today, the most time-consuming stage of DNA sequencing data analysis is filtering the records in the variant list and then interpreting the remaining variants. The VCF file output by the bioinformatics pipeline contains many variants (for example, whole exome sequencing may yield an average of 20,000 quality-confirmed variants), most of which do not have a pathogenic effect. For diagnosis, it is essential to identify the small number of candidate variants among them that might cause disease. The filtering and prioritization needed to determine whether these variants are associated with a disease are carried out manually in the clinic, which is a difficult and lengthy process. Finding a small number of candidate variants automatically is therefore crucial for faster diagnosis.
Correct classification and prioritization of variants from next-generation sequencing data is one of the most important steps in clinical diagnosis; it consists of manually filtering tens of thousands of variants according to certain features. However, in most cases the correct variant might not be found, because the filtering method is not standardized and the filtering parameters are user-dependent.
There are several methods that calculate a pathogenicity score for each variant by feeding different variant features (such as allele frequency, functional effect and conservation scores) into machine learning methods. Because these models use different algorithms, features and training data, their results may conflict with each other. Thus, no consensus has been reached on how to classify variants according to their pathogenicity. Moreover, the presence of a variant in other members of the family (segregation information) is very important for clinical diagnosis. Family segregation data, which may improve model prediction, has not been used as a feature in the existing algorithms. In addition, ClinVar (one of the most widely used variant databases) recommends the use of the American College of Medical Genetics and Genomics (ACMG) guideline to improve the clinical classification of variants in the human genome. Using the rules of this guideline as features for the machine learning model is very important for improving variant classification. Current applications apply these criteria to variants as present/absent (binary) features, whereas the invention creates new rules from these criteria and uses them as features.
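As a rough illustration of how family segregation information could be turned into a numeric feature, the sketch below scores how consistently a variant tracks disease status within a family. The `FamilyMember` class, the `segregation_score` function and the fully penetrant dominant assumption are all hypothetical simplifications introduced here; the source does not specify how segregation is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyMember:
    affected: bool          # clinical status of the relative
    carries_variant: bool   # whether the variant was observed in this relative


def segregation_score(members):
    """Fraction of relatives whose carrier status matches their disease status.

    1.0 means perfect co-segregation under a simple, fully penetrant
    dominant model (an illustrative assumption, not a clinical rule).
    """
    if not members:
        return 0.0
    consistent = sum(m.affected == m.carries_variant for m in members)
    return consistent / len(members)


# Hypothetical family: three relatives are consistent, one unaffected carrier is not.
family = [
    FamilyMember(affected=True, carries_variant=True),
    FamilyMember(affected=True, carries_variant=True),
    FamilyMember(affected=False, carries_variant=False),
    FamilyMember(affected=False, carries_variant=True),
]
score = segregation_score(family)
```

A value like this could sit alongside the other variant features as one more model input; real segregation analysis would also need the inheritance model and penetrance.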
The research conducted identified the patent documents US20160357903A1, US20150066378A1, US20130332081A1 and EP3061020. The systems and methods disclosed in these applications generate a priority score for a variant of a gene in order to evaluate the potential significance of that variant in a disease. The present invention aims to generalize the prioritization process to any variant.
Consequently, because of the abovementioned disadvantages and the insufficiency of the current solutions, further development is needed in the relevant technical field.
Aim of the Invention
The invention aims to overcome the abovementioned disadvantages of the current state of the art.
The main goal of the invention is to shorten the time required for genetic diagnosis, compared to the existing systems, by determining the candidate variants that could be associated with a disease. The invention provides an algorithm based on machine learning methods that calculates pathogenicity scores for single nucleotide variants (SNVs). Novel features that have not previously been used in the literature for variant scoring, together with some of the existing scoring models (for example FATHMM, M-CAP, CERENKOV2, SIFT, PolyPhen, ClinPred, CADD, DANN, MutationTaster), are used to develop a variant scoring system for SNV type variants.
Features: ACMG guideline criteria and rules, and the family segregation information mentioned in the state of the art, are used in the method as features (that is, as factors that affect pathogenicity in the machine learning models). The ExAC pLI score of the gene in which the relevant variant occurs is also used as a feature. The pLI score gives a probability describing how tolerant a given gene is to loss of function, based on the number of protein-truncating variants. In addition to these features, the invention comprises constructing new features from the existing ones (feature generation/construction). The main aim is to capture the relations between different features by applying mathematical operations (division, multiplication, etc.) to the existing features. Feature construction methods such as ExploreKit, AutoLearn, Iterative Feature Construction and Association Rule Mining are used for this purpose.
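The idea of combining existing features through mathematical operations can be sketched as follows. This is a minimal illustration, not one of the named methods (ExploreKit, AutoLearn, etc.): the function name `construct_features`, the feature keys and the numeric values are assumptions chosen for the example, and only pairwise products and ratios are generated.

```python
from itertools import combinations


def construct_features(row):
    """Return the original features plus pairwise products and ratios."""
    out = dict(row)
    for a, b in combinations(sorted(row), 2):
        out[f"{a}*{b}"] = row[a] * row[b]
        # Guard against division by zero; 0.0 is an arbitrary placeholder.
        out[f"{a}/{b}"] = row[a] / row[b] if row[b] != 0 else 0.0
    return out


# Hypothetical feature values for a single variant.
variant = {"cadd": 25.0, "pli": 0.5, "sift": 0.25}
features = construct_features(variant)
```

With three original features this produces nine in total (three originals, three products, three ratios), which is why a ranking and selection step is needed afterwards to keep the feature space manageable.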
With the invention, automatically scoring SNP type variants significantly reduces the workload that diagnosis places on the user (usually a medical geneticist).
To evaluate the variants for diagnosis, the user may require detailed information on how the variant scores are generated. Machine learning models are generally complex and their results are not always easy to interpret. For this reason, the invention uses Machine Learning Interpretability methods to give the user additional information about the decision process of the complex machine learning models. SHAP Values, Permutation Importance and LIME are used for this purpose. In addition, the complex models are presented as a single decision tree, a Decision Tree Surrogate (built using Quinlan’s C4.5 Algorithm), which makes the decision mechanism of the method more intuitive.
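The surrogate idea, approximating a complex model with a simple, readable tree, can be illustrated with a one-level tree (a decision stump) fitted to the complex model's own outputs. This is a deliberately simplified sketch: it is not C4.5 (it maximizes agreement rather than gain ratio and builds only one split), and `fit_surrogate_stump`, `black_box` and the data are hypothetical.

```python
def fit_surrogate_stump(rows, black_box):
    """Find the single threshold rule that best reproduces black_box's outputs."""
    targets = [black_box(r) for r in rows]
    best = None
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            # Candidate rule: predict 1 when the feature reaches the threshold.
            preds = [int(r[feature] >= threshold) for r in rows]
            fidelity = sum(p == t for p, t in zip(preds, targets)) / len(rows)
            if best is None or fidelity > best[2]:
                best = (feature, threshold, fidelity)
    return best


def black_box(row):
    # Stand-in for a complex scoring model (hypothetical rule).
    return int(0.04 * row["cadd"] + 0.1 * row["pli"] > 0.8)


rows = [
    {"cadd": 30.0, "pli": 0.9},
    {"cadd": 25.0, "pli": 0.1},
    {"cadd": 12.0, "pli": 0.95},
    {"cadd": 5.0, "pli": 0.2},
]
feature, threshold, fidelity = fit_surrogate_stump(rows, black_box)
```

The fidelity value reports how often the simple rule agrees with the complex model on the given data, which is the usual way to judge whether a surrogate is a trustworthy summary.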
Machine learning methods:
Different machine learning methods are used to score variants. Some of these methods are as follows: Random Forest, XGBoost, CatBoost Classifier, Support Vector Machines, Deep Learning Models, Gaussian Mixture Modeling.
Innovative features of the invention can be listed as follows:
• New features are created from the features that are used in the literature (feature construction). Relevant features are obtained for the machine learning models for variant scoring.
• Using Machine Learning Interpretability methods, whenever a score is assigned to a variant, an explanation is provided of how much each feature contributed to the scoring decision of the machine learning model. Therefore, unlike the other scoring models in the literature, a more directly interpretable score is provided to the user by displaying what the scoring decision is based on for each variant score.
In order to fulfill the abovementioned goals, the invention is a prioritization and scoring method that uses machine learning to facilitate the interpretation of genetic variants (in the VCF file produced by the bioinformatics pipeline) in the analysis of next-generation sequencing data. It comprises the following process steps:
• taking the features as input to the feature construction model,
• creating new features from the received features via mathematical operators with the new feature creation module,
• ranking the features (constructed and original) according to their consistency and information gain with the feature ranking module,
• selecting a predetermined number of features from the ranked features via the feature selection module and creating a new feature space,
• generating the features in the new feature space by using the feature calculation module,
• applying the generated features as input to the variant scoring model and obtaining the variant score,
• calculating the coefficients of the features by using SHAP Values and the LIME method,
• creating the scoring model summary by using Permutation Importance and Decision Tree Surrogate Models,
• displaying the obtained score, the feature coefficients and the scoring model summary via a user interface.
The structural and characteristic features of the present invention will be understood clearly from the following drawings and the detailed description made with reference to them; the evaluation should therefore be made by taking these figures and the detailed description into consideration.
Figures Clarifying the Invention
Figure 1 illustrates the process steps for generating novel features for the variant scoring model.
Figure 2 illustrates the general structure of the variant scoring model.
Figure 3 illustrates the structure that shows the complete system.
Figure 4 illustrates the position of the invention within the system.
Description of the Part References
10. Feature
20. Feature construction model
21. New feature creation module
22. Feature ranking module
23. Feature selection module
30. New feature space
31. Feature calculation module
40. Scoring model
50. Score monitor
51. Score
52. Scoring model summary
53. Feature coefficients of the score
Detailed Description of the Invention
In this detailed description, the preferred embodiments of the inventive prioritization and scoring method are described by means of examples, solely to clarify the subject matter.
In Figure 1, the flowchart for constructing the novel features for the variant scoring model is shown.
In Figure 2, the general structure of the variant scoring model is shown.
In Figure 3, the view of the system within a complete structure is given.
In Figure 4, the position of the invention within the system is shown.
The complete system starts with a physician examining a patient who exhibits various symptoms. The physician requests a genetic test if he/she finds it appropriate. The laboratory prepares the blood sample taken from the patient for DNA sequencing. The prepared sample is processed by the sequencing device in the laboratory, and the digital DNA data (raw data) of the patient is obtained. Since variant information regarding the disease cannot be read directly from the raw data, this data must be processed computationally with bioinformatics tools to obtain the variant information. After the variants are obtained and examined by the user, a variant report that shows the relation of the variant to the disease is created.
The bioinformatics pipeline starts with the raw data obtained from the sequencing device. The raw data contains reads from different parts of the patient's DNA. The reads are aligned to the human reference genome, to determine the regions they come from, and saved in the SAM/BAM format. Subsequently, variant information that does not conform to the human reference genome is extracted from the processed SAM/BAM files and written to the VCF file. Specific filters are applied to determine, among the many variants in the VCF file, the candidate variants that are associated with the disease. The variants that remain after filtering are reported and the genetic diagnosis report of the patient is created. The scoring of the variants is carried out in the variant interpretation step, which follows the creation of the VCF file.
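To make the hand-off from the VCF file to the scoring step concrete, the sketch below reads VCF data lines and keeps only single nucleotide variants, the variant type the method scores. It relies only on the fixed VCF columns (CHROM, POS, ID, REF, ALT); the function name `parse_vcf_snvs` and the sample records are hypothetical, and a production pipeline would normally use a dedicated VCF library rather than manual splitting.

```python
def parse_vcf_snvs(lines):
    """Yield (chrom, pos, ref, alt) for single-nucleotide variants in VCF text."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for alt in alts.split(","):
            if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
                yield chrom, int(pos), ref, alt


# Hypothetical records: one SNV, one deletion, one site with an SNV and an insertion.
vcf_text = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\t.",
    "2\t222\t.\tAT\tA\t60\tPASS\t.",
    "3\t333\t.\tC\tT,CA\t70\tPASS\t.",
]
snvs = list(parse_vcf_snvs(vcf_text))
```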
Preliminary steps performed to construct the scoring model:
The feature construction model (20) creates a new feature space (30) from the original features (10) in the data set, to be used by the variant scoring model (40). As a first step, the features (10) are taken as input to the feature construction model (20). Then, the new feature creation module (21) creates new features from the received features (10) via mathematical operators. The feature ranking module (22) ranks the new features and the original features (10) according to criteria such as consistency and information gain. The feature selection module (23) selects a predetermined number of features from the ranked features. The selected features form the new feature space (30). The new feature space (30), obtained once all the stages of the feature construction model (20) have been carried out, provides the input parameters of the variant scoring model (40). The variant scoring model (40) is trained with machine learning methods on the variant data set expressed in the new feature space (30).
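The ranking and selection stages can be sketched as below. The example ranks features by information gain only; the consistency criterion mentioned in the text is not implemented. Splitting each numeric feature at its median is an assumption made to keep the example short, and the function names, feature names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of splitting one numeric feature at its median."""
    threshold = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def select_top_k(columns, labels, k):
    """Rank feature columns by information gain and keep the best k names."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: 1 = pathogenic, 0 = benign.
labels = [1, 1, 0, 0]
columns = {
    "cadd": [30.0, 28.0, 5.0, 3.0],   # separates the classes perfectly
    "noise": [1.0, 2.0, 1.0, 2.0],    # carries no class information
}
selected = select_top_k(columns, labels, k=1)
```

Here `k` plays the role of the predetermined number of features, and the returned names define the new feature space passed on to model training.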
Scoring:
The features in the new feature space (30) (which is derived from the original features (10) in the preprocessing step) are calculated by the feature calculation module (31) and applied as input to the variant scoring model (40). The scoring model (40) generates the variant score (51) by scoring the variant according to the input values.
When a complex model is applied to an individual input for decision making, the features (from the same feature space) that drive the decision, and their weights, may differ from one input to another. This also holds for the variant scoring model. When a user evaluates the variant score (51), he/she may wish to form an expert opinion by referring to information on which features were considered and to what extent. To present this information, the feature coefficients of the score (53) are calculated by using SHAP Values and the LIME method. The user can thus see how the score was obtained.
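Per-variant feature contributions of the kind described above can be illustrated with exact Shapley values, the quantity that SHAP approximates. The sketch enumerates every coalition of features, which is only feasible for a handful of features, and replaces absent features with baseline values; it does not use the SHAP or LIME libraries. The toy model, feature names and baseline are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input; absent features take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(v):
    # Hypothetical score: one additive term plus one interaction term.
    return 0.5 * v["cadd"] + 2.0 * v["pli"] * v["segregation"]


x = {"cadd": 4.0, "pli": 1.0, "segregation": 1.0}
baseline = {"cadd": 0.0, "pli": 0.0, "segregation": 0.0}
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the score for this variant and the score at the baseline, so they can be shown to the user as a breakdown of the score; the interaction term is shared equally between the two features involved.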
The variant scoring model (40) is a complex model, so its results and the way it operates may not be easy to interpret fully. For this reason, the scoring model summary (52) is formed using Machine Learning Model Interpretability methods, to give the user additional information about how the complex variant scoring model (40) reaches its decision and to help the user make a more accurate evaluation. Methods such as Permutation Importance and Decision Tree Surrogate Models are used to create the scoring model summary (52).
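Permutation importance, one of the methods named for the model summary, can be sketched in a few lines: shuffle one feature column, re-evaluate the model, and record how much the accuracy drops. The classifier, the data and the function names below are hypothetical stand-ins for the trained scoring model and its evaluation set.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only this feature's values permuted.
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(base - accuracy(model, shuffled, labels))
    return sum(drops) / repeats


def toy_classifier(row):
    # Hypothetical model that looks only at the "cadd" feature.
    return int(row["cadd"] >= 20.0)


pairs = [(30.0, 1.0), (25.0, 0.0), (10.0, 1.0), (5.0, 0.0), (28.0, 0.0), (3.0, 1.0)]
rows = [{"cadd": c, "noise": n} for c, n in pairs]
labels = [1, 1, 0, 0, 1, 0]

cadd_importance = permutation_importance(toy_classifier, rows, labels, "cadd")
noise_importance = permutation_importance(toy_classifier, rows, labels, "noise")
```

A feature the model ignores gets an importance of exactly zero, while a feature the model depends on shows a positive drop, giving a global ranking that complements the per-variant coefficients.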
After a variant is applied as an input to the scoring model (40), the score (51), the feature coefficients of the score (53) and the scoring model summary (52) are displayed to the user via an interface. The user can therefore see the underlying decision process specific to the variant, along with the variant score (51).

Claims

1. A prioritization and scoring method that facilitates the interpretation of genetic variants (in the VCF file produced by the bioinformatics pipeline) by using machine learning for the analysis of next-generation sequencing data, characterized by comprising the following steps:
• taking the features (10) as an input to the feature construction model (20),
• creating, by the new feature creation module (21), new features by combining the received features (10) via mathematical operators,
• ranking the new features and the features (10) according to their consistency and information gain with the feature ranking module (22),
• selecting a predetermined number of consistent features from the ranked features via the feature selection module (23) and creating a new feature space (30),
• generating the features in the new feature space (30) by using the feature calculation module (31),
• applying the generated features as an input to the variant scoring model (40) and obtaining the variant score (51),
• calculating the feature coefficients of the score (53) by using SHAP Values and the LIME method,
• creating the scoring model summary (52) by using Permutation Importance and Decision Tree Surrogate Models,
• displaying the obtained score (51), the feature coefficients of the score (53) and the scoring model summary (52) to the user via an interface.
EP20907928.4A 2019-12-25 2020-12-24 A prioritization and scoring method Pending EP4022646A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR201921589 2019-12-25
PCT/TR2020/051374 WO2021133351A1 (en) 2019-12-25 2020-12-24 A prioritization and scoring method

Publications (2)

Publication Number Publication Date
EP4022646A1 (en) 2022-07-06
EP4022646A4 (en) 2022-11-02

Family

ID=76576076

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20907928.4A Pending EP4022646A4 (en) 2019-12-25 2020-12-24 A prioritization and scoring method

Country Status (2)

Country Link
EP (1) EP4022646A4 (en)
WO (1) WO2021133351A1 (en)

Also Published As

Publication number Publication date
EP4022646A4 (en) 2022-11-02
WO2021133351A1 (en) 2021-07-01

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220328

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

A4 Supplementary search report drawn up and despatched

Effective date: 20221005

RIC1 Information provided on ipc code assigned before grant

Ipc: G16B 40/20 20190101ALI20220928BHEP

Ipc: G16B 20/20 20190101ALI20220928BHEP

Ipc: G16H 50/20 20180101AFI20220928BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)