AU2021100434A4 - A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes - Google Patents

Publication number
AU2021100434A4
Authority
AU
Australia
Prior art keywords
gene
genes
samples
schizophrenia
dataset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021100434A
Inventor
Anitha A.
Sudha M.
Sivashankari R.
Karthik S.
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809Methods for determination or identification of nucleic acids involving differential detection
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/112Disease subtyping, staging or classification
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/118Prognosis of disease development
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/158Expression markers
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00Detection or diagnosis of diseases
    • G01N2800/30Psychoses; Psychiatry
    • G01N2800/302Schizophrenia
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00Detection or diagnosis of diseases
    • G01N2800/30Psychoses; Psychiatry
    • G01N2800/304Mood disorders, e.g. bipolar, depression

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Public Health (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)

Abstract

The present disclosure relates to a system and a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes. A rank-based gene biomarker identification and classification framework is proposed to identify the overlapping and non-overlapping gene patterns of bipolar disorder and schizophrenia. The dataset used in this experiment is obtained from the Gene Expression Omnibus (GEO) database. As an outcome of this experiment, seven biomarkers are identified as overlapping genes. In addition, 60 and 68 informative gene biomarkers are identified in the bipolar disorder and schizophrenia datasets, respectively, as feature subsets to discriminate the samples. Overlapping genes are eliminated to increase the diagnostic accuracy for the disorders. The performance of the proposed system is evaluated against standard existing machine learning techniques. With a deep neural network model, the proposed framework attained 97.01% and 95.65% accuracy on the bipolar disorder and schizophrenia datasets, respectively, outperforming the other benchmarked techniques.

Description

A SYSTEM AND METHOD FOR PREDICTING BIPOLAR DISORDER AND SCHIZOPHRENIA BASED ON NON-OVERLAPPING GENETIC PHENOTYPES
FIELD OF THE INVENTION
The present disclosure relates to a system and a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes. More specifically, the present disclosure relates to a system and a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes using a deep neural network.
BACKGROUND OF THE INVENTION
Schizophrenia is a serious mental disorder that affects a person in adulthood. It is prevalent in both genders. Common symptoms include hallucinations, delusions, confusion, an imbalanced state of mind, thought disorder and negative symptoms. It interferes with regular day-to-day activities, thinking, action, speech, and emotions. Mental health professionals cite abnormal neurotransmission as the clinical reason for the development of schizophrenia. Genetic and environmental factors play an important role in the origin of schizophrenia. Treatments such as cognitive-behavioral therapy (CBT) are said to be more effective in treating schizophrenia patients.
Bipolar disorder is another type of mental disorder that has serious effects on mental health. The risk of developing bipolar disorder has heritable genetic links. It is a chronic mental illness characterized by manic and depressive episodes of varying duration. Among the global population, 2% of people are affected by bipolar disorder of both types. Bipolar disorder is often confused with Major Depressive Disorder (MDD) because of the similarity of their symptoms. An inaccurate diagnosis may sometimes have a life-threatening effect on the patient's personal life, as it elevates the risk of attempted suicide. The notable symptoms of bipolar disorder are mood swings, irritability, sadness, anxiety, and difficulty in sleeping. Anti-psychotic medications are said to be more effective in treating bipolar disorder. The Diagnostic and Statistical Manual (DSM-V) of the American Psychiatric Association briefly identifies the symptoms and possible medications for both schizophrenia and bipolar disorder and is recommended to medical professionals for the diagnosis of mental disorders. The number of people affected by mental illness is rising at an alarming rate for many reasons. It becomes a burden for medical professionals to analyze every individual with care. The demand for an intelligent computational model for the diagnosis of mental disorders is increasing progressively.
The term "neural network" was first coined and introduced by McCulloch and Pitts. Initially, it was used to refer to any circuit or network of neurons. Inspired by the workflow and functionality of a typical biological human brain, the Artificial Neural Network (ANN), also called the perceptron technique, was created by Rosenblatt and used for pattern recognition. The main advantage of a neural network is the way it understands and captures the patterns hidden in the input data. The back-propagation concept was added to neural networks by Paul Werbos, which made it possible to apply neural networks to many real-world problems. It provides more flexibility to enhance the performance of the model by adjusting the results of each layer during propagation. Computationally intensive neural network models were further developed and named "Deep Learning" models. Some deep learning models are Convolutional Neural Networks, Deep Neural Networks, and Contractive Auto-Encoders. Currently, neural network models are widely adopted in various disciplines such as energy systems, healthcare, robotics, agriculture, weather modeling, geospatial image analysis, etc. The impact of neural networks in the medical field is comparatively high. Heterogeneous data from different sources can be analyzed with a neural network because of its non-linearity. Technology development plays a vital role in the improvement of healthcare facilities. The invention of many medical devices reduces the burden on healthcare professionals. Likewise, after the successful sequencing of the human genome, many modern devices have been invented to analyze human genetic patterns and identify the root cause of a disease. Some of the techniques are microarray gene expression analysis, Next Generation Sequencing (NGS), etc. Neural network models can effectively analyze and reveal the patterns of gene expression.
Microarray gene expression data of cancer has been classified well using a neural network model, with an accuracy of 98%. A Support Vector Machine (SVM) based classifier was developed to classify two different cancer datasets and benchmarked against ANN; SVM performed better than the ANN model. A brief survey has been made of gene expression in cancer prognosis with neural network and other machine learning models. A Multilayered Perceptron (MLP) neural network was employed to classify gene data of cancer patients, with the Artificial Bee Colony (ABC) technique used to select informative genes. MLP showed a better result than the Radial Basis Function (RBF), with an accuracy of 93.2%. A risk classification process was developed to predict the cancer survival rate using ANN. An attempt was made to provide targeted therapy to cancer patients using ANN. Small round blue cell tumor (SRBCT) data was given as input to the ANN to train the model, and the model was tested with blinded samples to evaluate its performance. In another work, a neural network model was developed to predict the clinical outcome of neuroblastoma patients. This model achieved 88% accuracy and predicted the prognostic biomarkers from 98% of patients. An ensemble neural network model was framed to classify leukemia, colon, and B-cell lymphoma data. To identify cancer subgroups from gene expression data, a diagnostic system with multilayered networks was developed. Seven gene biomarkers with four subgroups were precisely suggested by the neural model used.
In recent years, after the advent of gene sequencing technologies, many research attempts have been made to find patterns in gene expression. Many computational models have been developed to analyze and observe the patterns in the data. Gene expression analysis might open up a new way to treat patients effectively with targeted drugs for different diseases. In particular, a lot of intensive research has been conducted on analyzing neurological and psychiatric disorders. The primary reason is said to be the genetic link associated with the hierarchy without a common cause. A neural network model was constructed to classify schizophrenia patients from their gene expression. This model achieved 92.1% accuracy with a train-test split. An SVM based model was developed to identify and expose Alzheimer's candidate genes. This model obtained an accuracy rate of 84.56% with a Receiver Operating Characteristic (ROC) of 94%. Gene signatures of schizophrenia patients were identified with an SVM Recursive Feature Elimination (RFE) model, in which 21 gene biomarkers were identified for an accurate diagnosis. Correlation-based diagnostic signature identification was performed to identify genetic biomarkers of schizophrenia patients; 103 gene biomarkers were identified and achieved 100% accuracy with 10-fold cross-validation. The transcriptomic biomarkers of autism spectrum disorder were identified from blood-based gene expression using SVM and k-Nearest Neighbor (k-NN). In this system, SVM showed better results with 93.8% accuracy. An early risk prediction model for Post-Traumatic Stress Disorder (PTSD) was developed for targeted prevention using the Targeted Information Equivalence technique (TIE*). A diffuse large B-cell lymphoma prediction model was built using SVM by extracting the candidate genes from protein expression profiles. Four candidate genes were identified, which classify the data with a low error rate.
An ensemble feature selection method is developed to identify DNA methylation biomarkers of three types of lung cancer. An accuracy of 86.54% is achieved from Random Forest (RF) with Leave One Out cross-validation (LOOCV). A hybrid gene selection model is proposed to identify gene biomarkers of cancer. SVM is used for classification and minimum redundancy maximum relevance - ABC (mRMR-ABC) is applied to select genetic markers.
Many computational models intend to accurately discriminate the samples of each group for better classification. That is accomplished when the candidate features are properly identified prior to the learning phase. Generally, the final feature subset is fed into the model for training and evaluation. However, the objective of the proposed work is to find the overlapping association between schizophrenia and bipolar disorder. So, before model training, the genes that overlap in both datasets are removed. The non-overlapping candidate genetic features are input to the learning model to classify individuals affected by schizophrenia and bipolar disorder against the healthy samples.
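The removal of overlapping genes before training can be illustrated with a minimal Python sketch. It assumes each disease dataset has already produced a list of candidate gene identifiers; the function name `split_overlap` and the identifiers `G1` to `G5` are hypothetical and do not come from the disclosure.

```python
def split_overlap(bipolar_genes, schizophrenia_genes):
    """Return (overlap, bipolar_only, schizophrenia_only) as sorted lists."""
    bp, sz = set(bipolar_genes), set(schizophrenia_genes)
    overlap = bp & sz  # genes that are candidates in both datasets
    return sorted(overlap), sorted(bp - overlap), sorted(sz - overlap)


# Hypothetical gene identifiers, used only for illustration.
bp_candidates = ["G1", "G2", "G3", "G4"]
sz_candidates = ["G3", "G4", "G5"]

overlap, bp_only, sz_only = split_overlap(bp_candidates, sz_candidates)
print("overlapping:", overlap)
print("bipolar only:", bp_only)
print("schizophrenia only:", sz_only)
```

Only the two disease-specific lists would then be passed on as feature subsets to the learning model.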
Although various computational models intend to accurately discriminate the samples of each group for better classification, these models fail to identify candidate features prior to the learning phase. In view of the foregoing discussion, there exists a need for a system and a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes.
SUMMARY OF THE INVENTION
The present disclosure seeks to develop a method and a highly reliable computational system to identify gene biomarkers of schizophrenia and bipolar disorder by eliminating their overlapping genetic association.
In an embodiment, a system for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes is provided. The system includes an input module for receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains.
The system includes a pre-processing module in connection with the input module for correcting the background using the Robust Multi-array Average (RMA) technique, normalizing the probe values using the Quantile Normalization technique, summarizing with median polish, and analyzing differential gene expression using the 'limma' library.
The system includes a gene ranking model for selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminate overlapping genes, wherein the rank of each feature vector is calculated from the difference between the weight and the calculated JI value, wherein the genes are extracted through the GEO2R library, wherein the 100 genes with the smallest p-values are selected as significant genes for the identification of potential gene biomarkers.
The system includes a classification module in association with the gene ranking model for dividing the given dataset into two different datasets to find the phenotype genetic markers of the diseases individually and thereby analyzing the two different datasets together to avoid overlapping association identification.
In an embodiment, a significance level of p-value < 0.05 is considered to select the most informative, discriminative genes, wherein the rank of each gene is calculated by Gene Mania tool based on its importance.
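A minimal Python sketch of this selection step follows. It assumes that per-gene p-values are already available, for example from a differential expression analysis. The way the p < 0.05 threshold is combined with the 100-gene cut-off is an assumption made for illustration. The function name and the gene identifiers are hypothetical.

```python
def select_significant(p_values, alpha=0.05, top_n=100):
    """Return up to top_n genes with p < alpha, smallest p-value first.

    p_values maps a gene identifier to its p-value.
    """
    passing = sorted((p, gene) for gene, p in p_values.items() if p < alpha)
    return [gene for _, gene in passing[:top_n]]


# Hypothetical p-values, used only for illustration.
p_values = {"G1": 0.001, "G2": 0.2, "G3": 0.03, "G4": 0.0005}

selected = select_significant(p_values, alpha=0.05, top_n=2)
print(selected)
```

With the defaults (`alpha=0.05`, `top_n=100`) the same function would keep at most 100 genes per dataset.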
In an embodiment, a Quantile Normalization technique is used to normalize the probe values, wherein a median polish is employed to summarize the probe values and analysis of differential gene expression is performed by a library.
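Quantile normalization can be sketched in plain Python as follows. This is an illustrative re-implementation under simplifying assumptions (a small probes-by-samples matrix held as nested lists, ties broken by row order) and is not the routine used by the disclosed system.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes-by-samples matrix (list of rows).

    Every sample (column) ends up with the same distribution: the value
    at each rank is replaced by the mean of the values at that rank
    across all samples. Ties are broken by row order (a simplification).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / n_cols
                  for i in range(n_rows)]

    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


# Hypothetical probe intensities: 4 probes (rows) x 2 samples (columns).
raw = [[5, 4], [2, 1], [3, 3], [4, 2]]
normalized = quantile_normalize(raw)
print(normalized)
```

After normalization, both sample columns contain the same set of values (1.5, 2.5, 3.5, 4.5), while each probe keeps its within-sample rank.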
In an embodiment, Jaccard index (JI) is a statistical method used to calculate the similarity and diversity range between two different sample sets, wherein the similarity between bipolar disorder and schizophrenia dataset is calculated using JI in which the similarity value of the two sets of data lies in the range from 0% to 100%.
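The Jaccard index of two gene sets is the size of their intersection divided by the size of their union. A minimal Python sketch, with hypothetical gene identifiers and the assumed convention that two empty sets have similarity 0:

```python
def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


# Hypothetical gene identifiers, used only for illustration.
bipolar_genes = {"G1", "G2", "G3", "G4"}
schizophrenia_genes = {"G3", "G4", "G5"}

ji = jaccard_index(bipolar_genes, schizophrenia_genes)
print(f"JI = {ji:.2f} ({ji * 100:.0f}%)")
```

Here two of the five distinct genes are shared, so the similarity is 2/5, or 40%.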
In another embodiment, a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes is provided. The method includes receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains.
The method includes correcting the background using the Robust Multi-array Average (RMA) technique, normalizing the probe values using the Quantile Normalization technique, summarizing with median polish, and analyzing differential gene expression using the 'limma' library.
The method includes selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminate overlapping genes, wherein the rank of each feature vector is calculated from the difference between the weight and the calculated JI value, wherein the genes are extracted through the GEO2R library, wherein the 100 genes with the smallest p-values are selected as significant genes for the identification of potential gene biomarkers.
The method further includes dividing the given dataset into two different datasets to find the phenotype genetic markers of the diseases individually and thereby analyzing the two different datasets together to avoid overlapping association identification.
In an embodiment, the method further comprises: receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains; examining gene expression patterns of mitochondrial genes of at least 33 bipolar disorder patient samples, 34 normal samples and 35 schizophrenia patient samples; capturing and analyzing the genomic patterns using the Affymetrix Human Genome Array (U133A), wherein 22283 gene expression probes are given in the dataset for each sample; dividing the given dataset into two different datasets after pre-processing to find the phenotype genetic markers of the diseases individually; and analyzing the two different datasets together to avoid overlapping association identification, wherein initially the dataset has 102 samples with 33 bipolar disorder patients, 34 control samples and 35 schizophrenia patients, whereas later the single dataset is separated into a bipolar disorder dataset (33 bipolar + 34 control, 67 samples in total) and a schizophrenia dataset (35 schizophrenia + 34 control, 69 samples in total).
In an embodiment, a process to develop the RGBIC framework comprises: acquiring a microarray gene expression dataset from standard resources and investigating various related works to identify and thereby remove the key gap in the system; performing preliminary data analysis, pre-processing and transformations on the microarray gene expression dataset; identifying features with high importance and thereafter eliminating the remaining features; and applying various machine learning models to identify the best model and evaluating performance using standard metrics.
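The separation of the single 102-sample dataset into two disease-specific datasets that share the control samples can be sketched as follows. The label strings and the function name are hypothetical. Only the group sizes (33, 34 and 35) are taken from the description above.

```python
def split_by_disease(labels):
    """Return (bipolar_indices, schizophrenia_indices).

    Control samples are included in both index lists, so each dataset
    contrasts one disorder against the same healthy controls.
    """
    bp = [i for i, lab in enumerate(labels) if lab in ("bipolar", "control")]
    sz = [i for i, lab in enumerate(labels) if lab in ("schizophrenia", "control")]
    return bp, sz


# Group sizes as stated in the description; label names are assumptions.
labels = ["bipolar"] * 33 + ["control"] * 34 + ["schizophrenia"] * 35

bp_idx, sz_idx = split_by_disease(labels)
print(len(labels), len(bp_idx), len(sz_idx))
```

The index lists can then be used to slice the pre-processed expression matrix into the bipolar disorder dataset and the schizophrenia dataset.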
In an embodiment, a process for calculating the noise ratio comprises: calculating the ratio between the power of a signal and the power of noise of an input signal, which is represented as the ratio of the mean to the standard deviation of any measurement or signal; mapping the input signal to the gene features and thereby calculating the strength of each gene; calculating the similarity between two different datasets using the Jaccard index; calculating the signal to noise ratio (SNR) by finding the mean and standard deviation of each feature using a plurality of positive values from the datasets; and calculating the difference between the maximum and minimum SNR, that is, the difference between the strongest and weakest signal, from which the weight of each feature is calculated.
In an embodiment, gene network and pathway analysis are performed to identify the similarity between functional genes, wherein the backbone of this tool is built with a large amount of functional gene association data with logical interactions between each gene, wherein, in the same way, a few more tools are available for gene regulation prediction such as Gene Mania, Fun Coup, STRING, VisANT, etc.
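One possible reading of this ranking process is sketched below in Python. The description fixes the SNR as mean divided by standard deviation and states that the weight is derived from the gap between the maximum and minimum SNR. The exact weight formula (min-max scaling), the use of the sample standard deviation, and the sign of the final score (weight minus JI) are assumptions made for illustration, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def snr(values):
    """Signal-to-noise ratio as mean / sample standard deviation."""
    sd = stdev(values)
    return mean(values) / sd if sd > 0 else 0.0


def rank_genes(expression, ji):
    """Rank genes by (weight - ji), highest score first.

    expression maps gene -> list of positive expression values.
    ASSUMPTION: weight is the SNR min-max scaled by (max SNR - min SNR);
    the source does not state the exact formula.
    """
    snrs = {gene: snr(vals) for gene, vals in expression.items()}
    lo, hi = min(snrs.values()), max(snrs.values())
    span = hi - lo
    weights = {g: (s - lo) / span if span > 0 else 0.0 for g, s in snrs.items()}
    scores = {g: w - ji for g, w in weights.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Hypothetical expression values and Jaccard index, for illustration only.
expression = {
    "G1": [10.0, 10.5, 9.5],  # stable signal, high SNR
    "G2": [5.0, 9.0, 1.0],    # noisy signal, low SNR
    "G3": [8.0, 9.0, 7.0],
}
order, scores = rank_genes(expression, ji=0.4)
print(order)
print(scores)
```

With a single dataset-level JI value, the subtraction shifts every score by the same amount and does not change the ordering; whether the disclosed method applies the JI in this way is not specified.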
In the DNN model, 60 gene biomarkers selected from SIFRA are given as input, wherein the number of input nodes is 60, with hidden layers of 10, 20 and 10 nodes.
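The layer arrangement can be illustrated with a small forward pass in plain Python. This sketch assumes that "10, 20 and 10" refers to the node counts of three hidden layers, and it further assumes ReLU hidden activations, a single sigmoid output node and random untrained weights. None of these details are stated in the disclosure, and the code is not the disclosed model.

```python
import math
import random

# 60 input genes -> hidden layers of 10, 20, 10 nodes -> 1 output (assumed).
LAYER_SIZES = [60, 10, 20, 10, 1]


def init_layers(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, sigmoid on the output node."""
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx == len(layers) - 1:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            x = [max(0.0, v) for v in z]
    return x[0]


layers = init_layers(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# Hypothetical expression values for one sample with 60 selected genes.
sample = [0.5] * 60
prob = forward(layers, sample)
print(n_params, prob)
```

Under these assumptions the network has 1051 trainable parameters, which is small relative to deep models in general but still large compared with the 67 to 69 samples per dataset.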
An object of the present disclosure is to develop a highly reliable computational model to identify gene biomarkers of schizophrenia and bipolar disorder by eliminating their overlapping genetic association.
Another object of the present disclosure is to develop a two-phase model, wherein in phase 1 a novel gene ranking technique is used to identify the non-overlapping genes, which are bagged together by removing the overlapping genes, and phase 2 considers the riskiest genes of both mental disorders.
Another object of the present disclosure is to build a deep neural network model to classify the given data.
Another object of the present disclosure is to find the overlapping association between schizophrenia and bipolar disorder.
Another object of the present disclosure is to develop a rank-based gene biomarker identification and classification framework to identify the overlapping and non-overlapping gene patterns of bipolar disorder and schizophrenia.
Another object of the present disclosure is to minimize operational complexity under any condition.
Another object of the present disclosure is to reveal the importance of identifying potential gene biomarkers of mental disorders for accurate disease diagnosis.
Another object of the present disclosure is to assist medical practitioners with automated diagnostic procedures in a cost-effective manner.
Yet another object of the present invention is to deliver an expeditious and cost-effective method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes.
To further clarify advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a schematic block diagram of a system for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes in accordance with an embodiment of the present disclosure;
Figure 3 illustrates an architecture for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes in accordance with an embodiment of the present disclosure;
Figure 4 illustrates a work flow of the gene selection technique in accordance with an embodiment of the present disclosure;
Figure 5 illustrates a deep neural network model for bipolar disorder and schizophrenia classification in accordance with an embodiment of the present disclosure;
Figure 6 illustrates a heat map representation of bipolar disorder in accordance with an embodiment of the present disclosure;
Figure 7 illustrates a heat map representation of schizophrenia in accordance with an embodiment of the present disclosure;
Figure 8 illustrates an exemplary profile of the gene regulation and pathways of overlapping genes in accordance with an embodiment of the present disclosure;
Figure 9 illustrates a graph of the true positive rate for bipolar disorder and schizophrenia achieved on 5 classifiers in accordance with an embodiment of the present disclosure;
Figure 10 illustrates a graph of the false positive rate for bipolar disorder and schizophrenia achieved on 5 classifiers in accordance with an embodiment of the present disclosure;
Figure 11 illustrates a graph of the root mean squared error for bipolar disorder and schizophrenia achieved on 5 classifiers in accordance with an embodiment of the present disclosure;
Figure 12 illustrates a comparison graph of results obtained from the deep neural network with different feature subsets on the bipolar disorder dataset in accordance with an embodiment of the present disclosure; and
Figure 13 illustrates a comparison graph of results obtained from the deep neural network with different feature subsets on the schizophrenia dataset in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a schematic block diagram of a system for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes is illustrated in accordance with an embodiment of the present disclosure. The system 100 includes an input module 102 for receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains.
In an embodiment, a pre-processing module 104 is in connection with the input module 102 for correcting background using the Robust Multi-array Average (RMA) technique, normalizing the probe values using the Quantile Normalization technique, summarizing with median polish, and analyzing differential gene expression using the 'limma' library.
In an embodiment, a gene ranking model 106 is used for selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminate overlapping genes, wherein the rank of each feature vector is calculated by subtracting the calculated JI value from the weight, wherein the genes are extracted through the GEO2R library, and wherein the 100 genes with the smallest p-values are selected as significant genes for the identification of potential gene biomarkers.
In an embodiment, a classification module 108 is in association with the gene ranking model 106 for dividing the given dataset into two different datasets to find the phenotype genetic markers of the diseases individually, the two datasets being analyzed separately because identification of overlapping associations becomes impractical when both classes are analyzed together.
In an embodiment, a significance level of p-value < 0.05 is considered to select the most informative, discriminative genes, wherein the rank of each gene is calculated by the GeneMANIA tool based on its importance. In an embodiment, the Quantile Normalization technique is used to normalize the probe values, wherein median polish is employed to summarize the probe values and analysis of differential gene expression is performed with the 'limma' library.
In an embodiment, Jaccard index (JI) is a statistical method used to calculate the similarity and diversity range between two different sample sets, wherein the similarity between bipolar disorder and schizophrenia dataset is calculated using JI in which the similarity value of the two sets of data lies in the range from 0% to 100%.
In an embodiment, the dataset is accessed from the GEO database maintained by the National Center for Biotechnology Information (NCBI). The accession number of the dataset is GSE12649. The samples are taken from the prefrontal cortex of the human brain. A total of 102 samples are collected from postmortem brains. The gene expression patterns examined are mainly focused on mitochondrial genes. Among the samples, 33 belong to bipolar disorder patients, 34 are from control subjects and 35 are from schizophrenia patients. The Affymetrix Human Genome Array (U133A) is used to capture and analyze the genomic patterns. In total, 22283 gene expression probes are given in the dataset for each sample. The given dataset is divided into two different datasets after pre-processing. No specific technique is needed to split the samples into groups. The purpose of dividing the dataset is to find the phenotype genetic markers of each disease individually. If both classes are analyzed together, identification of overlapping associations becomes impractical. In this work, the single dataset is therefore separated into a bipolar disorder dataset (33 bipolar + 34 control, 67 samples in total) and a schizophrenia dataset (35 schizophrenia + 34 control, 69 samples in total). The details of the dataset are given in Table 1.
Table 1: Dataset Description
Details              Source Information
Data Repository      Gene Expression Omnibus
Accession Number     GSE12649
Disease Type         Bipolar Disorder and Schizophrenia
Number of Samples    102 (33 BP, 35 Schz, 34 Control)
Number of Features   22283
Class                0 - Control, 1 - Case
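The division of the single dataset into two case/control datasets can be illustrated with a minimal sketch. The label list, the sample ordering and the helper name `split_case_control` are assumptions made for illustration; only the group sizes (33, 34, 35) and the class coding of Table 1 come from the description.

```python
# Hypothetical label list for the 102 samples of GSE12649; the sample order is
# an assumption, only the group sizes (33 / 34 / 35) come from the description.
labels = ["BP"] * 33 + ["Control"] * 34 + ["Schz"] * 35

def split_case_control(labels, case):
    """Return (indices, classes) for one disorder, with 0 = control and 1 = case."""
    indices = [i for i, lab in enumerate(labels) if lab in (case, "Control")]
    classes = [1 if labels[i] == case else 0 for i in indices]
    return indices, classes

bp_idx, bp_y = split_case_control(labels, "BP")        # bipolar disorder dataset
schz_idx, schz_y = split_case_control(labels, "Schz")  # schizophrenia dataset
print(len(bp_idx), len(schz_idx))  # prints "67 69"
```

The same 34 control samples appear in both datasets, which matches the sample counts stated above.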
Figure 2 illustrates a flow chart of a method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes in accordance with an embodiment of the present disclosure. At step 202, the method 200 includes receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains.
At step 204, the method 200 includes correcting background using the Robust Multi-array Average (RMA) technique, normalizing the probe values using the Quantile Normalization technique, summarizing with median polish, and analyzing differential gene expression using the 'limma' library.
At step 206, the method 200 includes selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminate overlapping genes, wherein the rank of each feature vector is calculated by subtracting the calculated JI value from the weight, wherein the genes are extracted through the GEO2R library, and wherein the 100 genes with the smallest p-values are selected as significant genes for the identification of potential gene biomarkers.
At step 208, the method 200 includes dividing the given dataset into two different datasets to find the phenotype genetic markers of the diseases individually, the two datasets being analyzed separately because identification of overlapping associations becomes impractical when both classes are analyzed together.
In an embodiment, the method further comprises receiving microarray gene expression samples from the GEO database, wherein a total of 102 samples are taken from the prefrontal cortex of postmortem human brains. The method further comprises examining gene expression patterns of mitochondrial genes in at least 33 bipolar disorder patient samples, 34 control samples and 35 schizophrenia patient samples. The method further comprises capturing and analyzing the genomic patterns using the Affymetrix Human Genome Array (U133A), wherein 22283 gene expression probes are given in the dataset for each sample. The method further comprises dividing the given dataset into two different datasets after pre-processing to find the phenotype genetic markers of the diseases individually and analyzing the two datasets separately, since identification of overlapping associations becomes impractical when both classes are analyzed together, wherein the dataset initially has 102 samples with 33 bipolar disorder patients, 34 control samples and 35 schizophrenia patients, and is subsequently separated into a bipolar disorder dataset (33 bipolar + 34 control, 67 samples in total) and a schizophrenia dataset (35 schizophrenia + 34 control, 69 samples in total).
In an embodiment, a process to develop the RGBIC framework comprises acquiring a microarray gene expression dataset from standard resources and investigating related works to identify, and thereby address, the key gap in existing systems. The process comprises performing preliminary data analysis, pre-processing and transformations on the microarray gene expression dataset. The process further comprises identifying features with high importance, eliminating the remaining features, applying various machine learning models to identify the best model, and evaluating performance using standard metrics.
In an embodiment, a process for calculating the signal to noise ratio comprises calculating the ratio between the power of a signal and the power of the noise of an input signal, which is represented as the ratio of the mean to the standard deviation of any measurement or signal. The process comprises mapping the input signal to the gene features and thereby calculating the strength of each gene. The process comprises calculating the similarity between two different datasets using the Jaccard index. The process comprises calculating the signal to noise ratio (SNR) by finding the mean and standard deviation of each feature using the plurality of positive values from the datasets, calculating the difference between the maximum and minimum SNR, that is, between the strongest and weakest signal, and thereby calculating the weight of each feature.
In an embodiment, gene network and pathway analysis are performed to identify the similarity between functional genes, wherein the backbone of this tool is built with a large amount of functional gene association data with logical interactions between each gene, and wherein several tools are available for gene regulation prediction, such as GeneMANIA, FunCoup, STRING, VisANT, etc. In the DNN model, the 60 gene biomarkers selected by SIFRA are given as input, wherein the number of input nodes is 60, followed by three hidden layers with 10, 20 and 10 units.
Figure 3 illustrates an architecture for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes in accordance with an embodiment of the present disclosure. Rank based gene biomarker identification and classification (RGBIC) Framework is developed to analyze the gene expressions of bipolar disorder and schizophrenia patients. The workflow of the framework has two phases. In phase-I, the potentially high-risk gene biomarkers of bipolar disorder and schizophrenia are identified using SIFRA. The prediction of the occurrence of the disease is made in phase II using deep neural network model. Before phase I, data preprocessing is performed on the dataset. This proposed model has shown better performance on both the datasets. The general architecture of the proposed framework is given in Figure 3.
In data pre-processing, the dataset is downloaded from the GEO repository in CEL microarray format. It is processed using various R libraries from the "Bioconductor" project. An array of probe intensity values is generated from the CEL file using the 'affy' library. For background correction, the Robust Multi-array Average (RMA) technique is adopted. To normalize the probe values, the Quantile Normalization technique is adopted. Summarization is made with median polish. Analysis of differential gene expression is performed using the 'limma' library. A significance level of p-value < 0.05 is considered to select the most informative, discriminative genes. The extraction of the genes is made with the help of the GEO2R library. The 100 genes with the smallest p-values are selected as significant genes for the identification of potential gene biomarkers.
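The quantile normalization step can be illustrated with a small standalone sketch. This is the textbook form of the procedure written in plain Python, not the 'affy' or 'limma' implementation; the function name `quantile_normalize`, the toy intensities and the tie handling (ties broken by position rather than averaged) are assumptions.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Each sample (column) is sorted, the mean of each rank across samples is
    computed, and every value is replaced by the mean of its rank.
    Ties are broken by position, which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    # Mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            result[r][c] = rank_means[rank]
    return result

# Toy intensities: 4 probes x 3 samples (made-up numbers).
toy = [[5.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 4.0, 6.0],
       [4.0, 2.0, 8.0]]
normalized = quantile_normalize(toy)
```

After normalization every sample column contains the same set of values, so the samples share one intensity distribution while the ordering of probes within each sample is preserved.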
In the signal induced feature ranking technique (SIFRA), in phase I of the RGBIC framework, a gene ranking model 106 is proposed to select the biomarker genes from the given data. Let $X (\in \mathbb{R}^{d})$ and $Y (\in \mathbb{R})$ define the domains of the input vector x and the output values y respectively. The dimensions of the original dataset are represented as
$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d}$,
$y = [y_1, \ldots, y_n]^{T} \in \mathbb{R}^{n}$
where T represents the transpose of the predictor variable. In genomic data, the number of clinical samples is very small relative to the number of gene features. The aim of the technique is to find the m optimal gene features (m < d) of the given input vector x that predict the output y, where m denotes the size of the identified best feature subset and d is the total number of features.
The Jaccard index (JI) is a statistical method used to calculate the similarity and diversity range between two different sample sets. In the proposed technique, the similarity between bipolar disorder and schizophrenia dataset is calculated using JI. The similarity value of the two sets of data lies in the range from 0% to 100%. JI can be calculated from equation 1.
$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} \qquad (1)$$
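Equation 1 can be sketched in a few lines. The description does not state which elements of the two datasets are compared, so treating A and B as sets of gene symbols is an assumption; the two toy sets below are hypothetical and are not results from the disclosure.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, as a fraction in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Toy gene lists (hypothetical selections, not the disclosure's results).
bp_genes = {"LST1", "BID", "ATG5", "IGF1"}
schz_genes = {"LST1", "BID", "ATG5", "NOX5", "GIP"}
ji = jaccard_index(bp_genes, schz_genes)
print(f"{ji:.2%}")  # prints "50.00%"
```

Three shared symbols out of six distinct symbols give a similarity of 50%, which lies in the 0% to 100% range stated above.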
In the above equation, A and B represent two different datasets.
Signal to Noise Ratio
Signal to Noise Ratio (SNR) is a measure used to compare the desired signal to the background noise. The SNR of an input signal is the ratio between the power of the signal and the power of the noise. It is represented as the ratio of the mean ($\mu$) to the standard deviation ($\sigma$) of any measurement or signal. In our case, the input signal is mapped to the gene features, so that the strength of a gene can be calculated with SNR. The formula of SNR is given in equation 2.
$$\mathrm{SNR} = \frac{\mu}{\sigma} \qquad (2)$$
Here, X is the complete feature set of the input dataset, with features indexed up to $X_{n-1}$. Each feature $X_i$ is considered a signal, and each signal is given as $S \in X$. The Jaccard Index (JI) calculates the similarity between two different datasets. In this work, two different datasets are used. The similarity between the datasets is calculated as JI with the formula given in step 1 of the SIFRA technique. Each signal in the dataset has some positive value, i.e. $\mathrm{SNR}_i > 0$. From these values, the Signal to Noise Ratio (SNR) is calculated by finding the mean and standard deviation of each feature $X_i$. The procedure to calculate SNR is given in step 2. Then the difference between the signals is calculated from the maximum and minimum SNR. To find the signal strength of $X_i$, the difference between the weakest and strongest signal is calculated. The weight $W_i$ of each feature is calculated as mentioned in step 5.
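The per-feature SNR of equation 2 and the spread between the strongest and weakest signal can be sketched as follows. The probe names and expression values are made up, and the use of the population standard deviation is an assumption, since the description does not say which estimator is used.

```python
from statistics import mean, pstdev

def feature_snr(values):
    """SNR of one feature as mean / standard deviation (population std assumed)."""
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant feature: SNR is undefined")
    return mean(values) / sigma

# Toy expression values for three hypothetical probes across four samples.
features = {
    "probe_a": [2.0, 4.0, 4.0, 6.0],
    "probe_b": [1.0, 1.0, 3.0, 3.0],
    "probe_c": [5.0, 5.0, 5.0, 9.0],
}
snr = {name: feature_snr(vals) for name, vals in features.items()}
# Difference between the strongest and the weakest signal.
snr_range = max(snr.values()) - min(snr.values())
```

A feature with a large mean and a small spread across samples gets a high SNR, which is what the technique treats as a strong signal.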
In technique SIFRA:
Input: Microarray Dataset of Schizophrenia (Schz), Bipolar Disorder (BP) and Control Samples
Output: Overlapping and Non-Overlapping Genes of Schizophrenia and Bipolar Disorder
1. JI = |BP ∩ Schz| / |BP ∪ Schz|
2. SNR = μ/σ
3. SNRran = SNRmax - SNRmin
4. Xi,ran = Xi,max - Xi,min
5. Wi = SNRran / Xi,ran
6. R = Wi - JI
7. Rfin = 0, Rsub = 0
8. for i = 1 to R do
9.    if (Ri > 0)
10.      Rfin = Ri
11.   else
12.      Eliminate Ri
13. end
14. for j = 1 to Rfin do
15.   if (Rj == Rfin)
16.      Eliminate Rj
17.   else
18.      Rsub = Rj
19. end
The rank of each feature vector is calculated by subtracting the calculated JI value from the weight, as given in step 6. All positive-rank signals are considered strong signals and negative-rank signals are assumed to be weak signals. Only strong signals are taken into consideration for the next phase. After this process, 67 gene biomarkers for bipolar disorder and 75 gene biomarkers for schizophrenia are identified. Overlapping genes are another issue, since they affect system performance by reducing the prediction accuracy. After the strong signals are identified on both datasets, the gene features common to both are sorted out. Seven genes are identified as overlapping genes and eliminated from the datasets. Of these seven, six are given with their symbols: LST1, HSD11B1, DDX27, CRHBP, BID, and ATG5. After phase I, 60 and 68 gene biomarkers remain as high-potential risk genes for bipolar disorder and schizophrenia respectively.
Figure 4 illustrates a work flow of the gene selection technique in accordance with an embodiment of the present disclosure. The workflow of the proposed SIFRA technique is given in Figure 4. Step 402 deals with the input microarray dataset (GSE12649). Step 404 deals with calculating the Jaccard index, i.e., J = |A ∩ B| / |A ∪ B|. Step 406 deals with calculating the signal to noise ratio: SNR = μ/σ. Step 408 deals with evaluating d = SNRmax - SNRmin and f = Fmax - Fmin. Step 410 deals with evaluating the weight and rank of each feature: W = d/f; R = W - J. At step 412, if the rank is positive, the feature is accepted, and if the rank is not positive, the feature is eliminated. At step 414, if a gene overlaps, the feature is eliminated, and if it does not overlap, it is retained in the optimal gene subset.
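The ranking and overlap elimination described in steps 1 to 19 can be sketched as below. This is one possible reading of the steps rather than a definitive implementation: interpreting SNRran as the spread of per-feature SNRs over the whole dataset, using the population standard deviation, and the function names `sifra_rank` and `sifra_select` are all assumptions, and the gene names and values are made up.

```python
from statistics import mean, pstdev

def sifra_rank(features, ji):
    """Rank features following steps 2-6: R_i = SNR_ran / X_i_ran - JI.

    `features` maps a gene name to its expression values. Constant features
    are not handled here (they would cause a division by zero).
    """
    snr = {g: mean(v) / pstdev(v) for g, v in features.items()}
    snr_range = max(snr.values()) - min(snr.values())
    return {g: snr_range / (max(v) - min(v)) - ji for g, v in features.items()}

def sifra_select(bp_features, schz_features, ji):
    """Keep positive-rank genes per dataset, then split off the shared ones."""
    bp_strong = {g for g, r in sifra_rank(bp_features, ji).items() if r > 0}
    schz_strong = {g for g, r in sifra_rank(schz_features, ji).items() if r > 0}
    overlapping = bp_strong & schz_strong
    return bp_strong - overlapping, schz_strong - overlapping, overlapping

# Toy data with made-up gene names and values.
bp = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 6.0, 11.0], "g3": [4.0, 5.0, 6.0]}
schz = {"g1": [2.0, 3.0, 4.0], "g4": [1.0, 6.0, 11.0], "g5": [7.0, 8.0, 9.0]}
bp_only, schz_only, shared = sifra_select(bp, schz, ji=0.5)
```

In this toy run, g2 receives a negative rank and is dropped as a weak signal, g1 is strong in both datasets and is reported as overlapping, and the remaining genes form the non-overlapping subsets.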
Figure 5 illustrates a deep neural network model for bipolar disorder and schizophrenia classification in accordance with an embodiment of the present disclosure. Neural networks are non-linear computational models employed for computationally intensive tasks such as pattern recognition, object identification, etc. The schema of a DNN is similar to that of a typical ANN. A brief architecture of the DNN is given in Figure 5. The complexity of the network is higher in a DNN since any number of hidden layers can be added, whereas the number is limited in an ANN. A typical DNN has an input layer, one or more hidden layers, and an output layer. The main advantage of any type of neural network is the flexibility to fine-tune its hyper-parameters. In this framework, the DNN classifier is positioned in phase II for classifying the input data. This DNN model has three hidden layers with 10, 20 and 10 units, in addition to the input and output layers. Since the number of input feature vectors is high, this network model consumes high computational resources to perform the task. This model outperformed the other benchmarked classifier models for the given dataset. The parameters and the corresponding values used in this DNN are given in Table 2.
Table 2: Parameters of Deep Neural Network Technique
Parameters Values
Solver Adagrad
Activation Function ReLU
Learning Rate 0.1
Hidden Layers 3
Batch Size 10
Hidden Units 10,20,10
Decay Steps 10000
Decay Rate 0.96
Regularization Strength (L1) 0.001
Number of Classes 2 (Case, Control)
In the DNN model, the 60 gene biomarkers selected by SIFRA are given as input. The number of input nodes is 60, followed by three hidden layers with 10, 20 and 10 units. This network supports binary classification for the given input data. The Rectified Linear Unit (ReLU) activation function is used in this model. The weights, bias, and learning rate are held fixed in the first epoch. During back-propagation, the values are adjusted automatically by the network. The model runs until all the epochs are completed or the best result is obtained. In this network, $I_1, I_2, \ldots, I_n$ represent the inputs, $H_1, H_2, H_3$ are the hidden layers with 10, 20 and 10 units respectively, and $O_1$ and $O_2$ are the two nodes in the output layer of the binary classification model 108. Batch size plays an important role in training a DNN model. In general, batch size represents the number of training examples in one forward/backward pass. If the number of samples in a dataset is very small, the batch size may equal the sample size, whereas for a large sample size the batch size can be chosen arbitrarily. For schizophrenia, 68 genes are identified by the gene selection model and given as input to the DNN. This network has the same structure and parameters as the previous network model. The result obtained from this model has better accuracy than the other benchmarked models.
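The shape of the network described above can be sketched in plain Python. This is only a forward pass with the layer sizes and ReLU activation taken from Table 2; the random initialisation, the function names, and the exponential form of the learning-rate decay (base rate 0.1, decay rate 0.96, decay steps 10000) are assumptions, and training with Adagrad and L1 regularization is not shown.

```python
import random

LAYER_SIZES = [60, 10, 20, 10, 2]  # input, three hidden layers, output

def init_params(sizes, seed=0):
    """Small random weights and zero biases; the initialisation scheme is an assumption."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params

def forward(x, params):
    """Forward pass: ReLU on hidden layers, raw scores on the output layer."""
    for layer, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if layer < len(params) - 1:
            x = [max(0.0, v) for v in x]
    return x

def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

def decayed_learning_rate(step, base=0.1, rate=0.96, decay_steps=10000):
    """Exponential decay; this functional form is assumed, not stated in Table 2."""
    return base * rate ** (step / decay_steps)

params = init_params(LAYER_SIZES)
scores = forward([0.5] * 60, params)  # one hypothetical sample with 60 gene features
print(len(scores), count_params(params))  # prints "2 1062"
```

With 60 inputs the network has 1062 trainable parameters, which is small relative to typical deep networks but still large compared with the 67 available samples, so regularization and cross-validation matter for this setting.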
Figure 6 illustrates a heat map representation of bipolar disorder in accordance with an embodiment of the present disclosure. A heat map visualizes the gene expression data in a clustered grid form. A hierarchical clustering technique is applied to the candidate gene subset to plot this heat map. Each row represents an individual sample and each column indicates a gene feature. The color palette in the heat map represents the changes that occurred in the gene expression. Heat maps are combined with different clustering methods to group the samples based on their similarity patterns in order to identify the regulation of genes. In Figure 6, the up-regulated and down-regulated genes are represented in the heat map with red and green color. Black color indicates the absence of regulation.
Figure 7 illustrates a heat map representation of schizophrenia in accordance with an embodiment of the present disclosure. Figure 7 represents the up-regulated and down-regulated genes in the heat map with red and green color. Black color indicates the absence of regulation.
Figure 8 illustrates an exemplary profile of gene regulation and pathways of overlapping genes in accordance with an embodiment of the present disclosure. Gene network and pathway analysis are performed to identify the similarity between functional genes. The backbone of this tool is built with a large amount of functional gene association data with logical interactions between each gene. Several tools are available for gene regulation prediction, such as GeneMANIA, FunCoup, STRING, VisANT, etc. The gene network and pathway association map is generated using the GeneMANIA tool and represented in Figure 8. This tool is more flexible and user-friendly than other tools with similar functionalities. The tool is loaded with a list of gene biomarkers, from which the significant genes that are strongly correlated with the given input genes are generated. Analyzing the genetic interactions is important for finding the patterns and correlated functionalities associated with any disease. In Figure 8, the network represents the relationship between the genes, and each node in the network is a gene. Table 3 contains the genes correlated with the identified overlapping subset. The rank of each gene is calculated by the GeneMANIA tool based on its importance. The pathway analysis is performed on the overlapping genes to identify the genes having a strong correlation with each other.
Table 3: Correlated Genes of Overlapping Biomarkers
Gene       Description                              Rank
MTMR11     myotubularin related protein 11          1
TTYH2      tweety family member 2                   2
TBC1D8     TBC1 domain family member 8              3
SLC2A6     solute carrier family 2 member 6         4
MS4A7      membrane spanning 4-domains A7           5
NUDC       nudC nuclear distribution protein        6
SRRM1      serine, arginine repetitive matrix 1     7
TCF7L2     transcription factor 7 like 2            8
CTSL       cathepsin L                              9
USP7       ubiquitin specific peptidase 7           10
CDKN1C     cyclin dependent kinase inhibitor 1C     11
RAB24      RAB24, RAS oncogene family               12
VDR        vitamin D receptor                       13
AKT2       AKT serine/threonine kinase 2            14
ADGRE2     adhesion G protein-coupled receptor      15
GCH1       GTP cyclohydrolase 1                     16
CD300LF    CD300 molecule like family f             17
VHL        von Hippel-Lindau tumor suppressor       18
MS4A14     membrane spanning 4-domains A14          19
C3orf14    chromosome 3 open reading frame 14       20
A statistical evaluation is conducted on the identified gene biomarkers of bipolar disorder and schizophrenia to verify its significance. Limma library is used to calculate the statistical parameters for the gene expression. Bonferroni correction is made on the data to adjust and correct the p-value. The statistical analysis of the identified gene biomarkers is projected in Table 4 and 5 for BP and Schz datasets. Similarly, the results of the identified overlapping genes are given in Table 6.
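The Bonferroni adjustment mentioned above can be illustrated with a minimal sketch. The p-values are made up, the function name `bonferroni` is hypothetical, and the number of tests used for the correction in the disclosure is not stated, so the sketch defaults to the number of p-values passed in.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = n_tests if n_tests is not None else len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.0001, 0.004, 0.02, 0.3]  # made-up raw p-values
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]  # significance level from the description
print(significant)  # prints "[True, True, False, False]"
```

Because each raw p-value is multiplied by the number of tests, the adjustment becomes stricter as more probes are tested together.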
Table 4: Statistical Analysis of Bipolar Disorder Biomarkers
Genesymbol   P.Value    t       B        logFC
ACOX1        2.48E-05   4.53    1.196    1.153
TLE3         2.59E-05   4.51    1.168    0.812
RRAS2        3.18E-05   4.46    1.031    0.327
CCHCR1       6.21E-05   4.27    0.586    0.875
LAMB4        0.000178   3.97    -0.119   1.494
SF3A1        0.000244   -3.87   -0.329   -0.387
DUSP6        0.000314   -3.8    -0.496   -1.341
ELP5         0.000345   -3.77   -0.559   -0.505
HLA-DRA      0.000345   -3.77   -0.560   -0.762
GINS1        0.000356   3.76    -0.581   0.376
GAREM1       0.000375   -3.74   -0.616   -0.381
NDST1        0.000376   -3.74   -0.617   -0.873
CCNE1        0.000393   3.73    -0.647   0.404
ETV5         0.000403   -3.72   -0.663   -0.433
IGF1         0.000451   -3.69   -0.738   -1.139
MDK          0.000568   3.62    -0.892   0.759
CES3         0.000578   3.61    -0.904   0.339
VPS13A       0.000583   3.61    -0.910   0.385
B4GALT1      0.000594   3.6     -0.922   0.833
IL33         0.000831   3.5     -1.147   0.666
FAM117A      0.000916   3.47    -1.212   0.245
CDH11        0.001022   -3.43   -1.285   -0.504
COL4A2       0.00108    3.42    -1.322   0.444
PDE12        0.001085   3.41    -1.325   0.879
RAB25        0.001101   3.41    -1.335   0.934
CNTD2        0.001151   3.39    -1.364   0.837
MAP2K6       0.001236   3.37    -1.412   1.029
ANP32C       0.001283   3.36    -1.437   0.909
BTG4         0.001406   3.33    -1.498   0.948
CCDC93       0.001406   3.33    -1.498   0.247
PPP1R14D     0.001444   3.32    -1.515   0.446
THEMIS2      0.001558   -3.3    -1.566   -0.51
TGS1         0.001604   3.29    -1.585   0.295
TRMO         0.001608   -3.29   -1.587   -0.294
HTATIP2      0.001609   -3.29   -1.588   -0.372
DNAJB12      0.001752   -3.26   -1.644   -0.321
CD36         0.001802   -3.25   -1.663   -0.96
LAT2         0.001904   -3.23   -1.700   -0.671
IVNS1ABP     0.001914   -3.23   -1.703   -0.319
CYP2A7P1     0.001991   3.22    -1.729   0.834
GK           0.002027   3.21    -1.741   0.603
KIAA1661     0.002065   3.2     -1.754   0.934
ST14         0.002085   3.2     -1.760   0.917
SDS          0.002148   3.19    -1.780   0.519
STX6         0.00215    -3.19   -1.780   -0.591
RAB6B        0.002228   -3.18   -1.804   -0.512
DUSP4        0.002296   -3.17   -1.824   -1.06
ZNF266       0.002361   3.16    -1.843   0.276
BAZ2A        0.002579   3.13    -1.902   0.3
CCDC181      0.002641   3.12    -1.917   0.284
TIMM10B      0.002677   -3.12   -1.926   -0.24
DUS2         0.002697   3.11    -1.931   0.319
ERCC5        0.002769   3.11    -1.949   0.324
CNOT9        0.002774   -3.11   -1.950   -0.398
Table 5: Statistical Analysis of Schizophrenia Biomarkers
Genesymbol   P.Value    t       B        logFC
ST8SIA4      4.58E-05   -4.35   0.605    -1.159
IGFBP6       7.22E-05   -4.22   0.315    -0.283
NOX5         0.000158   -4      -0.189   -0.923
TAF2         0.000159   3.99    -0.193   0.386
TRIM24       0.000203   3.92    -0.348   0.295
FUT7         0.000215   -3.91   -0.386   -0.729
ABCG2        0.000385   -3.73   -0.759   -0.597
CTBP2        0.000427   3.7     -0.826   0.296
MBIP         0.000435   3.69    -0.838   0.413
MGMT         0.000466   -3.67   -0.883   -0.28
CP           0.000483   3.66    -0.905   1.043
NXPH4        0.000533   -3.63   -0.969   -0.322
DOPEY2       0.000565   -3.61   -1.006   -0.772
TRIB2        0.000578   -3.61   -1.02    -0.359
DIAPH2       0.000627   3.58    -1.073   0.28
RAD52        0.000672   -3.56   -1.117   -1.104
ATP6V0E1     0.000689   3.55    -1.133   0.983
TAPT1        0.000698   -3.55   -1.141   -1.383
NR4A3        0.000709   -3.54   -1.151   -0.829
AAAS         0.000716   -3.54   -1.158   -0.509
HNRNPDL      0.000751   -3.53   -1.188   -0.437
IL6ST        0.000782   3.51    -1.214   0.751
DST          0.000852   -3.49   -1.27    -1.134
PHACTR2      0.000885   -3.47   -1.294   -0.511
COL6A2       0.000905   -3.47   -1.308   -0.638
DZIP1        0.000939   3.46    -1.332   0.521
TMEM97       0.001021   -3.43   -1.386   -0.593
HLA-C        0.001024   -3.43   -1.388   -0.477
SLC27A3      0.001064   3.42    -1.412   0.354
NR3C1        0.00107    -3.41   -1.416   -0.33
PLBD1        0.001095   3.41    -1.431   0.784
PRRG3        0.001138   3.39    -1.455   0.806
ATF6B        0.001146   3.39    -1.46    0.402
ASXL1        0.001205   -3.38   -1.492   -0.198
ENTPD1-AS1   0.001234   -3.37   -1.507   -0.2
GNS          0.001294   -3.35   -1.537   -0.732
TUBB1        0.00139    -3.33   -1.584   -1.032
FAM189A2     0.001394   3.33    -1.585   0.403
TNFSF10      0.001408   -3.33   -1.592   -0.64
PAPOLG       0.001469   3.31    -1.619   0.355
FYB          0.001488   -3.31   -1.627   -0.685
GPR137       0.001498   -3.31   -1.631   -0.715
SLC39A8      0.001552   -3.29   -1.654   -0.376
NOL12        0.001574   -3.29   -1.663   -0.206
SMR3A        0.001589   -3.29   -1.669   -0.717
UBASH3A      0.0016     -3.28   -1.674   -0.842
SMARCA1      0.001651   3.27    -1.694   0.236
FXYD3        0.00176    3.25    -1.735   0.934
DR1          0.001976   -3.22   -1.809   -0.29
KRT6B        0.002009   3.21    -1.819   0.612
ZFP2         0.002023   3.21    -1.824   0.501
MBNL1        0.002114   -3.19   -1.852   -1.081
PALB2        0.00222    3.18    -1.883   0.224
MPZL1        0.002229   3.18    -1.886   0.318
DNAJB6       0.002456   -3.14   -1.948   -0.388
PIAS1        0.002481   3.14    -1.954   0.363
ADD3-AS1     0.002521   3.13    -1.964   0.805
GIP          0.002568   -3.13   -1.976   -0.652
SLC25A38     0.002606   3.12    -1.985   0.195
SCAF11       0.002631   3.12    -1.992   0.28
LRRC31       0.002657   3.12    -1.998   0.81
ERAP1        0.002684   3.11    -2.004   0.633
RAB9BP1      0.002754   3.1     -2.021   1.081
LGI2         0.00295    -3.08   -2.065   -0.683
TRMT61A      0.003228   -3.05   -2.122   -0.271
Table 6: Statistical Analysis of Overlapping Gene Biomarkers
Genesymbol   P.Value    t       B        logFC
LST1         5.96E-06   -4.91   2.140    -1.16
HSD11B1      0.000574   -3.61   -0.899   -0.505
DDX27        0.000879   -3.48   -1.184   -0.322
CRHBP        0.001474   -3.31   -1.529   -0.668
BID          0.001813   -3.25   -1.667   -0.328
ATG5         0.002421   -3.15   -1.859   -0.358
A significance level of 0.05 is used to identify differentially expressed genes through a t-test. A log2 transformation is performed to identify the regulation of genes. In the analysis, positive and negative logFC scores indicate up-regulation and down-regulation of genes respectively. Moreover, the adjusted p-values are very low, which indicates that the selected biomarkers are differentially expressed.
Figure 9 illustrates a graph of the true positive rate for bipolar disorder and schizophrenia achieved on five classifiers in accordance with an embodiment of the present disclosure. The experimental results reveal the potential gene biomarkers of bipolar disorder and schizophrenia, which are given in Table 4 and Table 5. These gene features are passed as input to the computational models to classify the samples and to calculate the prediction accuracy. The models are evaluated using standard metrics such as Accuracy, Precision, Recall, and F-Score. For model evaluation, k-fold cross-validation with 10 folds is adopted. A confusion matrix is a performance measurement technique used to evaluate classification techniques. The four important parameters in a confusion matrix are True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). TP denotes a result predicted as positive where the expected result is also positive. TN denotes a result predicted as negative where the expected result is also negative. FP, known as a Type I error, denotes a result predicted as positive where the expected result is negative. In the same way, FN, known as a Type II error, denotes a result predicted as negative where the expected result is positive. Accuracy is the ratio of correctly classified instances to total observations.
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
Precision is the ratio of correctly classified positive instances to all instances classified as positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
Recall is the ratio of correctly classified positive instances to all observations that actually belong to the positive class.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
F-Score is the harmonic mean of recall (Re) and precision (Pre) and is represented in equation 6.
$$\text{F-Score} = \frac{2 \cdot (Re \cdot Pre)}{Re + Pre} \qquad (6)$$
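Equations 3 to 6 can be checked with a short sketch. The confusion-matrix counts below are made up for a hypothetical 67-sample evaluation and are not the results reported in the disclosure; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Made-up counts that add up to 67 samples.
m = classification_metrics(tp=30, tn=32, fp=2, fn=3)
```

Here precision is 30/32, recall is 30/33 and the F-score equals 60/65, which shows that the F-score lies between precision and recall and is pulled toward the lower of the two.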
Tables 7 and 8 give the results achieved by the various classifiers on the identified biomarkers. The results are calculated from equations 3, 4, 5 and 6. From the results, it is clear that the Deep Neural Network model attained better performance on both datasets than the other benchmarked classifiers. The graphs comparing the true positive rate (TPR) of the proposed model with the other models are plotted in Figure 9.
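The 10-fold cross-validation used for evaluation can be sketched as an index split. The description does not say whether the folds are shuffled or stratified by class, so the shuffled, unstratified split below and the function name `k_fold_indices` are assumptions; the sample count of 67 corresponds to the bipolar disorder dataset.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation (unstratified)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(67, k=10))
fold_sizes = [len(test) for _, test in splits]
```

With 67 samples, seven folds hold 7 samples and three folds hold 6, and every sample is used for testing exactly once across the ten rounds.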
Table 7: Performance of the classifiers on bipolar dataset
Classifier   Accuracy   Precision   Recall   F-Score
BN           79.1       79.5        79.1     79.0
LR           88.0       88.2        88.1     88.1
SVM          82.0       83.1        82.1     81.9
RF           79.1       79.2        79.1     79.1
DNN          97.0       97.2        97.0     97.0
Table 8: Performance of the classifiers on schizophrenia dataset
Classifier   Accuracy   Precision   Recall   F-Score
BN           91.3       91.5        91.3     91.3
LR           84.0       84.1        84.1     84.1
SVM          91.3       91.3        91.3     91.3
RF           91.3       91.5        91.3     91.3
DNN          95.6       95.7        95.7     95.7
Figure 10 illustrates a graph of a false positive rate of bipolar disorder schizophrenia achieved on 5 Classifiers in accordance with an embodiment of the present disclosure. The graphs on comparing the false positive rate (FPR) of the proposed model with other models are plotted in Figure 10.
Figure 11 illustrates a graph of the root mean squared error achieved by five classifiers on the bipolar disorder and schizophrenia datasets in accordance with an embodiment of the present disclosure. The root mean squared error (RMSE) of the proposed model is compared with that of the other models in Figure 11.
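The disclosure does not state how RMSE is computed for a classifier. One common reading, assumed here, is the root mean squared difference between the predicted probability of the positive class and the 0/1 expected label. The function name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(expected_labels, predicted_probabilities):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    if not expected_labels or len(expected_labels) != len(predicted_probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (label - probability) ** 2
        for label, probability in zip(expected_labels, predicted_probabilities)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and probabilities, not experimental data.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.1, 0.8, 0.3]
print(f"RMSE: {rmse(labels, probabilities):.4f}")
```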
Figure 12 illustrates a comparison graph of the results obtained from the deep neural network with different feature subsets on the bipolar disorder dataset in accordance with an embodiment of the present disclosure. The feature subsets obtained from differential expression analysis and SIFRA are evaluated with DNN and plotted as a bar graph in Figure 12. DNN reduces the error to 0.09% on the bipolar disorder dataset and 0.17% on the schizophrenia dataset, the lowest among all the benchmarked models. The results show that SIFRA with DNN performs best among all the models. Moreover, an attempt is made to find the overlapping genetic markers of bipolar disorder and schizophrenia with existing feature selection techniques. Correlation based Feature Selection (CBFS), minimum Redundancy Maximum Relevance (mRMR) and Conditional Mutual Information Maximization (CMIM) are employed for benchmarking the results. However, these techniques could not find any overlapping genes common to both datasets. They also underperformed the proposed SIFRA technique in terms of discrimination rate. The accuracy obtained with the benchmarked techniques is 86.63%, 91.62% and 89.21% for CBFS, mRMR and CMIM respectively. These results are evaluated with the DNN classifier.
The main theme of the proposed study is the identification of genetic overlap between schizophrenia and bipolar disorder. Currently, GSE12649 is the only specialized dataset available for conducting this integrated study. In the remaining cases, the genomic data is collected for a single specific disorder only. Therefore, the attained results cannot be benchmarked against other gene expression datasets.
Strategic computational models are in demand to diagnose patients accurately and to support cost-effective treatment. Many researchers have developed similar computational models for disease diagnosis. A neural network-based disease diagnosis model has been proposed to identify novel gene biomarkers of Alzheimer's disease. Target identification in diseases and the discovery of personalized drugs using various computational approaches have been briefly reviewed. Researchers are actively pursuing personalized treatment as a new dimension in approaching patients. Molecular analysis of patient blood samples with transcriptomic signatures may open a path to precision medicine. A gastric cancer classification system has been developed to identify its risk factors.
The need for such a model is high because of the heterogeneity of the disease and its life-threatening conditions. Various data mining and machine learning methodologies have been used to derive useful insights and new findings in diabetes research. Statistical methods are widely adopted in clinical record analysis, especially in analyzing high-throughput genomic data. A comprehensive assessment has been performed on heart disease data to predict the risk of heart failure using various state-of-the-art techniques. Moreover, adverse outcomes of heart disease, such as re-hospitalization and adverse drug reactions, have also been examined. An intelligent, application-specific computing model has been proposed to diagnose three conditions, namely breast cancer, heart disease and fertility, using a back-propagation neural network with rough sets. Intelligent computational models therefore help provide better health care by assisting medical practitioners. New findings in the medical field obtained with the help of computational models may increase the chance of survival for any disease. Additionally, these models have proven their value in other fields such as energy optimization, system identification, and electrical circuit design.
Figure 13 illustrates a comparison graph of the results obtained from the deep neural network with different feature subsets on the schizophrenia dataset in accordance with an embodiment of the present disclosure. The feature subsets obtained from differential expression analysis and SIFRA are evaluated with DNN and plotted as a bar graph in Figure 13.
In conclusion, the experimental observations reveal important candidate gene biomarkers of schizophrenia and bipolar disorder. The significant genes identified using the proposed model classify these mental disorders more effectively than other existing methods. Seven overlapping genes are identified in this experiment. SIFRA is applied to the top 100 differentially expressed genes to rank the most important gene subsets. It identified 67 important gene probes as biomarkers for bipolar disorder and 75 for schizophrenia. After ranking, the seven genes that overlap in both datasets are eliminated, leaving 60 gene biomarkers for bipolar disorder and 68 gene biomarkers for schizophrenia for the next phase. The deep neural network model outperformed the other benchmarked classification techniques and achieved high accuracy of 97.01% and 95.65% on the bipolar disorder and schizophrenia datasets respectively. Bayes Net, Support Vector Machine and Random Forest obtained the same accuracy on the schizophrenia dataset, whereas logistic regression performed worse than these models. The same pattern is not repeated in bipolar disorder classification, where the results vary significantly between models: there, the logistic regression model outperforms BN, SVM and RF, but not DNN. These outcomes highlight the variability of each model under different conditions. The proposed technique is not without limitations, as its computational cost grows heavily as the dimension of the dataset increases. In future work, the proposed SIFRA model will be refined to reduce its complexity. The RGBIC framework is constructed by combining SIFRA with the DNN model, which proved its efficacy on both datasets. This methodological observation shows the importance of identifying potential gene biomarkers of mental disorders for accurate disease diagnosis.
The proposed intelligent model would assist automated diagnostic procedures for the medical practitioners and is cost effective.
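The overlap-elimination step summarized above can be illustrated with plain set operations. The sketch below uses placeholder probe identifiers, not the actual biomarkers, sized to match the reported counts (67 and 75 ranked probes with 7 in common). It also computes the Jaccard index named in the claims; applying it to the two ranked gene sets, rather than to the raw datasets, is an assumption made for illustration.

```python
def remove_overlap(genes_a, genes_b):
    """Drop the genes shared by both sets; return (a_only, b_only, shared)."""
    shared = genes_a & genes_b
    return genes_a - shared, genes_b - shared, shared


def jaccard_index(genes_a, genes_b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


# Placeholder probe IDs (hypothetical), sized to match the reported counts.
common = {f"shared_{i}" for i in range(7)}
bipolar_ranked = common | {f"bd_{i}" for i in range(60)}          # 67 probes
schizophrenia_ranked = common | {f"sz_{i}" for i in range(68)}    # 75 probes

bd_only, sz_only, overlap = remove_overlap(bipolar_ranked, schizophrenia_ranked)
print(len(bd_only), len(sz_only), len(overlap))
print(f"Jaccard index: {jaccard_index(bipolar_ranked, schizophrenia_ranked):.4f}")
```

With these sizes the union holds 135 probes, so the Jaccard index of the two ranked sets is 7/135, roughly 5%.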
Computational Psychiatry is an emerging field of science. It focuses on identifying complex relationships within the brain's neurobiology. Mental illness has recently become an important problem to address, as the number of people affected is increasing over time. Schizophrenia and Bipolar Disorder are two major types of psychiatric disorders. Many people experience these illnesses during their lifetime. However, diagnosing psychiatric disorders is a complex problem. Genetic factors play a vital role in the development of mental illness. Interestingly, a few psychiatric disorders share overlapping genetic factors. This overlap has a detrimental effect on diagnosing the illness accurately. To overcome this issue, a Rank based Gene Biomarker Identification and Classification (RGBIC) framework is proposed to identify the overlapping and non-overlapping gene patterns of bipolar disorder and schizophrenia. The dataset used in this experiment is obtained from the Gene Expression Omnibus (GEO) database. As an outcome of this experiment, seven biomarkers are identified as the overlapping genes. In addition, 60 and 68 informative gene biomarkers are identified on the bipolar disorder and schizophrenia datasets respectively as feature subsets to discriminate the samples. Overlapping genes are eliminated to increase the diagnostic accuracy for the disorders. The performance of the proposed system is compared against standard existing machine learning techniques. With the Deep Neural Network model, the proposed framework attained 97.01% and 95.65% accuracy on the bipolar disorder and schizophrenia datasets respectively, outperforming the other benchmarked techniques.
The system primarily focuses on developing a highly reliable computational model to identify gene biomarkers of schizophrenia and bipolar disorder by eliminating their overlapping genetic association. The model is constructed in two phases. In phase I, a novel gene ranking technique is developed to identify the non-overlapping genes: the overlapping genes are removed, and the remaining genes are grouped together and considered the highest-risk genes of each mental disorder. These genes are fed as input to the next phase. In phase II, a deep neural network model is built to classify the given data. However, the objective of the proposed work is to find the overlapping association between schizophrenia and bipolar disorder. So, before model training, the genes that overlap in both datasets are removed. The non-overlapping candidate genetic features are input into the learning model to classify individuals affected by schizophrenia or bipolar disorder against the healthy samples.
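As an illustration of phase II, the sketch below builds a small fully connected network in plain Python. The 60 input nodes come from the claims, and the hidden sizes 10, 20 and 10 follow one reading of the claim wording ("10, 20 and 10 hidden layers" taken as three hidden layers of those widths). The single sigmoid output unit, the ReLU activations, the random initialization and all function names are assumptions, and no training is shown.

```python
import math
import random

# 60 inputs and hidden widths 10, 20, 10 follow the claims (as interpreted above);
# the single output unit for case-versus-control is an assumption.
LAYER_SIZES = [60, 10, 20, 10, 1]


def init_params(sizes, seed=0):
    """Create small random weights and zero biases for each dense layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def count_params(sizes):
    """Number of trainable weights and biases in the fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def forward(params, features):
    """Forward pass: ReLU on hidden layers, sigmoid on the output unit."""
    activation = list(features)
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        if index == last:
            activation = [1.0 / (1.0 + math.exp(-value)) for value in z]
        else:
            activation = [max(0.0, value) for value in z]
    return activation[0]


params = init_params(LAYER_SIZES)
sample = [0.5] * 60  # placeholder expression values, not real data
probability = forward(params, sample)
print("trainable parameters:", count_params(LAYER_SIZES))
print("untrained output probability:", round(probability, 3))
```

For the schizophrenia dataset the same sketch would start from 68 inputs instead of 60, matching the 68 selected biomarkers.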
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (1)

  1. WE CLAIM: A system for predicting bipolar disorder and schizophrenia based on non overlapping genetic phenotypes, the system comprises:
    an input module for receiving microarray gene expression samples from GEO database, wherein the samples are taken of the prefrontal cortex of the human brain of total 102 samples from post-mortem brains; a pre-processing module in connection with the input module for rectifying background using Robust Multi-array Average (RMA) technique, normalizing the probe values using Quantile Normalization technique and summarizing with median polish using 'limma' library, analysis of differential gene expression; a gene ranking model for selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminating overlapping genes, wherein the rank of each feature vector is calculated by subtracting the weight with calculated JI value, wherein the genes are extracted through GEO2R library, wherein 100 genes with smaller p-value are selected as significant genes for the identification of potential gene biomarkers; and a classification module in association with the gene ranking model for dividing given dataset into two different datasets to find the phenotype genetic markers of the diseases individually and thereby analysing two different datasets together to avoid overlapping association identification.
    2. The system as claimed in claim 1, wherein a significance level of p-value < 0.05 is considered to select the most informative, discriminative genes, wherein the rank of each gene is calculated by Gene Mania tool based on its importance.
    3. The system as claimed in claim 1, wherein a Quantile Normalization technique is used to normalize the probe values, wherein a median polish is employed to summarize the probe values and analysis of differential gene expression is performed by a library.
    4. The system as claimed in claim 1, wherein Jaccard index (JI) is a statistical method used to calculate the similarity and diversity range between two different sample sets, wherein the similarity between bipolar disorder and schizophrenia dataset is calculated using JI in which the similarity value of the two sets of data lies in the range from 0% to 100%.
    5. A method for predicting bipolar disorder and schizophrenia based on non overlapping genetic phenotypes, the method comprises:
    receiving microarray gene expression samples from GEO database, wherein the samples are taken of the prefrontal cortex of the human brain of total 102 samples from post-mortem brains; rectifying background using Robust Multi-array Average (RMA) technique, normalizing the probe values using Quantile Normalization technique and summarizing with median polish using 'limma' library, analysis of differential gene expression; selecting the biomarker genes from the given data in order to extract optimal gene features and thereby eliminating overlapping genes, wherein the rank of each feature vector is calculated by subtracting the weight with calculated JI value, wherein the genes are extracted through GEO2R library, wherein 100 genes with smaller p value are selected as significant genes for the identification of potential gene biomarkers; and dividing given dataset into two different datasets to find the phenotype genetic markers of the diseases individually and thereby analysing two different datasets together to avoid overlapping association identification.
    6. The method as claimed in claim 5, wherein the method further comprises:
    receiving microarray gene expression samples from GEO database, wherein the samples are taken of the prefrontal cortex of the human brain of total 102 samples from post-mortem brains; examining gene expression patterns of mitochondrial genes of at least 33 bipolar disorder patients' samples, 34 normal people samples and 35 schizophrenia patient samples; capturing and analysing the genomic patterns using Affymetrix Human Genome Array (U133A), wherein 22283 gene expression probes are given in the dataset for each sample; dividing given dataset into two different datasets after pre-processing to find the phenotype genetic markers of the diseases individually; and analysing two different datasets together to avoid overlapping association identification, wherein initially the dataset has 102 samples with 33 bipolar disorder patients, 34 control samples and 35 schizophrenia patients whereas later on single dataset is separated into bipolar disorder dataset (33 bipolar + 34 control, totally 67 samples) and schizophrenia dataset (35 schizophrenia + 34 control, totally 69 samples).
    7. The method as claimed in claim 5, wherein a process to develop the RGBIC framework comprises:
    acquiring microarray gene expression dataset from standard resources and investigating various related works to identify and thereby remove key gap in the system; perform preliminary data analysis, pre-processing and transformations on microarray gene expression dataset; identifying features with high importance and thereafter eliminating remaining features; and applying various machine learning models to identify best model and evaluating performance using standard metrics.
    8. The method as claimed in claim 5, wherein a process for calculating noise ratio comprises:
    calculating ratio between the power of a signal and power of noise of an input signal which is represented as the ratio of mean to standard deviation of any measurement or a signal; mapping the input signal with the gene features and thereby calculating strength of the gene; calculating similarity between two different datasets using Jaccard index; calculating signal to noise ratio (SNR) by finding the mean and standard deviation of each feature using plurality of positive values from the datasets; and calculating difference between the signals from the maximum and minimum SNR upon calculating difference between the weakest and strongest signal and thereby weight of each feature is calculated.
    9. The method as claimed in claim 5, wherein gene network and pathway analysis are performed to identify the similarity between functional genes, wherein the backbone of this tool is built with a large amount of functional gene association data with logical interactions between each gene, wherein in the same way, few more tools are available for gene regulation prediction such as GeneMania, FunCoup, STRING, VisANT, etc.
    10. The method as claimed in claim 5, wherein in DNN model, 60 gene biomarkers are selected from SIFRA as input, wherein the number of input nodes is 60, with 10, 20 and 10 hidden layers.
    Figure 1: block diagram of the system (Input Module 102, Pre-processing Module 104, Gene Ranking Model 106, Classification Module 108)
    Figure 2: flowchart of the method (step 202 receiving samples, step 204 pre-processing, step 206 selecting biomarker genes, step 208 dividing and analysing the datasets, as recited in claim 5)
    Figures 3 to 13: drawings (reference numerals 404, 406, 408, 410, 412 and 414)
AU2021100434A 2021-01-23 2021-01-23 A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes Ceased AU2021100434A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021100434A AU2021100434A4 (en) 2021-01-23 2021-01-23 A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021100434A AU2021100434A4 (en) 2021-01-23 2021-01-23 A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes

Publications (1)

Publication Number Publication Date
AU2021100434A4 true AU2021100434A4 (en) 2021-04-15

Family

ID=75397152

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021100434A Ceased AU2021100434A4 (en) 2021-01-23 2021-01-23 A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes

Country Status (1)

Country Link
AU (1) AU2021100434A4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239032A (en) * 2021-06-02 2021-08-10 云南电网有限责任公司电力科学研究院 Power distribution network power distribution equipment operation and maintenance monitoring method, device and system

Similar Documents

Publication Publication Date Title
US20240029892A1 (en) Disease monitoring from insurance claims data
Karthik et al. Predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes using deep neural network
Bonilla-Huerta et al. Hybrid framework using multiple-filters and an embedded approach for an efficient selection and classification of microarray data
US9940383B2 (en) Method, an arrangement and a computer program product for analysing a biological or medical sample
US11972870B2 (en) Systems and methods for predicting patient outcome to cancer therapy
Joshi et al. An ensembled SVM based approach for predicting adverse drug reactions
Ahmed et al. Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree
AU2021100434A4 (en) A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes
US20180181705A1 (en) Method, an arrangement and a computer program product for analysing a biological or medical sample
US20150094223A1 (en) Methods and apparatuses for diagnosing cancer by using genetic information
Casalino et al. Evaluation of cognitive impairment in pediatric multiple sclerosis with machine learning: an exploratory study of miRNA expressions
Rashid et al. Network-based identification of diagnosis-specific trans-omic biomarkers via integration of multiple omics data
Purohit et al. Predicting Mental Health Disorders Post Long COVID Diagnosis Using Advanced Machine Learning Techniques
Chitode et al. A comparative study of microarray data analysis for cancer classification
Deng et al. Cross-platform analysis of cancer biomarkers: a Bayesian network approach to incorporating mass spectrometry and microarray data
Zhang et al. A two-stage machine learning approach for pathway analysis
Sudha Unlocking Biomarker Identification-Harnessing AI and ML for Precision Medicine: AI and ML for Precision Medicine
Ali et al. MACHINE LEARNING IN EARLY GENETIC DETECTION OF MULTIPLE SCLEROSIS DISEASE: ASurvey
Maalej et al. Risk Factors of Breast Cancer Determination: a Comparative Study on Different Feature Selection Techniques
El-Gawady et al. Hybrid Feature Selection Method for Predicting Alzheimer’s Disease Using Gene Expression Data
Kalkan et al. Prediction of Alzheimer’s Disease by a Novel Image-Based Representation of Gene Expression. Genes 2022, 13, 1406
Ead et al. Feedforward Deep Learning Optimizer-based RNA-Seq Women's Cancers Detection with a Hybrid Classification Models for Biomarker Discovery
Sánchez-Cruz et al. Epigenetic Target Prediction with Accurate Machine Learning Models
ALEESA et al. AN EARLY RNA-SEQ DETECTION SYSTEM FOR BREAST TUMOURS BASED ON MACHINE LEARNING
Gulande et al. Systematic Study of Gen Profiles Analysis Methods in Disease Classification

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry