CN112863605A - Platform, method, computer device and medium for determining dysnoesia genes - Google Patents

Info

Publication number
CN112863605A
Authority
CN
China
Prior art keywords
data
gene
pathogenic
clinical phenotype
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110152883.8A
Other languages
Chinese (zh)
Inventor
朱丽娜
马秀伟
杨晓
封志纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
7th Medical Center of PLA General Hospital
Original Assignee
7th Medical Center of PLA General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Abstract

A platform, method, computer device and medium for determining intellectual disability genes. The platform comprises a data acquisition module, a data processing module and a gene display module. The data acquisition module acquires clinical phenotype data and candidate pathogenic gene data of a target patient and inputs them to the data processing module. The data processing module inputs the received clinical phenotype data and candidate pathogenic gene data into a pathogenic gene determination model, determines target pathogenic gene data among the candidate pathogenic gene data, and inputs the target pathogenic gene data to the gene display module. The gene display module displays the received target pathogenic gene data.

Description

Platform, method, computer device and medium for determining dysnoesia genes
Technical Field
The present invention relates to the technical field of processing gene and phenotype data for intellectual disability, and in particular to a platform, method, computer device and medium for determining intellectual disability genes.
Background
Intellectual disability, also known as mental retardation, is a complex disease caused mainly by abnormal development of the central nervous system and may be accompanied by symptoms such as metabolic disorders. Patients usually show obvious deficits in intelligence, behavior and the like before 18 years of age. The incidence rate is 1 to 3 percent, and intellectual disability has become a social problem worldwide.
To further explore the genetic mechanism of intellectual disability, a great deal of research has been carried out; more and more pathogenic genes related to intellectual disability have been found, and a large amount of clinical phenotype data for the related genes has been obtained. However, for patients whose pathogenic gene is unknown, the search for that gene usually depends on the empirical analysis of researchers, which is time-consuming and labor-intensive, and the target pathogenic gene found in this way is often not accurate.
Disclosure of Invention
In view of the above, the present invention aims to provide a platform, a method, a computer device and a medium for determining intellectual disability genes, which can determine target pathogenic genes more accurately than the prior art.
In a first aspect, an embodiment of the present application provides a platform for determining a dysnoesia gene, where the platform includes a data acquisition module, a data processing module, and a gene display module;
the data acquisition module is used for acquiring clinical phenotype data and candidate pathogenic gene data of a target patient; inputting the clinical phenotype data and the candidate disease-causing gene data to a data processing module;
the data processing module is used for inputting the received clinical phenotype data and the candidate pathogenic gene data into a pathogenic gene determination model and determining target pathogenic gene data in the candidate pathogenic gene data; inputting the target disease-causing gene data to a gene display module;
and the gene display module is used for displaying the received target pathogenic gene data.
Optionally, the platform further comprises:
the database module is used for storing intellectual disability related literature and the association between pathogenic gene data and clinical phenotype data; wherein the association between the pathogenic gene data and the clinical phenotype data is determined according to the intellectual disability related literature.
Optionally, the platform further comprises:
the query module is used for finding target data corresponding to the target pathogenic gene data in the database module according to the target pathogenic gene data; the target data comprises literature relating to the intellectual disability associated with the target disease gene.
Optionally, the training process of the pathogenic gene determination model includes:
acquiring a training sample set; the training sample set comprises at least one training sample, and the training sample comprises a positive sample and a negative sample; the positive sample consists of correlation data between phenotypes of patients with known disease genes and disease genes; the negative sample consists of the correlation data between the clinical phenotype of the patient with the known pathogenic gene and the gene with the lowest similarity to the clinical phenotype of the patient with the known pathogenic gene in the database;
and training on the training sample set with a machine learning algorithm to obtain the pathogenic gene determination model.
Optionally, the association of the pathogenic gene data with clinical phenotype data is determined by:
determining, for each mental retardation-related document, the causal gene data and clinical phenotype data corresponding to the causal gene data in the document;
counting, for each disease-causing gene data, each clinical phenotype data corresponding to the disease-causing gene data;
for each pathogenic gene data, determining an association between the pathogenic gene data and each corresponding clinical phenotype data according to the relationship between the clinical phenotypes corresponding to the pathogenic gene data and the common ancestors shared by those phenotypes.
In a second aspect, embodiments of the present application provide a method of determining a dysnoesia gene, the method comprising:
acquiring clinical phenotype data and candidate disease-causing gene data of a target patient;
inputting the clinical phenotype data and the candidate pathogenic gene data into a pathogenic gene determination model, and determining target pathogenic gene data in the candidate pathogenic gene data;
and displaying the target disease-causing gene data.
Optionally, the method further includes:
searching target data corresponding to the target pathogenic gene data in a database according to the target pathogenic gene data; wherein, the database stores the literature related to the intellectual disability and the correlation between the pathogenic gene data and the clinical phenotype data; the correlation between the pathogenic gene data and clinical phenotype data is determined according to the mental disorder related literature;
and displaying the target data.
Optionally, the method further comprises correlating the disease causing gene data with clinical phenotype data by:
determining, for each mental retardation-related document, the causal gene data and clinical phenotype data corresponding to the causal gene data in the document;
counting, for each disease-causing gene data, each clinical phenotype data corresponding to the disease-causing gene data;
for each pathogenic gene data, determining an association between the pathogenic gene data and each corresponding clinical phenotype data according to the relationship between the clinical phenotypes corresponding to the pathogenic gene data and the common ancestors shared by those phenotypes.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the method.
In the platform for determining intellectual disability genes provided by the embodiment of the application, clinical phenotype data and candidate pathogenic gene data of a target patient are first acquired through the data acquisition module; the clinical phenotype data and the candidate pathogenic gene data are then input into the pathogenic gene determination model in the data processing module, which efficiently determines the target pathogenic gene data from the candidate pathogenic gene data. The pathogenic mechanism of the target patient can then be judged from the association between the target pathogenic gene data and the clinical phenotype data, which improves the accuracy of finding the pathogenic gene data of the target patient.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of a platform for determining an intellectual disability gene according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the training process of a pathogenic gene determination model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for determining an intellectual disability gene according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, finding a corresponding pathogenic gene for a patient with unknown pathogenic gene usually depends on a great amount of analysis work of researchers, which is time-consuming and labor-consuming, and the accuracy of the found target pathogenic gene is not high.
Based on the above defects, the present application provides a platform for determining a dysnoesia gene, as shown in fig. 1, the platform includes a data acquisition module 101, a data processing module 102, and a gene display module 103;
the data acquisition module 101 is used for acquiring clinical phenotype data and candidate disease-causing gene data of a target patient; inputting the clinical phenotype data and the candidate disease causing gene data to a data processing module 102;
the data processing module 102 is configured to input the received clinical phenotype data and the candidate disease causing gene data into a disease causing gene determination model, and determine target disease causing gene data in the candidate disease causing gene data; inputting the target disease-causing gene data to a gene display module;
the gene display module 103 is configured to display the received target disease causing gene data.
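The flow between the three modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the names `PatientInput`, `process`, `display` and `toy_model` are hypothetical, the gene names are placeholders, and the model is a stand-in callable that returns one score per candidate gene.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PatientInput:
    """Output of the data acquisition module (hypothetical structure)."""
    phenotypes: List[str]       # standardized clinical phenotype terms
    candidate_genes: List[str]  # standardized candidate gene names


# A model maps (phenotypes, candidate genes) to one score per gene.
Model = Callable[[List[str], List[str]], Dict[str, float]]


def process(data: PatientInput, model: Model, top_n: int = 1) -> List[str]:
    """Data processing module: score the candidates and keep the top-ranked genes."""
    scores = model(data.phenotypes, data.candidate_genes)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


def display(target_genes: List[str]) -> str:
    """Gene display module: render the target genes as a numbered plain-text list."""
    return "\n".join(f"{i}. {g}" for i, g in enumerate(target_genes, start=1))


def toy_model(phenotypes: List[str], genes: List[str]) -> Dict[str, float]:
    """Placeholder model: looks scores up in a made-up table (assumption)."""
    made_up = {"GENE_A": 0.9, "GENE_B": 0.2}
    return {g: made_up.get(g, 0.0) for g in genes}


patient = PatientInput(["HP:0008897"], ["GENE_A", "GENE_B"])
report = display(process(patient, toy_model))
```

Keeping the model behind a plain callable means the acquisition and display steps do not depend on how the scores are produced.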
In the data acquisition module 101, clinical phenotype data and candidate pathogenic gene data of a target patient are acquired first. The target patient may be a patient whose pathogenic gene is unknown. The clinical phenotype data may be a list of terms standardized against the phenotype information in the HPO (Human Phenotype Ontology) database, for example: HP:0008897; HP:0011927; HP:0045027. The HPO database provides a standardized vocabulary of phenotypic abnormalities encountered in human disease, and HP:0008897, HP:0011927 and HP:0045027 are each one phenotypic abnormality in that vocabulary. The candidate pathogenic gene data may be a list of names standardized against the gene name information in the HGNC (HUGO Gene Nomenclature Committee) database, for example: ZDHHC9; hypotonia; LongFace. The HGNC database provides standardized information for human gene naming; here ZDHHC9 is the unique identification name of a zinc finger gene, hypotonia is the unique identification name of the dystonia gene, and LongFace is the unique identification name of the facial elongation gene. The acquired clinical phenotype data list and candidate pathogenic gene data list of the target patient are input into the data processing module 102, so that pathogenicity analysis can be performed on each candidate pathogenic gene in the list.
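As an illustration of the standardization step, the sketch below coerces loosely formatted input into the usual HPO identifier form (`HP:` followed by seven digits) and strips stray whitespace from gene names. The function names and the `;`-separated input format are assumptions made for this example; the application does not specify how the lists are parsed.

```python
import re
from typing import List

# HPO identifiers have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


def normalize_hpo_ids(raw: str) -> List[str]:
    """Split a ';'-separated string and coerce each entry to 'HP:NNNNNNN'."""
    ids = []
    for token in raw.split(";"):
        digits = re.sub(r"\D", "", token)  # keep only the numeric part
        if not digits:
            continue  # skip empty entries
        term = f"HP:{digits.zfill(7)}"
        if not HPO_ID.match(term):
            raise ValueError(f"not a valid HPO identifier: {token!r}")
        ids.append(term)
    return ids


def normalize_gene_names(raw: str) -> List[str]:
    """Split a ';'-separated string and remove whitespace inside each name."""
    names = ("".join(token.split()) for token in raw.split(";"))
    return [name for name in names if name]


phenotypes = normalize_hpo_ids("HP:0008897; HP 0011927;0045027")
genes = normalize_gene_names("ZDHHC 9; hypotonia; LongFace")
```

A real pipeline would additionally check each identifier and gene name against the HPO and HGNC databases themselves, which this sketch does not do.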
In the data processing module 102, the acquired clinical phenotype data and candidate pathogenic gene data of the target patient are first input into the pathogenic gene determination model, which then determines the target pathogenic gene data from the candidate pathogenic gene data. The pathogenic gene determination model is constructed from prior knowledge of the pathogenic genes and phenotype information of known intellectual disability patients, and it evaluates the pathogenicity of the candidate pathogenic gene data against the clinical phenotype data of the target patient, so that the likely pathogenic genes of the patient can be predicted. The target pathogenic gene data is the gene data screened from the candidate pathogenic gene data by the model as highly correlated with the phenotype information of the target patient. The model evaluates the pathogenicity of each candidate gene by analyzing the similarity between the candidate pathogenic gene and the clinical phenotype data of the target patient, and outputs an evaluation result. The evaluation result comprises a similarity score between the candidate pathogenic gene and the clinical phenotype data of the target patient; the higher the similarity score, the higher the pathogenicity of the candidate pathogenic gene, so the model can find, among the candidate pathogenic gene data, the target pathogenic gene data that has a pathogenic relation with the clinical phenotype data of the target patient. Specifically, as shown in fig. 2, in the data processing module 102, the training process of the pathogenic gene determination model includes the following steps:
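The "higher score, higher rank" evaluation can be sketched as below. The similarity used here is a plain Jaccard overlap between the patient's terms and the terms annotated to each gene; it is a stand-in chosen for brevity, not the trained model described in the application, and the gene names, term names and `evaluate_candidates` function are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def evaluate_candidates(
    patient_terms: Set[str],
    gene_annotations: Dict[str, Set[str]],
    candidates: List[str],
) -> List[Tuple[int, str, float]]:
    """Return (rank, gene, score) tuples, highest score first."""
    scored = [
        (gene, jaccard(patient_terms, gene_annotations.get(gene, set())))
        for gene in candidates
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, gene, score) for rank, (gene, score) in enumerate(scored, 1)]


# Placeholder annotations: P1..P3 are made-up phenotype terms.
annotations = {
    "GENE_A": {"P1", "P2", "P3"},
    "GENE_B": {"P3"},
}
result = evaluate_candidates({"P1", "P2"}, annotations, ["GENE_B", "GENE_A"])
```

The first tuple in `result` is the candidate that would be reported as the target pathogenic gene under this stand-in score.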
S201, acquiring a training sample set; the training sample set comprises at least one training sample, and the training sample comprises a positive sample and a negative sample; the positive sample consists of the association data between the phenotype of a patient with a known pathogenic gene and that pathogenic gene; the negative sample consists of the association data between the phenotype of the patient with the known pathogenic gene and the gene in the database with the lowest similarity to that phenotype;
S202, training with a machine learning algorithm on the training sample set to obtain the pathogenic gene determination model.
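The construction of positive and negative samples in S201 can be sketched as follows, assuming a gene-to-phenotype annotation table and a similarity function are already available. The names `build_samples` and `overlap`, the tuple layout and the toy data are hypothetical; the overlap count merely stands in for whichever similarity measure is actually used.

```python
from typing import Callable, Dict, List, Set, Tuple

# (patient phenotypes, gene, label) where label 1 = positive, 0 = negative
Sample = Tuple[frozenset, str, int]


def build_samples(
    patients: List[Tuple[Set[str], str]],
    gene_annotations: Dict[str, Set[str]],
    similarity: Callable[[Set[str], Set[str]], float],
) -> List[Sample]:
    """Pair each patient with the known gene (positive) and the least similar gene (negative)."""
    samples: List[Sample] = []
    for phenotypes, known_gene in patients:
        samples.append((frozenset(phenotypes), known_gene, 1))
        others = [g for g in gene_annotations if g != known_gene]
        if others:
            least = min(
                others,
                key=lambda g: similarity(phenotypes, gene_annotations[g]),
            )
            samples.append((frozenset(phenotypes), least, 0))
    return samples


def overlap(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: number of shared phenotype terms."""
    return float(len(a & b))


db = {"G1": {"P1", "P2"}, "G2": {"P1"}, "G3": {"P9"}}
samples = build_samples([({"P1", "P2"}, "G1")], db, overlap)
```

Choosing the least similar gene as the negative gives the learner a clear contrast for each patient, at the cost of not seeing harder, near-miss negatives.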
In step S201, a training sample set is acquired first. The training sample set comprises the known pathogenic gene data and clinical phenotype data of intellectual disability patients after standardization. The known pathogenic gene data and clinical phenotype data can be derived from the literature on reported cases on various platforms, but this literature differs in form from platform to platform and is widely scattered, so it must be standardized. The standardization process for literature data of differing forms comprises the following steps:
step 2011, finding the patient phenotype information in the literature by manual review, matching it against the HPO database by string similarity, and manually adjusting it into an HPO-standardized phenotype description;
step 2012, according to the description of each sample in the literature, recording the basic information given for the sample and its family, such as clinical phenotype, gene name and mutation information;
step 2013, finding the gene name information in the literature by manual review, and acquiring related descriptive information for each gene from the HGNC database, such as gene aliases and chromosome location;
step 2014, based on the gene name information found, obtaining the biological function and pathway information corresponding to each gene from the GO and KEGG databases. The GO (Gene Ontology) database provides information on gene function; the KEGG (Kyoto Encyclopedia of Genes and Genomes) database provides pathway information for genes.
Through steps 2011 to 2014, literature data that differs in form and is widely scattered can be standardized and stored, which helps researchers quickly query the genes or phenotypes related to intellectual disability and is of great value for subsequent intellectual disability research.
The training sample set is constructed by collecting the standardized known pathogenic gene data and clinical phenotype data of intellectual disability patients. The training sample set comprises at least one training sample, and the training sample comprises a positive sample and a negative sample. A positive sample is composed of the phenotype of a patient with a known pathogenic gene and the association data of that pathogenic gene; a negative sample is composed of the association data between the phenotype of the patient with the known pathogenic gene and the gene in the database with the lowest similarity. The similarity between clinical phenotype data and pathogenic gene data can be calculated by five methods: ERIC_Sim, ERIC_Norm, Resnik_Sim, Lin_Sim and JC_Sim. All of these methods measure the similarity between any two objects based on the topological structure information of a graph, and the core idea is that two objects are similar if they are referenced by objects that are themselves similar.
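To make the ontology-based measures concrete, the sketch below computes Resnik, Lin and Jiang-Conrath (JC) similarity on a toy term hierarchy, using the commonly cited information-content definitions. The application does not spell out its formulas, so this is an assumption-laden illustration: the ontology, the term frequencies and the conversion of the JC distance into a similarity via 1 / (1 + distance) are all choices made for this example, and ERIC_Sim and ERIC_Norm are not covered.

```python
import math
from typing import Dict, Set

# Toy ontology: child -> set of parents (hypothetical terms, not real HPO IDs).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}
# Made-up frequencies p(t): share of annotations at term t or below it.
FREQ = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term: str) -> float:
    """Information content: IC(t) = -log p(t)."""
    return -math.log(FREQ[term])


def mica_ic(a: str, b: str) -> float:
    """IC of the most informative common ancestor of a and b."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a: str, b: str) -> float:
    return mica_ic(a, b)


def lin(a: str, b: str) -> float:
    denom = ic(a) + ic(b)
    return 2 * mica_ic(a, b) / denom if denom else 1.0


def jc(a: str, b: str) -> float:
    """JC distance turned into a similarity in (0, 1] (one common convention)."""
    distance = ic(a) + ic(b) - 2 * mica_ic(a, b)
    return 1.0 / (1.0 + distance)
```

With these toy frequencies, the siblings `A1` and `A2` share the ancestor `A`, so their Resnik similarity is IC(A) = log 2, while `A1` and `B` share only the root and score 0.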
Finally, based on the training sample set, a machine learning algorithm is used to train to obtain a pathogenic gene determination model in step S202.
In step S202, the machine learning algorithm may be a Support Vector Machine (SVM) algorithm or a gradient-boosted decision tree algorithm (XGBoost).
With the trained pathogenic gene determination model, the pathogenicity of the candidate pathogenic gene data can be evaluated against the clinical phenotype data of the target patient, and the target pathogenic gene data, that is, the gene data most strongly correlated with the target patient's condition, can be determined from the candidate pathogenic gene data according to the evaluation result. After the target pathogenic gene data is obtained, it can be displayed through the gene display module 103, which makes it convenient for researchers to analyze and process.
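The idea behind gradient-boosted trees can be shown with a small, dependency-free sketch: each round fits a one-split tree (a stump) to the current residuals and adds a damped copy of it to the prediction. This is plain gradient boosting under squared loss written for illustration, not XGBoost itself and not the model of the application; the two-column feature rows stand in for similarity scores, and all function names and data are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    best_err = float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, threshold, lv, rv)
    if best is None:  # no valid split: fall back to a constant
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, threshold, lv, rv = stump
    return lv if row[j] <= threshold else rv


def fit_boosting(X: List[List[float]], y: List[float], rounds: int = 20, lr: float = 0.5) -> Model:
    """Fit stumps to successive residuals, starting from the mean label."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict_score(model: Model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


# Toy data: rows are made-up similarity features, labels 1 = positive sample.
X = [[0.9, 0.8], [0.8, 0.7], [0.1, 0.2], [0.2, 0.1]]
y = [1.0, 1.0, 0.0, 0.0]
model = fit_boosting(X, y)
```

In practice a library implementation would replace this loop; the sketch only shows why the output behaves like a score that rises with the similarity features.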
In the gene display module 103, the target data can be displayed in a list manner. Wherein, the data of pathogenicity score, pathogenicity possibility, similarity ranking with phenotype and the like in the target pathogenicity gene data can be displayed. The pathogenic relation between the target pathogenic gene data and the clinical phenotype data of the target patient can be intuitively judged through displaying the pathogenic score, pathogenic possibility, similarity ranking with phenotype and other data, and the accuracy of determining the pathogenic gene data is improved.
Optionally, the platform for determining a dysnoesia gene provided in the embodiment of the present application further includes:
the database module is used for storing intellectual disability related literature and the association between pathogenic gene data and clinical phenotype data; wherein the association between the pathogenic gene data and the clinical phenotype data is determined according to the intellectual disability related literature.
In the database module 104, the known pathogenic gene information and phenotype information of patients are extracted from the standardized literature, and the association between the pathogenic gene information and the phenotype information is analyzed by calculating the similarity between them. Specifically, analyzing the association between genes and phenotypes based on the literature in the database module 104 comprises the following steps:
step 1041, determining, for each intellectual impairment related document, the pathogenic gene data and clinical phenotype data corresponding to the pathogenic gene data in the document.
Step 1042, for each pathogenic gene data, counting each clinical phenotype data corresponding to said pathogenic gene data.
Step 1043, determining, for each causal gene data, an association between the causal gene data and each corresponding clinical phenotype data according to a relationship between each clinical phenotype corresponding to the causal gene data and a common ancestor between the phenotypes.
In step 1041, the pathogenic gene data and the corresponding clinical phenotype data are determined from each intellectual disability related document. Because one pathogenic gene data may correspond to a plurality of clinical phenotype data, step 1042 collects every clinical phenotype data corresponding to each pathogenic gene data. In step 1043, for each pathogenic gene data, the correlation between any two phenotypes is assessed according to the relationship between the different clinical phenotypes and the common ancestors they share, and a phenotype co-expression network is built. The correlation between the pathogenic gene data and each clinical phenotype data can then be analyzed through the phenotype co-expression network.
In this way, the database module stores the reported literature data related to intellectual disability and analyzes the association between genes and phenotype information, so that researchers can investigate the pathogenic mechanism of intellectual disability from that association. This can assist researchers or doctors in quickly finding possible pathogenic genes and in formulating a personalized treatment plan based on the treatment methods and recurrence conditions reported for similar patients, thereby improving the treatment outcome.
Optionally, once the pathogenic gene determination model has screened, from the candidate pathogenic gene data, the target pathogenic gene data that has a pathogenic relation with the clinical phenotype data of the target patient, the database module makes it possible to analyze the associations of the target pathogenic gene data and so better understand the pathogenic mechanism of the target patient. Specifically, the application also provides a query module, which finds target data corresponding to the target pathogenic gene data in the database module according to the target pathogenic gene data; the target data comprises the intellectual disability related literature associated with the target pathogenic gene, which provides data support for researchers and a reference for later diagnosis and research.
Specifically, the query module finds the corresponding target data in the database module according to the target pathogenic gene data. The target data can comprise the intellectual disability related literature associated with the target pathogenic gene, and can also comprise: basic gene information and reported variation information for the target pathogenic gene data; phenotype information associated with the target pathogenic gene data; patient information related to the target pathogenic gene data; and functional annotation information and pathway information associated with the target pathogenic gene data. By acquiring the data corresponding to the target pathogenic gene data, the pathogenicity of the target pathogenic gene data can be analyzed further, which improves the accuracy of determining the pathogenic gene data of the target patient and deepens the understanding of the patient's pathogenic mechanism.
In the platform for determining an intellectual disability gene provided by the embodiment of the application, clinical phenotype data and candidate pathogenic gene data of a target patient are first acquired through the data acquisition module 101; the clinical phenotype data and the candidate pathogenic gene data are then input into the pathogenic gene determination model in the data processing module 102, which determines the target pathogenic gene data from the candidate pathogenic gene data. Whether a pathogenic relation exists between the target pathogenic gene data and the target patient can be judged from the association between the target pathogenic gene data and the clinical phenotype data, which improves the accuracy of finding the pathogenic gene of the target patient. Finally, the result is displayed by the gene display module 103, which makes it convenient for researchers to analyze and process and supports research on gene-phenotype associations.
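Steps 1041 to 1043 can be sketched as follows: group the phenotypes reported per gene, then link two phenotypes of the same gene when they share a common ancestor other than the ontology root. The linking rule, the toy ontology and the function names are assumptions made for illustration; the application does not state how the common-ancestor relationship is turned into a network edge.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Toy ontology: child -> set of parents (hypothetical terms).
PARENTS: Dict[str, Set[str]] = {
    "root": set(), "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "B1": {"B"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def phenotypes_by_gene(documents: List[Tuple[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Steps 1041-1042: collect every phenotype reported for each gene."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for gene, phenotypes in documents:
        grouped[gene].update(phenotypes)
    return dict(grouped)


def phenotype_network(grouped: Dict[str, Set[str]]) -> Dict[Tuple[str, str, str], Set[str]]:
    """Step 1043 (assumed rule): link phenotypes sharing a non-root common ancestor."""
    edges: Dict[Tuple[str, str, str], Set[str]] = {}
    for gene, terms in grouped.items():
        for a, b in combinations(sorted(terms), 2):
            shared = (ancestors(a) & ancestors(b)) - {"root"}
            if shared:
                edges[(gene, a, b)] = shared
    return edges


docs = [("GENE_A", {"A1"}), ("GENE_A", {"A2", "B1"})]
grouped = phenotypes_by_gene(docs)
network = phenotype_network(grouped)
```

Here `A1` and `A2` are linked through their shared parent `A`, while `B1` stays unconnected because it shares only the root with the other two terms.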
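A minimal sketch of the query step is given below, assuming the database module can be viewed as a mapping from gene name to a record. The `GeneRecord` layout, its field names and the sample values are hypothetical and only mirror the kinds of data listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GeneRecord:
    """Hypothetical record layout for one pathogenic gene in the database module."""
    gene: str
    literature: List[str] = field(default_factory=list)  # related document IDs
    variants: List[str] = field(default_factory=list)    # reported variation info
    phenotypes: List[str] = field(default_factory=list)  # associated phenotype terms
    pathways: List[str] = field(default_factory=list)    # function / pathway annotations


def query(
    database: Dict[str, GeneRecord], target_genes: List[str]
) -> Dict[str, Optional[GeneRecord]]:
    """Look up each target gene; genes absent from the database map to None."""
    return {gene: database.get(gene) for gene in target_genes}


database = {"GENE_A": GeneRecord("GENE_A", literature=["doc-1"], phenotypes=["P1"])}
found = query(database, ["GENE_A", "GENE_X"])
```

Returning `None` for unknown genes lets the display step distinguish "no record" from "record with empty fields".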
In a second aspect, the present embodiments provide a method for determining a dysnoesia gene, as shown in fig. 3, the method comprising:
S301, acquiring clinical phenotype data and candidate pathogenic gene data of the target patient.
S302, inputting the clinical phenotype data and the candidate pathogenic gene data into a pathogenic gene determination model, and determining target pathogenic gene data in the candidate pathogenic gene data.
S303, displaying the target disease-causing gene data.
Optionally, the method further includes:
searching target data corresponding to the target pathogenic gene data in a database according to the target pathogenic gene data; wherein, the database stores the literature related to the intellectual disability and the correlation between the pathogenic gene data and the clinical phenotype data; the correlation between the pathogenic gene data and clinical phenotype data is determined according to the mental disorder related literature;
and displaying the target data.
Optionally, the method further comprises correlating the disease causing gene data with clinical phenotype data by:
determining, for each mental retardation-related document, the causal gene data and clinical phenotype data corresponding to the causal gene data in the document;
counting, for each pathogenic gene data, the number of occurrences of each clinical phenotype data corresponding to the pathogenic gene data;
and determining the incidence relation between the pathogenic gene data and each corresponding clinical phenotype data according to the occurrence frequency of each clinical phenotype data corresponding to the pathogenic gene data aiming at each pathogenic gene data.
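The occurrence-count variant described in these steps can be sketched with a counter per gene. Using the relative frequency as the association strength is an assumption made for this example, as are the function name and the toy documents; the application only states that the association is determined from the number of occurrences.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def association_by_frequency(
    documents: List[Tuple[str, List[str]]]
) -> Dict[str, Dict[str, float]]:
    """Count how often each phenotype is reported with each gene, then normalize."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for gene, phenotypes in documents:
        counts[gene].update(set(phenotypes))  # count each phenotype once per document
    result: Dict[str, Dict[str, float]] = {}
    for gene, counter in counts.items():
        total = sum(counter.values())
        result[gene] = {p: n / total for p, n in counter.items()}
    return result


docs = [("GENE_A", ["P1", "P2"]), ("GENE_A", ["P1"])]
assoc = association_by_frequency(docs)
```

With these two documents, `P1` is reported twice and `P2` once for `GENE_A`, so `P1` receives the larger share of the association weight.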
In the method for determining dysnoesia genes provided by this embodiment of the application, the clinical phenotype data and candidate pathogenic gene data of a target patient are first acquired. The clinical phenotype data and the candidate pathogenic gene data are then input into a pathogenic gene determination model, which screens the target pathogenic gene data out of the candidate pathogenic gene data. The target pathogenic gene data is then displayed, which helps researchers quickly look up genes or phenotypes related to intellectual disability and is of great significance for subsequent research on intellectual disability. Moreover, whether the target pathogenic gene is pathogenically related to the target patient can be judged by analyzing the association between the target pathogenic gene data and the clinical phenotype data of the target patient, which improves the accuracy of determining the target pathogenic gene.
Corresponding to the method in fig. 3, the embodiment of the present application further provides a computer device 400, as shown in fig. 4, the device includes a memory 401, a processor 402 and a computer program stored in the memory 401 and executable on the processor 402, wherein the processor 402 is configured to implement the method for determining a dysnoesia gene when executing the computer program.
Specifically, the memory 401 and the processor 402 may be a general-purpose memory and a general-purpose processor, and are not specifically limited here. When the processor 402 runs the computer program stored in the memory 401, the method for determining the dysnoesia gene can be executed, which addresses the problem of low accuracy in determining the pathogenic gene in the prior art.
Corresponding to a method for determining a dysnoesia gene in fig. 3, the present application further provides a computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method for determining a dysnoesia gene.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the method for determining the dysnoesia gene can be carried out, which addresses two problems of the prior art: for a patient whose pathogenic gene is unknown, researchers must perform a large amount of analysis to find the pathogenic gene, and the accuracy of the pathogenic gene so found is low.
In the embodiments provided by the present invention, it should be understood that the disclosed platform and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; a plurality of units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the present invention and are intended to be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A platform for determining dysnoesia genes, which comprises a data acquisition module, a data processing module and a gene display module;
the data acquisition module is used for acquiring clinical phenotype data and candidate pathogenic gene data of a target patient; inputting the clinical phenotype data and the candidate disease-causing gene data to a data processing module;
the data processing module is used for inputting the received clinical phenotype data and the candidate pathogenic gene data into a pathogenic gene determination model and determining target pathogenic gene data in the candidate pathogenic gene data; inputting the target disease-causing gene data to a gene display module;
and the gene display module is used for displaying the received target pathogenic gene data.
2. The platform of claim 1, further comprising:
the database module is used for storing the literature related to the intellectual disability and the incidence relation between the pathogenic gene data and the clinical phenotype data; wherein the correlation between the pathogenic gene data and clinical phenotype data is determined according to the mental disorder-related literature.
3. The platform of claim 1, further comprising:
the query module is used for finding target data corresponding to the target pathogenic gene data in the database module according to the target pathogenic gene data; the target data comprises the intellectual impairment related literature associated with the target disease causing gene.
4. The platform of claim 1, wherein the training process of the pathogenic gene determination model comprises:
acquiring a training sample set; the training sample set comprises at least one training sample, and the training sample comprises a positive sample and a negative sample; the positive sample consists of the association data between the clinical phenotype of a patient with a known pathogenic gene and that pathogenic gene; the negative sample consists of the association data between the clinical phenotype of the patient with the known pathogenic gene and the gene in the database having the lowest similarity to that clinical phenotype;
and aiming at the training sample set, training by using a machine learning algorithm to obtain a pathogenic gene determination model.
5. The platform of claim 2, wherein the association of pathogenic gene data with clinical phenotype data is determined by:
determining, for each mental retardation-related document, the causal gene data and clinical phenotype data corresponding to the causal gene data in the document;
counting, for each disease-causing gene data, each clinical phenotype data corresponding to the disease-causing gene data;
for each pathogenic gene data, determining an association between each clinical phenotype corresponding to the pathogenic gene data and each corresponding clinical phenotype data according to a relationship between each clinical phenotype corresponding to the pathogenic gene data and a common ancestor between the phenotypes.
6. A method of determining a dysnoesia gene, the method comprising:
acquiring clinical phenotype data and candidate disease-causing gene data of a target patient;
inputting the clinical phenotype data and the candidate pathogenic gene data into a pathogenic gene determination model, and determining target pathogenic gene data in the candidate pathogenic gene data;
and displaying the target disease-causing gene data.
7. The method of claim 6, further comprising:
searching target data corresponding to the target pathogenic gene data in a database according to the target pathogenic gene data; wherein, the database stores the literature related to the intellectual disability and the correlation between the pathogenic gene data and the clinical phenotype data; the correlation between the pathogenic gene data and clinical phenotype data is determined according to the mental disorder related literature;
and displaying the target data.
8. The method of claim 7, further comprising correlating the disease causing gene data to clinical phenotype data by:
determining, for each mental retardation-related document, the causal gene data and clinical phenotype data corresponding to the causal gene data in the document;
counting, for each disease-causing gene data, each clinical phenotype data corresponding to the disease-causing gene data;
for each pathogenic gene data, determining an association between each clinical phenotype corresponding to the pathogenic gene data and each corresponding clinical phenotype data according to a relationship between each clinical phenotype corresponding to the pathogenic gene data and a common ancestor between the phenotypes.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any one of claims 6-8 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the preceding claims 6-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110152883.8A CN112863605A (en) 2021-02-03 2021-02-03 Platform, method, computer device and medium for determining dysnoesia genes

Publications (1)

Publication Number Publication Date
CN112863605A true CN112863605A (en) 2021-05-28

Family

ID=75986569



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination