CN113782092A - Method and device for generating life prediction model and storage medium - Google Patents

Method and device for generating life prediction model and storage medium

Info

Publication number
CN113782092A
Authority
CN
China
Prior art keywords
target
gene
data
sample
prediction model
Legal status
Granted
Application number
CN202111087695.8A
Other languages
Chinese (zh)
Other versions
CN113782092B (en)
Inventor
刘小双
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111087695.8A
Publication of CN113782092A
Application granted
Publication of CN113782092B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/30 Dynamic-time models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Physiology (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the field of digital healthcare and provides a method, a device, a storage medium and computer equipment for generating a survival prediction model. The method comprises: acquiring sample data of target samples, the sample data comprising the SNP data, gene expression data and actual survival data of each target sample; acquiring gene regulatory relationships among genes from a preset gene database and generating a gene regulatory network from those relationships; constructing a target heterogeneous graph from the sample data of the target samples and the gene regulatory network; and training an initial prediction model on the target heterogeneous graph to obtain a survival prediction model. By capturing the gene information of each target sample more systematically and making full use of the relationships between different SNPs and different genes, the method and device predict the survival of cancer patients with improved accuracy.

Description

Method and device for generating life prediction model and storage medium
Technical Field
The present application relates to the field of digital medical technology, and in particular to a method, an apparatus, a storage medium and a computer device for generating a survival prediction model.
Background
The occurrence and development of cancer are closely related to a patient's genetic variation. For example, different types of variant genes affect a cancer patient's bodily functions differently and therefore affect the patient's survival time differently. The survival of cancer patients can therefore be predicted by studying genetic variation.
In the prior art, the survival time of cancer patients is predicted from SNP data, but in practice such predictions have low accuracy. How to improve the accuracy of survival prediction for cancer patients is therefore a pressing technical problem in the field.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, a storage medium and a computer device for generating a survival prediction model, which capture the gene information of a target sample more systematically and make full use of the relationships between different SNPs and different genes to predict the survival of cancer patients, thereby improving the accuracy of that prediction.
According to one aspect of the present application, a method for generating a survival prediction model is provided, comprising:
acquiring sample data of target samples, the sample data comprising the SNP data, gene expression data and actual survival data of each target sample;
acquiring gene regulatory relationships among genes from a preset gene database, and generating a gene regulatory network from those relationships;
constructing a target heterogeneous graph from the sample data of the target samples and the gene regulatory network;
and training an initial prediction model on the target heterogeneous graph to obtain a survival prediction model.
Optionally, the target samples comprise training samples and test samples, and training the initial prediction model on the target heterogeneous graph to obtain the survival prediction model comprises:
inputting the target heterogeneous graph into a feature recognition layer of the initial prediction model to obtain target sample node features, the target sample node features including training sample node features;
inputting the training sample node features into a fully connected layer of the initial prediction model to obtain training sample survival prediction features, and processing those features with a preset activation function to obtain first survival prediction data for each training sample;
calculating a model loss value with a preset cross-entropy function from the first survival prediction data and the actual survival data of the training samples;
adjusting the model parameters of the initial prediction model according to the model loss value, obtaining second survival prediction data for each training sample through the fully connected layer of the adjusted model and the preset activation function, and calculating the model loss value again;
and obtaining the survival prediction model when the model loss value is smaller than a preset loss threshold.
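The training loop in the steps above (fully connected layer, softmax activation, cross-entropy loss, repeated parameter updates until the loss falls below a preset threshold) can be illustrated with the minimal pure-Python sketch below. It is not the applicant's implementation: it assumes the node features have already been produced by the feature recognition layer, trains only a single fully connected layer with plain gradient descent, and the learning rate, loss threshold and epoch cap are hypothetical values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss of one sample with an integer class label."""
    return -math.log(max(probs[label], 1e-12))


def train_until_threshold(features, labels, n_classes=2, lr=0.5,
                          loss_threshold=0.1, max_epochs=5000):
    """Fit one fully connected layer (W, b) with a softmax output, updating
    parameters until the mean cross-entropy drops below loss_threshold.
    max_epochs is an added safeguard (hypothetical, not in the document)."""
    dim = len(features[0])
    n = len(features)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    loss = float("inf")
    for epoch in range(max_epochs):
        grad_W = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(features, labels):
            logits = [sum(w_i * x_i for w_i, x_i in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            probs = softmax(logits)
            loss += cross_entropy(probs, y)
            for c in range(n_classes):
                # For softmax + cross-entropy: dLoss/dlogit_c = p_c - 1[c == y]
                g = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += g
                for i in range(dim):
                    grad_W[c][i] += g * x[i]
        loss /= n
        if loss < loss_threshold:  # stopping rule from the claim
            return W, b, loss, epoch
        for c in range(n_classes):  # gradient-descent parameter adjustment
            b[c] -= lr * grad_b[c] / n
            for i in range(dim):
                W[c][i] -= lr * grad_W[c][i] / n
    return W, b, loss, max_epochs
```

In a full graph neural network the gradient would also flow back into the feature recognition layer; the features are held fixed here only to keep the sketch short.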
Optionally, constructing the target heterogeneous graph from the sample data of the target samples and the gene regulatory network comprises:
constructing a second feature edge of the target heterogeneous graph between each target sample and each of its SNPs, and determining the weight of the second feature edge from the variation type of the SNP;
determining, with a preset annotation method and the SNP data, the gene in the gene regulatory network that corresponds to each SNP, constructing a third feature edge of the target heterogeneous graph between the SNP and that gene, and determining the weight of the third feature edge from the positional relationship between the SNP and the gene;
taking any gene in the gene regulatory network as a target node gene, looking up the gene expression data of the target node gene, and calculating the relative expression ratio of that gene in each target sample; and, when the relative expression ratio is greater than or equal to a first expression threshold or less than or equal to a second expression threshold, constructing a fourth feature edge of the target heterogeneous graph between the target node gene and the target sample, with the relative expression ratio as the weight of the fourth feature edge.
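The three edge types described above can be sketched as follows. The document states only that second-edge weights depend on the SNP's variation type and third-edge weights on the SNP's position relative to the gene; the weight tables, category names, fallback weight and the default thresholds `high=2.0` and `low=0.5` below are hypothetical placeholders, and the SNP-to-gene mapping is assumed to be the output of whatever annotation method is used.

```python
# Hypothetical weight tables: the document does not disclose actual values.
VARIANT_TYPE_WEIGHT = {"missense": 1.0, "nonsense": 1.0, "synonymous": 0.3}
POSITION_WEIGHT = {"exon": 1.0, "intron": 0.5, "upstream": 0.3}
DEFAULT_WEIGHT = 0.1  # hypothetical fallback for unlisted categories


def build_feature_edges(sample_snps, snp_to_gene, expression_ratio,
                        high=2.0, low=0.5):
    """Return (node_u, node_v, weight, edge_kind) tuples.

    sample_snps:      {sample_id: [(snp_id, variation_type), ...]}
    snp_to_gene:      {snp_id: (gene_id, position_relation)}, assumed to be
                      produced by an annotation step
    expression_ratio: {(gene_id, sample_id): relative expression ratio}
    high / low:       first and second expression thresholds (hypothetical)
    """
    edges = []
    # Second feature edges: target sample to SNP, weighted by variation type.
    for sample, snps in sample_snps.items():
        for snp, vtype in snps:
            w = VARIANT_TYPE_WEIGHT.get(vtype, DEFAULT_WEIGHT)
            edges.append((sample, snp, w, "sample-snp"))
    # Third feature edges: SNP to gene, weighted by positional relationship.
    for snp, (gene, position) in snp_to_gene.items():
        w = POSITION_WEIGHT.get(position, DEFAULT_WEIGHT)
        edges.append((snp, gene, w, "snp-gene"))
    # Fourth feature edges: gene to target sample, only for clearly up- or
    # down-regulated expression; the ratio itself is the edge weight.
    for (gene, sample), ratio in expression_ratio.items():
        if ratio >= high or ratio <= low:
            edges.append((gene, sample, ratio, "gene-sample"))
    return edges
```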
Optionally, calculating the relative expression ratio of the gene expression data of each target sample comprises:
calculating the mean expression of the target node gene across the target samples from its gene expression data, and dividing the expression value of each target sample by that mean to obtain the relative expression ratio of each target sample.
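The relative expression ratio defined above is each sample's expression value divided by the mean across all target samples. A minimal sketch, assuming the expression values of one target node gene are supplied as a dictionary keyed by sample ID (a hypothetical input format):

```python
def relative_expression_ratios(expression):
    """Map each target sample to (its expression / mean expression over all
    target samples) for a single target node gene."""
    if not expression:
        raise ValueError("no expression values supplied")
    mean = sum(expression.values()) / len(expression)
    if mean == 0:
        raise ValueError("mean expression is zero; ratio is undefined")
    return {sample: value / mean for sample, value in expression.items()}
```

For example, with expression values 6.0, 2.0 and 1.0 in samples P1, P2 and P3, the mean is 3.0, so P1 gets a ratio of 2.0, P2 about 0.67 and P3 about 0.33.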
Optionally, before the gene in the gene regulatory network corresponding to each SNP is determined with the preset annotation method and the SNP data, the method further comprises:
deduplicating the SNP data across the sample data of the different target samples to remove duplicate SNP records.
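Deduplication here amounts to collecting each distinct SNP once before annotation, so that a SNP found in several samples is annotated only once. A minimal sketch, assuming each SNP is identified by a string ID such as an rsID (an assumption; the document does not specify the identifier):

```python
def unique_snps(sample_snps):
    """Return the distinct SNP IDs found across all target samples,
    in first-seen order (dicts preserve insertion order in Python 3.7+)."""
    seen = {}
    for snps in sample_snps.values():
        for snp in snps:
            seen.setdefault(snp, None)
    return list(seen)
```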
Optionally, acquiring the gene regulatory relationships among genes in the preset gene database and generating the gene regulatory network from them comprises:
acquiring association data between the proteins encoded by genes in the preset gene database; when the association data for a pair of proteins exceed a preset association threshold, determining that a regulatory relationship exists between the corresponding genes and constructing a first feature edge between them; and generating the gene regulatory network with the association data as the weight of each first feature edge.
Optionally, after the survival prediction model is obtained, the method further comprises:
inputting the target heterogeneous graph into the feature recognition layer of the survival prediction model to obtain target sample node features, the target sample node features including test sample node features;
calculating a survival prediction deviation value of the survival prediction model from the test sample node features and the actual survival data of the test samples;
and, if the survival prediction deviation value satisfies a preset deviation condition, determining that the survival prediction model has been trained successfully.
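The document does not define the survival prediction deviation value or the preset deviation condition. The sketch below assumes, purely for illustration, that the deviation is the misclassification rate on the test samples and that training counts as successful when that rate does not exceed a hypothetical `max_deviation`:

```python
def prediction_deviation(predicted_probs, true_labels):
    """Fraction of test samples whose most probable survival class differs
    from the actual label (one assumed definition of 'deviation')."""
    wrong = 0
    for probs, label in zip(predicted_probs, true_labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(true_labels)


def training_succeeded(predicted_probs, true_labels, max_deviation=0.2):
    """Hypothetical preset deviation condition: deviation <= max_deviation."""
    return prediction_deviation(predicted_probs, true_labels) <= max_deviation
```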
According to another aspect of the present application, an apparatus for generating a survival prediction model is provided, comprising:
a sample data acquisition module, configured to acquire sample data of target samples, the sample data comprising the SNP data, gene expression data and actual survival data of each target sample;
a gene regulatory network generation module, configured to acquire gene regulatory relationships among genes from a preset gene database and generate a gene regulatory network from those relationships;
a heterogeneous graph construction module, configured to construct a target heterogeneous graph from the sample data of the target samples and the gene regulatory network;
and a model construction module, configured to train an initial prediction model on the target heterogeneous graph to obtain the survival prediction model.
Optionally, the target samples comprise training samples and test samples, and the model construction module comprises:
a node feature acquisition unit, configured to input the target heterogeneous graph into a feature recognition layer of the initial prediction model to obtain target sample node features, the target sample node features including training sample node features;
a prediction data calculation unit, configured to input the training sample node features into a fully connected layer of the initial prediction model to obtain training sample survival prediction features, and to process those features with a preset activation function to obtain first survival prediction data for each training sample;
a model loss value calculation unit, configured to calculate a model loss value with a preset cross-entropy function from the first survival prediction data and the actual survival data of the training samples;
a model adjustment unit, configured to adjust the model parameters of the initial prediction model according to the model loss value, obtain second survival prediction data for each training sample through the fully connected layer of the adjusted model and the preset activation function, and calculate the model loss value again;
and a model determination unit, configured to obtain the survival prediction model when the model loss value is smaller than a preset loss threshold.
Optionally, the heterogeneous graph construction module comprises:
a second feature edge construction unit, configured to construct a second feature edge of the target heterogeneous graph between each target sample and each of its SNPs, and to determine the weight of the second feature edge from the variation type of the SNP;
a third feature edge construction unit, configured to determine, with a preset annotation method and the SNP data, the gene in the gene regulatory network that corresponds to each SNP, construct a third feature edge of the target heterogeneous graph between the SNP and that gene, and determine the weight of the third feature edge from the positional relationship between the SNP and the gene;
a fourth feature edge construction unit, configured to take any gene in the gene regulatory network as a target node gene, look up the gene expression data of the target node gene, and calculate the relative expression ratio of that gene in each target sample; and, when the relative expression ratio is greater than or equal to a first expression threshold or less than or equal to a second expression threshold, to construct a fourth feature edge of the target heterogeneous graph between the target node gene and the target sample, with the relative expression ratio as the weight of the fourth feature edge.
Optionally, the fourth feature edge construction unit is configured to:
calculate the mean expression of the target node gene across the target samples from its gene expression data, and divide the expression value of each target sample by that mean to obtain the relative expression ratio of each target sample.
Optionally, the apparatus further comprises:
a deduplication module, configured to deduplicate the SNP data across the sample data of the different target samples, removing duplicate SNP records, before the gene in the gene regulatory network corresponding to each SNP is determined with the preset annotation method and the SNP data.
Optionally, the gene regulatory network generation module is configured to:
acquire association data between the proteins encoded by genes in the preset gene database; when the association data for a pair of proteins exceed a preset association threshold, determine that a regulatory relationship exists between the corresponding genes and construct a first feature edge between them; and generate the gene regulatory network with the association data as the weight of each first feature edge.
Optionally, the apparatus further comprises:
a test module, configured to input the target heterogeneous graph into the feature recognition layer of the survival prediction model after the model is obtained, to obtain target sample node features, the target sample node features including test sample node features;
a prediction deviation calculation module, configured to calculate a survival prediction deviation value of the survival prediction model from the test sample node features and the actual survival data of the test samples;
the test module being further configured to determine that the survival prediction model has been trained successfully if the survival prediction deviation value satisfies a preset deviation condition.
According to yet another aspect of the present application, a storage medium is provided, storing a computer program which, when executed by a processor, implements the above method for generating a survival prediction model.
According to yet another aspect of the present application, a computer device is provided, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the above method for generating a survival prediction model when executing the program.
With the above technical solution, the method, apparatus, storage medium and computer device for generating a survival prediction model acquire sample data of the target samples (SNP data, gene expression data and actual survival data), obtain the regulatory relationships among human genes from a preset gene database to generate a gene regulatory network, construct a target heterogeneous graph from the sample data and the gene regulatory network, train an initial prediction model on that graph, and obtain the survival prediction model by iteratively optimizing the model's parameters. Because the target heterogeneous graph contains patient, gene and SNP nodes, and a graph neural network learns the neighbor information of each node, the embodiments capture the gene information of each target sample more systematically and make full use of the relationships between different SNPs and different genes, improving the accuracy of survival prediction for cancer patients.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer so that they can be implemented according to this description, and to make the above and other objects, features and advantages of the present application easier to understand, specific embodiments are described in detail below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for generating a survival prediction model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for generating a survival prediction model according to an embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to the accompanying drawings and in conjunction with embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another.
This embodiment provides a method for generating a survival prediction model. As shown in FIG. 1, the method comprises:
Step 101: acquire sample data of target samples, the sample data comprising the SNP data, gene expression data and actual survival data of each target sample.
The embodiments of the present application are mainly intended for predicting the survival of cancer patients, and the method for generating a survival prediction model may be executed on the server side. The server may be an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. In this embodiment, sample data of each target sample are acquired, including SNP (single nucleotide polymorphism) data, gene expression data and actual survival data. SNP data describe DNA sequence polymorphisms caused by variation of a single nucleotide at the genome level. Besides SNP data, gene expression data can also reveal abnormalities in genes. Gene expression data reflect the abundance of a gene's transcript (mRNA) in a cell, measured directly or indirectly; they can be used to analyze which genes have altered expression, how genes are associated with one another, and how gene activity changes under different conditions, and they can themselves be affected by SNPs. The actual survival data of a target sample can be encoded as different values. For example, the survival of cancer patients may be divided into two classes: long survival (more than 5 years), encoded as 1, and short survival (5 years or less), encoded as 0. In addition, some target samples carry no SNPs and therefore have no SNP data; for these samples the SNP data can simply be filled with 0 when the sample data are acquired.
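The label encoding and zero-filling described above might look like the following. The 5-year cutoff comes from the example in the text; representing a sample's SNP data as a 0/1 indicator vector over a fixed list of SNPs is an assumption made only for illustration.

```python
def survival_label(survival_years, cutoff_years=5.0):
    """Encode actual survival: 1 = more than the cutoff, 0 = cutoff or less."""
    return 1 if survival_years > cutoff_years else 0


def snp_vector(sample_snps, all_snps):
    """Hypothetical 0/1 indicator over all_snps; a sample with no SNP data
    (None or empty) is filled entirely with 0."""
    present = set(sample_snps or [])
    return [1 if snp in present else 0 for snp in all_snps]
```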
Step 102: acquire gene regulatory relationships among genes from a preset gene database, and generate a gene regulatory network from those relationships.
In this embodiment, the regulatory relationships between different human genes are obtained from the preset gene database, and an edge is formed between every pair of genes that have a regulatory relationship, thereby generating the gene regulatory network. The preset gene database may be, for example, DIP (Database of Interacting Proteins) or BIND (Biomolecular Interaction Network Database).
Step 103: construct a target heterogeneous graph from the sample data of the target samples and the gene regulatory network.
In this embodiment, the target heterogeneous graph is constructed from the acquired sample data of the target samples (SNP data, gene expression data and actual survival data) and the generated gene regulatory network. Specifically, the SNPs, the genes in the gene regulatory network and the target samples serve as the nodes of the target heterogeneous graph.
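One simple way to hold such a graph in memory is an adjacency map in which every node carries a type ("sample", "snp" or "gene"). The `HeteroGraph` class below is a hypothetical illustration, not a structure named in the document:

```python
from collections import defaultdict


class HeteroGraph:
    """Minimal heterogeneous graph: typed nodes plus undirected weighted
    edges stored as adjacency dictionaries."""

    def __init__(self):
        self.node_type = {}            # node -> "sample" | "snp" | "gene"
        self.adj = defaultdict(dict)   # node -> {neighbor: weight}

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v, weight=1.0):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("add both endpoints as typed nodes first")
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def neighbors(self, node, ntype=None):
        """Neighbors of node with edge weights, optionally filtered by type."""
        return {v: w for v, w in self.adj.get(node, {}).items()
                if ntype is None or self.node_type[v] == ntype}
```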
Step 104: train an initial prediction model on the target heterogeneous graph to obtain a survival prediction model.
In this embodiment, the initial prediction model is trained on the constructed target heterogeneous graph, and the survival prediction model is obtained by iteratively optimizing the model's parameters.
By applying the technical solution of this embodiment, a target heterogeneous graph containing patient, gene and SNP nodes is built from the target samples' SNP data, gene expression data and actual survival data together with a gene regulatory network derived from a preset gene database, and an initial prediction model is trained on that graph, with its parameters optimized iteratively, to obtain the survival prediction model. Because a graph neural network learns the neighbor information of each node in the graph, the gene information of each target sample is captured more systematically and the relationships between different SNPs and different genes are fully used, improving the accuracy of survival prediction for cancer patients.
Further, as a refinement and extension of the above embodiment, and to fully illustrate its implementation, another method for generating a survival prediction model is provided, comprising:
Step 201: acquire sample data of target samples, the sample data comprising the SNP data, gene expression data and actual survival data of each target sample.
Step 202: acquire association data between the proteins encoded by genes in a preset gene database; when the association data exceed a preset association threshold, determine that a regulatory relationship exists between the corresponding genes, construct a first feature edge between them, and generate the gene regulatory network with the association data as the weight of each first feature edge.
In this embodiment, once the SNP data, gene expression data and actual survival data of the target samples have been acquired, a gene regulatory network can also be constructed. Specifically, association data between proteins are obtained from the preset gene database; the association data reflect the degree of similarity between different proteins. When the association data for two proteins exceed the preset association threshold, the two proteins are highly similar, a regulatory relationship is deemed to exist between their corresponding genes, and an edge, referred to as a first feature edge, is constructed between those genes. For example, if the association data between protein A', encoded by gene A, and protein B', encoded by gene B, exceed the preset association threshold, gene A and gene B are taken by default to have a regulatory relationship, and an edge of the gene regulatory network is formed between them. The corresponding association data are also used as the weight of that first feature edge.
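The thresholding rule in the gene A / gene B example can be sketched as follows. The input formats (a dictionary of protein-pair scores and a protein-to-gene mapping) and the rule of keeping the highest score when several protein pairs map to the same gene pair are assumptions for illustration; the document does not specify them.

```python
def build_regulatory_network(protein_scores, protein_to_gene, threshold):
    """Build first feature edges between genes.

    protein_scores:  {(protein_a, protein_b): association score}
    protein_to_gene: {protein: gene that encodes it}
    Returns {(gene_a, gene_b): weight} for pairs whose score exceeds threshold.
    """
    network = {}
    for (pa, pb), score in protein_scores.items():
        if score > threshold:  # strictly greater, as in the text
            ga, gb = protein_to_gene[pa], protein_to_gene[pb]
            if ga != gb:
                key = tuple(sorted((ga, gb)))  # undirected gene pair
                # Assumption: keep the highest score if a pair repeats.
                network[key] = max(score, network.get(key, score))
    return network
```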
Step 203: construct a target heterogeneous graph from the sample data of the target samples and the gene regulatory network.
Step 204: input the target heterogeneous graph into a feature recognition layer of an initial prediction model to obtain target sample node features, the target sample node features including training sample node features.
In this embodiment, the target samples include training samples and test samples. The target heterogeneous graph is constructed from the sample data of the target samples and the constructed gene regulatory network, and may contain target sample nodes, SNP nodes and gene nodes. The target heterogeneous graph is then input into the feature recognition layer of the initial prediction model, which produces the node features of the different nodes. Because the target samples used to construct the graph include both training samples and test samples, the resulting target sample node features include, among others, the training sample node features.
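The document states only that a graph neural network learns each node's neighbor information. As a rough illustration of that idea, and not the model actually used, one message-passing step that mixes each node's own feature with the edge-weighted mean of its neighbors' features can be written as below; the `self_weight` mixing parameter is a hypothetical choice.

```python
def aggregate_once(features, adj, self_weight=0.5):
    """One parameter-free message-passing step.

    features: {node: [float, ...]} current node features
    adj:      {node: {neighbor: edge weight}}
    Each node's new feature = self_weight * own feature
                              + (1 - self_weight) * weighted mean of neighbors.
    """
    new = {}
    for node, h in features.items():
        nbrs = adj.get(node, {})
        total_w = sum(nbrs.values())
        if total_w == 0:
            new[node] = list(h)  # isolated node keeps its feature
            continue
        agg = [0.0] * len(h)
        for nbr, w in nbrs.items():
            for i, x in enumerate(features[nbr]):
                agg[i] += w * x
        agg = [a / total_w for a in agg]
        new[node] = [self_weight * a + (1 - self_weight) * b
                     for a, b in zip(h, agg)]
    return new
```

A trainable graph neural network layer would additionally apply learned weight matrices and a nonlinearity, and several such layers would usually be stacked; the rows belonging to sample nodes would then serve as the target sample node features.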
Step 205, inputting the training sample node features into a fully connected layer of the initial prediction model to obtain training sample survival prediction features, and identifying the training sample survival prediction features through a preset activation function to obtain first survival time prediction data corresponding to each training sample;
In this embodiment, in addition to the feature recognition layer, the initial prediction model may include a fully connected layer. After the training sample node features are obtained, they are input into the fully connected layer of the initial prediction model to obtain the corresponding training sample survival prediction features, which are then passed through a preset activation function to obtain the first survival time prediction data. The preset activation function may be the softmax function, which converts the features of each training sample into probabilities over the different survival time classes; these probabilities are the first survival time prediction data. For example, if the softmax function performs a binary classification in which the survival time classes are "5 years or more" and "less than 5 years", the first survival time prediction data of training sample 1 are the probabilities that its survival time is 5 years or more and less than 5 years, respectively, and the two probabilities sum to 1. Presenting the predicted survival time classes as probabilities makes the result more intuitive and helps doctors interpret it.
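A fully connected layer followed by softmax can be sketched in a few lines of NumPy. The two-class layout mirrors the example above; the function name, the identity weight matrix and the input feature are hypothetical values chosen only to make the arithmetic easy to check.

```python
import numpy as np

def fully_connected_softmax(node_feats, weight, bias):
    """Map node features (N, F) to class probabilities (N, C)."""
    logits = node_feats @ weight + bias                   # fully connected layer
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize exp()
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax: each row sums to 1

# Hypothetical 2-class setup: column 0 = "5 years or more", column 1 = "less than 5 years".
probs = fully_connected_softmax(np.array([[1.0, 0.0]]), np.eye(2), np.zeros(2))
print(probs.round(3))  # [[0.731 0.269]]
```

Subtracting the row maximum before exponentiating does not change the softmax output but prevents overflow for large logits.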
Step 206, calculating a model loss value through a preset cross-entropy function based on the first survival time prediction data and the real survival time data corresponding to the training samples;
In this embodiment, after the first survival time prediction data of each training sample are obtained, the model loss value of the initial prediction model is calculated from these data and the corresponding real survival time data using the preset cross-entropy function.
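For class-probability outputs, the usual cross-entropy loss is the negative mean log-probability assigned to each sample's true class. The sketch below assumes the real survival time data have already been converted to class indices (for example 0 for "5 years or more", 1 for "less than 5 years"); that encoding and the example numbers are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities (N, C) and true class indices (N,)."""
    picked = probs[np.arange(len(labels)), labels]   # probability assigned to the true class
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

probs = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
labels = np.array([0, 1])   # hypothetical encoding of the real survival time data
print(round(cross_entropy(probs, labels), 4))  # 0.2899
```

Here the loss is -(ln 0.8 + ln 0.7) / 2; clipping guards against log(0) when a predicted probability is exactly zero.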
Step 207, adjusting the model parameters of the initial prediction model according to the model loss value, obtaining second survival time prediction data corresponding to each training sample through the fully connected layer of the adjusted initial prediction model and the preset activation function, and calculating the model loss value again; when the model loss value is smaller than a preset loss threshold, the survival time prediction model is obtained;
In this embodiment, the model parameters of the initial prediction model are adjusted using the model loss value to obtain an adjusted initial prediction model, and the second survival time prediction data of each training sample are computed through the fully connected layer of the adjusted model and the same preset activation function. The model loss value is then recalculated from the second survival time prediction data and the real survival time data of the training samples. This adjustment is repeated until the model loss value is smaller than the preset loss threshold, which indicates that the model loss has reached an acceptable level; the current model parameters are then taken as the final parameters, and the resulting model is the survival time prediction model. Repeatedly adjusting the model parameters according to the loss value brings the model output closer to the real survival times and thus improves the accuracy of subsequent predictions.
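The adjust-and-recompute loop with a loss threshold can be sketched as plain gradient descent. For brevity this sketch assumes the node features are fixed and only the fully connected softmax head is updated; the source adjusts "the model parameters" generally and does not name an optimizer, so the learning rate, iteration cap and gradient descent itself are assumptions.

```python
import numpy as np

def train_until_threshold(feats, labels, n_classes, lr=0.5,
                          loss_threshold=0.1, max_iters=10_000, seed=0):
    """Gradient descent on a softmax head until the loss drops below a threshold (sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    loss, step = float("inf"), 0
    for step in range(max_iters):
        logits = feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)          # prediction data for this round
        loss = float(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1)))
        if loss < loss_threshold:
            break                                          # loss acceptable: keep these parameters
        grad = (probs - onehot) / len(labels)              # d(loss)/d(logits) for softmax + cross-entropy
        w -= lr * feats.T @ grad                           # adjust parameters using the loss
        b -= lr * grad.sum(axis=0)
    return w, b, loss, step

feats = np.array([[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.2, 1.5]])
labels = np.array([0, 0, 1, 1])
w, b, loss, step = train_until_threshold(feats, labels, n_classes=2)
print(loss < 0.1, step)
```

The `max_iters` cap is an addition of this sketch: a pure "loop until below threshold" rule never terminates if the threshold is unreachable.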
Step 208, inputting the target heterogeneous graph into the feature recognition layer of the survival time prediction model to obtain target sample node features, wherein the target sample node features include test sample node features; calculating a survival time prediction deviation value of the survival time prediction model based on the test sample node features and the real survival time data corresponding to the test samples; and if the survival time prediction deviation value satisfies a preset deviation value condition, determining that the survival time prediction model has been trained successfully.
In this embodiment, after the survival time prediction model is constructed, its prediction performance can be tested with the test samples. Specifically, the target heterogeneous graph is input into the feature recognition layer of the survival time prediction model to obtain a node feature for each node, including the target sample node features. Because the target samples include the test samples, the target sample node features include the test sample node features. The test sample node features are input into the fully connected layer of the survival time prediction model to obtain test sample survival prediction features, which are passed through the preset activation function to obtain survival time test data for each test sample. The survival time prediction deviation value of the model is then calculated from the survival time test data and the real survival time data of the test samples through the preset cross-entropy function, and it is determined whether the deviation value satisfies the preset deviation value condition. If it does, the survival time prediction model predicts survival time well and training is considered successful.
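The test step reuses the same fully connected layer, softmax and cross-entropy, only on held-out samples. The source leaves the "preset deviation value condition" unspecified; this sketch assumes it means "deviation not greater than a hypothetical `max_deviation`". The weights and test features are illustrative values.

```python
import numpy as np

def evaluate_on_test(test_feats, test_labels, w, b, max_deviation):
    """Return (deviation, passed) for a trained softmax head on held-out test samples."""
    logits = test_feats @ w + b                            # fully connected layer
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)              # survival time test data
    picked = probs[np.arange(len(test_labels)), test_labels]
    deviation = float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))  # cross-entropy
    return deviation, deviation <= max_deviation           # hypothetical deviation condition

w = np.array([[2.0, -2.0], [-2.0, 2.0]])
deviation, passed = evaluate_on_test(np.eye(2), np.array([0, 1]), w, np.zeros(2), max_deviation=0.1)
print(round(deviation, 3), passed)  # 0.018 True
```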
In this embodiment of the present application, optionally, step 203 of "constructing a target heterogeneous graph based on the sample data of the target samples and the gene regulatory network" includes: constructing second feature edges of the target heterogeneous graph according to each target sample and its corresponding SNPs, and determining the weight of each second feature edge based on the variation type of the SNP; determining, by a preset annotation method and the SNP data, the gene in the gene regulatory network corresponding to each SNP, constructing third feature edges of the target heterogeneous graph according to the SNPs and the genes, and determining the weight of each third feature edge based on the positional relationship between the SNP and the gene; and determining any gene in the gene regulatory network as a target node gene, looking up the gene expression data of the target node gene, calculating the relative expression ratio of the gene expression data for each target sample, and, when the relative expression ratio is greater than or equal to a first expression threshold or less than or equal to a second expression threshold, constructing a fourth feature edge of the target heterogeneous graph according to the target node gene and the target sample, with the relative expression ratio as the weight of the fourth feature edge.
In this embodiment, constructing the target heterogeneous graph specifically consists of constructing the second, third and fourth feature edges. The second feature edges connect target samples with SNPs: for each target sample, a second feature edge is constructed to each SNP that the sample carries. If a target sample does not carry a given SNP, that is, the corresponding SNP value is 0, no second feature edge is constructed between them. Each second feature edge also carries a weight, which can be determined from the variation type of the SNP. For example, if SNP1 is present in the sample data of target sample A with a non-zero value, a second feature edge is constructed between target sample A and SNP1; the edge weight is 2 if SNP1 is a homozygous variant and 1 if it is a heterozygous variant.
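The second-feature-edge rule can be sketched as a lookup from variant type to weight, using the 2/1 weights from the example above. The input layout (sample mapped to a dict of SNP id and variant type) and the string labels are hypothetical.

```python
from typing import Dict, Tuple

# Weights from the example in the text; the string labels are hypothetical.
VARIANT_WEIGHT = {"homozygous": 2, "heterozygous": 1}

def build_sample_snp_edges(snp_calls: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """snp_calls maps sample -> {snp_id: variant_type}; returns {(sample, snp): weight}."""
    edges: Dict[Tuple[str, str], int] = {}
    for sample, calls in snp_calls.items():
        for snp, variant_type in calls.items():
            weight = VARIANT_WEIGHT.get(variant_type, 0)
            if weight:  # SNP absent in this sample (value 0): no second feature edge
                edges[(sample, snp)] = weight
    return edges

calls = {"A": {"SNP1": "homozygous", "SNP2": "heterozygous", "SNP3": "none"}}
print(build_sample_snp_edges(calls))  # {('A', 'SNP1'): 2, ('A', 'SNP2'): 1}
```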
The third feature edges of the target heterogeneous graph connect SNPs with genes. A preset annotation method and the SNP data are used to find the gene in the gene regulatory network that corresponds to each SNP, and a third feature edge is constructed between the SNP and that gene. Different target samples may carry the same SNP; in that case the third feature edge is constructed only once. Each third feature edge also carries a weight, determined by the positional relationship between the SNP and its gene. For example, the weight of the third feature edge is 3 when the SNP lies in an exon of the gene, 2 when it lies in an intron of the gene, and 1 when it lies between two genes.
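Assuming the preset annotation method (not named in the source) has already produced one (SNP, gene, location) record per SNP, the third feature edges and their 3/2/1 positional weights can be sketched as follows. The record format, location labels and the rule of dropping genes outside the regulatory network are assumptions of this sketch.

```python
from typing import Dict, Iterable, Set, Tuple

# Weights from the example in the text; the location labels are hypothetical.
POSITION_WEIGHT = {"exon": 3, "intron": 2, "intergenic": 1}

def build_snp_gene_edges(annotations: Iterable[Tuple[str, str, str]],
                         network_genes: Set[str]) -> Dict[Tuple[str, str], int]:
    """annotations yields (snp_id, gene, location) records from some annotation tool."""
    edges: Dict[Tuple[str, str], int] = {}
    for snp, gene, location in annotations:
        if gene not in network_genes:
            continue  # assumption: only genes in the gene regulatory network are graph nodes
        # A SNP shared by several samples maps to the same key, so the edge is built once.
        edges[(snp, gene)] = POSITION_WEIGHT[location]
    return edges

ann = [("SNP1", "A", "exon"), ("SNP2", "B", "intron"),
       ("SNP1", "A", "exon"), ("SNP3", "Z", "intergenic")]
print(build_snp_gene_edges(ann, {"A", "B"}))  # {('SNP1', 'A'): 3, ('SNP2', 'B'): 2}
```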
The fourth feature edges of the target heterogeneous graph connect genes with target samples. A gene is selected from the gene regulatory network as the target node gene, and its gene expression data are looked up among all the gene expression data. For example, if gene y is selected as the target node gene, the expression values $y_1$ to $y_n$ are obtained, where $y_1$ is the expression of gene y in target sample 1, $y_n$ is its expression in target sample n, and target samples 1 to n make up all the target samples. The relative expression ratio of each of these values is then calculated; it shows how the expression of the gene in a given target sample compares with the average expression of that gene over all target samples. When the relative expression ratio is greater than or equal to the first expression threshold, the gene is expressed at a high level in that target sample; when it is less than or equal to the second expression threshold, the gene is expressed at a low level. In either case, a fourth feature edge is constructed between the gene in the gene regulatory network and the corresponding target sample, with the relative expression ratio as its weight.
After the relative expression ratios of all target samples have been evaluated for the selected target node gene, another gene in the gene regulatory network is taken as the target node gene and the process is repeated until every gene in the network has been traversed once, which yields all the fourth feature edges of the target heterogeneous graph.
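The traversal over target node genes, the relative expression ratio and the two-threshold rule can be sketched with a genes-by-samples expression matrix. The matrix layout, function name and the threshold values 1.5 and 0.5 are hypothetical; the source does not give concrete thresholds, and skipping genes with zero mean expression is an added guard.

```python
import numpy as np

def build_gene_sample_edges(expr, genes, samples, high_thr, low_thr):
    """expr: (n_genes, n_samples) matrix; rows aligned with `genes`, columns with `samples`."""
    edges = {}
    for g_idx, gene in enumerate(genes):        # each gene becomes the target node gene once
        row = expr[g_idx]
        mean = row.mean()                       # average expression over all target samples
        if mean == 0:
            continue                            # assumption: ratio undefined for an unexpressed gene
        ratios = row / mean                     # relative expression ratio per sample
        for s_idx, sample in enumerate(samples):
            r = float(ratios[s_idx])
            if r >= high_thr or r <= low_thr:   # markedly high or low expression
                edges[(gene, sample)] = r       # ratio used as the fourth feature edge weight
    return edges

expr = np.array([[4.0, 1.0, 1.0, 2.0]])        # gene y in samples s1..s4, mean = 2.0
print(build_gene_sample_edges(expr, ["y"], ["s1", "s2", "s3", "s4"], 1.5, 0.5))
# {('y', 's1'): 2.0, ('y', 's2'): 0.5, ('y', 's3'): 0.5}
```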
In this embodiment of the present application, optionally, "calculating the relative expression ratio of the gene expression data corresponding to each target sample" in the above steps includes: calculating, from the gene expression data of the target node gene, the average expression data of the target node gene over the target samples, and calculating the ratio of each target sample's gene expression data to the average expression data to obtain the relative expression ratio of that target sample.
In this example, the relative expression ratio can be obtained as follows. First, the average expression of the selected target node gene over all target samples is calculated from all of its gene expression data; then the ratio between each target sample's gene expression data and this average is calculated, and this ratio is the relative expression ratio of that target sample. For example, suppose the selected target node gene is gene y, there are n target samples (target sample 1 to target sample n), and the expression values of gene y in these samples are $y_1$ to $y_n$. Then the average expression of gene y over all target samples is
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
The relative expression ratio of gene y in target sample i is
$$r_i = \frac{y_i}{\bar{y}}, \quad i = 1, \ldots, n$$
In this embodiment of the present application, optionally, before "determining the gene in the gene regulatory network corresponding to the SNP by using a preset annotation method and the SNP data", the method further includes: deduplicating the SNP data in the sample data of the different target samples to remove duplicate SNP data.
In this embodiment, before the third feature edges of the target heterogeneous graph are constructed, the SNP data of all target samples can be pooled and duplicate SNP entries removed, so that the remaining SNPs are pairwise distinct. This avoids repeating the same work when the third feature edges are constructed and makes their construction more efficient.
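Pooling and deduplicating SNPs across samples amounts to an order-preserving set union, after which annotation runs once per distinct SNP. The input layout and the SNP identifiers below are hypothetical.

```python
from typing import Dict, List

def deduplicate_snps(snp_calls: Dict[str, List[str]]) -> List[str]:
    """Pool the SNP ids of all samples and keep each distinct id once, in first-seen order."""
    seen = set()
    unique: List[str] = []
    for snps in snp_calls.values():
        for snp in snps:
            if snp not in seen:      # skip SNPs already collected from another sample
                seen.add(snp)
                unique.append(snp)
    return unique

print(deduplicate_snps({"A": ["rs1", "rs2"], "B": ["rs2", "rs3"]}))  # ['rs1', 'rs2', 'rs3']
```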
Further, as a specific implementation of the method in FIG. 1, an embodiment of the present application provides an apparatus for generating a survival time prediction model. As shown in FIG. 2, the apparatus includes:
a sample data acquisition module, configured to acquire sample data of target samples, where the sample data include SNP data, gene expression data and real survival time data corresponding to each target sample;
a gene regulatory network generation module, configured to acquire gene regulatory relationships between genes from a preset gene database and generate a gene regulatory network according to the gene regulatory relationships;
a heterogeneous graph construction module, configured to construct a target heterogeneous graph based on the sample data of the target samples and the gene regulatory network;
and a model construction module, configured to train an initial prediction model according to the target heterogeneous graph to obtain the survival time prediction model.
Optionally, the target samples include training samples and test samples, and the model construction module includes:
a node feature acquisition unit, configured to input the target heterogeneous graph into a feature recognition layer of an initial prediction model to obtain target sample node features, where the target sample node features include training sample node features;
a prediction data calculation unit, configured to input the training sample node features into a fully connected layer of the initial prediction model to obtain training sample survival prediction features, and identify the training sample survival prediction features through a preset activation function to obtain first survival time prediction data corresponding to each training sample;
a model loss value calculation unit, configured to calculate a model loss value through a preset cross-entropy function based on the first survival time prediction data and the real survival time data corresponding to the training samples;
a model adjustment unit, configured to adjust the model parameters of the initial prediction model according to the model loss value, obtain second survival time prediction data corresponding to each training sample through the fully connected layer of the adjusted initial prediction model and the preset activation function, and calculate the model loss value again;
and a model determination unit, configured to obtain the survival time prediction model when the model loss value is smaller than a preset loss threshold.
Optionally, the heterogeneous graph construction module includes:
a second feature edge construction unit, configured to construct a second feature edge of the target heterogeneous graph according to each target sample and the corresponding SNP, and determine the weight corresponding to the second feature edge based on the variation type of the SNP;
a third feature edge construction unit, configured to determine, by using a preset annotation method and the SNP data, the gene in the gene regulatory network corresponding to the SNP, construct a third feature edge of the target heterogeneous graph according to the SNP and the gene, and determine the weight corresponding to the third feature edge based on the positional relationship between the SNP and the gene;
a fourth feature edge construction unit, configured to determine any gene in the gene regulatory network as a target node gene, look up the gene expression data corresponding to the target node gene, and calculate the relative expression ratio of the gene expression data corresponding to each target sample; and, when the relative expression ratio is greater than or equal to a first expression threshold or less than or equal to a second expression threshold, construct a fourth feature edge of the target heterogeneous graph according to the target node gene and the target sample, with the relative expression ratio as the weight corresponding to the fourth feature edge.
Optionally, the fourth feature edge construction unit is configured to:
calculate, based on the gene expression data corresponding to the target node gene, the average expression data of the target node gene over the target samples, and calculate the ratio of the gene expression data corresponding to each target sample to the average expression data to obtain the relative expression ratio corresponding to each target sample.
Optionally, the apparatus further includes:
a deduplication module, configured to deduplicate the SNP data in the sample data of different target samples to remove duplicate SNP data, before the gene in the gene regulatory network corresponding to the SNP is determined by using the preset annotation method and the SNP data.
Optionally, the gene regulatory network generation module is configured to:
acquire, from a preset gene database, association data between the proteins corresponding to the genes; when the association data are greater than a preset association threshold, determine that a regulatory relationship exists between the genes and construct a first feature edge between them; and generate the gene regulatory network by taking the association data as the weight corresponding to the first feature edge.
Optionally, the apparatus further comprises:
a test module, configured to input the target heterogeneous graph into the feature recognition layer of the survival time prediction model after the survival time prediction model is obtained, to obtain target sample node features, where the target sample node features include test sample node features;
a prediction deviation calculation module, configured to calculate the survival time prediction deviation value of the survival time prediction model based on the test sample node features and the real survival time data corresponding to the test samples;
the test module is further configured to determine that the survival time prediction model has been trained successfully if the survival time prediction deviation value satisfies a preset deviation value condition.
It should be noted that, for other descriptions of the functional units of the apparatus for generating a survival time prediction model provided in the embodiments of the present application, reference may be made to the corresponding descriptions of the method in FIG. 1, which are not repeated here.
Based on the method shown in FIG. 1, the present application correspondingly further provides a storage medium storing a computer program which, when executed by a processor, implements the method for generating the survival time prediction model shown in FIG. 1.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server or a network device) to execute the method described in the implementation scenarios of the present application.
Based on the method shown in FIG. 1 and the virtual apparatus embodiment shown in FIG. 2, to achieve the above object, the present application further provides a computer device, which may specifically be a personal computer, a server, a network device or the like. The computer device includes a storage medium and a processor; the storage medium is configured to store a computer program, and the processor is configured to execute the computer program to implement the method for generating the survival time prediction model shown in FIG. 1.
Optionally, the computer device may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface and the like. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the computer device, which may include more or fewer components, combine some components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages and maintains the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication between the components within the storage medium and with other hardware and software in the physical device.
From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software together with a necessary general-purpose hardware platform, or by hardware. Sample data of the target samples, such as SNP data, gene expression data and real survival time data, are acquired; the gene regulatory relationships between different human genes are acquired from a preset gene database and a corresponding gene regulatory network is generated; a target heterogeneous graph is then constructed from the acquired sample data and the generated gene regulatory network; an initial prediction model is trained on the constructed target heterogeneous graph; and the survival time prediction model is finally obtained by iteratively optimizing the relevant parameters of the initial prediction model. By constructing a target heterogeneous graph containing patient, gene and SNP nodes and using a graph neural network to learn the neighbor information of each node in it, the embodiments of the present application capture the genetic information of the target samples more systematically, make full use of the relationships between different SNPs and different genes when predicting the survival time of cancer patients, and thereby improve the accuracy of that prediction.
Those skilled in the art will appreciate that the figures are merely schematic diagrams of a preferred implementation scenario, and the modules or processes in the figures are not necessarily required to implement the present application. Those skilled in the art will also appreciate that the modules of the apparatus in an implementation scenario may be distributed in the apparatus as described in that scenario, or may, with corresponding changes, be located in one or more apparatuses different from that of the present scenario. The modules of the implementation scenario may be combined into one module or further split into multiple sub-modules.
The serial numbers above are for description only and do not indicate that any implementation scenario is better or worse than another. The above discloses only a few specific implementation scenarios of the present application; the present application is not limited to them, and any variations that can be conceived by those skilled in the art shall fall within the scope of protection of the present application.

Claims (10)

1. A method for generating a survival time prediction model, comprising:
acquiring sample data of target samples, wherein the sample data comprise SNP data, gene expression data and real survival time data corresponding to each target sample;
acquiring gene regulatory relationships between genes from a preset gene database, and generating a gene regulatory network according to the gene regulatory relationships;
constructing a target heterogeneous graph based on the sample data of the target samples and the gene regulatory network;
and training an initial prediction model according to the target heterogeneous graph to obtain a survival time prediction model.
2. The method of claim 1, wherein the target samples comprise training samples and test samples, and training the initial prediction model according to the target heterogeneous graph to obtain the survival time prediction model comprises:
inputting the target heterogeneous graph into a feature recognition layer of an initial prediction model to obtain target sample node features, wherein the target sample node features comprise training sample node features;
inputting the training sample node features into a fully connected layer of the initial prediction model to obtain training sample survival prediction features, and identifying the training sample survival prediction features through a preset activation function to obtain first survival time prediction data corresponding to each training sample;
calculating a model loss value through a preset cross-entropy function based on the first survival time prediction data and the real survival time data corresponding to the training samples;
adjusting model parameters of the initial prediction model according to the model loss value, obtaining second survival time prediction data corresponding to each training sample through the fully connected layer of the adjusted initial prediction model and the preset activation function, and calculating the model loss value again;
and obtaining the survival time prediction model when the model loss value is smaller than a preset loss threshold.
3. The method of claim 1, wherein constructing the target heterogeneous graph based on the sample data of the target samples and the gene regulatory network comprises:
constructing a second feature edge of the target heterogeneous graph according to each target sample and the corresponding SNP, and determining the weight corresponding to the second feature edge based on the variation type of the SNP;
determining, by using a preset annotation method and the SNP data, a gene in the gene regulatory network corresponding to the SNP, constructing a third feature edge of the target heterogeneous graph according to the SNP and the gene, and determining the weight corresponding to the third feature edge based on the positional relationship between the SNP and the gene;
determining any gene in the gene regulatory network as a target node gene, searching for the gene expression data corresponding to the target node gene, and calculating the relative expression ratio of the gene expression data corresponding to each target sample; and, when the relative expression ratio is greater than or equal to a first expression threshold or less than or equal to a second expression threshold, constructing a fourth feature edge of the target heterogeneous graph according to the target node gene and the target sample, and taking the relative expression ratio as the weight corresponding to the fourth feature edge.
4. The method of claim 3, wherein calculating the relative expression ratio of the gene expression data corresponding to each of the target samples comprises:
calculating average expression data of the target node gene over the target samples based on the gene expression data corresponding to the target node gene, and calculating the ratio of the gene expression data corresponding to each target sample to the average expression data to obtain the relative expression ratio corresponding to each target sample.
5. The method of claim 3, wherein before determining the gene in the gene regulatory network corresponding to the SNP by using the preset annotation method and the SNP data, the method further comprises:
deduplicating the SNP data in the sample data of different target samples to remove duplicate SNP data.
6. The method of claim 1, wherein acquiring the gene regulatory relationships between genes from the preset gene database and generating the gene regulatory network according to the gene regulatory relationships comprises:
acquiring, from the preset gene database, association data between proteins corresponding to the genes; determining that a regulatory relationship exists between the genes when the association data are greater than a preset association threshold, and constructing a first feature edge between the genes; and generating the gene regulatory network by taking the association data as the weight corresponding to the first feature edge.
7. The method of claim 2, wherein after obtaining the life cycle prediction model, the method further comprises:
inputting the target heterogeneous graph into a feature recognition layer of the life cycle prediction model to obtain target sample node features, wherein the target sample node features comprise test sample node features;
calculating a life cycle prediction deviation value of the life cycle prediction model based on the test sample node features and the real life cycle data corresponding to the test samples; and
if the life cycle prediction deviation value satisfies a preset deviation condition, determining that the life cycle prediction model has been successfully trained.
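The claims define neither the deviation value nor the preset condition. The sketch below assumes, purely for illustration, that a hypothetical `predict` callable maps each test sample's node feature vector to a predicted life cycle, that the deviation is the mean absolute error against the real life cycle data, and that training counts as successful when that error does not exceed a chosen bound.

```python
from typing import Callable, Dict, Sequence


def prediction_deviation(
    test_features: Dict[str, Sequence[float]],
    true_life_cycle: Dict[str, float],
    predict: Callable[[Sequence[float]], float],
) -> float:
    """Mean absolute error between predicted and real life cycle over the test samples (assumed metric)."""
    if not true_life_cycle:
        raise ValueError("no test samples")
    errors = [abs(predict(test_features[s]) - y) for s, y in true_life_cycle.items()]
    return sum(errors) / len(errors)


def training_succeeded(deviation: float, max_deviation: float) -> bool:
    """Assumed 'preset deviation condition': the deviation must not exceed a chosen bound."""
    return deviation <= max_deviation
```

Other metrics (for example a concordance index, common in survival analysis) would slot into `prediction_deviation` without changing the overall check.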
8. An apparatus for generating a lifetime prediction model, comprising:
a sample data acquisition module, configured to acquire sample data of target samples, wherein the sample data comprise SNP data, gene expression data and real life cycle data corresponding to each target sample;
a gene regulatory network generation module, configured to acquire gene regulatory relationships between genes in a preset gene database and generate a gene regulatory network according to the gene regulatory relationships;
a heterogeneous graph construction module, configured to construct a target heterogeneous graph based on the sample data of the target samples and the gene regulatory network; and
a model construction module, configured to train an initial prediction model on the target heterogeneous graph to obtain the life cycle prediction model.
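The apparatus of claim 8 is a pipeline of four modules applied in sequence. One hypothetical way to express that wiring is to inject each module as a callable; the class and field names are illustrative, and the module internals (graph construction, model training) are deliberately left abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LifePredictionModelGenerator:
    """Hypothetical composition of the four modules of claim 8, each injected as a callable."""

    acquire_sample_data: Callable[[], Any]  # sample data acquisition module
    generate_network: Callable[[], Any]  # gene regulatory network generation module
    construct_graph: Callable[[Any, Any], Any]  # heterogeneous graph construction module
    train_model: Callable[[Any], Any]  # model construction module

    def run(self) -> Any:
        samples = self.acquire_sample_data()
        network = self.generate_network()
        graph = self.construct_graph(samples, network)
        return self.train_model(graph)
```

Injecting the modules keeps each one replaceable and testable in isolation, which mirrors the claim's module-by-module structure.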
9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 7.
10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
CN202111087695.8A 2021-09-16 2021-09-16 Method and device for generating lifetime prediction model and storage medium Active CN113782092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111087695.8A CN113782092B (en) 2021-09-16 2021-09-16 Method and device for generating lifetime prediction model and storage medium

Publications (2)

Publication Number Publication Date
CN113782092A (en) 2021-12-10
CN113782092B (en) 2023-06-02

Family

ID=78851582

Country Status (1)

Country Link
CN (1) CN113782092B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197432A (en) * 2017-11-29 2018-06-22 Northeast Electric Power University Gene regulatory network reconstruction method based on gene expression data
CN109523415A (en) * 2018-11-14 2019-03-26 Nanjing University of Posts and Telecommunications Heterogeneous information network relationship prediction device
CN110188263A (en) * 2019-05-29 2019-08-30 State Grid Shandong Electric Power Company Electric Power Research Institute Scientific research hotspot prediction method and system oriented to heterogeneous time intervals
CN112201346A (en) * 2020-10-12 2021-01-08 Harbin Institute of Technology (Shenzhen) Cancer survival prediction method, apparatus, computing device and computer-readable storage medium
CN112259180A (en) * 2020-10-21 2021-01-22 Ping An Technology (Shenzhen) Co., Ltd. Disease prediction method based on heterogeneous medical knowledge graph and related equipment

Non-Patent Citations (2)

Title
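Liu Wenbin (刘文斌) et al.: "Study on a drug recommendation method based on personalized network biomarkers", Journal of Electronics & Information Technology (《电子与信息学报》), vol. 42, no. 6, pages 1340 - 1347 *
Qin Guimin (覃桂敏) et al.: "Cancer marker prediction method in gene regulatory networks", Journal of Xidian University (《西安电子科技大学学报》), vol. 46, no. 6, pages 81 - 87 *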

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116994652A (en) * 2023-09-22 2023-11-03 Suzhou Metabrain Intelligent Technology Co., Ltd. Information prediction method and device based on neural network and electronic equipment
CN116994652B (en) * 2023-09-22 2024-02-02 Suzhou Metabrain Intelligent Technology Co., Ltd. Information prediction method and device based on neural network and electronic equipment

Similar Documents

Publication Publication Date Title
Benidt et al. SimSeq: a nonparametric approach to simulation of RNA-sequence datasets
Weiß et al. nQuire: a statistical framework for ploidy estimation using next generation sequencing
Vlasblom et al. Markov clustering versus affinity propagation for the partitioning of protein interaction graphs
Simpson Exploring genome characteristics and sequence quality without a reference
CN112365171B (en) Knowledge graph-based risk prediction method, device, equipment and storage medium
JP2012094143A (en) Apparatus and method for extracting biomarker
US20150066378A1 (en) Identifying Possible Disease-Causing Genetic Variants by Machine Learning Classification
Topa et al. Gaussian process test for high-throughput sequencing time series: application to experimental evolution
CN110827924B (en) Clustering method and device for gene expression data, computer equipment and storage medium
Shaw et al. Theory of local k-mer selection with applications to long-read alignment
CN113517022B (en) Gene detection method, feature extraction method, device, equipment and system
CN108805174A (en) clustering method and device
KR20220069943A (en) Single-cell RNA-SEQ data processing
CN114496099A (en) Cell function annotation method, device, equipment and medium
CN113488104A (en) Cancer driver gene prediction method and system based on local and global network centrality analysis
Chen et al. Improved interpretability of machine learning model using unsupervised clustering: predicting time to first treatment in chronic lymphocytic leukemia
CN115272797A (en) Training method, using method, device, equipment and storage medium of classifier
CN113782092B (en) Method and device for generating lifetime prediction model and storage medium
Xiong et al. Chord: an ensemble machine learning algorithm to identify doublets in single-cell RNA sequencing data
Srivastava et al. NetSeekR: a network analysis pipeline for RNA-Seq time series data
CN113782093A (en) Method and device for acquiring gene expression filling data and storage medium
Mccallum et al. Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions
CN116525108A (en) SNP data-based prediction method, device, equipment and storage medium
CN115831219A (en) Quality prediction method, device, equipment and storage medium
CN113780445B (en) Method and device for generating cancer subtype classification prediction model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant