CN112802546B - Biological state characterization method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112802546B
Authority
CN
China
Prior art keywords
gene
network
path
score
determining
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011596779.XA
Other languages
Chinese (zh)
Other versions
CN112802546A (en)
Inventor
王升启 (Wang Shengqi)
张孝昌 (Zhang Xiaochang)
杨骞 (Yang Qian)
周喆 (Zhou Zhe)
康其传 (Kang Qichuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Military Medical Sciences AMMS of PLA
Original Assignee
Academy of Military Medical Sciences AMMS of PLA
Application filed by Academy of Military Medical Sciences AMMS of PLA
Priority to CN202011596779.XA
Publication of CN112802546A
Application granted
Publication of CN112802546B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Abstract

The application provides a biological state characterization method, device, equipment and storage medium, and relates to the technical field of bioinformatics. A differential expression gene set is determined according to a gene list to be tested and a preset reference gene list; a gene network is obtained from a preset genome database, the gene network comprising a plurality of pathway networks, each of which includes an association pathway between at least two genes; a pathway fingerprint score is determined for each pathway network according to the connection relationships of the differentially expressed genes of the set within that pathway network; a target pathway network is determined according to the pathway fingerprint score of each pathway network; and the differentially expressed genes on the target pathway network are determined as gene markers, which indicate genes at risk of disease. Applying the embodiments of the application can improve the accuracy of disease analysis.

Description

Biological state characterization method, device, equipment and storage medium
Technical Field
The application relates to the technical field of bioinformatics, and in particular to a biological state characterization method, device, equipment and storage medium.
Background
Cancer is a major disease that threatens human life and health, and the incidence and mortality rates of the disease on a global scale are rising year by year. Biologists can identify gene markers in gene sequences by gene enrichment methods, and can determine genes at risk of developing disease (cancer) based on the identified gene markers, facilitating analysis of cancer.
Currently, a target pathway is mainly obtained by counting the differentially expressed genes contained in each pathway network; gene markers are then determined from the target pathway, and genes at risk of disease are identified from those gene markers.
However, when determining gene markers using the prior art, the topological connection of differentially expressed genes in the pathway network is not considered, which may reduce the accuracy of disease analysis.
Disclosure of Invention
The application aims to overcome the defects of the prior art by providing a biological state characterization method, device, equipment and storage medium that can improve the accuracy of disease analysis.
In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:
In a first aspect, embodiments of the present application provide a biological state characterization method, the method comprising:
determining a differential expression gene set according to a gene list to be tested and a preset reference gene list;
obtaining a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an association pathway between at least two genes;
determining a pathway fingerprint score for each pathway network according to the connection relationships of the differentially expressed genes of the set within that pathway network;
determining a target pathway network according to the pathway fingerprint score of each pathway network; and
determining the differentially expressed genes on the target pathway network as gene markers, the gene markers being used to indicate genes at risk of disease.
Optionally, each pathway network characterizes a different cause of disease; the method further comprises:
inputting the pathway fingerprint score of each pathway network into a pre-trained gene analysis model to obtain a disease risk value for the gene list to be tested, the gene analysis model being trained on the pathway fingerprint scores of each pathway network computed for sample gene lists.
Optionally, the method further comprises:
obtaining a base score for each pathway network according to the number of differentially expressed genes in that pathway network;
obtaining a target pathway fingerprint score for each pathway network according to the base score and the pathway fingerprint score;
the determining a target pathway network according to the pathway fingerprint score of each pathway network comprises:
determining the target pathway network according to the target pathway fingerprint score of each pathway network.
Optionally, the connection relationships comprise a direct connection relationship and an indirect connection relationship, the gene score for a direct connection being greater than the gene score for an indirect connection;
the determining a pathway fingerprint score for each pathway network according to the connection relationships of the differentially expressed genes within that pathway network comprises:
determining the gene scores for direct connections in each pathway network according to the direct connection relationships of the differentially expressed genes within that pathway network;
determining the gene scores for indirect connections in each pathway network according to the indirect connection relationships of the differentially expressed genes within that pathway network; and
determining the pathway fingerprint score of each pathway network according to its direct-connection gene scores and its indirect-connection gene scores.
Optionally, the determining the differential expression gene set according to the gene list to be tested and the preset reference gene list includes:
obtaining, by high-throughput sequencing, gene expression profiles corresponding to the gene list to be tested and to the preset reference gene list respectively; and
obtaining the differential expression gene set according to the expression amounts of the genes in the two gene expression profiles.
Optionally, the determining a target pathway network according to the pathway fingerprint score of each pathway network comprises:
determining the target pathway network according to the pathway fingerprint score of each pathway network and a preset pathway fingerprint score threshold.
Optionally, the method further comprises:
displaying a score table of the pathway fingerprint scores of the pathway networks, the score table showing the name and the pathway fingerprint score of each pathway network.
In a second aspect, embodiments of the present application also provide a biological state characterization device, the device including:
a first determining module, configured to determine a differential expression gene set according to a gene list to be tested and a preset reference gene list;
an acquisition module, configured to obtain a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an association pathway between at least two genes;
a second determining module, configured to determine a pathway fingerprint score for each pathway network according to the connection relationships of the differentially expressed genes of the set within that pathway network;
a third determining module, configured to determine a target pathway network according to the pathway fingerprint score of each pathway network; and
a fourth determining module, configured to determine the differentially expressed genes on the target pathway network as gene markers, the gene markers being used to indicate genes at risk of disease.
Optionally, each pathway network characterizes a different cause of disease; the device further comprises:
an input module, configured to input the pathway fingerprint score of each pathway network into a pre-trained gene analysis model to obtain a disease risk value for the gene list to be tested, the gene analysis model being trained on the pathway fingerprint scores of each pathway network computed for sample gene lists.
Optionally, the third determining module is further configured to obtain a base score for each pathway network according to the number of differentially expressed genes in that pathway network; obtain a target pathway fingerprint score for each pathway network according to the base score and the pathway fingerprint score; and determine the target pathway network according to the target pathway fingerprint score of each pathway network.
Optionally, the connection relationships comprise a direct connection relationship and an indirect connection relationship, the gene score for a direct connection being greater than the gene score for an indirect connection;
correspondingly, the second determining module is specifically configured to determine the gene scores for direct connections in each pathway network according to the direct connection relationships of the differentially expressed genes within that pathway network; determine the gene scores for indirect connections in each pathway network according to the indirect connection relationships of the differentially expressed genes within that pathway network; and determine the pathway fingerprint score of each pathway network according to its direct-connection and indirect-connection gene scores.
Optionally, the first determining module is specifically configured to obtain, by high-throughput sequencing, gene expression profiles corresponding to the gene list to be tested and to the preset reference gene list respectively, and to obtain the differential expression gene set according to the expression amounts of the genes in the two gene expression profiles.
Optionally, the third determining module is specifically configured to determine the target pathway network according to the pathway fingerprint score of each pathway network and a preset pathway fingerprint score threshold.
Optionally, the device further comprises a display module, configured to display a score table of the pathway fingerprint scores of the pathway networks, the score table showing the name and the pathway fingerprint score of each pathway network.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the biological state characterization method of the first aspect described above.
In a fourth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the biological state characterization method of the first aspect described above.
The beneficial effects of the application are as follows:
The embodiments of the application provide a biological state characterization method, device, equipment and storage medium. The method may comprise: determining a differential expression gene set according to the gene list to be tested and a preset reference gene list; obtaining a gene network from a preset genome database, the gene network comprising a plurality of pathway networks, each including an association pathway between at least two genes; determining a pathway fingerprint score for each pathway network according to the connection relationships of the differentially expressed genes within that pathway network; determining a target pathway network according to the pathway fingerprint score of each pathway network; and determining the differentially expressed genes on the target pathway network as gene markers, which indicate genes at risk of disease. Because the pathway fingerprint score of each pathway network is computed from the connection relationships of the differentially expressed genes within it, the method takes the topology of those genes in the pathway network into account when determining the target pathway network, which yields more accurate gene markers and improves the accuracy of disease analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a biological state characterization method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another biological state characterization method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another biological state characterization method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a biological state characterization method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a biological state characterization device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
FIG. 1 is a schematic flow chart of a biological state characterization method according to an embodiment of the present application. As shown in FIG. 1, the method may include:
S101, determining a differential expression gene set according to a gene list to be tested and a preset reference gene list.
The gene list to be tested may comprise tens of thousands of genes, and the preset reference gene list contains the same number of genes. For disease analysis, the genes in the gene list to be tested are acquired from the non-health information of a single subject under test (an abnormal subject), and the genes in the preset reference gene list are acquired from the health information of a single normal subject. For a biological mechanism study, the genes in the gene list to be tested are obtained from the non-health information of multiple subjects under test, and the genes in the preset reference gene list are obtained from the health information of multiple normal subjects. A biological mechanism study here refers to the case where the disease is unknown; the genes causing it can then be identified by the biological state characterization method of the application.
Whether for disease analysis or for a biological mechanism study, the differentially expressed genes must be obtained; together they form the differential expression gene set. Specifically, each gene in the gene list to be tested is compared with the corresponding gene in the preset reference gene list, and the genes whose comparison result exceeds a preset difference threshold are taken as differentially expressed genes.
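The threshold comparison in S101 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented procedure: each gene list is assumed to be a dict of expression amounts, the "comparison result" is assumed to be the absolute log2 fold change with a pseudocount, and the name `differential_genes` and its default `threshold=1.0` are hypothetical, since the source only refers to a preset difference threshold.

```python
import math


def differential_genes(test_expr, ref_expr, threshold=1.0, pseudocount=1.0):
    """Return the genes whose |log2 fold change| exceeds `threshold`.

    test_expr, ref_expr: dicts mapping gene name -> expression amount.
    Assumption: the comparison result is the log2 fold change with a
    pseudocount; the source only specifies a preset difference threshold.
    """
    de_genes = set()
    for gene, test_value in test_expr.items():
        if gene not in ref_expr:
            continue  # only genes present in both lists can be compared
        ratio = (test_value + pseudocount) / (ref_expr[gene] + pseudocount)
        if abs(math.log2(ratio)) > threshold:
            de_genes.add(gene)
    return de_genes


# Toy expression amounts for hypothetical genes g1..g4.
test_list = {"g1": 31.0, "g2": 7.0, "g3": 100.0, "g4": 3.0}
reference_list = {"g1": 7.0, "g2": 7.0, "g3": 99.0, "g4": 15.0}
print(sorted(differential_genes(test_list, reference_list)))  # ['g1', 'g4']
```

Both up-regulated (g1) and down-regulated (g4) genes pass because the absolute value of the fold change is tested.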
S102, obtaining a gene network from a preset genome database.
The gene network comprises a plurality of pathway networks, each pathway network including an association pathway between at least two genes. The genome database may specifically be KEGG (Kyoto Encyclopedia of Genes and Genomes), which treats genes and expression information as a whole network and integrates genomic, chemical, and biochemical-system data, covering metabolic pathways, drugs, diseases, gene sequences, genomes, and so on.
That is, the genome database includes a gene network composed of a plurality of nodes and the regulatory information between them, where the nodes represent genes and the regulatory information characterizes the connection relationships between nodes. The gene network may include a plurality of pathway networks, each characterizing one cause of disease. A pathway network may contain genes that are linked to one another as well as genes that are not linked to any other gene; the latter may be referred to as noise genes.
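A pathway network of this kind can be held as an adjacency map, which also makes the noise genes easy to identify. The sketch below is illustrative only: it treats regulatory links as undirected (KEGG relations are generally directional, so this is a simplifying assumption), and `build_pathway` and `noise_genes` are hypothetical helper names, not part of any KEGG interface.

```python
def build_pathway(genes, edges):
    """Build an undirected adjacency map for one pathway network.

    genes: all member genes; edges: (gene_a, gene_b) regulatory links.
    Treating links as undirected is a simplifying assumption.
    """
    adjacency = {gene: set() for gene in genes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def noise_genes(adjacency):
    """Genes with no link to any other gene in the pathway network."""
    return {gene for gene, neighbours in adjacency.items() if not neighbours}


# Hypothetical pathway: g1-g2-g3 form a chain, g4 is unlinked.
pathway = build_pathway(["g1", "g2", "g3", "g4"], [("g1", "g2"), ("g2", "g3")])
print(sorted(noise_genes(pathway)))  # ['g4']
```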
S103, determining the pathway fingerprint score of each pathway network according to the connection relationships, within that pathway network, of the differentially expressed genes in the differential expression gene set.
Each differentially expressed gene in the set is matched against the genes of each pathway network to determine whether the pathway network contains any differentially expressed genes. If a pathway network contains none, its pathway fingerprint score is 0. If it does, the connection relationships among the contained differentially expressed genes are examined: each differentially expressed gene that is directly connected to another is given a class-A score, and each one that is only indirectly connected is given a class-B score, the class-A score being higher than the class-B score. Finally, the scores of the differentially expressed genes in each pathway network are summed, and the pathway fingerprint score of each pathway network is determined from this sum.
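A minimal Python sketch of S103 follows, under stated assumptions. The source does not define "direct" and "indirect" precisely, so here a differentially expressed gene counts as directly connected if one of its neighbours is another differentially expressed gene, and as indirectly connected if it reaches another one only through intermediate genes; an isolated one scores 0. The constants `SCORE_A = 2.0` and `SCORE_B = 1.0` are hypothetical placeholders that only respect the stated ordering (class A above class B), and the pathway network is assumed to be an undirected adjacency map.

```python
from collections import deque

SCORE_A = 2.0  # hypothetical class-A score (direct connection)
SCORE_B = 1.0  # hypothetical class-B score (indirect connection); must be < SCORE_A


def reachable(adjacency, start):
    """All genes reachable from `start` by breadth-first search, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


def fingerprint_score(adjacency, de_genes, score_a=SCORE_A, score_b=SCORE_B):
    """Pathway fingerprint score of one pathway network (adjacency map)."""
    present = set(adjacency) & set(de_genes)
    if not present:
        return 0.0  # no differentially expressed gene in this pathway network
    total = 0.0
    for gene in present:
        others = present - {gene}
        if adjacency[gene] & others:
            total += score_a  # directly linked to another DE gene
        elif reachable(adjacency, gene) & others:
            total += score_b  # linked to another DE gene only via intermediates
        # an isolated DE gene contributes nothing
    return total


# Hypothetical pathway: chain g1-g2-g3-g4 plus an unlinked g5.
pathway = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2", "g4"}, "g4": {"g3"}, "g5": set()}
print(fingerprint_score(pathway, {"g1", "g2", "g4", "g5"}))  # 5.0
```

In the usage line, g1 and g2 are adjacent (class A each), g4 reaches them only through g3 (class B), and g5 is a noise gene, giving 2 + 2 + 1 + 0.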
S104, determining a target pathway network according to the pathway fingerprint score of each pathway network.
In this embodiment, the gene network in the genome database (KEGG) contains 330 pathway networks. The 330 pathway networks may be ranked by pathway fingerprint score from highest to lowest; the top-ranked pathway network (the one with the highest score) may be taken as the target pathway network, or the top n pathway networks may all be taken as target pathway networks. That is, the application does not limit the number of target pathway networks.
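The ranking and selection in S104 might be sketched as below. The function name, the `top_n` parameter and the optional `min_score` cutoff (standing in for the preset pathway fingerprint score threshold mentioned among the optional features) are assumptions for illustration.

```python
def select_target_pathways(scores, top_n=1, min_score=None):
    """Rank pathway networks by fingerprint score and keep the top `top_n`.

    scores: dict mapping pathway name -> pathway fingerprint score.
    min_score: optional cutoff standing in for a preset score threshold.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(name, score) for name, score in ranked if score >= min_score]
    return [name for name, _ in ranked[:top_n]]


scores = {"pathway_1": 5.0, "pathway_2": 8.0, "pathway_3": 2.0}
print(select_target_pathways(scores))  # ['pathway_2']
print(select_target_pathways(scores, top_n=3, min_score=3.0))  # ['pathway_2', 'pathway_1']
```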
S105, determining the differential expression genes on the target pathway network as gene markers, wherein the gene markers are used for indicating genes at risk of diseases.
After the target pathway network is determined, the differentially expressed genes it contains may be extracted and used as gene markers. In disease diagnosis, the differentially expressed genes serving as gene markers are compared with the gene markers of each known disease, and a disease with a high matching rate is taken as the target disease (biological state); that is, these genes correspond to genes at risk of that known disease. In a biological mechanism study, the differentially expressed genes serving as gene markers correspond to genes at risk of the unknown disease.
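The comparison with known diseases is not specified in detail. One plausible reading, assumed here, is to define the matching rate as the fraction of a known disease's marker genes that also appear among the extracted gene markers, and to rank diseases by it. `match_diseases` and the disease and gene names in the usage line are hypothetical.

```python
def match_diseases(markers, known_markers):
    """Rank known diseases by matching rate, highest first.

    Assumption: matching rate = share of a disease's known marker genes
    that also appear among the extracted gene markers.
    """
    rates = {
        disease: len(markers & disease_markers) / len(disease_markers)
        for disease, disease_markers in known_markers.items()
        if disease_markers  # skip diseases with no recorded markers
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


markers = {"g1", "g2", "g4"}
known = {"disease_x": {"g1", "g2"}, "disease_y": {"g4", "g7", "g8", "g9"}}
print(match_diseases(markers, known))  # [('disease_x', 1.0), ('disease_y', 0.25)]
```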
In summary, after the differential expression gene set is obtained, the biological state characterization method of the application determines the pathway fingerprint score of each pathway network from the connection relationships of the differentially expressed genes within it. The topology of the differentially expressed genes in the pathway network is therefore taken into account when determining the target pathway network, which yields more accurate gene markers and improves the accuracy of disease analysis.
Optionally, each pathway network characterizes a different cause of disease, and the method further comprises: inputting the pathway fingerprint score of each pathway network into a pre-trained gene analysis model to obtain a disease risk value for the gene list to be tested, where the gene analysis model is trained on the pathway fingerprint scores of each pathway network computed for sample gene lists.
To train the gene analysis model, a plurality of training samples are input into an initial gene analysis model. The features of each training sample are the pathway fingerprint scores of the pathway networks, computed from a sample gene list constructed by experts, and the label is the disease category. Training ends when a training stop condition is met, yielding the gene analysis model. The pathway fingerprint scores computed for the gene list to be tested are then input into the gene analysis model, which can output a probability for each disease; the higher the probability, the higher the risk value of that disease for the gene list to be tested. Analyzing disease with machine learning in this way can improve the accuracy of the analysis results.
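The source does not name the model type. As a stand-in, the sketch below uses a nearest-centroid classifier with a softmax over negative squared distances, which is simply one way to map pathway fingerprint score vectors to per-disease probabilities; it is not the patent's model, `CentroidDiseaseModel` is a hypothetical name, and any probabilistic classifier could take its place.

```python
import math


class CentroidDiseaseModel:
    """Hypothetical stand-in for the gene analysis model.

    Features: one pathway fingerprint score per pathway network.
    Labels: disease categories. Not the patent's model.
    """

    def fit(self, samples, labels):
        sums, counts = {}, {}
        for vector, label in zip(samples, labels):
            acc = sums.setdefault(label, [0.0] * len(vector))
            for i, value in enumerate(vector):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # one mean fingerprint-score vector (centroid) per disease category
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict_proba(self, vector):
        """Softmax over negative squared distances to each centroid."""
        neg_dist = {
            label: -sum((a - b) ** 2 for a, b in zip(vector, centroid))
            for label, centroid in self.centroids.items()
        }
        peak = max(neg_dist.values())  # subtract the max for numerical stability
        weights = {label: math.exp(d - peak) for label, d in neg_dist.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}


# Toy training data: two pathway fingerprint scores per sample.
model = CentroidDiseaseModel().fit(
    [[5.0, 0.0], [6.0, 1.0], [0.0, 5.0], [1.0, 6.0]],
    ["disease_x", "disease_x", "disease_y", "disease_y"],
)
probs = model.predict_proba([5.0, 1.0])
print(max(probs, key=probs.get))  # disease_x
```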
FIG. 2 is a schematic flow chart of another biological state characterization method according to an embodiment of the present application. As shown in FIG. 2, the method further includes:
S201, obtaining a base score for each pathway network according to the number of differentially expressed genes in that pathway network.
Each differentially expressed gene in the set is matched against the genes of each pathway network, the number of differentially expressed genes contained in each pathway network is counted, and the count is taken as the base score of that pathway network.
For example, if pathway network 1 contains 3 differentially expressed genes, its base score is 3; if pathway network 2 contains 6, its base score is 6; and likewise for the other pathway networks.
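The base score of S201 is a count of overlapping genes. A short sketch, assuming each pathway network is given as a set of gene names (`base_scores` is a hypothetical helper and the gene names are placeholders), reproduces the example above:

```python
def base_scores(pathway_genes, de_genes):
    """Base score of each pathway network = number of DE genes it contains."""
    return {name: len(genes & de_genes) for name, genes in pathway_genes.items()}


de = {"g1", "g2", "g3", "g4", "g5", "g6"}
pathways = {
    "pathway_1": {"g1", "g2", "g3", "g10"},
    "pathway_2": {"g1", "g2", "g3", "g4", "g5", "g6", "g11"},
}
print(base_scores(pathways, de))  # {'pathway_1': 3, 'pathway_2': 6}
```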
S202, obtaining a target pathway fingerprint score for each pathway network according to its base score and its pathway fingerprint score.
S203, determining a target pathway network according to the target pathway fingerprint score of each pathway network.
The base score and the pathway fingerprint score of each pathway network are added, and the sum is taken as the target pathway fingerprint score of that pathway network. The target pathway network is then selected according to the target pathway fingerprint scores and a preset rule: for example, the pathway network with the highest target pathway fingerprint score, or the top n pathway networks when ranked by target pathway fingerprint score in descending order. That is, there may be one or more target pathway networks, and the application does not limit this.
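For clarity, S202 and S203 can be written as follows, using notation introduced here only for illustration and not defined in the source: $B_p$ is the base score of pathway network $p$, $F_p$ its pathway fingerprint score, $T_p$ its target pathway fingerprint score, and $P$ the number of pathway networks.

```latex
T_p = B_p + F_p, \quad p = 1, \dots, P; \qquad
p^{*} = \operatorname*{arg\,max}_{p} \, T_p \quad (n = 1)
```

For $n > 1$, the target pathway networks are the $n$ pathway networks with the largest $T_p$.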
FIG. 3 is a schematic flow chart of another biological state characterization method according to an embodiment of the present application. As shown in FIG. 3, optionally, the connection relationships include a direct connection relationship and an indirect connection relationship, and the gene score for a direct connection is greater than the gene score for an indirect connection. Determining the pathway fingerprint score of each pathway network according to the connection relationships of the differentially expressed genes within it comprises the following steps:
S301, determining the gene scores for direct connections in each pathway network according to the direct connection relationships of the differentially expressed genes within that pathway network.
S302, determining the gene scores for indirect connections in each pathway network according to the indirect connection relationships of the differentially expressed genes within that pathway network.
The differentially expressed genes in a pathway network can be connected in two ways: directly or indirectly. Differentially expressed genes that are directly connected may be given a class-A score, and those that are indirectly connected may be given a class-B score, the class-A score being greater than the class-B score.
S303, determining the path fingerprint score corresponding to each path network according to the gene score with the direct connection relation in each path network and the gene score with the indirect connection relation in each path network.
For one pathway network, the score of each of the differentially expressed genes having a direct connection relationship and the score of each of the differentially expressed genes having an indirect connection relationship contained in the pathway network are counted, and a total gene score is obtained based on the score of each of the differentially expressed genes contained in the pathway network. And adding the total gene score and the number of the differentially expressed genes contained in the pathway network to obtain the pathway fingerprint score corresponding to the pathway network. It should be noted that, the obtaining of the path fingerprint scores corresponding to other path networks is similar, and details are not repeated here, and reference may be made to the above description.
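One possible realization of S301 to S303 is sketched below. The source does not fix the graph semantics or the score values, so the following points are assumptions:

- Each pathway network is an undirected graph, given as a symmetric adjacency dictionary with every gene as a key.
- A differentially expressed gene counts as directly connected if it is adjacent to another differentially expressed gene.
- A differentially expressed gene counts as indirectly connected if it is not adjacent to another differentially expressed gene but reaches one through one or more genes that are not differentially expressed.
- The class A and class B values (2.0 and 1.0) are hypothetical.
- A gene with neither kind of connection scores 0.

Following the description above, the pathway fingerprint score is the total gene score plus the number of differentially expressed genes in the pathway network.

```python
from collections import deque
from typing import Dict, Set

SCORE_DIRECT = 2.0    # hypothetical class A score
SCORE_INDIRECT = 1.0  # hypothetical class B score; must be smaller than class A


def reaches_other_deg(start: str, degs: Set[str], graph: Dict[str, Set[str]]) -> bool:
    """Breadth-first search from start that expands only non-DEG genes."""
    seen = {start}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for neighbour in graph.get(gene, set()):
            if neighbour in seen:
                continue
            if neighbour in degs:
                return True  # another DEG reached without passing through a DEG
            seen.add(neighbour)
            queue.append(neighbour)
    return False


def fingerprint_score(degs: Set[str], graph: Dict[str, Set[str]]) -> float:
    """Total gene score (direct > indirect) plus the number of DEGs in the pathway network."""
    hits = degs & set(graph)  # DEGs present in this pathway network
    total = 0.0
    for gene in hits:
        if graph[gene] & (hits - {gene}):
            total += SCORE_DIRECT  # S301: adjacent to another DEG
        elif reaches_other_deg(gene, hits, graph):
            total += SCORE_INDIRECT  # S302: linked only through non-DEG genes
    return total + len(hits)  # S303


if __name__ == "__main__":
    # Hypothetical pathway A-B-X-C plus an isolated gene D; X is not differentially expressed.
    graph = {"A": {"B"}, "B": {"A", "X"}, "X": {"B", "C"}, "C": {"X"}, "D": set()}
    print(fingerprint_score({"A", "B", "C", "D"}, graph))  # 9.0
```

In the example, A and B score as direct (2 each), C as indirect via X (1), and D scores 0. The total gene score of 5 plus 4 differentially expressed genes gives 9.0.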
Fig. 4 is a flow chart of another biological state characterization method according to an embodiment of the present application. As shown in Fig. 4, determining the differentially expressed gene set according to the gene list to be tested and the preset reference gene list optionally includes:
S401, obtaining, by high-throughput sequencing, the gene expression profiles corresponding to the gene list to be tested and to the preset reference gene list respectively.
S402, obtaining the differentially expressed gene set according to the expression levels of the genes in the gene expression profiles corresponding to the gene list to be tested and to the preset reference gene list respectively.
High-throughput sequencing, also called next-generation sequencing, can sequence hundreds of thousands to millions of DNA molecules in parallel in a single run. The gene expression profile of the gene list to be tested can be obtained by high-throughput sequencing of a test sample collected from the case. The gene expression profile of the preset reference gene list can either be measured at the time of testing or be retrieved directly from a gene database. A gene expression profile contains the expression level of each gene. The expression level of each gene in the profile of the gene list to be tested is compared with the expression level of the same gene in the profile of the preset reference gene list. If the change in expression exceeds a preset threshold, the gene is a differentially expressed gene. All differentially expressed genes together form the differentially expressed gene set.
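The comparison step can be sketched as follows. The text says only that a gene is differentially expressed when its change in expression exceeds a preset threshold. The following choices are assumptions made for illustration:

- The change is measured as the absolute log2 fold change with a pseudocount of 1.
- The threshold is 1.0, which corresponds to a two-fold change.
- The gene names are hypothetical.

Practical pipelines usually also apply a statistical test across replicates, which this sketch omits.

```python
import math
from typing import Dict, Set


def differential_genes(test: Dict[str, float], reference: Dict[str, float],
                       threshold: float = 1.0, pseudocount: float = 1.0) -> Set[str]:
    """Genes whose |log2 fold change| between the test and reference profiles exceeds threshold."""
    degs = set()
    for gene in test.keys() & reference.keys():  # only genes measured in both profiles
        log2_fc = math.log2((test[gene] + pseudocount) / (reference[gene] + pseudocount))
        if abs(log2_fc) > threshold:
            degs.add(gene)
    return degs


if __name__ == "__main__":
    test_profile = {"G1": 31.0, "G2": 7.0, "G3": 100.0}  # hypothetical expression levels
    reference_profile = {"G1": 7.0, "G2": 31.0, "G3": 99.0}
    print(sorted(differential_genes(test_profile, reference_profile)))  # ['G1', 'G2']
```

G1 is up four-fold and G2 down four-fold (log2 fold change of +2 and -2), so both exceed the threshold. G3 barely changes.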
Optionally, determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network includes: determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network and a preset pathway fingerprint score threshold.
The preset pathway fingerprint score threshold can be set from practical experience, and the present application does not limit its value. The pathway fingerprint score of each pathway network is compared with the preset threshold. Any pathway network whose pathway fingerprint score is greater than the threshold is taken as a target pathway network.
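The threshold rule is sketched below. The sketch assumes the pathway fingerprint scores are held in a dictionary keyed by pathway network name. The function name and the example threshold are hypothetical. The comparison is strict, matching "greater than" in the text.

```python
from typing import Dict, List


def targets_above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every pathway network whose fingerprint score is strictly greater than threshold."""
    return [name for name, score in scores.items() if score > threshold]


if __name__ == "__main__":
    scores = {"pathway_1": 9.0, "pathway_2": 4.0, "pathway_3": 6.5}  # hypothetical scores
    print(targets_above_threshold(scores, threshold=6.0))  # ['pathway_1', 'pathway_3']
```

The top-n rule always returns some pathway network. This rule, by contrast, returns an empty list when no pathway network clears the threshold.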
Optionally, the method may further include displaying a score table of the pathway fingerprint scores corresponding to each pathway network. The score table shows the name of each pathway network and the pathway fingerprint score of each pathway network.
The score table may be displayed automatically or after a trigger instruction from a user is received, and the present application does not limit this. In the score table, the pathway networks may be listed by name in descending order of pathway fingerprint score. This lets the user see more intuitively how strongly each pathway network is associated with the gene list to be tested.
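The score table could be rendered as plain text along the following lines. The two-column layout, the column headings and the two-decimal formatting are assumptions. The text only requires the pathway network names and their scores, ordered from highest to lowest score.

```python
from typing import Dict


def format_score_table(scores: Dict[str, float]) -> str:
    """Plain-text table of pathway networks sorted by pathway fingerprint score, highest first."""
    header = "Pathway network"
    width = max([len(header)] + [len(name) for name in scores])
    rows = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{header:<{width}}  Fingerprint score"]
    lines += [f"{name:<{width}}  {score:.2f}" for name, score in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_score_table({"pathway_1": 4.0, "pathway_2": 9.0, "pathway_3": 6.5}))
```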
Fig. 5 is a schematic structural diagram of a biological state characterization device according to an embodiment of the present application. As shown in Fig. 5, the apparatus may include:
a first determining module 501, configured to determine a differentially expressed gene set according to a gene list to be tested and a preset reference gene list;
an obtaining module 502, configured to obtain a gene network from a preset genome database;
a second determining module 503, configured to determine the pathway fingerprint score corresponding to each pathway network according to the connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network;
a third determining module 504, configured to determine a target pathway network according to the pathway fingerprint score corresponding to each pathway network; and
a fourth determining module 505, configured to determine the differentially expressed genes on the target pathway network as gene markers, where the gene markers indicate genes associated with disease risk.
Optionally, each gene network is used to characterize a different cause of a disease, and the apparatus further includes:
an input module, configured to input the pathway fingerprint score of each pathway network into a pre-trained gene analysis model to obtain a disease risk value for the gene list to be tested. The gene analysis model is trained on the pathway fingerprint scores of the pathway networks corresponding to sample gene lists.
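The text does not say what kind of model the gene analysis model is. Purely as an illustration, the sketch below uses logistic regression trained by batch gradient descent. Each pathway network contributes one feature, its pathway fingerprint score. The label is 1 for a diseased sample gene list and 0 otherwise. The model choice, learning rate, epoch count and training data are all assumptions. The value returned by risk_value is a number in (0, 1) that stands in for the risk value.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_risk_model(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     lr: float = 0.05, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights on pathway fingerprint score vectors by gradient descent."""
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            # Prediction error of the current model on one sample gene list.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def risk_value(weights: Sequence[float], bias: float, scores: Sequence[float]) -> float:
    """Risk value for one gene list, given its pathway fingerprint scores in training order."""
    return sigmoid(sum(w * s for w, s in zip(weights, scores)) + bias)


if __name__ == "__main__":
    # Hypothetical training set: fingerprint scores for two pathway networks per sample gene list.
    samples = [[9.0, 1.0], [8.0, 2.0], [1.0, 7.0], [2.0, 8.0]]
    labels = [1, 1, 0, 0]
    w, b = train_risk_model(samples, labels)
    print(round(risk_value(w, b, [8.5, 1.5]), 3))
```

A new gene list is scored by computing its pathway fingerprint scores in the same pathway network order used for training and passing them to risk_value.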
Optionally, the third determining module 504 is further configured to:
- obtain a base score for each pathway network according to the number of differentially expressed genes in that pathway network;
- obtain a target pathway fingerprint score for each pathway network from the base score and the pathway fingerprint score; and
- determine the target pathway network according to the target pathway fingerprint score corresponding to each pathway network.
Optionally, the connection relationship includes a direct connection relationship and an indirect connection relationship, and the gene score for a direct connection relationship is greater than the gene score for an indirect connection relationship.
Accordingly, the second determining module 503 is specifically configured to:
- determine the scores of genes having a direct connection relationship in each pathway network according to the direct connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network;
- determine the scores of genes having an indirect connection relationship in each pathway network according to the indirect connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network; and
- determine the pathway fingerprint score corresponding to each pathway network according to the scores of the genes having a direct connection relationship and the scores of the genes having an indirect connection relationship in that pathway network.
Optionally, the first determining module 501 is specifically configured to:
- obtain, by high-throughput sequencing, the gene expression profiles corresponding to the gene list to be tested and to the preset reference gene list respectively; and
- obtain the differentially expressed gene set according to the expression levels of the genes in those gene expression profiles.
Optionally, the third determining module 504 is specifically configured to determine the target pathway network according to the pathway fingerprint score corresponding to each pathway network and a preset pathway fingerprint score threshold.
Optionally, the apparatus further includes a display module, configured to display a score table of the pathway fingerprint scores corresponding to each pathway network. The score table shows the name of each pathway network and the pathway fingerprint score of each pathway network.
The foregoing apparatus is used to execute the method provided in the foregoing embodiments. Its implementation principle and technical effects are similar and are not repeated here.
The above modules may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). As another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, the modules may be integrated together and implemented as a system-on-a-chip (SOC).
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 6, the electronic device may include a processor 601, a storage medium 602 and a bus 603. The storage medium 602 stores machine-readable instructions executable by the processor 601. When the electronic device runs, the processor 601 and the storage medium 602 communicate over the bus 603, and the processor 601 executes the machine-readable instructions to perform the steps of the biological state characterization method described above. The specific implementation and technical effects are similar and are not repeated here.
Optionally, the present application further provides a storage medium storing a computer program. When executed by a processor, the computer program performs the steps of the above biological state characterization method.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the indirect couplings or communication connections between devices or units may be electrical, mechanical, or of other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
Integrated units implemented as software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
It should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or action from another. They do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include", and any variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Claims (5)

1. A biological state characterization device, the device comprising:
a first determining module, configured to determine a differentially expressed gene set according to a gene list to be tested and a preset reference gene list;
an acquisition module, configured to acquire a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an association pathway between at least two genes;
a second determining module, configured to determine a pathway fingerprint score corresponding to each pathway network according to the connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network;
a third determining module, configured to determine a target pathway network according to the pathway fingerprint score corresponding to each pathway network; and
a fourth determining module, configured to determine the differentially expressed genes on the target pathway network as gene markers, the gene markers indicating genes associated with disease risk;
wherein each gene network is used for representing a different cause of a disease;
the device further comprises:
an input module, configured to input the pathway fingerprint score corresponding to each pathway network into a pre-trained gene analysis model to obtain a disease risk value for the gene list to be tested, the gene analysis model being a model trained on the pathway fingerprint scores of the pathway networks corresponding to sample gene lists;
the connection relationship comprises a direct connection relationship and an indirect connection relationship, and the gene score for the direct connection relationship is greater than the gene score for the indirect connection relationship; and
the second determining module is specifically configured to:
determine the scores of genes having a direct connection relationship in each pathway network according to the direct connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network;
determine the scores of genes having an indirect connection relationship in each pathway network according to the indirect connection relationship of each differentially expressed gene of the differentially expressed gene set in each pathway network; and
determine the pathway fingerprint score corresponding to each pathway network according to the scores of the genes having a direct connection relationship and the scores of the genes having an indirect connection relationship in that pathway network.
2. The biological state characterization device of claim 1, wherein the third determination module is specifically configured to:
obtain a base score corresponding to each pathway network according to the number of differentially expressed genes in that pathway network; and
obtain a target pathway fingerprint score for each pathway network according to the base score and the pathway fingerprint score;
wherein determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network comprises:
determining the target pathway network according to the target pathway fingerprint score corresponding to each pathway network.
3. The biological state characterization device of claim 1, wherein the first determining module is specifically configured to:
obtain, by high-throughput sequencing, gene expression profiles corresponding respectively to the gene list to be tested and the preset reference gene list; and
obtain the differentially expressed gene set according to the expression levels of the genes in the gene expression profiles corresponding respectively to the gene list to be tested and the preset reference gene list.
4. The biological state characterization device of claim 1, wherein the third determination module is specifically configured to:
determine the target pathway network according to the pathway fingerprint score corresponding to each pathway network and a preset pathway fingerprint score threshold.
5. The biological state characterization device of claim 1, wherein the device further comprises:
a display module, configured to display a score table of the pathway fingerprint scores corresponding to each pathway network, the score table showing the name of each pathway network and the pathway fingerprint score of each pathway network.
CN202011596779.XA 2020-12-29 2020-12-29 Biological state characterization method, device, equipment and storage medium Active CN112802546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011596779.XA CN112802546B (en) 2020-12-29 2020-12-29 Biological state characterization method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112802546A (en) 2021-05-14
CN112802546B (en) 2024-05-03

Family

ID=75805614

Country Status (1)

Country Link
CN (1) CN112802546B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007002895A1 (en) * 2005-06-29 2007-01-04 Board Of Trustees Of Michigan State University Integrative framework for three-stage integrative pathway search
CN101812525A (en) * 2010-04-09 2010-08-25 Nantong University Method for building Alzheimer's disease (AD) morbidity-associated gene network path analyzing model
WO2012175675A1 (en) * 2011-06-24 2012-12-27 Genomatix Software Gmbh Method of producing biological networks
WO2016118513A1 (en) * 2015-01-20 2016-07-28 The Broad Institute, Inc. Method and system for analyzing biological networks
CN106055922A (en) * 2016-06-08 2016-10-26 Harbin Institute of Technology Shenzhen Graduate School Hybrid network gene screening method based on gene expression data
CN108108589A (en) * 2017-12-29 2018-06-01 Zhengzhou Institute of Light Industry The recognition methods of esophageal squamous cell carcinoma label based on network index variance analysis
CN109886385A (en) * 2019-03-04 2019-06-14 Shanghai Baoteng Biomedical Technology Co., Ltd. Determination method, apparatus, equipment and the medium of cell-signaling pathways network characterization
CN109906486A (en) * 2016-10-03 2019-06-18 Illumina, Inc. Phenotype/disease specific gene ranking using curated, gene library and network based data structures
WO2019117400A1 (en) * 2017-12-11 2019-06-20 Yonsei University Industry-Academic Cooperation Foundation Gene network construction apparatus and method
CN110444248A (en) * 2019-07-22 2019-11-12 Shandong University Cancer Biology molecular marker screening technique and system based on network topology parameters
CN111402955A (en) * 2020-04-09 2020-07-10 Dezhou University Biological information measuring method, system, storage medium and terminal
CN111899882A (en) * 2020-08-07 2020-11-06 University of Science and Technology Beijing Method and system for predicting cancer

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CA2381471A1 (en) * 2001-05-07 2002-11-07 Andreas Wagner System and method for reconstructing pathways in large genetic networks from genetic perturbations
US20090299646A1 (en) * 2004-07-30 2009-12-03 Soheil Shams System and method for biological pathway perturbation analysis
WO2010056982A2 (en) * 2008-11-17 2010-05-20 The George Washington University Compositions and methods for identifying autism spectrum disorders
EA201590175A1 * 2012-07-26 2015-06-30 The Regents of the University of California SCREENING, DIAGNOSTICS AND FORECASTING OF AUTISM AND OTHER DEVELOPMENTAL DISABILITIES
SG2013079173A (en) * 2013-10-18 2015-05-28 Agency Science Tech & Res Sense-antisense gene pairs for patient stratification, prognosis, and therapeutic biomarkers identification
GB201913690D0 (en) * 2019-09-23 2019-11-06 Univ Southampton Molecular phenotype classification

Non-Patent Citations (5)

Title
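Comprehensive RNA-Sequencing Analysis in Peripheral Blood Cells Reveals Differential Expression Signatures with Biomarker Potential for Idiopathic Membranous Nephropathy; Wan, N et al.; DNA and Cell Biology; Vol. 38 (No. 11); pp. 1223-1232 *
Gene microarray analysis based on genetic algorithm and support vector machine; Wang Wei; Liu Hong; Journal of Clinical Rehabilitative Tissue Engineering Research (No. 17); pp. 71-75 *
Study of the potential pathogenesis of sepsis and screening of biomarkers based on gene bioinformatics; Wang Haiqing; Zuo Wen; Chen Muhu; Hu Yingchun; Yang Ruiping; Zhong Wu; Journal of Southwest Medical University (No. 01); pp. 31-35 *
Data mining of gene expression profile microarrays; You Yuanhai; Zhang Jianzhong; China Biotechnology (No. 10); pp. 93-97 *
Bioinformatics and functional analysis of key genes and pathways in gastric cancer; Wu Qian; Song Xingbo; Zhong Huiyu; Wen Yang; Ying Binwu; Journal of Cancer Control and Treatment (No. 02); pp. 50-58 *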

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant