CN112802546B - Biological state characterization method, device, equipment and storage medium
- Publication number
- CN112802546B, CN202011596779.XA
- Authority
- CN
- China
- Prior art keywords
- gene
- network
- path
- score
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The application provides a biological state characterization method, device, equipment and storage medium, and relates to the technical field of bioinformatics. The method comprises: determining a differentially expressed gene set according to a gene list to be tested and a preset reference gene list; obtaining a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes; determining a pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set; determining a target pathway network according to the pathway fingerprint score corresponding to each pathway network; and determining the differentially expressed genes on the target pathway network as gene markers, the gene markers being used to indicate genes at risk of disease. Applying the embodiments of the application can improve the accuracy of disease analysis.
Description
Technical Field
The application relates to the technical field of bioinformatics, in particular to a biological state characterization method, a biological state characterization device, biological state characterization equipment and a storage medium.
Background
Cancer is a major disease that threatens human life and health, and the incidence and mortality rates of the disease on a global scale are rising year by year. Biologists can identify gene markers in gene sequences by gene enrichment methods, and can determine genes at risk of developing disease (cancer) based on the identified gene markers, facilitating analysis of cancer.
Currently, a target pathway is obtained mainly by determining the number of differentially expressed genes contained in each pathway network, and then a gene marker is determined based on the target pathway, and thus a gene at risk of a disease can be determined from the gene marker.
However, when determining gene markers using the prior art, the topological connection of differentially expressed genes in the pathway network is not considered, which may reduce the accuracy of disease analysis.
Disclosure of Invention
The application aims to overcome the defects in the prior art by providing a biological state characterization method, device, equipment and storage medium that can improve the accuracy of disease analysis.
In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:
In a first aspect, embodiments of the present application provide a biological state characterization method, the method comprising:
determining a differentially expressed gene set according to a gene list to be tested and a preset reference gene list;
obtaining a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes;
determining a pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set;
determining a target pathway network according to the pathway fingerprint score corresponding to each pathway network;
and determining the differentially expressed genes on the target pathway network as gene markers, wherein the gene markers are used for indicating genes at risk of disease.
Optionally, each gene network is used to characterize a different causative agent of a disease; the method further comprises the steps of:
inputting the pathway fingerprint score corresponding to each pathway network into a pre-trained gene analysis model to obtain a risk value of disease for the gene list to be tested; the gene analysis model is a model obtained by training on the pathway fingerprint score of each pathway network corresponding to a sample gene list.
Optionally, the method further comprises:
obtaining a base score corresponding to each pathway network according to the number of differentially expressed genes in each pathway network;
obtaining a target pathway fingerprint score of each pathway network according to the base score and the pathway fingerprint score;
the determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network comprises:
determining the target pathway network according to the target pathway fingerprint score corresponding to each pathway network.
Optionally, the connection relationship comprises a direct connection relationship and an indirect connection relationship, wherein the gene score of the direct connection relationship is greater than the gene score of the indirect connection relationship;
the determining the pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set comprises:
determining the gene score having a direct connection relationship in each pathway network according to the direct connection relationship of the differentially expressed genes within each pathway network;
determining the gene score having an indirect connection relationship in each pathway network according to the indirect connection relationship of the differentially expressed genes within each pathway network;
and determining the pathway fingerprint score corresponding to each pathway network according to the gene score having a direct connection relationship and the gene score having an indirect connection relationship in each pathway network.
Optionally, the determining the differential expression gene set according to the gene list to be tested and the preset reference gene list includes:
obtaining, by a high-throughput sequencing technology, the gene expression profiles corresponding respectively to the gene list to be tested and the preset reference gene list;
and obtaining the differentially expressed gene set according to the gene expression amounts of the genes in the gene expression profiles corresponding respectively to the gene list to be tested and the preset reference gene list.
Optionally, the determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network includes:
determining the target pathway network according to the pathway fingerprint score corresponding to each pathway network and a preset pathway fingerprint score threshold.
Optionally, the method further comprises:
displaying a score table of the pathway fingerprint scores corresponding to each pathway network, wherein the score table shows the name of each pathway network and the pathway fingerprint score of each pathway network.
In a second aspect, embodiments of the present application also provide a biological state characterization device, the device including:
a first determining module, configured to determine a differentially expressed gene set according to a gene list to be tested and a preset reference gene list;
an acquisition module, configured to obtain a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes;
a second determining module, configured to determine a pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set;
a third determining module, configured to determine a target pathway network according to the pathway fingerprint score corresponding to each pathway network;
and a fourth determining module, configured to determine the differentially expressed genes on the target pathway network as gene markers, wherein the gene markers are used for indicating genes at risk of disease.
Optionally, each gene network is used to characterize a different causative agent of a disease; the apparatus further comprises:
an input module, configured to input the pathway fingerprint score corresponding to each pathway network into a pre-trained gene analysis model to obtain a risk value of disease for the gene list to be tested; the gene analysis model is a model obtained by training on the pathway fingerprint score of each pathway network corresponding to a sample gene list.
Optionally, the third determining module is further configured to obtain a base score corresponding to each pathway network according to the number of differentially expressed genes in each pathway network; obtain a target pathway fingerprint score of each pathway network according to the base score and the pathway fingerprint score; and determine the target pathway network according to the target pathway fingerprint score corresponding to each pathway network.
Optionally, the connection relationship comprises a direct connection relationship and an indirect connection relationship, wherein the gene score of the direct connection relationship is greater than the gene score of the indirect connection relationship;
Correspondingly, the second determining module is specifically configured to determine the gene score having a direct connection relationship in each pathway network according to the direct connection relationship of the differentially expressed genes within each pathway network; determine the gene score having an indirect connection relationship in each pathway network according to the indirect connection relationship of the differentially expressed genes within each pathway network; and determine the pathway fingerprint score corresponding to each pathway network according to the gene score having a direct connection relationship and the gene score having an indirect connection relationship in each pathway network.
Optionally, the first determining module is specifically configured to obtain a gene expression profile corresponding to the to-be-tested gene list and the preset reference gene list respectively by using a high-throughput sequencing technology; and obtaining a differential expression gene set according to the gene expression amounts corresponding to the genes in the gene expression profiles respectively corresponding to the to-be-tested gene list and the preset reference gene list.
Optionally, the third determining module is specifically configured to determine the target pathway network according to the pathway fingerprint score corresponding to each pathway network and a preset pathway fingerprint score threshold.
Optionally, the apparatus further comprises: a display module, configured to display a score table of the pathway fingerprint scores corresponding to each pathway network, wherein the score table shows the name of each pathway network and the pathway fingerprint score of each pathway network.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the biological state characterization method of the first aspect described above.
In a fourth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the biological state characterization method of the first aspect described above.
The beneficial effects of the application are as follows:
The embodiments of the application provide a biological state characterization method, device, equipment and storage medium, wherein the method may comprise the following steps: determining a differentially expressed gene set according to a gene list to be tested and a preset reference gene list; obtaining a gene network from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes; determining a pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set; determining a target pathway network according to the pathway fingerprint score corresponding to each pathway network; and determining the differentially expressed genes on the target pathway network as gene markers, the gene markers being used to indicate genes at risk of disease. After the differentially expressed gene set is obtained, the biological state characterization method provided by the embodiments of the application determines the pathway fingerprint score corresponding to each pathway network according to the connection relationship of the differentially expressed genes within each pathway network. That is, the application takes into account the topological relationship of the differentially expressed genes in the pathway network when determining the target pathway network, thereby obtaining more accurate gene markers and improving the accuracy of disease analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a biological state characterization method according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for characterizing biological status according to an embodiment of the present application;
FIG. 3 is a flow chart of another method for characterizing biological status according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a method for characterizing a biological state according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a biological state characterization device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Fig. 1 is a schematic flow chart of a biological state characterization method according to an embodiment of the present application. As shown in fig. 1, the method may include:
S101, determining a differential expression gene set according to a gene list to be tested and a preset reference gene list.
The gene list to be tested may comprise tens of thousands of genes, and the preset reference gene list contains the same number of genes as the gene list to be tested. For disease analysis, the genes in the gene list to be tested are acquired based on non-health information of a single target to be tested (an abnormal target), and the genes in the preset reference gene list are acquired based on health information of a single normal target. For biological mechanism study, the genes in the gene list to be tested are obtained based on the non-health information of a plurality of targets to be tested, and the genes in the preset reference gene list are obtained based on the health information of a plurality of normal targets. Biological mechanism study here means that, when the disease is an unknown disease, the genes causing the disease can be obtained by the biological state characterization method of the application.
Whether for disease analysis or for biological mechanism study, differentially expressed genes must be obtained, and together they constitute the differentially expressed gene set. Specifically, each gene in the gene list to be tested is compared with the genes in the preset reference gene list, and the genes whose comparison results exceed a preset difference threshold are taken as the differentially expressed genes.
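By way of illustration only, the comparison in S101 can be sketched as follows. The application does not specify the comparison metric or the threshold value; the sketch assumes that each list is given as a mapping from gene name to expression amount and that the comparison result is the absolute log2 fold change. The names `differential_gene_set`, `min_abs_log2_fc` and `pseudocount`, and the example expression values, are hypothetical.

```python
import math

def differential_gene_set(test_expr, ref_expr, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return the genes whose expression differs between the two lists.

    test_expr / ref_expr map gene name -> expression amount.
    The log2 fold change and its threshold are illustrative assumptions,
    not values taken from the application.
    """
    degs = set()
    for gene, test_value in test_expr.items():
        if gene not in ref_expr:
            continue  # only genes present in both lists can be compared
        ratio = (test_value + pseudocount) / (ref_expr[gene] + pseudocount)
        if abs(math.log2(ratio)) > min_abs_log2_fc:
            degs.add(gene)
    return degs

# Made-up expression amounts for three genes
test_expr = {"TP53": 30.0, "EGFR": 200.0, "GAPDH": 100.0}
ref_expr = {"TP53": 120.0, "EGFR": 40.0, "GAPDH": 105.0}
degs = differential_gene_set(test_expr, ref_expr)
```

With these made-up values, TP53 and EGFR exceed the assumed threshold and GAPDH does not, so only the first two enter the differentially expressed gene set.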
S102, obtaining a gene network from a preset genome database.
The gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes. The genome database may specifically be KEGG (Kyoto Encyclopedia of Genes and Genomes). KEGG treats genes and their expression information as a whole network, integrates genomic, chemical and biochemical-system data, and covers metabolic pathways, drugs, diseases, gene sequences, genomes and the like.
That is, the genome database includes a gene network composed of a plurality of nodes and the regulatory information between the nodes, wherein the nodes represent genes and the regulatory information characterizes the connection relationships between the nodes. The gene network may include a plurality of pathway networks, each pathway network characterizing one cause that induces a disease. Each pathway network may include genes that are linked to each other as well as genes that are not linked to any other gene; the latter may be referred to as noise genes.
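The node-and-link structure described above can be held in memory as, for example, one adjacency map per pathway network. The following is a minimal sketch under stated assumptions: regulatory links are treated as undirected, and the pathway name, gene names and the helpers `build_adjacency` and `noise_genes` are hypothetical. It does not reproduce the actual KEGG data format.

```python
def build_adjacency(genes, links):
    """Build an undirected adjacency map: gene -> set of linked genes.

    Every gene of the pathway gets an entry, so unlinked genes map to an
    empty set instead of being dropped.
    """
    adjacency = {gene: set() for gene in genes}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def noise_genes(adjacency):
    """Genes of a pathway network that are not linked to any other gene."""
    return {gene for gene, neighbours in adjacency.items() if not neighbours}

# Hypothetical gene network: pathway name -> adjacency map of that pathway
gene_network = {
    "pathway_1": build_adjacency(
        genes={"G1", "G2", "G3", "G9"},
        links=[("G1", "G2"), ("G2", "G3")],
    ),
}
noise = noise_genes(gene_network["pathway_1"])
```

Here `G9` belongs to the pathway but has no link, so it is reported as a noise gene.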
S103, determining the pathway fingerprint score corresponding to each pathway network according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set.
Each differentially expressed gene in the differentially expressed gene set is matched against the genes contained in each pathway network to determine whether the pathway network contains any differentially expressed gene. If a pathway network contains no differentially expressed gene, its pathway fingerprint score is 0. If a pathway network contains differentially expressed genes, the connection relationship between them is examined: when the contained differentially expressed genes are directly connected, each directly connected gene is given a class A score; when they are indirectly connected, each indirectly connected gene is given a class B score, the class A score being higher than the class B score. Finally, the scores of the differentially expressed genes contained in each pathway network are added to obtain a gene score sum, and the pathway fingerprint score corresponding to each pathway network may be determined from the gene score sum of that pathway network.
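The scoring in S103 can be sketched as follows. The application gives neither numeric values for the class A and class B scores nor a formal definition of the two relationships, so the sketch makes the following assumptions: a differentially expressed gene is "directly connected" if it is adjacent to another differentially expressed gene; it is "indirectly connected" if it reaches another differentially expressed gene only through intermediate genes; unconnected genes score 0; and the pathway fingerprint score is the plain sum. `CLASS_A_SCORE`, `CLASS_B_SCORE` and the function names are hypothetical.

```python
from collections import deque

CLASS_A_SCORE = 2.0  # assumed value for directly connected genes
CLASS_B_SCORE = 1.0  # assumed value for indirectly connected genes

def _reaches_other_deg(adjacency, start, present):
    """Breadth-first search from `start` for any other gene in `present`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue
            if neighbour in present:
                return True
            seen.add(neighbour)
            queue.append(neighbour)
    return False

def pathway_fingerprint_score(adjacency, degs):
    """Sum the per-gene scores of the differentially expressed genes in one pathway.

    adjacency maps each gene of the pathway to the set of genes it is linked to.
    """
    present = {gene for gene in degs if gene in adjacency}
    total = 0.0
    for gene in present:
        if adjacency[gene] & present:
            total += CLASS_A_SCORE   # adjacent to another differentially expressed gene
        elif _reaches_other_deg(adjacency, gene, present):
            total += CLASS_B_SCORE   # connected only through intermediate genes
    return total

# Chain G1-G2-G3-G4-G5 plus an unlinked gene G9
adjacency = {
    "G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2", "G4"},
    "G4": {"G3", "G5"}, "G5": {"G4"}, "G9": set(),
}
score = pathway_fingerprint_score(adjacency, {"G1", "G2", "G5", "G9", "G77"})
```

In this example G1 and G2 are adjacent (class A each), G5 reaches G2 only through G4 and G3 (class B), G9 is unlinked and G77 is not in the pathway, which gives 2 + 2 + 1 = 5 under the assumed values. A pathway containing none of the differentially expressed genes scores 0, as stated above.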
S104, determining a target pathway network according to the pathway fingerprint score corresponding to each pathway network.
In general, the gene network in the genome database (KEGG) contains a fixed number of pathway networks, namely 330. The 330 pathway networks may be ranked by pathway fingerprint score in descending order. The top-ranked pathway network (the one with the highest pathway fingerprint score) may be taken as the target pathway network, or the top n pathway networks may be taken as target pathway networks; that is, the application does not limit the number of target pathway networks.
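The ranking in S104 can be sketched as follows. The function name `select_target_pathways` and its parameters are hypothetical; the optional `min_score` argument stands for the preset pathway fingerprint score threshold mentioned among the optional features, and treating the threshold as inclusive is an assumption.

```python
def select_target_pathways(scores, top_n=1, min_score=None):
    """Return the names of the top_n pathway networks by descending score.

    scores maps pathway name -> pathway fingerprint score.
    If min_score is given, pathways scoring below it are discarded first.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(name, value) for name, value in ranked if value >= min_score]
    return [name for name, _ in ranked[:top_n]]

scores = {"pathway_1": 5.0, "pathway_2": 0.0, "pathway_3": 7.5}
best = select_target_pathways(scores)
top_two = select_target_pathways(scores, top_n=2)
above_threshold = select_target_pathways(scores, top_n=len(scores), min_score=1.0)
```

Because Python's sort is stable, pathway networks with equal scores keep their original relative order; the application does not say how ties should be broken.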
S105, determining the differential expression genes on the target pathway network as gene markers, wherein the gene markers are used for indicating genes at risk of diseases.
After the target pathway network is determined, the differentially expressed genes contained in the target pathway network may be extracted and used as gene markers. When diagnosing a disease, the differentially expressed genes serving as gene markers are compared with the gene markers corresponding to each known disease, and the disease with a high matching rate is taken as the target disease (biological state); that is, the differentially expressed genes serving as gene markers correspond to genes at risk of the known disease. In the case of biological mechanism study, the differentially expressed genes serving as gene markers correspond to genes at risk of the unknown disease.
In summary, in the biological state characterization method provided by the application, a differentially expressed gene set is determined according to the gene list to be tested and the preset reference gene list; a gene network is obtained from a preset genome database, wherein the gene network comprises a plurality of pathway networks, each pathway network including an associated pathway between at least two genes; a pathway fingerprint score corresponding to each pathway network is determined according to the connection relationship, within each pathway network, of the differentially expressed genes in the differentially expressed gene set; a target pathway network is determined according to the pathway fingerprint score corresponding to each pathway network; and the differentially expressed genes on the target pathway network are determined as gene markers, the gene markers being used to indicate genes at risk of disease. After the differentially expressed gene set is obtained, the biological state characterization method provided by the embodiments of the application determines the pathway fingerprint score corresponding to each pathway network according to the connection relationship of the differentially expressed genes within each pathway network. That is, the application takes into account the topological relationship of the differentially expressed genes in the pathway network when determining the target pathway network, thereby obtaining more accurate gene markers and improving the accuracy of disease analysis.
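The marker extraction in S105 and the subsequent comparison with known diseases can be sketched as follows. The application does not define the matching rate; the sketch assumes it is the share of a known disease's gene markers that also appear among the extracted markers. All gene names, disease names and function names are hypothetical placeholders.

```python
def gene_markers(target_pathways, pathway_genes, degs):
    """Collect the differentially expressed genes lying on the target pathways."""
    markers = set()
    for name in target_pathways:
        markers |= pathway_genes[name] & degs
    return markers

def best_matching_disease(markers, known_disease_markers):
    """Return (disease, matching rate) for the best-matching known disease.

    Matching rate = share of the disease's known markers found in `markers`
    (an assumed definition, not taken from the application).
    """
    rates = {
        disease: len(markers & known) / len(known)
        for disease, known in known_disease_markers.items()
        if known
    }
    return max(rates.items(), key=lambda item: item[1])

pathway_genes = {"pathway_3": {"G1", "G2", "G5", "G8"}}
degs = {"G1", "G2", "G5", "G20"}
markers = gene_markers(["pathway_3"], pathway_genes, degs)

known = {"disease_a": {"G1", "G2"}, "disease_b": {"G5", "G6", "G7", "G10"}}
disease, rate = best_matching_disease(markers, known)
```

In this made-up example both markers of `disease_a` are found, while only one of the four markers of `disease_b` is, so `disease_a` is returned as the target disease.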
Optionally, each gene network is used to characterize a different cause that induces a disease; the method further comprises: inputting the pathway fingerprint score corresponding to each pathway network into a pre-trained gene analysis model to obtain a risk value of disease for the gene list to be tested; the gene analysis model is a model obtained by training on the pathway fingerprint score of each pathway network corresponding to a sample gene list.
When the initial gene analysis model is trained, a plurality of training samples may be input into it. The features of each training sample are the pathway fingerprint scores of the pathway networks, and the label of each training sample is a disease category; the pathway fingerprint scores are obtained from a sample gene list constructed by experts. Training stops when the training stop condition is met, which yields the gene analysis model. The pathway fingerprint scores of the pathway networks corresponding to the gene list to be tested are then input into the gene analysis model, which can output a probability for each disease; the larger the probability, the higher the risk value of that disease for the gene list to be tested. Analyzing the disease by means of machine learning can thus improve the accuracy of the analysis results.
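The application does not name the type of gene analysis model. Purely as a stand-in, the following sketch uses a nearest-centroid classifier whose distances are turned into probabilities with a softmax; it has no iterative training loop and therefore no stop condition, so it only illustrates the input and output shapes (one pathway fingerprint score per pathway network in, one probability per disease out). The training data, disease labels and function names are hypothetical.

```python
import math

def train_centroids(samples):
    """samples: list of (score_vector, disease_label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def disease_probabilities(centroids, vector):
    """Softmax over negative squared distances to each class centroid."""
    logits = {
        label: -sum((a - b) ** 2 for a, b in zip(vector, centre))
        for label, centre in centroids.items()
    }
    peak = max(logits.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}

# Each vector holds the pathway fingerprint scores of three pathway networks
training = [
    ([5.0, 0.0, 1.0], "disease_a"),
    ([4.0, 1.0, 0.0], "disease_a"),
    ([0.0, 6.0, 2.0], "disease_b"),
    ([1.0, 5.0, 3.0], "disease_b"),
]
centroids = train_centroids(training)
probs = disease_probabilities(centroids, [4.5, 0.5, 1.0])
most_likely = max(probs, key=probs.get)
```

A real implementation would use as many features as there are pathway networks and could substitute any classifier that outputs per-class probabilities.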
Fig. 2 is a schematic flow chart of another biological status characterization method according to an embodiment of the present application. As shown in fig. 2, the method further includes:
S201, obtaining a base score corresponding to each pathway network according to the number of differentially expressed genes in each pathway network.
Each differentially expressed gene in the differentially expressed gene set is matched against the genes contained in each pathway network, the number of differentially expressed genes contained in each pathway network is counted, and the count is taken as the base score of that pathway network.
For example, if pathway network 1 contains 3 differentially expressed genes, the base score of pathway network 1 is 3; if pathway network 2 contains 6 differentially expressed genes, the base score of pathway network 2 is 6; the other pathway networks are handled similarly.
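The counting in S201 amounts to a set intersection per pathway network. A minimal sketch, with hypothetical gene names chosen so that the counts reproduce the example above:

```python
def base_scores(pathway_genes, degs):
    """Base score of each pathway = number of differentially expressed genes it contains."""
    return {name: len(genes & degs) for name, genes in pathway_genes.items()}

pathway_genes = {
    "pathway_1": {"G1", "G2", "G3", "G10"},
    "pathway_2": {"G4", "G5", "G6", "G7", "G8", "G9", "G11"},
}
degs = {"G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9"}
base = base_scores(pathway_genes, degs)
```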
S202, obtaining a target path fingerprint score of each path network according to the base score and the path fingerprint score.
S203, determining a target path network according to the target path fingerprint score corresponding to each path network.
The base score and the path fingerprint score of each path network are added, and the sum is taken as the target path fingerprint score of that path network. The target path network may then be determined from the path networks according to their target path fingerprint scores and a preset rule. The preset rule may be to take the path network with the highest target path fingerprint score as the target path network, or to take the path networks ranked in the top n by target path fingerprint score (from largest to smallest). That is, there may be one or more target path networks, and the present application is not limited in this respect.
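As a small sketch of S202 and S203, the addition and the two preset rules (highest score, or top n) could look like this. The score values and function names are hypothetical.

```python
def target_scores(base, fingerprint):
    """Target path fingerprint score = base score + path fingerprint score."""
    return {name: base[name] + fingerprint[name] for name in fingerprint}

def select_top(target, n=1):
    """Return the names of the n path networks with the largest target scores."""
    ranked = sorted(target.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical scores for three path networks.
base = {"pathway_1": 3, "pathway_2": 6, "pathway_3": 2}
fingerprint = {"pathway_1": 10, "pathway_2": 5, "pathway_3": 4}

target = target_scores(base, fingerprint)
best = select_top(target)          # rule 1: single highest score
top_two = select_top(target, n=2)  # rule 2: top n, largest to smallest
```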
Fig. 3 is a schematic flow chart of another biological state characterization method according to an embodiment of the present application. As shown in fig. 3, optionally, the connection relationship includes a direct connection relationship and an indirect connection relationship, and the gene score of the direct connection relationship is greater than that of the indirect connection relationship. Determining the path fingerprint score corresponding to each path network according to the connection relationship of each differentially expressed gene in the differential expression gene set in each path network includes:
S301, determining a gene score with a direct connection relationship in each path network according to the direct connection relationship of each differential expression gene in the differential expression gene set in each path network.
S302, determining the gene score with indirect connection relation in each path network according to the indirect connection relation of each differential expression gene in the differential expression gene set in each path network.
The connection relationship between the differentially expressed genes contained in each pathway network falls into two cases: a direct connection between differentially expressed genes, or an indirect connection between them. If differentially expressed genes contained in the pathway network are directly connected, these genes may be given a class A score; if they are indirectly connected, these genes may be given a class B score. The class A score is greater than the class B score.
S303, determining the path fingerprint score corresponding to each path network according to the gene score with the direct connection relation in each path network and the gene score with the indirect connection relation in each path network.
For one pathway network, the score of each differentially expressed gene having a direct connection relationship and the score of each differentially expressed gene having an indirect connection relationship are counted, and a total gene score is obtained from the scores of the differentially expressed genes contained in that pathway network. The total gene score is then added to the number of differentially expressed genes contained in the pathway network to obtain the pathway fingerprint score of that pathway network. The path fingerprint scores of the other path networks are obtained in the same way, so the details are not repeated here.
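Steps S301 to S303 can be sketched under a few stated assumptions. The pathway network is taken to be an undirected edge list; a differentially expressed gene counts as "directly connected" when it shares an edge with another differentially expressed gene, and as "indirectly connected" when it can only reach one through intermediate genes. The class A and class B values (2 and 1) are hypothetical, since the text only requires that class A be greater than class B.

```python
from collections import deque

CLASS_A_SCORE = 2  # hypothetical score for a direct connection
CLASS_B_SCORE = 1  # hypothetical score for an indirect connection

def build_adjacency(edges):
    """Turn an undirected edge list into a gene -> neighbors mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def reachable(adjacency, start):
    """Breadth-first search: every gene reachable from start, excluding start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen

def pathway_fingerprint_score(edges, differential_genes):
    """Total gene score plus the number of differentially expressed genes in the network."""
    adjacency = build_adjacency(edges)
    present = set(differential_genes) & set(adjacency)
    total_gene_score = 0
    for gene in present:
        others = present - {gene}
        if adjacency[gene] & others:
            total_gene_score += CLASS_A_SCORE   # shares an edge with another such gene
        elif reachable(adjacency, gene) & others:
            total_gene_score += CLASS_B_SCORE   # linked only through intermediate genes
    return total_gene_score + len(present)

# Hypothetical chain-shaped pathway network g1 - g2 - g3 - g4 - g5.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
score = pathway_fingerprint_score(edges, ["g1", "g2", "g4", "g9"])
```

In this example g1 and g2 are directly connected (class A each), g4 reaches them only through g3 (class B), and g9 is not in the network, so the total gene score is 5 and the fingerprint score is 5 + 3 = 8.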
FIG. 4 is a flow chart of another method for characterizing biological status according to an embodiment of the present application. As shown in fig. 4, optionally, determining the differential expression gene set according to the to-be-tested gene list and the preset reference gene list includes:
S401, acquiring gene expression profiles corresponding to the gene list to be tested and the preset reference gene list respectively by adopting a high-throughput sequencing technology.
S402, obtaining a differential expression gene set according to the gene expression amounts corresponding to the genes in the gene expression profiles respectively corresponding to the to-be-tested gene list and the preset reference gene list.
High-throughput sequencing, also called next-generation sequencing, can sequence hundreds of thousands to millions of nucleic acid molecules in parallel in a single run. The gene expression profile of the gene list to be tested can be obtained by applying high-throughput sequencing to a sample to be tested collected from a patient case, and the gene expression profile of the preset reference gene list can be obtained in real time or retrieved directly from a gene library. A gene expression profile comprises the expression amount of each gene. The expression amount of each gene in the profile of the gene list to be tested is compared with that of the same gene in the profile of the preset reference gene list; if the change in expression amount exceeds a preset threshold, the gene is a differentially expressed gene. All differentially expressed genes together form the differential expression gene set.
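The comparison in S402 can be sketched as below. The text only states that the change in expression amount must exceed a preset threshold; measuring that change as an absolute log2 fold change with a pseudocount of 1 is an assumption made for this illustration, as are the profile values and the function name.

```python
import math

def differential_expression_set(test_profile, reference_profile, threshold=1.0):
    """Return genes whose expression change between the two profiles exceeds the threshold.

    Change is measured here as |log2((test + 1) / (reference + 1))|, an assumed metric.
    """
    selected = set()
    for gene, test_value in test_profile.items():
        if gene not in reference_profile:
            continue  # genes missing from the reference cannot be compared
        ref_value = reference_profile[gene]
        change = abs(math.log2((test_value + 1.0) / (ref_value + 1.0)))
        if change > threshold:
            selected.add(gene)
    return selected

# Hypothetical expression amounts for the tested sample and the reference.
test_profile = {"g1": 31.0, "g2": 10.0, "g3": 1.0, "g4": 5.0}
reference_profile = {"g1": 7.0, "g2": 9.0, "g3": 7.0}

deg_set = differential_expression_set(test_profile, reference_profile)
```

Here g1 is up-regulated and g3 is down-regulated by a factor of four, so both exceed the threshold, while g2 barely changes and g4 has no reference value.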
Optionally, determining the target path network according to the path fingerprint score corresponding to each path network includes: determining the target path network according to the path fingerprint score corresponding to each path network and a preset path fingerprint score threshold.
The preset path fingerprint score threshold can be set according to practical experience, and the present application does not limit it. The path fingerprint score of each path network is compared with the preset threshold, and a path network whose path fingerprint score is greater than the threshold is taken as a target path network.
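The threshold rule can be illustrated in a few lines. The scores and the threshold value are hypothetical; the comparison is strictly greater-than, as the text states.

```python
def select_by_threshold(fingerprint_scores, threshold):
    """Keep the path networks whose path fingerprint score is strictly above the threshold."""
    return [name for name, score in fingerprint_scores.items() if score > threshold]

# Hypothetical path fingerprint scores and preset threshold.
fingerprint_scores = {"pathway_1": 8, "pathway_2": 5, "pathway_3": 12}
targets = select_by_threshold(fingerprint_scores, threshold=5)
```

A path network whose score equals the threshold (pathway_2 here) is not selected.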
Optionally, the method may further include: displaying a score table of the path fingerprint scores corresponding to each path network, where the score table shows the name of each path network and the path fingerprint score of each path network.
Optionally, the score table may be displayed automatically, or displayed after a trigger instruction from a worker is received; the present application is not limited in this respect. In the score table, the names of the path networks may be arranged in descending order of path fingerprint score, so that the worker can see more intuitively how relevant each path network is to the gene list to be tested.
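A plain-text rendering of such a score table, sorted from the largest score to the smallest, might be sketched as follows. The column headings, layout, and score values are assumptions; the text only requires that names and scores be shown.

```python
def format_score_table(fingerprint_scores):
    """Render path network names and scores as text, largest score first."""
    ranked = sorted(fingerprint_scores.items(), key=lambda item: item[1], reverse=True)
    header = "Pathway network"
    width = max([len(header)] + [len(name) for name, _ in ranked])
    lines = [f"{header:<{width}}  Score"]
    lines += [f"{name:<{width}}  {score}" for name, score in ranked]
    return "\n".join(lines)

# Hypothetical path fingerprint scores.
table = format_score_table({"pathway_1": 8, "pathway_2": 5, "pathway_3": 12})
print(table)
```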
Fig. 5 is a schematic structural diagram of a biological status characterization device according to an embodiment of the present application. As shown in fig. 5, the apparatus may include:
A first determining module 501, configured to determine a differential expression gene set according to a to-be-tested gene list and a preset reference gene list;
an obtaining module 502, configured to obtain a gene network from a preset genome database;
the second determining module 503 is configured to determine a path fingerprint score corresponding to each path network according to a connection relationship of each differential expression gene in the differential expression gene set in each path network;
a third determining module 504, configured to determine a target path network according to the path fingerprint score corresponding to each path network;
a fourth determining module 505 is configured to determine that the differentially expressed genes on the target pathway network are gene markers, where the gene markers are used to indicate genes at risk of disease.
Optionally, each gene network is used to characterize a different causative agent of a disease; the apparatus further comprises:
The input module is used for inputting the path fingerprint score of each path network into a pre-trained gene analysis model to obtain a risk value of diseases in the gene list to be tested; the gene analysis model is a model which is obtained by training according to the path fingerprint score of each path network corresponding to the sample gene list.
Optionally, the third determining module 504 is further configured to obtain a base score corresponding to each pathway network according to the number of differentially expressed genes in each pathway network; obtaining a target path fingerprint score of each path network according to the base score and the path fingerprint score; and determining the target access network according to the target access fingerprint score corresponding to each access network.
Optionally, the connection relationship comprises a direct connection relationship and an indirect connection relationship, the gene score of the direct connection relationship is greater than the gene score of the indirect connection relationship;
Accordingly, the second determining module 503 is specifically configured to determine, according to the direct connection relationship of each differentially expressed gene in the differentially expressed gene set in each pathway network, a gene score having a direct connection relationship in each pathway network; determining a gene score with an indirect connection relationship in each pathway network according to the indirect connection relationship of each differential expression gene in each pathway network in the differential expression gene set; and determining the corresponding path fingerprint score of each path network according to the gene score with the direct connection relationship in each path network and the gene score with the indirect connection relationship in each path network.
Optionally, the first determining module 501 is specifically configured to obtain a gene expression profile corresponding to the to-be-tested gene list and a preset reference gene list by using a high-throughput sequencing technology; and obtaining a differential expression gene set according to the gene expression amounts corresponding to the genes in the gene expression profiles respectively corresponding to the to-be-tested gene list and the preset reference gene list.
Optionally, the third determining module 504 is specifically configured to determine the target path network according to the path fingerprint score corresponding to each path network and a preset path fingerprint score threshold.
Optionally, the apparatus further comprises: a display module, configured to display a score table of the path fingerprint scores corresponding to each path network, where the score table shows the name of each path network and the path fingerprint score of each path network.
The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more microprocessors (digital signal processor, DSP), or one or more field-programmable gate arrays (FPGA). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device may include a processor 601, a storage medium 602, and a bus 603. The storage medium 602 stores machine-readable instructions executable by the processor 601. When the electronic device is operating, the processor 601 and the storage medium 602 communicate over the bus 603, and the processor 601 executes the machine-readable instructions to perform the steps of the biological state characterization method described above. The specific implementation and technical effects are similar and are not repeated here.
Optionally, the present application further provides a storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the above biological state characterization method.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Claims (5)
1. A biological state characterization device, the device comprising:
the first determining module is used for determining a differential expression gene set according to the gene list to be tested and a preset reference gene list;
The acquisition module is used for acquiring a gene network from a preset genome database, wherein the gene network comprises: a plurality of pathway networks, each pathway network including an associated pathway between at least two genes;
The second determining module is used for determining a path fingerprint score corresponding to each path network according to the connection relation of each differential expression gene in the differential expression gene set in each path network;
the third determining module is used for determining a target path network according to the path fingerprint score corresponding to each path network;
A fourth determining module for determining a differentially expressed gene on the network of target pathways as a gene marker for indicating a gene at risk of disease;
Each gene network is used for representing a different causative agent of a disease;
The apparatus further comprises:
The input module is used for inputting the path fingerprint score corresponding to each path network into a pre-trained gene analysis model to obtain a risk value of diseases in the gene list to be tested; the gene analysis model is a model obtained by training according to the path fingerprint score of each path network corresponding to the sample gene list;
The connection relationship comprises a direct connection relationship and an indirect connection relationship, and the gene score of the direct connection relationship is larger than that of the indirect connection relationship;
The second determining module is specifically configured to:
Determining a gene score with a direct connection relationship in each pathway network according to the direct connection relationship of each differential expression gene in the differential expression gene set in each pathway network;
Determining a gene score with an indirect connection relationship in each path network according to the indirect connection relationship of each differential expression gene in the differential expression gene set in each path network;
and determining the corresponding path fingerprint score of each path network according to the gene score with the direct connection relationship in each path network and the gene score with the indirect connection relationship in each path network.
2. The biological state characterization device of claim 1, wherein the third determination module is specifically configured to:
obtaining a base score corresponding to each pathway network according to the number of the differentially expressed genes in each pathway network;
obtaining a target path fingerprint score of each path network according to the base score and the path fingerprint score;
the determining the target path network according to the path fingerprint score corresponding to each path network comprises the following steps:
and determining the target path network according to the target path fingerprint score corresponding to each path network.
3. The biological state characterization device of claim 1, wherein the first determining module is specifically configured to:
obtaining gene expression profiles corresponding to the gene list to be tested and the preset reference gene list respectively by adopting a high-throughput sequencing technology;
And obtaining a differential expression gene set according to the gene expression amounts corresponding to the genes in the gene expression profiles respectively corresponding to the to-be-tested gene list and the preset reference gene list.
4. The biological state characterization device of claim 1, wherein the third determination module is specifically configured to:
determining the target path network according to the path fingerprint score corresponding to each path network and a preset path fingerprint score threshold.
5. The biological state characterization device of claim 1, wherein the device further comprises:
The display module is used for displaying a score table of the path fingerprint scores corresponding to each path network, where the score table shows: the name of each path network and the path fingerprint score of each path network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011596779.XA CN112802546B (en) | 2020-12-29 | 2020-12-29 | Biological state characterization method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112802546A (en) | 2021-05-14 |
CN112802546B (en) | 2024-05-03 |
Family
ID=75805614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011596779.XA Active CN112802546B (en) | 2020-12-29 | 2020-12-29 | Biological state characterization method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112802546B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |