CN114255885B - New drug research and development management system and method based on graph data - Google Patents
- Publication number
- CN114255885B (application CN202111526092.3A)
- Authority
- CN
- China
- Prior art keywords
- compound
- disease
- information
- point type
- compounds
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Toxicology (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The embodiment of the invention discloses a new drug research and development management system and method based on graph data. The system comprises a data acquisition module for acquiring and integrating medical data, the medical data including compound information, disease information, target gene information and side effect information; a graph data module for constructing a graph model from the medical data, in which each compound, disease, target gene and side effect is treated as a vertex and each association factor between vertices is treated as an edge; and a query prediction module for passing acquired query information to the graph model for prediction and displaying the returned prediction results. The beneficial effects are as follows: by building an association network from compound, disease, target gene and side effect information, a graph model is obtained that helps new drug researchers quickly discover the relationships among compounds, diseases and target genes, accelerating new drug research and development and improving its efficiency.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a new drug research and development management system and method based on graph data.
Background
The development of new drugs is a time-consuming, costly and labor-intensive undertaking. Billions of data records accumulate during development, covering how various compounds treat diseases, which genes each compound targets, which side effects each compound causes while treating a disease, and so on. These data are huge in volume and intricately associated. If the value of the associated data could be released quickly, the new drug development cycle would be greatly shortened, and patients would gain access to new drugs sooner and be relieved of their suffering.
However, these data are stored in relational databases, producing ten or more terabyte-scale relational tables. Each query requires writing around ten query statements that join several relational tables, and obtaining a result takes a great deal of time. Moreover, each of the many stages of new drug development involves large numbers of associative queries over large amounts of data. The inability to query this massive associated data quickly is a major obstacle to improving new drug development efficiency.
Disclosure of Invention
The invention aims to provide a new drug development management system and method based on graph data that help new drug developers quickly discover the relationships among compounds, diseases and target genes, thereby accelerating development.
First aspect: a new drug development management system based on graph data, comprising:
a data acquisition module for acquiring and integrating medical data, wherein the medical data includes compound information, disease information, target gene information and side effect information;
a graph data module for constructing a graph model from the medical data, wherein each compound, disease, target gene and side effect is regarded as a vertex, and the association factor between vertices is regarded as an edge;
and a query prediction module for passing acquired query information to the graph model for prediction and displaying the returned prediction results.
Preferably, the compound information includes compound ID, compound name, data source, international compound identifier, and similar compound information;
the disease information includes a disease ID, a disease name, and similar disease information;
the target gene information comprises target gene ID, target gene name, gene description and chromosome;
the side effect information includes a side effect ID and a side effect name.
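The four kinds of information above map naturally onto one record type per vertex type. The Python sketch below is only an illustration under assumptions: the class and field names are hypothetical, the format of the international compound identifier is not specified in the source, and the similar compound and similar disease information is assumed to live on edges rather than on these records.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record types, one per vertex type; field names are illustrative.
@dataclass(frozen=True)
class Compound:
    compound_id: str
    name: str
    data_source: str
    # "International compound identification"; its format is not given in the source.
    international_id: Optional[str] = None


@dataclass(frozen=True)
class Disease:
    disease_id: str
    name: str


@dataclass(frozen=True)
class TargetGene:
    gene_id: str
    name: str
    description: str = ""
    chromosome: str = ""


@dataclass(frozen=True)
class SideEffect:
    side_effect_id: str
    name: str


# frozen=True makes records hashable, so they can serve as vertex keys.
vertices = {Compound("C1", "compound-1", "public"), Disease("D1", "disease-1")}
```

Making the records immutable is a design choice that lets identical records collapse in sets and dicts when data from several sources is combined.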
Preferably, the association factors include a plurality of factors, namely similar compound, similar disease, binding, treatment, causing and linking, and each factor is taken as a corresponding edge type.
Preferably, when the edge type is similar compound, the start vertex type and end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and end vertex type are both diseases;
when the edge type is binding, the start vertex type is compound and the end vertex type is target gene;
when the edge type is treatment, the start vertex type is compound and the end vertex type is disease;
when the edge type is causing, the start vertex type is compound and the end vertex type is side effect;
when the edge type is linking, the start vertex type is disease and the end vertex type is target gene.
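The edge-type rules above amount to a small schema that can be checked whenever an edge is inserted. A minimal Python sketch follows; the snake_case edge labels are a hypothetical English rendering, not names fixed by the invention.

```python
from typing import Dict, Tuple

# Edge type -> (start vertex type, end vertex type), per the rules above.
# The snake_case labels are a hypothetical English rendering.
EDGE_SCHEMA: Dict[str, Tuple[str, str]] = {
    "similar_compound": ("compound", "compound"),
    "similar_disease": ("disease", "disease"),
    "bind": ("compound", "target_gene"),
    "treatment": ("compound", "disease"),
    "cause": ("compound", "side_effect"),
    "link": ("disease", "target_gene"),
}


def is_valid_edge(edge_type: str, start_type: str, end_type: str) -> bool:
    """True if edge_type may run from a start_type vertex to an end_type vertex."""
    return EDGE_SCHEMA.get(edge_type) == (start_type, end_type)


print(is_valid_edge("treatment", "compound", "disease"))  # True
print(is_valid_edge("treatment", "disease", "compound"))  # False
```

An unknown edge type maps to `None`, so it is rejected rather than raising an error.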
Preferably, queries are made in a graph query language, and the prediction results are ranked.
Second aspect: a new drug development management method based on graph data, applied to the new drug development management system based on graph data of the first aspect, the method comprising the following steps:
acquiring and integrating medical data, wherein the medical data includes compound information, disease information, target gene information and side effect information;
constructing a graph model from the medical data, wherein each compound, disease, target gene and side effect is regarded as a vertex, and the association factor between vertices is regarded as an edge;
and passing acquired query information to the graph model for prediction, and displaying the returned prediction results.
Preferably, the compound information includes compound ID, compound name, data source, international compound identifier, and similar compound information;
the disease information includes a disease ID, a disease name, and similar disease information;
the target gene information comprises target gene ID, target gene name, gene description and chromosome;
the side effect information includes a side effect ID and a side effect name.
Preferably, the association factors include a plurality of factors, namely similar compound, similar disease, binding, treatment, causing and linking, and each factor is taken as a corresponding edge type.
Preferably, when the edge type is similar compound, the start vertex type and end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and end vertex type are both diseases;
when the edge type is binding, the start vertex type is compound and the end vertex type is target gene;
when the edge type is treatment, the start vertex type is compound and the end vertex type is disease;
when the edge type is causing, the start vertex type is compound and the end vertex type is side effect;
when the edge type is linking, the start vertex type is disease and the end vertex type is target gene.
Preferably, queries are made in a graph query language, and the prediction results are ranked.
By adopting the above technical scheme, the invention has the following advantages: in the new drug research and development management system and method based on graph data, a graph model is obtained by building an association network from compound, disease, target gene and side effect information, so that the associations among compounds, diseases, target genes and side effects are fully displayed. This helps new drug researchers quickly discover the relationships among compounds, diseases and target genes, accelerates new drug development, and improves development efficiency.
Drawings
FIG. 1 is a system block diagram of a new drug development management system based on graph data provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a graph model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a prediction result according to an embodiment of the present invention;
FIG. 4 is a flowchart of a new drug development management method based on graph data according to an embodiment of the present invention.
Detailed Description
Specific embodiments of the invention will be described in detail below, it being noted that the embodiments described herein are for illustration only and are not intended to limit the invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that: no such specific details are necessary to practice the invention. In other instances, well-known circuits, software, or methods have not been described in detail in order not to obscure the invention.
Throughout the specification, references to "one embodiment," "an embodiment," "one example," or "an example" mean: a particular feature, structure, or characteristic described in connection with the embodiment or example is included within at least one embodiment of the invention. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "one example," or "an example" in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, or characteristics may be combined in any suitable combination and/or sub-combination in one or more embodiments or examples. Moreover, those of ordinary skill in the art will appreciate that the illustrations provided herein are for illustrative purposes and that the illustrations are not necessarily drawn to scale.
The present invention will be described in detail with reference to the accompanying drawings.
Referring to fig. 1 and fig. 2, a new drug development management system based on graph data provided by an embodiment of the present invention includes:
the data acquisition module is used for acquiring and integrating the medical data; wherein the medical data includes compound information, disease information, target gene information, and side effect information.
Specifically, the medical data includes data publicly available on the internet and data accumulated by pharmaceutical companies themselves, and these data are used as the sample data set. Scale of the sample data set: it contains 137 diseases, 1552 compounds, 5734 side effects and 20945 target genes, together with about 170,000 edge relationships such as similarity between vertices and treatment.
Contents of the sample data set:
Compound information: such as compound ID, compound name, data source, international compound identifier, url;
Disease information: such as disease ID, disease name, data source, url;
Target gene information: such as target gene ID, target gene name, data source, url, gene description, chromosome;
Side effect information: such as side effect ID, side effect name, data source, url;
Similar compound information: such as the similarity between two compounds, data source;
Similar disease information: such as data source;
Information on compounds causing side effects, compounds binding to target genes, compounds treating diseases, and diseases linked to target genes.
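Integrating public data with in-house data mainly means merging records that describe the same vertex. The sketch below assumes each record is a dict with an `id` key and a `data_source` key (hypothetical names) and applies one possible merge policy, first non-empty value wins, while recording every contributing source; the patent does not specify a policy.

```python
from typing import Dict, Iterable


def integrate(*sources: Iterable[dict]) -> Dict[str, dict]:
    """Merge record lists from several sources into one dict keyed by record id.

    For each field the first non-empty value seen is kept; every contributing
    record's data_source is collected, without duplicates, in 'data_sources'.
    """
    merged: Dict[str, dict] = {}
    for records in sources:
        for rec in records:
            key = rec["id"]
            target = merged.setdefault(key, {"id": key, "data_sources": []})
            for field_name, value in rec.items():
                if field_name == "data_source":
                    if value not in target["data_sources"]:
                        target["data_sources"].append(value)
                elif target.get(field_name) in (None, ""):
                    # Fill a field that is missing or still empty.
                    target[field_name] = value
    return merged


# Illustrative records only.
public = [{"id": "C1", "name": "compound-1", "data_source": "public"}]
in_house = [
    {"id": "C1", "name": "", "url": "", "data_source": "in-house"},
    {"id": "C2", "name": "compound-2", "data_source": "in-house"},
]
merged = integrate(public, in_house)
```

Here `merged["C1"]` keeps the public name and lists both sources; a real pipeline might instead prefer in-house values or flag conflicts.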
The graph data module is used for constructing a graph model from the medical data, wherein each compound, disease, target gene and side effect is regarded as a vertex, and the association factor between vertices is regarded as an edge.
Specifically, the association factors include a plurality of factors, namely similar compound, similar disease, binding, treatment, causing and linking, and each factor is taken as a corresponding edge type.
Referring to Table 1, the vertex types in the graph model are:
TABLE 1
Correspondingly, when the edge type is similar compound, the start vertex type and end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and end vertex type are both diseases;
when the edge type is binding, the start vertex type is compound and the end vertex type is target gene;
when the edge type is treatment, the start vertex type is compound and the end vertex type is disease;
when the edge type is causing, the start vertex type is compound and the end vertex type is side effect;
when the edge type is linking, the start vertex type is disease and the end vertex type is target gene.
Specifically, referring to Table 2, the edge types in the graph model are:
TABLE 2
| Start vertex type | Edge type | End vertex type | Attributes |
| --- | --- | --- | --- |
| Compound | Similar compound | Compound | Similarity, data source |
| Compound | Binding | Target gene | Data source |
| Compound | Treatment | Disease | Data source |
| Compound | Causing | Side effect | Data source |
| Disease | Linking | Target gene | Data source |
| Disease | Similar disease | Disease | Data source |
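To make the structure of Table 2 concrete, here is a minimal in-memory property graph in Python that rejects edges whose endpoint types do not match the table. It is a sketch, not the graph database the system would use in practice; `GraphModel`, the edge labels and the attribute keyword names are hypothetical, and the similarity value below is illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Table 2 as (start vertex type, end vertex type) per edge type; labels are hypothetical.
SCHEMA: Dict[str, Tuple[str, str]] = {
    "similar_compound": ("compound", "compound"),
    "bind": ("compound", "target_gene"),
    "treatment": ("compound", "disease"),
    "cause": ("compound", "side_effect"),
    "link": ("disease", "target_gene"),
    "similar_disease": ("disease", "disease"),
}

Adjacency = DefaultDict[str, List[Tuple[str, str, dict]]]


class GraphModel:
    """Minimal in-memory property graph enforcing the Table 2 schema."""

    def __init__(self) -> None:
        self.vertex_type: Dict[str, str] = {}
        # vertex id -> [(edge type, neighbour id, edge attributes)]
        self.out_edges: Adjacency = defaultdict(list)
        self.in_edges: Adjacency = defaultdict(list)

    def add_vertex(self, vid: str, vtype: str) -> None:
        self.vertex_type[vid] = vtype

    def add_edge(self, src: str, etype: str, dst: str, **attrs) -> None:
        # Raises KeyError for an unknown edge type or vertex id.
        expected = SCHEMA[etype]
        actual = (self.vertex_type[src], self.vertex_type[dst])
        if actual != expected:
            raise ValueError(f"{etype} expects {expected}, got {actual}")
        self.out_edges[src].append((etype, dst, attrs))
        self.in_edges[dst].append((etype, src, attrs))

    def neighbours(self, vid: str, etype: str) -> List[Tuple[str, dict]]:
        """Neighbours of vid over etype edges in either direction."""
        outgoing = [(n, a) for t, n, a in self.out_edges[vid] if t == etype]
        incoming = [(n, a) for t, n, a in self.in_edges[vid] if t == etype]
        return outgoing + incoming


g = GraphModel()
g.add_vertex("C1", "compound")
g.add_vertex("C2", "compound")
g.add_vertex("D1", "disease")
g.add_edge("C1", "treatment", "D1", data_source="public")
g.add_edge("C1", "similar_compound", "C2", similarity=0.87, data_source="public")
```

Keeping both outgoing and incoming adjacency lists lets undirected patterns, such as the similarity edges, be traversed from either endpoint.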
And the query prediction module is used for transmitting the query information to the graph model for prediction according to the acquired query information, and displaying the fed-back prediction result.
Specifically, queries are written in a graph query language and the prediction results are ranked. In practice, graph query languages such as Cypher and Gremlin can collapse dozens of join queries against the original relational database into a single query, reducing the amount of code; the results can also be ranked by the similarity between the compounds found. During a query, the vertex types involved correspond to at least one of the association factors; see Table 2.
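Ranking by similarity can be sketched independently of the query language. The function below assumes a query has already produced (candidate compound, similarity) pairs, one per matched path; keeping the highest similarity for a candidate reached by several paths, and the `top_k` cut-off, are assumptions rather than details from the source.

```python
from typing import Dict, Iterable, List, Tuple


def rank_candidates(
    paths: Iterable[Tuple[str, float]], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Rank predicted compounds by similarity, highest first.

    paths: (candidate compound id, similarity) pairs, one per query path;
    a candidate reached by several paths keeps its highest similarity.
    """
    best: Dict[str, float] = {}
    for cid, sim in paths:
        if sim > best.get(cid, float("-inf")):
            best[cid] = sim
    # Sort by similarity descending, then by id for a deterministic tie-break.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


ranked = rank_candidates([("C3", 0.6), ("C2", 0.9), ("C3", 0.95), ("C4", 0.9)], top_k=2)
# ranked == [("C3", 0.95), ("C2", 0.9)]
```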
Further, to facilitate a better understanding of the present solution, specific business requirements are exemplified below.
Business requirement 1:
In new drug development, searching for hit compounds takes a great deal of time and effort, and at present hit compounds are found by random screening, which is largely blind. Graph data technology can predict hit compounds from the perspectives of similarity and shared mechanism of action, thereby improving new drug development efficiency.
Query description:
Find the diseases similar to a given disease, for example CERVICAL CANCER (cervical cancer);
Find compounds that can treat those similar diseases, and take them as predicted hit compounds.
Query statement:
// Find diseases similar to CERVICAL CANCER (cervical cancer), and compounds that treat those similar diseases
MATCH p = (j:disease {name: 'CERVICAL CANCER'})-[r:similar_disease]-(h1)-[r1:treatment]-(f)
// Return the paths to compounds that treat the similar diseases
RETURN p
Referring to FIG. 3, which shows the query results: the query first finds the diseases similar to cervical cancer, namely uterine cancer and ovarian cancer; then, following the treatment association factor, it finds compounds that can treat these similar diseases and takes them as predicted hit compounds.
From FIG. 3, compounds that may be able to treat CERVICAL CANCER (cervical cancer) can be found through disease similarity, and early experimental verification can be performed on compounds that treat both similar diseases.
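The logic of the cervical cancer query can be mirrored outside a graph database to show what the pattern matches. This Python sketch assumes edges are (start id, edge type, end id) triples with hypothetical labels and treats similar-disease edges as undirected, like the undirected Cypher pattern. Returning the set of similar diseases each candidate treats makes it easy to prioritize compounds that treat both, as suggested above; the compounds X1 to X3 are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (start id, edge type, end id); labels are hypothetical


def predict_via_similar_disease(edges: List[Edge], disease: str) -> Dict[str, Set[str]]:
    """Map each candidate compound to the diseases similar to `disease` that it treats.

    Mirrors (disease)-[similar_disease]-(d2)-[treatment]-(compound), with
    similar_disease read in both directions.
    """
    similar: Set[str] = set()
    for src, etype, dst in edges:
        if etype == "similar_disease":
            if src == disease:
                similar.add(dst)
            elif dst == disease:
                similar.add(src)
    candidates: Dict[str, Set[str]] = {}
    for src, etype, dst in edges:
        if etype == "treatment" and dst in similar:
            candidates.setdefault(src, set()).add(dst)
    return candidates


edges = [
    ("cervical cancer", "similar_disease", "uterine cancer"),
    ("ovarian cancer", "similar_disease", "cervical cancer"),
    ("X1", "treatment", "uterine cancer"),  # X1, X2, X3: hypothetical compounds
    ("X1", "treatment", "ovarian cancer"),
    ("X2", "treatment", "ovarian cancer"),
    ("X3", "treatment", "cervical cancer"),
]
hits = predict_via_similar_disease(edges, "cervical cancer")
# Compounds treating more of the similar diseases come first.
ranked = sorted(hits, key=lambda c: (-len(hits[c]), c))
```

X3 is not returned because it treats cervical cancer itself rather than a similar disease, matching the two-hop pattern of the query.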
Business requirement 2:
Query description:
Find compounds that can treat the disease sarcoma;
Find compounds similar to those compounds, and take them as predicted hit compounds.
Query statement:
// Find compounds similar to those that can treat the disease sarcoma
MATCH p = (j:disease {name: 'sarcoma'})-[r:treatment]-(h1)-[r1:similar_compound]-(f)
// Return the compounds that treat sarcoma (h1) and the predicted hit compounds (f)
RETURN p
Finally, compounds that may be able to treat sarcoma are found through compound similarity; the candidates are ranked by similarity and then verified experimentally.
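The sarcoma query followed by the similarity ranking can be sketched as a single Python function. The input shapes, names and similarity values are hypothetical, and excluding compounds already known to treat the disease from the candidate list is an added assumption (the Cypher pattern above returns them as part of the path).

```python
from typing import Dict, List, Tuple


def predict_via_similar_compound(
    treats: List[Tuple[str, str]],          # (compound, disease)
    similar: List[Tuple[str, str, float]],  # (compound, compound, similarity)
    disease: str,
) -> List[Tuple[str, float]]:
    """Rank compounds similar to those known to treat `disease`, most similar first."""
    known = {c for c, d in treats if d == disease}
    best: Dict[str, float] = {}
    for a, b, sim in similar:
        # Similarity edges are symmetric, so try both orientations.
        for src, cand in ((a, b), (b, a)):
            if src in known and cand not in known:
                best[cand] = max(sim, best.get(cand, float("-inf")))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative data: K* are known treating compounds, N* are candidates.
treats = [("K1", "sarcoma"), ("K2", "sarcoma"), ("K3", "other disease")]
similar = [("K1", "N1", 0.8), ("N2", "K2", 0.9), ("K1", "N2", 0.7), ("K1", "K2", 0.95)]
ranked = predict_via_similar_compound(treats, similar, "sarcoma")
```

A candidate similar to several known treating compounds keeps its highest similarity score.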
Business requirement 3:
Query description:
Find compounds that can treat the disease primary biliary cirrhosis;
Find the target genes and side effects of these compounds;
Find compounds that share the same target genes and side effects with these compounds, and take them as predicted hit compounds.
Query statement:
// Find compounds that treat primary biliary cirrhosis, and compounds that share a side effect and a bound target gene with them
MATCH p = (j:disease {name: 'primary biliary cirrhosis'})<-[r:treatment]-(h1:compound)-[r1:cause]->(f)<-[r2:cause]-(h2:compound)-[r3:bind]->(b)<-[r4:bind]-(h1)
// Compounds (h2) with the same side effects and bound genes as the compounds treating primary biliary cirrhosis are taken as predicted hit compounds
RETURN p
Finally, compounds that may be able to treat primary biliary cirrhosis are found through shared binding genes and side effects, and can then be verified experimentally.
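The shared-profile pattern of the primary biliary cirrhosis query, a candidate `h2` sharing at least one side effect `f` and at least one bound target gene `b` with a treating compound `h1`, can be sketched as follows. Input shapes and names are hypothetical; as in the Cypher pattern, a candidate is not excluded merely because it also treats the disease.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def group(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Group (compound, x) pairs into {compound: {x, ...}}."""
    out: Dict[str, Set[str]] = {}
    for compound, item in pairs:
        out.setdefault(compound, set()).add(item)
    return out


def predict_via_shared_profile(
    treats: List[Pair],  # (compound, disease)
    causes: List[Pair],  # (compound, side effect)
    binds: List[Pair],   # (compound, target gene)
    disease: str,
) -> Dict[str, Set[str]]:
    """Map each candidate h2 to the treating compounds h1 with which it shares
    at least one side effect and at least one bound target gene."""
    side_effects, genes = group(causes), group(binds)
    known = {c for c, d in treats if d == disease}
    empty: Set[str] = set()
    candidates: Dict[str, Set[str]] = {}
    for h1 in known:
        for h2 in set(side_effects) | set(genes):
            if h2 == h1:
                continue
            shares_effect = side_effects.get(h1, empty) & side_effects.get(h2, empty)
            shares_gene = genes.get(h1, empty) & genes.get(h2, empty)
            if shares_effect and shares_gene:
                candidates.setdefault(h2, set()).add(h1)
    return candidates


# Illustrative data: only B shares both a side effect and a target gene with A.
treats = [("A", "primary biliary cirrhosis")]
causes = [("A", "s1"), ("B", "s1"), ("C", "s1"), ("D", "s2")]
binds = [("A", "g1"), ("B", "g1"), ("C", "g2"), ("D", "g1")]
hits = predict_via_shared_profile(treats, causes, binds, "primary biliary cirrhosis")
```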
By adopting this scheme, a graph model is obtained by building an association network from compound, disease, target gene and side effect information, so that the associations among compounds, diseases, target genes and side effects are fully displayed. This helps new drug researchers quickly discover the relationships among compounds, diseases and target genes, accelerates new drug development, and improves development efficiency.
Based on the inventive concept of the system, and referring to FIG. 4, an embodiment of the invention further provides a new drug development management method based on graph data, applied to the above new drug development management system based on graph data. The method includes:
S101, acquiring and integrating medical data, wherein the medical data includes compound information, disease information, target gene information and side effect information.
Specifically, the medical data includes data publicly available on the internet and data accumulated by pharmaceutical companies themselves.
The compound information includes compound ID, compound name, data source, international compound identifier, and similar compound information;
the disease information includes a disease ID, a disease name, and similar disease information;
the target gene information comprises target gene ID, target gene name, gene description and chromosome;
the side effect information includes a side effect ID and a side effect name.
S102, constructing a graph model according to the medical data; wherein each compound, disease, target gene and side effect are regarded as vertices, and the correlation factor between vertices is regarded as edges.
Specifically, the association factors include a plurality of factors, namely similar compound, similar disease, binding, treatment, causing and linking, and each factor is taken as a corresponding edge type.
Correspondingly, when the edge type is similar compound, the start vertex type and end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and end vertex type are both diseases;
when the edge type is binding, the start vertex type is compound and the end vertex type is target gene;
when the edge type is treatment, the start vertex type is compound and the end vertex type is disease;
when the edge type is causing, the start vertex type is compound and the end vertex type is side effect;
when the edge type is linking, the start vertex type is disease and the end vertex type is target gene.
S103, according to the acquired query information, transmitting the query information to the graph model for prediction, and displaying the fed-back prediction result.
Specifically, queries are written in a graph query language and the prediction results are ranked. In practice, graph query languages such as Cypher and Gremlin can collapse dozens of join queries against the original relational database into a single query, reducing the amount of code; the results can also be ranked by the similarity between the compounds found.
It should be noted that, for more specific working processes and examples of the method, reference may be made to the foregoing system embodiment, which will not be repeated here.
With this method, the constructed graph model presents the associations among compounds, diseases and genes comprehensively, helping new drug researchers quickly discover the relationships among compounds, diseases and genes and accelerating new drug development.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
Claims (4)
1. A new drug research and development management system based on graph data, characterized in that it comprises:
a data acquisition module, configured to acquire and integrate medical data, wherein the medical data includes compound information, disease information, target gene information, and side effect information;
a graph data module, configured to construct a graph model from the medical data, wherein each compound, disease, target gene, and side effect is treated as a vertex, and each association factor between vertices is treated as an edge; the association factors include similar compound, similar disease, bind, treat, cause, and link, and each factor serves as a corresponding edge type; each vertex type corresponds to at least one association factor;
a query prediction module, configured to pass acquired query information to the graph model for prediction and to display the returned prediction results;
in the prediction, hit compounds are predicted from two angles, similarity and a shared mechanism of action:
similar diseases are searched for, and, via the treat association factor, compounds that treat those similar diseases are taken as predicted hit compounds;
compounds that have the same target genes and side effects as a given compound are found and taken as predicted hit compounds;
queries are written in a graph query language, and the prediction results are ranked and then verified experimentally;
the compound information includes a compound ID, compound name, data source, international compound identifier, and similar compound information, wherein the similar compound information includes the similarity between two compounds;
the disease information includes a disease ID, a disease name, and similar disease information;
the target gene information includes a target gene ID, target gene name, gene description, and chromosome;
the side effect information includes a side effect ID and a side effect name;
compounds cause side effects, compounds bind to target genes, compounds treat diseases, and diseases are linked to target genes.
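The graph model of claim 1 can be pictured as a small typed property graph. The Python sketch below illustrates that picture under stated assumptions and is not the patentee's implementation: the class names (`Compound`, `Disease`, `TargetGene`, `SideEffect`, `MedicalGraph`), the field names, and the snake_case edge labels are hypothetical, and an in-memory adjacency map stands in for whatever graph database the system actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Vertex records holding the fields listed in claim 1.
# All class and field names are hypothetical.
@dataclass(frozen=True)
class Compound:
    id: str
    name: str
    source: str = ""    # data source
    intl_id: str = ""   # international compound identifier

@dataclass(frozen=True)
class Disease:
    id: str
    name: str

@dataclass(frozen=True)
class TargetGene:
    id: str
    name: str
    description: str = ""
    chromosome: str = ""

@dataclass(frozen=True)
class SideEffect:
    id: str
    name: str

@dataclass
class MedicalGraph:
    """In-memory property graph: vertices keyed by ID, typed directed edges."""
    vertices: dict = field(default_factory=dict)
    # (edge_type, start_id) -> list of (end_id, edge properties)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_vertex(self, vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge_type: str, start_id: str, end_id: str, **props) -> None:
        # Both endpoints must already exist as vertices.
        if start_id not in self.vertices or end_id not in self.vertices:
            raise KeyError(f"unknown endpoint in {edge_type}: {start_id} -> {end_id}")
        self.out_edges[(edge_type, start_id)].append((end_id, props))

    def neighbors(self, edge_type: str, start_id: str) -> list:
        """IDs reached from start_id over edges of the given type."""
        return [end for end, _ in self.out_edges.get((edge_type, start_id), [])]

# Usage with placeholder IDs (illustrative data only).
g = MedicalGraph()
g.add_vertex(Compound("C1", "compound-1"))
g.add_vertex(Compound("C2", "compound-2"))
g.add_vertex(Disease("D1", "disease-1"))
g.add_edge("treat", "C1", "D1")
g.add_edge("similar_compound", "C1", "C2", similarity=0.8)  # similarity stored on the edge
print(g.neighbors("treat", "C1"))  # ['D1']
```

Storing the similarity value on the similar-compound edge mirrors the claim's statement that similar compound information includes the similarity between two compounds.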
2. The new drug development management system based on graph data of claim 1, wherein: when the edge type is similar compound, the start vertex type and the end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and the end vertex type are both diseases;
when the edge type is bind, the start vertex type is a compound and the end vertex type is a target gene;
when the edge type is treat, the start vertex type is a compound and the end vertex type is a disease;
when the edge type is cause, the start vertex type is a compound and the end vertex type is a side effect;
when the edge type is link, the start vertex type is a disease and the end vertex type is a target gene.
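Claim 2 fixes the start and end vertex types of each edge type. One way to enforce that rule when loading data is a lookup table, as in this hedged Python sketch. The snake_case edge labels, the vertex type names, and the helper functions `check_edge` and `edge_types_for` are assumptions chosen for illustration.

```python
# Edge type -> (start vertex type, end vertex type), as enumerated in claim 2.
# Labels and type names are hypothetical renderings of the translated terms.
EDGE_SCHEMA: dict[str, tuple[str, str]] = {
    "similar_compound": ("Compound", "Compound"),
    "similar_disease": ("Disease", "Disease"),
    "bind": ("Compound", "TargetGene"),
    "treat": ("Compound", "Disease"),
    "cause": ("Compound", "SideEffect"),
    "link": ("Disease", "TargetGene"),
}

def check_edge(edge_type: str, start_type: str, end_type: str) -> None:
    """Raise ValueError if an edge's endpoint types do not match the schema."""
    if edge_type not in EDGE_SCHEMA:
        raise ValueError(f"unknown edge type: {edge_type}")
    expected = EDGE_SCHEMA[edge_type]
    if (start_type, end_type) != expected:
        raise ValueError(
            f"{edge_type} expects {expected[0]} -> {expected[1]}, "
            f"got {start_type} -> {end_type}"
        )

def edge_types_for(vertex_type: str) -> set[str]:
    """Edge types in which a vertex type can appear as start or end."""
    return {t for t, (s, e) in EDGE_SCHEMA.items() if vertex_type in (s, e)}

print(sorted(edge_types_for("TargetGene")))  # ['bind', 'link']
```

`edge_types_for` also makes the claim 1 condition checkable: every vertex type should map to at least one edge type.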
3. A new drug research and development management method based on graph data, characterized in that it is applied to the graph-data-based new drug development management system of claim 1, the method comprising:
acquiring and integrating medical data, wherein the medical data includes compound information, disease information, target gene information, and side effect information;
constructing a graph model from the medical data, wherein each compound, disease, target gene, and side effect is treated as a vertex, and each association factor between vertices is treated as an edge; the association factors include similar compound, similar disease, bind, treat, cause, and link, and each factor serves as a corresponding edge type; each vertex type corresponds to at least one association factor;
passing acquired query information to the graph model for prediction, and displaying the returned prediction results;
in the prediction, predicting hit compounds from two angles, similarity and a shared mechanism of action:
searching for similar diseases and, via the treat association factor, taking compounds that treat those similar diseases as predicted hit compounds;
finding compounds that have the same target genes and side effects as a given compound, and taking them as predicted hit compounds;
writing queries in a graph query language, ranking the prediction results, and then verifying them experimentally;
the compound information includes a compound ID, compound name, data source, international compound identifier, and similar compound information, wherein the similar compound information includes the similarity between two compounds;
the disease information includes a disease ID, a disease name, and similar disease information;
the target gene information includes a target gene ID, target gene name, gene description, and chromosome;
the side effect information includes a side effect ID and a side effect name;
compounds cause side effects, compounds bind to target genes, compounds treat diseases, and diseases are linked to target genes.
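The two prediction routes in claim 3 can be expressed as simple set operations over the edge lists. The following Python sketch is illustrative only. The function names, the dictionary-of-sets input format, and the additive scoring used for ranking are all assumptions, because the claims specify neither how results are scored and sorted nor whether "the same target genes and side effects" means full or partial overlap. The sketch requires at least one shared target gene and at least one shared side effect.

```python
from collections import Counter

def predict_by_similar_disease(disease, similar_disease, treat):
    """Route 1: compounds that treat diseases similar to the query disease.

    similar_disease: disease ID -> set of similar disease IDs
    treat: compound ID -> set of disease IDs the compound treats
    Score = number of similar diseases the compound treats.
    """
    similar = similar_disease.get(disease, set())
    scores = Counter()
    for compound, treated in treat.items():
        overlap = len(treated & similar)
        if overlap:
            scores[compound] = overlap
    return scores

def predict_by_shared_mechanism(compound, bind, cause):
    """Route 2: compounds sharing target genes and side effects with the query compound.

    bind: compound ID -> set of target gene IDs
    cause: compound ID -> set of side effect IDs
    Keeps candidates with at least one shared gene AND one shared side effect;
    score = shared genes + shared side effects (assumed scoring).
    """
    genes = bind.get(compound, set())
    effects = cause.get(compound, set())
    scores = Counter()
    for other in bind.keys() | cause.keys():
        if other == compound:
            continue
        shared_genes = len(genes & bind.get(other, set()))
        shared_effects = len(effects & cause.get(other, set()))
        if shared_genes and shared_effects:
            scores[other] = shared_genes + shared_effects
    return scores

def rank(*score_maps):
    """Sum scores across routes; sort by score (desc), then ID, for experimental verification."""
    total = Counter()
    for scores in score_maps:
        total.update(scores)
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))

# Toy data with placeholder IDs (not real compounds or diseases).
similar_disease = {"D1": {"D2", "D3"}}
treat = {"C1": {"D2"}, "C2": {"D2", "D3"}, "C3": {"D9"}}
bind = {"C0": {"G1", "G2"}, "C4": {"G1"}, "C5": {"G2"}}
cause = {"C0": {"S1"}, "C4": {"S1"}, "C5": {"S2"}}

route1 = predict_by_similar_disease("D1", similar_disease, treat)
route2 = predict_by_shared_mechanism("C0", bind, cause)
print(rank(route1, route2))  # [('C2', 2), ('C4', 2), ('C1', 1)]
```

In the claimed system these lookups would instead be issued as graph-query-language traversals against the graph model. The in-memory version only shows the shape of the two traversals: query disease to similar diseases to the compounds that treat them, and query compound to its target genes and side effects to other compounds sharing them.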
4. The new drug development management method based on graph data of claim 3, wherein: when the edge type is similar compound, the start vertex type and the end vertex type are both compounds;
when the edge type is similar disease, the start vertex type and the end vertex type are both diseases;
when the edge type is bind, the start vertex type is a compound and the end vertex type is a target gene;
when the edge type is treat, the start vertex type is a compound and the end vertex type is a disease;
when the edge type is cause, the start vertex type is a compound and the end vertex type is a side effect;
when the edge type is link, the start vertex type is a disease and the end vertex type is a target gene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111526092.3A CN114255885B (en) | 2021-12-14 | 2021-12-14 | New drug research and development management system and method based on graph data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114255885A (en) | 2022-03-29 |
CN114255885B (en) | 2024-09-13 |
Family ID: 80792178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111526092.3A Active CN114255885B (en) | 2021-12-14 | 2021-12-14 | New drug research and development management system and method based on graph data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114255885B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191014A (en) * | 2019-12-26 | 2020-05-22 | 上海科技发展有限公司 | Medicine relocation method, system, terminal and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1490822A2 (en) * | 2002-02-04 | 2004-12-29 | Ingenuity Systems Inc. | Drug discovery methods |
KR101117603B1 (en) * | 2011-08-16 | 2012-03-07 | (주)신테카바이오 | System and method for providing functional correlation information of biomedical data by generating inter-linkable maps |
US20150371009A1 (en) * | 2014-06-19 | 2015-12-24 | Jake Yue Chen | Drug identification models and methods of using the same to identify compounds to treat disease |
CN109325131B (en) * | 2018-09-27 | 2021-03-02 | 大连理工大学 | Medicine identification method based on biomedical knowledge map reasoning |
KR102225278B1 (en) * | 2020-01-31 | 2021-03-10 | 주식회사 스탠다임 | Prediction Method for Disease, Gene or Protein related Query Entity and built Prediction System using the same |
CN113742443B (en) * | 2020-05-29 | 2024-09-10 | 京东方科技集团股份有限公司 | Multi-drug sharing query method, mobile terminal and storage medium |
CN113707264B (en) * | 2021-08-31 | 2024-09-06 | 平安科技(深圳)有限公司 | Machine learning-based medicine recommendation method, device, equipment and medium |
2021-12-14: CN application CN202111526092.3A; patent CN114255885B (en), status Active
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Agrawal et al. | Large language models are few-shot clinical information extractors | |
Bravo et al. | Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research | |
Chao et al. | Multi-view cluster analysis with incomplete data to understand treatment effects | |
CN109493925B (en) | Method for determining incidence relation between medicine and medicine target | |
Dhombres et al. | Interoperability between phenotypes in research and healthcare terminologies—Investigating partial mappings between HPO and SNOMED CT | |
Zhu et al. | Biomedical text mining and its applications in cancer research | |
Veronin et al. | A systematic approach to 'cleaning' of drug name records data in the FAERS database: a case report | |
Hettne et al. | Rewriting and suppressing UMLS terms for improved biomedical term identification | |
Sinisi et al. | Optimal personalised treatment computation through in silico clinical trials on patient digital twins | |
Wei et al. | SimConcept: A hybrid approach for simplifying composite named entities in biomedicine | |
CN114860887A (en) | Disease content pushing method, device, equipment and medium based on intelligent association | |
CN114255885B (en) | New drug research and development management system and method based on graph data | |
Lin et al. | Outcomes of out-of-hospital cardiac arrests after a decade of system-wide initiatives optimising community chain of survival in Taipei city | |
Weinzierl et al. | The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts | |
CN115376704A (en) | Medicine-disease interaction prediction method fusing multi-neighborhood correlation information | |
CN113064960A (en) | Method for accurately searching cases similar to patient's condition | |
Shi et al. | Predicting binary, discrete and continued lncRNA-disease associations via a unified framework based on graph regression | |
Di Lena et al. | MIMO: an efficient tool for molecular interaction maps overlap | |
Mortensen et al. | Modest Use of Ontology Design Patterns in a Repository of Biomedical Ontologies. | |
Gravina et al. | Controlling astrocyte-mediated synaptic pruning signals for schizophrenia drug repurposing with deep graph networks | |
Samuel et al. | Mining online full-text literature for novel protein interaction discovery | |
CN114121293A (en) | Clinical trial information mining and inquiring method and device | |
US20200303033A1 (en) | System and method for data curation | |
CN112667809A (en) | Text processing method and device, electronic equipment and storage medium | |
CN114765060A (en) | Multi-attention method for predicting drug target interaction |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |