CN114255885B

CN114255885B - New drug research and development management system and method based on graph data

Info

Publication number: CN114255885B
Application number: CN202111526092.3A
Authority: CN
Inventors: 张晨
Original assignee: Zhejiang Create Link Technology Co ltd
Current assignee: Zhejiang Create Link Technology Co ltd
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2024-09-13
Anticipated expiration: 2041-12-14
Also published as: CN114255885A

Abstract

The embodiment of the invention discloses a new drug research and development management system and method based on graph data. The system comprises: a data acquisition module for acquiring and integrating medical data, the medical data including compound information, disease information, target gene information, and side effect information; a graph data module for constructing a graph model from the medical data, in which each compound, disease, target gene, and side effect is regarded as a vertex and each association factor between vertices is regarded as an edge; and a query prediction module for transmitting acquired query information to the graph model for prediction and displaying the fed-back prediction result. The beneficial effects are as follows: by constructing an association network from information on compounds, diseases, target genes, side effects, and the like, a graph model is obtained that helps new drug researchers quickly find the relationships among compounds, diseases, and target genes, which accelerates new drug development and improves its efficiency.

Description

New drug research and development management system and method based on graph data

Technical Field

The invention relates to the technical field of information processing, and in particular to a new drug research and development management system and method based on graph data.

Background

The development of new drugs is time-consuming, costly, and labor-intensive. Up to billions of data records accumulate during development, describing how various compounds treat diseases, which genes they target, what side effects they cause while treating diseases, and so on. The data are huge in volume and complex in their associations. If the value of these associated data could be released quickly, the new drug development cycle would be greatly shortened, and more patients could receive new drugs sooner and be relieved of their suffering.

However, these data are stored in relational databases, producing ten or more terabyte-scale relational tables. Each query requires writing about ten query statements that join several relational tables, and a great deal of time is consumed before a result is obtained. Moreover, new drug development involves many stages, each of which requires many associated queries over large amounts of data. The inability to query these vast amounts of associated data quickly has become a major obstacle to improving the efficiency of new drug development.

Disclosure of Invention

The invention aims to provide a new drug research and development management system and method based on graph data, so as to help new drug developers quickly discover the relationships among compounds, diseases, and target genes and to accelerate development progress.

First aspect: a new drug development management system based on graph data, comprising:

a data acquisition module for acquiring and integrating medical data; wherein the medical data includes compound information, disease information, target gene information, and side effect information;

a graph data module for constructing a graph model according to the medical data; wherein each compound, disease, target gene, and side effect is regarded as a vertex, and each association factor between vertices is regarded as an edge;

and a query prediction module for transmitting acquired query information to the graph model for prediction, and displaying the fed-back prediction result.

Preferably, the compound information includes compound ID, compound name, data source, international compound identification, and similar compound information;

the disease information includes a disease ID, a disease name, and similar disease information;

the target gene information comprises target gene ID, target gene name, gene description and chromosome;

the side effect information includes a side effect ID and a side effect name.

Preferably, the association factors include a plurality of factors, namely similar compound, similar disease, bind, treatment, cause, and link, and each factor is taken as a corresponding edge type.

Preferably, when the edge type is similar compound, the corresponding start point type and end point type are both compound;

when the edge type is similar disease, the corresponding start point type and end point type are both disease;

when the edge type is bind, the corresponding start point type is compound, and the end point type is target gene;

when the edge type is treatment, the corresponding start point type is compound, and the end point type is disease;

when the edge type is cause, the corresponding start point type is compound, and the end point type is side effect;

when the edge type is link, the corresponding start point type is disease, and the end point type is target gene.

Preferably, a graph query language is used for queries, and the prediction results are ranked.

Second aspect: a new drug development management method based on graph data, applied to the new drug development management system based on graph data of the first aspect, the method comprising the following steps:

acquiring and integrating medical data; wherein the medical data includes compound information, disease information, target gene information, and side effect information;

constructing a graph model according to the medical data; wherein each compound, disease, target gene, and side effect is regarded as a vertex, and each association factor between vertices is regarded as an edge;

and transmitting acquired query information to the graph model for prediction, and displaying the fed-back prediction result.

Preferably, the side effect information includes a side effect ID and a side effect name.

The above technical scheme has the following advantages: in the new drug research and development management system and method based on graph data, a graph model is obtained by constructing an association network from information on compounds, diseases, target genes, side effects, and the like, so that the associations among compounds, diseases, target genes, and side effects are fully displayed. This helps new drug developers quickly find the relationships among compounds, diseases, and target genes, accelerates new drug development, and improves its efficiency.

Drawings

FIG. 1 is a system block diagram of a new drug development management system based on graph data provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a graph model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a prediction result according to an embodiment of the present invention;

Fig. 4 is a flowchart of a new drug development management method based on graph data according to an embodiment of the present invention.

Detailed Description

Specific embodiments of the invention will be described in detail below, it being noted that the embodiments described herein are for illustration only and are not intended to limit the invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that: no such specific details are necessary to practice the invention. In other instances, well-known circuits, software, or methods have not been described in detail in order not to obscure the invention.

Throughout the specification, references to "one embodiment," "an embodiment," "one example," or "an example" mean: a particular feature, structure, or characteristic described in connection with the embodiment or example is included within at least one embodiment of the invention. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "one example," or "an example" in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, or characteristics may be combined in any suitable combination and/or sub-combination in one or more embodiments or examples. Moreover, those of ordinary skill in the art will appreciate that the illustrations provided herein are for illustrative purposes and that the illustrations are not necessarily drawn to scale.

The present invention will be described in detail with reference to the accompanying drawings.

Referring to fig. 1 and fig. 2, a new drug development management system based on graph data provided by an embodiment of the present invention includes:

the data acquisition module is used for acquiring and integrating the medical data; wherein the medical data includes compound information, disease information, target gene information, and side effect information.

Specifically, the medical data includes publicly disclosed medical data obtained from the internet and data accumulated by pharmaceutical companies themselves, and these data are taken as a sample data set. Scale of the sample data set: it contains 137 diseases, 1,552 compounds, 5,734 side effects, 20,945 target genes, and 170,000 edges representing relationships between vertices such as similarity and treatment; wherein:

the sample dataset content details:

Compound information: such as compound ID, compound name, data source, international compound identification, url;

Disease information: such as disease ID, disease name, data source, url;

target gene information: such as target gene ID, target gene name, data source, url, gene description, chromosome;

Side effect information: such as side effect ID, side effect name, data source, url;

Similar compound information: such as the similarity between two compounds, data source;

Similar disease information: such as data source;

Relationship information: compound causes side effect, compound binds to target gene, compound treats disease, and disease links to target gene.
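The record types listed above can be sketched as plain data classes. This is only an illustrative sketch: the field names, the `international_id` field (standing in for the "international compound identification", whose format the text does not specify), and the example values are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compound:
    compound_id: str
    name: str
    data_source: str
    international_id: Optional[str] = None  # assumed field name; format not stated in the text
    url: Optional[str] = None


@dataclass
class Disease:
    disease_id: str
    name: str
    data_source: str
    url: Optional[str] = None


@dataclass
class TargetGene:
    gene_id: str
    name: str
    data_source: str
    description: str = ""
    chromosome: str = ""
    url: Optional[str] = None


@dataclass
class SideEffect:
    side_effect_id: str
    name: str
    data_source: str
    url: Optional[str] = None


@dataclass
class SimilarCompound:
    # Relationship record: similarity between two compounds plus its data source.
    first_compound_id: str
    second_compound_id: str
    similarity: float
    data_source: str


# Hypothetical example values, for illustration only.
example_gene = TargetGene("G001", "gene_1", "hypothetical_source", chromosome="17")
example_pair = SimilarCompound("C001", "C002", 0.87, "hypothetical_source")
```

Keeping relationship records such as `SimilarCompound` separate from the vertex records matches the way the sample data set lists them as separate items.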

The graph data module is used for constructing a graph model according to the medical data; wherein each compound, disease, target gene, and side effect is regarded as a vertex, and each association factor between vertices is regarded as an edge.

In particular, the association factors include a plurality of factors, namely similar compound, similar disease, bind, treatment, cause, and link, and each factor is taken as a corresponding edge type.

Referring to table 1, the point types in the graph model are:

TABLE 1

Correspondingly, when the edge type is similar compound, the corresponding start point type and end point type are both compound;

Specifically, referring to table 2, the edge types in the graph model are:

TABLE 2

Start point type	Edge type	End point type	Attributes
Compound	Similar compound	Compound	Similarity, data source
Compound	Bind	Target gene	Data source
Compound	Treatment	Disease	Data source
Compound	Cause	Side effect	Data source
Disease	Link	Target gene	Data source
Disease	Similar disease	Disease	Data source
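The start and end point constraints in Table 2 can be expressed as a small lookup used to validate edges before they are loaded into the graph. The following is a minimal sketch; the identifier spellings (`similar_compound`, `target_gene`, and so on) and the function name `is_valid_edge` are assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Edge type -> (start point type, end point type), following Table 2.
EDGE_SCHEMA: Dict[str, Tuple[str, str]] = {
    "similar_compound": ("compound", "compound"),
    "bind": ("compound", "target_gene"),
    "treatment": ("compound", "disease"),
    "cause": ("compound", "side_effect"),
    "link": ("disease", "target_gene"),
    "similar_disease": ("disease", "disease"),
}


def is_valid_edge(edge_type: str, start_type: str, end_type: str) -> bool:
    """Return True if the edge type exists and its endpoints match the schema."""
    expected = EDGE_SCHEMA.get(edge_type)
    return expected == (start_type, end_type)
```

For example, `is_valid_edge("treatment", "compound", "disease")` is accepted, while the reversed direction or an unknown edge type is rejected.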

Specifically, a graph query language is used for queries, and the prediction results are ranked. In application, graph query languages such as Cypher and Gremlin can condense the dozens of join queries required by the original relational database into a single query, which reduces the amount of code; meanwhile, the results can be ranked according to the similarity between the obtained compounds. At query time, the vertex types involved correspond to at least one of the association factors; see Table 2 for details.

Further, to facilitate a better understanding of the present solution, specific business requirements are exemplified below.

Business requirement 1:

In new drug development, searching for hit compounds takes a great deal of time and effort. At the present stage, hit compounds are found by random screening, which is blind. Graph data technology can be used to predict hit compounds from the perspectives of similarity and shared mechanism of action, thereby improving the efficiency of new drug development.

Query description:

find diseases similar to a given disease, for example CERVICAL CANCER (cervical cancer);

find compounds capable of treating the similar diseases, and take them as predicted hit compounds.

Query statement:

// Find the diseases similar to CERVICAL CANCER (cervical cancer) and the compounds that treat those similar diseases
MATCH p = (j:disease {name: 'CERVICAL CANCER'})-[r:`similar disease`]-(h1)-[r1:treatment]-(f)
// Return the similar diseases and the predicted hit compounds
RETURN p

Referring to FIG. 3, the query result is shown. The query first finds the diseases similar to cervical cancer, namely uterine cancer and ovarian cancer; then, following the treatment association factor, it finds the compounds capable of treating those similar diseases and takes them as predicted hit compounds;

From FIG. 3, compounds that may be able to treat the disease CERVICAL CANCER (cervical cancer) can be found through disease similarity, and compounds capable of treating both similar diseases can be experimentally verified first.
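The two-hop traversal in this query (disease, similar disease, treating compound) can be illustrated with a small in-memory sketch. It is not the patented implementation: the edge-list representation, the function names, and the compound names are hypothetical, and only the two similar diseases come from the description of FIG. 3.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index (vertex, edge_type) -> neighbours, ignoring direction like the query pattern."""
    adjacency = defaultdict(set)
    for start, edge_type, end in edges:
        adjacency[(start, edge_type)].add(end)
        adjacency[(end, edge_type)].add(start)
    return adjacency


def hit_compounds_via_similar_diseases(edges, disease):
    """Map each predicted hit compound to the similar diseases it treats."""
    adjacency = build_adjacency(edges)
    hits = defaultdict(set)
    for similar in adjacency[(disease, "similar_disease")]:
        for compound in adjacency[(similar, "treatment")]:
            hits[compound].add(similar)
    return dict(hits)


# Hypothetical sample edges: (start vertex, edge type, end vertex).
edges = [
    ("CERVICAL CANCER", "similar_disease", "uterine cancer"),
    ("ovarian cancer", "similar_disease", "CERVICAL CANCER"),
    ("compound_A", "treatment", "uterine cancer"),
    ("compound_A", "treatment", "ovarian cancer"),
    ("compound_B", "treatment", "ovarian cancer"),
    ("compound_C", "treatment", "sarcoma"),
]
result = hit_compounds_via_similar_diseases(edges, "CERVICAL CANCER")
```

In this sample, `compound_A` treats both similar diseases, so under the reasoning above it would be a natural first candidate for experimental verification.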

Business requirement 2:

Query description:

find compounds capable of treating the disease sarcomas (sarcoma);

find compounds similar to the above compounds, and take them as predicted hit compounds.

Query statement:

// Find the compounds similar to the compounds capable of treating the disease sarcoma
MATCH p = (j:disease {name: 'sarcoma'})-[r:treatment]-(h1)-[r1:`similar compound`]-(f)
// Return the compounds that treat the disease sarcoma and the predicted hit compounds
RETURN p

Finally, compounds that may be able to treat the disease sarcoma are found through compound similarity, and experimental verification is carried out after the compounds are ranked by similarity.
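The ranking step mentioned here can be sketched as follows. This is an assumption-laden illustration: the tuple layouts, the function name, the compound names, and the similarity scores are hypothetical, and excluding compounds already known to treat the disease is a design choice of this sketch that the query itself does not make.

```python
def rank_similar_compounds(treats, similar, disease):
    """Return (candidate, best similarity) pairs, highest similarity first."""
    known = {compound for compound, treated in treats if treated == disease}
    best = {}
    for first, second, score in similar:
        # Similarity is treated as symmetric, so check both directions.
        for source, candidate in ((first, second), (second, first)):
            if source in known and candidate not in known:
                best[candidate] = max(score, best.get(candidate, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data.
treats = [
    ("compound_A", "sarcoma"),
    ("compound_B", "sarcoma"),
    ("compound_C", "CERVICAL CANCER"),
]
similar = [
    ("compound_A", "compound_X", 0.62),
    ("compound_Y", "compound_B", 0.91),
    ("compound_A", "compound_B", 0.80),
    ("compound_X", "compound_B", 0.75),
    ("compound_C", "compound_Z", 0.99),
]
ranking = rank_similar_compounds(treats, similar, "sarcoma")
```

When a candidate is similar to several known compounds, the sketch keeps its highest score; other aggregation rules (for example, an average) would be equally plausible.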

Business requirement 3:

Query description:

find compounds capable of treating the disease primary biliary cirrhosis;

find the target genes and side effects of those compounds;

find compounds that have the same target genes and side effects as those compounds, and take them as predicted hit compounds.

Query statement:

// Find the compounds that treat the disease primary biliary cirrhosis, and the compounds that share a side effect and a bound target gene with them
MATCH p = (j:disease {name: 'primary biliary cirrhosis'})<-[r:treatment]-(h1:compound)-[r1:cause]->(f)<-[r2:cause]-(h2:compound)-[r3:bind]->(b)<-[r4:bind]-(h1)
// Return the compounds that share side effects and bound genes with the compounds treating primary biliary cirrhosis, as predicted hit compounds
RETURN p

Finally, compounds that may be able to treat the disease primary biliary cirrhosis can be found through shared bound genes and side effects, and these compounds can then be experimentally verified.
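The pattern in this query, in which a candidate compound must share at least one side effect and at least one bound target gene with a compound that treats the disease, can be sketched in memory as below. The pair-list inputs, the function name, and all sample names are hypothetical and serve only to illustrate the matching logic.

```python
def shared_gene_and_side_effect_hits(treats, binds, causes, disease):
    """Return (known compound, hit compound, shared genes, shared side effects) tuples."""
    genes, effects = {}, {}
    for compound, gene in binds:
        genes.setdefault(compound, set()).add(gene)
    for compound, effect in causes:
        effects.setdefault(compound, set()).add(effect)

    known = sorted({compound for compound, treated in treats if treated == disease})
    # A candidate needs at least one bind edge and one cause edge.
    candidates = sorted(set(genes) & set(effects))

    hits = []
    for h1 in known:
        for h2 in candidates:
            if h2 == h1:
                continue  # the hit must be a different compound
            shared_effects = effects.get(h1, set()) & effects[h2]
            shared_genes = genes.get(h1, set()) & genes[h2]
            if shared_effects and shared_genes:
                hits.append((h1, h2, sorted(shared_genes), sorted(shared_effects)))
    return hits


# Hypothetical sample data.
treats = [("compound_A", "primary biliary cirrhosis")]
binds = [
    ("compound_A", "gene_1"),
    ("compound_X", "gene_1"),
    ("compound_Y", "gene_2"),
    ("compound_Z", "gene_1"),
]
causes = [
    ("compound_A", "nausea"),
    ("compound_X", "nausea"),
    ("compound_Y", "nausea"),
    ("compound_Z", "headache"),
]
hits = shared_gene_and_side_effect_hits(treats, binds, causes, "primary biliary cirrhosis")
```

Here only `compound_X` qualifies: `compound_Y` shares the side effect but not the gene, and `compound_Z` shares the gene but not the side effect.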

With the above scheme, a graph model is obtained by constructing an association network from information on compounds, diseases, target genes, side effects, and the like, so that the associations among compounds, diseases, target genes, and side effects are fully displayed. This helps new drug researchers quickly find the relationships among compounds, diseases, and target genes, accelerates new drug development, and improves its efficiency.

Based on the inventive concept of the system, referring to fig. 4, the embodiment of the invention further provides a new drug development management method based on graph data, which is applied to the new drug development management system based on graph data, and the method includes:

s101, acquiring and integrating medical data; wherein the medical data includes compound information, disease information, target gene information, and side effect information.

Specifically, the medical data includes medical data derived from internet disclosures, and data accumulated by pharmaceutical companies themselves.

The compound information includes compound ID, compound name, data source, international compound identity, and similar compound information;

the side effect information includes a side effect ID and a side effect name.

S102, constructing a graph model according to the medical data; wherein each compound, disease, target gene and side effect are regarded as vertices, and the correlation factor between vertices is regarded as edges.

S103, according to the acquired query information, transmitting the query information to the graph model for prediction, and displaying the fed-back prediction result.

Specifically, a graph query language is used for queries, and the prediction results are ranked. In application, graph query languages such as Cypher and Gremlin can condense the dozens of join queries required by the original relational database into a single query, which reduces the amount of code; meanwhile, the results can be ranked according to the similarity between the obtained compounds.

It should be noted that, for more specific working processes and examples of the method, please refer to the foregoing system embodiment part, and no further description is provided herein.

With this method, the constructed graph model presents the associations among compounds, diseases, and genes in all dimensions, which helps new drug researchers quickly find the relationships among compounds, diseases, and genes and accelerates new drug development.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims

1. A new drug research and development management system based on graph data, characterized by comprising:

a graph data module for constructing a graph model according to the medical data; wherein each compound, disease, target gene, and side effect is regarded as a vertex, and each association factor between vertices is regarded as an edge; the association factors include a plurality of factors, namely similar compound, similar disease, bind, treatment, cause, and link, and each factor is taken as a corresponding edge type; the vertex types involved correspond to at least one of the association factors;

the query prediction module is used for transmitting the query information to the graph model for prediction according to the acquired query information, and displaying the fed-back prediction result;

in the prediction, hit compounds are predicted from the perspectives of similarity and shared mechanism of action;

similar diseases are searched for, and, following the treatment association factor, compounds capable of treating the similar diseases are found and taken as predicted hit compounds;

compounds having the same target genes and side effects as the compounds are found and taken as predicted hit compounds;

during a query, a graph query language is used, and experimental verification is carried out after the prediction results are ranked;

the compound information includes compound ID, compound name, data source, international compound identification, and similar compound information; wherein the similar compound information includes the similarity between two compounds;

the target gene information includes target gene ID, target gene name, gene description and chromosome;

The side effect information includes a side effect ID and a side effect name;

and information on compounds causing side effects, compounds binding to target genes, compounds treating diseases, and diseases linking to target genes.

2. The new drug development management system based on graph data of claim 1, wherein: when the edge type is similar compound, the corresponding starting point type and ending point type are both compounds;

3. A new drug research and development management method based on graph data, characterized in that it is applied to the new drug development management system based on graph data of claim 1, the method comprising:

constructing a graph model according to the medical data; wherein each compound, disease, target gene, and side effect is regarded as a vertex, and each association factor between vertices is regarded as an edge; the association factors include a plurality of factors, namely similar compound, similar disease, bind, treatment, cause, and link, and each factor is taken as a corresponding edge type; the vertex types involved correspond to at least one of the association factors;

According to the acquired query information, transmitting the query information to the graph model for prediction, and displaying the fed-back prediction result;

The side effect information includes a side effect ID and a side effect name;

4. A new drug development management method based on graph data according to claim 3, wherein: when the edge type is similar compound, the corresponding starting point type and ending point type are both compounds;