WO2020091185A1

WO2020091185A1 - Method for predicting health effect of phytochemical, using integrated analysis based on molecular network, chemical property, and ethnopharmacological evidence, and system therefor

Info

Publication number: WO2020091185A1
Application number: PCT/KR2019/008244
Authority: WO
Inventors: 이도헌; 유선용
Original assignee: 재단법인 전통천연물기반 유전자동의보감 사업단; 한국과학기술원
Priority date: 2018-10-29
Filing date: 2019-07-04
Publication date: 2020-05-07
Also published as: KR102220004B1; KR20200048278A

Abstract

The present invention relates to a method for predicting the health effects of phytochemicals by using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, and a system therefor. Specifically, the method according to the present invention does not use each type of information separately but analyzes the information in an integrated manner, and can therefore analyze and predict the health effects of phytochemicals on a large scale with high reliability. Therefore, the method and the system therefor can be advantageously used to develop drugs that utilize phytochemicals having useful functions.

Description

Method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, and system therefor

The present invention relates to a method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, and a system therefor.

A phytochemical is a compound produced by a plant through chemical pathways, and is often referred to as a secondary metabolite. Recent studies have reported that many phytochemicals play a beneficial role in the functioning of human cells, and that foods rich in these phytochemicals promote health (Liu, RH et al., Am. J. Clin. Nutr. 78, 517S-520S, 2003; Mursu, J. et al., Am. J. Clin. Nutr. 99, 328-333, 2013).

However, most studies on the efficacy of phytochemicals have relied on in vitro screening, so analyzing a large number of phytochemicals requires large-scale experiments, which suffer from low productivity and high time and cost.

Accordingly, in silico approaches based primarily on molecular or ethnopharmacological information have been proposed in recent years. The molecular approach predicts the potential effects of phytochemicals from the similarity of molecular information, such as molecular structure, molecular mechanism of action, or target proteins, between phytochemicals and approved drugs. However, this approach is suited to predicting the effect of a specific phytochemical on a specific phenotype, and it is difficult to use for predicting the systemic effects of phytochemicals on the human body. Several approaches based on ethnopharmacological information have also been proposed, but these have focused on using ethnopharmacological information only as a preliminary step, prior to molecular analysis or in vitro evaluation, for selecting plants or phytochemicals for the treatment of specific diseases. Such methods may be useful for narrowing down a large number of candidate substances, but productivity remains low because a single plant contains hundreds of phytochemicals. Therefore, a new in silico analysis method is needed that can predict the health effects of phytochemicals on the human body by integrating the molecular information, chemical properties, and ethnopharmacological evidence of phytochemicals.

It is an object of the present invention to provide a method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, and a system therefor.

In order to achieve the above object, the present invention provides a method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, the method comprising: inferring the health effects of a phytochemical using a molecular network (step 1); deriving phytochemicals having high bioavailability by examining the chemical properties of the phytochemical (step 2); and searching ethnopharmacological evidence to derive, among the inferred health effects, the health effects having high semantic similarity with the ethnopharmacological evidence (step 3).

In addition, the present invention provides a computer-readable medium storing a program including instructions for executing the method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence.

In addition, the present invention provides a system for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, the system comprising: a module for inferring the health effects of phytochemicals using a molecular network; a module for deriving phytochemicals having high bioavailability by examining the chemical properties of the phytochemicals; and a module for searching ethnopharmacological evidence to derive, among the inferred health effects, the health effects having high semantic similarity with the ethnopharmacological evidence.

Because the method according to the present invention analyzes molecular networks, chemical properties, and ethnopharmacological evidence in an integrated manner rather than using each type of information separately, it can analyze and predict the health effects of phytochemicals on a large scale with high reliability. The method and the system therefor can thus be usefully applied to drug development using phytochemicals having health-promoting functions.

FIGS. 1A to 1C are diagrams illustrating the method of the present invention for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence.

FIGS. 2A to 2C are diagrams illustrating the process of searching for ethnopharmacological uses of phytochemicals: FIG. 2A shows the process of extracting phenotype-related terms from descriptive expressions; FIG. 2B shows the process of deriving plants containing a specific phytochemical; and FIG. 2C shows the process of analyzing semantic similarity on the phenotype network.

FIG. 3 is a graph showing the distribution of predicted health effects of phytochemicals based on molecular networks (left) and the distribution of predicted health effects of phytochemicals based on molecular networks and ethnopharmacological evidence (right).

Hereinafter, the present invention will be described in detail.

The present invention provides a method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, the method comprising: inferring the health effects of a phytochemical using a molecular network (step 1); deriving phytochemicals having high bioavailability by examining the chemical properties of the phytochemical (step 2); and searching ethnopharmacological evidence to derive, among the inferred health effects, the health effects having high semantic similarity with the ethnopharmacological evidence (step 3).

Step 1 may comprise: producing a phenotype vector of the phytochemical by performing a random walk with restart (RWR) algorithm on the molecular network (step 1-1); constructing random phenotype vectors by randomly selecting phytochemical targets from a fixed number of target proteins (step 1-2); and deriving statistically significant phenotypes of the phytochemical by comparison with the random phenotype vectors (step 1-3).

Step 1-1 may include the following steps:

(a) assigning initial values to seed nodes on a molecular network based on molecular target information of phytochemicals, and

(b) calculating a transition probability from one node to a neighboring node,

Here, the probability vector of the nodes at time step t + 1 is defined by Equation 1 below:

[Equation 1]

P_{t+1} = (1 - r) W P_t + r P_0

(r: restart probability of the random walker at each time step;

W: normalized adjacency matrix of the molecular network; and

P_t and P_0: probability vectors of the nodes at time step t and at the initial time step, respectively).

Step 1-3 may include selecting, as statistically significant phenotypes of the phytochemical, the phenotypes having a p value lower than 0.01 as calculated by Equation 2 below:

[Equation 2]

p = (r + 1) / (n + 1)

(r: number of random phenotype vectors of the phytochemical whose value for the phenotype is greater than the inferred phenotype value; and

n: number of random phenotype vectors of the phytochemical).

The chemical properties of step 2 may include molecular weight, the logarithm of the octanol-water partition coefficient (AlogP), the number of hydrogen-bond donors, the number of hydrogen-bond acceptors, the number of rotatable bonds, human intestinal absorption (HIA), Caco-2 permeability, blood-brain barrier (BBB) permeability, and Lipinski's rule of five (RO5).

The semantic similarity of step 3 may be calculated by Equation 3 below:

[Equation 3]

sim(c_1, c_2) = 2 × depth(lcs(c_1, c_2)) / (path(c_1, c_2) + 2 × depth(lcs(c_1, c_2)))

(sim: semantic similarity;

depth: depth from the root phenotype (root of UMLS) to the corresponding phenotype;

path: distance between the two phenotypes; and

lcs(c_1, c_2): the lowest common subsumer of the concepts c_1 and c_2).

After step 3, the method may further include generating visualization data for the derived information through a visualization means and outputting the visualization data through an output means. The output means may be any one selected from the group consisting of a monitor, a printer, and a plotter, but any means capable of outputting a result may be used.

In addition, the present invention provides a computer-readable medium storing a program including instructions for executing the method for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence. The computer-readable medium includes any kind of recording device in which data readable by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tapes, floppy disks, and optical data storage devices, as well as media implemented in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

The module for inferring the health effects of phytochemicals using the molecular network may include: a module for producing a phenotype vector of the phytochemical by performing the RWR algorithm on the molecular network; a module for generating random phenotype vectors by randomly selecting phytochemical targets from a fixed number of target proteins; and a module for inferring the health effects of the phytochemical by deriving statistically significant phenotypes of the phytochemical by comparison with the random phenotype vectors.

The system may additionally include a module that generates visualization data for the information derived by the modules and an output module that outputs it as visualization data.

The output module may be any one selected from the group consisting of a monitor, a printer and a plotter, and any means capable of outputting a result may be used.

In a specific embodiment of the present invention, the present inventors performed the RWR algorithm to produce a list of phenotype values for each phytochemical, produced random phenotype vectors for the phytochemicals, selected statistically significant phenotypes, and thereby inferred the health effects of the phytochemicals (see FIG. 1A).

In addition, the present inventors calculated the RO5, HIA, Caco-2 permeability, and BBB permeability of phytochemicals with known chemical structures, thereby obtaining information on the chemical properties of the phytochemicals (see FIG. 1B).

In addition, the present inventors searched for ethnopharmacological uses of phytochemicals, derived the health effects supported by ethnopharmacological evidence among the health effects predicted by molecular network analysis, and compared the distributions (FIGS. 1C to 3).

In addition, the present inventors analyzed the precision and sensitivity of the method of the present invention for predicting the health effects of phytochemicals and confirmed that the prediction method has excellent performance. Because the health effects of phytochemicals predicted by the method were also corroborated in external literature, the prediction method was confirmed to be reliable.

Accordingly, the method according to the present invention for predicting the health effects of phytochemicals using integrated analysis based on molecular networks, chemical properties, and ethnopharmacological evidence, the computer-readable medium storing a program including instructions for executing the method, and the system for executing the method can be useful for drug development using phytochemicals.

Hereinafter, the present invention will be described in detail by examples.

However, the following examples are merely illustrative of the present invention, and the contents of the present invention are not limited by the following examples.

[Example 1]

Data collection

Information on phytochemicals and plant compounds was collected from KTKP (http://www.koreantk.com/), TCMID (Xue, R. et al., Nucleic Acids Res. 1089-1095, 2012) and FooDB (http://foodb.ca/). The ethnopharmacological uses of plants were collected from KTKP, TCMID and Kampo (http://kampo.ca/), and the molecular targets of phytochemicals were collected from DrugBank, DCDB (Drug Combination Database, v. 2.0, Liu, Y. et al., Database, bau124, 2014), CTD (Comparative Toxicogenomics Database, Davis, AP et al., Nucleic Acids Res. 39, D1067-1072, 2011), MATADOR (Gunther, S. et al., Nucleic Acids Res. 36, D919-922, 2008), STITCH (Kuhn, M. et al., Nucleic Acids Res. 42, D401-407, 2013) and TTD (Zhu, F. et al., Nucleic Acids Res. 40, D1128-1136, 2011). Gene-phenotype relationships were collected from CTD, and a protein-protein interaction network comprising 19,093 nodes and 270,970 edges was built from BioGrid (v. 3.4.136, Chatr-Aryamontri, A. et al., Nucleic Acids Res. 43, D470-478, 2015) and CODA (Context-Oriented Directed Associations, Hwang, W. et al., BMC Med. Inf. Decis. Making 13, S4, 2013). The phenotype network was obtained from the 2017AA version of the Unified Medical Language System (UMLS) (Bodenreider, O. Nucleic Acids Res. 32, D267-270, 2004), which provides integrated information on various biomedical terms. Each distinct biomedical concept in UMLS is assigned a concept unique identifier (CUI). From MRREL, the list of related concepts, the CUIs linked by broader relationships (RB), narrower relationships (RN), and other-related relationships (RO), among the 11 types of UMLS relationships, were collected, yielding a total of 220,104 CUIs and 663,018 relationships.

For the gold-standard sets, drug-derived phytochemicals were collected from DrugBank (v. 4.0, Law, V. et al., Nucleic Acids Res. 42, D1091-1097, 2014). To extract phenotype-related terms from DrugBank, CTD, ClinicalTrials.gov (Zarin, DA et al., New Engl. J. Med. 364, 852-860, 2011) and DCDB, the MetaMap tool (Aronson, AR et al., J. Am. Med. Inf. Assoc. 17, 229-236, 2010) was used, and the Metathesaurus concepts assigned to 20 phenotype-related semantic types, such as "Disease or Syndrome", among the 135 semantic types were used (Table 1).

[Table 1]
Semantic type
Acquired Abnormality	Mental Process
Anatomical Abnormality	Mental or Behavioral Dysfunction
Biologic Function	Neoplastic Process
Congenital Abnormality	Pathologic Function
Cell or Molecular Dysfunction	Physiologic Function
Disease or Syndrome	Sign or Symptom
Experimental Model of Disease	Clinical Attribute
Finding	Hazardous or Poisonous Substance
Injury or Poisoning	Body Part, Organ, or Organ Component
Laboratory or Test Result	Tissue

[Example 2]

Inference of health effects of phytochemicals using molecular networks

2-1. Production of a phenotype value list for phytochemicals by running the RWR (Random Walk with Restart) algorithm

To investigate the propagated effects of phytochemicals, a molecular network was constructed from protein-protein interaction information, and the RWR algorithm was run on it.

Specifically, initial values were assigned to seed nodes on the molecular network based on the target information of phytochemicals. In the target information, direct relationships comprise binding between a phytochemical and a target protein, whereas indirect relationships comprise interactions in which the phytochemical causes changes in protein expression or compound-induced phosphorylation, or exerts its effect through active metabolites. Because the biological activity of a phytochemical in the molecular network can arise from complex interactions, and because far less binding-target information is known for phytochemicals than for synthetic drugs, both direct and indirect relationships were used, with initial values of 1 and 0.3 assigned to direct and indirect relationships, respectively.

Then, the transition probability from one node to its neighboring nodes was calculated. The transition probability was assumed to represent the spread of the drug effect over the molecular network, and the probability vector of the nodes at time step t + 1 is defined by Equation 1 below:

[Equation 1]

P_{t+1} = (1 - r) W P_t + r P_0

(r: restart probability of the random walker at each time step, set to 0.7 in the present invention;

W: normalized adjacency matrix of the molecular network; and

P_t and P_0: probability vectors of the nodes at time step t and at the initial time step, respectively).

With the RWR algorithm, random walkers were simulated until all nodes reached a steady state (P_{t+1} - P_t < 10^-8). For all genes that are therapeutic targets or biomarkers of a specific phenotype, the gene values obtained from the RWR results were linked to the corresponding phenotype based on the gene-phenotype relationships, finally producing a list of quantitative phenotype values for each phytochemical.
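The iteration of Equation 1 can be illustrated with a minimal pure-Python sketch. It is not the inventors' implementation: the toy four-node network, the column normalization of W, the normalization of the seed values into P_0, and the use of the summed absolute change as the convergence measure are all assumptions made for illustration. Only the restart probability (0.7), the seed weights (1 for direct and 0.3 for indirect targets), and the threshold (10^-8) come from the text.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-8, max_iter=10_000):
    """Iterate P_{t+1} = (1 - r) * W * P_t + r * P_0 until the change is below tol."""
    n = len(adjacency)
    # Assumption: W is the column-normalized adjacency matrix (each column sums to 1).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Assumption: seed weights are rescaled so that P_0 sums to 1.
    total = sum(seeds)
    p0 = [s / total for s in seeds]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        # Assumption: convergence is measured by the summed absolute change.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical path network 0-1-2-3; node 0 is a direct target (1.0), node 2 an indirect one (0.3).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds=[1.0, 0.0, 0.3, 0.0])
```

Each entry of `scores` plays the role of a gene value; summarizing these values per phenotype through gene-phenotype links would be a further step that this sketch leaves out.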

2-2. Selection of significant phenotypes using random phenotype vectors

Because the phenotype values in the list produced in Example 2-1 do not by themselves indicate the strength of the association between a phytochemical and a phenotype, random phenotype vectors were produced for each phytochemical and compared with the inferred phenotype values in order to select the phenotypes with significant values. Random phenotype vectors were constructed by randomly selecting phytochemical targets from a fixed number of target proteins. For each phytochemical, 1,000 random phenotype vectors were produced, p values were calculated by Equation 2 below, and the phenotypes with a p value lower than 0.01 were selected as statistically significant phenotypes:

[Equation 2]

p = (r + 1) / (n + 1)

(r: number of random phenotype vectors of the phytochemical whose value for the phenotype is greater than the inferred phenotype value; and

n: number of random phenotype vectors of the phytochemical).
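Equation 2 is an empirical p value and can be sketched in a few lines. The random values below are generated purely for illustration and stand in for the values of one phenotype across 1,000 random phenotype vectors; the function name is hypothetical.

```python
import random

def empirical_p_value(observed, random_values):
    """Equation 2: p = (r + 1) / (n + 1)."""
    # r: number of random values greater than the observed phenotype value.
    r = sum(1 for v in random_values if v > observed)
    n = len(random_values)
    return (r + 1) / (n + 1)

rng = random.Random(0)
random_values = [rng.random() for _ in range(1000)]  # illustrative values in [0, 1)

p_high = empirical_p_value(2.0, random_values)   # observed value above every random value
p_low = empirical_p_value(-1.0, random_values)   # observed value below every random value
significant = p_high < 0.01
```

With n = 1,000 the smallest attainable p value is 1/1001, so the 0.01 cutoff lets a phenotype through only when fewer than ten random vectors exceed its inferred value.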

2-3. Distribution of the inferred health effects of phytochemicals

According to Examples 2-1 and 2-2, potential health effects were inferred for a total of 591 phytochemicals; on average, 415.6 ± 27.3 health effects were predicted per phytochemical (confidence interval: 0.95; left graph in FIG. 3).

[Example 3]

Calculation of the chemical properties of phytochemicals

To derive phytochemicals with high bioavailability, the chemical properties of phytochemicals, including physicochemical properties and physiological effects, were calculated, and a total of 512 phytochemicals with known chemical structures were analyzed (FIG. 1B).

Specifically, the physicochemical properties include molecular weight, the logarithm of the octanol-water partition coefficient (AlogP), the number of hydrogen-bond donors, the number of hydrogen-bond acceptors, and the number of rotatable bonds. The physiological effects include human intestinal absorption (HIA), Caco-2 permeability, blood-brain barrier (BBB) permeability, and Lipinski's rule of five (RO5). HIA and BBB values were calculated using the method of Shen et al. (Shen, J. et al., J. Chem. Inf. Model. 50, 1034-1041, 2010), and Caco-2 permeability was calculated using a quantitative structure-activity relationship (QSAR) model (Pham The, H. et al., Mol. Inform. 30, 376-385, 2011). RO5 and the other physicochemical properties were calculated using the CDK (Chemistry Development Kit) Descriptor Calculator (Guha, R. http://www.rguha.net/code/java/cdkdesc.html).
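As an illustration of the RO5 criterion, the sketch below counts violations of Lipinski's commonly stated thresholds (molecular weight at most 500 Da, log P at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). It takes precomputed descriptor values as input rather than calling the CDK Descriptor Calculator. The function names, the example descriptor values, and the tolerance of one violation are assumptions; the text does not state how many violations were tolerated.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water log P above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)

def satisfies_ro5(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Assumption: a compound passes RO5 if it has at most `max_violations` violations."""
    return lipinski_violations(mol_weight, logp, h_donors, h_acceptors) <= max_violations

# Hypothetical descriptor values for two compounds.
small_polar = satisfies_ro5(mol_weight=302.2, logp=1.5, h_donors=5, h_acceptors=7)
large_greasy = satisfies_ro5(mol_weight=800.0, logp=6.2, h_donors=7, h_acceptors=12)
```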

[Table 2]
	RO5	HIA	Caco-2 permeability	BBB permeability
RO5	446	401	280	365
HIA		482	330	407
Caco-2 permeability			335	303
BBB permeability				428

(The number in each cell where a row and a column intersect is the number of phytochemicals that satisfy both items simultaneously.)

As a result of investigating the RO5, HIA, Caco-2 permeability and BBB permeability of the phytochemicals, 446, 482, 335 and 428 phytochemicals were found to satisfy RO5, HIA, Caco-2 permeability and BBB permeability, respectively (Table 2).

[Example 4]

Exploration of the ethnopharmacological uses of phytochemicals

To extract plants with ethnopharmacological evidence for the predicted effects of phytochemicals, phenotype terms must be extracted from descriptive expressions and structured, and the complex relationships between phenotypes must be quantified. To this end, phenotype-related terms were first extracted from descriptive documents by applying the MetaMap tool (FIG. 2A).

Next, plants containing the phytochemicals of interest were searched based on external database information (FIG. 2B). Then, a phenotype network was built from the hierarchical relationships of UMLS, and the semantic similarity between phenotypes was calculated, considering the distance and depth of the phenotypes in the network, by the semantic similarity measure proposed by Wu, Z. and Palmer, M. and represented by Equation 3 (Wu, Z. & Palmer, M. 133-138 (Association for Computational Linguistics)) (FIG. 2C).

[Equation 3]

sim(c_1, c_2) = 2 × depth(lcs(c_1, c_2)) / (path(c_1, c_2) + 2 × depth(lcs(c_1, c_2)))

(sim: semantic similarity;

depth: depth from the root phenotype (root of UMLS) to the corresponding phenotype;

path: distance between the two phenotypes; and

lcs(c_1, c_2): the lowest common subsumer of the concepts c_1 and c_2).

When the semantic similarity score between two phenotypes was greater than 0.8, the phenotypes were regarded as highly related. Thus, the semantic similarity was calculated between all possible pairs of a predicted health effect of a phytochemical and an ethnopharmacological effect of a plant, and the plants whose score was higher than 0.8 were selected, thereby identifying the ethnopharmacological uses of the phytochemicals.
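The Wu and Palmer measure can be sketched on a toy hierarchy. The concept names and the tree below are invented for illustration and are not UMLS content. The sketch also assumes a single-parent tree, counts the root as depth 1, and measures the path between two concepts through their lowest common subsumer; the actual UMLS relationships form a larger graph in which a concept may have several parents.

```python
# Hypothetical child -> parent hierarchy (root has parent None).
PARENT = {
    "disease": None,
    "vascular disease": "disease",
    "stroke": "vascular disease",
    "hypertension": "vascular disease",
    "neoplasm": "disease",
}

def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def wu_palmer(c1, c2):
    """sim = 2 * depth(lcs) / (path(c1, c2) + 2 * depth(lcs))."""
    chain1, chain2 = ancestors(c1), ancestors(c2)
    lcs = next(c for c in chain1 if c in chain2)        # lowest common subsumer
    depth_lcs = len(ancestors(lcs))                     # assumption: root has depth 1
    path = chain1.index(lcs) + chain2.index(lcs)        # edges from c1 to c2 via the lcs
    return 2 * depth_lcs / (path + 2 * depth_lcs)

sim_siblings = wu_palmer("stroke", "hypertension")   # lcs "vascular disease", depth 2, path 2
sim_same = wu_palmer("stroke", "stroke")             # identical concepts
sim_distant = wu_palmer("stroke", "neoplasm")        # lcs is the root
related = sim_siblings > 0.8                         # threshold used in the text
```

In this shallow toy tree even sibling concepts score below 0.8, because the shared ancestor lies close to the root; in a deep hierarchy the same sibling relationship yields a much higher score.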

As a result of the analysis, on average about 129.1 ± 11.4 (about 31%) of the health effects predicted from the molecular network in Example 2-3 had ethnopharmacological evidence (right graph in FIG. 3).

[Example 5]

Performance evaluation of the method for predicting the health effects of phytochemicals according to the present invention

5-1. Data collection

Experimentally validated information was collected as a gold-standard positive set. Information from DrugBank was used as the set for evaluating treatment effects, and information obtained from the Side Effect Resource (SIDER) was used as the set for evaluating side effects. To analyze a larger number of phytochemicals, potential candidate effects were additionally collected from CTD and used as a silver-standard positive set.

5-2. Analysis of the precision and sensitivity of the method using molecular network analysis

First, the precision (p) and sensitivity (recall, r) of the method for predicting the effects of phytochemicals using molecular network analysis were calculated by Equations 4 and 5 below:

[Equation 4]

Precision = TP / (TP + FP); and

[Equation 5]

Sensitivity = TP / (TP + FN)

(TP: number of cases in which data that is actually positive is detected as positive (true positive);

FP: number of cases in which data that is actually negative is detected as positive (false positive); and

FN: number of cases in which data that is actually positive is detected as negative (false negative)).
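Equations 4 and 5 can be written directly in terms of sets of predicted and gold-standard positive phenotypes. The function name and the phenotype labels are placeholders chosen for illustration.

```python
def precision_and_sensitivity(predicted, positives):
    """Return (precision, sensitivity) following Equations 4 and 5."""
    predicted, positives = set(predicted), set(positives)
    tp = len(predicted & positives)   # actually positive and predicted
    fp = len(predicted - positives)   # predicted but not in the positive set
    fn = len(positives - predicted)   # in the positive set but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity

# Placeholder phenotype labels for one phytochemical.
predicted = {"a", "b", "c", "d"}
gold_positive = {"a", "b", "e"}
precision, sensitivity = precision_and_sensitivity(predicted, gold_positive)
```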

[Table 3] Precision and sensitivity of the method for predicting treatment effects, side effects and potential candidate effects through molecular network analysis

	Skewness	Treatment effect	Side effect	Potential candidate effect
Precision	1:1	0.921 ± 0.032	0.922 ± 0.021	0.942 ± 0.005
	1:10	0.518 ± 0.059	0.432 ± 0.040	0.706 ± 0.013
	All	0.006 ± 0.001	0.049 ± 0.010	0.522 ± 0.022
Sensitivity	All	0.738 ± 0.062	0.576 ± 0.061	0.909 ± 0.011

Because an average of 415.6 potential health effects was predicted for each phytochemical, the precision over all candidates was very low, so precision was evaluated after adjusting the skewness between the positive and negative sets. Of all possible health effects, those not in the gold-standard positive set were used as the gold-standard negative set, and precision was analyzed with the ratio of the positive set to the negative set adjusted to 1:1 or 1:10. When the ratio was adjusted, precision was high (Table 3).
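One way to adjust the positive-to-negative ratio is to subsample the negative set before counting false positives. The sketch below shows this under stated assumptions: the text does not describe the sampling procedure, so the uniform random sampling, the fixed seed, the function name, and the toy universe of 100 candidate effects are all illustrative.

```python
import random

def precision_at_ratio(predicted, positives, universe, ratio, seed=0):
    """Precision after subsampling negatives so that positives : negatives = 1 : ratio."""
    predicted, positives = set(predicted), set(positives)
    negatives = sorted(set(universe) - positives)
    rng = random.Random(seed)
    # Assumption: negatives are drawn uniformly at random without replacement.
    k = min(len(negatives), ratio * len(positives))
    sampled_negatives = set(rng.sample(negatives, k))
    tp = len(predicted & positives)
    fp = len(predicted & sampled_negatives)
    return tp / (tp + fp) if tp + fp else 0.0

# Toy setting: 100 candidate effects, 5 known positives, 50 predicted effects.
universe = range(100)
positives = {0, 1, 2, 3, 4}
predicted = set(range(50))

p_all = precision_at_ratio(predicted, positives, universe, ratio=19)      # all 95 negatives
p_balanced = precision_at_ratio(predicted, positives, universe, ratio=1)  # 5 sampled negatives
```

With every negative included the toy precision is 5 / 50; with a balanced sample it can only be higher, which mirrors the pattern between the "All" and "1:1" rows of Table 3. Repeating the sampling with different seeds would give a spread of values.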

In addition, in the sensitivity analysis, 191 of the 270 treatment-effect phenotypes of 61 phytochemicals were predicted (r = 0.738 ± 0.062), 1,059 of the 1,784 side-effect phenotypes of 60 phytochemicals were predicted (r = 0.576 ± 0.061), and 119,233 of the 136,862 potential candidate-effect phenotypes of 453 phytochemicals were predicted (r = 0.909 ± 0.011) (Table 3).

Overall, the prediction of health effects through molecular network analysis showed excellent performance.

5-3. Analysis of the precision of the method using molecular network analysis and ethnopharmacological evidence

To compare the prediction performance with and without ethnopharmacological evidence, the health effects of phytochemicals predicted in Example 5-2 were classified by the presence or absence of ethnopharmacological use, and precision was analyzed.

[Table 4] Precision of the method for predicting treatment effects, side effects and potential candidate effects using molecular network analysis and ethnopharmacological evidence

	Skewness	Treatment effect	Side effect	Potential candidate effect
Precision	1:1	0.941 ± 0.035	0.761 ± 0.033	0.948 ± 0.014
	1:10	0.541 ± 0.069	0.319 ± 0.055	0.732 ± 0.037
	All	0.014 ± 0.003	0.025 ± 0.005	0.563 ± 0.059

As a result, when ethnopharmacological evidence was considered, precision in predicting treatment effects and potential candidate effects was higher than when it was not. However, precision decreased in the prediction of side effects, which is interpreted as a consequence of using ethnopharmacological evidence that covers only therapeutic uses (Table 4).

5-4. Analysis of the precision and sensitivity of the method using chemical property analysis

To confirm whether chemical property analysis improves the prediction performance of the method for predicting the health effects of phytochemicals, the precision and sensitivity of predicting efficacy in treating neurological diseases were analyzed according to BBB permeability.

Specifically, because a phytochemical must pass through the BBB to modulate neuroactive functions, the phytochemicals were divided into two independent sets according to BBB permeability, and the prediction performance for efficacy in treating neurological diseases was compared. As a result, the set that passes the BBB showed higher precision and sensitivity (p = 0.611 ± 0.046, r = 0.725 ± 0.033) than the set that does not pass the BBB (p = 0.312 ± 0.052, r = 0.558 ± 0.042).

The analyses in Examples 5-1 to 5-4 showed that the method of the present invention, which integrates molecular network analysis, ethnopharmacological evidence and chemical properties, has excellent overall performance in predicting the health effects of phytochemicals.

[Example 6]

Reliability verification of the method for predicting the health effects of phytochemicals according to the present invention

To verify the reliability of the method for predicting the health effects of phytochemicals according to the present invention, it was investigated whether the predicted health effects are corroborated in external literature.

6-1. Data set extraction

Two independent sets were built from the predicted health effects of phytochemicals. First, a set of phytochemical-phenotype relationships predicted as positive by the molecular network, oral availability, and ethnopharmacological evidence was created; as a control, the same number of relationships was drawn at random to create a random relationship set.

6-2. Co-occurrence analysis

For 13,200,786 PubMed abstracts published from 1950 to 2013, the number of abstracts containing both a phytochemical and the corresponding phenotype in the same sentence was counted as the number of co-occurrences (n_c).
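The co-occurrence count n_c can be sketched as follows. The three abstracts are invented toy sentences, and the plain substring matching and the regular-expression sentence splitter are simplifying assumptions; the text does not describe how terms were recognized or how sentences were delimited.

```python
import re

def count_cooccurrences(abstracts, term_a, term_b):
    """Count abstracts in which both terms appear together in at least one sentence."""
    a, b = term_a.lower(), term_b.lower()
    count = 0
    for text in abstracts:
        # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.lower())
        if any(a in s and b in s for s in sentences):
            count += 1
    return count

# Invented toy abstracts, for illustration only.
abstracts = [
    "Quercetin reduced infarct volume after stroke in rats.",
    "Quercetin is a flavonoid. Stroke is a leading cause of death.",
    "Stroke risk was unrelated to diet.",
]
n_c = count_cooccurrences(abstracts, "quercetin", "stroke")
```

Only the first toy abstract counts: the second mentions both terms, but in different sentences.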

[Table 5]
	Co-occurrence	Jaccard index	Fisher's exact test^a	FDR test^b
Predicted relationship set	1.25	1.8 × 10^-4	2984	1341
Random relationship set	0.09	9.5 × 10^-6	612	274
P value of Mann-Whitney U test	< 0.001	< 0.001	< 0.001	< 0.001

^a Number of phytochemical-health effect relationships with p values less than 0.001 in Fisher's exact test
^b Number of phytochemical-health effect relationships with q values less than 0.05 in the FDR test

As a result, the average number of co-occurrences in the predicted relationship set was 13.8 times larger than that in the random relationship set (Table 5).

6-3. Jaccard index analysis

To correct for differences in the frequency of phytochemicals and phenotypes, the number of co-occurrences was normalized as a Jaccard index. First, the number of PubMed abstracts containing at least one of the phytochemical and the phenotype was counted (n_0), and the Jaccard index was calculated by dividing n_c, the number of co-occurrences of each phytochemical-phenotype relationship, by n_0. For example, for the phytochemical-phenotype pair 'quercetin-stroke', if 50 abstracts contain both quercetin and stroke and 200 abstracts contain at least one of the two, the Jaccard index is 0.25.

As a result of the analysis, the average Jaccard index of the predicted relationship set was 18.9 times higher than the average Jaccard index of the random relationship set (Table 5).
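The normalization described in this example can be illustrated with a minimal Python sketch. The function name `jaccard_index` and the handling of an empty union are assumptions made for illustration; only the ratio n_c / n_0 and the quercetin-stroke numbers come from the text.

```python
def jaccard_index(n_c: int, n_0: int) -> float:
    """Return n_c / n_0 for one phytochemical-phenotype pair.

    n_c: number of abstracts in which both terms co-occur.
    n_0: number of abstracts containing at least one of the two terms.
    Returning 0.0 when n_0 is zero is an assumption, not part of the source.
    """
    if n_0 == 0:
        return 0.0
    return n_c / n_0


# Worked example from the text: quercetin-stroke, 50 of 200 abstracts.
print(jaccard_index(50, 200))  # 0.25
```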

6-4. Fisher's exact test analysis

Fisher's exact test was performed on the counts of PubMed abstracts containing the phytochemical and the target health effect, and the number of statistically significant relationships with p values less than 0.001 was calculated.

As a result, the number of significant relationships in the predicted relationship set was 4.8 times higher than the number of significant relationships in the random relationship set (Table 5).
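As a rough illustration of this test, the sketch below computes a one-sided Fisher's exact p value from abstract counts using only the Python standard library. The text does not state whether a one-sided or two-sided test was used, so the one-sided (enrichment) form is an assumption, as are the function name and the argument layout.

```python
from math import comb


def fisher_exact_greater(n_both: int, n_phyto: int, n_pheno: int, n_total: int) -> float:
    """One-sided Fisher's exact test for over-representation (assumed form).

    n_both:  abstracts mentioning both the phytochemical and the phenotype
    n_phyto: abstracts mentioning the phytochemical
    n_pheno: abstracts mentioning the phenotype
    n_total: all abstracts in the corpus
    Returns P(X >= n_both) under the hypergeometric null distribution.
    """
    k_max = min(n_phyto, n_pheno)
    favourable = sum(
        comb(n_pheno, k) * comb(n_total - n_pheno, n_phyto - k)
        for k in range(n_both, k_max + 1)
    )
    return favourable / comb(n_total, n_phyto)


# Hypothetical toy counts, chosen only to show the call.
p_value = fisher_exact_greater(n_both=3, n_phyto=4, n_pheno=4, n_total=8)
print(p_value < 0.001)  # significance threshold used in this example
```

Exact integer arithmetic keeps the sketch simple; for corpus-scale counts a log-space or library implementation would usually be preferred.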

6-5. FDR test analysis

When a large number of relationships are analyzed, Fisher's exact test is subject to the multiple testing problem, which produces many false positives. To compensate for this, an FDR test was additionally performed, and the significant relationships with q values lower than 0.05 were investigated.

As a result, the number of relationships with q values lower than 0.05 in the predicted relationship set was 4.9 times higher than that in the random relationship set (Table 5).
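The correction step can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here; the function name and the sample p values are hypothetical.

```python
def bh_q_values(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg q values (assumed FDR procedure), in input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values from Fisher's exact test for four relationships.
q = bh_q_values([0.01, 0.04, 0.03, 0.005])
n_significant = sum(1 for value in q if value < 0.05)
print(n_significant)
```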

6-6. Mann-Whitney U test analysis

Finally, the statistical difference in literature evidence between the predicted relationship set and the random relationship set was investigated by performing the Mann-Whitney U test and calculating the corresponding p value. A p value lower than 0.05 was considered statistically significant.

As a result, the p-value between the predicted relationship set and the random relationship set was lower than 0.05 in all items, indicating that the literature evidence for the two sets showed statistically significant differences (Table 5).
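A minimal sketch of such a comparison is given below. It assumes a two-sided test using the normal approximation with tie correction and no continuity correction; the text does not specify these details, and the function name and sample values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U statistic of x, two-sided p value by normal approximation)."""
    combined = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    total = len(combined)
    ranks = [0.0] * total
    tie_term = 0
    i = 0
    while i < total:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < total and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        run = j - i + 1
        tie_term += run**3 - run
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))


# Hypothetical Jaccard indices for a predicted set and a random set.
u_stat, p_value = mann_whitney_u([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(p_value < 0.05)  # significance threshold used in this example
```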

The analysis in Examples 6-1 to 6-6 showed that the health effects predicted by the method of the present invention are supported by external literature, and that the method can therefore be used effectively to identify the health effects of phytochemicals.

[Example 7]

Prediction of potential health effects of phytochemicals using the method according to the present invention

The potential health effects for several phytochemicals were predicted in accordance with the present invention, and clinical trials for these effects were investigated at ClinicalTrials.gov.

(n_e: number of ethnopharmacological evidence items; and

Rank: rank of the corresponding phenotype when phenotypes are ordered by the number of ethnopharmacological evidence items)

As a result, phenotypes with a stronger potential relationship with a phytochemical generally had more related clinical trials in progress, and most of those clinical trials were phase 3 or phase 4 trials (Table 6). These results suggest that the method for predicting the health effect of phytochemicals according to the present invention can be used effectively to predict the potential therapeutic effect of phytochemicals on specific diseases.

Claims

A method for predicting the health effect of phytochemicals using integrated analysis based on a molecular network, chemical properties, and ethnopharmacological evidence, the method comprising:

inferring the health effects of phytochemicals using a molecular network (step 1);

deriving phytochemicals having high bioavailability by examining the chemical properties of the phytochemicals (step 2); and

deriving, by searching for ethnopharmacological evidence, the health effects of the phytochemicals that have high semantic similarity with the ethnopharmacological evidence among the inferred health effects (step 3).
The method of claim 1, wherein step 1 comprises: performing a RWR (Random Walk with Restart) algorithm on the molecular network to produce a phytochemical phenotype vector (step 1-1);

constructing random phenotype vectors by randomly selecting phytochemical targets from a fixed number of target proteins (step 1-2); and

inferring the health effects of phytochemicals by deriving statistically significant phytochemical phenotypes from the random phenotype vectors (step 1-3).
The method of claim 2, wherein step 1-1 comprises the following steps:

(a) assigning initial values to seed nodes on the molecular network based on molecular target information of the phytochemical, and

(b) calculating a transition probability from one node to a neighboring node,

wherein the transition probability of each node at time step t + 1 is defined by Equation 1 below:

[Equation 1]

(r: restart probability of the random walker at each time step;

W: normalized adjacency matrix of the molecular network; and

p^t and p^0: probability vectors of the nodes at time step t and at the initial time step, respectively).
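For illustration only, the iteration recited in steps (a) and (b) can be sketched in Python as follows. The update rule used here is the commonly used RWR form, p^{t+1} = (1 - r) W p^t + r p^0, which is assumed rather than taken from Equation 1; the toy network, the restart probability r = 0.7, the column normalization, and all function names are hypothetical.

```python
def column_normalize(adj: list[list[float]]) -> list[list[float]]:
    """Divide each column by its sum so every non-empty column sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, r=0.7, tol=1e-10, max_iter=1000):
    """Iterate p^{t+1} = (1 - r) W p^t + r p^0 until the L1 change is below tol."""
    n = len(adj)
    w = column_normalize(adj)
    # Step (a): spread the initial probability evenly over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        # Step (b): move along edges with probability 1 - r, restart with r.
        nxt = [
            (1 - r) * sum(w[i][j] * p[j] for j in range(n)) + r * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node path network 0-1-2-3 with node 0 as the only seed.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
probabilities = random_walk_with_restart(toy_network, seeds=[0])
print(probabilities)
```

In this toy run the stationary probability decreases with distance from the seed node, which is the behavior the phenotype vector relies on.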
The method of claim 2, wherein step 1-3 comprises determining, as statistically significant phytochemical phenotypes, phenotypes having a p value lower than 0.01 as calculated by Equation 2 below:

[Equation 2]

p = (r + 1) / (n + 1)

(r: the number of random phenotype vectors whose value for the phenotype is greater than the phenotype value of the phytochemical; and

n: the number of random phenotype vectors of the phytochemical).
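Equation 2 is an empirical p value, and a minimal Python sketch of it is given below for illustration. The function names and the sample scores are hypothetical; the formula and the 0.01 threshold are the ones recited above.

```python
def empirical_p_value(observed: float, random_values: list[float]) -> float:
    """Equation 2: p = (r + 1) / (n + 1).

    r counts the random phenotype values strictly greater than the observed
    phenotype value, and n is the number of random phenotype vectors.
    """
    r = sum(1 for value in random_values if value > observed)
    n = len(random_values)
    return (r + 1) / (n + 1)


def is_significant(observed: float, random_values: list[float], alpha: float = 0.01) -> bool:
    """Apply the p < 0.01 criterion recited for step 1-3."""
    return empirical_p_value(observed, random_values) < alpha


# Hypothetical scores: one observed value against four random values.
print(empirical_p_value(0.9, [0.1, 0.2, 0.95, 0.3]))  # 0.4
```

Because the smallest attainable p value is 1 / (n + 1), at least 100 random phenotype vectors are needed before any phenotype can pass the 0.01 threshold.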
The method of claim 1, wherein the chemical properties of step 2 are at least one selected from the group consisting of molecular weight, the log value of the octanol-water partition coefficient (AlogP), number of hydrogen-bond donors, number of hydrogen-bond acceptors, number of rotatable bonds, human intestinal absorption (HIA), Caco-2 permeability, blood-brain barrier (BBB) permeability, and Lipinski's rule of five (RO5).
The method of claim 1, wherein the semantic similarity of step 3 is calculated by Equation 3 below:

[Equation 3]

(sim: semantic similarity;

depth: depth from the root phenotype (UMLS root) to the corresponding phenotype;

path: distance between phenotypes; and

lcs(c_1, c_2): the lowest common subsumer of concepts c_1 and c_2).
A computer-usable medium storing a program comprising instructions for executing the method for predicting the health effect of phytochemicals using integrated analysis based on a molecular network, chemical properties, and ethnopharmacological evidence according to any one of claims 1 to 5.
A system for predicting the health effect of phytochemicals using integrated analysis based on a molecular network, chemical properties, and ethnopharmacological evidence, the system comprising:

a module for inferring the health effects of phytochemicals using a molecular network;

a module for deriving phytochemicals having high bioavailability by examining the chemical properties of the phytochemicals; and

a module for deriving, by searching for ethnopharmacological evidence, the health effects of the phytochemicals that have high semantic similarity with the ethnopharmacological evidence among the inferred health effects.
The system of claim 8, wherein the module for inferring the health effects of phytochemicals using the molecular network comprises: a module for performing a RWR algorithm on the molecular network to produce a phytochemical phenotype vector;

a module for generating random phenotype vectors by randomly selecting phytochemical targets from a fixed number of target proteins; and

a module for inferring the health effects of phytochemicals by deriving statistically significant phytochemical phenotypes from the random phenotype vectors.