WO2021208993A1 - Information processing method and apparatus for predicting drug target

Info

Publication number
WO2021208993A1
Authority
WO
WIPO (PCT)
Prior art keywords
compound
gene
perturbation
spectrum
perturbation spectrum
Prior art date
Application number
PCT/CN2021/087362
Other languages
French (fr)
Chinese (zh)
Inventor
蒋华良
郑明月
钟飞盛
吴小龙
李叙潼
Original Assignee
中国科学院上海药物研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院上海药物研究所
Publication of WO2021208993A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Definitions

  • This application relates to the field of artificial intelligence, and in particular to an information processing method and device for predicting drug targets.
  • Computational models for predicting drug targets help deepen our understanding of the molecular mechanisms of drug action, metabolic pathways, adverse effects, and drug resistance.
  • The rapid growth of multi-omics data and the rapid development of artificial intelligence have laid the foundation for computational methods for drug target inference and prediction.
  • Techniques for predicting drug targets from gene expression profiles or transcriptome data mainly include comparative analysis methods, network-based analysis methods, and machine learning methods.
  • Comparative analysis methods make predictions based on the similarity of characteristic differentially expressed genes; one example is CMap, developed by the Broad Institute.
  • Network-based methods take a systems biology perspective and integrate gene expression profiles with cellular networks to predict drug targets.
  • The ProTINA method developed by Noa et al. builds a cell-type-specific protein-gene regulatory network and uses a dynamic model to infer drug targets from differential gene expression profiles, with good prediction results.
  • Various machine learning algorithms have also been used to mine transcriptional profile data for drug target prediction.
  • Pabon et al. used a random forest (RF) model to predict drug targets by analyzing the correlation between drug-induced and gene-knockdown transcriptional profiles.
  • the purpose of the embodiments of the present application is to provide an information processing method for predicting drug targets, so as to improve the accuracy of drug target prediction.
  • an information processing method for predicting drug targets, including: obtaining a compound perturbation spectrum corresponding to a compound; obtaining a gene perturbation spectrum corresponding to the target gene on which the compound acts; determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum; and predicting, according to the degree of correlation and preset experimental condition data, the probability that the compound can act on the target gene.
  • the beneficial effect of the present application is that the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined, and the probability that the compound can act on the target gene is then predicted from that degree of correlation and the experimental condition data. Because the correlation between the two spectra is taken into account when judging whether the compound can act on the target gene, the accuracy of drug target prediction is improved.
  • the determining the degree of correlation between the perturbation spectrum of the compound and the perturbation spectrum of the gene includes:
  • the correlation degree between the perturbation spectrum of the compound and the perturbation spectrum of the gene is calculated based on a first preset algorithm.
  • when the degree of correlation is the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum, predicting the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data includes:
  • the Pearson correlation coefficient and the experimental condition data are substituted into a second preset algorithm to obtain a score of the interaction probability of the compound and the target gene.
  • the determining the degree of correlation between the perturbation spectrum of the compound and the perturbation spectrum of the gene includes:
  • the beneficial effect of this embodiment is that the feature vectors corresponding to the compound perturbation spectrum and the gene perturbation spectrum, namely the first vector and the second vector, are computed by the neural network, and their Pearson correlation coefficient is then obtained from the calculation module. This coefficient serves as the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum and characterizes the degree of correlation between them. The degree of correlation can thus be obtained through the neural network, which simplifies its determination.
  • the predicting the probability that the compound can have an effect on the target gene according to the correlation degree and preset experimental condition data includes:
  • the preset experimental condition data includes at least one of the following data:
  • compound perturbation duration, compound dosage, gene knockdown duration, and cell type.
  • when there are multiple types of target genes, the method further includes:
  • This application also provides an information processing device for predicting drug targets, including:
  • the first acquisition module is used to acquire the perturbation spectrum of the compound corresponding to the compound
  • the second acquisition module is used to acquire the gene perturbation spectrum corresponding to the target gene on which the compound acts;
  • the prediction module is used to predict the probability that the compound can have an effect on the target gene according to the correlation degree and preset experimental condition data.
  • the determining module includes:
  • the first input sub-module is used to input the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
  • the first acquisition sub-module is configured to acquire a first vector corresponding to the compound perturbation spectrum and a second vector corresponding to the gene perturbation spectrum output by the feature extraction network;
  • the second acquisition sub-module is configured to acquire the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
  • the prediction module includes:
  • the third acquisition sub-module is used to acquire preset experimental condition data
  • the third input submodule is used to input the Pearson correlation coefficient and the experimental condition data into the classification module;
  • the fourth obtaining submodule is used to obtain the score of the interaction probability between the compound and the target gene output by the classification module.
  • FIG. 1 is a flowchart of an information processing method for predicting drug targets according to an embodiment of the application
  • FIG. 2 is a flowchart of an information processing method for predicting drug targets according to another embodiment of the application
  • FIG. 3 is a flowchart of an information processing method for predicting drug targets according to another embodiment of the application.
  • FIG. 4 is a block diagram of an information processing device for predicting drug targets according to an embodiment of the application.
  • FIG. 5 is a block diagram of an information processing device for predicting drug targets according to another embodiment of the application, showing the main architecture of the determining module of this embodiment;
  • FIG. 6 is a block diagram of an information processing device for predicting drug targets according to another embodiment of the application, showing the main architecture of the prediction module of this embodiment.
  • Fig. 1 is a flowchart of an information processing method for predicting drug targets according to an embodiment of the application. The method includes the following steps S11-S14:
  • step S11: obtain the compound perturbation spectrum corresponding to the compound;
  • step S12: obtain the gene perturbation spectrum corresponding to the target gene on which the compound acts;
  • step S13: determine the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum;
  • step S14: predict the probability that the compound can act on the target gene based on the degree of correlation and preset experimental condition data.
  • the compound perturbation spectrum corresponding to the compound is obtained; it expresses the difference between the gene expression profile of cells after treatment with the drug and the gene expression profile of the cells in their normal state.
  • the compound refers to the compound in the drug whose target is to be predicted.
  • the perturbation spectrum of the compound is determined in the following way:
  • the differentially expressed genes are analyzed by sequencing technology to obtain the compound perturbation spectrum.
  • the perturbation spectrum of the compound can also be obtained by searching existing databases. From the compound perturbation differential gene expression profile, 978 marker feature differential genes were extracted, and a 978-dimensional feature vector was formed. The 978-dimensional feature vector represents the compound perturbation spectrum.
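
The construction of the 978-dimensional vector can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the application: the function name `perturbation_spectrum`, the plain subtraction of control expression from treated expression, the profile size of 12,000 genes, and the randomly chosen marker indices are all hypothetical. The text only states that the spectrum expresses the difference between treated and normal-state expression and that 978 marker genes are kept.

```python
import numpy as np

def perturbation_spectrum(treated, control, marker_idx):
    """Treated-minus-control expression, restricted to the marker genes.

    Plain subtraction is an assumption made for illustration only.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    if treated.shape != control.shape:
        raise ValueError("treated and control profiles must have the same shape")
    return (treated - control)[np.asarray(marker_idx)]

# Synthetic data standing in for measured expression profiles.
rng = np.random.default_rng(0)
n_genes = 12000  # hypothetical profile size
control = rng.normal(size=n_genes)
treated = control + rng.normal(scale=0.1, size=n_genes)
marker_idx = rng.choice(n_genes, size=978, replace=False)  # hypothetical marker set

spectrum = perturbation_spectrum(treated, control, marker_idx)
print(spectrum.shape)  # (978,)
```

The same function could be applied to a gene-knockdown profile to form the gene perturbation spectrum, since both spectra are described as differences from the normal-state expression profile.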
  • after the compound perturbation spectrum is obtained, the gene perturbation spectrum corresponding to the target gene on which the compound acts is obtained; it characterizes the difference between the expression profile of cells after knockdown of the gene and the expression profile of the cells in their normal state. The degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined, and the probability that the compound can act on the target gene is then predicted based on the degree of correlation and preset experimental condition data. It should be noted that, in most cases, a compound physically interacts with the protein encoded by the gene. Therefore, the effect of the compound on the target gene includes the effect on the protein encoded by the target gene.
  • the beneficial effect of this application is that the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined, and the probability that the compound can act on the target gene is then predicted from that degree of correlation and the experimental condition data. Because the correlation between the two spectra is taken into account when judging whether the compound can act on the target gene, the accuracy of drug target prediction is improved.
  • step S13 may be implemented as the following steps:
  • the correlation degree between the compound perturbation spectrum and the gene perturbation spectrum is calculated based on the first preset algorithm.
  • the calculation of the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum can be implemented by an algorithm.
  • the algorithm can be implemented in an application program.
  • the algorithm is specifically as follows:
  • the protein-protein interaction network represented by a connection matrix, is marked as symbol A.
  • the parameter matrix can be any suitable parameter matrix
  • the 200-dimensional graph embedding is not generated, but only a 2-dimensional compound perturbation graph embedding is generated, which is set as E1.
  • the parameter matrix can be any suitable parameter matrix
  • when the degree of correlation is the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum, step S14 can be implemented as the following steps A1-A2:
  • step A1 obtain preset experimental condition data
  • step A2 the Pearson correlation coefficient and experimental condition data are substituted into the second preset algorithm to obtain the score of the interaction probability between the compound and the target gene.
  • the output is a two-dimensional vector, and its first dimension is taken as the CPI (compound-protein interaction) score
  • the Pearson correlation coefficient and experimental condition data are substituted into the second preset algorithm, and the score of the interaction probability between the compound and the target gene is 0.5.
  • step S13 can be implemented as the following steps B1-B4:
  • step B1 input the compound perturbation spectrum and the gene perturbation spectrum into the feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
  • step B2 obtain the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum output by the feature extraction network;
  • step B3 input the first vector and the second vector into the calculation module
  • step B4 the Pearson correlation coefficient of the first vector and the second vector output by the calculation module is obtained.
  • the compound perturbation spectrum, also called the compound perturbation transcriptional profile feature, is in practice a 978-dimensional vector; the gene perturbation spectrum, also called the gene knockdown transcriptional profile feature, is likewise a 978-dimensional vector in practice.
  • the feature extraction network is a spectral-based graph convolutional network (GCN).
  • after feature extraction, the feature extraction network outputs the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum. Both vectors are obtained by dimensionality reduction of the corresponding 978-dimensional feature vectors, so each has fewer than 978 dimensions. The first vector and the second vector are input into the calculation module, and the Pearson correlation coefficient of the two vectors output by the calculation module is obtained.
  • the beneficial effect of this embodiment is that the feature vectors corresponding to the compound perturbation spectrum and the gene perturbation spectrum, namely the first vector and the second vector, are computed by the neural network, and their Pearson correlation coefficient is then obtained from the calculation module. This coefficient serves as the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum and characterizes the degree of correlation between them. The degree of correlation can thus be obtained through the neural network, which simplifies its determination.
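
What the calculation module in steps B3-B4 computes can be sketched as below. The function name `pearson` and the four-element example vectors are hypothetical; the text specifies only that the module outputs the Pearson correlation coefficient of the first and second vectors, which is the standard definition implemented here.

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.ndim != 1 or u.shape != v.shape:
        raise ValueError("inputs must be 1-D vectors of equal length")
    uc = u - u.mean()  # center each vector on its own mean
    vc = v - v.mean()
    denom = np.sqrt((uc @ uc) * (vc @ vc))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return float((uc @ vc) / denom)

# Hypothetical low-dimensional outputs of the feature extraction network.
first_vector = [0.2, -1.0, 0.5, 1.3]
second_vector = [0.1, -0.8, 0.7, 1.0]
r = pearson(first_vector, second_vector)
print(-1.0 <= r <= 1.0)  # True
```

A coefficient near 1 means the compound perturbs expression in much the same pattern as knocking down the gene; a coefficient near 0 means the two patterns are unrelated.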
  • step S14 can be implemented as the following steps S21-S23:
  • step S21 obtain preset experimental condition data
  • step S22 the Pearson correlation coefficient and experimental condition data are input into the classification module
  • step S23 the score of the interaction probability between the compound and the target gene output by the classification module is obtained.
  • the preset experimental condition data may include at least one of the following data:
  • compound perturbation duration, compound dosage, gene knockdown duration, and cell type.
  • the classification model consists of a fully connected hidden layer (used to extract input features) and an output layer (used to decide whether a compound-protein target interaction exists); the score of the interaction probability between the compound and the target gene output by the classification module is then obtained.
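
A forward pass through a classifier of this shape might look like the sketch below. It assumes a ReLU activation in the hidden layer, a softmax over the two output units, a hidden width of 8, and random untrained weights; none of these details are given in the text, and the names `softmax` and `cpi_score` are hypothetical. Following the description of the second preset algorithm, the first dimension of the two-dimensional output is taken as the score.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cpi_score(features, w_hidden, b_hidden, w_out, b_out):
    """One fully connected hidden layer followed by a two-unit output layer."""
    hidden = np.maximum(0.0, w_hidden @ features + b_hidden)  # ReLU (assumed)
    probs = softmax(w_out @ hidden + b_out)                   # two-dimensional output
    return float(probs[0])                                    # first dimension as score

# Hypothetical input: Pearson coefficient followed by encoded conditions.
features = np.array([0.42, 0.5, 0.1, 1.0, 0.0])

rng = np.random.default_rng(0)
hidden_size = 8  # hypothetical width
w_hidden = rng.normal(scale=0.1, size=(hidden_size, features.size))
b_hidden = np.zeros(hidden_size)
w_out = rng.normal(scale=0.1, size=(2, hidden_size))
b_out = np.zeros(2)

score = cpi_score(features, w_hidden, b_hidden, w_out, b_out)
print(0.0 <= score <= 1.0)  # True
```

With untrained weights the score carries no meaning; in practice the weights would be fitted on compound-target pairs with known interaction labels.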
  • integrating the heterogeneous experimental condition information allows the effects of cell line background, dose, and time dependence on differential gene expression and on drug target inference to be taken into account, which further improves the accuracy of prediction.
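
One way the heterogeneous condition data could be combined with the Pearson coefficient into a single classifier input is sketched below. The encoding is an assumption: the text lists the four condition types but does not say how they are represented. The cell type vocabulary `CELL_TYPES`, the units (hours, micromolar), and both function names are hypothetical.

```python
import numpy as np

# Hypothetical cell type vocabulary; the text does not enumerate cell types.
CELL_TYPES = ("A375", "MCF7", "PC3")

def encode_conditions(compound_hours, dose_um, knockdown_hours, cell_type):
    """Numeric conditions as-is, cell type as a one-hot vector (assumed encoding)."""
    if cell_type not in CELL_TYPES:
        raise ValueError(f"unknown cell type: {cell_type!r}")
    one_hot = [1.0 if cell_type == name else 0.0 for name in CELL_TYPES]
    return np.array([compound_hours, dose_um, knockdown_hours, *one_hot], dtype=float)

def build_classifier_input(pearson_r, conditions):
    """Prepend the Pearson coefficient to the encoded condition vector."""
    return np.concatenate(([pearson_r], conditions))

x = build_classifier_input(0.42, encode_conditions(24, 10, 96, "MCF7"))
print(x.shape)  # (7,)
```

A real pipeline would likely rescale the durations and dose before feeding them to a neural network; that step is omitted here.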
  • when there are multiple types of target genes, the method can also be implemented as the following steps S31-S33:
  • step S31 scores of the interaction probabilities of various target genes and compounds are obtained respectively;
  • step S32 the scores corresponding to various target genes are sorted
  • step S33 it is determined that the target gene corresponding to the highest score value interacts with the compound.
  • the scores of the interaction probability between each target gene and the compound are obtained respectively; that is, the aforementioned steps S11-S14 are performed once for each target gene. The calculated scores are then sorted, and the target gene with the highest score is determined to interact with the compound. In other words, the target gene with the highest score is the target of the drug corresponding to the compound.
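
Steps S31-S33 amount to a sort followed by taking the top entry, as in the sketch below. The gene names and score values are made up for illustration, and `rank_targets` is a hypothetical helper name.

```python
def rank_targets(scores):
    """Sort (gene, score) pairs from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores, one per candidate target gene (steps S11-S14 run per gene).
scores = {"GENE_A": 0.31, "GENE_B": 0.87, "GENE_C": 0.55}

ranking = rank_targets(scores)
predicted_target = ranking[0][0]
print(predicted_target)  # GENE_B
```

Keeping the full ranking rather than only the top entry makes it easy to inspect runner-up candidates when the highest scores are close.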
  • Fig. 4 is a block diagram of an information processing device for predicting drug targets according to an embodiment of the application.
  • the device includes the following modules:
  • the first obtaining module 41 is used to obtain the perturbation spectrum of the compound corresponding to the compound
  • the second obtaining module 42 is used to obtain the gene perturbation spectrum corresponding to the target gene on which the compound acts;
  • the determination module 43 is used to determine the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum
  • the prediction module 44 is used to predict the probability that the compound can have an effect on the target gene according to the degree of correlation and preset experimental condition data.
  • the determining module 43 includes:
  • the first input submodule 51 is used to input the compound perturbation spectrum and the gene perturbation spectrum into the feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
  • the first acquisition sub-module 52 is configured to acquire the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum output by the feature extraction network;
  • the second input submodule 53 is used to input the first vector and the second vector into the calculation module
  • the second acquisition sub-module 54 is used to acquire the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
  • the prediction module 44 includes:
  • the third acquiring sub-module 61 is used to acquire preset experimental condition data
  • the third input submodule 62 is used to input Pearson correlation coefficient and experimental condition data into the classification module
  • the fourth obtaining submodule 63 is used to obtain the score of the interaction probability between the compound and the target gene output by the classification module.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Medicinal Chemistry (AREA)
  • Public Health (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Toxicology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed are an information processing method and an apparatus for predicting a drug target, thereby improving the accuracy of predicting the drug target. The method comprises: obtaining a compound perturbation spectrum corresponding to a compound (S11); obtaining a gene perturbation spectrum corresponding to a target gene on which the compound acts (S12); determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum (S13); and predicting the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data (S14). The correlation between the compound perturbation spectrum and the gene perturbation spectrum is considered in the determination process for determining whether the compound can act on the target gene, so as to improve the accuracy of predicting the drug target.

Description

Information processing method and device for predicting drug target

Technical field
This application relates to the field of artificial intelligence, and in particular to an information processing method and device for predicting drug targets.
Background
Computational models for predicting drug targets help deepen our understanding of the molecular mechanisms of drug action, metabolic pathways, adverse effects, and drug resistance. In recent years, the rapid growth of multi-omics data and the rapid development of artificial intelligence have laid the foundation for computational methods for drug target inference and prediction.
At present, techniques for predicting drug targets from gene expression profiles or transcriptome data mainly include comparative analysis methods, network-based analysis methods, and machine learning methods.
Comparative analysis methods make predictions based on the similarity of characteristic differentially expressed genes; one example is CMap, developed by the Broad Institute. Network-based methods take a systems biology perspective and integrate gene expression profiles with cellular networks to predict drug targets. For example, the ProTINA method developed by Noa et al. builds a cell-type-specific protein-gene regulatory network and uses a dynamic model to infer drug targets from differential gene expression profiles, with good prediction results. In addition, various machine learning algorithms have been used to mine transcriptional profile data for drug target prediction. For example, Pabon et al. used a random forest (RF) model to predict drug targets by analyzing the correlation between drug-induced and gene-knockdown transcriptional profiles.
However, the above methods in the prior art still have drawbacks. For example, they cannot exploit the correlation between the compound perturbation spectrum and the gene perturbation spectrum, and there is still considerable room to improve the accuracy of drug target prediction. How to provide an information processing method for predicting drug targets that exploits this correlation and improves prediction accuracy is therefore a technical problem in urgent need of a solution.
Summary of the invention
The purpose of the embodiments of the present application is to provide an information processing method for predicting drug targets, so as to improve the accuracy of drug target prediction.
In order to solve the above technical problem, the embodiments of the present application adopt the following technical solution: an information processing method for predicting drug targets, including:
obtaining the compound perturbation spectrum corresponding to a compound;
obtaining the gene perturbation spectrum corresponding to the target gene on which the compound acts;
determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum;
predicting the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data.
The beneficial effect of the present application is that the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined, and the probability that the compound can act on the target gene is then predicted from that degree of correlation and the experimental condition data. Because the correlation between the two spectra is taken into account when judging whether the compound can act on the target gene, the accuracy of drug target prediction is improved.
In one embodiment, determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum includes:
calculating the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum based on a first preset algorithm.
In one embodiment, when the degree of correlation is the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum, predicting the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data includes:
obtaining preset experimental condition data;
substituting the Pearson correlation coefficient and the experimental condition data into a second preset algorithm to obtain a score of the interaction probability between the compound and the target gene.
In one embodiment, determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum includes:
inputting the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on both spectra;
obtaining the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum output by the feature extraction network;
inputting the first vector and the second vector into a calculation module;
obtaining the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
The beneficial effect of this embodiment is that the feature vectors corresponding to the compound perturbation spectrum and the gene perturbation spectrum, namely the first vector and the second vector, are computed by the neural network, and their Pearson correlation coefficient is then obtained from the calculation module. This coefficient serves as the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum and characterizes the degree of correlation between them. The degree of correlation can thus be obtained through the neural network, which simplifies its determination.
In an embodiment, predicting the probability that the compound can act on the target gene according to the degree of correlation and the preset experimental condition data includes:
acquiring the preset experimental condition data;
inputting the Pearson correlation coefficient and the experimental condition data into a classification module; and
acquiring a score, output by the classification module, of the probability that the compound and the target gene interact.
In an embodiment, the preset experimental condition data includes at least one of the following:
compound perturbation duration, compound dose, gene knockdown duration, and cell type.
In an embodiment, when there are multiple types of target genes, the method further includes:
acquiring, for each type of target gene, a score of the probability that it interacts with the compound;
sorting the scores corresponding to the respective target genes; and
determining that the target gene with the highest score interacts with the compound.
This application also provides an information processing device for predicting drug targets, including:
a first acquisition module, configured to acquire the compound perturbation spectrum corresponding to a compound;
a second acquisition module, configured to acquire the gene perturbation spectrum corresponding to the target gene on which the compound acts;
a determining module, configured to determine the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum; and
a prediction module, configured to predict the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data.
In an embodiment, the determining module includes:
a first input sub-module, configured to input the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
a first acquisition sub-module, configured to acquire a first vector corresponding to the compound perturbation spectrum and a second vector corresponding to the gene perturbation spectrum, both output by the feature extraction network;
a second input sub-module, configured to input the first vector and the second vector into a calculation module; and
a second acquisition sub-module, configured to acquire the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
In an embodiment, the prediction module includes:
a third acquisition sub-module, configured to acquire the preset experimental condition data;
a third input sub-module, configured to input the Pearson correlation coefficient and the experimental condition data into a classification module; and
a fourth acquisition sub-module, configured to acquire a score, output by the classification module, of the probability that the compound and the target gene interact.
Description of the drawings
FIG. 1 is a flowchart of an information processing method for predicting drug targets according to an embodiment of the application;
FIG. 2 is a flowchart of an information processing method for predicting drug targets according to another embodiment of the application;
FIG. 3 is a flowchart of an information processing method for predicting drug targets according to yet another embodiment of the application;
FIG. 4 is a block diagram of an information processing device for predicting drug targets according to an embodiment of the application;
FIG. 5 is a block diagram of an information processing device for predicting drug targets according to another embodiment of the application, showing the main architecture of the determining module of that embodiment;
FIG. 6 is a block diagram of an information processing device for predicting drug targets according to yet another embodiment of the application, showing the main architecture of the prediction module of that embodiment.
Detailed description
Various solutions and features of the present application are described here with reference to the drawings.
It should be understood that various modifications can be made to the embodiments disclosed herein. Therefore, the above description should not be regarded as limiting, but merely as examples of embodiments. Those skilled in the art will conceive of other modifications within the scope and spirit of this application.
The drawings, which are included in and constitute a part of the specification, illustrate embodiments of the application and, together with the general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of preferred forms of embodiments, given as non-limiting examples, with reference to the accompanying drawings.
It should also be understood that, although the application has been described with reference to some specific examples, those skilled in the art can certainly implement many other equivalent forms of the application, which have the features set out in the claims and therefore all fall within the scope of protection defined thereby.
The above and other aspects, features, and advantages of the present application will become more apparent from the following detailed description when read in conjunction with the drawings.
Specific embodiments of the present application are described below with reference to the accompanying drawings; it should be understood, however, that the disclosed embodiments are merely examples of the present application, which can be implemented in various ways. Well-known and/or repeated functions and structures are not described in detail, to avoid obscuring the present application with unnecessary or redundant detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching those skilled in the art to employ the present application in a variety of ways with substantially any suitable detailed structure.
This specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," each of which may refer to one or more of the same or different embodiments according to the present application.
FIG. 1 is a flowchart of an information processing method for predicting drug targets according to an embodiment of the application. The method includes the following steps S11-S14:
In step S11, the compound perturbation spectrum corresponding to a compound is acquired;
In step S12, the gene perturbation spectrum corresponding to the target gene on which the compound acts is acquired;
In step S13, the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined;
In step S14, the probability that the compound can act on the target gene is predicted according to the degree of correlation and preset experimental condition data.
In this embodiment, the compound perturbation spectrum corresponding to the compound is acquired. The compound perturbation spectrum expresses the difference between the gene expression profile of cells after drug treatment and the gene expression profile of cells in their normal state. In this embodiment, the compound is the compound in the drug whose target is to be predicted.
Further, the compound perturbation spectrum is determined as follows:
The selected small-molecule compound is co-incubated with specific cells, positive and negative control groups are set up, and differentially expressed genes are analyzed by sequencing to obtain the compound perturbation spectrum. Alternatively, the compound perturbation spectrum can be obtained by searching existing databases. From the compound-perturbation differential gene expression profile, 978 landmark differential genes are extracted to form a 978-dimensional feature vector, which represents the compound perturbation spectrum.
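As a rough illustration of how such a feature vector could be assembled, the following sketch subtracts control expression from treated expression for each gene and reads the values off in a fixed landmark-gene order. The function name `perturbation_vector`, the toy gene symbols, and the use of a plain difference are assumptions made for illustration; the application does not specify the differential-expression statistic, and a real vector would have 978 entries rather than 3.

```python
from typing import Dict, List, Sequence


def perturbation_vector(
    treated: Dict[str, float],
    control: Dict[str, float],
    landmark_genes: Sequence[str],
) -> List[float]:
    """Return differential expression values in landmark-gene order.

    Assumption: differential expression is a plain difference
    (treated minus control); the source does not fix the statistic.
    """
    vector = []
    for gene in landmark_genes:
        # A KeyError here means a landmark gene is missing from a profile,
        # which would leave a position of the vector undefined.
        vector.append(treated[gene] - control[gene])
    return vector


# Toy example with 3 hypothetical landmark genes instead of 978.
landmarks = ["GENE_A", "GENE_B", "GENE_C"]
treated = {"GENE_A": 2.5, "GENE_B": 1.0, "GENE_C": 0.5, "GENE_D": 9.0}
control = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.5, "GENE_D": 1.0}
c_vec = perturbation_vector(treated, control, landmarks)
print(c_vec)
```

Genes outside the landmark list (here the hypothetical `GENE_D`) are ignored, so every spectrum has the same length and the same gene order.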
After the compound perturbation spectrum is acquired, the gene perturbation spectrum corresponding to the target gene on which the compound acts is acquired. The gene perturbation spectrum characterizes the difference between the expression profile of cells after gene knockdown and the expression profile of cells in their normal state. The degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is determined, and the probability that the compound can act on the target gene is then predicted from the degree of correlation and the preset experimental condition data. It should be noted that, in most cases, a compound interacts physically with the protein encoded by a gene; therefore, a compound acting on the target gene includes acting on the protein encoded by the target gene.
The beneficial effect of this application is as follows: the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum can be determined, and the probability that the compound can act on the target gene is then predicted from the degree of correlation and the experimental condition data. The correlation between the compound perturbation spectrum and the gene perturbation spectrum is thus taken into account when judging whether the compound can act on the target gene, which improves the accuracy of drug target prediction.
In an embodiment, the above step S13 may be implemented as the following step:
The degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum is calculated based on a first preset algorithm.
In this embodiment, the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum can be computed by an algorithm, which can be implemented in an application program. The algorithm is as follows:
First, the compound perturbation spectrum and the gene perturbation spectrum are acquired. In practice, the compound perturbation spectrum feature is represented by a 978-dimensional vector denoted C, where C = (c1, c2, c3, …, c978). For any i (i = 1 to 978), ci is the differential expression value of gene i after compound perturbation, that is, the difference between the gene expression profile of cells after drug treatment and the gene expression profile of cells in their normal state.
The gene perturbation spectrum feature (a 978-dimensional vector) is denoted G, where G = (g1, g2, g3, …, g978). For any i (i = 1 to 978), gi is the differential expression value of gene i after gene knockdown, that is, the difference between the expression profile of cells after gene knockdown and the expression profile of cells in their normal state.
The experimental condition data is a 4-dimensional vector E = (t1, d, t2, l), where t1 is the compound perturbation duration, d is the compound dose, t2 is the gene knockdown duration, and l is the cell line type.
The protein-protein interaction network (PPI network) is represented by a connection matrix, denoted A.
For ease of explanation, and without loss of generality, only the differential expression of two genes is considered, so that C = (c1, c2) and G = (g1, g2).
To make the whole procedure easier to follow, let C = (0.1, 0.3), G = (0.1, 0.3), and let the connection matrix be
Figure PCTCN2021087362-appb-000001
and E = (24, 10, 96, 1).
From the connection matrix
Figure PCTCN2021087362-appb-000002
the degree matrix is obtained:
Figure PCTCN2021087362-appb-000003
It is easy to obtain
Figure PCTCN2021087362-appb-000004
From the Laplacian matrix $L = D - A$, we obtain:
Figure PCTCN2021087362-appb-000005
From the normalized Laplacian matrix $L_{sys} = D^{-1/2} L D^{-1/2}$, we obtain:
Figure PCTCN2021087362-appb-000006
A spectral decomposition of this matrix is performed:
$L_{sys} = U \lambda U^{T}$
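The matrices named above can be built mechanically from a connection matrix. The sketch below does so in plain Python for a hypothetical 2-gene network in which the two proteins interact; this matrix `A` is an assumption for illustration only, since the matrices of the worked example are given as images and are not reproduced here. The helper names are likewise illustrative.

```python
from typing import List

Matrix = List[List[float]]


def degree_matrix(adj: Matrix) -> Matrix:
    """Diagonal matrix whose i-th entry is the row sum of the connection matrix."""
    n = len(adj)
    return [[float(sum(adj[i])) if i == j else 0.0 for j in range(n)] for i in range(n)]


def laplacian(adj: Matrix) -> Matrix:
    """Laplacian matrix L = D - A."""
    deg = degree_matrix(adj)
    n = len(adj)
    return [[deg[i][j] - adj[i][j] for j in range(n)] for i in range(n)]


def normalized_laplacian(adj: Matrix) -> Matrix:
    """Normalized Laplacian D^{-1/2} L D^{-1/2}.

    Assumes every node has at least one edge (no zero degree).
    """
    deg = degree_matrix(adj)
    lap = laplacian(adj)
    n = len(adj)
    inv_sqrt = [deg[i][i] ** -0.5 for i in range(n)]
    return [[inv_sqrt[i] * lap[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


# Hypothetical 2-gene PPI network (assumption, not the matrix from the figures).
A = [[0.0, 1.0], [1.0, 0.0]]
D = degree_matrix(A)
L = laplacian(A)
L_sys = normalized_laplacian(A)
print(D, L, L_sys)
```

For networks of realistic size, the spectral decomposition of the normalized Laplacian would normally be delegated to a linear algebra library, for example `numpy.linalg.eigh` for symmetric matrices.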
It follows that:
Figure PCTCN2021087362-appb-000007
Without loss of generality, let the parameter matrix be
Figure PCTCN2021087362-appb-000008
Since $(f * h)_{graph} = U \omega U^{T} f$, when $f = c$:
Figure PCTCN2021087362-appb-000009
Define a ReLU function:
Figure PCTCN2021087362-appb-000010
Clearly, $l1_{relu} = \mathrm{relu}(l1) = (0.03, 0.00)$.
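A minimal sketch of the spectral graph convolution $U \omega U^{T} f$ followed by ReLU is given below for the two-gene case. The normalized Laplacian and the filter parameters `omega` used here are hypothetical values chosen for illustration, because the matrices of the worked example are given as images; the closed-form 2x2 eigendecomposition is a convenience for this toy size, not the method of the application.

```python
import math
from typing import List, Tuple

Vec = List[float]


def eig_sym_2x2(m: List[List[float]]) -> Tuple[Vec, List[Vec]]:
    """Eigenvalues and orthonormal eigenvectors of a symmetric 2x2 matrix.

    eigenvectors[k] is the eigenvector belonging to eigenvalues[k].
    """
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # Jacobi rotation angle
    c, s = math.cos(theta), math.sin(theta)
    lam1 = a * c * c + 2.0 * b * s * c + d * s * s
    lam2 = a * s * s - 2.0 * b * s * c + d * c * c
    return [lam1, lam2], [[c, s], [-s, c]]


def spectral_filter(lap_sys: List[List[float]], omega: Vec, f: Vec) -> Vec:
    """Compute U diag(omega) U^T f for a symmetric 2x2 Laplacian."""
    _, vecs = eig_sym_2x2(lap_sys)
    out = [0.0, 0.0]
    for w, u in zip(omega, vecs):
        coeff = w * (u[0] * f[0] + u[1] * f[1])  # omega_k * (u_k^T f)
        out[0] += coeff * u[0]
        out[1] += coeff * u[1]
    return out


def relu(v: Vec) -> Vec:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


# Hypothetical normalized Laplacian of a 2-gene network with one edge.
L_sys = [[1.0, -1.0], [-1.0, 1.0]]
c = [0.1, 0.3]

# With all filter parameters equal to 1 the filter reduces to U U^T f = f.
identity_out = spectral_filter(L_sys, [1.0, 1.0], c)

# Hypothetical filter parameters, followed by the ReLU activation.
l1 = spectral_filter(L_sys, [0.5, 0.25], c)
l1_relu = relu(l1)
print(identity_out, l1_relu)
```

Setting every entry of `omega` to 1 is a convenient sanity check, since the filter must then return its input unchanged.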
For simplicity, a 200-dimensional graph embedding is not generated; only a 2-dimensional compound perturbation graph embedding is generated, denoted E1.
Without loss of generality, let the parameter matrix be
Figure PCTCN2021087362-appb-000011
Compound perturbation graph embedding:
Figure PCTCN2021087362-appb-000012
Similarly, the gene knockdown graph embedding is E2 = [0.03 0.03 0.03].
The Pearson correlation coefficient of E1 and E2 is calculated:
Figure PCTCN2021087362-appb-000013
Clearly, the Pearson $R^{2} = r \cdot r = 1$.
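For reference, the Pearson correlation coefficient of two embeddings can be computed as in the sketch below. The input vectors are hypothetical values chosen so that one is an exact multiple of the other; they are not the embeddings of the worked example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0.0 or var_y == 0.0:
        # Zero variance makes the denominator vanish.
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical embeddings: e2 is e1 scaled by 2, so r should be close to 1.
e1 = [0.02, 0.05, 0.03]
e2 = [0.04, 0.10, 0.06]
r = pearson_r(e1, e2)
r_squared = r * r
print(r, r_squared)
```

The coefficient is invariant to scaling and shifting of either vector, which is why proportional embeddings give a value of 1 up to rounding. It is undefined when either vector is constant, so the sketch raises an error in that case.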
In an embodiment, when the degree of correlation is the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum, the above step S14 can be implemented as the following steps A1-A2:
In step A1, the preset experimental condition data is acquired;
In step A2, the Pearson correlation coefficient and the experimental condition data are substituted into a second preset algorithm to obtain a score of the probability that the compound and the target gene interact.
In this embodiment, the preset experimental condition data E = (t1, d, t2, l) is acquired; for the specific experiment considered here, t1 = 24, d = 10, t2 = 96, and l = 1. The Pearson $R^{2}$ is concatenated with the four-dimensional experimental condition vector E to obtain a five-dimensional vector, denoted $v_{5}$.
Clearly, $v_{5} = (24, 10, 96, 1, 1)$.
Let the parameter matrix be
Figure PCTCN2021087362-appb-000014
Figure PCTCN2021087362-appb-000015
$o_{exp} = e^{o} = (e^{132}, e^{132})$
$sum = e^{132} + e^{132}$
Figure PCTCN2021087362-appb-000016
output is a two-dimensional vector; its first dimension is taken as the CPI score,
that is: CPI score = output[1] = 0.5.
In other words, substituting the Pearson correlation coefficient and the experimental condition data into the second preset algorithm gives a score of 0.5 for the probability that the compound and the target gene interact.
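The scoring step above amounts to a linear map followed by a softmax. The sketch below reproduces the numbers of the example under one assumption: the parameter matrix is taken to be a 5x2 matrix of ones, which is consistent with both entries of o being 132 but is not confirmed by the text, since the matrix is given as an image. The function names are illustrative.

```python
import math
from typing import List, Sequence


def linear(v: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Row vector times matrix: o_j = sum_i v_i * W[i][j]."""
    n_out = len(weights[0])
    return [sum(v[i] * weights[i][j] for i in range(len(v))) for j in range(n_out)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax; the maximum is subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# v5 = experimental conditions (t1, d, t2, l) followed by the Pearson R^2.
v5 = [24.0, 10.0, 96.0, 1.0, 1.0]

# Assumed parameter matrix: 5 rows, 2 columns, all ones.
W = [[1.0, 1.0] for _ in v5]

o = linear(v5, W)        # both entries are 24 + 10 + 96 + 1 + 1
output = softmax(o)      # equal logits give equal probabilities
cpi_score = output[0]    # first dimension (output[1] in 1-based notation)
print(o, output, cpi_score)
```

Subtracting the maximum before exponentiating does not change the result of the softmax; it only avoids very large intermediate values such as $e^{132}$.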
In an embodiment, the above step S13 can be implemented as the following steps B1-B4:
In step B1, the compound perturbation spectrum and the gene perturbation spectrum are input into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
In step B2, a first vector corresponding to the compound perturbation spectrum and a second vector corresponding to the gene perturbation spectrum, both output by the feature extraction network, are acquired;
In step B3, the first vector and the second vector are input into a calculation module;
In step B4, the Pearson correlation coefficient of the first vector and the second vector output by the calculation module is acquired.
In this embodiment, the compound perturbation spectrum (also called the compound-perturbation transcription profile feature; in practice a 978-dimensional vector) and the gene perturbation spectrum (also called the gene-knockdown transcription profile feature; in practice a 978-dimensional vector) are first acquired and then passed through the feature extraction network. In this embodiment, the feature extraction network is a spectral-based graph neural network (GCN). Two parallel GCNs are constructed to extract features from the compound perturbation spectrum and the gene perturbation spectrum respectively, that is, to extract the key features and thereby reduce dimensionality. After feature extraction, the feature extraction network outputs the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum. Because the first vector and the second vector are obtained by reducing the dimensionality of their respective 978-dimensional feature vectors, each has fewer than 978 dimensions. The first vector and the second vector are input into the calculation module, and the Pearson correlation coefficient of the first vector and the second vector output by the calculation module is acquired.
The beneficial effect of this embodiment is as follows: the feature vectors corresponding to the compound perturbation spectrum and the gene perturbation spectrum, namely the first vector and the second vector, are computed by a neural network, and the calculation module then yields the Pearson correlation coefficient of the first vector and the second vector. This coefficient is taken as the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum and characterizes their degree of correlation. This embodiment can therefore obtain the degree of correlation between the two spectra through a neural network, which simplifies the process of determining it.
In an embodiment, as shown in FIG. 2, the above step S14 can be implemented as the following steps S21-S23:
In step S21, the preset experimental condition data is acquired;
In step S22, the Pearson correlation coefficient and the experimental condition data are input into a classification module;
In step S23, a score, output by the classification module, of the probability that the compound and the target gene interact is acquired.
The preset experimental condition data is acquired. Specifically, the preset experimental condition data may include at least one of the following:
compound perturbation duration, compound dose, gene knockdown duration, and cell type.
The Pearson correlation coefficient and the experimental condition data are input into the classification model. In this embodiment, the classification model consists of a fully connected hidden layer (used to extract input features) and an output layer (used to classify whether a compound-protein target interaction exists). The score, output by the classification module, of the probability that the compound and the target gene interact is then acquired.
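A classification model of this shape can be sketched as a forward pass through one fully connected hidden layer and a two-class output layer. All weights, the hidden-layer width of 3, the ReLU activation, and the softmax output below are assumptions chosen for illustration; the application does not specify them, and in practice the parameters would be learned from labeled compound-target pairs.

```python
import math
from typing import List, Sequence


def dense(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fully connected layer; weights[k] holds the input weights of output unit k."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """Hidden layer with ReLU, then a 2-class output layer with softmax."""
    hidden = [max(0.0, h) for h in dense(features, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))


# Input: experimental conditions (t1, d, t2, l) followed by the Pearson coefficient.
features = [24.0, 10.0, 96.0, 1.0, 1.0]

# Hypothetical parameters (3 hidden units, 2 output classes).
w_hidden = [
    [0.01, 0.02, -0.01, 0.5, 0.3],
    [-0.02, 0.01, 0.005, -0.4, 0.8],
    [0.0, 0.0, 0.01, 0.1, -0.2],
]
b_hidden = [0.0, 0.1, -0.5]
w_out = [[0.5, -0.3, 0.2], [-0.5, 0.3, -0.2]]
b_out = [0.0, 0.0]

probs = classify(features, w_hidden, b_hidden, w_out, b_out)
interaction_score = probs[0]  # probability assigned to the "interacts" class
print(probs)
```

Which output unit represents the "interacts" class is a labeling convention; the sketch takes the first one.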
In an embodiment, the preset experimental condition data includes at least one of the following:
compound perturbation duration, compound dose, gene knockdown duration, and cell type.
In this embodiment, heterogeneous experimental condition information is integrated, so that the influence of effects such as cell line background, dose dependence, and time dependence on differential gene expression and on drug target inference can be taken into account, further improving prediction accuracy.
In an embodiment, as shown in FIG. 3, when there are multiple types of target genes, the method can further be implemented as the following steps S31-S33:
In step S31, a score of the probability of interaction with the compound is acquired for each type of target gene;
In step S32, the scores corresponding to the respective target genes are sorted;
In step S33, it is determined that the target gene with the highest score interacts with the compound.
In this embodiment, when there are multiple target genes, a score of the probability of interaction with the compound is acquired for each of them; that is, the aforementioned steps S11-S14 are performed once per target gene to calculate its score. The calculated scores are then sorted, and the target gene with the highest score is determined to interact with the compound. In other words, the target gene with the highest score is the target of the drug corresponding to the compound.
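The ranking step can be sketched as a sort over per-gene scores. The gene names and score values below are hypothetical placeholders, and `rank_targets` is an illustrative name.

```python
from typing import Dict, List, Tuple


def rank_targets(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate target genes by interaction score, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, one per candidate target gene.
scores = {"GENE_A": 0.12, "GENE_B": 0.87, "GENE_C": 0.45}
ranking = rank_targets(scores)
predicted_target = ranking[0][0]  # gene with the highest score
print(ranking, predicted_target)
```

Keeping the full ranking rather than only the top entry makes it straightforward to inspect the runner-up candidates as well.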
FIG. 4 is a block diagram of an information processing device for predicting drug targets according to an embodiment of the application. The device includes the following modules:
a first acquisition module 41, configured to acquire the compound perturbation spectrum corresponding to a compound;
a second acquisition module 42, configured to acquire the gene perturbation spectrum corresponding to the target gene on which the compound acts;
a determining module 43, configured to determine the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum;
a prediction module 44, configured to predict the probability that the compound can act on the target gene according to the degree of correlation and preset experimental condition data.
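One possible way to wire the four modules together in software is sketched below. The class name `TargetPredictionDevice`, its field names, and the stub callables are hypothetical; they only illustrate the data flow from module 41 through module 44 and do not reflect an implementation disclosed in the application.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class TargetPredictionDevice:
    """Hypothetical composition of modules 41-44 as plain callables."""

    get_compound_spectrum: Callable[[str], Vector]   # first acquisition module 41
    get_gene_spectrum: Callable[[str], Vector]       # second acquisition module 42
    correlate: Callable[[Vector, Vector], float]     # determining module 43
    predict: Callable[[float, Vector], float]        # prediction module 44

    def run(self, compound_id: str, gene_id: str, conditions: Vector) -> float:
        compound_spectrum = self.get_compound_spectrum(compound_id)
        gene_spectrum = self.get_gene_spectrum(gene_id)
        degree = self.correlate(compound_spectrum, gene_spectrum)
        return self.predict(degree, conditions)


# Stub callables standing in for the real modules (illustration only).
device = TargetPredictionDevice(
    get_compound_spectrum=lambda _id: [0.1, 0.3],
    get_gene_spectrum=lambda _id: [0.1, 0.3],
    correlate=lambda c, g: 1.0 if list(c) == list(g) else 0.0,
    predict=lambda degree, conditions: 0.5 * degree,
)
score = device.run("compound-x", "gene-y", [24.0, 10.0, 96.0, 1.0])
print(score)
```

Passing the modules in as callables keeps each one replaceable, for example swapping a database lookup for a freshly measured spectrum without touching the rest of the flow.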
In an embodiment, as shown in FIG. 5, the determining module 43 includes:
a first input sub-module 51, configured to input the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
a first acquisition sub-module 52, configured to acquire the first vector corresponding to the compound perturbation spectrum and the second vector corresponding to the gene perturbation spectrum, both output by the feature extraction network;
a second input sub-module 53, configured to input the first vector and the second vector into a calculation module;
a second acquisition sub-module 54, configured to acquire the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
In an embodiment, as shown in FIG. 6, the prediction module 44 includes:
a third acquisition sub-module 61, configured to acquire the preset experimental condition data;
a third input sub-module 62, configured to input the Pearson correlation coefficient and the experimental condition data into a classification module;
a fourth acquisition sub-module 63, configured to acquire the score, output by the classification module, of the probability that the compound and the target gene interact.
The above embodiments are only exemplary embodiments of the application and are not intended to limit it; the scope of protection of the application is defined by the claims. Those skilled in the art can make various modifications or equivalent substitutions to this application within its essence and scope of protection, and such modifications or equivalent substitutions shall also be deemed to fall within the scope of protection of this application.

Claims (10)

  1. An information processing method for predicting drug targets, characterized by comprising:
    acquiring a compound perturbation spectrum corresponding to a compound;
    acquiring a gene perturbation spectrum corresponding to a target gene on which the compound acts;
    determining a degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum; and
    predicting, according to the degree of correlation and preset experimental condition data, the probability that the compound can act on the target gene.
  2. The method according to claim 1, wherein determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum comprises:
    calculating the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum based on a first preset algorithm.
  3. The method according to claim 2, wherein, when the degree of correlation is the Pearson correlation coefficient of the compound perturbation spectrum and the gene perturbation spectrum, predicting the probability that the compound can act on the target gene according to the degree of correlation and the preset experimental condition data comprises:
    acquiring the preset experimental condition data; and
    substituting the Pearson correlation coefficient and the experimental condition data into a second preset algorithm to obtain a score of the probability that the compound and the target gene interact.
  4. The method according to claim 1, wherein determining the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum comprises:
    inputting the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
    obtaining, from the output of the feature extraction network, a first vector corresponding to the compound perturbation spectrum and a second vector corresponding to the gene perturbation spectrum;
    inputting the first vector and the second vector into a calculation module;
    obtaining the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
  5. The method according to claim 4, wherein predicting, according to the degree of correlation and preset experimental condition data, the probability that the compound can act on the target gene comprises:
    obtaining the preset experimental condition data;
    inputting the Pearson correlation coefficient and the experimental condition data into a classification module;
    obtaining the score, output by the classification module, for the probability of interaction between the compound and the target gene.
  6. The method according to claim 3 or 5, wherein the preset experimental condition data comprises at least one of the following:
    compound perturbation duration, compound dose, gene knockdown duration, and cell type.
  7. The method according to any one of claims 1-6, wherein, when there are multiple types of target genes, the method further comprises:
    obtaining, for each type of target gene, a score for the probability of interaction between that target gene and the compound;
    sorting the scores corresponding to the respective types of target genes;
    determining that the target gene corresponding to the highest score interacts with the compound.
  8. An information processing apparatus for predicting a drug target, comprising:
    a first acquisition module, configured to acquire a compound perturbation spectrum corresponding to a compound;
    a second acquisition module, configured to acquire a gene perturbation spectrum corresponding to a target gene on which the compound acts;
    a determining module, configured to determine the degree of correlation between the compound perturbation spectrum and the gene perturbation spectrum;
    a prediction module, configured to predict, according to the degree of correlation and preset experimental condition data, the probability that the compound can act on the target gene.
  9. The apparatus according to claim 8, wherein the determining module comprises:
    a first input submodule, configured to input the compound perturbation spectrum and the gene perturbation spectrum into a feature extraction network to perform feature extraction on the compound perturbation spectrum and the gene perturbation spectrum;
    a first acquisition submodule, configured to acquire, from the output of the feature extraction network, a first vector corresponding to the compound perturbation spectrum and a second vector corresponding to the gene perturbation spectrum;
    a second input submodule, configured to input the first vector and the second vector into a calculation module;
    a second acquisition submodule, configured to acquire the Pearson correlation coefficient of the first vector and the second vector output by the calculation module.
  10. The apparatus according to claim 9, wherein the prediction module comprises:
    a third acquisition submodule, configured to acquire the preset experimental condition data;
    a third input submodule, configured to input the Pearson correlation coefficient and the experimental condition data into a classification module;
    a fourth acquisition submodule, configured to acquire the score, output by the classification module, for the probability of interaction between the compound and the target gene.
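
The correlation and ranking steps recited in the method claims (the Pearson correlation coefficient in claim 3 and the sorting of scores in claim 7) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the applicant's implementation: the claims do not define the "second preset algorithm" or the classification module, so `placeholder_score` is a hypothetical stand-in that returns the Pearson coefficient unchanged and ignores the experimental condition data. The function names, the gene labels and the toy profiles are likewise invented for the example.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def placeholder_score(r, conditions):
    # ASSUMPTION: hypothetical stand-in for the unspecified "second preset
    # algorithm" / classification module. It ignores `conditions` (duration,
    # dose, cell type, ...) and simply returns the correlation coefficient.
    return r


def rank_targets(compound_profile, gene_profiles, conditions=None,
                 score_fn=placeholder_score):
    """Score every candidate gene and return (gene, score) pairs, best first."""
    scores = {
        gene: score_fn(pearson(compound_profile, profile), conditions)
        for gene, profile in gene_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
compound = [1.0, 2.0, 3.0, 4.0]
genes = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],   # moves with the compound profile
    "GENE_B": [4.0, 3.0, 2.0, 1.0],   # moves against it
    "GENE_C": [1.0, 3.0, 2.0, 4.0],   # partially correlated
}
ranking = rank_targets(compound, genes)
top_gene, top_score = ranking[0]
```

Passing a different `score_fn` is how a real scoring model that also consumes the experimental condition data would be plugged in; the ranking logic itself would stay the same.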
PCT/CN2021/087362 2020-04-17 2021-04-15 Information processing method and apparatus for predicting drug target WO2021208993A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010309556.4 2020-04-17
CN202010309556.4A CN113539366A (en) 2020-04-17 2020-04-17 Information processing method and device for predicting drug target

Publications (1)

Publication Number Publication Date
WO2021208993A1 (en) 2021-10-21

Family

ID=78085268

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087362 WO2021208993A1 (en) 2020-04-17 2021-04-15 Information processing method and apparatus for predicting drug target

Country Status (2)

Country Link
CN (1) CN113539366A (en)
WO (1) WO2021208993A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410645A (en) * 2022-08-23 2022-11-29 北京泽桥医疗科技股份有限公司 Method for identifying action target of Chinese patent medicine for treating new coronary pneumonia

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1309722A (en) * 1998-05-12 2001-08-22 罗斯塔英法美蒂克斯公司 Quantitative methods, systems and apparatuses for gene expression analysis
US20160224723A1 (en) * 2015-01-29 2016-08-04 The Trustees Of Columbia University In The City Of New York Method for predicting drug response based on genomic and transcriptomic data
CN106909807A (en) * 2017-02-14 2017-06-30 同济大学 A kind of Forecasting Methodology that drug targeting interactions between protein is predicted based on multivariate data
CN108351915A (en) * 2015-08-28 2018-07-31 纽约市哥伦比亚大学信托人 Pass through the virtual deduction for the protein active that regulator gathering and measuring carries out
CN108647489A (en) * 2018-05-15 2018-10-12 华中农业大学 A kind of method and system of screening disease medicament target and target combination

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10159262B4 (en) * 2001-12-03 2007-12-13 Siemens Ag Identify pharmaceutical targets
WO2009092024A1 (en) * 2008-01-16 2009-07-23 The Trustees Of Columbia University In The City Of New York System and method for prediction of phenotypically relevant genes and perturbation targets
KR101067352B1 (en) * 2009-11-19 2011-09-23 한국생명공학연구원 System and method comprising algorithm for mode-of-action of microarray experimental data, experiment/treatment condition-specific network generation and experiment/treatment condition relation interpretation using biological network analysis, and recording media having program therefor
EP2600269A3 (en) * 2011-12-03 2013-12-04 Medeolinx, LLC Microarray sampling and network modeling for drug toxicity prediction
WO2019075461A1 (en) * 2017-10-13 2019-04-18 BioAge Labs, Inc. Drug repurposing based on deep embeddings of gene expression profiles

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410645A (en) * 2022-08-23 2022-11-29 北京泽桥医疗科技股份有限公司 Method for identifying action target of Chinese patent medicine for treating new coronary pneumonia
CN115410645B (en) * 2022-08-23 2023-07-21 北京泽桥医疗科技股份有限公司 Method for identifying action target point of Chinese patent medicine for treating new coronaries pneumonia

Also Published As

Publication number Publication date
CN113539366A (en) 2021-10-22

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21788606; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21788606; Country of ref document: EP; Kind code of ref document: A1