CN111081321A - CNS drug key feature identification method - Google Patents

Publication number
CN111081321A
Authority
CN
China
Prior art keywords
cns
feature
feature combination
sen
spe
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911307432.6A
Other languages
Chinese (zh)
Other versions
CN111081321B (en)
Inventor
丁彦蕊
张瑞林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201911307432.6A
Publication of CN111081321A
Application granted
Publication of CN111081321B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics


Abstract

The invention discloses a method for identifying key features of CNS drugs and belongs to the field of computer-aided drug design. The method combines a support vector machine with a greedy algorithm: following the greedy idea, the feature that contributes least to improving the prediction result is deleted at each step, so that the key features distinguishing CNS drugs from non-CNS small-molecule drugs are accurately screened out. This is the first time a support vector machine and a greedy algorithm have been combined for the identification of key features of CNS drugs. Because the key features are screened by stepwise deletion, the effect of combinations among features is taken into account and the difficulty of choosing the initial features in feature-addition methods is avoided. The screened key features effectively distinguish CNS drugs from non-CNS small-molecule drugs, which provides important guidance for designing CNS drug candidate small molecules.

Description

CNS drug key feature identification method
Technical Field
The invention relates to a CNS drug key feature identification method, and belongs to the field of computer-aided drug design.
Background
Currently, hundreds of millions of people worldwide are affected by diseases of the central nervous system (CNS). Because of the particular environment of the brain, the development of CNS drugs suffers from a low success rate, high cost and long development cycles, and new CNS drugs are urgently needed. Designing reliable CNS drug candidates can greatly shorten the cycle and reduce the cost of new drug development and significantly improve the success rate. Understanding the feature differences between CNS drugs and non-CNS drugs is a prerequisite for designing effective CNS drug candidates. Discovering the key features of CNS drugs therefore helps to understand what is specific to CNS drugs and to guide CNS drug design.
To screen key features out of the large number of features of CNS drugs, Shahid M et al. (SVM base descriptor Selection and Classification of neurological Disease drug for pharmaceutical Modeling, Molecular Informatics, 2013, 32(3): 241-249) built a model with a support vector machine and ranked the features by scores calculated from the coefficient of each feature; this ranking can also be used for feature selection. However, deleting unimportant features on the basis of individual feature scores ignores the effect of combinations among features: some features perform poorly alone, yet two individually unimportant features may perform well in combination. Lu J et al. (Analysis of the acquisition target-based classification system using molecular descriptors, Combinatorial Chemistry & High Throughput Screening, 2016, 19(2): 129-135) added features one by one starting from zero. In that approach a single initial feature carries little information, so results such as SEN = 0% with SPE = 100%, or SEN = 100% with SPE = 0%, are likely; in such cases there is no basis for deciding which feature to keep, and the features selected at the beginning strongly influence the prediction performance of later feature combinations. If the IFS algorithm instead starts from several features, those initial features have to be determined by some other method.
Accurately finding the key features that distinguish CNS drugs from non-CNS small-molecule drugs is therefore of great value for designing CNS drug molecules and developing new CNS drugs.
Disclosure of Invention
In order to find out key characteristics between CNS drugs and non-CNS drug small molecules and further achieve the purpose of guiding CNS drug design, the invention provides an identification method for the key characteristics of the CNS drugs.
Optionally, the method includes:
step one, preliminarily screening out, from all features of the CNS drugs and the non-CNS small-molecule drugs, the features that are effective in distinguishing CNS drugs from non-CNS small-molecule drugs;
step two, constructing a support vector machine model with the features preliminarily screened in step one, and optimizing the parameters c and g to obtain an optimized support vector machine model;
step three, using a greedy algorithm to delete, step by step, the features preliminarily screened in step one, and screening out during the deletion process the key features for distinguishing CNS drugs from non-CNS drugs.
Optionally, assuming that the number of features preliminarily screened in step one is n, step three includes:
3.1 deleting each feature in turn, resulting in n different feature combinations: $\{a_2, a_3, a_4, \dots, a_n\}$, $\{a_1, a_3, a_4, \dots, a_n\}$, $\{a_1, a_2, a_4, \dots, a_n\}$, ..., $\{a_1, a_2, a_3, a_4, \dots, a_{n-1}\}$;
3.2 taking the n different feature combinations as input vectors of the optimized support vector machine model obtained in step two to obtain the prediction performance of each of the n feature combinations, and retaining the feature combination with the best prediction performance;
3.3 repeating 3.1 to 3.2 on the n-1 features of the best-performing feature combination obtained in 3.2, and looping until all n features have been deleted;
3.4 selecting, from the feature combinations retained during 3.1 to 3.3, the key feature combination for distinguishing CNS drugs from non-CNS drugs.
Optionally, the prediction performance comprises the sensitivity SEN and the specificity SPE; SEN is the prediction rate for CNS drugs and SPE is the prediction rate for non-CNS drugs.
Optionally, retaining the feature combination with the best prediction performance in 3.2 includes:
comparing the SEN values and the SPE values of all feature combinations, and finding the highest SEN value and the highest SPE value;
if the highest SEN and the highest SPE belong to the same feature combination, retaining that feature combination;
if the highest SEN and the highest SPE belong to two different feature combinations, determining the feature combination to be retained from the SEN and SPE of both feature combinations.
Optionally, assuming that the highest SEN belongs to feature combination A and the highest SPE belongs to feature combination B, determining the feature combination to be retained from the SEN and SPE of both feature combinations includes:
comparing the SPE of feature combination A with the SEN of feature combination B;
if the SPE of feature combination A is larger than the SEN of feature combination B, retaining feature combination A;
if the SPE of feature combination A is smaller than the SEN of feature combination B, retaining feature combination B;
if the SPE of feature combination A is equal to the SEN of feature combination B, comparing the SEN of feature combination A with the SPE of feature combination B and retaining the feature combination with the larger value.
Optionally, if the SEN of feature combination A is also equal to the SPE of feature combination B, feature combination A or feature combination B is retained at random.
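The retention rule above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the inventors' code: the function name `choose_combination` and the `(SEN, SPE)` tuple layout are hypothetical, and taking the first combination that reaches the highest SEN (or SPE) when several tie is an assumption the text does not settle.

```python
import random

def choose_combination(metrics, rng=random):
    """Return the index of the feature combination to retain.

    metrics: list of (sen, spe) tuples, one per candidate combination.
    """
    max_sen = max(sen for sen, _ in metrics)
    max_spe = max(spe for _, spe in metrics)

    # Case 1: one combination holds both the highest SEN and the highest SPE.
    for i, (sen, spe) in enumerate(metrics):
        if sen == max_sen and spe == max_spe:
            return i

    # Case 2: A holds the highest SEN, B holds the highest SPE.
    # Assumption: on ties, the first combination reaching the maximum is used.
    a = next(i for i, (sen, _) in enumerate(metrics) if sen == max_sen)
    b = next(i for i, (_, spe) in enumerate(metrics) if spe == max_spe)
    sen_a, spe_a = metrics[a]
    sen_b, spe_b = metrics[b]

    # Compare the SPE of A with the SEN of B.
    if spe_a != sen_b:
        return a if spe_a > sen_b else b
    # Equal: compare the SEN of A with the SPE of B.
    if sen_a != spe_b:
        return a if sen_a > spe_b else b
    # Still equal: retain A or B at random.
    return rng.choice([a, b])
```

For instance, with candidates `[(0.90, 0.80), (0.85, 0.95)]`, A is the first combination and B the second; the SPE of A (0.80) is below the SEN of B (0.85), so the second combination is retained.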
Optionally, in step one, the features that are effective in distinguishing CNS drugs from non-CNS small-molecule drugs are preliminarily screened out of all features using a random forest algorithm, with the information gain ratio as the attribute-splitting evaluation function.
Optionally, in step two, the optimized support vector machine model is obtained by exhaustive search.
The invention also provides a CNS drug molecule design method, which uses the above method to identify the key features of CNS drugs.
The invention has the beneficial effects that:
By combining a support vector machine with a greedy algorithm, the feature that contributes least to improving the prediction result is deleted at each step, following the greedy idea, so that the key features distinguishing CNS drugs from non-CNS small-molecule drugs are accurately screened out. This is the first time a support vector machine and a greedy algorithm have been combined for the identification of key features of CNS drugs. Because the key features are screened by stepwise deletion, the effect of combinations among features is taken into account and the difficulty of choosing the initial features in feature-addition methods is avoided. The screened key features effectively distinguish CNS drugs from non-CNS small-molecule drugs, which provides important guidance for designing CNS drug candidate small molecules.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
Embodiment one:
This embodiment provides a method for identifying key features of CNS drugs based on a support vector machine and a greedy algorithm. The method combines the support vector machine with the greedy algorithm and, following the greedy idea, deletes at each step the feature that contributes least to improving the prediction result, so that the key features distinguishing CNS drugs from non-CNS small-molecule drugs are accurately screened out. The method comprises the following steps:
Step (1): preliminary feature selection using a random forest algorithm:
A random forest model is constructed, with the information gain ratio as the attribute-splitting evaluation function, to perform preliminary feature selection.
Specifically, a random forest model containing 100 decision trees is constructed, and the information gain ratio is used as the attribute-splitting evaluation function with reference to "Yaoyang, Yangjing, Jangjuan. To improve selection efficiency and performance, 2/3 of the samples and 1/2 of the features are randomly selected each time a decision tree is built; to prevent overfitting, a node stops splitting when the number of unassigned samples is less than 5.
All features that appear in the trees are collected; these are the preliminarily selected features.
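As a small illustration of the splitting criterion named in step (1), the sketch below computes the information gain ratio of a single candidate split in the usual C4.5 sense (information gain divided by split information). It is a minimal sketch under assumptions: the binary threshold split on one numeric feature and the function names `entropy` and `gain_ratio` are chosen here for illustration, and the text does not say how candidate splits are generated.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Information gain ratio of the split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split: nothing is separated
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left)
                                   + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info
```

A split that separates the two classes perfectly and evenly, such as `gain_ratio([1, 2, 3, 4], [0, 0, 1, 1], 2)`, has gain 1 bit and split information 1 bit, giving a ratio of 1.0.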
Step (2): optimizing the support vector machine model parameters c and g by exhaustive search, where c is the penalty coefficient and g is the kernel parameter:
A classifier that identifies CNS drugs and non-CNS small-molecule drugs is constructed using the support vector machine algorithm with the radial basis kernel function in the LIBSVM package.
All combinations of c and g within the range $[2^{-4}, 2^{4}]$ are enumerated, and 5-fold cross-validation is performed for each combination to find the optimal combination of c and g.
The optimization objective of the support vector machine is to find a hyperplane that separates the CNS samples from the non-CNS samples as well as possible, with the formula:
$f(x) = w^{T}x + b$
where x is the input feature vector, w is the normal vector of the hyperplane, and b is the offset.
w is obtained by means of Lagrange multipliers, and a mapping function $\Phi$ is used to map the feature vectors $x_i$ and $x$ to a high-dimensional space, as shown in the following equations:
$w = \sum_{i=1}^{m} \lambda_i y_i \Phi(x_i)$
$f(x) = \sum_{i=1}^{m} \lambda_i y_i \Phi^{T}(x_i)\Phi(x) + b$
where $\lambda_i$ is the Lagrange multiplier, $y_i$ is the sample label associated with the feature vector $x_i$, m is the number of samples, and $1 \le i \le m$.
Without a kernel function, computation in the high-dimensional space is likely to cause a dimensional explosion. To avoid this problem, the radial basis kernel function $K(x_i, x)$ is used in place of the explicit mapping $\Phi^{T}(x_i)\Phi(x)$, as follows:
$f(x) = \sum_{i=1}^{m} \lambda_i y_i K(x_i, x) + b$
$K(x_i, x) = \exp(-g \lVert x_i - x \rVert^{2})$
Here g is an important kernel parameter with a strong influence on model training. The other important parameter is the penalty coefficient c, which affects the smoothness of the classification surface.
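To make the role of g concrete, the following sketch evaluates the radial basis kernel exp(-g·|x_i - x|²) and a kernelized decision value of the standard form: the sum over i of λ_i·y_i·K(x_i, x), plus b. It is a plain-Python illustration only; the function names and the toy support vectors, labels and multipliers are hypothetical, and in practice these quantities come from a trained model such as one produced by LIBSVM.

```python
import math

def rbf_kernel(xi, x, g):
    """Radial basis kernel exp(-g * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-g * sq_dist)

def decision_value(x, support_vectors, labels, multipliers, b, g):
    """Kernelized decision value: sum_i lambda_i * y_i * K(x_i, x) + b."""
    return sum(lam * y * rbf_kernel(sv, x, g)
               for sv, y, lam in zip(support_vectors, labels, multipliers)) + b

# Toy model (hypothetical values): one positive and one negative support vector.
svs = [[0.0, 0.0], [2.0, 0.0]]
ys = [1, -1]
lams = [1.0, 1.0]

# A point near the positive support vector gets a positive decision value.
value = decision_value([0.1, 0.0], svs, ys, lams, b=0.0, g=1.0)
```

A larger g makes the kernel fall off faster with distance, so each support vector influences a smaller neighbourhood; this is why g has such a strong effect on training.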
Taking the features preliminarily selected in step (1) as input vectors, the prediction performance of the support vector machine model is calculated for every combination of c and g; the pair of c and g with the best prediction performance is taken as the parameters c and g of the optimized support vector machine model.
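The exhaustive search over c and g can be sketched as below. Several points are assumptions rather than statements from the text: the grid uses integer exponents from -4 to 4 (the step size is not given), the model quality is summarized by a single number returned by a caller-supplied `evaluate(c, g)` function (for example a mean 5-fold cross-validation score), and `toy_score` is a made-up stand-in so the example runs without a dataset.

```python
import math
from itertools import product

def grid_search(evaluate, exponents=range(-4, 5)):
    """Try every (c, g) = (2**i, 2**j) pair and return (best_score, c, g)."""
    best = None
    for exp_c, exp_g in product(exponents, repeat=2):
        c, g = 2.0 ** exp_c, 2.0 ** exp_g
        score = evaluate(c, g)
        if best is None or score > best[0]:
            best = (score, c, g)
    return best

def toy_score(c, g):
    # Hypothetical stand-in for a cross-validation score; peaks at c=2, g=0.25.
    return -abs(math.log2(c) - 1) - abs(math.log2(g) + 2)

best_score, best_c, best_g = grid_search(toy_score)
```

With the default exponents the search evaluates 9 x 9 = 81 parameter pairs, which is cheap compared with the feature-deletion loop that follows.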
Step (3): identifying key features using a greedy algorithm:
Different combinations of the features preliminarily selected in step (1) are used in turn as input vectors of the support vector machine model optimized in step (2), and the key features are screened out according to the corresponding prediction performance.
Specifically, the method comprises the following steps:
S1: assume that n features are preliminarily selected in step (1): $\{a_1, a_2, a_3, a_4, \dots, a_n\}$;
S2: delete each feature in turn, resulting in n different feature combinations: $\{a_2, a_3, a_4, \dots, a_n\}$, $\{a_1, a_3, a_4, \dots, a_n\}$, $\{a_1, a_2, a_4, \dots, a_n\}$, ..., $\{a_1, a_2, a_3, a_4, \dots, a_{n-1}\}$;
S3: use each feature combination from step S2 as the input vector of the support vector machine model, retain the feature combination with the best prediction performance, and record its prediction performance $p_j$, $1 \le j \le n$;
S4: using the feature combination retained in step S3, execute S2 and S3 on the n-1 features in that combination, and loop through S2 to S4 in this way until all features have been deleted;
S5: the feature combination corresponding to the best $p_j$ among all prediction performances $p_j$ recorded in S2 to S4 constitutes the screened key features.
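Steps S1 to S5 amount to greedy backward elimination, which can be sketched as follows. This is a simplified illustration under stated assumptions: the prediction performance is reduced to one number returned by a caller-supplied `evaluate(subset)` function (the text compares SEN and SPE jointly), the loop stops when one feature remains because an empty combination cannot be scored, and ties keep the earlier candidate. The names `greedy_backward_elimination` and `toy_evaluate` are hypothetical.

```python
def greedy_backward_elimination(features, evaluate):
    """Drop one feature per round, keeping the best-scoring remainder (S2-S4),
    and return the best combination seen over all rounds (S5)."""
    current = list(features)
    best_score, best_subset = None, None
    while len(current) > 1:
        # S2: every combination obtained by deleting exactly one feature.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        # S3: retain the best-performing combination and record its score.
        score, current = max(((evaluate(c), c) for c in candidates),
                             key=lambda pair: pair[0])
        if best_score is None or score > best_score:
            best_score, best_subset = score, current
    return best_subset, best_score

def toy_evaluate(subset):
    # Hypothetical score: rewards the informative features a1 and a3 and
    # slightly penalizes every feature that is kept.
    informative = {"a1", "a3"}
    return len(informative & set(subset)) - 0.01 * len(subset)

subset, score = greedy_backward_elimination(
    ["a1", "a2", "a3", "a4", "a5"], toy_evaluate)
```

Each round over k remaining features needs k model evaluations, so starting from n features the loop performs on the order of n²/2 evaluations in total; this cost is the reason a preliminary selection step that reduces n is useful.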
The prediction performance $p_j$ in step (2) and step (3) above includes the sensitivity SEN and the specificity SPE:
sensitivity SEN, i.e., the positive-sample prediction rate (CNS drug prediction rate);
specificity SPE, i.e., the negative-sample prediction rate (non-CNS drug prediction rate).
The constructed CNS drug recognition model is evaluated by the sensitivity SEN and the specificity SPE together; the larger the values, the better the performance of the model.
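For reference, sensitivity and specificity are conventionally computed from the confusion counts as SEN = TP / (TP + FN) and SPE = TN / (TN + FP). The short sketch below applies these standard definitions; the function name and the label encoding (1 for CNS, 0 for non-CNS) are assumptions made for illustration, and both classes are assumed to be present.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SEN, SPE) for binary labels; `positive` marks CNS drugs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sen = tp / (tp + fn)  # fraction of CNS drugs predicted as CNS
    spe = tn / (tn + fp)  # fraction of non-CNS drugs predicted as non-CNS
    return sen, spe

# Three CNS drugs (two found) and two non-CNS drugs (one found).
sen, spe = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

A classifier that labels everything as CNS reaches SEN = 100% with SPE = 0%, which is why the two values are judged together rather than separately.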
In particular, when determining the best $p_j$ among the prediction performances $p_j$, if the highest SEN and the highest SPE belong to the same feature combination, that feature combination is retained.
If the highest SEN and the highest SPE belong to different feature combinations, the feature combination to be retained is selected according to the corresponding SPE and SEN values. For example, if the highest SEN and the highest SPE belong to feature combinations A and B respectively, that is, the SEN of feature combination A is the highest and the SPE of feature combination B is the highest, the SPE of feature combination A is compared with the SEN of feature combination B:
if the SPE of feature combination A is larger than the SEN of feature combination B, feature combination A is retained;
if the SPE of feature combination A is smaller than the SEN of feature combination B, feature combination B is retained;
if the SPE of feature combination A is equal to the SEN of feature combination B, the SEN of feature combination A is compared with the SPE of feature combination B, and the feature combination with the larger value is retained.
If the SEN of feature combination A is also equal to the SPE of feature combination B, feature combination A or feature combination B is retained at random.
To verify that the method provided by this application can effectively identify the key features of CNS drugs, existing CNS drugs and non-CNS small-molecule drugs are taken as experimental objects, with data from the ZINC15 (http://zinc15.docking.org/) and DrugBank (https://www.drugbank.ca/) databases.
The inventors downloaded drug data in SDF format (879 non-CNS small-molecule drugs and 273 CNS small-molecule drugs) from these two databases, including the initial coordinates of all atoms in each drug molecule and the bond types between atoms. The drug data were used as input to the PaDEL software to compute all feature values of each drug molecule; in this example, 1875 feature values were calculated for each drug molecule.
Step (1): preliminary feature selection using a random forest algorithm:
A random forest model was constructed to select the features that contribute to distinguishing CNS drugs from non-CNS small-molecule drugs; 941 useful features were selected from the 1875 features.
Step (2): optimizing the support vector machine model parameters c and g by exhaustive search:
355 of the 879 non-CNS small-molecule drugs were randomly selected as negative samples, and the 273 CNS small-molecule drugs were taken as positive samples. With the support vector machine parameters c and g restricted to the range $[2^{-4}, 2^{4}]$, the best 5-fold cross-validation result was searched over all combinations of c and g, and a test set was used to test the generalization performance of the model. This process was repeated 5 times, and the results are shown in Table 1 below. The 5 samples share the same positive samples (the 273 CNS drugs), while the negative samples of each are a different random selection of 355 from the 879 non-CNS small-molecule drugs.
The 941 features of each small-molecule drug were used as the input vector of the support vector machine models corresponding to the different combinations of c and g to evaluate their prediction performance.
TABLE 1 Support vector machine model parameters and performance on the external test set
[Table 1 is an image in the original publication; its contents are not reproduced here.]
As can be seen from Table 1, the support vector machine model with the parameters c and g of sample 3 has the best prediction performance. The support vector machine model with this pair of c and g is therefore used for key feature screening, and the corresponding randomly selected 355 non-CNS small-molecule drugs together with the 273 CNS small-molecule drugs are used as the samples for screening key features.
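The repeated sampling used in step (2), in which the positive set is fixed and a fresh random subset of negatives is drawn each time, can be sketched as follows. The function name, the fixed seed and the use of integer identifiers in place of molecules are assumptions for illustration; how the original random selection was performed is not stated.

```python
import random

def draw_balanced_samples(positives, negatives, n_negative=355, repeats=5, seed=0):
    """Return `repeats` (positives, sampled_negatives) pairs.

    Every pair reuses all positives and draws `n_negative` negatives
    without replacement from the full negative pool.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [(list(positives), rng.sample(negatives, n_negative))
            for _ in range(repeats)]

# Stand-in identifiers: 273 CNS molecules and 879 non-CNS molecules.
cns_ids = list(range(273))
non_cns_ids = list(range(1000, 1879))
samples = draw_balanced_samples(cns_ids, non_cns_ids)
```

Drawing 355 negatives against 273 positives keeps the two classes of similar size, so that neither SEN nor SPE is dominated by class imbalance.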
Step (3): identifying key features using a greedy algorithm:
The 941 features of the randomly selected 355 non-CNS small-molecule drugs and the 273 CNS small-molecule drugs determined in step (2) are used as input vectors of the optimized support vector machine model to screen the key features, specifically:
S1: the 941 features are: $\{a_1, a_2, a_3, a_4, \dots, a_n\}$, n = 941;
S2: delete each feature in turn, resulting in 941 different feature combinations: $\{a_2, a_3, a_4, \dots, a_n\}$, $\{a_1, a_3, a_4, \dots, a_n\}$, $\{a_1, a_2, a_4, \dots, a_n\}$, ..., $\{a_1, a_2, a_3, a_4, \dots, a_{n-1}\}$;
S3: use each feature combination from step S2 as the input vector of the support vector machine model, retain the feature combination with the best prediction performance, and record the sensitivity SEN and specificity SPE values of each feature combination;
The SEN and SPE values of the 941 feature combinations are compared; if the highest SEN and the highest SPE belong to the same feature combination, that feature combination is retained.
If the highest SEN and the highest SPE belong to different feature combinations, the feature combination to be retained is selected according to the corresponding SPE and SEN values. For example, if the highest SPE belongs to the 800th feature combination and the highest SEN belongs to the 900th feature combination, the SEN of the 800th feature combination is compared with the SPE of the 900th feature combination:
if the SEN of the 800th feature combination is larger than the SPE of the 900th feature combination, the 800th feature combination is retained;
if the SEN of the 800th feature combination is smaller than the SPE of the 900th feature combination, the 900th feature combination is retained;
if the SEN of the 800th feature combination is equal to the SPE of the 900th feature combination, the SPE of the 800th feature combination is compared with the SEN of the 900th feature combination:
-if the SPE of the 800th feature combination is larger than the SEN of the 900th feature combination, the 800th feature combination is retained;
-if the SPE of the 800th feature combination is smaller than the SEN of the 900th feature combination, the 900th feature combination is retained;
-if the SPE of the 800th feature combination is equal to the SEN of the 900th feature combination, the 800th or the 900th feature combination is retained at random.
S4: using the feature combination retained in step S3, execute S2 and S3 on the 940 features in that combination, and loop through S2 to S4 in this way until all features have been deleted;
S5: the feature combination corresponding to the best $p_j$ among all prediction performances $p_j$ recorded in S2 to S4 constitutes the screened key features.
Through the above cycle process, 40 key features are finally screened out in this embodiment, and the SEN and SPE of the test result both reach more than 94% by using the screened out 40 key features as input variables of the model. The 40 key features selected are shown in table 2 below:
TABLE 2 The 40 screened key features and their descriptions
[Table 2 is provided as images in the original publication; its contents are not reproduced here.]
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A CNS drug key feature identification method, characterized in that a support vector machine and a greedy algorithm are combined, the feature that contributes least to improving the prediction result is deleted step by step by the greedy algorithm, and the key features for distinguishing CNS drugs from non-CNS small-molecule drugs are thereby accurately screened out.
2. The method according to claim 1, characterized in that it comprises:
step one, preliminarily screening out, from all features of the CNS drugs and the non-CNS small-molecule drugs, the features that are effective in distinguishing CNS drugs from non-CNS small-molecule drugs;
step two, constructing a support vector machine model with the features preliminarily screened in step one, and optimizing the parameters c and g to obtain an optimized support vector machine model;
step three, using a greedy algorithm to delete, step by step, the features preliminarily screened in step one, and screening out during the deletion process the key features for distinguishing CNS drugs from non-CNS drugs.
3. The method of claim 2, wherein the number of features preliminarily screened in step one is assumed to be n, and step three includes:
3.1 deleting each feature in turn, resulting in n different feature combinations: $\{a_2, a_3, a_4, \dots, a_n\}$, $\{a_1, a_3, a_4, \dots, a_n\}$, $\{a_1, a_2, a_4, \dots, a_n\}$, ..., $\{a_1, a_2, a_3, a_4, \dots, a_{n-1}\}$;
3.2 taking the n different feature combinations as input vectors of the optimized support vector machine model obtained in step two to obtain the prediction performance of each of the n feature combinations, and retaining the feature combination with the best prediction performance;
3.3 repeating 3.1 to 3.2 on the n-1 features of the best-performing feature combination obtained in 3.2, and looping until all n features have been deleted;
3.4 selecting, from the feature combinations retained during 3.1 to 3.3, the key feature combination for distinguishing CNS drugs from non-CNS drugs.
4. The method of claim 3, wherein the prediction performance includes the sensitivity SEN and the specificity SPE; SEN is the prediction rate for CNS drugs and SPE is the prediction rate for non-CNS drugs.
5. The method of claim 4, wherein retaining the feature combination with the best prediction performance in 3.2 comprises:
comparing the SEN values and the SPE values of all feature combinations, and finding the highest SEN value and the highest SPE value;
if the highest SEN and the highest SPE belong to the same feature combination, retaining that feature combination;
if the highest SEN and the highest SPE belong to two different feature combinations, determining the feature combination to be retained from the SEN and SPE of both feature combinations.
6. The method of claim 5, wherein, assuming that the highest SEN belongs to feature combination A and the highest SPE belongs to feature combination B, determining the feature combination to be retained from the SEN and SPE of both feature combinations comprises:
comparing the SPE of feature combination A with the SEN of feature combination B;
if the SPE of feature combination A is larger than the SEN of feature combination B, retaining feature combination A;
if the SPE of feature combination A is smaller than the SEN of feature combination B, retaining feature combination B;
if the SPE of feature combination A is equal to the SEN of feature combination B, comparing the SEN of feature combination A with the SPE of feature combination B and retaining the feature combination with the larger value.
7. The method of claim 6, wherein feature combination A or feature combination B is retained at random if the SEN of feature combination A is also equal to the SPE of feature combination B.
8. The method as claimed in any one of claims 2 to 7, wherein in step one the features that are effective in distinguishing CNS drugs from non-CNS small-molecule drugs are preliminarily screened out of all features using a random forest algorithm, with the information gain ratio as the attribute-splitting evaluation function.
9. The method according to any one of claims 2 to 8, wherein in step two the optimized support vector machine model is obtained by exhaustive search.
10. A method for CNS drug molecule design, wherein said design method identifies key features of CNS drugs using the method of any of claims 1-9.
CN201911307432.6A 2019-12-18 2019-12-18 CNS drug key feature identification method Active CN111081321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911307432.6A CN111081321B (en) 2019-12-18 2019-12-18 CNS drug key feature identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911307432.6A CN111081321B (en) 2019-12-18 2019-12-18 CNS drug key feature identification method

Publications (2)

Publication Number Publication Date
CN111081321A (en) 2020-04-28
CN111081321B (en) 2023-10-31

Family

ID=70315502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911307432.6A Active CN111081321B (en) 2019-12-18 2019-12-18 CNS drug key feature identification method

Country Status (1)

Country Link
CN (1) CN111081321B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238148A (en) * 2022-09-21 2022-10-25 杭州衡泰技术股份有限公司 Characteristic combination screening method for multi-party enterprise joint credit rating and application

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866863A (en) * 2015-04-27 2015-08-26 大连理工大学 Biomarker screening method
CN105740626A (en) * 2016-02-01 2016-07-06 华中农业大学 Drug activity prediction method based on machine learning
CN106991296A (en) * 2017-04-01 2017-07-28 大连理工大学 Ensemble classification method based on randomized greedy feature selection
CN107731309A (en) * 2017-08-31 2018-02-23 武汉百药联科科技有限公司 Method for predicting drug activity and application thereof
CN110459274A (en) * 2019-08-01 2019-11-15 南京邮电大学 Small-molecule drug virtual screening method based on deep transfer learning and application thereof

Also Published As

Publication number Publication date
CN111081321B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
US20200004777A1 (en) Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors
Stumpfe et al. Similarity searching
JP6954003B2 (en) Determining device and method of convolutional neural network model for database
JP6839342B2 (en) Information processing equipment, information processing methods and programs
Hanczar et al. Ensemble methods for biclustering tasks
Pes Learning from high-dimensional biomedical datasets: the issue of class imbalance
Lin et al. Efficient classification of hot spots and hub protein interfaces by recursive feature elimination and gradient boosting
US11775610B2 (en) Flexible imputation of missing data
CN107679138B (en) Spectral feature selection method based on local scale parameters, entropy and cosine similarity
US20190251468A1 (en) Systems and Methods for Distributed Generation of Decision Tree-Based Models
CN113344113B (en) Yolov3 anchor frame determination method based on improved k-means clustering
JP4937395B2 (en) Feature vector generation apparatus, feature vector generation method and program
CN109390032B (en) Method for exploring disease-related SNP (single nucleotide polymorphism) combination in data of whole genome association analysis based on evolutionary algorithm
CN111081321A (en) CNS drug key feature identification method
CN112837743A (en) Medicine repositioning method based on machine learning
He et al. Measuring boundedness for protein complex identification in PPI networks
US11886445B2 (en) Classification engineering using regional locality-sensitive hashing (LSH) searches
US11710057B2 (en) Methods and systems for identifying patterns in data using delimited feature-regions
CN111860622B (en) Clustering method and system applied to programming field big data
Yang et al. Adaptive density peak clustering for determinging cluster center
US20120208227A1 (en) Apparatus and method for processing cell culture data
Devi et al. Similarity measurement in recent biased time series databases using different clustering methods
CN111401783A (en) Power system operation data integration feature selection method
CN110766087A (en) Method for improving data clustering quality of k-means based on dispersion maximization method
Böhm et al. Querying objects modeled by arbitrary probability distributions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant