CN115062750A - Compound water solubility prediction method of dynamic evolution whale optimization algorithm - Google Patents

Publication number
CN115062750A
Authority
CN
China
Prior art keywords
population
algorithm
optimization algorithm
whale
compound
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210679711.0A
Other languages
Chinese (zh)
Inventor
张琛
沈亚
陈圣兵
郭法滨
张新
程知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University
Original Assignee
Hefei University
Application filed by Hefei University
Priority to CN202210679711.0A
Publication of CN115062750A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm, comprising the following steps. Step S1: select a compound water solubility data set as experimental data and divide it into a training set and a test set. Step S2: improve the whale optimization algorithm with a multi-population strategy and a population dynamic evolution strategy to strengthen its optimization capability. Step S3: use the improved whale optimization algorithm to optimize the parameters of an LSTM neural network, and train an LSTM neural network with a better parameter combination. Step S4: predict the water solubility of compounds with the optimized LSTM neural network. The LSTM deep learning model trained by this method can accurately predict the water solubility of compounds; the improvements to the traditional whale optimization algorithm raise its optimization precision and convergence efficiency; and applying deep learning together with a swarm intelligence optimization algorithm to compound water solubility prediction provides a valuable reference for research on predicting related compound properties.

Description

Compound water solubility prediction method of dynamic evolution whale optimization algorithm
Technical Field
The invention belongs to the technical field of compound water solubility prediction, and particularly relates to a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm.
Background
The human body consists largely of water, so water solubility is very important in the development of narcotics and other drugs, and it strongly influences the toxicity, in vivo efficacy, biological activity, pharmacokinetics, and other properties of drugs. Water solubility is a concern at every stage of drug research and development, and accurate, efficient prediction of compound water solubility is key to reducing the cost of drug development and ensuring its success. Prediction of compound water solubility also matters for the selection of coating materials and the design of coatings and batteries. How to determine and predict the water solubility of compounds is a complex and common problem that has gradually attracted attention.
Predicting the water solubility of compounds can effectively promote the development of the pharmaceutical industry, but at present it relies mainly on conventional kinetic and thermodynamic methods, which are costly and cannot support large-scale compound screening.
Traditional machine learning methods can predict the water solubility of compounds, but model accuracy is low when the input features and data volume are small, and operating efficiency drops when the data volume is large, so the accuracy of traditional machine learning is limited. Deep learning handles this problem well. LSTM, one kind of deep learning model, is widely used for data prediction and analysis and has been applied successfully to the prediction of air pollution, traffic flow, and the like. However, the LSTM neural network has many parameters that strongly affect its prediction performance.
For example, the "hydrological time series prediction optimization method based on WOA-LSTM-MC" disclosed in Chinese patent publication CN112733997A suffers from low optimization precision, slow convergence, and low computational efficiency.
Disclosure of Invention
The invention provides a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm. It aims to overcome the following problems in the prior art: predicting compound water solubility is costly and cannot support large-scale compound screening; machine learning models are inaccurate when the input features and data volume are small; and operating efficiency drops when the data volume is large, which limits the accuracy of traditional machine learning.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A compound water solubility prediction method based on a dynamically evolving whale optimization algorithm, characterized by comprising the following steps:
Step S1: Collect water solubility data of compounds to form a data set, and divide the data set into a training set and a test set.
Step S2: Optimize the traditional whale optimization algorithm with a multi-population strategy and a population dynamic evolution strategy to improve its optimization precision and convergence speed.
Step S3: Use the multi-population dynamically evolving whale optimization algorithm to optimize the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the neural network, and determine the LSTM neural network model with the optimal parameter combination.
Step S4: Perform compound water solubility prediction and analysis with the LSTM model obtained in step S3.
The optimized whale optimization algorithm is used to optimize the LSTM neural network parameters, and the optimized LSTM neural network is applied to compound water solubility prediction, which improves both the accuracy of compound water solubility prediction and the efficiency of neural network parameter optimization.
Preferably, after the compound water solubility data set is divided in step S1, the method further comprises data preprocessing, noise data cleaning, and data normalization.
Preferably, improving the whale optimization algorithm with the multi-population and population dynamic evolution strategy in step S2 comprises the following steps:
Step S2-1: Initialize parameters. Set the population size to N and the dimension of each individual to M, where M is the dimension of the problem to be solved; set the maximum number of iterations to T and the current iteration number to t = 0. Initialize a matrix with N rows and M columns to represent the initial population, in which each row represents one individual and each individual is an M-dimensional vector representing one solution of the M-dimensional problem. The initial population matrix is:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$
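The initialization in step S2-1 can be sketched as follows. This is a minimal illustration in plain Python, assuming that each coordinate is drawn uniformly between per-dimension lower and upper bounds; the text does not state the sampling rule, and the function name `init_population` and the example bounds are hypothetical.

```python
import random

def init_population(n, m, lb, ub, seed=None):
    """Return an n x m list of lists; each row is one whale (one candidate solution).

    Assumption: coordinates are sampled uniformly in [lb[j], ub[j]].
    """
    rng = random.Random(seed)
    return [[rng.uniform(lb[j], ub[j]) for j in range(m)] for _ in range(n)]

# Example: N = 6 whales in an M = 4 dimensional search space (illustrative bounds).
population = init_population(n=6, m=4, lb=[0.0] * 4, ub=[1.0] * 4, seed=0)
```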
Step S2-2: Calculate the fitness value of each individual. Calculate the fitness value of each individual in the initial population according to the fitness function f(x), and find the individual with the current optimal fitness value.
Step S2-3: Multi-population whale optimization algorithm. Divide the whale individuals into three sub-populations of equal size according to their fitness values. The individuals with the worst fitness values form an exploration population, which enhances the global exploration capability. The individuals with the best fitness values form a development population, which enhances the convergence rate and local search capability of the algorithm and improves its solution precision. The remaining individuals form a common population, which balances the global exploration capability and the local search capability of the algorithm.
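The three-way division described in step S2-3 can be sketched as below. The sketch assumes that a lower fitness value is better (consistent with an error-type fitness such as RMSE) and that any remainder, when N is not divisible by three, goes to the common population; both choices and the name `split_population` are illustrative assumptions.

```python
def split_population(population, fitness):
    """Split individuals into (development, common, exploration) sub-populations.

    Assumption: lower fitness is better, so the best third becomes the
    development population and the worst third the exploration population.
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    third = len(population) // 3
    development = [population[i] for i in order[:third]]
    exploration = [population[i] for i in order[len(order) - third:]]
    common = [population[i] for i in order[third:len(order) - third]]
    return development, common, exploration

# Six one-dimensional whales with illustrative fitness values.
whales = [[0], [1], [2], [3], [4], [5]]
dev, com, exp = split_population(whales, fitness=[5, 1, 4, 0, 3, 2])
```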
The position update mechanism of the exploration population is as follows:
$$D = |C \cdot X^*(t) - X(t)|$$
$$A = 2a \cdot r - a$$
$$C = 2 \cdot r$$
$$X(t+1) = X_{rand}(t) - A \cdot D$$
In the above formulas, $X_{rand}$ is a whale selected at random from the whale population, r is a random number between 0 and 1, and a decreases linearly from 2 to 0 as the iterations proceed.
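The exploration update above can be sketched for a single whale as follows. The sketch assumes a schedule a = 2 - 2t/T for the linear decrease, and it draws r independently per dimension and separately for A and C; the text only says that r is a random number between 0 and 1, so these details and the name `explore_step` are assumptions.

```python
import random

def explore_step(x, x_best, x_rand, t, t_max, rng=random):
    """One exploration-population update X(t+1) = X_rand(t) - A * D for one whale."""
    a = 2.0 - 2.0 * t / t_max              # decreases linearly from 2 to 0
    new_x = []
    for xi, bi, ri in zip(x, x_best, x_rand):
        A = 2.0 * a * rng.random() - a     # A = 2a*r - a
        C = 2.0 * rng.random()             # C = 2*r
        D = abs(C * bi - xi)               # D = |C * X*(t) - X(t)|
        new_x.append(ri - A * D)
    return new_x

# At the last iteration a = 0, so A = 0 and the whale lands exactly on X_rand.
final = explore_step([1.0, 2.0], [0.5, 0.5], [3.0, 4.0], t=10, t_max=10)
```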
The position update mechanism of the development population is as follows:
$$D' = |X^*(t) - X(t)|$$
[Formula image BDA0003697852760000031: position update of the development population; not reproduced in this text]
In the above formulas, p is a random number uniformly distributed between 0 and 1, b = 1, l is a random number between -1 and 1, and $X^*(t)$ is the position of the optimal whale.
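For orientation, the symbols p, b and l defined above match the bubble-net update of the standard whale optimization algorithm. The sketch below implements that standard update in plain Python. It is an assumption that the development-population formula follows the same rule: the 0.5 threshold on p, the schedule for a, and the name `develop_step` come from the standard algorithm or are chosen for illustration, not from this document.

```python
import math
import random

def develop_step(x, x_best, t, t_max, b=1.0, rng=random):
    """Standard WOA bubble-net update for one whale (assumed, not confirmed here)."""
    a = 2.0 - 2.0 * t / t_max
    p = rng.random()                       # selects the branch
    l = rng.uniform(-1.0, 1.0)             # spiral parameter
    new_x = []
    for xi, bi in zip(x, x_best):
        if p < 0.5:
            # Shrinking encirclement around the best whale.
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            new_x.append(bi - A * abs(C * bi - xi))
        else:
            # Logarithmic spiral with D' = |X*(t) - X(t)|.
            d_prime = abs(bi - xi)
            new_x.append(d_prime * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bi)
    return new_x

# A whale already sitting on the best position stays there at the last iteration.
same = develop_step([1.0, 2.0], [1.0, 2.0], t=5, t_max=5)
```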
The position update mechanism of the common population is as follows:
[Formula image BDA0003697852760000032: position update of the common population; not reproduced in this text]
In the above formula, p1 and p2 are random numbers uniformly distributed between 0 and 1.
Step S2-4: Judge whether the whale optimization algorithm has fallen into a local optimal solution. For a swarm intelligence optimization algorithm, the current optimal solution is the best result obtained so far by running the algorithm. If the current optimal solution is not updated after an iteration, the algorithm has temporarily failed to find a better solution and may have fallen into a local optimum. The algorithm is judged to have entered a local optimal state when the following formula is satisfied:
$$X^*(t) == X^*(t+2)$$
In the above formula, t is the current iteration number of the algorithm. The formula indicates that the whale optimization algorithm has not updated the current optimal solution over three consecutive iterations, in which case the algorithm is judged to have entered a local optimal state.
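The stagnation test of step S2-4 can be sketched as a check on the recorded history of best solutions. The sketch assumes that the best solution of every iteration is appended to a list; the name `is_stagnant` and the `window` parameter are hypothetical.

```python
def is_stagnant(best_history, window=3):
    """Return True when the best solution is identical over the last `window`
    iterations, i.e. X*(t) == X*(t+1) == X*(t+2) for window = 3."""
    if len(best_history) < window:
        return False
    recent = best_history[-window:]
    return all(solution == recent[0] for solution in recent)
```

Comparing all three entries rather than only the first and last gives the same answer whenever the best solution is only ever replaced by a better one.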
Step S2-5: When the whale optimization algorithm enters a local optimal state, the fitness value of each individual in the population is calculated according to the fitness function, the population is divided into three sub-populations again, and dynamic evolution of the populations is then carried out.
Step S2-6: The exploration population expands outward from the current position of the population to enlarge its search range and enhance the global exploration capability of the algorithm. Its evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1] + 1$$
$$X(t) = X^*(t) \cdot r$$
Step S2-7: The development population uses the current optimal solution to carry out a deep local search, which accelerates convergence and improves solution precision. Its evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1]$$
$$X(t) = X^*(t) \cdot r$$
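The two evolution rules of steps S2-6 and S2-7 differ only in the range of the scaling factor r, as the sketch below shows. It assumes one scalar r per individual; the text does not say whether r is drawn per individual or per dimension, and the function names are hypothetical.

```python
import random

def evolve_exploration(x_best, rng=random):
    """X(t) = X*(t) * r with r in [1, 2): push the whale outward past the best position."""
    r = rng.random() + 1.0
    return [r * b for b in x_best]

def evolve_development(x_best, rng=random):
    """X(t) = X*(t) * r with r in [0, 1): pull the whale inside the best position."""
    r = rng.random()
    return [r * b for b in x_best]

outward = evolve_exploration([2.0, -4.0], random.Random(1))
inward = evolve_development([2.0, -4.0], random.Random(1))
```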
Step S2-8: The common population updates its position by using its reverse solution. Its evolution is carried out through the following formulas:
[Formula images BDA0003697852760000041 and BDA0003697852760000042: reverse-solution update of the common population; not reproduced in this text]
In the above formulas, lb is the lower bound of the problem solution space, ub is the upper bound of the problem solution space, and fit() is the fitness function.
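A common way to build a reverse (opposite) solution from lb and ub is opposition-based learning, where each coordinate x is mirrored to lb + ub - x and the better of the two candidates is kept. The sketch below shows that technique under the assumption that lower fitness is better; it is not confirmed that step S2-8 uses exactly this rule, and the function names are hypothetical.

```python
def opposite(x, lb, ub):
    """Mirror each coordinate inside its bounds: x' = lb + ub - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lb, ub)]

def evolve_common(x, lb, ub, fit):
    """Keep whichever of x and its opposite has the lower (better) fitness."""
    x_opp = opposite(x, lb, ub)
    return x_opp if fit(x_opp) < fit(x) else x

def sphere(x):
    """Illustrative fitness function: sum of squares."""
    return sum(v * v for v in x)

lb, ub = [0.0, 0.0], [10.0, 10.0]
improved = evolve_common([9.0, 6.0], lb, ub, sphere)
```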
Step S2-9: Judge whether the algorithm has reached the loop termination condition. When t = T, that is, when the algorithm has reached the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, return to step S2-2.
Preferably, in step S3, the multi-population dynamically evolving whale optimization algorithm is used to optimize the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the neural network, and the specific process of determining the LSTM neural network model with the optimal parameter combination is as follows:
Step S3-1: Determine that the LSTM network model is a single-layer LSTM network whose structure comprises an input structure, an output structure, the number of iterations Max_epochs of the model, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the network. Model training is performed on the training set of the compound water solubility data.
Step S3-2: Initialize the multi-population dynamically evolving whale optimization algorithm. The position vector of the algorithm corresponds to the parameters of the LSTM network in step S3-1; that is, the position vector has four dimensions, which correspond respectively to the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the network. The other parameters of the algorithm are initialized at the same time: the population size is N, the maximum number of iterations is T, and the current iteration number is t = 0. Each whale individual is set as $X = (X_1, X_2, X_3, X_4)$.
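The mapping from a four-dimensional position vector to LSTM hyperparameters can be sketched as follows. The sketch assumes that each coordinate is clipped to a search range and that the three count-type parameters are rounded to integers; the ranges in `BOUNDS` and the name `decode_position` are hypothetical and are not given in the text.

```python
def decode_position(position, bounds):
    """Map a whale position (X1, X2, X3, X4) to LSTM hyperparameters."""
    clipped = [min(max(v, lo), hi) for v, (lo, hi) in zip(position, bounds)]
    return {
        "Max_epochs": int(round(clipped[0])),   # number of training iterations
        "Batch_size": int(round(clipped[1])),   # batch size
        "Hidden_size": int(round(clipped[2])),  # hidden-layer neurons
        "Lr": clipped[3],                       # learning rate stays continuous
    }

# Hypothetical search ranges, in the order Max_epochs, Batch_size, Hidden_size, Lr.
BOUNDS = [(10, 200), (16, 256), (8, 128), (1e-4, 1e-1)]
params = decode_position([57.6, 300.0, 31.2, 0.005], BOUNDS)
```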
Step S3-3: Calculate the fitness value of each individual. Calculate the fitness value of each individual in the initial population according to the fitness function, and find the individual with the current optimal fitness value. The fitness function is the root mean square error (RMSE) between the predicted values and the actual values of the LSTM neural network model, calculated as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y(t) - y'(t)\right)^2}$$
In the above formula, y(t) represents the real value of the t-th compound water solubility datum in the test set, y'(t) represents the predicted value of the t-th compound water solubility datum, and n is the total number of data in the compound water solubility test set.
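The RMSE fitness can be written directly from its definition. This is a minimal sketch in plain Python; in the described method the predicted values would come from an LSTM trained with the hyperparameters encoded by one whale, which is outside the scope of this snippet.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between real values y(t) and predicted values y'(t)."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off = rmse([0.0, 0.0], [3.0, 4.0])   # sqrt((9 + 16) / 2)
```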
Step S3-4: Divide the population into three sub-populations of equal size according to the fitness values.
Step S3-5: When the local optimum judgment formula of step S2-4 is satisfied, judge that the whale optimization algorithm has entered a local optimal state.
Step S3-6: When the whale optimization algorithm is trapped in a local optimal solution, perform dynamic evolution of the populations.
Step S3-7: Judge whether the algorithm has reached the loop termination condition. When t = T, that is, when the algorithm has reached the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, return to step S3-3 and continue the loop.
Preferably, step S3-4 comprises the following steps:
Step S3-4-1: For the exploration population, update the position of the next generation with the exploration population update mechanism of step S2-3.
Step S3-4-2: For the development population, update the position of the next generation with the development population update mechanism of step S2-3.
Step S3-4-3: For the common population, update the position of the next generation with the common population update mechanism of step S2-3.
Preferably, step S3-6 comprises the following steps:
Step S3-6-1: For the exploration population, perform dynamic evolution according to the update mechanism of step S2-6.
Step S3-6-2: For the development population, perform dynamic evolution according to the update mechanism of step S2-7.
Step S3-6-3: For the common population, perform dynamic evolution according to the update mechanism of step S2-8.
Preferably, step S4 comprises the following steps:
Step S4-1: When the optimal solution is output in step S3-7, use the values of its position vector as the optimal parameters of the LSTM network to construct the LSTM network.
Step S4-2: Use the trained LSTM neural network model with the optimal parameter combination to predict on the test set of the compound water solubility data set; output the predicted values and apply inverse normalization to obtain the final prediction result.
Therefore, the beneficial effects of the invention are as follows:
1. The traditional whale optimization algorithm is optimized, which improves its optimization precision and convergence efficiency. The optimized whale optimization algorithm is used to optimize the LSTM neural network parameters, and the optimized LSTM neural network is applied to compound water solubility prediction, which improves both the accuracy of compound water solubility prediction and the efficiency of neural network parameter optimization;
2. Deep learning and a swarm intelligence optimization algorithm are applied to compound water solubility prediction, providing a valuable reference for research on predicting related compound properties.
Drawings
FIG. 1 is an overall workflow diagram of the present invention;
FIG. 2 is a flow chart of a whale optimization algorithm based on population dynamic evolution in the invention;
FIG. 3 is a flow chart of the work of optimizing LSTM parameters based on the population dynamic evolution whale optimization algorithm in the invention;
FIG. 4 is a graph of fitness convergence of a whale optimization algorithm based on population dynamic evolution and other optimization algorithms in the invention;
FIG. 5 is a schematic diagram of a device for predicting water solubility of a compound based on a population dynamic evolution whale optimization algorithm in the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Example one
In this embodiment, the publicly available compound water solubility data set AqSolDB is used to train the model and make predictions. The purpose of this embodiment is to predict the water solubility of a compound by analyzing its molecular information.
As shown in FIG. 1, the compound water solubility data set is first preprocessed and features are extracted. The whale optimization algorithm is then improved with the multi-population dynamic evolution strategy to raise its optimization precision and convergence speed. The improved whale algorithm is used to optimize the LSTM and construct the deep learning model, and the constructed deep learning model is used to predict the water solubility of compounds. The method specifically comprises the following steps:
Step S1: Divide the compound water solubility data set into a training set and a test set, and perform data preprocessing and feature extraction.
Step S1-1: Partition the compound water solubility data set, with 70% used as the training set for model training and 30% as the test set for model prediction.
Step S1-2: Normalize the divided training set and test set using Min-Max normalization, calculated as follows:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$
In the above formula, x represents an original feature of the compound data, x' represents the feature after normalization, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of that feature. Normalization removes the unit limitation of the data and converts each value into a dimensionless number between 0 and 1, which improves the convergence rate and prediction accuracy of the model to a certain extent.
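Steps S1-1 and S1-2 can be sketched together in plain Python. The sketch assumes a random shuffle before the 70/30 split and computes the minimum and maximum on the training data only, so that the same scaling is reused on the test set; the text does not specify either point, and all function names are hypothetical.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def fit_min_max(column):
    """Return (x_min, x_max) of one feature column."""
    return min(column), max(column)

def min_max_scale(column, lo, hi):
    """Apply x' = (x - x_min) / (x_max - x_min); a constant column maps to 0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

train, test = train_test_split(range(10))
lo, hi = fit_min_max([2.0, 4.0, 6.0])
scaled = min_max_scale([2.0, 4.0, 6.0], lo, hi)
```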
Step S2: a multi-population and population dynamic evolution strategy is used for optimizing a traditional whale optimization algorithm, and the optimizing precision and the convergence speed of the whale optimization algorithm are improved.
Step S3: The multi-population dynamic evolution whale optimization algorithm (MDEWOA) is used to optimize the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the LSTM neural network model, so as to determine the LSTM neural network model with the optimal parameter combination.
Step S4: the LSTM model obtained in step S3 is trained on a training set of compound water-soluble data sets, and then the trained model is used to obtain a prediction result of the model on a test set.
In the above embodiment, the improvement process of the whale optimization algorithm based on population dynamic evolution is shown in FIG. 2, wherein step S2 specifically comprises the following steps:
Step S2-1: Initialize parameters
Set the population size to N and the dimension of each individual to M, which is the dimension of the problem to be solved; set the maximum number of iterations to T and the current iteration number to t = 0. Initialize a matrix with N rows and M columns to represent the initial population, in which each row represents one individual and each individual is an M-dimensional vector representing one solution of the M-dimensional problem.
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$
Step S2-2: Calculate the fitness value of each individual
Calculate the fitness value of each individual in the initial population according to the fitness function f(x), and find the individual with the current optimal fitness value.
Step S2-3: Multi-population whale optimization algorithm
Divide the whale individuals into three sub-populations of equal size according to their fitness values. The individuals with the worst fitness values form an exploration population, which enhances the global exploration capability. The individuals with the best fitness values form a development population, which enhances the convergence rate and local search capability of the algorithm and improves its solution precision. The remaining individuals form a common population, which balances the global exploration capability and the local search capability of the algorithm.
The position update mechanism of the exploration population is as follows:
$$D = |C \cdot X^*(t) - X(t)|$$
$$A = 2a \cdot r - a$$
$$C = 2 \cdot r$$
$$X(t+1) = X_{rand}(t) - A \cdot D$$
In the above formulas, $X_{rand}$ is a whale selected at random from the whale population, r is a random number between 0 and 1, and a decreases linearly from 2 to 0 as the iterations proceed.
The position update mechanism of the development population is as follows:
$$D' = |X^*(t) - X(t)|$$
[Formula image BDA0003697852760000081: position update of the development population; not reproduced in this text]
In the above formulas, p is a random number uniformly distributed between 0 and 1, b = 1, l is a random number between -1 and 1, and $X^*(t)$ is the position of the optimal whale.
The position update mechanism of the common population is as follows:
[Formula image BDA0003697852760000082: position update of the common population; not reproduced in this text]
In the above formula, p1 and p2 are random numbers uniformly distributed between 0 and 1.
Step S2-4: Determine whether the whale optimization algorithm is trapped in a local optimum
For a swarm intelligence optimization algorithm, the current optimal solution is the best result obtained so far by running the algorithm. If the current optimal solution is not updated after an iteration, the algorithm has not found a better solution and may have fallen into a local optimum. The algorithm is judged to be trapped in a local optimum when the following formula is satisfied.
$$X^*(t) == X^*(t+2)$$
Here t is the current iteration number of the algorithm. The formula indicates that the whale optimization algorithm has not updated the current optimal solution over three consecutive iterations, so the algorithm is judged to be trapped in a local optimum.
Step S2-5: When the whale optimization algorithm enters a local optimal state, the fitness value of each individual in the population is calculated according to the fitness function, the population is divided into three sub-populations again, and dynamic evolution of the populations is then carried out.
Step S2-6: The exploration population expands outward from the current position of the population to enlarge its search range and enhance the global exploration capability of the algorithm. Its evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1] + 1$$
$$X(t) = X^*(t) \cdot r$$
Step S2-7: The development population uses the current optimal solution to carry out a deep local search, which accelerates convergence and improves solution precision. Its evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1]$$
$$X(t) = X^*(t) \cdot r$$
Step S2-8: The common population updates its position by using its reverse solution. Its evolution is carried out through the following formulas:
[Formula images BDA0003697852760000083 and BDA0003697852760000084: reverse-solution update of the common population; not reproduced in this text]
In the above formulas, lb is the lower bound of the problem solution space, ub is the upper bound of the problem solution space, and fit() is the fitness function.
Step S2-9: Judge whether the algorithm has reached the loop termination condition. When t = T, that is, when the algorithm has reached the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, return to step S2-2.
FIG. 3 shows the process of optimizing the LSTM neural network parameters with the whale optimization algorithm based on population dynamic evolution in this example.
Step S3 comprises the following steps:
Step S3-1: Determine that the LSTM network model is a single-layer LSTM network whose structure comprises an input structure, an output structure, the number of iterations Max_epochs of the model, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the network. Model training is performed on the training set of the compound water solubility data.
Step S3-2: Initialize the multi-population dynamically evolving whale optimization algorithm. The position vector of the algorithm corresponds to the parameters of the LSTM network in step S3-1; that is, the position vector has four dimensions, which correspond respectively to the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the network of the LSTM model. The other parameters of the algorithm are initialized at the same time: the population size is N, the maximum number of iterations is T, and the current iteration number is t = 0. Each whale individual is set as $X = (X_1, X_2, X_3, X_4)$.
Step S3-3: Calculate the fitness value of each individual. Calculate the fitness value of each individual in the initial population according to the fitness function, and find the individual with the current optimal fitness value. The fitness function is the root mean square error (RMSE) between the predicted values and the actual values of the LSTM neural network model, calculated as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y(t) - y'(t)\right)^2}$$
In the above formula, y(t) represents the real value of the t-th compound water solubility datum in the test set, y'(t) represents the predicted value of the t-th compound water solubility datum, and n is the total number of data in the compound water solubility test set.
Step S3-4: Divide the population into three sub-populations of equal size according to fitness value, specifically as follows:
Step S3-4-1: For the exploration population, update the position of the next generation using the exploration-population position update mechanism of step S2-3.
Step S3-4-2: For the development population, update the position of the next generation using the development-population position update mechanism of step S2-3.
Step S3-4-3: For the general population, update the position of the next generation using the general-population position update mechanism of step S2-3.
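The division into three sub-populations can be sketched as follows, assuming that a lower fitness value (RMSE) is better, that the best third forms the development population and the worst third the exploration population as described in step S2-3, and that any remainder when N is not divisible by three goes to the general population. The last point and the function name are assumptions.

```python
def split_population(fitness):
    """Return index lists (development, general, exploration).

    fitness[i] is the fitness of individual i; lower is better.
    """
    n = len(fitness)
    if n < 3:
        raise ValueError("at least three individuals are required")
    # Indices sorted from best (lowest RMSE) to worst.
    order = sorted(range(n), key=lambda i: fitness[i])
    third = n // 3
    development = order[:third]           # best individuals
    exploration = order[n - third:]       # worst individuals
    general = order[third:n - third]      # everything in between
    return development, general, exploration
```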
Step S3-5: When the local-optimum criterion of step S2-4 is satisfied, the whale optimization algorithm is judged to have entered a local optimal state.
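The criterion of step S2-4, X*(t) = X*(t+2), can be checked by keeping a history of the best position recorded at each iteration. The sketch below assumes such a history list; the name `is_stagnant` and the `window` parameter are illustrative.

```python
def is_stagnant(best_history, window=2):
    """True when the latest best position equals the one `window` iterations earlier.

    best_history holds the best position recorded at each iteration, oldest first.
    """
    if len(best_history) <= window:
        return False  # not enough iterations recorded yet
    return list(best_history[-1]) == list(best_history[-1 - window])
```

Exact equality is used here because an unchanged best-so-far solution is stored as the same values from one iteration to the next; a tolerance-based comparison would be a different design choice.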
Step S3-6: When the whale optimization algorithm is trapped in a local optimal solution, dynamic evolution of the populations is carried out, specifically as follows:
Step S3-6-1: For the exploration population, perform dynamic evolution according to the position update mechanism of step S2-6.
Step S3-6-2: For the development population, perform dynamic evolution according to the position update mechanism of step S2-7.
Step S3-6-3: For the general population, perform dynamic evolution according to the position update mechanism of step S2-8.
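The three evolution rules can be sketched as below. The exploration and development rules follow the formulas of steps S2-6 and S2-7 (X(t) = X*(t)·r with r drawn from [1, 2] and [0, 1] respectively). The formula of step S2-8 is only available as an image, so `evolve_general` assumes the common opposition-based form lb + ub - x and keeps whichever of the two solutions has the better (lower) fitness. That form, the use of one scalar r per individual, and all function names are assumptions.

```python
import random


def evolve_exploration(best, rng=random):
    """Scale the best position by r in [1, 2) to widen the search range."""
    r = rng.random() + 1.0
    return [r * x for x in best]


def evolve_development(best, rng=random):
    """Scale the best position by r in [0, 1) for a finer local search."""
    r = rng.random()
    return [r * x for x in best]


def evolve_general(position, lower, upper, fit):
    """Assumed opposition step: keep the opposite solution only if it is better."""
    opposite = [lo + hi - x for x, lo, hi in zip(position, lower, upper)]
    return opposite if fit(opposite) < fit(position) else list(position)
```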
Step S3-7: Determine whether the loop termination condition is reached. When t = T, that is, when the algorithm reaches the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, return to step S3-3 and continue the loop.
Step S4 specifically includes the following steps:
Step S4.1: When the optimal solution is output in step S3, the values of its position vector are used as the optimal parameters of the LSTM network, and the LSTM network is constructed accordingly.
Step S4.2: Train the LSTM neural network model with the optimal parameter combination on the training set of the compound water-solubility data set and predict on the test set; output the predicted values and apply inverse normalization to obtain the final prediction result.
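The inverse normalization of step S4.2 depends on which normalization was applied during preprocessing, which the text does not specify. The sketch below assumes min-max scaling to [0, 1]; the function names are illustrative.

```python
def fit_min_max(values):
    """Return the (low, high) range used for min-max scaling."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be identical")
    return low, high


def normalize(values, low, high):
    """Scale values into [0, 1] using the fitted range."""
    return [(v - low) / (high - low) for v in values]


def denormalize(values, low, high):
    """Invert normalize(): map scaled predictions back to the original units."""
    return [v * (high - low) + low for v in values]
```

The range would be fitted on the training targets and reused unchanged when converting predictions on the test set back to solubility values.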
The invention compares the particle swarm optimization algorithm, the salp swarm algorithm and the standard whale optimization algorithm with the whale optimization algorithm based on multi-population dynamic evolution by analyzing their fitness convergence curves on the benchmark test function Schwefel 2.26. The fitness convergence curves of the four algorithms are shown in figure 4.
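For reference, the Schwefel 2.26 benchmark is commonly written as f(x) = 418.9829·d - Σ x_i·sin(√|x_i|) on [-500, 500]^d, with a minimum close to zero near x_i ≈ 420.9687. The sketch below uses that form; some references omit the constant term, and the text does not say which variant was used, so this is an assumption.

```python
import math


def schwefel_2_26(x):
    """Schwefel 2.26 in the shifted form whose global minimum is about 0."""
    d = len(x)
    return 418.9829 * d - sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)
```

The function has many local minima far from the global one, which is why it is often used to test whether an optimizer escapes local optima.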
Analysis of the convergence curves of the four algorithms shows that improving the whale optimization algorithm with the multi-population and population dynamic evolution strategy gives the algorithm stronger global exploration and local search capability and accelerates its convergence. The whale optimization algorithm based on multi-population dynamic evolution in this embodiment is therefore clearly superior to the other algorithms in convergence speed and optimization accuracy.
The method is applied to compound water solubility prediction. Table 1 compares the model of this embodiment with LSTM neural network models optimized by other intelligent optimization algorithms; model performance is evaluated by the RMSE of each model on the training set and the test set.
TABLE 1
[Table 1 is provided only as images in the original publication: Figure BDA0003697852760000101 and Figure BDA0003697852760000111.]
Table 1 shows that, for compound water solubility prediction, the method provided by the invention achieves higher prediction accuracy on both the training set and the test set. The whale optimization algorithm based on multi-population dynamic evolution has strong global exploration and local search capability and can effectively optimize the LSTM neural network model, so that the model obtains a better parameter combination, its prediction accuracy improves, and a better compound water-solubility prediction result is obtained.
As shown in fig. 5, according to the above embodiment, the present invention also provides a compound water solubility prediction apparatus based on a swarm intelligence algorithm, the apparatus comprising a data acquisition module, a data preprocessing module, a data modeling module and a model prediction module:
the data acquisition module: used to acquire the structure of the compound to be predicted;
the data preprocessing module: used to normalize the acquired compound data;
the data modeling module: used to optimize the LSTM parameters with the whale optimization algorithm and construct the compound water solubility prediction model;
the model prediction module: used to predict the water solubility of the compound with the constructed prediction model.
The structure, features and effects of the present invention have been described in detail with reference to the embodiments shown in the drawings, but the above embodiments are merely preferred embodiments of the present invention, and it should be understood that technical features related to the above embodiments and preferred modes thereof can be reasonably combined and configured into various equivalent schemes by those skilled in the art without departing from and changing the design idea and technical effects of the present invention; therefore, the invention is not limited to the embodiments shown in the drawings, and all the modifications and equivalent embodiments that can be made according to the idea of the invention are within the scope of the invention as long as they are not beyond the spirit of the description and the drawings.

Claims (7)

1. A compound water solubility prediction method of a dynamic evolution whale optimization algorithm is characterized by comprising the following steps:
step S1: collecting compound water-solubility data to form a data set, and dividing the data set into a training set and a test set;
step S2: optimizing a traditional whale optimization algorithm by using a multi-population and population dynamic evolution strategy, and improving the optimizing precision and the convergence speed of the whale optimization algorithm;
step S3: performing parameter optimization on the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr of the LSTM neural network model by using the multi-population dynamically evolved whale optimization algorithm, and determining the LSTM neural network model with the optimal parameter combination;
step S4: performing compound water solubility prediction and analysis according to the LSTM model obtained in step S3.
2. The method for predicting the water solubility of a compound in a dynamically evolved whale optimization algorithm as claimed in claim 1, wherein the step S1 further comprises the steps of preprocessing data, cleaning noise data and normalizing data after dividing the water solubility data set of the compound.
3. The method for predicting the water solubility of a compound in a dynamically evolved whale optimization algorithm as claimed in claim 1 or 2, wherein improving the whale optimization algorithm by using the multi-population and population dynamic evolution strategy in the step S2 comprises the following steps:
step S2-1: initializing parameters: setting the population size as N and the dimensionality of each individual as M, where M is the dimensionality of the problem to be solved; the maximum number of iterations of the algorithm is T and the current iteration number is t = 0; initializing a matrix with N rows and M columns to represent the initial population, in which each row represents one individual, and each individual is an M-dimensional vector representing a solution of the M-dimensional problem; the initial population matrix is as follows:
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,M} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,M} \end{bmatrix}$$
step S2-2: calculating fitness value of the individual: calculating the fitness value of each individual in the initial population according to a fitness function f (x), and finding out the individual with the current optimal fitness value;
step S2-3: multi-population whale optimization algorithm: dividing the whale individuals into three sub-populations of equal size according to their fitness values, wherein the individuals with the worst fitness values form an exploration population that enhances the global exploration capability; the individuals with the best fitness values form a development population that enhances the convergence speed and local search capability of the algorithm and improves its solution accuracy; and the remaining individuals form a general population that balances the global exploration capability and the local search capability of the algorithm,
the position update mechanism of the exploration population is as follows:
$$D = |C \cdot X^{*}(t) - X(t)|$$
$$A = 2a \cdot r - a$$
$$C = 2 \cdot r$$
$$X(t+1) = X_{rand}(t) - A \cdot D$$
in the above formulas, X_rand is a whale selected at random from the whale population, r is a random number between 0 and 1, and a decreases linearly from 2 to 0 as the iterations proceed;
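The exploration update can be sketched directly from the four formulas above. The sketch assumes that a follows the linear schedule a = 2(1 - t/T), that r is drawn independently for A and for C, and that both are scalars applied to every dimension; the text fixes none of these details, and the function names are illustrative.

```python
import random


def linear_a(t, max_iter):
    """Assumed linear schedule: a falls from 2 at t = 0 to 0 at t = max_iter."""
    return 2.0 * (1.0 - t / max_iter)


def exploration_update(x, best, x_rand, a, rng=random):
    """Next position of an exploration-population whale.

    x is the whale's position X(t), best is X*(t), x_rand is X_rand(t).
    """
    A = 2.0 * a * rng.random() - a                     # A = 2a.r - a
    C = 2.0 * rng.random()                             # C = 2.r
    D = [abs(C * b - xi) for b, xi in zip(best, x)]    # D = |C.X*(t) - X(t)|
    return [xr - A * d for xr, d in zip(x_rand, D)]    # X(t+1) = X_rand(t) - A.D
```

With this schedule the step size shrinks as a approaches zero, so late in the run the update returns positions close to the randomly chosen whale.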
the position update mechanism of the development population is as follows:
$$D' = |X^{*}(t) - X(t)|$$
Figure FDA0003697852750000021
in the above formula, p is a uniformly distributed random number between 0 and 1, b = 1, l is a random number between -1 and 1, and X*(t) is the position of the optimal whale;
the location update mechanism of the general population is as follows:
Figure FDA0003697852750000022
in the above formula, p1 and p2 are random numbers which satisfy uniform distribution between 0 and 1;
step S2-4: judging whether the whale optimization algorithm has fallen into a local optimal solution: for a swarm intelligence optimization algorithm, the current optimal solution is the best result obtained so far by running the algorithm; if the current optimal solution is not updated over successive iterations, the algorithm has temporarily failed to find a better solution and is therefore considered to have entered a local optimal state; the algorithm is judged to have entered the local optimal state when the following formula is satisfied:
$$X^{*}(t) = X^{*}(t+2)$$
in the above formula, t is the current iteration number of the algorithm; the formula indicates that if the whale optimization algorithm has not updated the current optimal solution after three iterations, the algorithm is judged to have entered a local optimal state;
step S2-5: when the whale optimization algorithm enters a local optimal state, calculating the fitness value of each individual in the population according to the fitness function, re-dividing the population into three sub-populations, and then performing dynamic evolution of the populations;
step S2-6: the exploration population expands outward from the current position of the population, so as to enlarge the search range and enhance the global exploration capability of the algorithm, wherein the population evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1] + 1$$
$$X(t) = X^{*}(t) \cdot r$$
step S2-7: the development population uses the current optimal solution to carry out a deep local search, so as to accelerate the convergence of the algorithm and improve the solution accuracy, wherein the population evolution is carried out through the following formulas:
$$r = \mathrm{rand}[0,1]$$
$$X(t) = X^{*}(t) \cdot r$$
step S2-8: the general population updates its position by using its reverse solution, and the population evolution is carried out through the following formulas:
Figure FDA0003697852750000031
Figure FDA0003697852750000032
in the above formula, lb is the lower bound of the problem solution space, ub is the upper bound of the problem solution space, and fit () is the fitness function;
step S2-9: judging whether the algorithm reaches the loop termination condition; when t = T, that is, when the algorithm reaches the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, returning to the step S2-2.
4. The method for predicting the water solubility of a compound in a dynamically evolved whale optimization algorithm according to claim 1 or 3, wherein in the step S3 the parameter optimization is performed on the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr of the LSTM neural network model by using the multi-population dynamically evolved whale optimization algorithm, and the specific process of determining the LSTM neural network model with the optimal parameter combination is as follows:
step S3-1: determining that the structure of the LSTM network model is a single-layer LSTM network, the structure comprising the input structure, the output structure, the number of iterations Max_epochs of the model, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr of the network; model training is performed using the training set of the compound water-solubility data;
step S3-2: initializing the multi-population dynamically evolved whale optimization algorithm: the position vector of the algorithm corresponds to the parameters of the LSTM network in the step S3-1, that is, the position vector comprises four dimensions, which correspond respectively to the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr of the network; the other algorithm parameters are initialized at the same time: the population size is N, the maximum number of iterations of the algorithm is T, and the current iteration number is t = 0;
step S3-3: calculating the fitness value of each individual in the initial population according to the fitness function, and finding the individual with the current optimal fitness value, wherein the fitness function is the root mean square error (RMSE) between the predicted and actual values of the LSTM neural network model, calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(y(t) - y'(t)\bigr)^{2}}$$
in the formula, y(t) is the actual value of the t-th compound water-solubility datum in the test set, y'(t) is the corresponding predicted value, and n is the total number of data points in the compound water-solubility test set;
step S3-4: dividing the population into three sub-populations with equal quantity according to the fitness value;
step S3-5: when the local optimal judgment formula of the step S2-4 is met, judging that the whale optimization algorithm enters a local optimal state;
step S3-6: when the whale optimization algorithm is trapped in a local optimal solution, carrying out dynamic evolution of the population;
step S3-7: judging whether the algorithm reaches the loop termination condition; when t = T, that is, when the algorithm reaches the maximum number of iterations, the algorithm ends and outputs the optimal solution; otherwise, the process returns to step S3-3 to continue the loop.
5. The method for predicting the water solubility of a compound by using a dynamic evolution whale optimization algorithm as claimed in claim 4, wherein the step S3-4 comprises the following steps:
step S3-4-1: for the exploration population, updating the position of the next generation by adopting an updating mechanism of the position of the exploration population in the step S2-3;
step S3-4-2: for the development population, updating the position of the next generation by adopting the position updating mechanism of the development population in the step S2-3;
step S3-4-3: for the general population, the location updating mechanism of the general population in step S2-3 is used to update the location of the next generation.
6. The method for predicting the water solubility of a compound in a dynamically evolved whale optimization algorithm as claimed in claim 4, wherein the step S3-6 comprises the following steps:
step S3-6-1: for the exploration population, performing dynamic evolution of the population according to the position updating mechanism of the step S2-6;
step S3-6-2: for the development population, performing dynamic evolution of the population according to the position updating mechanism of the step S2-7;
step S3-6-3: for the general population, dynamic evolution of the population is performed according to the location update mechanism of step S2-8.
7. The method for predicting the water solubility of a compound in a dynamically evolved whale optimization algorithm as claimed in claim 1 or 4, wherein the step S4 comprises the following steps:
step S4-1: when the optimal solution is output in step S3-7, the value on the position vector of the optimal solution is used as the optimal parameter of the LSTM network, and the LSTM network is constructed;
step S4-2: using the trained LSTM neural network model with the optimal parameter combination to predict on the test set of the compound water-solubility data set; and outputting the predicted values and applying inverse normalization to obtain the final prediction result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210679711.0A CN115062750A (en) 2022-06-16 2022-06-16 Compound water solubility prediction method of dynamic evolution whale optimization algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210679711.0A CN115062750A (en) 2022-06-16 2022-06-16 Compound water solubility prediction method of dynamic evolution whale optimization algorithm

Publications (1)

Publication Number Publication Date
CN115062750A true CN115062750A (en) 2022-09-16

Family

ID=83200524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210679711.0A Withdrawn CN115062750A (en) 2022-06-16 2022-06-16 Compound water solubility prediction method of dynamic evolution whale optimization algorithm

Country Status (1)

Country Link
CN (1) CN115062750A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220916