CN115062750A

CN115062750A - Compound water solubility prediction method of dynamic evolution whale optimization algorithm

Info

Publication number: CN115062750A
Application number: CN202210679711.0A
Authority: CN
Inventors: 张琛; 沈亚; 陈圣兵; 郭法滨; 张新; 程知
Original assignee: Hefei University
Current assignee: Hefei University
Priority date: 2022-06-16
Filing date: 2022-06-16
Publication date: 2022-09-16

Abstract

The invention provides a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm, which specifically comprises the following steps: step S1: selecting a compound water solubility data set as experimental data and dividing it into a training set and a test set; step S2: improving the whale optimization algorithm with a multi-population and population dynamic evolution strategy to strengthen its optimization capability; step S3: using the improved whale optimization algorithm to optimize the parameters of an LSTM neural network, and training the LSTM network with the better parameter structure; step S4: predicting the water solubility of the compound with the optimized LSTM neural network. The LSTM deep learning model trained by this method can accurately predict compound water solubility; the improvement of the traditional whale optimization algorithm raises its optimization precision and convergence efficiency; and applying deep learning together with swarm intelligence optimization to compound water solubility prediction provides a useful reference for research on predicting related compound properties.

Description

Compound water solubility prediction method of dynamic evolution whale optimization algorithm

Technical Field

The invention belongs to the technical field of compound water solubility prediction, and particularly relates to a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm.

Background

Water makes up most of the human body, so water solubility is very important in the development of narcotics and other drugs: it strongly affects their toxicity, in vivo efficacy, biological activity, pharmacokinetics and other properties. Water solubility is monitored at every stage of drug research and development, and accurate, efficient prediction of compound water solubility is key to reducing development costs and ensuring that development succeeds. Water solubility prediction also matters in selecting coating materials and in designing coatings and batteries. Determining and predicting the water solubility of compounds is therefore a complex and common problem that is attracting growing attention.

Predicting compound water solubility can effectively promote the development of the pharmaceutical industry. At present, however, water solubility is mainly predicted with conventional kinetic and thermodynamic methods, which are costly and cannot support large-scale compound screening.

Traditional machine learning methods can predict compound water solubility, but their accuracy is limited: when the data have few input features and few samples, model accuracy is low, and when the data volume is large enough, computational efficiency drops. Deep learning performs well on this problem. LSTM, one type of deep learning model, is widely used for data prediction and analysis and has been applied successfully to predicting air pollution, traffic flow and the like. However, an LSTM neural network has many parameters, and these strongly affect its prediction performance.

For example, the method disclosed in Chinese patent document CN112733997A, "Hydrological time series prediction optimization method based on WOA-LSTM-MC", suffers from low optimization precision, slow algorithm convergence and low computational efficiency.

Disclosure of Invention

The invention provides a compound water solubility prediction method based on a dynamically evolving whale optimization algorithm, aiming to overcome the following problems of the prior art: water solubility prediction of compounds is costly and cannot support large-scale compound screening; machine learning models have low accuracy when the data have few input features and few samples; and when the data volume is large enough, computational efficiency drops, which limits the accuracy of traditional machine learning.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

A compound water solubility prediction method based on a dynamically evolving whale optimization algorithm specifically comprises the following steps:

Step S1: collecting compound water solubility data to form a data set, and dividing the data set into a training set and a test set.

Step S2: improving the traditional whale optimization algorithm with the multi-population and population dynamic evolution strategy, to raise its optimization precision and convergence speed.

Step S3: using the multi-population dynamically evolving whale optimization algorithm to optimize the LSTM neural network's number of training epochs Max_epochs, batch size Batch_size, number of hidden-layer neurons Hidden_size and learning rate Lr, and determining the LSTM neural network model with the optimal parameter combination.

Step S4: performing compound water solubility prediction with the LSTM model obtained in step S3.

The improved whale optimization algorithm is used to optimize the LSTM neural network parameters, and the optimized LSTM network is applied to compound water solubility prediction, improving both the prediction accuracy and the efficiency of the neural network parameter search.

Preferably, after the compound water solubility data set is divided in step S1, the method further includes data preprocessing, noise data cleaning and data normalization.

Preferably, improving the whale optimization algorithm with the multi-population and population dynamic evolution strategy in step S2 comprises the following steps:

Step S2-1: initializing parameters: the population size is N and the dimension of each individual is M, where M is the dimension of the problem to be solved; the maximum number of iterations is T and the current iteration number is t = 0. An N-row, M-column matrix is initialized to represent the initial population: each row is one individual, and each individual is an M-dimensional vector representing one solution of the M-dimensional problem. The initial population matrix is:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$
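As an illustration of step S2-1, the sketch below builds the N x M population matrix as a list of rows. It is a minimal sketch under assumptions not stated in the text: each coordinate is drawn uniformly between hypothetical per-dimension bounds lb and ub, and the function name init_population is hypothetical.

```python
import random


def init_population(n, m, lb, ub, seed=None):
    """Build the N x M initial population matrix of step S2-1.

    Each row is one whale (an M-dimensional candidate solution).
    ASSUMPTION: every coordinate j is drawn uniformly from [lb[j], ub[j]];
    the text does not state the sampling distribution or the bounds.
    """
    if len(lb) != m or len(ub) != m:
        raise ValueError("lb and ub must both have length m")
    rng = random.Random(seed)
    return [[rng.uniform(lb[j], ub[j]) for j in range(m)] for _ in range(n)]


# Example: 30 whales in a 4-dimensional search space, iteration counter t = 0.
population = init_population(30, 4, lb=[0.0] * 4, ub=[1.0] * 4, seed=0)
t, T = 0, 100  # T = 100 is a hypothetical maximum iteration count
```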

Step S2-2: calculating the fitness value of each individual: the fitness value of each individual in the initial population is calculated with the fitness function f(x), and the individual with the current best fitness value is identified.

Step S2-3: multi-population whale optimization algorithm: the whale individuals are divided into three equally sized sub-populations according to their fitness values. The individuals with the worst fitness values form the exploration population, which enhances global exploration; the individuals with the best fitness values form the development population, which improves the algorithm's convergence speed, local search capability and solution precision; and the remaining individuals form the general population, which balances the algorithm's global exploration and local search capabilities.
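The three-way division can be sketched as a sort by fitness. This is a minimal sketch that assumes minimization (lower fitness is better, as with the RMSE fitness used later) and, as a further assumption, sends any leftover individuals to the general population when N is not divisible by 3; split_subpopulations is a hypothetical name.

```python
def split_subpopulations(population, fitness):
    """Split whales into exploration, development and general sub-populations.

    ASSUMPTIONS: lower fitness is better (minimization), and when N is not a
    multiple of 3 the leftover individuals join the general population.
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i])  # best first
    k = len(population) // 3
    development = [population[i] for i in order[:k]]                # best third
    exploration = [population[i] for i in order[len(order) - k:]]   # worst third
    general = [population[i] for i in order[k:len(order) - k]]      # the rest
    return exploration, development, general


pop = [[i] for i in range(9)]
fit = [5, 1, 7, 3, 9, 2, 8, 4, 6]
exploration, development, general = split_subpopulations(pop, fit)
# development holds the three lowest-fitness whales: [[1], [5], [3]]
```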

The position update mechanism of the exploration population is as follows:

$$D = |C \cdot X^*(t) - X(t)|$$

$$A = 2a \cdot r - a$$

$$C = 2 \cdot r$$

$$X(t+1) = X_{rand}(t) - A \cdot D$$

In the above formulas, X_rand is a whale selected at random from the population, r is a random number between 0 and 1, and a decreases linearly from 2 to 0 as the iterations proceed.
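The exploration update above can be sketched as follows. This is an illustration, not the patent's implementation, and it rests on stated assumptions: A and C use two independent random draws (as in the standard WOA), a is computed as 2 - 2t/T, and one scalar A and C is applied to every dimension. It keeps X* inside D exactly as the text writes it, whereas the standard WOA uses X_rand there.

```python
import random


def exploration_update(x, x_best, population, t, T, rng=random):
    """Move one exploration-population whale, following the formulas as written:

        D = |C * X*(t) - X(t)|,  A = 2a*r - a,  C = 2*r,
        X(t+1) = X_rand(t) - A * D

    ASSUMPTIONS: independent random draws for A and C, a = 2 - 2t/T, and the
    same scalar A and C for every dimension.
    """
    a = 2.0 - 2.0 * t / T               # decreases linearly from 2 to 0
    A = 2.0 * a * rng.random() - a      # A lies between -a and a
    C = 2.0 * rng.random()              # C lies between 0 and 2
    x_rand = rng.choice(population)     # a randomly selected whale
    D = [abs(C * xb - xi) for xb, xi in zip(x_best, x)]
    return [xr - A * d for xr, d in zip(x_rand, D)]
```

At t = T the coefficient A is 0, so the whale lands exactly on X_rand; early in the run |A| can approach 2, so the move can overshoot X_rand, which is what drives exploration.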

The position update mechanism of the development population is as follows:

$$D' = |X^*(t) - X(t)|$$

In the above formula, p is a uniformly distributed random number between 0 and 1, b = 1, l is a random number between -1 and 1, and X^*(t) is the position of the optimal whale.
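The development-population equation itself is missing here; only D' and the symbols p, b, l and X* survive. The sketch below therefore assumes the standard WOA bubble-net rule, which uses exactly these symbols: with probability 0.5 a shrinking-encircling move toward X*, otherwise a logarithmic spiral around X*. Treat it as an assumed reconstruction, not the patent's stated formula; development_update is a hypothetical name.

```python
import math
import random


def development_update(x, x_best, t, T, b=1.0, rng=random):
    """Move one development-population whale.

    ASSUMPTION: the update equation is missing from the text. This sketch uses
    the standard WOA bubble-net rule, which matches the listed symbols:
      p <  0.5: X(t+1) = X*(t) - A * |C * X*(t) - X(t)|       (encircling)
      p >= 0.5: X(t+1) = D' * exp(b*l) * cos(2*pi*l) + X*(t)    (spiral)
    with D' = |X*(t) - X(t)| taken per dimension.
    """
    p = rng.random()                       # uniform in [0, 1)
    if p < 0.5:
        a = 2.0 - 2.0 * t / T              # decreases linearly from 2 to 0
        A = 2.0 * a * rng.random() - a
        C = 2.0 * rng.random()
        return [xb - A * abs(C * xb - xi) for xb, xi in zip(x_best, x)]
    l = rng.uniform(-1.0, 1.0)             # l in [-1, 1]
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(xb - xi) * factor + xb for xb, xi in zip(x_best, x)]
```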

The position update mechanism of the general population is as follows:

In the above formula, p1 and p2 are random numbers uniformly distributed between 0 and 1.

Step S2-4: judging whether the whale optimization algorithm has fallen into a local optimum. For a swarm intelligence optimization algorithm, the current optimal solution is the best result the algorithm has obtained so far. If it is not updated in an iteration, the algorithm has not found a better solution in that iteration; when this persists, the algorithm is considered to have entered a local optimum. The algorithm is judged to have entered a local optimum when the following formula is satisfied:

$$X^*(t) = X^*(t+2)$$

In the above formula, t is the current iteration number. The formula means that if the whale optimization algorithm has not updated its current optimal solution over three consecutive iterations, it is judged to have entered a local optimum.
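The local-optimum test X*(t) = X*(t+2) can be checked against a history of best solutions. This minimal sketch assumes elitist bookkeeping, i.e. the recorded best is only ever replaced by a strictly better one, so equal values at t and t+2 imply no update at t+1 either. The name is_trapped is hypothetical.

```python
def is_trapped(best_history):
    """Local-optimum test of step S2-4: X*(t) == X*(t+2).

    best_history holds the best position recorded after each iteration,
    oldest first. ASSUMPTION: the best is only replaced by a strictly better
    solution, so equal entries at t and t+2 mean three iterations without an
    update. Comparing best fitness values instead would work the same way.
    """
    if len(best_history) < 3:
        return False
    return best_history[-3] == best_history[-1]
```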

Step S2-5: when the whale optimization algorithm enters a local optimum, the fitness value of each individual in the population is recalculated with the fitness function, the population is re-divided into three sub-populations, and dynamic evolution of the populations is performed.

Step S2-6: the exploration population expands outward from its current position to widen its search range and strengthen the algorithm's global exploration capability; it evolves according to the following formulas:

$$r = \mathrm{rand}[0,1] + 1$$

$$X(t) = X^*(t) \cdot r$$

Step S2-7: the development population performs a deep local search around the current optimal solution, which accelerates convergence and improves solution precision; it evolves according to the following formulas:

$$r = \mathrm{rand}[0,1]$$

$$X(t) = X^*(t) \cdot r$$
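The two evolution moves of steps S2-6 and S2-7 differ only in the range of r. A minimal sketch, assuming one scalar r per whale (a separate r per dimension is an equally plausible reading of the formulas); the function names are hypothetical.

```python
import random


def evolve_exploration(x_best, rng=random):
    """Step S2-6: X(t) = X*(t) * r with r = rand[0, 1] + 1, so r is in [1, 2)."""
    r = rng.random() + 1.0
    return [xb * r for xb in x_best]


def evolve_development(x_best, rng=random):
    """Step S2-7: X(t) = X*(t) * r with r = rand[0, 1]."""
    r = rng.random()
    return [xb * r for xb in x_best]
```

Both moves rescale X* relative to the origin of the search space, so their effect depends on where the optimum lies relative to 0; clipping the results back into [lb, ub] afterwards is a reasonable safeguard, though the text does not mention it.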

Step S2-8: the general population updates its positions using their opposite (reverse) solutions, and evolves according to the following formula:

In the above formula, lb is the lower bound of the problem's solution space, ub is the upper bound of the solution space, and fit() is the fitness function.
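The step S2-8 equation is missing, but its symbols (lb, ub, fit()) match standard opposition-based learning, where the opposite of x is lb + ub - x and the better of the two is kept. The sketch below assumes that form; opposite_update and the toy sphere fitness are hypothetical.

```python
def opposite_update(x, lb, ub, fit):
    """ASSUMED form of step S2-8 (its equation is missing from the text):
    opposition-based learning. The opposite of x is lb + ub - x per
    dimension; it replaces x only if its fitness is lower (minimization)."""
    x_opp = [lo + hi - xi for lo, hi, xi in zip(lb, ub, x)]
    return x_opp if fit(x_opp) < fit(x) else x


def sphere(v):
    """Toy fitness function for the example: sum of squares."""
    return sum(vi * vi for vi in v)


print(opposite_update([9, 8], lb=[0, 0], ub=[10, 10], fit=sphere))  # [1, 2]
```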

Step S2-9: judging whether the loop termination condition is met: when t = T, i.e., the algorithm has reached the maximum number of iterations, the algorithm terminates and outputs the optimal solution; otherwise, return to step S2-2.

Preferably, in step S3, the multi-population dynamically evolving whale optimization algorithm optimizes the LSTM neural network's number of training epochs Max_epochs, batch size Batch_size, number of hidden-layer neurons Hidden_size and learning rate Lr, and the LSTM neural network model with the optimal parameter combination is determined as follows:

Step S3-1: the LSTM network model is set to a single-layer LSTM network whose configuration comprises the input and output structure, the number of training epochs Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr; the model is trained on the training set of the compound water solubility data.

Step S3-2: initializing the population dynamic evolution whale optimization algorithm: the position vector of the multi-population dynamically evolving whale optimization algorithm corresponds to the LSTM parameters of step S3-1, i.e., it has four dimensions, corresponding to Max_epochs, Batch_size, Hidden_size and Lr respectively. The other algorithm parameters are initialized at the same time: population size N, maximum number of iterations T, current iteration number t = 0, and each whale individual is X = (X_1, X_2, X_3, X_4).
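Because Max_epochs, Batch_size and Hidden_size are integers while whale positions are real-valued, each whale must be decoded before an LSTM can be built. A minimal sketch: the search ranges in BOUNDS are hypothetical (the text gives none), and clipping plus rounding is an assumed decoding rule.

```python
# Hypothetical search ranges; the text does not give bounds.
BOUNDS = {
    "max_epochs": (10, 200),
    "batch_size": (8, 128),
    "hidden_size": (8, 256),
    "lr": (1e-4, 1e-2),
}
NAMES = ["max_epochs", "batch_size", "hidden_size", "lr"]


def decode_whale(x):
    """Map a whale X = (X1, X2, X3, X4) to (Max_epochs, Batch_size,
    Hidden_size, Lr). Each value is clipped to its bound; the first three
    are rounded to integers, the learning rate stays real-valued."""
    params = {}
    for name, value in zip(NAMES, x):
        lo, hi = BOUNDS[name]
        value = min(max(value, lo), hi)
        params[name] = value if name == "lr" else int(round(value))
    return params
```

For example, `decode_whale([50.4, 31.6, 300.0, 0.05])` gives `{'max_epochs': 50, 'batch_size': 32, 'hidden_size': 256, 'lr': 0.01}`, with the last two values clipped to their upper bounds.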

Step S3-3: calculating the fitness value of each individual in the initial population with the fitness function and finding the individual with the current best fitness value. The fitness function is the root mean square error (RMSE) between the predicted and actual values of the LSTM neural network model:

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y(t) - y'(t)\right)^2}$$

In the above formula, y(t) is the actual value of the t-th compound water solubility record in the test set, y'(t) is the corresponding predicted value, and n is the total number of records in the compound water solubility test set.
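The RMSE fitness can be written directly from the formula. In the full method each whale's fitness would come from training an LSTM with the decoded parameters and scoring its predictions; that training step is not shown here, only the metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual values y(t) and predictions y'(t)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)
```

For example, `rmse([-2.0, -1.0], [-2.5, -0.5])` gives 0.5: both errors are 0.5 in magnitude, so the mean squared error is 0.25. Lower is better, so the whale optimization runs as a minimization.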

Step S3-4: dividing the population into three equally sized sub-populations according to fitness value.

Step S3-5: when the local-optimum criterion of step S2-4 is satisfied, the whale optimization algorithm is judged to have entered a local optimum.

Step S3-6: when the whale optimization algorithm is trapped in a local optimum, dynamic evolution of the populations is performed.

Step S3-7: judging whether the loop termination condition is met: when t = T, i.e., the algorithm has reached the maximum number of iterations, the algorithm terminates and outputs the optimal solution; otherwise, return to step S3-3 and continue the loop.

Preferably, step S3-4 includes the following steps:

Step S3-4-1: the exploration population updates its next-generation positions with the exploration-population update mechanism of step S2-3.

Step S3-4-2: the development population updates its next-generation positions with the development-population update mechanism of step S2-3.

Step S3-4-3: the general population updates its next-generation positions with the general-population update mechanism of step S2-3.

Preferably, step S3-6 includes the following steps:

Step S3-6-1: the exploration population evolves dynamically according to the position update mechanism of step S2-6.

Step S3-6-2: the development population evolves dynamically according to the position update mechanism of step S2-7.

Step S3-6-3: the general population evolves dynamically according to the position update mechanism of step S2-8.

Preferably, step S4 includes the following steps:

Step S4-1: after the optimal solution is output in step S3-7, the values of its position vector are used as the optimal LSTM parameters to construct the LSTM network.

Step S4-2: the trained LSTM neural network model with the optimal parameter combination makes predictions on the test set of the compound water solubility data set; the predicted values are output and inverse-normalized to obtain the final prediction results.

Therefore, the beneficial effects of the invention are as follows:

1. The traditional whale optimization algorithm is improved, raising its optimization precision and convergence efficiency; the improved algorithm is used to optimize the LSTM neural network parameters, and the optimized LSTM network is applied to compound water solubility prediction, improving both the prediction accuracy and the efficiency of the neural network parameter search;

2. Deep learning and swarm intelligence optimization are applied to compound water solubility prediction, providing a useful reference for research on predicting related compound properties.

Drawings

FIG. 1 is an overall workflow diagram of the present invention;

FIG. 2 is a flow chart of a whale optimization algorithm based on population dynamic evolution in the invention;

FIG. 3 is a flow chart of the work of optimizing LSTM parameters based on the population dynamic evolution whale optimization algorithm in the invention;

FIG. 4 is a graph of fitness convergence of a whale optimization algorithm based on population dynamic evolution and other optimization algorithms in the invention;

FIG. 5 is a schematic diagram of a device for predicting water solubility of a compound based on a population dynamic evolution whale optimization algorithm in the invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Example one

In this embodiment, the publicly available compound water solubility data set AqSolDB is used to train and test the model; the goal is to predict the water solubility of a compound by analyzing its molecular information.

As shown in FIG. 1, the compound water solubility data set is first preprocessed and its features are extracted; the whale optimization algorithm is then improved with a multi-population dynamic evolution strategy, raising its optimization precision and convergence speed; the improved whale algorithm is used to optimize the LSTM and build the deep learning model; finally, the constructed deep learning model predicts compound water solubility. The method specifically comprises the following steps:

Step S1: dividing the compound water solubility data set into a training set and a test set, and performing data preprocessing and feature extraction.

Step S1-1: the compound water solubility data set is split into 70% for the training set, used for model training, and 30% for the test set, used for model prediction.
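The 70/30 split of step S1-1 can be sketched as below. Shuffling before the split and the fixed seed are assumptions; the text states only the ratio. train_test_split here is a hypothetical stdlib helper, not a library call.

```python
import random


def train_test_split(samples, train_ratio=0.7, seed=42):
    """Shuffle samples and split them into training and test sets.

    ASSUMPTIONS: random shuffling with a fixed seed; the text specifies only
    the 70% / 30% ratio.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(round(train_ratio * len(items)))
    return items[:cut], items[cut:]


train, test = train_test_split(range(10))  # 7 training items, 3 test items
```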

Step S1-2: normalizing the divided training set and test set with Min-Max normalization:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

In the above formula, x is an original feature of the compound data, x' is the normalized feature, and x_min and x_max are the minimum and maximum of that feature. Normalization removes the units of the data and converts them to dimensionless values between 0 and 1, which helps improve the model's convergence speed and prediction accuracy.
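Min-Max normalization and the inverse normalization used in step S4 can be sketched as a pair of functions. Taking x_min and x_max from the training set only, and mapping a constant feature to 0, are assumptions not stated in the text; with training statistics, some test values may fall slightly outside [0, 1].

```python
def fit_minmax(values):
    """Return (x_min, x_max), assumed to be computed on the training data."""
    return min(values), max(values)


def minmax_normalize(values, vmin, vmax):
    """x' = (x - x_min) / (x_max - x_min)."""
    span = vmax - vmin
    if span == 0:
        return [0.0 for _ in values]  # constant feature: map to 0 (assumption)
    return [(v - vmin) / span for v in values]


def minmax_inverse(scaled, vmin, vmax):
    """Inverse normalization for predictions: x = x' * (x_max - x_min) + x_min."""
    return [s * (vmax - vmin) + vmin for s in scaled]


vmin, vmax = fit_minmax([2.0, 4.0, 6.0])
scaled = minmax_normalize([2.0, 4.0, 6.0], vmin, vmax)  # [0.0, 0.5, 1.0]
```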

Step S2: the traditional whale optimization algorithm is improved with a multi-population and population dynamic evolution strategy, raising its optimization precision and convergence speed.

Step S3: the multi-population dynamic evolution whale optimization algorithm (MDEWOA) optimizes the LSTM neural network model's number of training epochs Max_epochs, batch size Batch_size, number of hidden-layer neurons Hidden_size and learning rate Lr, determining the LSTM neural network model with the optimal parameter combination.

Step S4: the LSTM model obtained in step S3 is trained on the training set of the compound water solubility data set, and the trained model is then used to obtain predictions on the test set.

In the above embodiment, the improvement of the whale optimization algorithm based on population dynamic evolution is shown in FIG. 2; step S2 specifically includes the following steps:

Step S2-1: parameter initialization

The population size is N and the dimension of each individual is M, i.e., the dimension of the problem to be solved; the maximum number of iterations is T and the current iteration number is t = 0. An N-row, M-column matrix is initialized to represent the initial population: each row is one individual, and each individual is an M-dimensional vector representing one solution of the M-dimensional problem.

Step S2-2: calculating the fitness value of each individual

The fitness value of each individual in the initial population is calculated with the fitness function f(x), and the individual with the current best fitness value is identified.

Step S2-3: multi-population whale optimization algorithm

The whale individuals are divided into three equally sized sub-populations according to their fitness values. The individuals with the worst fitness values form the exploration population, which enhances global exploration; the individuals with the best fitness values form the development population, which improves the algorithm's convergence speed, local search capability and solution precision; and the remaining individuals form the general population, which balances the algorithm's global exploration and local search capabilities.

The position update mechanism of the exploration population is as follows:

$$D = |C \cdot X^*(t) - X(t)|$$

$$A = 2a \cdot r - a$$

$$C = 2 \cdot r$$

$$X(t+1) = X_{rand}(t) - A \cdot D$$

The position update mechanism of the development population is as follows:

$$D' = |X^*(t) - X(t)|$$

The position update mechanism of the general population is as follows:

Step S2-4: determining whether the whale optimization algorithm is trapped in a local optimum

For a swarm intelligence optimization algorithm, the current optimal solution is the best result the algorithm has obtained so far. If it is not updated in an iteration, the algorithm has not found a better solution in that iteration; when this persists, the algorithm is considered to be trapped in a local optimum. The algorithm is judged to be trapped in a local optimum when the following formula is satisfied:

$$X^*(t) = X^*(t+2)$$

where t is the current iteration number. The formula means that the whale optimization algorithm has not updated its current optimal solution over three consecutive iterations, so the algorithm is judged to be trapped in a local optimum.

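Exploration population evolution (step S2-6):

$$r = \mathrm{rand}[0,1] + 1$$

$$X(t) = X^*(t) \cdot r$$

Development population evolution (step S2-7):

$$r = \mathrm{rand}[0,1]$$

$$X(t) = X^*(t) \cdot r$$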

FIG. 3 shows the process of optimizing the LSTM neural network parameters with the population dynamic evolution whale optimization algorithm in this example.

Step S3 includes the following steps:

Step S3-1: the LSTM network model is set to a single-layer LSTM network whose configuration comprises the input structure, the output structure, the number of training epochs Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size and the learning rate Lr; the model is trained on the training set of the compound water solubility data.

Step S3-2: initializing the population dynamic evolution whale optimization algorithm: the position vector of the multi-population dynamically evolving whale optimization algorithm corresponds to the LSTM parameters of step S3-1, i.e., it has four dimensions, corresponding to the LSTM model's Max_epochs, Batch_size, Hidden_size and Lr respectively. The other algorithm parameters are initialized at the same time: population size N, maximum number of iterations T, current iteration number t = 0, and each whale individual is X = (X_1, X_2, X_3, X_4).

Step S3-4: dividing the population into three equally sized sub-populations according to fitness value, specifically including the following steps:

Step S3-4-1: the exploration population updates its next-generation positions with the exploration-population update mechanism of step S2-3.

Step S3-6: when the whale optimization algorithm is trapped in a local optimum, dynamic evolution of the populations is performed, specifically including the following steps:

Step S4 specifically includes the following steps:

Step S4.1: after the optimal solution is output in step S3, the values of its position vector are used as the optimal LSTM parameters to construct the LSTM network.

Step S4.2: the LSTM neural network model with the optimal parameter combination is trained on the training set of the compound water solubility data set and used to predict the test set; the predicted values are output and inverse-normalized to obtain the final prediction results.

The invention compares particle swarm optimization, the salp swarm algorithm and the standard whale optimization algorithm with the whale optimization algorithm based on multi-population dynamic evolution by analyzing their fitness convergence curves on the benchmark test function Schwefel 2.26; the fitness convergence curves of the four algorithms are shown in FIG. 4.
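For reference, the Schwefel 2.26 benchmark used in this comparison is commonly defined as below. The text does not state the dimension or search range used, so the range mentioned in the docstring is the usual convention, not the patent's setting.

```python
import math


def schwefel_226(x):
    """Schwefel 2.26 benchmark: f(x) = -sum(x_i * sin(sqrt(|x_i|))).

    Conventionally searched over x_i in [-500, 500]; the global minimum is
    about -418.9829 * n at x_i = 420.9687. Some papers add 418.9829 * n so
    that the minimum becomes 0.
    """
    return -sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)
```

Its many deep local minima, with the global minimum far from the origin near the edge of the search range, make it a common test of whether an optimizer escapes local optima.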

Analysis of the four convergence curves shows that improving the whale optimization algorithm with the multi-population and population dynamic evolution strategy gives it stronger global exploration and local search capabilities and faster convergence; the whale optimization algorithm based on multi-population dynamic evolution in this embodiment therefore clearly outperforms the other algorithms in convergence speed and optimization precision.

When the method is applied to compound water solubility prediction, Table 1 compares the model of this example with LSTM neural network models optimized by other swarm intelligence optimization algorithms; the performance of each model is evaluated by its RMSE on the training set and the test set.

TABLE 1

Table 1 shows that, for compound water solubility prediction, the proposed method achieves higher prediction accuracy on both the training set and the test set. The whale optimization algorithm based on multi-population dynamic evolution has strong global exploration and local search capabilities and can effectively optimize the LSTM neural network model, giving it a better parameter combination and higher prediction precision, and therefore a better compound water solubility prediction result.

As shown in FIG. 5, according to the above embodiment, the invention provides a compound water solubility prediction apparatus based on a swarm intelligence algorithm. The apparatus comprises a data acquisition module, a data preprocessing module, a data modeling module and a model prediction module:

the data acquisition module: for acquiring the structure of the compound to be predicted;

the data preprocessing module: for normalizing the acquired compound data;

the data modeling module: for optimizing the LSTM parameters with the whale optimization algorithm and constructing the compound water solubility prediction model;

the model prediction module: for predicting the water solubility of the compound with the constructed prediction model.

The structure, features and effects of the invention have been described in detail with reference to the embodiments shown in the drawings, but the above embodiments are merely preferred embodiments of the invention. Those skilled in the art can reasonably combine the technical features of the above embodiments and their preferred modes into various equivalent schemes without departing from or changing the design idea and technical effects of the invention. Therefore, the invention is not limited to the embodiments shown in the drawings; all modifications and equivalent embodiments made according to the idea of the invention fall within its scope, as long as they do not go beyond the spirit covered by the description and drawings.

Claims

1. A compound water solubility prediction method based on a dynamically evolving whale optimization algorithm, characterized by comprising the following steps:

step S1: collecting compound water solubility data to form a data set, and dividing the data set into a training set and a test set;

step S2: improving the traditional whale optimization algorithm with a multi-population and population dynamic evolution strategy, to raise its optimization precision and convergence speed;

step S3: using the multi-population dynamically evolving whale optimization algorithm to optimize the LSTM neural network model's number of training epochs Max_epochs, batch size Batch_size, number of hidden-layer neurons Hidden_size and learning rate Lr, and determining the LSTM neural network model with the optimal parameter combination;

2. The compound water solubility prediction method based on a dynamically evolving whale optimization algorithm according to claim 1, wherein step S1 further comprises data preprocessing, noise data cleaning and data normalization after the compound water solubility data set is divided.

3. The compound water solubility prediction method based on a dynamically evolving whale optimization algorithm according to claim 1 or 2, wherein improving the whale optimization algorithm with the multi-population and population dynamic evolution strategy in step S2 comprises the following steps:

step S2-1: initializing parameters: the population size is N and the dimension of each individual is M, where M is the dimension of the problem to be solved; the maximum number of iterations is T and the current iteration number is t = 0; an N-row, M-column matrix is initialized to represent the initial population, each row being one individual and each individual being an M-dimensional vector that represents one solution of the M-dimensional problem; the initial population matrix is:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$

step S2-2: calculating the fitness value of each individual: the fitness value of each individual in the initial population is calculated with the fitness function f(x), and the individual with the current best fitness value is identified;

step S2-3: multi-population whale optimization algorithm: divide the whale individuals into three sub-populations of equal size according to their fitness values; the individuals with the worst fitness values form the exploration population, which strengthens global exploration; the individuals with the best fitness values form the development population, which accelerates convergence, strengthens local search, and improves the solution accuracy of the algorithm; the remaining individuals form the general population, which balances the global exploration and local search capabilities of the algorithm;
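A minimal sketch of the three-way split, assuming minimization (lower fitness is better) and sub-populations of size N // 3; when N is not divisible by 3, this sketch puts the remainder into the general population, a choice the claim does not specify.

```python
def split_subpopulations(fitness):
    """Split individual indices into (exploration, development, general) groups.

    Assumptions: lower fitness is better; each group gets N // 3 individuals,
    and any remainder joins the general population.
    """
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])  # best first
    k = len(fitness) // 3
    development = order[:k]               # best-fitness individuals
    exploration = order[len(order) - k:]  # worst-fitness individuals
    general = order[k:len(order) - k]     # everything in between
    return exploration, development, general
```

Slicing with `len(order) - k` rather than `-k` keeps the exploration group empty when k is 0.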

the position update mechanism of the exploration population is as follows:

$$D = \lvert C \cdot X^*(t) - X(t) \rvert$$

$$A = 2a \cdot r - a$$

$$C = 2 \cdot r$$

$$X(t+1) = X_{rand}(t) - A \cdot D$$

in the above formulas, X_rand(t) is the position of a whale randomly selected from the whale population, r is a random number between 0 and 1, and a decreases linearly from 2 to 0 as the number of iterations increases;
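The exploration update can be sketched per dimension as follows, using the formulas exactly as written (D is computed from X*(t), and the step is taken from X_rand(t)). Assumptions: r is drawn independently for A and for C and for each dimension, as in the original WOA, since the text does not say whether they share one r; the function names are hypothetical.

```python
import random

def linear_a(t, t_max):
    """Coefficient a, decreasing linearly from 2 (t = 0) to 0 (t = t_max)."""
    return 2.0 - 2.0 * t / t_max

def explore_update(x, x_best, population, a, rng):
    """One exploration-population step:
    D = |C * X*(t) - X(t)|,  X(t+1) = X_rand(t) - A * D, applied per dimension.
    """
    x_rand = rng.choice(population)  # a randomly selected whale
    new_x = []
    for j in range(len(x)):
        A = 2.0 * a * rng.random() - a  # A in [-a, a)
        C = 2.0 * rng.random()          # C in [0, 2)
        D = abs(C * x_best[j] - x[j])
        new_x.append(x_rand[j] - A * D)
    return new_x
```

When a reaches 0, A is 0 and the new position collapses onto X_rand(t); early in the run, |A| can be large, so whales jump far from the random reference whale, which is what drives global exploration.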

the position update mechanism of the development population is as follows:

$$D' = \lvert X^*(t) - X(t) \rvert$$

in the above formula, p is a uniformly distributed random number between 0 and 1, b = 1, l is a random number between -1 and 1, and X^*(t) is the position of the optimal whale;
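The full development-population update equation is not reproduced in the text, only D' and the symbols p, b, and l. The sketch below assumes it is the standard WOA rule that uses exactly these symbols: with p < 0.5 a shrinking-encircling step, otherwise a logarithmic spiral around X*(t). Treat it as an illustration of that standard rule, not as the patent's confirmed formula.

```python
import math
import random

def develop_update(x, x_best, a, rng, b=1.0):
    """One development-population step (assumed standard WOA form).

    p < 0.5  -> encircling: X(t+1) = X* - A * |C * X* - X|
    p >= 0.5 -> spiral:     X(t+1) = D' * exp(b * l) * cos(2 * pi * l) + X*,
                            with D' = |X* - X| and l uniform in [-1, 1]
    """
    p = rng.random()
    new_x = []
    if p < 0.5:
        for j in range(len(x)):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            new_x.append(x_best[j] - A * abs(C * x_best[j] - x[j]))
    else:
        l = rng.uniform(-1.0, 1.0)
        factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
        for j in range(len(x)):
            d_prime = abs(x_best[j] - x[j])
            new_x.append(d_prime * factor + x_best[j])
    return new_x
```

Both branches are anchored on X*(t), which is why this group concentrates search around the best solution found so far.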

the position update mechanism of the general population is as follows:

in the above formula, p1 and p2 are uniformly distributed random numbers between 0 and 1;

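step S2-4: judging whether the algorithm has entered a local optimum according to the following formula:

$$X^*(t) = X^*(t+2)$$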

in the above formula, t is the current iteration number of the algorithm; the formula indicates that if the whale optimization algorithm has not updated its current optimal solution over three consecutive iterations, the algorithm is judged to have entered a local optimum;
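The stagnation test X*(t) = X*(t+2) can be sketched by keeping a history of the best position recorded at each iteration. This assumes exact equality of positions, as the formula states; a real implementation might prefer a tolerance. The function name is hypothetical.

```python
def is_stagnant(best_positions, window=3):
    """Return True if the best position is unchanged across the last `window`
    iterations, i.e. X*(t) == X*(t+2) for the default window of 3.

    best_positions: best position (list of floats) recorded at each iteration,
    oldest first. Because the global best is only replaced when it improves,
    equal endpoints imply it was unchanged throughout the window.
    """
    if len(best_positions) < window:
        return False
    return best_positions[-window] == best_positions[-1]
```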

step S2-5: when the whale optimization algorithm enters a local optimum, recompute the fitness value of each individual with the fitness function, divide the individuals into three sub-populations again, and then perform dynamic evolution on each sub-population;

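step S2-6: the exploration population performs dynamic evolution by the following formulas:

$$r = \mathrm{rand}[0, 1] + 1$$

$$X(t) = X^*(t) \cdot r$$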
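A minimal sketch of the exploration-population evolution X(t) = X*(t) · r with r = rand[0, 1] + 1. Assumption: one scalar r is drawn per individual (the formula does not say whether r varies per dimension). The function name is hypothetical.

```python
import random

def evolve_exploration(n_individuals, x_best, rng):
    """Regenerate the exploration population by scaling the best whale:
    X(t) = X*(t) * r, r = rand[0, 1] + 1, so each factor lies in [1, 2).
    Assumption: one scalar r per individual.
    """
    new_pop = []
    for _ in range(n_individuals):
        r = rng.random() + 1.0
        new_pop.append([v * r for v in x_best])
    return new_pop
```

Since each coordinate of X*(t) is multiplied by a factor of at least 1, the new whales lie on the ray from the origin through X*(t), at or beyond it; clipping to the search bounds may be needed in practice.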

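step S2-7: the development population performs dynamic evolution by the following formulas:

$$r = \mathrm{rand}[0, 1]$$

$$X(t) = X^*(t) \cdot r$$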
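The development-population evolution differs from the exploration case only in the range of r. A minimal sketch of X(t) = X*(t) · r with r = rand[0, 1], again assuming one scalar r per individual; the function name is hypothetical.

```python
import random

def evolve_development(n_individuals, x_best, rng):
    """Regenerate the development population: X(t) = X*(t) * r, r = rand[0, 1].
    Assumption: one scalar r in [0, 1) is drawn per individual.
    """
    new_pop = []
    for _ in range(n_individuals):
        r = rng.random()
        new_pop.append([v * r for v in x_best])
    return new_pop
```

Here the new whales lie between the origin and X*(t), so this group is scattered along that segment while the exploration group is pushed outward beyond X*(t).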

step S2-8: the general population updates its positions using its reverse (opposition) solutions, and the population evolves according to the following formula:

in the above formula, lb is the lower bound of the problem's solution space, ub is the upper bound of the solution space, and fit() is the fitness function;
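The reverse-solution formula itself is not reproduced in the text; only lb, ub, and fit() are defined. The sketch below assumes the common opposition-based learning form X_opp = lb + ub - X, with the opposite solution kept only when fit() rates it better (minimization assumed). This is an assumed stand-in, not the patent's confirmed formula.

```python
def opposition_step(population, lb, ub, fit):
    """Replace each individual by its opposite solution when that is fitter.

    Assumption: standard opposition-based learning, X_opp = lb + ub - X per
    dimension, with greedy selection under minimization of fit().
    """
    new_pop = []
    for x in population:
        x_opp = [lb[j] + ub[j] - x[j] for j in range(len(x))]
        new_pop.append(x_opp if fit(x_opp) < fit(x) else x)
    return new_pop
```

The opposite point mirrors X through the center of the search box, which gives a cheap, deterministic jump to an unexplored region.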

4. The compound water solubility prediction method based on the dynamically evolved whale optimization algorithm as claimed in claim 1 or 3, wherein, in step S3, the multi-population dynamically evolved whale optimization algorithm is used to optimize the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr of the LSTM neural network model, and the LSTM neural network model with the optimal parameter combination is determined by the following process:

step S3-1: the LSTM network model is a single-layer LSTM network with an input layer and an output layer, and its parameters to be optimized are the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr; the model is trained on the training set of the compound water solubility data;

step S3-2: initializing the multi-population dynamically evolved whale optimization algorithm: the position vector of the algorithm corresponds to the LSTM network parameters of step S3-1, i.e. it has four dimensions, corresponding respectively to the number of iterations Max_epochs, the batch size Batch_size, the number of hidden-layer neurons Hidden_size, and the learning rate Lr; the other algorithm parameters initialized at the same time are the population size N, the maximum number of iterations T, and the current iteration counter t = 0;
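Because each whale position is a four-dimensional real vector, it must be decoded into valid LSTM hyperparameters before training. A minimal sketch, assuming hypothetical search bounds (none are given in the source), clipping to those bounds, and rounding of the three integer-valued parameters.

```python
# Hypothetical search bounds; the source does not specify them.
BOUNDS = {
    "Max_epochs": (10, 200),
    "Batch_size": (16, 256),
    "Hidden_size": (8, 256),
    "Lr": (1e-4, 1e-2),
}
INTEGER_PARAMS = {"Max_epochs", "Batch_size", "Hidden_size"}

def decode_position(position):
    """Map a 4-dimensional whale position to an LSTM hyperparameter dict.

    Each coordinate is clipped to its (assumed) range; integer parameters
    are rounded to the nearest integer.
    """
    if len(position) != len(BOUNDS):
        raise ValueError("position must have one entry per hyperparameter")
    params = {}
    for value, (name, (low, high)) in zip(position, BOUNDS.items()):
        value = min(max(value, low), high)  # keep inside the search range
        params[name] = int(round(value)) if name in INTEGER_PARAMS else float(value)
    return params
```

The dictionary order (Python 3.7 and later) fixes the dimension order Max_epochs, Batch_size, Hidden_size, Lr, matching the order listed in step S3-2.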

in the formula, y(t) denotes the true water solubility value of the t-th compound in the test set, y'(t) denotes the predicted water solubility value of the t-th compound, and n is the total number of data in the compound water solubility test set;
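The fitness formula itself is not reproduced; only y(t), y'(t), and n are defined. A common choice consistent with these symbols is the root-mean-square error over the n test compounds, which the sketch below assumes (mean squared error would be an equally plausible reading).

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted solubility values.

    Assumption: RMSE is used as the fitness; the source formula is not shown.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

Under this choice a lower fitness means a better LSTM, so the whale optimization algorithm minimizes it.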

step S3-4: dividing the population into three sub-populations of equal size according to fitness value;

step S3-5: when the local-optimum judgment formula of step S2-4 is satisfied, the whale optimization algorithm is judged to have entered a local optimum;

step S3-6: when the whale optimization algorithm is trapped in a local optimal solution, carrying out dynamic evolution of the population;

5. The compound water solubility prediction method based on the dynamically evolved whale optimization algorithm as claimed in claim 4, wherein step S3-4 comprises the following steps:

step S3-4-1: for the exploration population, updating the positions of the next generation with the exploration-population position update mechanism of step S2-3;

step S3-4-2: for the development population, updating the positions of the next generation with the development-population position update mechanism of step S2-3;

6. The compound water solubility prediction method based on the dynamically evolved whale optimization algorithm as claimed in claim 4, wherein step S3-6 comprises the following steps:

step S3-6-1: for the exploration population, performing dynamic evolution according to the position update mechanism of step S2-6;

step S3-6-2: for the development population, performing dynamic evolution according to the position update mechanism of step S2-7;

7. The compound water solubility prediction method based on the dynamically evolved whale optimization algorithm as claimed in claim 1 or 4, wherein step S4 comprises the following steps:

step S4-1: after the optimal solution is output in step S3-7, the values of its position vector are used as the optimal parameters of the LSTM network, and the LSTM network is constructed with these parameters;