CN115662642A

CN115662642A - Construction and application of esophageal cancer life prediction model based on improved goblet ascidian algorithm

Info

Publication number: CN115662642A
Application number: CN202211300762.4A
Authority: CN
Inventors: 姜素霞; 李厚胜; 王延峰; 孙军伟; 黄春; 栗三一; 李盼龙; 兰奇逊; 雷霆
Original assignee: Zhengzhou University of Light Industry
Current assignee: Zhengzhou University of Light Industry
Priority date: 2022-10-23
Filing date: 2022-10-23
Publication date: 2023-01-31

Abstract

The application discloses the construction and application of an esophageal cancer survival prediction model based on an improved goblet sea squirt algorithm. An iteration partitioning strategy divides the iterations of the algorithm into an early period and a late period. In the early period, the leader position is updated with a leader fusion mutation strategy, which increases the exploration capacity of the algorithm; in the late period, the leader position is updated with optimal-individual guided motion, which improves the exploitation capacity of the algorithm. All goblet sea squirt individuals are then updated with a dimension-by-dimension Gaussian mutation strategy to improve the diversity of the population. Before each iteration of the algorithm ends, an optimal neighborhood perturbation is applied to the best goblet sea squirt position, the fitness values are compared, and the better one is kept by a greedy strategy, which improves the ability to jump out of a local optimal solution. The IPSSA-BP model fits well and shows high prediction accuracy and good prediction stability.

Description

Construction and application of esophageal cancer life prediction model based on improved goblet ascidian algorithm

Technical Field

The application relates to the technical field of cancer survival prediction, in particular to construction and application of an esophageal cancer survival prediction model based on an improved goblet sea squirt algorithm.

Background

Esophageal cancer is a common tumor of the digestive tract, and about 300,000 people die of it worldwide every year. Its morbidity and mortality vary widely from country to country. China is one of the high-incidence regions in the world, with about 150,000 deaths per year on average. With the continuous progress of medical technology, comprehensive treatment centered on surgery has brought good results for esophageal cancer patients. However, esophageal cancer has many postoperative complications, and the 5-year survival rate of postoperative patients is only 10% to 30%, whereas the survival rate of early-stage patients after comprehensive treatment exceeds 70%. In addition, prognosis inferred by manual diagnosis and traditional statistical methods inevitably carries errors and limitations, which restricts the choice of treatment modes and drug types. Timely and effective prediction of the survival prognosis of esophageal cancer patients is therefore crucial to improving their survival rate. By analyzing clinical survival data of esophageal cancer patients and constructing a model that predicts the survival period, clinicians can learn a patient's survival prognosis in time and use it to guide treatment, thereby improving the prognosis and the survival rate of esophageal cancer patients.

Although the survival time of esophageal cancer patients has been remarkably improved in the past decades with the introduction of new drugs and new technologies, due to the complexity of pathology, certain errors exist in the manual judgment of risk levels, so that the esophageal cancer patients cannot obtain a proper treatment mode.

The traditional cancer treatment method is selected based on the "gold standard" method, which comprises three tests: clinical examination, radiological imaging and pathological examination, which treatment is to be taken is determined by the clinical experience and expertise of the doctor. The traditional method has invasive detection, can bring physical discomfort and pain to the detected people, is high in cost and not suitable for large-scale popularization and application, and meanwhile, the detection result can only be used for proving whether the cancer exists and cannot judge the risk level and the survival period.

To address the above problems, statistical analysis methods are used to predict the risk level of cancer patients. Statistical analysis can infer the correlations among variables from a few easily obtained clinical indices of the patient, with high efficiency and low cost. Common statistical analysis methods include Kaplan-Meier (KM), Cox regression, survival trees and Lasso. KM is a univariate analysis method that estimates survival probability from observed survival times; it describes only the relation between a single variable and survival and ignores the influence of other variables, but it can intuitively show the survival or death rates of two or more groups. The survival tree method is suited to large-scale cohort data with many variables, for which the application conditions of classical survival analysis are difficult to meet, and it presents its results as a tree diagram, which is more intuitive and easier to understand and explain. Although statistical analysis can use readily available clinical data and analyze relationships between variables quickly, it places high requirements on the completeness and accuracy of historical data, and its accuracy and reliability are poor when it is applied to relatively complex data.

Machine learning has shown advantages over statistical models in handling the complexity of large-scale data and in discovering prognostic factors. The learning process can generally be divided into four stages: data acquisition, data preprocessing, model training and prediction, and model evaluation. Machine learning techniques have clear advantages over statistical analysis when faced with large, high-dimensional data sets. Once patient data have been collected comprehensively, they can be analyzed and the internal associations among them mined by machine learning methods, so that prognostic indices are constructed and a survival prediction model is finally established. The survival prediction model may help clinicians formulate targeted, individualized treatment regimens and make better drug choices based on the patient's survival prognosis.

The information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art that is known to a person skilled in the art.

Disclosure of Invention

To address the problems of the goblet sea squirt algorithm, namely unbalanced exploration and exploitation capacity, slow convergence, low convergence precision and a tendency to fall into local optima, the inventors of the application provide a goblet sea squirt optimization algorithm based on a partitioning iteration strategy. An iteration partitioning strategy divides the iterations of the algorithm into an early period and a late period. In the early period, the leader position is updated with a leader fusion mutation strategy, which increases the exploration capacity of the algorithm; in the late period, the leader position is updated with optimal-individual guided motion, which improves the exploitation capacity of the algorithm. All goblet sea squirt individuals are then updated with a dimension-by-dimension Gaussian mutation strategy to improve the diversity of the population. Before each iteration ends, an optimal neighborhood perturbation is applied to the best goblet sea squirt position, the fitness values are compared, and the better one is kept by a greedy strategy, which improves the ability to jump out of a local optimal solution. The algorithm is used to optimize an esophageal cancer survival prediction model, which is compared with 6 other models using Mean Absolute Error (MAE), Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE) as evaluation criteria; the results show the effectiveness of the proposed model.
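The three evaluation criteria named above can be computed as in the following minimal sketch. The function names and the sample numbers are illustrative assumptions and are not taken from the application; MAPE is expressed here as a percentage and assumes that no actual value is zero.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Square Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up survival times (illustrative only, not data from the application).
    actual = [10.0, 20.0, 40.0]
    predicted = [12.0, 18.0, 40.0]
    print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```

Lower values of all three criteria indicate a better fit, so models can be ranked on each criterion independently.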

According to an aspect of the present disclosure, there is provided a method for constructing an esophageal cancer survival prediction model based on the improved goblet sea squirt algorithm, comprising:

(1) Analyzing and screening out the death influence factors of the esophageal cancer patient, taking the screened out factors as input variables and the survival time as output variables, and establishing a life cycle prediction model based on a BP neural network;

(2) Optimizing the initial weight and the threshold of the BP neural network by using an improved goblet ascidian algorithm, wherein the improved goblet ascidian algorithm comprises the following steps:

S1: setting the relevant parameters of the goblet sea squirt algorithm: the population size N, the maximum number of iterations L and the individual dimension Dim, and initializing the population;

S2: calculating the fitness value of each goblet sea squirt individual according to the objective function and sorting the fitness values; selecting the position with the best fitness as the food source position, and setting the current iteration number l = 1;

S3: judging whether the current iteration number l is smaller than the division coefficient; if so, entering step S4, and otherwise entering step S5;

S4: updating the position of the leader by using the leader fusion mutation strategy, and introducing an adaptive inertia weight when the positions of the followers are updated;

S5: updating the position of the leader by using the optimal-individual guided motion strategy, and introducing an adaptive inertia weight when the positions of the followers are updated;

S6: calculating the fitness values, updating the goblet sea squirt positions, and then updating the positions of all goblet sea squirts by using the dimension-by-dimension Gaussian mutation strategy;

S7: updating the best goblet sea squirt position by using the optimal neighborhood perturbation strategy, comparing its fitness value with the current one, and keeping the better one by using a greedy strategy;

S8: if the current iteration number l is smaller than the maximum number of iterations L, adding 1 to l and returning to step S2; otherwise, outputting the optimal solution and ending the algorithm.

In some embodiments of the present disclosure, in the step S2, the objective function expression is:

wherein X is the number of training samples of the esophageal cancer survival model, y_i is the network model output value for the i-th esophageal cancer patient, and y'_i is the actual output value for the i-th esophageal cancer patient.

In some embodiments of the present disclosure, in the step S4, the leader location update process is as follows:

a+b+c＝1 ③；

in the formulas, the leader term represents the position of the 1st goblet sea squirt individual (the leader) in the j-th dimension; the two random-individual terms are two goblet sea squirt individuals randomly selected from the population; F_j is the food source position in the j-th dimension; and a, b and c are random numbers taking values in [0,1].

In some embodiments of the present disclosure, in the step S5, the leader location update is performed according to the following formula:

in the formula, F_j is the food source position in the j-th dimension; R₂ and R₃ are random numbers in the interval [0,1]; R is a random number in the interval [-1,1]; and B is calculated as follows:

in the formula, l is the current iteration number, and L is the maximum number of iterations.

In some embodiments of the present disclosure, in the step S6, all the goblet sea squirt positions are updated according to the following formula:

q = 1 - (l - L)² ⑥;

x(j) = q × x(j) + randn × x(j) ⑦;

in the formula, q is the adaptive inertia weight; l is the current iteration number; L is the maximum number of iterations; x(j) is the position of the goblet sea squirt in the j-th dimension; and randn is the Gaussian mutation operator.

In some embodiments of the present disclosure, in the step S7, the best goblet sea squirt position is updated by the following formula:

in the formula, x(best) is the optimal position during the global update; x(newbest) is the newly generated position; and v and m are each random numbers in [0,1].

In some embodiments of the present disclosure, in the step S7, the following formula is used to decide whether to keep the generated neighborhood position:

in the formula, f(x(best)) is the fitness value of the optimal position during the global update. If the generated position is better than the original position, it replaces the original position and becomes the global optimum; otherwise, the optimal position remains unchanged.

In some embodiments of the present disclosure, in step (1), 5 blood indices associated with the survival of esophageal cancer patients are screened out by univariate Cox regression analysis: white blood cell count, monocyte count, neutrophil count, prothrombin time and international normalized ratio. With these as inputs, the BP neural network survival model is constructed with a 5-11-1 structure, and the dimension to be optimized is:

Dim = (inputnum + 1) × hiddennum + (hiddennum + 1) × outputnum ⑩;

wherein inputnum, hiddennum and outputnum are the numbers of nodes in the input layer, the hidden layer and the output layer of the BP neural network, respectively.
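As a worked check of the dimension formula, the sketch below evaluates it for the 5-11-1 structure and shows one way a flat position vector of that length could be split into weights and thresholds. The function names and the ordering of the four blocks inside the vector are assumptions made for illustration; the application does not specify a layout.

```python
def bp_dimension(inputnum: int, hiddennum: int, outputnum: int) -> int:
    """Number of weights and thresholds to optimize, per the Dim formula."""
    return (inputnum + 1) * hiddennum + (hiddennum + 1) * outputnum


def decode(position, inputnum, hiddennum, outputnum):
    """Split a flat position vector into (w1, b1, w2, b2).

    Assumed layout: input-to-hidden weights, hidden thresholds,
    hidden-to-output weights, output thresholds.
    """
    if len(position) != bp_dimension(inputnum, hiddennum, outputnum):
        raise ValueError("position length does not match the network structure")
    it = iter(position)
    w1 = [[next(it) for _ in range(inputnum)] for _ in range(hiddennum)]
    b1 = [next(it) for _ in range(hiddennum)]
    w2 = [[next(it) for _ in range(hiddennum)] for _ in range(outputnum)]
    b2 = [next(it) for _ in range(outputnum)]
    return w1, b1, w2, b2


if __name__ == "__main__":
    dim = bp_dimension(5, 11, 1)   # (5 + 1) * 11 + (11 + 1) * 1
    print(dim)                     # 78
    w1, b1, w2, b2 = decode(list(range(dim)), 5, 11, 1)
    print(len(w1), len(w1[0]), len(b1), len(w2[0]), len(b2))
```

For the 5-11-1 network this gives 55 input-to-hidden weights, 11 hidden thresholds, 11 hidden-to-output weights and 1 output threshold, 78 values in total, so each goblet sea squirt individual is a 78-dimensional vector.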

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

1. First, a partitioning iteration strategy balances the exploration and exploitation capacity of the algorithm: a division coefficient splits the iterations into two periods, the leader position is updated in the early period with the leader fusion mutation strategy, which has stronger exploration capacity, and in the late period with optimal-individual guided motion, which has stronger exploitation capacity. Second, dimension-by-dimension Gaussian mutation is applied to all goblet sea squirt individuals, which enriches the diversity of the population. Third, the best goblet sea squirt position is updated with the optimal neighborhood perturbation strategy and the better solution is kept by a greedy strategy, which prevents premature convergence and improves the ability to jump out of local optima.

2. Compared with different swarm intelligence algorithms and different improved goblet sea squirt algorithms, the IPSSA algorithm has better convergence speed and convergence precision while remaining stable, and has certain competitive advantages.

3. The IPSSA-BP model is good in fitting effect, high in accuracy, high in prediction precision and good in prediction stability.

Drawings

FIG. 1 is a flow chart of the improved goblet sea squirt algorithm in one embodiment of the present application.

Fig. 2 shows the convergence curves of the improved goblet sea squirt algorithm and other swarm intelligence algorithms in an embodiment of the present application.

FIG. 3 shows the convergence curves of the algorithm of the present application and other improved goblet sea squirt algorithms.

Fig. 4 is a comparison graph of expected values and actual values of a test set of an esophageal cancer survival prediction model in an embodiment of the present application.

Fig. 5 is a comparison graph of the evaluation criteria of 7 models in an embodiment of the present application.

Detailed Description

In order to better understand the technical scheme of the application, the technical scheme is described in detail in the following with reference to the accompanying drawings and specific embodiments.

The first embodiment is as follows: goblet sea squirt optimization algorithm based on partitioning iteration strategy

For a swarm intelligence algorithm, balancing exploration and exploitation during the search can effectively improve performance. According to the characteristics of the goblet sea squirt algorithm, this embodiment introduces a partitioning iteration strategy to balance the exploration and exploitation capacity of the algorithm, and improves the diversity of the population with a dimension-by-dimension Gaussian mutation strategy. Finally, the ability of the algorithm to jump out of local optima is improved through optimal neighborhood perturbation and a greedy strategy.

1. Partitioning iteration strategy

The partitioning iteration strategy balances the exploration and exploitation of the algorithm. It divides the iterations of the algorithm into different periods and applies a different search equation in each period according to that period's characteristics. In the original goblet sea squirt algorithm, more exploration capacity is needed in the early stage and more exploitation capacity in the late stage. This embodiment uses a division coefficient to split the iterations into two periods, an early period and a late period. An equation with strong exploration capacity is used in the early period and an equation with strong exploitation capacity in the late period, which balances the exploration and exploitation capabilities of the algorithm. In the early period, the leader is updated with the leader fusion mutation strategy, and an adaptive inertia weight is introduced into the follower position update to improve the exploitation capacity of the algorithm. In the late period, the leader position is updated with optimal-individual guided motion, and the inertia weight is likewise introduced into the follower position update. Table 1 gives the pseudocode of the partitioning iteration strategy, in which the division coefficient is set as follows:

L_h = L/2 (1);

where L is the maximum number of iterations; this embodiment divides the iterations into two periods with equal numbers of iterations.
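The phase test can be sketched as follows. The function names are hypothetical; the comparison "l smaller than the division coefficient" follows step S3, and the value 500 in the usage lines is only an example.

```python
def division_coefficient(max_iter: int) -> float:
    """Division coefficient L_h = L / 2 from formula (1)."""
    return max_iter / 2


def phase(l: int, max_iter: int) -> str:
    """Return 'early' while l < L_h (leader fusion mutation), else 'late'
    (optimal-individual guided motion)."""
    return "early" if l < division_coefficient(max_iter) else "late"


if __name__ == "__main__":
    max_iter = 500
    print(phase(1, max_iter))     # early
    print(phase(249, max_iter))   # early
    print(phase(250, max_iter))   # late
```

Keeping the coefficient in its own function makes it easy to try a split other than L/2 without touching the rest of the loop.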

1.1 leader fusion variation strategy

In the original goblet sea squirt algorithm, the position of the leader is mainly guided by a food source, and the position of the food source is updated in each iteration and is easily trapped in a local extreme value, so that the exploration capacity of the algorithm is reduced. In order to improve the problem, the embodiment introduces a leader fusion variation strategy, two goblet ascidian individuals are randomly selected from all goblet ascidian individuals, and the food source position and the randomly selected positions of the two goblet ascidian individuals are subjected to fusion variation to generate a new leader position, so that the update of the leader position is accelerated, and the search directions of the other individuals are influenced. The updated position of the leader is influenced not only by the position of the food source but also by the random individuals, so that the convergence speed and the exploration capacity of the algorithm are accelerated. The specific update process is as follows:

a+b+c＝1 (3)；

in the formulas, the two random-individual terms are two goblet sea squirt individuals randomly selected from the population; F_j is the food source position in the j-th dimension; and a, b and c are random numbers taking values in [0,1].
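Formula (2) itself is not reproduced in this text, so the following is only a sketch under a stated assumption: the new leader coordinate is taken to be the weighted combination a·F_j + b·x_r1(j) + c·x_r2(j), with a + b + c = 1 as required by formula (3). The function names, the way a, b, c are sampled (three uniform numbers normalized by their sum) and the choice to redraw them for every dimension are all illustrative assumptions.

```python
import random


def fusion_coefficients(rng: random.Random):
    """Draw a, b, c in [0, 1] with a + b + c = 1 (assumed sampling scheme)."""
    while True:
        raw = [rng.random() for _ in range(3)]
        total = sum(raw)
        if total > 0.0:
            return tuple(r / total for r in raw)


def leader_fusion_mutation(food, population, rng: random.Random):
    """Blend the food source with two distinct random individuals (assumed form)."""
    r1, r2 = rng.sample(range(len(population)), 2)
    new_leader = []
    for j, f_j in enumerate(food):
        a, b, c = fusion_coefficients(rng)  # redrawn per dimension (assumption)
        new_leader.append(a * f_j + b * population[r1][j] + c * population[r2][j])
    return new_leader


if __name__ == "__main__":
    rng = random.Random(0)
    food = [0.0, 0.0]
    population = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(leader_fusion_mutation(food, population, rng))
```

Under this assumption each new coordinate lies between the smallest and largest of the three blended values, so the leader is pulled away from the food source toward randomly chosen members of the population, which matches the stated aim of increasing exploration.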

1.2 Optimal-individual guided motion

In the late stage of iteration, the search requires more exploitation capacity, so optimal-individual guided motion is introduced. The swarm needs to advance toward a target area (the optimal area), so the leader updates its position to approach the target position, and as the followers approach the leader they also continuously approach the optimal area. According to equation (4), the leader position update can find the best position near the food source. Sometimes the leader must leave the current best position to find a better one, which improves the exploitation capacity of the algorithm. The specific update formula is as follows:

in the formula, F_j is the food source position in the j-th dimension; R₂ and R₃ are random numbers in the interval [0,1]; R is a random number in the interval [-1,1]; and B is calculated as follows:

in the formula, l is the current iteration number, and L is the maximum number of iterations. The term 2×R₃ generates more random motion, which helps the algorithm avoid falling into local optima; in other words, the algorithm still explores during the exploitation stage. The term cos(2πR) searches around the best individual with different radii to find a better position nearby.

1.3 follower location update

In the goblet sea squirt swarm algorithm, the followers move along with the leader and update their positions according to Newton's law of motion, with the formula

in which the follower term is the position of the i-th follower in the j-th dimension, a is the acceleration, v₀ is the initial velocity and t is time. Because time in the algorithm corresponds to the iteration count, the difference between successive iterations is 1 and the initial velocity is 0. An inertia weight w(l) that changes with the iteration number is introduced at each update, so the follower movement formula becomes:

wherein l is the current iteration number, L is the maximum number of iterations, and the two position terms are the position of the i-th follower in the j-th dimension and the position of the (i-1)-th follower in the j-th dimension.

TABLE 1 partitioning iterative policy pseudocode

2. Dimension-by-dimension Gaussian mutation

The Gaussian mutation strategy applies a random number that follows a normal distribution to the original position vector to generate a new position. Most mutated values are distributed around the original position, which is equivalent to a neighborhood search over a small area. This mutation improves the optimization precision and also helps the algorithm jump out of a local optimal region. At the same time, a few mutated values lie far from the current position, which enhances the diversity of the population, allows promising areas to be searched better, increases the search speed and accelerates convergence. All goblet sea squirt individuals undergo dimension-by-dimension Gaussian mutation to increase the diversity of the population; the pseudocode of the dimension-by-dimension Gaussian mutation strategy is shown in Table 2.

In order to better balance the global exploration capability and the local exploitation capability of the algorithm, an adaptive inertia weight q is introduced together with the Gaussian mutation. The exploration capability of the algorithm is strong when q is large, and the exploitation capability is strong when q is small. The specific update formulas are as follows:

q = 1 - (l - L)² (9);

x(j) = q × x(j) + randn × x(j) (10);

in the formula, l is the current iteration number, and L is the maximum number of iterations; x(j) is the position of the goblet sea squirt in the j-th dimension; and randn is the Gaussian mutation operator.
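Formula (10) can be sketched as below. The function name is hypothetical, randn is taken to be a standard normal draw (mean 0, standard deviation 1) made independently for each dimension, and the weight q is passed in as an argument rather than computed, so the sketch makes no claim about how formula (9) should be evaluated; the value 0.5 in the usage lines is arbitrary.

```python
import random


def gaussian_mutation(x, q: float, rng: random.Random):
    """Apply x(j) = q * x(j) + randn * x(j) to every dimension j.

    randn is drawn independently per dimension from N(0, 1) (assumption).
    """
    return [q * x_j + rng.gauss(0.0, 1.0) * x_j for x_j in x]


if __name__ == "__main__":
    rng = random.Random(42)
    position = [1.5, -0.3, 2.0]
    print(gaussian_mutation(position, q=0.5, rng=rng))
```

Because both terms of formula (10) are proportional to x(j), a coordinate that is exactly zero is left unchanged by this update, and the size of the perturbation scales with the magnitude of the coordinate.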

TABLE 2 Dimension-by-dimension Gaussian mutation pseudocode

3. Optimal neighborhood perturbation

The goblet sea squirt algorithm takes the current optimal position as the target of each iteration when updating positions. In each iteration, the optimal position is updated only when a position better than it appears, so the optimal position is updated only a few times before the termination condition is met, and the search efficiency of the algorithm is low. Therefore, an optimal neighborhood perturbation strategy is introduced: a random search is carried out near the optimal position to find a better solution, which can improve the convergence speed, prevent premature convergence, and improve the ability of the algorithm to jump out of local optima. Table 3 gives the pseudocode of the optimal neighborhood perturbation.

In the formula, x(best) is the optimal position during the global update; x(newbest) is the newly generated position; and v and m are each random numbers in [0,1]. For the generated neighborhood position, a greedy strategy [27] is used to decide whether to keep it, according to the following formula:

in the formula, f(x(best)) is the fitness value of the optimal position during the global update. If the generated position is better than the original position, it replaces the original position and becomes the global optimum; otherwise, the optimal position remains unchanged.
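The greedy keep-or-discard rule can be sketched as follows. The perturbation formula is not reproduced in this text, so the rule used in `perturb` (scale the best position by 1 + 0.5·m when v < 0.5, otherwise leave it unchanged) is purely an illustrative assumption, as are the function names and the sphere test function. Fitness is assumed to be minimized, which is consistent with an error-based objective.

```python
import random


def perturb(best, rng: random.Random):
    """Generate a neighbor of the best position (assumed perturbation rule)."""
    v = rng.random()
    m = rng.random()
    if v < 0.5:
        return [x + 0.5 * m * x for x in best]
    return list(best)


def greedy_select(best, candidate, fitness):
    """Keep the candidate only if its fitness is strictly better (lower)."""
    return candidate if fitness(candidate) < fitness(best) else best


def sphere(x):
    """Simple test function, used here only for illustration."""
    return sum(c * c for c in x)


if __name__ == "__main__":
    rng = random.Random(3)
    best = [1.0, -2.0]
    for _ in range(5):
        best = greedy_select(best, perturb(best, rng), sphere)
    print(best, sphere(best))
```

Whatever perturbation rule is used, the greedy comparison guarantees that the stored best fitness never gets worse from one iteration to the next.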

TABLE 3 Optimal neighborhood perturbation pseudocode

4. Algorithm flow

The flow chart of the goblet sea squirt optimization algorithm based on the partitioning iteration strategy is shown in fig. 1, and the detailed steps are as follows:

S1: setting the relevant parameters of the goblet sea squirt algorithm: the population size N, the maximum number of iterations L and the individual dimension Dim, and initializing the population.

S2: calculating the fitness value of each goblet sea squirt individual according to the objective function and sorting the fitness values; selecting the position with the best fitness as the food source position, and setting the current iteration number l = 1.

S3: judging whether the current iteration number l is smaller than the division coefficient; if so, entering step S4, and otherwise entering step S5.

S4: updating the position of the leader by using the leader fusion mutation strategy, and introducing an adaptive inertia weight when the positions of the followers are updated.

S5: updating the position of the leader by using the optimal-individual guided motion strategy, and introducing an adaptive inertia weight when the positions of the followers are updated.

S6: calculating the fitness values, updating the goblet sea squirt positions, and then updating the positions of all goblet sea squirts by using the dimension-by-dimension Gaussian mutation strategy.

S7: updating the best goblet sea squirt position by using the optimal neighborhood perturbation strategy, comparing its fitness value with the current one, and keeping the better one by using a greedy strategy.

5. Function test experiment

(1) Comparison with the original algorithm and different swarm intelligence algorithms

The IPSSA algorithm is compared with the basic SSA algorithm and with selected algorithms that converge well, namely the ant lion algorithm (ALO), the grey wolf algorithm (GWO), the dragonfly algorithm (DA) and the moth-flame optimization algorithm (MFO), on functions F1-F12 in 30 and 100 dimensions.

Fig. 2 shows the convergence curves of IPSSA and the other swarm intelligence algorithms in 30 and 100 dimensions on 3 unimodal functions and 3 multimodal functions; the curves show how often each algorithm falls into a local optimum and how fast it converges. The horizontal axis is the iteration number and the vertical axis is the best fitness value. Fig. 2 shows that after 500 iterations, the convergence speed and convergence accuracy of IPSSA have certain advantages over the traditional goblet sea squirt algorithm (SSA), the ant lion algorithm (ALO), the grey wolf algorithm (GWO), the dragonfly algorithm (DA) and the moth-flame optimization algorithm (MFO). Specifically, IPSSA finds the theoretical optimal value on functions F1, F3, F8 and F10, and converges fastest. For function F5 (Dim = 30), IPSSA converges quickly in the early stage but falls into a local optimum in the later stage, and its convergence accuracy ranks behind only GWO and SSA. For the other cases in Fig. 2, IPSSA shows faster convergence speed and higher convergence accuracy.

(2) Optimization comparison with other improved goblet sea squirt algorithms

To further verify the optimization performance of the IPSSA algorithm, IPSSA is compared with three selected improved algorithms that perform well, namely ESSA, MSSA and SCSSA. ESSA was proposed by Mohammed H. Qais et al. in "Enhanced salp swarm algorithm: Application to variable speed wind generators". It mainly updates the coefficient c₁ so that the leader, guided by the updated c₁, moves toward the food source position, and it changes the follower position formula so that the followers not only explore for food but also help the leader make decisions, which effectively improves the performance of the algorithm. MSSA was proposed by Chen Lianhing et al. in "An improved algorithm of the goblet sea squirt group". It introduces a weighted center of gravity for the leader in place of the best individual position to prevent premature gathering near the best individual, introduces an adaptive inertia weight for the followers to balance global search and local optimization, and finally applies dimension-by-dimension random differential mutation to the individuals, which reduces interference among dimensions and improves the diversity of the population. SCSSA was proposed by Chen Zhongyun et al. in "Goblet sea squirt group algorithm of sine and cosine algorithm". It introduces a Logistic chaotic sequence to generate the initial population and increase the diversity of the initial individuals, embeds the sine cosine algorithm as a local factor in the goblet sea squirt swarm algorithm to perform sine-cosine optimization on the individuals, and applies a differential evolution mutation strategy in the neighborhood space of the best goblet sea squirt to enhance the local search capability.

The parameters of the comparison algorithms in this example are set according to the above documents.

Fig. 3 shows the convergence curves of the IPSSA algorithm proposed in this example and the other improved algorithms in different dimensions on 3 unimodal functions and 3 multimodal functions. As can be seen from Fig. 3, IPSSA has better convergence speed and convergence accuracy on the different types of test functions. Specifically, IPSSA finds the theoretical optimal value on functions F1, F3 and F8, and converges quickly. On function F5 in 30 and 100 dimensions, IPSSA converges quickly in the early stage, but its convergence precision is lower than that of SCSSA. On the other benchmark functions, IPSSA has better convergence accuracy and convergence speed.

The second embodiment: esophageal cancer survival prediction model based on IPSSA-BP and effect verification

The BP neural network iteratively updates its weights and thresholds until the preset minimum error is reached or the preset maximum number of training iterations is reached. Its advantages are high accuracy and good performance on complex problems. However, the initial weights and thresholds of the BP neural network are generated randomly, which strongly affects the accuracy of the network and makes the prediction results unreliable.

In this embodiment, the IPSSA algorithm is used to optimize the initial weights and thresholds of the BP neural network. The resulting model is compared with BP neural networks optimized by ESSA, SCSSA and ASSA, with the traditional BP neural network, and with a BP neural network optimized by the whale optimization algorithm (WOA), to verify the effectiveness of the algorithm.

1. Esophageal cancer survival prediction model based on BP neural network

Univariate Cox regression analysis is applied to screen out the factors influencing death in esophageal cancer patients, and a BP neural network survival prediction model is established with the screened factors as input variables and the survival time as the output variable. 500 patients with esophageal cancer treated between 2007 and 2018 at an affiliated hospital of Zhengzhou University were selected, of whom 316 were male (63.2%) and 184 were female (36.8%). The average patient age was 60.258 years. The blood indices of these patients were analyzed using univariate Cox regression, yielding 5 blood indices related to survival: white blood cell count (WBCC), monocyte count (Mono), neutrophil count (Seg), prothrombin time (PT) and international normalized ratio (INR). With these 5 indices as inputs, the BP neural network survival model has the structure 5-11-1, and the dimension to be optimized is:

Dim＝(inputnum+1)×hiddennum+(hiddennum+1)×outputnum (12)；

wherein inputnum, hiddennum and outputnum are the numbers of nodes in the input layer, the hidden layer and the output layer of the BP neural network, respectively. Therefore, for the 5-11-1 structure of this example, the algorithm must optimize 5 × 11 + 11 × 1 = 66 initial weights and 11 + 1 = 12 thresholds, that is, Dim = 78. Equation (16) is chosen as the fitness function. The specific expression of the function is:
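The parameter count given by formula (12) can be checked with a short sketch (this illustrates the dimension count only, not the fitness function). The function names `bp_dimension` and `decode_position`, and the order in which a position vector is unpacked into weights and thresholds, are assumptions made for illustration; the source does not specify an encoding order.

```python
def bp_dimension(inputnum, hiddennum, outputnum):
    """Number of weights and thresholds to optimize, per formula (12)."""
    return (inputnum + 1) * hiddennum + (hiddennum + 1) * outputnum


def decode_position(position, inputnum, hiddennum, outputnum):
    """Unpack a flat position vector into BP weights and thresholds.

    Assumed (hypothetical) order: input-to-hidden weights, hidden thresholds,
    hidden-to-output weights, output thresholds.
    """
    if len(position) != bp_dimension(inputnum, hiddennum, outputnum):
        raise ValueError("position length does not match formula (12)")
    values = iter(position)
    w1 = [[next(values) for _ in range(inputnum)] for _ in range(hiddennum)]
    b1 = [next(values) for _ in range(hiddennum)]
    w2 = [[next(values) for _ in range(hiddennum)] for _ in range(outputnum)]
    b2 = [next(values) for _ in range(outputnum)]
    return w1, b1, w2, b2


# 5-11-1 structure from this example
dim = bp_dimension(5, 11, 1)
w1, b1, w2, b2 = decode_position([0.0] * dim, 5, 11, 1)
n_weights = sum(len(row) for row in w1) + sum(len(row) for row in w2)
n_thresholds = len(b1) + len(b2)
print(dim, n_weights, n_thresholds)  # 78 66 12
```

Each goblet sea squirt position would then be a vector of length 78, with 66 entries acting as weights and 12 as thresholds.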

2. Analysis of results

Blood sample data from 500 esophageal cancer patients were used to establish the model. The data of 300 patients were randomly selected as training data for the survival prediction model, and the data of the remaining 200 patients were used as test data to predict the survival level of the patients. The IPSSA algorithm is compared with the 3 improved algorithms, the traditional BP neural network and the WOA algorithm; the population size of every algorithm is set to 100, and the other parameters are set with reference to the original documents. For the BP neural network, the maximum number of training iterations is set to 5000, the learning rate to 0.3, and the minimum training target error to 1e-9.
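The experimental setup above can be sketched as follows. This is a minimal illustration, assuming a seeded shuffle for the random 300/200 split; the names `split_records`, `BP_CONFIG` and `POPULATION_SIZE` are hypothetical, and the integer record identifiers stand in for the patient data, which is not reproduced here.

```python
import random


def split_records(records, n_train, seed=0):
    """Randomly split records into a training part and a test part."""
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must lie between 0 and len(records)")
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(records)    # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Settings stated in the text
POPULATION_SIZE = 100
BP_CONFIG = {
    "max_epochs": 5000,      # maximum number of training iterations
    "learning_rate": 0.3,
    "target_error": 1e-9,    # minimum training target error
}

records = list(range(500))   # placeholder identifiers for 500 patients
train, test = split_records(records, n_train=300)
print(len(train), len(test))  # 300 200
```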

Fig. 4 shows, for each algorithm-optimized BP neural network, the actual values and predicted values on the test set. The prediction accuracy of each model can be judged roughly from Fig. 4 but cannot be quantified from it, so an error analysis is performed on all models to better verify the prediction effect of the IPSSA-optimized BP neural network. The models are analyzed quantitatively using the commonly used MAE, MSE and MAPE as evaluation criteria. MAE reflects the actual prediction error: the smaller its value, the smaller the error of the prediction model. MSE evaluates the degree of variation of the data: the smaller its value, the more accurately the prediction model describes the experimental data. MAPE measures the accuracy of prediction: the smaller its value, the better the fit and the higher the accuracy of the prediction model. Fig. 5 compares the models on these evaluation criteria. It shows that the IPSSA-BP model of this example attains the minimum value on all three indexes, which indicates that its prediction accuracy, stability and effect are better, and that the IPSSA algorithm has a certain competitive advantage in optimization performance.
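The three evaluation criteria can be computed as in the sketch below, which uses their standard definitions. Expressing MAPE as a percentage is an assumption, since the source does not state the scaling, and the sample values are illustrative numbers only, not patient data or results from this example.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(round(mae(actual, predicted), 4))   # 1.3333
print(round(mse(actual, predicted), 4))   # 2.6667
print(round(mape(actual, predicted), 4))  # 10.0
```

For all three criteria, a smaller value indicates a better model, which is how the models are ranked in the comparison.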

The convergence speed and convergence accuracy of the IPSSA algorithm proposed in the first embodiment are analyzed through 3 groups of simulation experiments: 1) comparison with different swarm intelligence algorithms; 2) comparison with other improved goblet sea squirt algorithms; 3) optimization of the esophageal cancer survival prediction model. The first 2 groups of experiments are carried out on functions F1-F12 in different dimensions. The convergence curves and the Friedman test analysis show that, while remaining stable, the IPSSA algorithm has better convergence speed and convergence accuracy, and a certain competitive advantage, compared with the different swarm intelligence algorithms and the different improved goblet sea squirt algorithms. In the 3rd group of experiments, the esophageal cancer survival prediction model is optimized and compared with 6 other models, and all models are analyzed using MAE, MSE and MAPE as evaluation criteria. The results show that the optimization performance of the IPSSA algorithm is effective.

Although preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the disclosure.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present application and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A construction method of an esophageal cancer life prediction model based on an improved goblet sea squirt algorithm is characterized by comprising the following steps:

(1) Analyzing and screening out the factors influencing the survival time of esophageal cancer patients, taking the screened factors as input variables and the survival time as the output variable, and establishing a survival prediction model based on a BP neural network;

S7: updating the optimal goblet sea squirt position by using the optimal neighborhood disturbance strategy, comparing the fitness value of the new position with the current fitness value, and selecting the optimal fitness value by using a greedy strategy;

2. The method as claimed in claim 1, wherein in step S2, the objective function expression is as follows:

wherein X is the number of training samples of the esophageal cancer survival model, y_i is the output value of the network model for the i-th esophageal cancer patient, and y'_i is the actual output value for the i-th esophageal cancer patient.

3. The method of claim 1, wherein in step S4, the leader location update process comprises the following steps:

a+b+c＝1 ③；

wherein the two individual positions appearing in the formula belong to two goblet sea squirt individuals randomly selected from the population; F_j is the position of the food source in the j-th dimension; and a, b, c are random numbers taking values in [0,1].

4. The method as claimed in claim 1, wherein the updating of the leader position in step S5 is performed according to the following formula:

wherein F_j is the position of the food source in the j-th dimension; R₂ and R₃ are random numbers in the interval [0,1]; r is a random number in the interval [-1,1]; and B is calculated as follows:

5. The method as claimed in claim 1, wherein in step S6 the positions of all goblet sea squirts are updated according to the following formulas:

q＝1-(l-L) ² ⑥；

x(j)＝q×x(j)+randn×x(j) ⑦；
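Formula ⑦ can be sketched as follows. This is a minimal illustration under stated assumptions: the dimension j is taken to be chosen uniformly at random, randn is taken to be a standard normal draw, and q is passed in as an argument because l and L in formula ⑥ are not defined in this section. The function name `gaussian_mutate_one_dim` is hypothetical.

```python
import random


def gaussian_mutate_one_dim(x, q, rng=random):
    """Apply formula (7) to one randomly chosen dimension of position x.

    Returns the mutated copy and the index of the mutated dimension.
    """
    j = rng.randrange(len(x))   # assumed: dimension picked uniformly at random
    mutated = list(x)           # copy, so the original position is kept
    mutated[j] = q * x[j] + rng.gauss(0.0, 1.0) * x[j]
    return mutated, j


rng = random.Random(1)          # seeded generator for a repeatable run
x = [1.0, 2.0, 3.0]
new_x, j = gaussian_mutate_one_dim(x, q=0.5, rng=rng)
print(j, new_x)
```

Only one coordinate changes per call, so the other dimensions of the individual are left undisturbed.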

6. The method as claimed in claim 1, wherein the optimal goblet ascidian location is updated in step S7 by the following formula:

wherein x(best) is the optimal position in the global updating process; x(newbest) is the newly generated position; and v and m are each random numbers in [0,1].

7. The method as claimed in claim 6, wherein in step S7, whether the generated neighborhood position is retained is determined according to the following formula:

8. The method for constructing the esophageal cancer survival prediction model based on the improved goblet sea squirt algorithm as claimed in claim 1, wherein in step (1), the blood indexes related to the survival time of esophageal cancer patients are obtained by screening with univariate Cox regression analysis; with the white blood cell count, the monocyte count, the neutrophil count, the prothrombin time and the international normalized ratio as inputs, the BP neural network survival model has the structure 5-11-1, and the dimension to be optimized is:

Dim＝(inputnum+1)×hiddennum+(hiddennum+1)×outputnum ⑩；

9. A method for predicting the survival time of an esophageal cancer patient, characterized in that blood indexes related to the survival time of the esophageal cancer patient to be predicted are obtained and input into the esophageal cancer survival prediction model constructed according to the method in claim 1, and the survival time of the esophageal cancer patient is output.