CN117974194A - Cost prediction method, device and system based on self-adaptive NSGA-II-SVR - Google Patents

Cost prediction method, device and system based on self-adaptive NSGA-II-SVR Download PDF

Info

Publication number
CN117974194A
CN117974194A CN202410208434.4A CN202410208434A CN117974194A
Authority
CN
China
Prior art keywords
prediction
svr
nsga
adaptive
cost prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410208434.4A
Other languages
Chinese (zh)
Inventor
周柯铭
吕广迎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410208434.4A priority Critical patent/CN117974194A/en
Publication of CN117974194A publication Critical patent/CN117974194A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0206Price or cost determination based on market factors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cost prediction method, device and system based on self-adaptive NSGA-II-SVR. The method comprises: acquiring a plurality of historical cost prediction index values associated with the prediction result and performing dimensionless processing on them; feeding the dimensionless historical cost prediction index values into an adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes; constructing SVR prediction models according to the different feature selection schemes and selecting the SVR prediction model with the smallest prediction error as the optimal SVR prediction model; and performing dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme and feeding them into the optimal SVR prediction model to obtain the corresponding cost prediction value. The invention uses adaptive NSGA-II to solve the engineering feature selection problem and improves the cost prediction accuracy and generalization capability of the SVR model.

Description

Cost prediction method, device and system based on self-adaptive NSGA-II-SVR
Technical Field
The invention belongs to the field of engineering cost prediction, and particularly relates to a cost prediction method, device and system based on self-adaptive NSGA-II-SVR.
Background
Whether an engineering cost prediction model constructed with machine learning can be put into practical application depends on the prediction accuracy of the model. Besides the choice of model, the engineering characteristic indexes used as model input variables also affect the quality of the prediction result. At present, there are two main ideas for constructing prediction models in the field of engineering cost prediction. The first is to determine the factors influencing the prediction target, construct an engineering characteristic index system, and take all indexes as input variables of the model; the second is to perform a secondary screening of the preliminarily determined engineering characteristic indexes, remove redundant indexes or indexes with little influence, and train the model with the screened indexes. When the number of indexes is large, the first method may increase the complexity of the model and thereby degrade its prediction accuracy and generalization capability; the second method requires a scientific index screening strategy to ensure the accuracy of the model.
Research on engineering cost prediction models at the present stage mainly focuses on optimizing model parameters with evolutionary algorithms, and research on feature selection is scarce. Feature selection is an effective data dimension reduction technique that can remove redundant and irrelevant features from a data set, thereby significantly improving the prediction accuracy and reliability of a model. At present, little research in the field of engineering cost prediction addresses index screening; the main index selection method is to reduce the dimension of the indexes with principal component analysis, but this method has certain defects in practical application. When principal component analysis is used for regression prediction, the linear combinations of the original index information overlap and no relationship with the dependent variable is established during dimension reduction, so the resulting principal components can hardly interpret the dependent variable accurately, which reduces the prediction accuracy of the model.
Therefore, a scientific and reasonable feature selection method needs to be provided for engineering feature index selection.
Disclosure of Invention
Aiming at the above problems, the invention provides a cost prediction method, device and system based on self-adaptive NSGA-II-SVR, which use adaptive NSGA-II to solve the engineering feature selection problem and improve the cost prediction accuracy and generalization capability of the SVR model.
In order to achieve the above technical purpose and technical effect, the invention is realized by the following technical scheme:
In a first aspect, the present invention provides a cost prediction method based on adaptive NSGA-II-SVR, comprising:
Acquiring a plurality of historical cost prediction index values associated with the prediction result, and performing dimensionless processing on them;
Feeding the dimensionless historical cost prediction index values into an adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
Constructing SVR prediction models according to the different feature selection schemes, and selecting the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
Performing dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme, and then feeding them into the optimal SVR prediction model to obtain the corresponding cost prediction value.
Optionally, the adaptive NSGA-II feature selection model performs the following operations on the received dimensionless cost prediction index values to generate a plurality of feature selection schemes:
Initializing a population P, wherein the features of each individual in the population P are the selected historical cost prediction index values;
initializing the allocation probabilities of the crossover operators and the matrices that record the crossover operators' reward and penalty information;
repeating the preset iteration steps until the termination condition is met, obtaining a plurality of feature selection schemes, the number of which equals the number of iteration rounds;
the iteration steps include:
selecting one crossover operator according to the allocation probabilities of the crossover operators, randomly selecting two parents from the population P, performing the crossover operation with the selected crossover operator according to the crossover rate Pc, and performing uniform mutation according to the mutation rate Pm to generate offspring;
comparing the relative merits of the parents and the offspring through non-dominated sorting, and recording the reward and penalty information of the crossover operator according to the comparison result;
repeating the above a preset number of times to obtain a plurality of offspring, combining the parents and the offspring, performing fast non-dominated sorting and crowding distance calculation, calculating the fitness values of the individuals, and selecting a plurality of individuals meeting the preset requirements as the individuals of the new parent population, which serves as one feature selection scheme;
after a preset number of rounds, updating the allocation probabilities of the crossover operators once according to the recorded reward and penalty information.
Optionally, when comparing the relative merits of the parents and the offspring: if the two parents do not dominate each other, a reward is recorded if the offspring is not simultaneously dominated by both parents, and a penalty is recorded otherwise; if one parent dominates the other, the offspring is compared with the dominating parent, and a penalty is recorded if the offspring is dominated by that parent, otherwise a reward is recorded.
Optionally, reward information is recorded using an nReward vector and penalty information is recorded using an nPenalty vector, wherein:
nReward = (r1, r2, …, rQ)
nPenalty = (p1, p2, …, pQ)
wherein Q is the total number of crossover operators, and all elements in nReward and nPenalty are set to zero at initialization so that reward and penalty information can then be recorded in them;
The matrices that record the crossover operators' reward and penalty information take the following form:
RD = (nReward(1); nReward(2); …; nReward(LP)), PN = (nPenalty(1); nPenalty(2); …; nPenalty(LP)),
i.e., RD and PN are LP×Q matrices whose l-th rows are the nReward and nPenalty vectors recorded in the l-th population iteration. At initialization, all elements in the matrices RD and PN are set to 0, and after the selection probabilities are updated, the matrices RD and PN are initialized again; the nReward and nPenalty vectors obtained in each population iteration are added to RD and PN as row vectors, so that the historical reward and penalty information over LP population iterations is preserved and used to update the crossover operator selection probabilities.
Optionally, updating the allocation probabilities of the crossover operators comprises the following steps:
calculating the selection probability of each crossover operator based on the matrices RD and PN;
normalizing the selection probabilities so that the sum of the selection probabilities of all crossover operators equals 1, the normalization formula being
pq = p'q / (p'1 + p'2 + … + p'Q), q = 1, 2, …, Q,
wherein pq is the normalized selection probability of the q-th crossover operator and p'q is its un-normalized selection probability.
Optionally, the selection probability of the q-th crossover operator is calculated as follows:
The q-th columns of the matrices RD and PN are summed:
nrq = RD(1, q) + RD(2, q) + … + RD(LP, q), npq = PN(1, q) + PN(2, q) + … + PN(LP, q),
wherein nrq is the number of superior offspring generated by the q-th crossover operator over the LP population iterations and npq is the number of inferior offspring;
The selection probability of the q-th crossover operator is then calculated according to the following formula:
p'q = nrq / (nrq + npq).
Optionally, the solution size is set as one fitness value of the individual, and the fitness value is calculated as
f2(X) = x1 + x2 + … + xD,
wherein f2(X) represents the number of selected prediction indexes, X represents the individual over the set of all features, xi represents the i-th gene value of the individual, and D represents the number of candidate cost prediction indexes, i.e., the length of the individual's code string.
Optionally, the prediction error is the root mean square error, calculated as
RMSE = sqrt(((y1 − ŷ1)² + (y2 − ŷ2)² + … + (yn − ŷn)²) / n),
wherein n represents the number of samples, yi is the true value of the i-th sample, and ŷi is its predicted value.
In a second aspect, the present invention provides a cost prediction apparatus based on adaptive NSGA-II-SVR, comprising:
A data acquisition module, configured to acquire a plurality of historical cost prediction index values associated with the prediction result and perform dimensionless processing on them;
a feature selection scheme generation module, configured to feed the dimensionless historical cost prediction index values into the adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
a model training module, configured to construct SVR prediction models according to the different feature selection schemes and to select the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
and a prediction module, configured to perform dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme and then feed them into the optimal SVR prediction model to obtain the corresponding cost prediction value.
In a third aspect, the present invention provides a cost prediction system based on adaptive NSGA-II-SVR, comprising a storage medium and a processor;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the method according to any one of the first aspects.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a cost prediction method, device and system based on self-adaptive NSGA-II-SVR. First, a plurality of associated factors are selected, the original prediction data are obtained, and dimensionless processing is performed on the data. The dimensionless data are then fed into the adaptive NSGA-II feature selection model (i.e., NSGA-II augmented with an adaptive mechanism) to generate a plurality of feature selection schemes. SVR prediction models are constructed according to the different feature selection schemes, and the SVR prediction model with the smallest prediction error is selected as the optimal SVR prediction model. The real-time cost prediction index values corresponding to the optimal feature selection scheme are dimensionless-processed and then fed into the optimal SVR prediction model to obtain the corresponding cost prediction value. Because adaptive NSGA-II is used to solve the engineering feature selection problem, the prediction accuracy and generalization capability of the SVR model are greatly improved.
Furthermore, the invention integrates a plurality of different crossover operators into NSGA-II (i.e., the second-generation non-dominated sorting genetic algorithm), and the global search capability of the algorithm is greatly improved by introducing an adaptive mechanism comprising a probability update mechanism and a reward-and-penalty mechanism.
Drawings
In order to more clearly describe the embodiments of the invention or the solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained from these drawings by a person skilled in the art without inventive effort, in which:
FIG. 1 is a flow chart of a cost prediction method based on adaptive NSGA-II-SVR according to an embodiment of the present invention;
FIG. 2 is a flow chart of an adaptive NSGA-II feature selection model according to one embodiment of the invention;
FIG. 3 is a diagram showing the unilateral-cost prediction results of the adaptive NSGA-II based SVR prediction model on the training set according to an embodiment of the present invention;
FIG. 4 is a diagram showing the unilateral-cost prediction results of the adaptive NSGA-II based SVR prediction model on the test set according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
Example 1
The embodiment of the invention provides a cost prediction method based on self-adaptive NSGA-II-SVR, which is shown in figure 1 and comprises the following steps:
(1) Acquiring a plurality of historical cost prediction index values associated with the prediction result, and performing dimensionless processing on them;
(2) Feeding the dimensionless historical cost prediction index values into an adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
(3) Constructing SVR prediction models according to the different feature selection schemes, and selecting the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
(4) Performing dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme, and then feeding them into the optimal SVR prediction model to obtain the corresponding cost prediction value.
In a specific implementation manner of the embodiment of the present invention, the adaptive NSGA-II feature selection model performs the following operations on the received dimensionless cost prediction index value, and generates a plurality of feature selection schemes, including:
Initializing a population P, wherein the features of each individual in the population P are the selected historical cost prediction index values. Individuals in the population are typically encoded in binary: each individual is a code string consisting of 0s and 1s whose length equals the number of features in the data set (i.e., the number of selected historical cost prediction indexes). Each feature corresponds to one position in the code string; the position is called a gene locus and the value at that position is called the gene value. A gene value of 0 means the feature is not selected, and a gene value of 1 means the feature is selected. For example, if an original data set contains four features X1, X2, X3, X4 and the corresponding individual is encoded as "0110", then features X1 and X4 are not selected and features X2 and X3 are selected. During population initialization, half of the D gene values of each individual are set to 1, i.e., 50% of the features are selected. The cost prediction indexes may include the predicted construction period, construction appearance, above-ground building area, underground building area, total building area, seismic resistance grade, number of above-ground storeys, and the like.
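For illustration only, a minimal sketch of this binary encoding and population initialization follows, assuming NumPy; the function name init_population and its arguments are illustrative and not part of the patent.

```python
import numpy as np

def init_population(n_individuals, n_features, rng=None):
    """Binary-encoded population: each row is one individual, each column one candidate
    cost prediction index; a gene value of 1 means the index is selected. At
    initialization roughly half of the D gene values of each individual are set to 1."""
    rng = rng or np.random.default_rng()
    pop = np.zeros((n_individuals, n_features), dtype=int)
    for individual in pop:
        chosen = rng.choice(n_features, size=n_features // 2, replace=False)
        individual[chosen] = 1
    return pop

# Example: over four candidate indexes X1..X4, the individual "0110" selects X2 and X3.
population = init_population(n_individuals=10, n_features=4)
```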
Initializing the allocation probabilities of the crossover operators and the matrices that record the crossover operators' reward and penalty information;
Repeating the preset iteration steps until the termination condition is met, obtaining a plurality of feature selection schemes, the number of which equals the number of iteration rounds;
the iteration steps include:
selecting one crossover operator according to the allocation probabilities of the crossover operators, randomly selecting two parents from the population P, performing the crossover operation with the selected crossover operator according to the crossover rate Pc, and performing uniform mutation according to the mutation rate Pm to generate offspring;
comparing the relative merits of the parents and the offspring through non-dominated sorting, and recording the reward and penalty information of the crossover operator according to the comparison result;
repeating the above a preset number of times to obtain a plurality of offspring, combining the parents and the offspring, performing fast non-dominated sorting and crowding distance calculation, calculating the fitness values of the individuals, and selecting a plurality of individuals meeting the preset requirements as the individuals of the new parent population, which serves as one feature selection scheme;
after a preset number of rounds, updating the allocation probabilities of the crossover operators once according to the recorded reward and penalty information.
In a specific implementation process, the adaptive NSGA-II feature selection model provided in the embodiment of the present invention performs the following operations on the received dimensionless cost prediction index values to generate a plurality of feature selection schemes; the pseudocode is given in Table 3. First, a population P containing N individuals is initialized in the decision space Ω. To record the historical performance of the multiple crossover operators, a series of parameters is initialized, including the total number of integrated crossover operators and the selection probability of each crossover operator. Then, before each crossover operation, one crossover operator is chosen by proportional selection, two parents are drawn from the population P by sampling without replacement, the crossover operation is performed with the chosen crossover operator, and uniform mutation is then applied, producing two offspring. The parents and offspring are compared to determine the Pareto dominance relationship and to evaluate the crossover operator: if the crossover operator produces a superior offspring, reward information is recorded in nReward; otherwise, penalty information is recorded in nPenalty. The resulting offspring are added to population T. After N/2 such operations, an offspring population T of the same size as the parent population is obtained. The parent and offspring populations are combined into a new population R, and the best individuals are selected from R by fast non-dominated sorting and crowding distance calculation to construct the next-generation population. The above steps constitute one population iteration, during which the matrices RD and PN record the reward and penalty information of the crossover operators. After every LP (a preset value) population iterations, the selection probabilities of the crossover operators are updated according to the historical information recorded in RD and PN. The process is repeated until the population reaches the maximum number of iterations maxFEs. The algorithm flow chart is shown in FIG. 2.
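As an illustration of the proportional operator selection, the parent sampling without replacement, and the uniform mutation described above, a minimal Python sketch follows; the function names are illustrative, and Pm = 1/D is the example value used in the embodiment below.

```python
import numpy as np

def select_crossover_operator(operator_probs, rng):
    """Proportional (roulette-wheel) selection of one crossover operator index
    according to the current allocation probabilities."""
    return rng.choice(len(operator_probs), p=operator_probs)

def select_parents(population, rng):
    """Draw two distinct parents from population P by sampling without replacement."""
    i, j = rng.choice(len(population), size=2, replace=False)
    return population[i], population[j]

def uniform_mutation(child, pm, rng):
    """Flip each gene independently with mutation rate Pm (e.g. Pm = 1/D)."""
    child = child.copy()
    flip = rng.random(child.size) < pm
    child[flip] ^= 1
    return child
```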
TABLE 3 adaptive NSGA-II pseudocode
In a specific implementation manner of the embodiment of the invention, when comparing the relative merits of the parents and the offspring: if the two parents do not dominate each other, a reward is recorded if the offspring is not simultaneously dominated by both parents, and a penalty is recorded otherwise; if one parent dominates the other, the offspring is compared with the dominating parent, and a penalty is recorded if the offspring is dominated by that parent, otherwise a reward is recorded.
The primary way in which evolutionary algorithms produce new individuals is through crossover and mutation operations. The crossover operation is an effective search method that can pass superior information to the next generation, thereby improving search efficiency and influencing the final search result. Selecting an appropriate crossover operator is therefore critical to search efficiency, but choosing the right crossover operator for different problems is difficult. The present invention therefore integrates a variety of crossover operators into NSGA-II, including the single-point crossover operator, the two-point crossover operator, the uniform crossover operator, the shuffle crossover operator, and the reduced surrogate crossover operator. These five operators have different search characteristics; before each crossover operation, a suitable crossover operator is selected from among them and the parents then undergo the crossover operation. A brief description of the five crossover operators is provided below.
(1) Single-point crossover operator: a position on the two parent individuals, called the crossover point, is chosen at random, and the genes to the right of that point are swapped between the two parents.
(2) Two-point crossover operator: two positions on the parent individuals are determined, and the gene values between the two positions are exchanged between the parents.
(3) Uniform crossover operator: the genes at each index i (i = 1, 2, …, D) are exchanged between the two individuals with a certain probability.
(4) Shuffle crossover operator: half of the genes are taken from one parent and the other half from the other parent to form a new offspring.
(5) Reduced surrogate crossover operator: this operator produces offspring with traits more distinct from the parents by restricting the crossover point to positions where the two parent individuals have different gene values.
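The following is a hedged Python sketch of the five crossover operators on binary-coded parents, matching the descriptions above; implementation details such as how the reduced surrogate operator picks its crossover point are reasonable assumptions rather than text from the patent.

```python
import numpy as np

def single_point(p1, p2, rng):
    """(1) Swap the genes to the right of one randomly chosen crossover point."""
    c1, c2 = p1.copy(), p2.copy()
    pt = rng.integers(1, len(p1))
    c1[pt:], c2[pt:] = p2[pt:], p1[pt:]
    return c1, c2

def two_point(p1, p2, rng):
    """(2) Swap the gene values between two randomly chosen positions."""
    c1, c2 = p1.copy(), p2.copy()
    a, b = sorted(rng.choice(len(p1), size=2, replace=False))
    c1[a:b + 1], c2[a:b + 1] = p2[a:b + 1], p1[a:b + 1]
    return c1, c2

def uniform(p1, p2, rng, swap_prob=0.5):
    """(3) Exchange the gene at each index with a fixed probability."""
    mask = rng.random(len(p1)) < swap_prob
    c1, c2 = p1.copy(), p2.copy()
    c1[mask], c2[mask] = p2[mask], p1[mask]
    return c1, c2

def shuffle(p1, p2, rng):
    """(4) Take half of the gene positions from one parent and the rest from the other."""
    half = rng.permutation(len(p1))[: len(p1) // 2]
    c1, c2 = p2.copy(), p1.copy()
    c1[half], c2[half] = p1[half], p2[half]
    return c1, c2

def reduced_surrogate(p1, p2, rng):
    """(5) Restrict the crossover point to positions where the parents' genes differ."""
    diff = np.flatnonzero(p1 != p2)
    if diff.size == 0:                       # identical parents: nothing to exchange
        return p1.copy(), p2.copy()
    pt = rng.choice(diff)
    c1, c2 = p1.copy(), p2.copy()
    c1[pt:], c2[pt:] = p2[pt:], p1[pt:]
    return c1, c2
```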
At initialization, every crossover operator is given the same selection probability; the selection probabilities are then updated according to the recorded reward and penalty information of the crossover operators. In short, each crossover operation produces two offspring, and the parents and offspring are compared by Pareto sorting. If a parent is better than the offspring, the crossover operator did not produce a superior individual and is penalized; if the offspring is better than the parents, the operator is rewarded. The reward and penalty information is recorded and the selection probabilities of the crossover operators are updated, with the selection probabilities of all crossover operators summing to 1. Before each crossover operation, the best crossover operator for the current stage is determined first and the crossover operation is then performed. The core idea of the crossover operator reward-and-penalty mechanism is to encourage crossover operators that perform well by rewarding them. Thus, in one embodiment of the present invention, reward information is recorded using the nReward vector and penalty information is recorded using the nPenalty vector, where:
nReward = (r1, r2, …, rQ)
nPenalty = (p1, p2, …, pQ)
where Q is the total number of crossover operators; at initialization all elements in nReward and nPenalty are set to zero, and reward and penalty information is then recorded in them.
Rewarding or penalizing a crossover operator requires determining the Pareto dominance relationship between the parents and the offspring. The specific procedure is shown in Table 4.
Table 4 cross operator punishment mechanism pseudo code
When comparing the offspring with the parents, two cases arise: either one parent dominates the other, or the two parents do not dominate each other, as follows:
(1) One parent dominates the other
Assuming parent j dominates parent i, each child is compared with parent j to determine the Pareto dominance relationship. If the child is not dominated by parent j, nRewardq is incremented by 1; otherwise nPenaltyq is incremented by 1.
(2) The two parents do not dominate each other
If the two parents do not dominate each other, each child is compared with both parents under the Pareto dominance relationship. If the child is not simultaneously dominated by both parents, nRewardq is incremented by 1; otherwise nPenaltyq is incremented by 1.
At initialization, all positions of nReward and nPenalty are set to 0. After a sufficient number of fitness value evaluations, nReward and nPenalty are reset to 0.
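A minimal sketch of the dominance test and the reward/penalty bookkeeping for the two cases above follows; both objectives (prediction error and solution size) are treated as minimized, and the function names are illustrative.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives minimized)."""
    return bool(np.all(a <= b) and np.any(a < b))

def score_offspring(child_f, parent1_f, parent2_f, q, nReward, nPenalty):
    """Reward or penalize crossover operator q for one child, following the two cases above."""
    if dominates(parent1_f, parent2_f) or dominates(parent2_f, parent1_f):
        # Case (1): one parent dominates the other; compare the child with the dominating parent.
        better = parent1_f if dominates(parent1_f, parent2_f) else parent2_f
        if dominates(better, child_f):
            nPenalty[q] += 1
        else:
            nReward[q] += 1
    else:
        # Case (2): parents do not dominate each other; penalize only if the child
        # is simultaneously dominated by both parents.
        if dominates(parent1_f, child_f) and dominates(parent2_f, child_f):
            nPenalty[q] += 1
        else:
            nReward[q] += 1
```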
To ensure that the best crossover operator is used at different stages, the crossover operator selection probabilities need to be updated periodically according to the historical performance of each operator in the operator set, i.e., the selection probabilities are updated once every LP population iterations. As described above, the reward and penalty information of each generation is recorded in the two vectors nReward and nPenalty. To this end, two matrices RD and PN are established to store the reward and penalty information over the LP population iterations; they take the following form:
RD = (nReward(1); nReward(2); …; nReward(LP)), PN = (nPenalty(1); nPenalty(2); …; nPenalty(LP)),
i.e., RD and PN are LP×Q matrices whose l-th rows are the nReward and nPenalty vectors recorded in the l-th population iteration. At initialization, all elements of RD and PN are set to 0, and after the selection probabilities are updated, RD and PN are initialized again; the nReward and nPenalty vectors obtained in each population iteration are added to RD and PN as row vectors, so that the historical reward and penalty information over the LP population iterations is preserved and used to update the crossover operator selection probabilities.
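For illustration, the bookkeeping of RD and PN might look like the following sketch, where Q = 5 corresponds to the five crossover operators; the reset after LP iterations follows the description above.

```python
import numpy as np

Q = 5                                # total number of integrated crossover operators
RD = np.zeros((0, Q), dtype=int)     # reward rows accumulated over the current LP iterations
PN = np.zeros((0, Q), dtype=int)     # penalty rows accumulated over the current LP iterations

# Within one population iteration, nReward / nPenalty are filled by the reward-penalty
# mechanism; at the end of the iteration they are appended to RD / PN as row vectors.
nReward = np.zeros(Q, dtype=int)
nPenalty = np.zeros(Q, dtype=int)
RD = np.vstack([RD, nReward])
PN = np.vstack([PN, nPenalty])
# After LP population iterations the selection probabilities are updated from RD and PN,
# and RD and PN are re-initialized, as stated above.
```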
Updating the allocation probabilities of the crossover operators comprises the following steps:
calculating the selection probability of each crossover operator based on the matrices RD and PN;
normalizing the selection probabilities so that the sum of the selection probabilities of all crossover operators equals 1, the normalization formula being
pq = p'q / (p'1 + p'2 + … + p'Q), q = 1, 2, …, Q,
where pq is the normalized selection probability of the q-th crossover operator and p'q is its un-normalized selection probability.
The selection probability of the q-th crossover operator is calculated as follows:
The q-th columns of the matrices RD and PN are summed:
nrq = RD(1, q) + RD(2, q) + … + RD(LP, q), npq = PN(1, q) + PN(2, q) + … + PN(LP, q),
where nrq is the number of superior offspring generated by the q-th crossover operator over the LP population iterations and npq is the number of inferior offspring;
The selection probability of the q-th crossover operator is then calculated according to the following formula:
p'q = nrq / (nrq + npq).
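A sketch of the probability update using the formulas above follows; the small eps term is a hypothetical safeguard against the case nrq + npq = 0 and is not stated in the patent.

```python
import numpy as np

def update_operator_probabilities(RD, PN, eps=1e-6):
    """Compute normalized crossover-operator selection probabilities from the LP x Q
    reward matrix RD and penalty matrix PN."""
    nr = RD.sum(axis=0)                        # superior offspring per operator over LP iterations
    np_ = PN.sum(axis=0)                       # inferior offspring per operator over LP iterations
    raw = (nr + eps) / (nr + np_ + 2 * eps)    # un-normalized selection probability p'_q
    return raw / raw.sum()                     # normalized so the probabilities sum to 1
```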
In a specific implementation manner of the embodiment of the present invention, the solution size is set as one fitness value of the individual, and the fitness value is calculated as
f2(X) = x1 + x2 + … + xD,
where f2(X) represents the number of selected prediction indexes, X represents the individual over the set of all features, xi represents the i-th gene value of the individual, and D represents the number of candidate cost prediction indexes, i.e., the length of the individual's code string.
In a specific implementation manner of the embodiment of the present invention, the prediction error is the root mean square error, calculated as
RMSE = sqrt(((y1 − ŷ1)² + (y2 − ŷ2)² + … + (yn − ŷn)²) / n),
where n represents the number of samples, yi is the true value of the i-th sample, and ŷi is its predicted value.
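A minimal sketch of evaluating the two fitness values of one individual (prediction error and solution size) is given below, using scikit-learn's SVR; the use of scikit-learn and the default SVR parameters are assumptions for illustration, since the patent only specifies an SVR model and the RMSE criterion.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def evaluate_individual(genes, X_train, y_train, X_val, y_val):
    """Return (RMSE, number of selected indexes) for one binary-coded individual."""
    selected = np.flatnonzero(genes)
    if selected.size == 0:                       # no index selected: worst possible error
        return np.inf, 0
    model = SVR()                                # kernel and parameters left at library defaults here
    model.fit(X_train[:, selected], y_train)
    y_pred = model.predict(X_val[:, selected])
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    return rmse, int(selected.size)
```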
The cost prediction method based on the adaptive NSGA-II-SVR in the embodiment of the present invention will be described in detail with reference to a specific embodiment.
(1) The samples are partitioned into model training and test sets.
(2) The relevant parameters of the algorithm are set: the maximum number of population iterations is 200, the population size is 10, the number of iterations between crossover operator probability updates (LP) is 5, the crossover rate Pc is 0.9, and the mutation rate Pm is 1/D.
(3) The fitness functions are set: the prediction error of the SVR prediction model and the solution size are selected as the fitness functions.
(4) The population is initialized: binary encoding of individuals is used to obtain 10 individuals; the allocation probabilities of the crossover operators and the matrices recording the crossover operators' reward and penalty information are initialized.
(5) One fitness value of each individual, namely the solution size, is obtained from its feature selection scheme, and the support vector machine model is trained with that feature selection scheme to obtain the other fitness value, namely the prediction error.
(6) A crossover operator is selected according to the allocation probabilities, two parents are selected at random, the selected crossover operation is performed with probability Pc, and uniform mutation is applied with the mutation probability; the generated offspring are evaluated by training a support vector machine to obtain their fitness, and the relative merits of the parents and the offspring are compared through non-dominated sorting. If the parents do not dominate each other and the offspring is not simultaneously dominated by both parents, a reward is recorded, otherwise a penalty is recorded; if one parent dominates the other, the offspring is compared with the dominating parent, and a penalty is recorded if the offspring is dominated, otherwise a reward is recorded. The reward and penalty information of the crossover operator is recorded. After 5 such crossover operations, 10 offspring are obtained; the parents and offspring are combined, fast non-dominated sorting and crowding distance calculation are performed to determine the best individuals, and these are added to the next generation.
(7) Every 5 rounds, the allocation probabilities of the crossover operators are updated once according to the recorded reward and penalty information.
(8) Whether the maximum number of iterations has been reached is checked, and processes (3) to (7) are repeated until the termination condition is met, yielding a plurality of feature selection schemes.
(9) The feature selection scheme with the minimum prediction error is selected to construct the SVR prediction model, which is trained and used to complete the prediction on the test set.
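Step (9) might be sketched as follows, assuming the Pareto front is available as a list of (individual, prediction error) pairs produced by the adaptive NSGA-II run; the names and data layout are illustrative.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def final_prediction(pareto_front, X_train, y_train, X_test, y_test):
    """Take the feature selection scheme with the smallest prediction error from the
    Pareto front, train the final SVR model with it, and predict on the test set."""
    best_genes, _ = min(pareto_front, key=lambda item: item[1])
    cols = np.flatnonzero(best_genes)
    model = SVR().fit(X_train[:, cols], y_train)
    y_pred = model.predict(X_test[:, cols])
    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return model, y_pred, test_rmse
```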
The Pareto front output by the adaptive NSGA-II feature selection method is shown in Table 1. As can be seen from the table, the adaptive NSGA-II algorithm outputs 10 non-dominated solutions, of which 9 are valid, corresponding to 9 different feature selection schemes; the feature selection scheme with the smallest prediction error, namely scheme 2, is selected for model training and prediction.
Table 1 adaptive NSGA-II algorithm pareto front
FIG. 3 and FIG. 4 show the unilateral-cost prediction results of the model on the training set and the test set, respectively. As can be seen from the figures, the RMSE of the model is 97.43 yuan on the training set and 118.71 yuan on the test set. The prediction errors of most samples in the test set are small, and the prediction accuracy is good.
The relative and absolute errors between the predicted values and the true values of the adaptive NSGA-II based SVR cost prediction model are shown in Table 2. As can be seen from the table, the maximum absolute error between predicted and true values is 372.35 yuan and the minimum is 7.95 yuan; the absolute errors of most predictions lie in [10, 100], a small error range. The maximum absolute value of the relative error is 21.41% and the minimum is 0.33%, and the fluctuation range of the relative error is smaller than that of the two comparison models, showing that the model has good generalization capability. Almost all of the model's predictions have relative errors within ±10%, and more than half of the sample predictions have relative errors within 5%, making it the prediction model among the three compared that best meets practical engineering application standards.
TABLE 2 adaptive NSGA-II based SVR prediction model error results
The cost prediction method based on the self-adaptive NSGA-II-SVR provided by the embodiment of the invention has higher prediction precision and model generalization capability, and meets the practical application standard of the engineering field.
Example 2
Based on the same inventive concept as in embodiment 1, an embodiment of the present invention provides a cost prediction apparatus based on an adaptive NSGA-II-SVR, including:
The data acquisition module is used for acquiring a plurality of historical cost prediction index values associated with the prediction result and carrying out dimensionless processing;
The feature selection scheme generation module is used for bringing the history cost prediction index value after dimensionless treatment into the self-adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
the model training module is used for constructing SVR prediction models according to different characteristic selection schemes, and finally selecting the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
and the prediction module is used for carrying out dimensionless treatment on the real-time cost prediction index value corresponding to the optimal characteristic selection scheme, and then sending the real-time cost prediction index value into the optimal SVR prediction model to obtain a corresponding cost prediction value.
The remainder was the same as in example 1.
Example 3
Based on the same inventive concept as that of embodiment 1, an adaptive NSGA-II-SVR-based cost prediction system is provided in an embodiment of the present invention, which includes a storage medium and a processor;
the storage medium is used for storing instructions;
The processor is configured to operate in accordance with the instructions to perform the method according to any one of embodiment 1.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are all within the protection of the present invention.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A cost prediction method based on adaptive NSGA-II-SVR, comprising:
acquiring a plurality of historical cost prediction index values associated with the prediction result, and performing dimensionless processing on them;
feeding the dimensionless historical cost prediction index values into an adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
constructing SVR prediction models according to the different feature selection schemes, and selecting the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
and performing dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme, and then feeding them into the optimal SVR prediction model to obtain the corresponding cost prediction value.
2. The adaptive NSGA-II-SVR based cost prediction method according to claim 1, wherein the adaptive NSGA-II feature selection model performs the following operations on the received dimensionless cost prediction index values to generate the plurality of feature selection schemes:
initializing a population P, wherein the features of each individual in the population P are the selected historical cost prediction index values;
initializing the allocation probabilities of the crossover operators and the matrices that record the crossover operators' reward and penalty information;
repeating the preset iteration steps until the termination condition is met, obtaining a plurality of feature selection schemes, the number of which equals the number of iteration rounds;
the iteration steps include:
selecting one crossover operator according to the allocation probabilities of the crossover operators, randomly selecting two parents from the population P, performing the crossover operation with the selected crossover operator according to the crossover rate Pc, and performing uniform mutation according to the mutation rate Pm to generate offspring;
comparing the relative merits of the parents and the offspring through non-dominated sorting, and recording the reward and penalty information of the crossover operator according to the comparison result;
repeating the above a preset number of times to obtain a plurality of offspring, combining the parents and the offspring, performing fast non-dominated sorting and crowding distance calculation, calculating the fitness values of the individuals, and selecting a plurality of individuals meeting the preset requirements as the individuals of the new parent population, which serves as one feature selection scheme;
after a preset number of rounds, updating the allocation probabilities of the crossover operators once according to the recorded reward and penalty information.
3. The adaptive NSGA-II-SVR based cost prediction method according to claim 2, wherein, when comparing the relative merits of the parents and the offspring: if the two parents do not dominate each other, a reward is recorded if the offspring is not simultaneously dominated by both parents, and a penalty is recorded otherwise; if one parent dominates the other, the offspring is compared with the dominating parent, and a penalty is recorded if the offspring is dominated by that parent, otherwise a reward is recorded.
4. The adaptive NSGA-II-SVR based cost prediction method according to claim 3, wherein reward information is recorded using an nReward vector and penalty information is recorded using an nPenalty vector, wherein:
nReward = (r1, r2, …, rQ)
nPenalty = (p1, p2, …, pQ)
wherein Q is the total number of crossover operators, and all elements in nReward and nPenalty are set to zero at initialization so that reward and penalty information can then be recorded in them;
the matrices that record the crossover operators' reward and penalty information take the following form:
RD = (nReward(1); nReward(2); …; nReward(LP)), PN = (nPenalty(1); nPenalty(2); …; nPenalty(LP)),
i.e., RD and PN are LP×Q matrices whose l-th rows are the nReward and nPenalty vectors recorded in the l-th population iteration; at initialization, all elements in the matrices RD and PN are set to 0, and after the selection probabilities are updated, the matrices RD and PN are initialized again; the nReward and nPenalty vectors obtained in each population iteration are added to RD and PN as row vectors, so that the historical reward and penalty information over LP population iterations is preserved and used to update the crossover operator selection probabilities.
5. The adaptive NSGA-II-SVR based cost prediction method according to claim 4, wherein updating the allocation probabilities of the crossover operators comprises the following steps:
calculating the selection probability of each crossover operator based on the matrices RD and PN;
normalizing the selection probabilities so that the sum of the selection probabilities of all crossover operators equals 1, the normalization formula being
pq = p'q / (p'1 + p'2 + … + p'Q), q = 1, 2, …, Q,
wherein pq is the normalized selection probability of the q-th crossover operator and p'q is its un-normalized selection probability.
6. The adaptive NSGA-II-SVR based cost prediction method according to claim 5, wherein the selection probability of the q-th crossover operator is calculated as follows:
the q-th columns of the matrices RD and PN are summed:
nrq = RD(1, q) + RD(2, q) + … + RD(LP, q), npq = PN(1, q) + PN(2, q) + … + PN(LP, q),
wherein nrq is the number of superior offspring generated by the q-th crossover operator over the LP population iterations and npq is the number of inferior offspring;
the selection probability of the q-th crossover operator is then calculated according to the following formula:
p'q = nrq / (nrq + npq).
7. The adaptive NSGA-II-SVR based cost prediction method according to claim 2, wherein the solution size is set as one fitness value of the individual, and the fitness value is calculated as
f2(X) = x1 + x2 + … + xD,
wherein f2(X) represents the number of selected prediction indexes, X represents the individual over the set of all features, xi represents the i-th gene value of the individual, and D represents the number of candidate cost prediction indexes, i.e., the length of the individual's code string.
8. The adaptive NSGA-II-SVR based cost prediction method according to claim 1, wherein the prediction error is the root mean square error, calculated as
RMSE = sqrt(((y1 − ŷ1)² + (y2 − ŷ2)² + … + (yn − ŷn)²) / n),
wherein n represents the number of samples, yi is the true value of the i-th sample, and ŷi is its predicted value.
9. A cost prediction device based on adaptive NSGA-II-SVR, comprising:
a data acquisition module, configured to acquire a plurality of historical cost prediction index values associated with the prediction result and perform dimensionless processing on them;
a feature selection scheme generation module, configured to feed the dimensionless historical cost prediction index values into an adaptive NSGA-II feature selection model to generate a plurality of feature selection schemes;
a model training module, configured to construct SVR prediction models according to the different feature selection schemes and to select the SVR prediction model with the smallest prediction error as the optimal SVR prediction model;
and a prediction module, configured to perform dimensionless processing on the real-time cost prediction index values corresponding to the optimal feature selection scheme and then feed them into the optimal SVR prediction model to obtain the corresponding cost prediction value.
10. A cost prediction system based on adaptive NSGA-II-SVR, which is characterized by comprising a storage medium and a processor;
the storage medium is used for storing instructions;
the processor is operative to perform the method according to any one of claims 1-8.
CN202410208434.4A 2024-02-26 2024-02-26 Cost prediction method, device and system based on self-adaptive NSGA-II-SVR Pending CN117974194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410208434.4A CN117974194A (en) 2024-02-26 2024-02-26 Cost prediction method, device and system based on self-adaptive NSGA-II-SVR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410208434.4A CN117974194A (en) 2024-02-26 2024-02-26 Cost prediction method, device and system based on self-adaptive NSGA-II-SVR

Publications (1)

Publication Number Publication Date
CN117974194A true CN117974194A (en) 2024-05-03

Family

ID=90859478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410208434.4A Pending CN117974194A (en) 2024-02-26 2024-02-26 Cost prediction method, device and system based on self-adaptive NSGA-II-SVR

Country Status (1)

Country Link
CN (1) CN117974194A (en)

Similar Documents

Publication Publication Date Title
US7725409B2 (en) Gene expression programming based on Hidden Markov Models
CN108563555B (en) Fault change code prediction method based on four-target optimization
CN111611274A (en) Database query optimization method and system
US20050177351A1 (en) Methods and program products for optimizing problem clustering
CN111275172A (en) Feedforward neural network structure searching method based on search space optimization
Liu et al. Multiobjective criteria for neural network structure selection and identification of nonlinear systems using genetic algorithms
CN111832101A (en) Construction method of cement strength prediction model and cement strength prediction method
CN110991724A (en) Method, system and storage medium for predicting scenic spot passenger flow
CN114004341A (en) Optical fiber preform preparation process optimization method based on genetic algorithm and BP neural network
CN116340726A (en) Energy economy big data cleaning method, system, equipment and storage medium
CN115908909A (en) Evolutionary neural architecture searching method and system based on Bayes convolutional neural network
CN112183749B (en) Deep learning library test method based on directed model variation
Chen et al. A new multiobjective evolutionary algorithm for community detection in dynamic complex networks
CN117453539A (en) Compiler defect positioning method based on large language model enabling
CN117974194A (en) Cost prediction method, device and system based on self-adaptive NSGA-II-SVR
CN117273080A (en) Neural network architecture based on evolutionary algorithm
CN116611504A (en) Neural architecture searching method based on evolution
CN116305939A (en) High-precision inversion method and system for carbon water flux of land ecological system and electronic equipment
CN113297293A (en) Automatic feature engineering method based on constraint optimization evolutionary algorithm
CN114969148A (en) System access amount prediction method, medium and equipment based on deep learning
CN111026661B (en) Comprehensive testing method and system for software usability
TW202312042A (en) Automatic optimization method and automatic optimization system of diagnosis model
KR20210050362A (en) Ensemble pruning method, ensemble model generation method for identifying programmable nucleases and apparatus for the same
CN115278413B (en) Ultra-low loss optical fiber upgrading method, device and storage medium
CN116501764B (en) Automatic SQL optimization method based on generated pre-training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination