CN115936888A - Stock characteristic construction system and method based on genetic programming - Google Patents

Publication number
CN115936888A
Authority
CN
China
Prior art keywords
stock
constructed
genetic programming
algorithm
stocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211546813.1A
Other languages
Chinese (zh)
Inventor
朱荣晖
陈建
张斌
王广普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Linlong Technology Co ltd
Original Assignee
Shenyang Linlong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Linlong Technology Co ltd filed Critical Shenyang Linlong Technology Co ltd
Priority to CN202211546813.1A priority Critical patent/CN115936888A/en
Publication of CN115936888A publication Critical patent/CN115936888A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a stock characteristic construction system and method based on genetic programming, and relates to the technical field of machine learning. The invention uses genetic programming to construct features, designs the components of the genetic programming algorithm for the stock classification problem, and tunes parameters through grid search to improve the accuracy and efficiency of stock classification. Constructing features with genetic programming can improve the efficiency and accuracy with which stock trends are identified. Different features are constructed for each of the three existing trends (rising, falling and oscillating). For a newly input stock, the three groups of constructed features are applied to its historical data to obtain three groups of vectors, and the constructed features are likewise applied to stocks with known trends to obtain the corresponding vectors. Similarity analysis is performed between the vectors of the newly input stock and the vectors computed from stocks with known trends; the newly input stock is assigned the trend of the known stock with the greatest similarity, so that the trends of other stocks are identified.

Description

Stock characteristic construction system and method based on genetic programming
Technical Field
The invention relates to the technical field of machine learning, in particular to a stock characteristic construction system and method based on genetic programming.
Background
Currently, there are about 150 million stock investors in China out of a total population of about 1.41 billion; that is, at least one in every ten people is a stock investor. Analyzing the features of stocks yields important features that help to judge the trends of a large number of stocks quickly and accurately and to classify stocks by trend. Combining artificial intelligence with stock analysis algorithms assists users in investing: once the trend of a stock is known, the user can choose which stocks to buy and sell under the guidance of that trend, so as to maximize returns as far as possible.
Many statistical methods exist for analyzing and judging stock trends, but they are inevitably subject to chance, so the accuracy of the result cannot be guaranteed. Moreover, when a large number of stocks must be judged, efficiency cannot be guaranteed either. Features constructed with a genetic programming algorithm can achieve both efficiency and accuracy. Because genetic programming is modeled on heredity, features are constructed for each stock trend, candidates are eliminated according to their fitness scores during execution, and the features that match the current trend are finally obtained. The trends of a large number of stocks can therefore be judged accurately and quickly from the constructed features, and the stocks can be classified by trend.
High-quality features help to improve the overall performance and accuracy of a model. Expected results and performance cannot be obtained from raw data alone, and an algorithm cannot automatically extract meaningful features from raw data, so feature engineering is needed to convert the raw data into features targeted at different stock trends. Feature construction is one kind of feature engineering: the manual construction of new features from raw data. It requires a great deal of time spent observing the data and thinking about the underlying form of the problem and the structure of the data; sensitivity to data and practical machine learning experience help practitioners with feature construction. Each newly constructed feature must also be verified to improve the performance of the machine learning model, so that useless features are not constructed that only add to the computational complexity of the algorithm. Existing feature construction methods include, but are not limited to, the following: single-column operations, multi-column operations, and grouping/aggregation operations.
Single-column operations: applying the four arithmetic operations, root extraction, squaring, powers, exponentials, logarithms and similar operations to one column of data.
Multi-column operations: summing several columns, taking differences, averaging, taking the maximum/minimum of several columns, and similar operations.
Grouping/aggregation operations: these are needed when statistics are computed by some attribute (such as a time period). For example, when the data records people's height, gender, age and so on, the records can be grouped by gender and the age averaged within each group.
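The three kinds of manual feature construction listed above can be illustrated with a minimal Python sketch. The tables, column names and values below are hypothetical and chosen only for illustration; they do not come from the patent.

```python
import math
from collections import defaultdict

# Hypothetical price table (one row per day) and people table.
prices = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.0, "close": 10.5},
]
people = [
    {"gender": "F", "age": 30},
    {"gender": "M", "age": 40},
    {"gender": "F", "age": 50},
]

# Single-column operation: logarithm of one column.
log_close = [math.log(row["close"]) for row in prices]

# Multi-column operation: element-wise difference of two columns.
close_minus_open = [row["close"] - row["open"] for row in prices]

# Grouping/aggregation: group by gender, then average the age per group.
ages_by_gender = defaultdict(list)
for person in people:
    ages_by_gender[person["gender"]].append(person["age"])
mean_age_by_gender = {g: sum(a) / len(a) for g, a in ages_by_gender.items()}
```

Each of these features has to be chosen by hand, which is the limitation the following paragraphs address.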
The above are some of the methods currently used for feature construction. Using them requires some knowledge of the data and rich experience in feature construction, and the constructed features are not necessarily better suited to training the current model than the original features. This research therefore uses a genetic programming algorithm to construct features; once the population has finished evolving, new features are produced that are better suited than the original features to identifying stock trends quickly and accurately.
The genetic programming algorithm is a kind of evolutionary algorithm. It imitates the evolution and heredity of organisms in nature and, following the principles of "struggle for survival" and "survival of the fittest", moves step by step from an initial solution toward the optimal solution by means of operations such as reproduction, crossover and mutation. Genetic programming represents a problem in the hierarchical form of a computer program and is suitable for a wide range of complex problems. Given these properties, the algorithm is used to construct a new set of features best suited to training. Constructing features with genetic programming can improve the efficiency and accuracy with which stock trends are identified. Different features are constructed for each of the three existing trends (rising, falling and oscillating). For a newly input stock, the three groups of constructed features are applied to its historical data to obtain three groups of vectors, and the constructed features are likewise applied to the historical data of stocks with known trends to obtain the corresponding vectors. Similarity analysis is performed between the vectors of the newly input stock and the vectors computed from stocks with known trends; the newly input stock is assigned the trend of the known stock with the greatest similarity, so that the trends of other stocks are judged and the stocks are classified by trend.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a stock characteristic construction system and method based on genetic programming.
To achieve the above purpose, the technical solution of the invention is as follows:
In one aspect, a stock characteristic construction system based on genetic programming comprises a preparation module, an execution module and an application module;
the preparation module performs the preprocessing for the genetic programming algorithm: it establishes the terminal set TerminalSet and the function set FunctionSet, and sets the parameters of the genetic programming algorithm;
the execution module generates an initial population according to the set population size, and uses a grid search algorithm to tune the parameters of the genetic programming algorithm to obtain the constructed features;
the application module assigns weights to the constructed features and uses the constructed features to classify stocks according to the different trends of their historical market data.
In another aspect, a stock characteristic construction method based on genetic programming is implemented on the basis of the above stock characteristic construction system based on genetic programming, and specifically comprises the following steps:
Step 1: the preparation module performs the preprocessing for the genetic programming algorithm: it establishes the terminal set TerminalSet and the function set FunctionSet, and sets the parameters of the genetic programming algorithm;
Step 1.1: inputting the stock features into the TerminalSet; for the plurality of binary trees contained in each individual of the genetic programming algorithm, the contents of the leaf nodes are selected at random from the TerminalSet, the TerminalSet being the set of leaf node values of the binary trees;
the stock features include: max, min, open, close, avg; where min denotes the lowest price of the stock during the day, max denotes the highest price of the stock during the day, open denotes the opening price of the stock on that day, close denotes the closing price of the stock on that day, and avg denotes the average price of the stock during the day.
Step 1.2: inputting the operators +, -, ×, / into the FunctionSet, the FunctionSet being the set of internal nodes of the binary trees;
Step 1.3: setting the parameters of the genetic programming algorithm, where n_pop denotes the number of individuals in the population, n_gen denotes the number of iterations, and gene_num denotes the number of factors in each individual.
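Step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the operator list assumes that the four operators are +, -, × and /, the zero-divisor fallback in `protected_div` is an assumption commonly made in genetic programming, and the parameter values are placeholders.

```python
import operator

# TerminalSet: leaf node values, i.e. the raw daily stock features.
TERMINAL_SET = ["max", "min", "open", "close", "avg"]

def protected_div(a, b):
    """Divide a by b, returning 1.0 when b is zero (assumed convention)."""
    return a / b if b != 0 else 1.0

# FunctionSet: internal nodes of the binary trees.
FUNCTION_SET = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}

# Parameters named in step 1.3; the values are illustrative placeholders.
GP_PARAMS = {"n_pop": 50, "n_gen": 20, "gene_num": 5}
```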
Step 2: the execution module generates an initial population according to the set population size, and tunes the parameters using a grid search algorithm;
Step 2.1: combining the TerminalSet and the FunctionSet established in the preparation module, an initial population is generated from the two sets: nodes are selected from the TerminalSet and the FunctionSet, where the FunctionSet is the set of internal nodes of a binary tree and the TerminalSet is the set of leaf node values of a binary tree; factors are constructed from the two sets, and individuals are generated according to the number of factors per individual; one individual comprises a plurality of binary trees, each binary tree being one constructed factor, where OFi denotes a feature and i is the index of the feature, i > 0 and i is an integer;
Step 2.2: performing genetic operations on the individuals to generate a new population;
the genetic operations comprise crossover and mutation; crossover and mutation are applied to the constructed factors, and the new individuals obtained are placed into the next generation to produce a new population;
Step 2.3: repeating step 2.1 to step 2.2 until the maximum number of iterations is reached, then exiting the loop and executing step 2.4;
the fitness evaluation is as follows: the original stock data is converted according to the constructed factors to generate converted data; a decision tree algorithm is trained on the converted data and used to predict the stock classes, and the decision tree classification result is evaluated with the formula $\mathrm{Error} = 1 - \max P(i/t)$, where i is the number of correct classifications, t is the total number, P denotes probability, max denotes the maximum value, and Error denotes the classification error score; the closer Error is to 0, the better the classification result. The resulting score is used as the fitness score of the individual;
Step 2.4: tuning the parameters using a grid search algorithm to obtain the resulting features;
the parameters specifically include: n_pop, the number of individuals in the population; n_gen, the number of iterations; and gene_num, the number of factors per individual;
the range and step size of each variable are defined in the grid search algorithm; when the grid search algorithm is used, the parameters are tuned one by one, with only one variable changed at a time;
the resulting features comprise a set of expressions generated by applying the operations +, -, ×, / to the original data features, of the form operator(OF1, OF2), where operator is one of +, -, ×, /, OFi is a generalized representation of a feature, and i is a positive integer.
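The generation of the initial population in step 2.1 can be sketched as follows. This is one possible reading under stated assumptions: each factor is stored as a nested tuple `(operator, left, right)`, and the depth limit, the 0.3 leaf probability and the seed are hypothetical choices that the patent does not specify.

```python
import random

TERMINALS = ["max", "min", "open", "close", "avg"]  # leaf values (TerminalSet)
FUNCTIONS = ["+", "-", "*", "/"]                    # internal nodes (FunctionSet)

def random_tree(rng, depth, root=True):
    """Build one factor: a leaf name, or a tuple (operator, left, right)."""
    # Below the root, stop early with probability 0.3; always stop at depth 0.
    if depth == 0 or (not root and rng.random() < 0.3):
        return rng.choice(TERMINALS)
    return (
        rng.choice(FUNCTIONS),
        random_tree(rng, depth - 1, root=False),
        random_tree(rng, depth - 1, root=False),
    )

def random_individual(rng, gene_num, max_depth=2):
    """An individual is a list of gene_num binary trees (factors)."""
    return [random_tree(rng, max_depth) for _ in range(gene_num)]

def initial_population(n_pop, gene_num, seed=0):
    """Generate n_pop individuals with a reproducible random generator."""
    rng = random.Random(seed)
    return [random_individual(rng, gene_num) for _ in range(n_pop)]

population = initial_population(n_pop=4, gene_num=3)
```

With `max_depth` of at least 1, every factor has an operator at its root, which matches the operator(OF1, OF2) form described above.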
Step 3: the application module assigns weights to the constructed features and applies the constructed features to stock classification.
Step 3.1: the application module assigns weights to the resulting features using a decision tree algorithm:
the trends of stocks are divided into three types, rising, falling and oscillating, according to historical market data; for each type, a decision tree is trained on the historical market data of one stock of that type after the data has been converted according to the constructed features, and after training, the Gini index provided by the decision tree is taken as the weight of each constructed feature; this is repeated until features have been constructed for each of the three types, yielding the features constructed from the historical market data of stocks of each type;
during fitting, the decision tree algorithm computes for each constructed feature the corresponding Gini index, i.e., its weight, according to the formula $\mathrm{Gini} = \sum_{i=1} p_i (1 - p_i)$, where i is a positive integer and $p_i$ denotes a probability; running the genetic programming algorithm yields a group of constructed features, for example a feature that adds two original features; the original data set is converted according to the group of constructed features and the converted data is fed to the decision tree algorithm; when the decision tree is fitted, a weight is assigned to each constructed feature, and the larger the weight, the better the constructed feature; the constructed features are sorted by weight in descending order, and the top n features are taken as required for the specific application.
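The Gini formula and the ranking of constructed features by weight can be sketched as follows. The feature names and weight values are hypothetical; in the method described above the weights would come from the fitted decision tree, which this sketch does not reproduce.

```python
def gini(probabilities):
    """Gini index: sum over i of p_i * (1 - p_i)."""
    return sum(p * (1.0 - p) for p in probabilities)

def top_n_features(weights, n):
    """Sort constructed features by weight, descending, and keep the first n."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical weights for three constructed features.
weights = {"OF1": 0.12, "OF2": 0.55, "OF3": 0.33}

print(gini([0.5, 0.5]))            # 0.5
print(top_n_features(weights, 2))  # ['OF2', 'OF3']
```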
Step 3.2: the trends of stocks are divided into three types, rising, falling and oscillating, according to the changes in their historical market data, and the stocks are classified using the constructed features.
The historical market data of stocks of the three known types and of a stock of unknown type is computed according to the constructed features operator(OF1, OF2), each yielding a group of vectors; the similarity between the resulting vectors is computed using the Pearson correlation coefficient, and the closer the result is to 1, the more similar the vectors are; the type of the unknown stock is taken to be that of the known stock with the greatest similarity. The Pearson correlation coefficient is calculated as:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$
where i is a positive integer, $x_i$ is a value in the vector of the known class, $\bar{x}$ is the mean of the vector of the known class, $y_i$ is a value in the vector of the unknown class, and $\bar{y}$ is the mean of the vector of the unknown class.
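The similarity-based classification of step 3.2 can be sketched as follows, using the standard Pearson correlation coefficient. The vectors are hypothetical; in the method they would be obtained by applying each trend's constructed features to historical market data. Returning 0.0 for a constant vector is an assumption made here to avoid division by zero.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)) * math.sqrt(
        sum((b - mean_y) ** 2 for b in y)
    )
    return num / den if den else 0.0  # assumed fallback for constant vectors

def classify_trend(unknown, known):
    """Pick the trend whose known vector correlates best with the unknown one.

    Both arguments map a trend name to the vector computed with that
    trend's constructed features.
    """
    scores = {trend: pearson(unknown[trend], known[trend]) for trend in known}
    return max(scores, key=scores.get), scores

# Hypothetical vectors for one known stock per trend and one new stock.
known = {
    "rising": [1.0, 2.0, 3.0, 4.0],
    "falling": [4.0, 3.0, 2.0, 1.0],
    "oscillating": [1.0, 3.0, 1.0, 3.0],
}
unknown = {
    "rising": [2.0, 4.0, 6.0, 9.0],
    "falling": [1.0, 2.0, 3.0, 4.0],
    "oscillating": [2.0, 2.0, 3.0, 2.0],
}
label, scores = classify_trend(unknown, known)
```

Here `label` is the trend with the largest coefficient, matching the rule that the unknown stock takes the type of the most similar known stock.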
The beneficial effects of the invention are as follows:
The invention provides a stock characteristic construction system and method based on genetic programming. Purpose-designed crossover and mutation operators improve the algorithm, and a suitable number of constructed features is found for different types of models. For stock classification, different features are constructed for different trends and each feature carries a weight for its trend; the historical data of a newly input stock is computed according to the constructed features to form a vector, the similarity between this vector and the vectors of known stocks is computed, and the newly input stock is assigned the trend of the known stock with the greatest similarity. In this way the trend of the newly input stock can be judged quickly and accurately, and the stock can be classified by trend.
Drawings
FIG. 1 is a flow chart of a stock classification operation in an embodiment of the present invention;
FIG. 2 is a diagram showing an example of genetic programming individuals in the embodiment of the present invention;
FIG. 3 is a flowchart of genetic programming in an embodiment of the present invention;
FIG. 4 is a diagram illustrating examples of genetic programming individuals after fitness evaluation according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an exemplary crossover operation between individuals in an embodiment of the present invention;
FIG. 6 is a diagram illustrating an exemplary individual performing mutation operations according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating exemplary factors constructed in an embodiment of the present invention;
FIG. 8 is a diagram of an example of weighted genetic programming individuals in an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings.
In one aspect, a stock characteristic construction system based on genetic programming is disclosed. As shown in fig. 1, the stock feature construction algorithm based on genetic programming is applied at the data layer of a stock classification system and constructs stock features from the data in the server and the fed-back factor analysis reports. The system comprises a preparation module, an execution module and an application module;
the preparation module performs the preprocessing for the genetic programming algorithm: it establishes the terminal set TerminalSet and the function set FunctionSet, and sets the parameters of the genetic programming algorithm;
the execution module generates an initial population according to the set population size, and uses a grid search algorithm to tune the parameters of the genetic programming algorithm to obtain the constructed features;
the application module assigns weights to the constructed features and uses the constructed features to classify stocks according to the different trends of their historical market data.
In another aspect, a stock characteristic construction method based on genetic programming is implemented on the basis of the above stock characteristic construction system based on genetic programming, and specifically comprises the following steps:
Step 1: the preparation module performs the preprocessing for the genetic programming algorithm: it establishes the terminal set TerminalSet and the function set FunctionSet, and sets the parameters of the genetic programming algorithm;
Genetic programming is a branch of evolutionary algorithms in which each individual is a computer program, a model, a function or the like, as shown in fig. 2. It is one of the most classical optimization methods, and its aim is to construct programs automatically to solve problems independently of their domain. The main idea is to simulate the process of reproduction and heredity: a population is generated, crossover and mutation are applied to the chromosomes of individuals in the population, and, under the rule of an objective function (fitness evaluation), fitter individuals are retained and weaker ones eliminated, so that the population evolves and the optimal solution and its structure are obtained. Genetic programming is an iterative algorithm that selects the best individuals according to fitness. As shown in fig. 3, the detailed steps are as follows:
Step 1.1: inputting the stock features into the TerminalSet; for the plurality of binary trees contained in each individual of the genetic programming algorithm, the contents of the leaf nodes are selected at random from the TerminalSet, the TerminalSet being the set of leaf node values of the binary trees;
the stock features include: max, min, open, close, avg; where min denotes the lowest price of the stock during the day, max denotes the highest price of the stock during the day, open denotes the opening price of the stock on that day, close denotes the closing price of the stock on that day, and avg denotes the average price of the stock during the day.
Step 1.2: inputting the operators +, -, ×, / into the FunctionSet, the FunctionSet being the set of internal nodes of the binary trees;
Step 1.3: setting the parameters of the genetic programming algorithm, where n_pop denotes the number of individuals in the population, n_gen denotes the number of iterations, and gene_num denotes the number of factors in each individual.
Step 2: the execution module generates an initial population according to the set population size, and tunes the parameters using a grid search algorithm;
Step 2.1: combining the TerminalSet and the FunctionSet established in the preparation module, an initial population is generated from the two sets: nodes are selected from the TerminalSet and the FunctionSet, where the FunctionSet is the set of internal nodes of a binary tree and the TerminalSet is the set of leaf node values of a binary tree; factors are constructed from the two sets, and individuals are generated according to the number of factors per individual; as shown in fig. 2, an individual comprises a plurality of binary trees, each binary tree being one constructed factor, where OFi (i > 0 and i is an integer) denotes a feature and i is the index of the feature;
Step 2.2: performing genetic operations (such as crossover and mutation) on the individuals to generate a new population;
the genetic operations comprise crossover and mutation; crossover and mutation are applied to the constructed factors, and the new individuals obtained are placed into the next generation to produce a new population; the crossover process is shown in FIG. 5 and the mutation process is shown in FIG. 6.
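The text does not spell out the custom crossover and mutation operators, and the figures are not reproduced here, so the following Python sketch substitutes simple stand-ins: one-point crossover on the list of factors, and a mutation that replaces one factor with a fresh depth-1 tree. Both operators, the tuple encoding and the example parents are assumptions for illustration only.

```python
import random

TERMINALS = ["max", "min", "open", "close", "avg"]
FUNCTIONS = ["+", "-", "*", "/"]

def crossover(parent_a, parent_b, rng):
    """One-point crossover on the factor lists (needs at least 2 factors)."""
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def mutate(individual, rng):
    """Return a copy with one randomly chosen factor replaced by a new tree."""
    child = list(individual)
    idx = rng.randrange(len(child))
    child[idx] = (rng.choice(FUNCTIONS), rng.choice(TERMINALS), rng.choice(TERMINALS))
    return child

rng = random.Random(42)
parent_a = [("+", "max", "min"), ("-", "open", "close"), ("*", "avg", "min")]
parent_b = [("/", "close", "open"), ("+", "avg", "max"), ("-", "max", "min")]
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, rng)
```

The children and the mutant would then be placed into the next generation, as described in step 2.2.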
Step 2.3: repeating step 2.1 to step 2.2 until the maximum number of iterations is reached, then exiting the loop and executing step 2.4;
the fitness evaluation is as follows: the original stock data is converted according to the constructed factors to generate converted data; a decision tree algorithm is trained on the converted data and used to predict the stock classes, and the decision tree classification result is evaluated with the formula $\mathrm{Error} = 1 - \max P(i/t)$, where i is the number of correct classifications, t is the total number, P denotes probability, max denotes the maximum value, and Error denotes the classification error score; the closer Error is to 0, the better the classification result. The resulting score is used as the fitness score of the individual. As shown in fig. 4, the fitness score of the current individual appears in the upper left corner of the figure.
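The fitness score can be illustrated with a short sketch that computes the error as one minus the fraction of correct classifications, following the definitions of i and t given above. The label lists are hypothetical; in the method the predictions would come from the decision tree trained on the converted data, which is not reproduced here.

```python
def classification_error(y_true, y_pred):
    """Error = 1 - i / t, with i correct predictions out of t samples."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return 1.0 - correct / len(y_true)

# Hypothetical true trends and decision tree predictions.
y_true = ["rising", "falling", "oscillating", "rising"]
y_pred = ["rising", "falling", "rising", "rising"]

fitness = classification_error(y_true, y_pred)
print(fitness)  # 0.25
```

A lower value means a better individual, so selection would favor individuals whose constructed factors give an error closer to 0.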
Step 2.4: tuning the parameters using a grid search algorithm to obtain the resulting features;
the genetic-programming-based feature construction method is applied to the scenario of classifying stocks by trend. The feature construction method provided by the invention tunes parameters using grid search and adjusts the components of genetic programming so that the constructed features better match the trend of the corresponding stocks.
The parameters specifically include: n_pop, the number of individuals in the population; n_gen, the number of iterations; and gene_num, the number of factors per individual;
the range and step size of each variable are defined in the grid search algorithm; when the grid search algorithm is used, the parameters are tuned one by one, with only one variable changed at a time; tuning all the parameters together would be very slow, and tuning them simultaneously would give no insight into how the resulting parameter combination was arrived at, so the variables are determined one at a time, in a manner similar to the control-variable method: only one variable is changed at a time, and the final parameter combination is obtained in this way. The genetic programming algorithm contains many variables, such as the number of iterations of the algorithm and the number of expressions contained in a group of result features; tuning these parameters helps to ensure the quality of the generated result features;
the resulting features comprise a set of expressions generated by applying the operations +, -, ×, / to the original data features, as shown in fig. 2, where OFi is a generalized representation of the feature name and i is a positive integer.
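The one-variable-at-a-time tuning described in step 2.4 can be sketched as follows. The candidate grids, the starting values and the `toy_error` function are hypothetical stand-ins; in the method the score would be the fitness obtained by actually running genetic programming with the given parameters.

```python
def one_at_a_time_search(score, grid, start):
    """Tune parameters one by one, keeping the others fixed (lower is better)."""
    best = dict(start)
    for name, candidates in grid.items():
        best_value, best_score = best[name], score(best)
        for value in candidates:
            trial = dict(best)
            trial[name] = value  # change only this one variable
            trial_score = score(trial)
            if trial_score < best_score:
                best_value, best_score = value, trial_score
        best[name] = best_value
    return best

def toy_error(params):
    """Hypothetical error surface standing in for a real GP run."""
    return (
        abs(params["n_pop"] - 40) / 100
        + abs(params["n_gen"] - 30) / 100
        + abs(params["gene_num"] - 4) / 10
    )

grid = {"n_pop": [20, 40, 60], "n_gen": [10, 20, 30], "gene_num": [2, 4, 6]}
start = {"n_pop": 20, "n_gen": 10, "gene_num": 2}
best_params = one_at_a_time_search(toy_error, grid, start)
```

This evaluates roughly the sum of the grid sizes rather than their product, which is the speed advantage the text attributes to changing one variable at a time. It can miss the best combination when parameters interact.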
Step 3: the application module assigns weights to the constructed features and applies the constructed features to the stock classification algorithm.
Step 3.1: after the genetic programming algorithm in the execution module has been executed, a group of constructed features is generated, as shown in fig. 2, where OFi is a generalized representation of a feature and i is a positive integer. The application module assigns weights to the resulting features using a decision tree algorithm:
the constructed features obtained by the method of the invention may not all be needed when judging stock trends, so a decision tree algorithm is used to assign weights to the features. The trends of stocks are divided into three types, rising, falling and oscillating, according to historical market data; for each type, a decision tree is trained on the historical market data of one stock of that type after the data has been converted according to the constructed features, and after training, the Gini index provided by the decision tree is taken as the weight of each constructed feature; this is repeated until features have been constructed for each of the three types, yielding the features constructed from the historical market data of stocks of each type, so that the trend of a newly input stock can be judged quickly and accurately from the constructed features using stocks with known trends. To classify stocks by trend, when the trend of a newly input stock is judged, its historical data is computed according to the features constructed for each of the different trends to obtain vectors, and its trend is determined by computing the similarity between these vectors and those of stocks with known trends.
In the process of converting data according to the resulting individuals, as shown in fig. 7, the two feature values max (the highest price of the stock during the day) and min (the lowest price of the stock during the day) in the original data are converted according to the constructed factor; add (addition) in the figure indicates that the two features are combined into a new feature column.
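The conversion of raw data by a constructed factor such as add(max, min) can be sketched as follows. The tuple encoding of the factor, the zero-divisor fallback and the price rows are assumptions made for illustration.

```python
def evaluate(tree, row):
    """Evaluate a factor on one row of raw data.

    A factor is a feature name, or a tuple (operator, left, right).
    """
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    a, b = evaluate(left, row), evaluate(right, row)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b if b != 0 else 1.0  # assumed protected division
    raise ValueError(f"unknown operator: {op}")

# Hypothetical daily rows with the five raw features.
rows = [
    {"max": 10.5, "min": 9.5, "open": 9.8, "close": 10.2, "avg": 10.0},
    {"max": 11.0, "min": 10.0, "open": 10.2, "close": 10.8, "avg": 10.5},
]

factor = ("+", "max", "min")  # add(max, min)
new_column = [evaluate(factor, row) for row in rows]
print(new_column)  # [20.0, 21.0]
```

Applying every factor of an individual in this way yields one new column per factor, and those columns form the converted data passed to the decision tree.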
During fitting of the decision tree algorithm, the corresponding Gini index, i.e. the weight, is calculated for each constructed feature according to the formula $Gini = \sum_{i=1} p_i (1 - p_i)$, where i is a positive integer and $p_i$ represents a probability; after the run, each individual is as shown in fig. 8. Executing the genetic programming algorithm constructs a group of constructed features; fig. 7 shows one constructed feature, which performs an addition operation on the original features. The original data set is converted according to the group of constructed features, and the converted data is applied to the decision tree algorithm. When the decision tree is fitted, a weight is assigned to each constructed feature; the larger the weight, the better the constructed feature. The constructed features are sorted by weight from large to small, and the top n features (n is a positive integer) can be selected as required for the specific application.
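The Gini formula and the weight-based selection above can be illustrated with a short sketch. It assumes that p_i is the share of samples belonging to class i, and the feature names and weights passed to `top_n` are hypothetical values chosen for the example; how the decision tree attributes the index to each constructed feature is not detailed in the text and is not modelled here.

```python
from collections import Counter
from typing import Dict, List, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini = sum_i p_i * (1 - p_i), with p_i taken as the share of class i."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / total) * (1 - c / total) for c in counts.values())


def top_n(weights: Dict[str, float], n: int) -> List[str]:
    """Sort feature names by weight from large to small and keep the first n."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Hypothetical weights for three constructed features.
example_weights = {"OF1": 0.2, "OF2": 0.7, "OF3": 0.5}
selected = top_n(example_weights, 2)
```

Libraries such as scikit-learn expose an impurity-based importance per input column through the `feature_importances_` attribute of a fitted `DecisionTreeClassifier`, which could play the role of the weight here; whether the original system used such a library is not stated.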
As can be seen from fig. 7, after fitting directly with the decision tree, each constructed feature has a score that serves as the weight of that feature for the current trend; that is, the score identifies the importance of the feature to the current trend. After the features are constructed by genetic programming, the factors constructed for the current trend need to be recorded so that they can be used to judge the trend of a newly input stock.
Step 3.2: the trend of a stock is divided into three types, rising, falling and oscillation, according to the change of its historical market data, and stocks are classified by using the constructed features.
In the embodiment of the invention, specific crossover and mutation operators are designed to improve the algorithm, and a suitable number of constructed features is found for different types of models. For the classification of stocks, different features are constructed for different trends and each feature has a weight for its trend, so the historical data of a newly input stock can be evaluated with the constructed features to form a vector. The similarity between this vector and the vectors of stocks with known trends is calculated, and the newly input stock is assigned the trend of the known stock with the greatest similarity. In this way the trend of a newly input stock can be judged quickly and accurately, and stocks can be classified according to trend.
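The similarity-based judgment can be sketched as below, using the Pearson correlation coefficient that the claims name as the similarity measure. The function names, the trend labels and the sample vectors are assumptions for illustration; returning 0.0 for a constant vector is a choice made in this sketch, since the coefficient is undefined in that case.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (dev_x * dev_y)


def classify(unknown: Sequence[float], known: Dict[str, Sequence[float]]) -> str:
    """Return the trend label whose known vector is most similar to the unknown one."""
    return max(known, key=lambda label: pearson(unknown, known[label]))


# Made-up feature vectors for one known stock of each trend.
known_vectors = {
    "rising": [1.0, 2.0, 3.0, 4.0],
    "falling": [4.0, 3.0, 2.0, 1.0],
    "oscillation": [1.0, 3.0, 1.0, 3.0],
}
trend = classify([2.0, 4.0, 6.0, 9.0], known_vectors)
```

With these sample vectors the unknown stock is most strongly correlated with the rising example, so `trend` is `"rising"`.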
When the system constructs features, the feature set is passed to the model; the generated factors are analyzed when the next part of the workflow is executed, the result is fed back, and the components and parameters of the algorithm are adjusted according to the feedback. This improves the accuracy of classifying stocks by trend, supports better analysis of stocks, and better guides users in buying and selling stocks.
Factors in the original data set include date (stock data collection date), return (stock rate of return), open (opening price), min (minimum price), max (maximum price), close (closing price), volume (trading volume) and the like. The constructed features are used in a targeted manner: the 20 constructed factors with the highest scores are extracted, the original data set is converted with them, and the converted data is then applied to stock trend judgment so that stocks are classified according to their different trends.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit of the invention, which is defined by the claims.

Claims (5)

1. A stock characteristic construction system based on genetic programming is characterized in that: the system comprises a preparation module, an execution module and an application module;
the preparation module preprocesses the genetic programming algorithm, establishes a terminal set TerminalSet and a function set FunctionSet, and sets parameters of the genetic programming algorithm;
the execution module generates an initial community, generates a population according to the set population size, and uses a grid search algorithm to adjust parameters of a genetic programming algorithm to construct the obtained characteristics;
and the application module performs weight distribution on the constructed features and classifies the stocks according to different change trends of the historical market data by using the constructed features.
2. A stock characteristic construction method based on genetic programming, which is realized based on the stock characteristic construction system based on genetic programming in claim 1, is characterized by comprising the following steps:
step 1: the preparation module preprocesses the genetic programming algorithm, establishes a terminal set TerminalSet and a function set FunctionSet, and sets parameters of the genetic programming algorithm;
step 2: the execution module generates an initial community, generates a population according to the set population size, and adjusts parameters by using a grid search algorithm to obtain result features;
step 3: the application module assigns weights to the constructed features and applies the constructed features to stock classification.
3. The method for constructing stock features based on genetic programming as claimed in claim 2, wherein the step 1 comprises the following steps:
step 1.1: inputting stock characteristics into a TerminalSet, wherein for a plurality of binary trees contained in each individual in the genetic programming algorithm, leaf node contents are randomly selected from the TerminalSet, and the TerminalSet is a set of leaf node values of the binary trees;
the stock characteristics include: max, min, open, close, avg; wherein min represents the minimum value of the stock price in one day, max represents the maximum value of the stock price in one day, open represents the opening price of the stock in the day, close represents the closing price of the stock in one day, and avg represents the average value of the stock price in one day;
step 1.2: inputting the operators +, -, ×, ÷ into the FunctionSet, which is the set of internal nodes of the binary tree;
step 1.3: setting parameters of the genetic programming algorithm, wherein n_pop represents the number of individuals in the population; n_gen represents the number of iterations; gene_num represents the number of factors in each individual.
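As an illustration of steps 1.1 to 1.3, the sketch below builds random binary-tree factors from a terminal set and a function set. The tuple representation of a tree, the function names, the fixed tree depth and the protected division (returning 1.0 when the divisor is zero, a common genetic programming convention) are assumptions of this example and are not specified in the claim.

```python
import operator
import random
from typing import Callable, Dict, List, Tuple, Union


def protected_div(a: float, b: float) -> float:
    """Division that returns 1.0 for a zero divisor (assumed convention)."""
    return a / b if b != 0 else 1.0


TERMINAL_SET: List[str] = ["max", "min", "open", "close", "avg"]
FUNCTION_SET: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}

# A tree is either a terminal name or (operator, left subtree, right subtree).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def random_tree(rng: random.Random, depth: int) -> Tree:
    """Pick leaves from TERMINAL_SET and internal nodes from FUNCTION_SET."""
    if depth == 0:
        return rng.choice(TERMINAL_SET)
    op = rng.choice(sorted(FUNCTION_SET))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree: Tree, row: Dict[str, float]) -> float:
    """Evaluate one factor on one day's record."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return FUNCTION_SET[op](evaluate(left, row), evaluate(right, row))


def random_individual(rng: random.Random, gene_num: int, depth: int = 1) -> List[Tree]:
    """An individual holds gene_num factors, each a binary tree."""
    return [random_tree(rng, depth) for _ in range(gene_num)]


individual = random_individual(random.Random(0), gene_num=3)
```

A population of n_pop such individuals would then be produced by calling `random_individual` n_pop times.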
4. The method for constructing stock features based on genetic programming as claimed in claim 2, wherein the step 2 comprises the following steps:
step 2.1: combining the TerminalSet and the FunctionSet established in the preparation module and generating an initial community from the two sets: elements are selected from the TerminalSet and the FunctionSet, wherein the FunctionSet is the set of internal nodes of the binary tree and the TerminalSet is the set of leaf node values of the binary tree; factors are constructed from the two sets, and individuals are generated according to the number of factors per individual; one individual comprises a plurality of binary trees, each binary tree is one constructed factor, and a factor is a formula of the form operator(OF1, OF2), wherein operator is one of +, -, ×, ÷, OFi is a generalized representation of a feature, and i is a positive integer;
step 2.2: carrying out genetic operation on the individuals to generate a new community;
the genetic operation comprises crossover and mutation algorithms; crossover and mutation are performed on the constructed factors, and the new individuals obtained through crossover and mutation are placed into a new generation of the population to generate a new community;
step 2.3: repeating the step 2.1 to the step 2.2 until the maximum iteration times is reached, jumping out of the loop, and executing the step 2.4;
the fitness evaluation is as follows: converting the original stock data according to the constructed factors to generate converted data; classifying the stocks by using a decision tree algorithm, training and predicting on the converted data with the decision tree algorithm, and evaluating the decision tree classification result with the evaluation formula Error = 1 - max P(i/t), where i is the number of correct classifications, t is the total number, P represents the probability, max represents the maximum value, and Error represents the classification error score; the closer Error is to 0, the better the classification result; the obtained score is taken as the fitness score of the individual;
step 2.4: adjusting parameters by using a grid search algorithm to obtain result characteristics;
the parameters specifically include: n_pop represents the number of individuals in the population; n_gen represents the number of iterations; gene_num represents the number of factors per individual;
defining the range and the variable span of each variable in the grid search algorithm, and adjusting each parameter one by one when the grid search algorithm is used, wherein only one variable is changed each time;
the result features comprise a group of formulas generated by applying the +, -, ×, ÷ operations to the features of the original data, wherein the original data comprise open (stock opening price), close (stock closing price), max (stock maximum price), min (stock minimum price) and volume (trading volume); each formula has the form operator(OF1, OF2), wherein operator is one of +, -, ×, ÷, OFi is a generalized representation of a feature, and i is a positive integer.
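The parameter adjustment of step 2.4, in which only one variable is changed at a time, can be sketched as follows. The function name, the candidate ranges and the toy scoring function are assumptions for illustration; in practice the score would be the classification error obtained by running the genetic programming algorithm with the candidate parameters.

```python
from typing import Callable, Dict, List

Params = Dict[str, int]


def one_at_a_time_search(start: Params, grid: Dict[str, List[int]],
                         score: Callable[[Params], float]) -> Params:
    """Tune each parameter in turn, changing only one variable per trial.

    A lower score is treated as better, matching an error-style fitness.
    """
    best = dict(start)
    best_score = score(best)
    for name, values in grid.items():
        for value in values:
            candidate = {**best, name: value}
            candidate_score = score(candidate)
            if candidate_score < best_score:
                best, best_score = candidate, candidate_score
    return best


def toy_score(p: Params) -> float:
    """Hypothetical stand-in for the error of a full genetic programming run."""
    return abs(p["n_pop"] - 100) + abs(p["n_gen"] - 30) + abs(p["gene_num"] - 5)


# Assumed ranges and spans for the three parameters.
search_grid = {
    "n_pop": [50, 100, 150],
    "n_gen": [10, 30, 50],
    "gene_num": [3, 5, 7],
}
tuned = one_at_a_time_search({"n_pop": 50, "n_gen": 10, "gene_num": 3},
                             search_grid, toy_score)
```

Changing one variable at a time needs far fewer trials than evaluating every combination, at the cost of possibly missing interactions between parameters.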
5. The method for constructing stock characteristics based on genetic programming as claimed in claim 2, wherein the step 3 comprises the following steps:
step 3.1: the application module performs weight distribution on the result characteristics by using a decision tree algorithm:
dividing the change trend of stocks into three types, rising, falling and oscillation, according to historical market data; for each type, converting the historical market data of one stock of that type according to the constructed features and training a decision tree on it, and after training taking the Gini index provided by the decision tree as the weight of each constructed feature; repeating this until features have been constructed for every type, so that each type has features constructed from the historical market data of stocks of that type;
during fitting, the decision tree algorithm calculates for each constructed feature the corresponding Gini index, i.e. the weight, according to the formula $Gini = \sum_{i=1} p_i (1 - p_i)$, wherein i is a positive integer and $p_i$ represents a probability; executing the genetic programming algorithm yields a group of constructed features, a constructed feature performing an addition operation on the original features; the original data set is converted according to the group of constructed features and the converted data is applied to the decision tree algorithm; when the decision tree is fitted, a weight is assigned to each constructed feature, and the larger the weight, the better the constructed feature; the constructed features are sorted by weight from large to small, and the top n features are selected as required for the specific application, wherein n is a positive integer;
step 3.2: dividing the trend of the stock into three types of rising, falling and oscillation according to the change of the historical quotation data of the stock, and classifying the stock by using the construction characteristics;
calculating, according to the constructed features operator(OF1, OF2), the historical market data of the three known types of stocks and of the stock of unknown type, to obtain a group of vectors for each; calculating the similarity of the obtained vectors by using the Pearson correlation coefficient, where the closer the result is to 1, the more similar the vectors are, and the type of the unknown stock is taken to be the type of the known stock with the greatest similarity; the Pearson correlation coefficient is calculated as follows:
$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2} \sqrt{\sum_{i} (y_i - \bar{y})^2}}$
wherein i is a positive integer, $x_i$ represents a value in the vector of the known category, $\bar{x}$ represents the mean of the vector of the known category, $y_i$ represents a value in the vector of the unknown category, and $\bar{y}$ represents the mean of the vector of the unknown category.
CN202211546813.1A 2022-12-05 2022-12-05 Stock characteristic construction system and method based on genetic programming Pending CN115936888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546813.1A CN115936888A (en) 2022-12-05 2022-12-05 Stock characteristic construction system and method based on genetic programming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211546813.1A CN115936888A (en) 2022-12-05 2022-12-05 Stock characteristic construction system and method based on genetic programming

Publications (1)

Publication Number Publication Date
CN115936888A true CN115936888A (en) 2023-04-07

Family

ID=86550061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211546813.1A Pending CN115936888A (en) 2022-12-05 2022-12-05 Stock characteristic construction system and method based on genetic programming

Country Status (1)

Country Link
CN (1) CN115936888A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination