CN113723728A - Factor checking method and system - Google Patents


Info

Publication number
CN113723728A
Authority
CN
China
Prior art keywords
factor
parameters
machine learning
learning model
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010456034.7A
Other languages
Chinese (zh)
Inventor
刘宇博
徐林
路宏琦
李敖
景越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zetyun Tech Co ltd
Original Assignee
Beijing Zetyun Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zetyun Tech Co ltd filed Critical Beijing Zetyun Tech Co ltd
Priority to CN202010456034.7A priority Critical patent/CN113723728A/en
Publication of CN113723728A publication Critical patent/CN113723728A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a factor checking method and a factor checking system. The method comprises the following steps: displaying a user interface, and receiving a first configuration operation on the user interface; setting parameters based on the first configuration operation; generating a machine learning model of the factor according to the set parameters; and obtaining a test result of the factor based on the machine learning model. In the embodiment of the invention, the factor checking system uses the machine learning model to verify the factors, so that the checking accuracy in factor checking is improved.

Description

Factor checking method and system
Technical Field
The invention relates to the technical field of financial and big data analysis, in particular to a factor checking method and a factor checking system.
Background
Pricing and price trend prediction of financial assets have long been the core problem of research in the investment field. Securities, as the principal financial assets for investment, have long been an important subject of related research and practice. The basic principle of factor-based quantitative investment is to select a series of factors that explain asset returns as explanatory variables and build a model from them. The basic principle of factor-based stock selection is to adopt one or more factors as the stock selection criterion: stocks that satisfy the factors are bought, and stocks that do not are sold.
Currently, conventional factor models test the validity of a factor using the Information Coefficient (IC) between the cross-sectional sequence of factor values and the cross-sectional sequence of excess returns; a factor is generally considered valid when the mean absolute value of its IC sequence exceeds a certain threshold.
However, a simple IC cannot reflect all the information in the data. For example, although pooling the data into one overall regression yields the overall historical pattern of the indicator, the market does not maintain a single style for long, so the overall IC ignores a great deal of important information.
Disclosure of Invention
The embodiment of the invention provides a factor checking method and a factor checking system, which solve the problem of poor checking accuracy when the existing factor model carries out factor checking.
In order to solve the above technical problem, the present invention provides a factor checking method, including:
displaying a user interface, and receiving a first configuration operation on the user interface;
setting parameters based on the first configuration operation;
generating a machine learning model of the factor according to the set parameters;
and obtaining a test result of the factor based on the machine learning model.
Optionally, in the foregoing method, the step of setting parameters based on the first configuration operation includes:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
Optionally, in the foregoing method, the step of generating the machine learning model of the factor according to the set parameter includes:
generating an AI quantitative strategy workflow based on the set parameters;
and running the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor.
Optionally, in the method, the step of running the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor includes:
taking the factor data as input data of a machine learning model;
and performing, by the AI quantitative strategy workflow, machine learning model training on the input data according to the model parameters.
Optionally, in the above method, before the step of using the factor data as input data of the machine learning model, the method further includes:
performing data preprocessing on the factor data, the data preprocessing including at least one of:
sampling, missing value processing, standardization, normalization, data set splitting, data type conversion, numerical value coding, feature binarization and feature deletion;
the step of using the factor data as input data for a machine learning model comprises:
using the factor data after data preprocessing as input data of a machine learning model.
Optionally, in the foregoing method, the step of performing machine learning model training on the input data by the AI quantitative strategy workflow according to the model parameters includes:
performing hyperparameter tuning on the machine learning model.
Optionally, in the above method, the step of obtaining a test result of the factor based on the machine learning model includes:
evaluating the machine learning model to obtain an optimal model and an importance result for each feature of the optimal model;
sorting the importance results of the features;
and taking the factor corresponding to each feature whose rank in the sorted order is smaller than a first threshold as an effective factor.
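The ranking step above can be illustrated with a minimal Python sketch. It assumes the feature importances are already available as a name-to-score mapping; the function name, the sample factor names and the threshold value are hypothetical and do not come from the source.

```python
def select_effective_factors(importances, first_threshold):
    """Sort factors by importance (descending) and keep those whose
    1-based rank is smaller than first_threshold (assumed convention)."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for rank, (name, _) in enumerate(ranked, start=1)
            if rank < first_threshold]


# Hypothetical importance scores taken from some already-evaluated model.
importances = {"pe_ratio": 0.12, "roa": 0.31, "macd": 0.05, "leverage": 0.22}
effective = select_effective_factors(importances, first_threshold=3)
print(effective)
```

With a first threshold of 3, only the factors ranked first and second are kept.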
Optionally, the method further includes:
and verifying the effective factors to obtain a verification result.
Optionally, in the foregoing method, the verification result includes stability, and the step of verifying the effective factor includes:
calculating median, mean and standard deviation of the effective factors;
judging the distribution form of the effective factors according to the median, the mean and the standard deviation;
and determining the stability of the effective factor according to the distribution form.
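The source does not state how the distribution form is judged from the three statistics. One possible reading, sketched below as an assumption, uses Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation; the tolerance of 0.5 and the function name are hypothetical.

```python
import statistics


def distribution_shape(values, skew_tolerance=0.5):
    """Classify the distribution of factor values from mean, median and
    standard deviation, using Pearson's second skewness coefficient."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)
    if std == 0:
        return "constant"
    skew = 3 * (mean - median) / std
    if skew > skew_tolerance:
        return "right-skewed"
    if skew < -skew_tolerance:
        return "left-skewed"
    return "approximately symmetric"


print(distribution_shape([1, 2, 3, 4, 5]))
print(distribution_shape([1, 1, 1, 2, 10]))
```

A factor whose values stay approximately symmetric across periods could then be treated as more stable than one whose shape changes, although that mapping is also an assumption.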
Optionally, in the foregoing method, the verification result includes an industry coverage, and the step of verifying the effective factor includes:
calculating the coverage rate of the effective factor;
and determining the industry coverage of the factor according to the coverage of the effective factor.
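The source does not define the coverage rate. The sketch below assumes it is the share of targets in each industry that have a non-missing factor value; the record layout and industry names are hypothetical.

```python
from collections import defaultdict


def coverage_by_industry(records):
    """records: iterable of (industry, factor_value), where a missing
    factor value is None. Returns the covered share per industry."""
    total = defaultdict(int)
    covered = defaultdict(int)
    for industry, value in records:
        total[industry] += 1
        if value is not None:
            covered[industry] += 1
    return {industry: covered[industry] / total[industry] for industry in total}


records = [("bank", 1.2), ("bank", None), ("tech", 0.3), ("tech", 0.8)]
coverage = coverage_by_industry(records)
print(coverage)
```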
Optionally, in the foregoing method, the verification result includes a significance test result, and the step of verifying the effective factor includes:
calculating a regression result of the effective factor by using a two-pass cross-sectional regression test method;
and performing a hypothesis test on the regression result to obtain a significance test result of the effective factor.
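The regression and hypothesis-test steps above are not detailed in the source. One common way to combine them, shown here purely as an assumption, is to fit an ordinary least squares slope of returns on factor values within each period and then compute a t-statistic on the mean of those slopes. All names and sample data are hypothetical.

```python
import math
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x for one cross-section."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx  # undefined if all factor values are identical


def slope_t_statistic(periods):
    """periods: list of (factor_values, returns) pairs, one per period.
    Returns the per-period slopes and the t-statistic of their mean."""
    slopes = [ols_slope(f, r) for f, r in periods]
    std_error = statistics.stdev(slopes) / math.sqrt(len(slopes))
    return slopes, statistics.mean(slopes) / std_error


periods = [
    ([1, 2, 3], [0.01, 0.02, 0.03]),
    ([1, 2, 3], [0.00, 0.02, 0.04]),
    ([1, 2, 3], [0.02, 0.03, 0.07]),
]
slopes, t_stat = slope_t_statistic(periods)
print(slopes, t_stat)
```

A large absolute t-statistic would be read as evidence that the factor's return premium differs from zero; the critical value to compare against depends on the number of periods and the chosen significance level.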
Optionally, in the foregoing method, the verification result includes a correlation, and the step of verifying the effective factor includes:
calculating information coefficients of the effective factors;
and judging the correlation between the effective factor and the return according to the information coefficient.
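As an illustration of the information coefficient, the sketch below computes a rank IC (Spearman correlation between factor values and next-period returns) for each period, then the mean absolute IC mentioned in the Background. Using rank correlation rather than Pearson correlation is an assumption, and the sample data are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant inputs


def rank_ic(factor_values, next_returns):
    return pearson(average_ranks(factor_values), average_ranks(next_returns))


factor = [0.1, 0.4, 0.2, 0.9]
ics = [rank_ic(factor, [0.01, 0.03, 0.02, 0.05]),
       rank_ic(factor, [0.05, 0.03, 0.02, 0.01])]
mean_abs_ic = sum(abs(ic) for ic in ics) / len(ics)
print(ics, mean_abs_ic)
```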
Optionally, in the foregoing method, the verification result includes monotonicity, and the step of verifying the effective factor includes:
grading the selected targets into tiers according to the factor values of the effective factor;
calculating the portfolio return of the targets in each tier;
and judging the monotonicity of the factor's prediction results according to the portfolio returns.
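A minimal sketch of the grading step follows. It assumes equal-sized tiers, equal-weighted returns within each tier, and a simple non-strict monotonicity check; none of these choices is specified in the source, and the sample data are hypothetical.

```python
def tier_returns(factor_values, returns, n_tiers):
    """Sort targets by factor value, cut them into n_tiers groups and
    return the equal-weighted mean return of each group (lowest first).
    Assumes there are at least n_tiers targets."""
    order = sorted(range(len(factor_values)), key=lambda i: factor_values[i])
    size = len(order) / n_tiers
    result = []
    for t in range(n_tiers):
        members = order[round(t * size):round((t + 1) * size)]
        result.append(sum(returns[i] for i in members) / len(members))
    return result


def is_monotonic(sequence):
    pairs = list(zip(sequence, sequence[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


factor_values = [1, 2, 3, 4, 5, 6]
returns = [0.01, 0.00, 0.02, 0.03, 0.05, 0.04]
tiers = tier_returns(factor_values, returns, n_tiers=3)
print(tiers, is_monotonic(tiers))
```

Here the tier returns rise from the lowest-factor tier to the highest, so the factor would be judged monotonic under this sketch.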
Optionally, in the foregoing method, the verification result includes effectiveness and directionality, and the step of verifying the effective factor includes:
sorting the selected targets according to the factor values of the effective factor;
selecting the targets whose factor-value ranking is less than a second threshold as a long (bullish) target portfolio;
selecting the targets whose factor-value ranking is greater than a third threshold as a short (bearish) target portfolio;
respectively calculating the returns of the long target portfolio and the short target portfolio;
and judging the effectiveness and the directionality of the effective factor based on the two calculated sets of returns.
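The sketch below reads the two combinations above as a long (bullish) portfolio and a short (bearish) portfolio, which is an interpretation of the translated text. It further assumes that targets are ranked by descending factor value and that portfolio returns are equal-weighted; the threshold values and sample data are hypothetical.

```python
def long_short_returns(factor_values, returns, second_threshold, third_threshold):
    """Rank targets by descending factor value (rank 1 = highest).
    Ranks below second_threshold form the long portfolio and ranks above
    third_threshold form the short portfolio. Returns both mean returns."""
    order = sorted(range(len(factor_values)),
                   key=lambda i: factor_values[i], reverse=True)
    long_ids = [i for rank, i in enumerate(order, start=1)
                if rank < second_threshold]
    short_ids = [i for rank, i in enumerate(order, start=1)
                 if rank > third_threshold]
    long_return = sum(returns[i] for i in long_ids) / len(long_ids)
    short_return = sum(returns[i] for i in short_ids) / len(short_ids)
    return long_return, short_return


factor_values = [6, 5, 4, 3, 2, 1]
returns = [0.05, 0.04, 0.02, 0.01, -0.01, -0.02]
long_return, short_return = long_short_returns(
    factor_values, returns, second_threshold=3, third_threshold=4)
spread = long_return - short_return
print(long_return, short_return, spread)
```

Under this sketch, a clearly positive spread suggests the factor is effective with a positive direction, while a clearly negative spread suggests a negative direction.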
The present invention also provides a factor verification system, comprising:
a receiving module, configured to display a user interface and receive a first configuration operation on the user interface;
the parameter setting module is used for setting parameters based on the first configuration operation;
the generating module is used for generating a machine learning model of the factor according to the set parameters;
a checking module, configured to obtain a test result of the factor based on the machine learning model.
Optionally, in the factor checking system, the parameter setting module is specifically configured to:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
Optionally, in the factor checking system, the generating module includes:
the generating unit is used for generating an AI quantitative strategy workflow based on the set parameters;
and the operation unit is used for running the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor.
Optionally, in the factor checking system, the operation unit is specifically configured to:
taking the factor data as input data of a machine learning model;
and the AI quantitative strategy workflow performs machine learning model training on the input data according to the model parameters.
Optionally, in the factor checking system, the generating module further includes:
the preprocessing unit is used for preprocessing the factor data;
the data pre-processing includes at least one of: sampling, missing value processing, standardization, normalization, data set splitting, data type conversion, numerical value coding, feature binarization and feature deletion;
the operation unit is specifically configured to use the factor data after data preprocessing as input data of a machine learning model.
Optionally, in the factor checking system, the generating module further includes:
a tuning unit, used for performing hyperparameter tuning on the machine learning model.
Optionally, in the factor checking system, the checking module is specifically configured to:
evaluating the machine learning model to obtain an optimal model and an importance result for each feature of the optimal model;
sorting the importance results of the features;
and taking the factor corresponding to each feature whose rank in the sorted order is smaller than a first threshold as an effective factor.
Optionally, in the factor checking system, the factor checking system further includes:
and the verification module is used for verifying the effective factors to obtain a verification result.
Optionally, in the factor checking system, the verification result includes stability, and the verification module is specifically configured to:
calculating median, mean and standard deviation of the effective factors;
judging the distribution form of the effective factors according to the median, the mean and the standard deviation;
and determining the stability of the effective factor according to the distribution form.
Optionally, in the factor checking system, the verification result includes industry coverage, and the verification module is specifically configured to:
calculating the coverage rate of the effective factor;
and determining the industry coverage of the factor according to the coverage rate of the effective factor.
Optionally, in the factor checking system, the verification result includes a significance test result, and the verification module is specifically configured to:
calculating a regression result of the effective factor by using a two-pass cross-sectional regression test method;
and performing a hypothesis test on the regression result to obtain a significance test result of the effective factor.
Optionally, in the factor checking system, the verification result includes a correlation, and the verification module is specifically configured to:
calculating information coefficients of the effective factors;
and judging the correlation between the effective factor and the return according to the information coefficient.
Optionally, in the factor checking system, the verification result includes monotonicity, and the verification module is specifically configured to:
grading the selected targets into tiers according to the factor values of the effective factor;
calculating the portfolio return of the targets in each tier;
and judging the monotonicity of the factor's prediction results according to the portfolio returns.
Optionally, in the factor checking system, the verification result includes effectiveness and directionality, and the verification module is specifically configured to:
sorting the selected targets according to the factor values of the effective factor;
selecting the targets whose factor-value ranking is less than a second threshold as a long (bullish) target portfolio;
selecting the targets whose factor-value ranking is greater than a third threshold as a short (bearish) target portfolio;
respectively calculating the returns of the long target portfolio and the short target portfolio;
and judging the effectiveness and the directionality of the effective factor based on the two calculated sets of returns.
The invention also provides a factor checking system, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the steps of the factor checking method when being executed by the processor.
The invention also provides a computer-readable storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned factor checking method.
The technical scheme of the invention has the following beneficial effects:
in the embodiment of the invention, the factor checking system uses the machine learning model to verify the factor, so that the checking accuracy and convenience in factor checking are improved; in addition, in the embodiment of the invention, the factors are also tested through various models and multiple dimensions, and the effectiveness and the significance of the factors are evaluated through the test results of different models, so that the uncertainty of single factor test is solved, and the test accuracy of the factor test is further improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a factor checking method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of an AI quantitative strategy workflow for factor verification provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user interface for parameter setting of the factor verification method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another user interface for parameter setting for a factor verification method provided by an embodiment of the invention;
FIG. 5 is a diagram illustrating the test results of the factor test method according to an embodiment of the present invention;
FIG. 6 is a flow chart of a further factor checking method provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a factor verification user interface provided by one embodiment of the present invention;
FIG. 8 is a diagram illustrating IC analysis test results provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of another IC analysis test result provided by an embodiment of the present invention;
FIG. 10 is a graph illustrating the results of a long (bullish) portfolio return test provided by an embodiment of the present invention;
FIG. 11 is a diagram illustrating the results of a short (bearish) portfolio return test provided by an embodiment of the present invention;
FIG. 12 is a block diagram of a factor checking system provided in accordance with an embodiment of the present invention;
FIG. 13 is a block diagram of another factor checking system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a factor checking method according to an embodiment of the present invention. The factor verification method can be applied to a factor verification system, as shown in fig. 1, and includes the steps of:
step 101, displaying a user interface, and receiving a first configuration operation for the user interface.
Here, the factor verification system may be a data analysis system (machine learning platform). The invention can create AI (Artificial Intelligence) quantitative strategy workflow in the factor checking system for factor detection.
The user interface may be a creation interface of the AI quantitative strategy workflow, and the first configuration operation may be a selection and/or input operation performed by a user on a parameter configuration item of the user interface, for example, selecting a configuration item by clicking in order to set a parameter.
For example, as shown in fig. 2, an AI quantitative strategy workflow for factor verification may include: a data module, a data preprocessing module, a data set splitting module, a machine learning algorithm training module and an algorithm evaluation module.
The data module is used for acquiring a data set for machine learning algorithm training. The data module includes factor data, which includes stock market data; each field in the stock market data can be regarded as a factor. The factor data may be financial factor data, technical factor data, and/or market factor data, etc., and the specific factors may be at least one of the following: profit factor, financial leverage, profitability, price-to-sales ratio, return on assets, multi-day moving average, and MACD (moving average convergence/divergence). The factor data may cover a period of time, typically one year, for example one year of historical data for the factor. For more accurate factor testing, ten years of data may preferably be used.
The data preprocessing module is used for preprocessing the data set acquired by the data module, for example by sampling, missing value processing, normalization and the like. The data set splitting module splits the data set processed by the data preprocessing module into a training set and a test set. The machine learning algorithm training module performs machine learning algorithm training using the training set, and the machine learning algorithm may include a binary classification algorithm, such as a neural network, a random forest, XGBoost, and the like. The algorithm evaluation module uses the test set to evaluate the model trained by the machine learning algorithm training module and obtains an evaluation result.
Step 102, setting parameters based on the first configuration operation.
Optionally, the step of setting parameters based on the first configuration operation includes:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
The target parameters include stocks, futures, options, etc. The transaction parameters include a rebalancing period, a rebalancing method, an upper limit on the number of targets held, an upper limit on each order, an upper limit on the position held in a single target (e.g., a stock or futures contract), a start date, an end date, a target pool (e.g., a stock and/or futures pool), and the like.
The user can set the corresponding parameters through a quantitative strategy parameter setting interface as shown in fig. 3. Illustratively, the CSI 300 (Shanghai-Shenzhen 300) over 2018/1/1-2018/12/31 is selected as the total data set; the data set corresponding to the start date and end date set in the transaction parameters is the data set in the data module shown in FIG. 2. The data set includes a training set and a test set. The data set can be split into a training set and a test set automatically by the factor checking system, or the training set and the test set can be customized by the user.
A user-defined data set requires the user to create and set up the data set in the factor checking system and then select it in the quantitative strategy parameter setting interface. Specifically, when a custom data set is chosen, a "data set" needs to be created and set up in advance: some or all of the factor data is selected as feature variables to be imported, and the factor data is imported from a file, through JDBC (Java DataBase Connectivity), or in a similar way. After the factor table data has been imported from the database and the imported factor data set has been selected, the user can see a data preview, including all imported factor data fields (factor names), types (basic data types) and the like, which completes the setting of the feature variable data. In addition, the factor checking system can automatically import the factor data into the data module as feature variables. The foregoing is merely an exemplary illustration, and the present invention is not particularly limited thereto.
In addition, the targets can be classified by using the factor value of each target. For example, the targets are sorted by the value of the profitability factor; the stocks ranked in the top 30% are selected as strong stocks and labeled 1, and the stocks ranked in the bottom 30% are selected as weak stocks and labeled 0. The classification of the targets can be performed automatically by the system, so that the data set is divided into a plurality of groups, machine learning model training is performed on each group separately, and the result of each group is output.
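The labeling rule in the example above can be sketched as follows. The reading that the second group is the bottom 30% is an interpretation of the translated text, and the function name, the treatment of the middle group (left unlabeled) and the sample values are assumptions.

```python
def label_by_factor_rank(factor_values, top_frac=0.3, bottom_frac=0.3):
    """Label the top fraction of targets (by factor value) 1 and the bottom
    fraction 0. Targets in the middle get no label. The two fractions are
    assumed not to overlap (top_frac + bottom_frac <= 1)."""
    n = len(factor_values)
    order = sorted(range(n), key=lambda i: factor_values[i], reverse=True)
    # Small epsilon guards against floating-point results such as 2.9999...
    n_top = int(n * top_frac + 1e-9)
    n_bottom = int(n * bottom_frac + 1e-9)
    labels = {i: 1 for i in order[:n_top]}
    labels.update({i: 0 for i in order[n - n_bottom:]})
    return labels


# Hypothetical profitability-factor values for ten stocks (index = stock id).
values = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]
labels = label_by_factor_rank(values)
print(labels)
```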
The user can set the corresponding parameters through the model parameter setting interface as shown in fig. 4. The model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters. The machine learning algorithm parameters include at least one of a machine learning algorithm type and an algorithm selection. The machine learning algorithm type includes binary classification and regression, and the algorithm selection includes selecting at least one of the following algorithms: decision tree, extra random trees, gradient boosting trees, neural network, random forest, stochastic gradient descent, support vector machine, and XGBoost (eXtreme Gradient Boosting, a scalable tree-boosting machine learning system and optimized distributed gradient boosting library). The expected profitability is one possible prediction target and is used here as an example; it serves as the target column, and the values in the target column are processed according to the expected-profitability setting for subsequent algorithm training. The system gives a default recommended expected profitability, and the user can make further custom adjustments. The return evaluation parameter is the probability of obtaining the expected profitability. As shown in FIG. 4, the return evaluation parameter may include five levels: aggressive, mildly aggressive, neutral, mildly conservative, etc.
A machine learning algorithm predicts the relative-strength factor value of the stock for the next period, that is, the factor recommended to the user by the machine learning platform. The next period is determined by the rebalancing period set by the user, i.e., it is the period following the calculation period set by the user; for example, the period set by the user is 30 days.
Step 103, generating a machine learning model of the factor according to the set parameters.
Optionally, the step of generating the machine learning model of the factor according to the set parameter includes:
generating an AI (artificial intelligence) quantitative strategy workflow based on the set parameters;
and operating the AI quantitative strategy workflow to carry out machine learning model training to generate a machine learning model of the factors.
The step of running the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor comprises:
taking the factor data as input data of a machine learning model;
and the AI quantitative strategy workflow performs machine learning model training on the input data according to the model parameters.
Specifically, the system may perform automatic modeling based on the settings of the quantitative strategy parameters and the model parameters to generate an AI quantitative strategy workflow.
Specifically, after the quantitative strategy parameters and the model parameters are configured, the factor checking system calculates, from historical data, the expected profitability of the targets selected from the target pool, and then determines the targets according to the preset expected profitability in the model parameters and the calculated expected profitability, so that subsequent trading of the targets can be carried out based on the strategy. As shown in FIG. 4, when the top 25% by expected profitability is selected and the expected profitability range is 8.9%, the stocks among the CSI 300 constituents chosen in the quantitative strategy parameter interface that rank in the top 25% by profitability, with a combined expected profitability range of 8.9%, are taken as the targets. As shown in fig. 4, the model parameters determined in the model parameter setting interface are: the machine learning algorithm type is binary classification; the expected profitability ranks in the top 25%, with an expected profitability range of 8.9%; the specific machine learning algorithms are a decision tree, extra random trees and gradient boosting trees; and the return evaluation parameter is 55% at the aggressive level. The machine learning models are trained based on the configured quantitative strategy parameters and model parameters to obtain the training results of the three machine learning models.
Optionally, before the step of using the factor data as input data of the machine learning model, the method further includes:
performing data preprocessing on the factor data, the data preprocessing including at least one of:
sampling, missing value processing, standardization, normalization, data set splitting, data type conversion, numerical value coding, feature binarization and feature deletion;
the step of using the factor data as input data for a machine learning model comprises:
using the factor data after data preprocessing as input data of a machine learning model.
Specifically, in the data preprocessing process, columns in the data set that are entirely empty are removed, columns whose proportion of empty values exceeds a certain threshold are deleted, missing values are filled with the mean, median, or mode, and the data are standardized, so that a relatively standardized data set is obtained. Machine learning model training is then performed using the preprocessed data set.
In the embodiment of the invention, the factor data is subjected to data preprocessing and then the machine learning model training is carried out, so that errors in the machine learning model training process caused by the non-standardization of the data set can be avoided, and the efficiency of the machine learning model training is improved.
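By way of illustration only, the preprocessing described above may be sketched as follows. The function name `preprocess`, the `max_missing_ratio` threshold of 0.5, the choice of mean filling, and the sample columns are assumptions made for this sketch and are not taken from the embodiment.

```python
from statistics import mean, pstdev

def preprocess(columns, max_missing_ratio=0.5):
    """Clean {factor_name: [float or None, ...]} columns (hypothetical layout)."""
    cleaned = {}
    for name, values in columns.items():
        if not values:
            continue
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        # Drop columns that are entirely empty or whose empty share is too high.
        if not present or missing_ratio > max_missing_ratio:
            continue
        # Fill missing values with the column mean (median or mode are alternatives).
        fill = mean(present)
        filled = [fill if v is None else v for v in values]
        # Standardize to zero mean and unit variance (z-score).
        mu, sigma = mean(filled), pstdev(filled)
        cleaned[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in filled]
    return cleaned

raw = {
    "volume": [1.0, None, 3.0, 2.0],      # one missing value, kept and filled
    "empty": [None, None, None, None],    # entirely empty, removed
    "sparse": [5.0, None, None, None],    # 75% empty, removed
}
result = preprocess(raw)
```

In this sketch only the "volume" column survives, and its values are centered on zero after standardization.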
The step of performing machine learning model training on the input data by the AI quantitative strategy workflow according to the model parameters comprises:
and carrying out hyper-parameter tuning on the machine learning model.
During the training of the machine learning model, hyper-parameter tuning may be performed on the model to optimize the training result. Specifically, the factor checking system may tune the hyper-parameters automatically, and the tuning method may be random search or grid search. During hyper-parameter tuning, the training set in the data set is divided into training data and validation data, where the training data are used to train the model parameters and the validation data are used to evaluate the hyper-parameter optimization. The number of optimization evaluations may be a system default or may be preset. The hyper-parameter evaluation may be based on at least one of the following model evaluation metrics: AUC score, accuracy, precision, recall, F1 score, and log loss.
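As a minimal sketch of random-search tuning with a training/validation split, the following uses a toy one-dimensional k-nearest-neighbour classifier whose hyper-parameter is k and whose evaluation metric is accuracy. The classifier, the function names, the 25% validation ratio, and the toy data are assumptions chosen to keep the example self-contained; the embodiment does not prescribe them.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def accuracy(train, data, k):
    """Share of rows in `data` whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

def random_search(train_set, n_trials=10, val_ratio=0.25, seed=0):
    """Randomly sample odd values of k and keep the best validation accuracy."""
    rng = random.Random(seed)
    rows = train_set[:]
    rng.shuffle(rows)
    n_val = max(1, int(len(rows) * val_ratio))
    val, train = rows[:n_val], rows[n_val:]   # validation data / training data
    best_k, best_score = None, -1.0
    for _ in range(n_trials):                 # number of optimization evaluations
        k = rng.randrange(1, len(train) + 1, 2)
        score = accuracy(train, val, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy data: the label is 1 when the (hypothetical) factor value is positive.
samples = [(x / 10, 1 if x > 0 else 0) for x in range(-20, 21) if x != 0]
best_k, best_score = random_search(samples)
```

A grid search would differ only in enumerating every candidate k instead of sampling it.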
Step 104, obtaining a factor checking result based on the machine learning model.
Optionally, the step of obtaining a test result of the factor based on the machine learning model includes:
evaluating the machine learning model to obtain an optimal model and an importance result of each characteristic of the optimal model;
sorting the importance results of the features;
and determining the factors corresponding to the features whose rank is smaller than a first threshold to be effective factors.
The model evaluation metrics used to measure the evaluation result of an algorithm model include at least one of the following: AUC score, ROC curve, accuracy, precision, recall, F1 score, log loss, and the like. The workflow is run, the training results of the plurality of machine learning algorithms from step 103 are evaluated using the model evaluation metrics, and an optimal algorithm model is selected based on the evaluation results. The feature importance result of the algorithm model can be generated at the same time as the model is evaluated, which yields the importance result of each feature variable of the optimal algorithm model. The importance results of the feature variables of the optimal algorithm model are sorted, and the N factors whose rank is smaller than the first threshold, that is, the N top-ranked factors, are selected as effective factors. N is an integer greater than or equal to 1. Preferably, N is 10, or N is 20% of the total number of factors. FIG. 5 shows the importance result of each feature for the random forest algorithm model; the feature importance result shows that the importance of the factor "trade_amount", that is, "volume", is prominent, so "volume" is an effective factor.
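The ranking and top-N selection described above can be sketched as follows. The function name, the importance values, and every factor name other than "trade_amount" are hypothetical and serve only to illustrate the selection rule (a fixed N, or a percentage of the total number of factors).

```python
def select_effective_factors(importances, n=None, top_percent=20):
    """Return the names of the top-ranked factors by feature importance.

    importances: {factor_name: importance score}.
    n: fixed number of factors to keep; if None, keep top_percent of all factors.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    if n is None:
        # Integer arithmetic; always keep at least one factor.
        n = max(1, len(ranked) * top_percent // 100)
    return [name for name, _ in ranked[:n]]

# Illustrative importance scores, not results from the embodiment.
importances = {
    "trade_amount": 0.41,
    "pe_ratio": 0.22,
    "momentum": 0.18,
    "turnover": 0.12,
    "size": 0.07,
}
top_by_percent = select_effective_factors(importances)        # 20% of 5 factors
top_two = select_effective_factors(importances, n=2)
```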
The factor checking system in the embodiment of the invention uses the machine learning model to verify the factors, thereby improving the checking accuracy and convenience in factor checking.
FIG. 6 is a flow chart of a factor checking method according to an embodiment of the present invention. The factor verification method may be applied to a factor verification system, as shown in fig. 6, and includes the steps of:
Step 201, displaying a user interface, and receiving a first configuration operation for the user interface.
Step 201 in this embodiment is the same as step 101 in the first embodiment of the present invention, and is not described herein again.
Step 202, setting parameters based on the first configuration operation.
Step 202 in this embodiment is the same as step 102 in the first embodiment of the present invention, and is not described herein again.
Step 203, generating a machine learning model of the factor according to the set parameters.
Step 203 in this embodiment is the same as step 103 in the first embodiment of the present invention, and is not described herein again.
Step 204, obtaining a factor checking result based on the machine learning model.
Step 204 in this embodiment is the same as step 104 in the first embodiment of the present invention, and is not described herein again.
Step 205, verifying the effective factor to obtain a verification result.
In addition to verifying the effective factors, the embodiment of the present invention may also directly verify the factors, that is, verify the factors that have not been subjected to the factor verification of the machine learning model described in the first embodiment, and the verification method is the same as the following method.
The verification result comprises stability, and the step of verifying the effective factor comprises:
calculating median, mean and standard deviation of the effective factors;
judging the distribution form of the effective factors according to the median, the mean and the standard deviation;
and determining the stability of the effective factor according to the distribution form.
Specifically, factor statistics is carried out, the distribution form of the effective factors is judged according to the calculated median, mean value and standard deviation, and if the dispersion is large, the factor value is judged to be unstable; otherwise, if the dispersion is small, the factor value is judged to be stable, namely the factor belongs to the effective factor.
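As a hedged illustration of the factor statistics step, the sketch below computes the median, mean, and standard deviation and summarizes dispersion with the coefficient of variation. The use of the coefficient of variation and the `max_cv` threshold are assumptions of this sketch; the embodiment does not state how "large" or "small" dispersion is quantified.

```python
from statistics import mean, median, pstdev

def factor_stability(values, max_cv=1.0):
    """Summarize a factor's distribution and flag it as stable or unstable."""
    mu, med, sigma = mean(values), median(values), pstdev(values)
    # Coefficient of variation as a dispersion measure (assumed, not from the text).
    cv = sigma / abs(mu) if mu != 0 else float("inf")
    return {"mean": mu, "median": med, "std": sigma, "cv": cv, "stable": cv <= max_cv}

tight = factor_stability([9.0, 10.0, 11.0, 10.0])      # small dispersion
wide = factor_stability([1.0, -1.0, 50.0, -50.0])      # large dispersion
```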
The verification result further comprises industry coverage, and the step of verifying the effective factor comprises the following steps:
calculating the coverage rate of the effective factor;
and determining the industry coverage of the factor according to the coverage of the effective factor.
Specifically, the industry coverage of a factor is judged by calculating the coverage rate of the effective factor. For example, if a factor covers only 0.1% of the securities, it is considered to be an invalid factor or a domain-specific factor; if a factor covers 80% of the securities, it is considered to be an effective factor.
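The coverage calculation can be sketched as the share of securities in the universe that have a usable factor value. The function name, the security codes, and the factor values below are hypothetical; the 80% cut-off is the example figure given in the text.

```python
def factor_coverage(factor_values, universe):
    """Fraction of securities in `universe` with a non-missing factor value."""
    covered = sum(1 for code in universe if factor_values.get(code) is not None)
    return covered / len(universe)

# Hypothetical universe and factor values; None marks a missing value.
universe = ["000001", "000002", "600000", "600519", "601318"]
factor_values = {"000001": 1.2, "000002": None, "600000": 0.4,
                 "600519": 2.1, "601318": 0.9}

coverage = factor_coverage(factor_values, universe)
is_broad = coverage >= 0.8   # example threshold from the text
```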
The verification result further comprises a significance test result, and the step of verifying the effective factor comprises:
calculating a regression result of the effective factor by using a two-pass cross-sectional regression test method;
and performing a hypothesis test on the regression result to obtain a significance test result of the effective factor.
Specifically, a hypothesis test is performed on the F_M (Fama-MacBeth) regression results to determine the statistical significance of the factor. F_M regression is a two-pass cross-sectional regression test method. First, a univariate linear regression of the stock return on the single factor value is performed to obtain the regression coefficient; second, at each time t, a cross-sectional regression of the average return of the individual stocks on the regression coefficients is performed. In each period, the next-period stock return is regressed on the factor value to be tested, which yields the factor return for that period and the residual of each stock, and from these a sequence of t values is obtained. Significance is judged by comparing the t-value sequence obtained from the regression with a critical value, where a critical value of 2 is commonly accepted in the industry. If the t-value sequence is smaller than the critical value, the factor is significant and is an effective factor. The present invention uses the t-test for the hypothesis test. A hypothesis test, also called a significance test, is a statistical method that judges whether a hypothesis holds by proof by contradiction based on small-probability events: it first assumes that the population parameter (or distribution) corresponding to a sample is the same as a known population parameter (or distribution), then analyzes the sample data according to the distribution of the test statistic, uses the sample information to judge whether the hypothesis is supported, and accepts or rejects the tested hypothesis. The conclusion is probabilistic rather than absolutely affirmative or negative. The t-test is mainly used for normally distributed data with a small sample size (e.g., n < 30) and an unknown population standard deviation σ.
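The per-period cross-sectional regression and the resulting t statistic can be sketched as follows. This is a simplified, single-factor version that assumes one ordinary least squares slope per period and a t statistic computed from the mean and sample standard deviation of those slopes; the function names and the toy data are assumptions, and the comparison against the critical value of 2 is left to the caller.

```python
import math
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope of a univariate least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def factor_return_t_stat(factor_by_period, next_return_by_period):
    """Regress next-period returns on factor values in each period,
    then compute the t statistic of the resulting factor-return series."""
    slopes = [ols_slope(f, r)
              for f, r in zip(factor_by_period, next_return_by_period)]
    t_stat = mean(slopes) / (stdev(slopes) / math.sqrt(len(slopes)))
    return slopes, t_stat

# Toy data: 3 periods, 4 stocks, identical factor values in every period.
factors = [[1.0, 2.0, 3.0, 4.0]] * 3
next_returns = [
    [0.01, 0.02, 0.03, 0.04],
    [0.02, 0.04, 0.06, 0.08],
    [0.03, 0.06, 0.09, 0.12],
]
slopes, t_stat = factor_return_t_stat(factors, next_returns)
```

At least two periods are needed, since the sample standard deviation is undefined for a single slope.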
The verification result further includes a correlation, and the step of verifying the effective factor includes:
calculating the information coefficient of the effective factor;
and judging the correlation between the effective factor and the return according to the information coefficient.
The information coefficient (IC) statistics of the effective factor include the IC mean, the mean of the absolute IC, the IC standard deviation, IC_IR (i.e., the factor IR), and the IC win rate. The correlation between the factor and the return is judged through IC analysis. The larger the factor IC, the stronger the stock selection ability of the factor; the factor IR (i.e., the information ratio) represents the stability of the factor's stock selection ability.
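A minimal sketch of the IC statistics follows. It assumes that the IC of a period is the Pearson correlation between factor values and next-period returns, that IC_IR is the IC mean divided by the IC standard deviation, and that the IC win rate is the share of periods with a positive IC; these definitions, the function names, and the toy data are assumptions of the sketch rather than statements from the embodiment.

```python
import math
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ic_summary(factor_by_period, next_return_by_period):
    """Per-period IC plus the summary statistics named in the text."""
    ics = [pearson(f, r) for f, r in zip(factor_by_period, next_return_by_period)]
    ic_mean, ic_std = mean(ics), stdev(ics)
    return {
        "ics": ics,
        "ic_mean": ic_mean,
        "abs_ic_mean": mean([abs(v) for v in ics]),
        "ic_std": ic_std,
        "ic_ir": ic_mean / ic_std if ic_std > 0 else float("nan"),
        "ic_win_rate": sum(v > 0 for v in ics) / len(ics),
    }

# Toy data: 3 periods, 4 stocks.
factors = [[1.0, 2.0, 3.0, 4.0]] * 3
next_returns = [
    [0.01, 0.02, 0.03, 0.04],   # perfectly aligned with the factor
    [0.04, 0.03, 0.02, 0.01],   # perfectly reversed
    [0.01, 0.03, 0.02, 0.04],   # partially aligned
]
summary = ic_summary(factors, next_returns)
```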
The verification result further includes monotonicity, and the step of verifying the effective factor includes:
dividing the selected targets into grades according to the factor value of the effective factor;
calculating the combined return of the targets in each grade;
and judging the monotonicity of the factor prediction result according to the combined returns.
Optionally, the targets are sorted and graded, and the monotonicity of the factor prediction result is judged according to the differences between the returns of the grades. Illustratively, the securities are divided into 5 grades by factor value (e.g., from large to small) and the combined return of each grade is calculated; if the returns change direction across the grades (e.g., rising and then falling), the monotonicity is poor. The factor prediction result refers to the effect of the factor on the rise and fall of a target (e.g., a stock).
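The grading and monotonicity check can be sketched as below. The sketch assumes equal-weight grades, requires at least as many securities as grades, and treats the grade returns as monotonic when they never change direction; the function names and the toy data are hypothetical.

```python
from statistics import mean

def grade_returns(factor_values, returns, n_grades=5):
    """Sort securities by factor value (large to small), split them into
    n_grades groups, and return the equal-weight mean return of each group."""
    order = sorted(range(len(factor_values)),
                   key=lambda i: factor_values[i], reverse=True)
    size = len(order) // n_grades          # assumes len(order) >= n_grades
    grades = []
    for g in range(n_grades):
        # The last grade absorbs any remainder.
        end = (g + 1) * size if g < n_grades - 1 else len(order)
        members = order[g * size:end]
        grades.append(mean([returns[i] for i in members]))
    return grades

def is_monotonic(grade_means):
    """True when grade returns only rise or only fall across the grades."""
    diffs = [b - a for a, b in zip(grade_means, grade_means[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

# Toy data: 10 securities whose return falls as the factor value falls.
factor_values = [float(v) for v in range(10, 0, -1)]
returns = [v / 100 for v in factor_values]
grade_means = grade_returns(factor_values, returns)
monotonic = is_monotonic(grade_means)
```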
The verification result further includes effectiveness and directionality, and the step of verifying the effective factor includes:
sorting the selected targets according to the factor values of the effective factor;
selecting the targets whose factor-value rank is smaller than a second threshold as a long (bullish) target combination;
selecting the targets whose factor-value rank is larger than a third threshold as a short (bearish) target combination;
calculating the returns of the long target combination and the short target combination respectively;
and judging the effectiveness and the directionality of the effective factor based on the two calculated sets of returns.
Specifically, the effectiveness of the factor is judged through the long-short combination returns. The factor value of each security is classified based on the factor data. For example, the factor values of the individual securities are sorted by size; the top 30% are labeled 1 as strong stocks, and the bottom 30% are labeled 0 as weak stocks. The securities whose factor values rank in the top 30% are selected as the long combination, the securities whose factor values rank in the bottom 30% are selected as the short combination, the returns of the two combinations are calculated respectively, and the effectiveness and the directionality of the factor are judged based on the two sets of returns.
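The long-short comparison can be sketched as follows, using the 30% cut-offs from the example above. Equal weighting, the function name, the security codes, and the toy returns are assumptions of this sketch.

```python
from statistics import mean

def long_short_returns(factor_values, returns, top_percent=30):
    """Equal-weight returns of the top and bottom groups ranked by factor value.

    factor_values, returns: {security_code: value}.
    Returns (long_return, short_return, long_return - short_return).
    """
    order = sorted(factor_values, key=factor_values.get, reverse=True)
    n = max(1, len(order) * top_percent // 100)
    long_leg, short_leg = order[:n], order[-n:]
    long_ret = mean([returns[c] for c in long_leg])
    short_ret = mean([returns[c] for c in short_leg])
    return long_ret, short_ret, long_ret - short_ret

# Toy data: 10 hypothetical securities where a higher factor value
# goes with a lower return.
factor_values = {f"S{i:02d}": float(i) for i in range(1, 11)}
returns = {f"S{i:02d}": (11 - i) / 100 for i in range(1, 11)}

long_ret, short_ret, spread = long_short_returns(factor_values, returns)
```

In this toy case the long combination earns less than the short combination, so the spread is negative, which under the reading above would point to a reverse factor.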
Illustratively, as shown in FIG. 7, the N factors selected based on the machine learning model are verified. For example, the "volume" factor is uploaded to the factor library. The factor checking interval is 2018/01/01 to 2018/12/31, the stock pool is all A-share stocks classified according to the Fangying grade, the stocks are divided into 5 equal-weight grades, and positions are rebalanced weekly.
Factor statistics, factor coverage, F_M regression, IC analysis, sorting and grading, and long-short combination return verification are carried out respectively, and the checking results are analyzed. The results of the IC analysis are shown in FIGS. 8 and 9.
The results of checking the returns of the long and short combinations are shown in FIGS. 10 and 11.
Based on the checking results, "volume" is comprehensively judged to be a reverse effective factor, because: 1) the dispersion of the factor is not large, so the factor value is judged to be relatively stable; 2) the factor coverage is high; 3) in the significance test of the factor, the t value is less than 2, so the significance is obvious; 4) the mean of the absolute value of the factor IC is greater than 0.05, so the factor is judged to have strong stock selection ability, and the absolute value of IC_IR is greater than 0.5, so the factor is judged to have a strong ability to obtain excess returns stably; 5) the monotonicity of the factor is good; 6) the returns of the long combination and the short combination differ markedly, from which the factor is judged to have a negative effect. In summary, the factor is effective and clearly significant. The "volume" factor is judged to be a reverse factor based on the IC analysis and the long-short combination returns. For a reverse effective factor, the target is sold when the factor value reaches a certain expected value; for a forward effective factor, the target is bought when the factor value reaches a certain expected value.
Compared with the prior art, the embodiment of the invention tests the factors through multiple models and multiple dimensions, and can therefore test the effectiveness and significance of a single factor on a cross section (that is, at a single point in time used for comparison) more accurately. Evaluating the effectiveness and significance of the factors through the test results of different models resolves the uncertainty of a single factor test and further improves the accuracy of factor checking.
Based on the factor checking method provided in the above embodiment, an embodiment of the present invention further provides a factor checking system for implementing the above method, and referring to fig. 12, a factor checking system 1100 provided in an embodiment of the present invention includes:
a receiving module 1101, configured to display a user interface, and receive a first configuration operation on the user interface;
a parameter setting module 1102, configured to perform parameter setting based on the first configuration operation;
a generating module 1103 for generating a machine learning model of the factor according to the set parameters;
a checking module 1104, configured to obtain a checking result of the factor based on the machine learning model.
Optionally, the parameter setting module 1102 is specifically configured to:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
Optionally, the generating module 1103 includes:
a generating unit, configured to generate an AI quantitative strategy workflow based on the set parameters;
and an operation unit, configured to run the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor.
Optionally, the operation unit is specifically configured to:
taking the factor data as input data of a machine learning model;
and the AI quantitative strategy workflow performs machine learning model training on the input data according to the model parameters.
Optionally, the generating module 1103 further includes:
a preprocessing unit, configured to perform data preprocessing on the factor data;
the data preprocessing includes at least one of: sampling, missing value processing, standardization, normalization, data set splitting, data type conversion, numerical value coding, feature binarization and feature deletion;
the operation unit is specifically configured to use the factor data after data preprocessing as input data of a machine learning model.
Optionally, the generating module 1103 further includes:
a tuning unit, configured to perform hyper-parameter tuning on the machine learning model.
Optionally, the checking module 1104 is specifically configured to:
evaluating the machine learning model to obtain an optimal model and an importance result of each characteristic of the optimal model;
sorting the importance results of the features;
and determining the factors corresponding to the features whose rank is smaller than a first threshold to be effective factors.
The factor checking system in the embodiment of the invention uses the machine learning model to verify the factor, thereby improving the checking accuracy in the factor checking process.
As shown in fig. 13, the factor verification system 1100 further includes:
a verification module 1105, configured to verify the effective factor to obtain a verification result.
Optionally, the verification result includes stability, and the verification module 1105 is specifically configured to:
calculating median, mean and standard deviation of the effective factors;
judging the distribution form of the effective factors according to the median, the mean and the standard deviation;
and determining the stability of the effective factor according to the distribution form.
Optionally, the verification result includes an industry coverage, and the verification module 1105 is specifically configured to:
calculating the coverage rate of the effective factor;
and determining the industry coverage of the factor according to the coverage of the effective factor.
Optionally, the verification result includes a significance test result, and the verification module 1105 is specifically configured to:
calculating a regression result of the effective factor by using a two-pass cross-sectional regression test method;
and performing a hypothesis test on the regression result to obtain a significance test result of the effective factor.
Optionally, the verification result includes a correlation, and the verification module 1105 is specifically configured to:
calculating the information coefficient of the effective factor;
and judging the correlation between the effective factor and the return according to the information coefficient.
Optionally, the verification result includes monotonicity, and the verification module 1105 is specifically configured to:
dividing the selected targets into grades according to the factor value of the effective factor;
calculating the combined return of the targets in each grade;
and judging the monotonicity of the factor prediction result according to the combined returns.
Optionally, the verification result includes effectiveness and directionality, and the verification module 1105 is specifically configured to:
sorting the selected targets according to the factor values of the effective factor;
selecting the targets whose factor-value rank is smaller than a second threshold as a long (bullish) target combination;
selecting the targets whose factor-value rank is larger than a third threshold as a short (bearish) target combination;
calculating the returns of the long target combination and the short target combination respectively;
and judging the effectiveness and the directionality of the effective factor based on the two calculated sets of returns.
The factor checking system provided by the embodiment of the invention uses the machine learning model to verify the factors, so that the checking accuracy and convenience in factor checking are improved; in addition, in the embodiment, the factors are also tested through various models and multiple dimensions, the effectiveness and the significance of the factors are evaluated through test results of different models, the uncertainty of single factor test is solved, and the test accuracy of the factor test is further improved.
An embodiment of the present invention provides a factor checking system, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, and when executed by the processor, the computer program implements the steps of the factor checking method as described above.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned embodiment of the factor checking method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A factor verification method, the method comprising:
displaying a user interface, and receiving a first configuration operation on the user interface;
setting parameters based on the first configuration operation;
generating a machine learning model of the factor according to the set parameters;
obtaining a checking result of the factor based on the machine learning model.
2. A factor verification method according to claim 1, wherein the step of performing parameter setting based on the first configuration operation comprises:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
3. A factor verification method according to claim 1 or 2, wherein the step of generating a machine learning model of the factor according to the set parameters comprises:
generating an AI quantitative strategy workflow based on the set parameters;
and running the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor.
4. The factor verification method of claim 1, wherein the step of obtaining a checking result of the factor based on the machine learning model comprises:
evaluating the machine learning model to obtain an optimal model and an importance result of each characteristic of the optimal model;
sorting the importance results of the features;
and determining the factors corresponding to the features whose rank is smaller than a first threshold to be effective factors.
5. The factor verification method of claim 4, further comprising:
verifying the effective factors to obtain a verification result.
6. A factor verification system, comprising:
a receiving module, configured to display a user interface and receive a first configuration operation on the user interface;
a parameter setting module, configured to set parameters based on the first configuration operation;
a generating module, configured to generate a machine learning model of the factor according to the set parameters;
and a checking module, configured to obtain a checking result of the factor based on the machine learning model.
7. The factor verification system of claim 6, wherein the parameter setting module is specifically configured to:
configuring quantitative strategy parameters and model parameters based on the first configuration operation;
wherein the quantitative strategy parameters include at least one of target parameters, transaction parameters, and factor data;
the model parameters include at least one of machine learning algorithm parameters, preset expected profitability parameters and return evaluation parameters.
8. A factor verification system according to claim 6 or 7, wherein the generation module comprises:
a generating unit, configured to generate an AI quantitative strategy workflow based on the set parameters;
and an operation unit, configured to run the AI quantitative strategy workflow to perform machine learning model training and generate a machine learning model of the factor.
9. The factor verification system of claim 6, wherein the checking module is specifically configured to:
evaluating the machine learning model to obtain an optimal model and an importance result of each characteristic of the optimal model;
sorting the importance results of the features;
and determining the factors corresponding to the features whose rank is smaller than a first threshold to be effective factors.
10. The factor verification system of claim 9, further comprising:
a verification module, configured to verify the effective factors to obtain a verification result.
CN202010456034.7A 2020-05-26 2020-05-26 Factor checking method and system Pending CN113723728A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456034.7A CN113723728A (en) 2020-05-26 2020-05-26 Factor checking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456034.7A CN113723728A (en) 2020-05-26 2020-05-26 Factor checking method and system

Publications (1)

Publication Number Publication Date
CN113723728A true CN113723728A (en) 2021-11-30

Family

ID=78672096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456034.7A Pending CN113723728A (en) 2020-05-26 2020-05-26 Factor checking method and system

Country Status (1)

Country Link
CN (1) CN113723728A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241872A (en) * 2017-12-30 2018-07-03 北京工业大学 The adaptive Prediction of Stock Index method of Hidden Markov Model based on the multiple features factor
KR102009310B1 (en) * 2018-10-15 2019-10-21 주식회사 에이젠글로벌 Fraud factor analysis system and method
CN110930256A (en) * 2019-09-30 2020-03-27 北京九章云极科技有限公司 Quantitative analysis method and quantitative analysis system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李杰: ""基于随机森林算法的多因子选股模型研究"", 《中国优秀硕士学位论文全文数据库》, no. 2, pages 3 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination