CN109767333A - Fund selection method, device, electronic equipment and computer readable storage medium - Google Patents

Fund selection method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN109767333A
Authority
CN
China
Prior art keywords
classifier
fund
banking operation
operation data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811536792.9A
Other languages
Chinese (zh)
Inventor
王薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811536792.9A priority Critical patent/CN109767333A/en
Publication of CN109767333A publication Critical patent/CN109767333A/en
Pending legal-status Critical Current

Abstract

The present invention relates to a fund selection method, device, electronic equipment and computer readable storage medium. The method includes: reading financial behavior data and constructing factors of the financial behavior data from the data read; dividing the financial behavior data and its factors into a training dataset and a test dataset; fitting multiple classifiers using the training dataset and the test dataset; determining an optimal classifier from the fitted classifiers; obtaining current fund data and using the optimal classifier to select a preset number of funds from it as the portfolio assets; and allocating the portfolio assets with a major-asset-class allocation model to determine the investment weight of each fund in the portfolio assets. By determining an optimal classifier from the fitted classifiers and using it to select a preset number of funds from the current fund data, the invention improves the accuracy of fund selection predictions.

Description

Fund selection method, device, electronic equipment and computer readable storage medium
Technical field
The present invention relates to the field of financial management, and in particular to a fund selection method, device, electronic equipment and computer readable storage medium based on machine learning algorithms.
Background
In the prior art, funds are mainly selected using multi-factor linear models. However, many factors do not affect next-period returns linearly, so a nonlinear model must be fitted to the historical data to make more accurate fund selection predictions.
Summary of the invention
In view of the foregoing, it is necessary to provide a fund selection method, device, electronic equipment and computer readable storage medium that improve the accuracy of fund selection predictions.
A first aspect of the application provides a fund selection method, the method comprising:
Reading financial behavior data and constructing factors of the financial behavior data from the data read, wherein the factors include one or more of annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, size of the fund company, size of the fund, and the fund manager's years of professional experience;
Dividing the financial behavior data and its factors into a training dataset and a test dataset;
Fitting multiple classifiers using the training dataset and the test dataset, wherein the classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier;
Determining an optimal classifier from the fitted classifiers;
Obtaining current fund data, and using the optimal classifier to select a preset number of funds from the current fund data obtained as the portfolio assets; and
Allocating the portfolio assets with a major-asset-class allocation model, thereby determining the investment weight of each fund in the portfolio assets.
Preferably, reading the financial behavior data and constructing its factors includes: cleaning the financial behavior data obtained, and constructing the factors from the cleaned data.
Preferably, cleaning the financial behavior data obtained includes:
Removing funds that have no net value from the financial behavior data;
Removing structured funds and money market funds from the financial behavior data;
Removing funds that, as of the current month-end, have been listed for less than one year;
Removing funds whose net value was not updated on more than 20% of the trading days in the year preceding the current month-end; and
Removing funds whose net assets at the current month-end are below 10,000,000 yuan, and keeping the remaining funds.
Preferably, dividing the financial behavior data and its factors into a training dataset and a test dataset includes:
Judging whether the next-month return of a fund in the financial behavior data exceeds the index of funds of the same category; if so, labeling the data 1, and otherwise labeling it 0, thereby generating label data; and
Randomly dividing the factors of the financial behavior data and the label data at a preset ratio to generate the training dataset and the test dataset.
Preferably, determining the optimal classifier from the fitted classifiers includes:
Testing each fitted classifier with the test dataset and calculating a first stability coefficient C1 for each classifier, where C1 represents the stability of the classifier when applied to the test data;
Testing each fitted classifier with the current funds and calculating a second stability coefficient C2 for each classifier, where C2 represents the stability of the classifier when applied to the current funds;
Testing each fitted classifier with the current funds and calculating the information ratio IR of each classifier's head portfolio;
Calculating the final index A of each classifier according to the formula A=C1*C2*IR; and
Determining the classifier with the largest final index A as the optimal classifier.
Preferably, the method further includes:
Re-running the major-asset-class allocation model on the portfolio assets at a preset period, so that the investment weights of the portfolio assets are updated regularly.
Preferably, the major-asset-class allocation model is a risk parity model.
A second aspect of the application provides a fund selection device, the device comprising:
A construction module, for reading financial behavior data and constructing factors of the financial behavior data from the data read, wherein the factors include one or more of annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, size of the fund company, size of the fund, and the fund manager's years of professional experience;
A data division module, for dividing the financial behavior data and its factors into a training dataset and a test dataset;
A fitting module, for fitting multiple classifiers using the training dataset and the test dataset, wherein the classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier;
A classifier determining module, for determining an optimal classifier from the fitted classifiers;
A portfolio assets determining module, for obtaining current fund data and using the optimal classifier to select a preset number of funds from the current fund data obtained as the portfolio assets; and
An asset allocation module, for allocating the portfolio assets with a major-asset-class allocation model, thereby determining the investment weight of each fund in the portfolio assets.
A third aspect of the application provides electronic equipment comprising a processor, the processor being configured to implement the fund selection method when executing a computer program stored in a memory.
A fourth aspect of the application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the fund selection method.
The present solution determines an optimal classifier from the fitted classifiers and uses it to select a preset number of funds from the current fund data obtained as the portfolio assets, thereby improving the accuracy of fund selection predictions.
Brief description of the drawings
Fig. 1 is a flow chart of the fund selection method in an embodiment of the present invention.
Fig. 2 is a structural diagram of the fund selection device in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a preferred embodiment of the electronic equipment of the present invention.
Detailed description
To make the objects, features and advantages of the present invention easier to understand, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with each other.
Numerous specific details are set forth in the following description to provide a full understanding of the invention. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the invention.
Preferably, the fund selection method of the invention is applied in one or more electronic devices. An electronic device is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The electronic device may be a computing device such as a desktop computer, a notebook computer, a tablet computer or a cloud server. The device can interact with the user through a keyboard, mouse, remote control, touch pad, voice-control device or similar means.
Embodiment 1
In this embodiment, the method is applied across multiple business systems. For example, the multiple business systems 1 may be a first business system, a second business system, ..., and an N-th business system. In this embodiment, the business systems may be those of financial service institutions within the same financial system; for example, each bank or securities company has its own business system, so the first business system may belong to financial service institution A and the second to financial service institution B. Each business system includes at least one server, which connects over a network to the terminal devices of all clients of the financial service institution and to the terminal devices of its staff. In this embodiment, the at least one server is connected to a storage device. The storage device stores the data of all clients of the financial service institution, such as client identity information, account information and asset allocation information. The storage device may be a local storage device or one connected over a network.
Fig. 1 is a flow chart of the fund selection method in an embodiment of the present invention. Depending on requirements, the order of the steps in the flow chart may be changed, and some steps may be omitted.
As shown in Fig. 1, the method specifically includes the following steps:
Step S21: read financial behavior data, and construct the factors of the financial behavior data from the data read.
In this embodiment, the financial behavior data can be read from the server of each business system. The financial behavior data includes, but is not limited to, publicly offered funds, net value data of publicly offered funds, fund company data, and fund manager data. In this embodiment, the factors of the financial behavior data are constructed after the data is read. For example, in one embodiment, the factors constructed from the financial behavior data may include, but are not limited to, annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, size of the fund company, size of the fund, and the fund manager's years of professional experience. The annualized return of a fund is the expected return from purchasing the fund product, converted to an annual rate. The maximum drawdown of a fund is the largest decline in return measured from any historical point within the selected period to the subsequent lowest point of the product's net value; it describes the worst case that could occur after purchasing the fund product. The Sharpe ratio is a standardized index for evaluating fund performance. The downside standard deviation of a fund measures how much the fund may fluctuate: the larger the standard deviation, the more the fund's future net value may change, the lower its stability, and the higher the investment risk. The Sortino ratio is a measure of a portfolio's relative performance: the higher the ratio, the higher the excess return the fund obtains per unit of downside risk.
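As a minimal sketch of how the return-based factors above could be computed, the following assumes a fund's net values are given as a simple list, oldest first. The function name `fund_factors`, the parameters `periods_per_year` and `risk_free`, and the annualization convention are illustrative assumptions, not details taken from the patent.

```python
import math

def fund_factors(nav, periods_per_year=252, risk_free=0.0):
    """Compute illustrative factors from a list of net values (oldest first)."""
    rets = [nav[i] / nav[i - 1] - 1.0 for i in range(1, len(nav))]
    n = len(rets)
    # Annualized return from the compounded total return over the window.
    annual_return = (nav[-1] / nav[0]) ** (periods_per_year / n) - 1.0
    # Maximum drawdown: largest decline from a running peak to a later trough.
    peak, max_dd = nav[0], 0.0
    for v in nav:
        peak = max(peak, v)
        max_dd = max(max_dd, (peak - v) / peak)
    rf = risk_free / periods_per_year          # per-period risk-free rate
    mean_excess = sum(r - rf for r in rets) / n
    vol = math.sqrt(sum((r - rf - mean_excess) ** 2 for r in rets) / (n - 1))
    # Downside deviation: only returns below the risk-free rate contribute.
    downside = math.sqrt(sum(min(r - rf, 0.0) ** 2 for r in rets) / n)
    ann = math.sqrt(periods_per_year)
    return {
        "annual_return": annual_return,
        "max_drawdown": max_dd,
        "sharpe": mean_excess / vol * ann if vol > 0 else float("nan"),
        "downside_std": downside * ann,
        "sortino": mean_excess / downside * ann if downside > 0 else float("nan"),
    }
```

The non-return factors (fund company size, fund size, manager experience) would simply be read from the data rather than computed.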
In this embodiment, step S21 ("read financial behavior data, and construct the factors of the financial behavior data from the data read") further includes: cleaning the financial behavior data obtained, and constructing the factors from the cleaned data. In this embodiment, cleaning the financial behavior data obtained includes:
Removing funds that have no net value from the financial behavior data;
Removing structured funds and money market funds from the financial behavior data;
Removing funds that, as of the current month-end, have been listed for less than one year;
Removing funds whose net value was not updated on more than 20% of the trading days in the year preceding the current month-end;
Removing funds whose net assets at the current month-end are below 10,000,000 yuan.
In this embodiment, removing funds listed for less than one year means removing funds for which less than one year has passed between issuance and the current month-end. Removing funds whose net value was not updated on more than 20% of trading days means that, if a fund's net value went without an update on more than 20% of the trading days in the year before the current month-end, the fund is removed. In this embodiment, the financial behavior data read is cleaned with respect to fund type, fund net value and similar aspects, yielding financial behavior data that can be used for research.
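The five screening rules can be sketched as a single filter. The record layout below (keys `nav`, `type`, `listed_days`, `stale_ratio`, `net_assets`) is a hypothetical one chosen for illustration; the patent does not specify how the data is stored.

```python
def clean_funds(funds, min_listed_days=365, max_stale_ratio=0.20,
                min_net_assets=10_000_000):
    """Return the funds that pass all five screening rules.

    Each fund is a dict with hypothetical keys:
      nav          latest net value, or None if missing
      type         e.g. "equity", "structured", "money_market"
      listed_days  days from issuance to the current month-end
      stale_ratio  share of trading days in the past year without a net value update
      net_assets   net assets at the current month-end, in yuan
    """
    kept = []
    for f in funds:
        if f["nav"] is None:                              # rule 1: no net value
            continue
        if f["type"] in ("structured", "money_market"):   # rule 2: excluded types
            continue
        if f["listed_days"] < min_listed_days:            # rule 3: listed < 1 year
            continue
        if f["stale_ratio"] > max_stale_ratio:            # rule 4: stale net value
            continue
        if f["net_assets"] < min_net_assets:              # rule 5: too small
            continue
        kept.append(f)
    return kept
```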
Step S22: divide the financial behavior data and its factors into a training dataset and a test dataset.
In this embodiment, the financial behavior data can be obtained from the business systems 1 at a fixed time each month (for example, the 1st of each month) and built into factors, and the financial behavior data is labeled to generate label data. In this embodiment, the label data can be generated as follows: judge whether the next-month return in the financial behavior data exceeds the index of funds of the same category; if so, label the corresponding data 1, and otherwise label it 0. The factors and the generated label data are then randomly divided into a training dataset and a test dataset.
In this embodiment, the training dataset and the test dataset are different. The training dataset consists of one part of the factors and label data, and the test dataset consists of another part. In this embodiment, the factors and label data are divided randomly at a preset ratio to generate the training dataset and the test dataset. The preset ratio is the ratio of the amount of data in the training dataset to the amount in the test dataset, and can be chosen according to the specific application. In one embodiment, the preset ratio of training dataset to test dataset is 8:2; that is, the training dataset consists of a random 80% of the factors and label data, and the test dataset consists of the remaining 20%.
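The labeling rule and the 8:2 random split can be sketched as follows. The helper names `label_funds` and `split_dataset` and the fixed `seed` are assumptions made for reproducibility of the illustration.

```python
import random

def label_funds(next_month_returns, index_return):
    """Label 1 if a fund's next-month return beats its category index, else 0."""
    return [1 if r > index_return else 0 for r in next_month_returns]

def split_dataset(factors, labels, train_ratio=0.8, seed=0):
    """Randomly split (factor row, label) pairs at the preset ratio."""
    rows = list(zip(factors, labels))
    random.Random(seed).shuffle(rows)       # seeded only to make the sketch repeatable
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]
```

With ten rows and the default ratio, the split yields eight training rows and two test rows, and no row appears in both sets.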
Step S23: fit multiple classifiers using the training dataset and the test dataset.
In this embodiment, the multiple classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier. In this embodiment, the training dataset and test dataset at each fixed time point (for example, the 1st of each month) can be used to fit the logistic regression classifier, the support vector machine classifier, the Gaussian naive Bayes classifier and the random forest classifier respectively. In another embodiment, variants of each classifier derived by changing its parameters can also be fitted to the training dataset and test dataset at each fixed time point.
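One way to fit the four classifier families is with scikit-learn, which is an assumption of this sketch rather than something the patent names. The hyperparameters, the synthetic two-factor data and the helper names are all illustrative; here the models are fitted on the training rows and scored on held-out rows.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def fit_classifiers(X_train, y_train):
    """Fit the four classifier families named in the text; return them by name."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(),
        "gaussian_nb": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models

def make_toy_data(n_per_class=40, seed=0):
    """Synthetic data: label 1 clusters near (2, 2), label 0 near (-2, -2)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (0, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            y.append(label)
    return X, y

X, y = make_toy_data()
models = fit_classifiers(X[::2], y[::2])              # even rows as training data
scores = {name: m.score(X[1::2], y[1::2]) for name, m in models.items()}
```

Ranking funds by predicted probability, as later steps do, would use each model's `predict_proba`; in scikit-learn, `SVC` provides that method only when constructed with `probability=True`.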
Step S24: determine the optimal classifier from the fitted classifiers.
In this embodiment, each fitted classifier is tested with the test dataset and with the current funds to determine its final index A, and the classifier with the largest final index A is determined as the optimal classifier. Specifically, the step "determine the optimal classifier from the fitted classifiers" includes:
(S241) Test each fitted classifier with the test dataset, and calculate the first stability coefficient C1 of each classifier.
In this embodiment, when each fitted classifier is tested with the test dataset, the data in the test dataset is sorted by the probability value output by the classifier, from largest to smallest, and divided into 5 groups: the first, second, third, fourth and fifth groups. If the return of the first group produced by a classifier is the highest of the five, the test result for that classifier is recorded as 1; otherwise it is recorded as 0. Each classifier is then tested on the test datasets obtained at the fixed time points (for example, the 1st of each month) over the past several years, and the result at each time point is recorded by the same rule. This yields an array of 1s and 0s, and the mean of the array is the first stability coefficient C1. The first stability coefficient C1 represents the stability of the classifier when applied to the test data.
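The stability coefficient described above can be sketched as the fraction of periods in which the top probability group has the highest mean return. The function names and the choice to put any leftover funds in the last group are assumptions of this sketch, and each period needs at least as many funds as groups.

```python
def top_group_wins(probs, returns, n_groups=5):
    """Return 1 if the highest-probability group has the highest mean return, else 0."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    size = len(order) // n_groups
    means = []
    for g in range(n_groups):
        start = g * size
        # Assumption: the last group absorbs any remainder.
        idx = order[start:start + size] if g < n_groups - 1 else order[start:]
        means.append(sum(returns[i] for i in idx) / len(idx))
    return 1 if means[0] == max(means) else 0

def stability_coefficient(periods):
    """periods: list of (probabilities, realized returns), one pair per time point."""
    flags = [top_group_wins(p, r) for p, r in periods]
    return sum(flags) / len(flags)
```

The same function serves for both C1 (periods drawn from the test datasets) and C2 (periods drawn from the current funds); only the input data differs.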
(S242) Test each fitted classifier with the current funds, and calculate the second stability coefficient C2 of each classifier.
In this embodiment, when each fitted classifier is tested with the current funds, the current funds are sorted by the probability value output by the classifier, from largest to smallest, and divided into 5 groups: the first, second, third, fourth and fifth groups. If the return of the first group of current funds produced by a classifier is the highest of the five, the test result for that classifier is recorded as 1; otherwise it is recorded as 0. Each classifier is then tested on the current funds at the fixed time points (the 1st of each month) over the past several years, and the result at each time point is recorded by the same rule. This yields an array of 1s and 0s, and the mean of the array is the second stability coefficient C2. The second stability coefficient C2 represents the stability of the classifier when used for real prediction.
(S243) Test each fitted classifier with the current funds, and calculate the information ratio IR (Information Ratio) of each classifier's head portfolio.
In this embodiment, when each fitted classifier is tested with the current funds, the current funds are sorted by the probability value output by the classifier, from largest to smallest, and the top N funds are taken. This is repeated for each classifier on the current funds at the fixed time points over the past several years: at each time point the funds are sorted by the classifier's output probability and the top N are taken to form the head portfolio. The information ratio IR of each classifier's head portfolio is then calculated.
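The patent does not spell out how IR is computed, so the sketch below uses the standard definition: mean active return divided by the sample standard deviation of active return. The equal weighting of the head portfolio and the choice of benchmark series are assumptions.

```python
import math

def head_portfolio_return(probs, returns, n):
    """Equal-weighted return of the n funds with the highest predicted probability."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n]
    return sum(returns[i] for i in top) / n

def information_ratio(portfolio_returns, benchmark_returns):
    """Mean active return divided by its sample standard deviation (tracking error)."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    mean = sum(active) / len(active)
    var = sum((a - mean) ** 2 for a in active) / (len(active) - 1)
    return mean / math.sqrt(var)
```

In use, `head_portfolio_return` would be called once per historical time point to build the portfolio return series passed to `information_ratio`.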
(S244) Calculate the final index A of each classifier according to the formula A=C1*C2*IR.
(S245) Determine the classifier with the largest final index A as the optimal classifier.
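Steps S244 and S245 amount to a product followed by an argmax. A minimal sketch, with a hypothetical `metrics` mapping from classifier name to its (C1, C2, IR) triple:

```python
def select_best(metrics):
    """metrics maps classifier name -> (C1, C2, IR).

    Returns the name with the largest A = C1 * C2 * IR and all A values.
    """
    scores = {name: c1 * c2 * ir for name, (c1, c2, ir) in metrics.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because A is a product, a classifier that is unstable on either dataset (C1 or C2 near 0) is penalized even if its head portfolio has a high information ratio.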
Step S25: obtain current fund data, and use the optimal classifier to select a preset number of funds from the current fund data obtained as the portfolio assets.
In this embodiment, the current fund data is tested with the optimal classifier, the current fund data is sorted by the probability value output by the optimal classifier from largest to smallest, and the top N funds are taken as the portfolio assets.
Step S26: allocate the portfolio assets with the major-asset-class allocation model, thereby determining the investment weights of the portfolio assets.
In this embodiment, the major-asset-class allocation model may be a risk parity model. In this embodiment, allocating the portfolio assets with the major-asset-class allocation model means applying the risk parity model to the portfolio assets to determine their investment weights.
In this embodiment, the risk parity model is based on principal component analysis: the portfolio assets are linearly combined to form mutually uncorrelated portfolios, and the risk parity allocation is then performed on these uncorrelated portfolios to determine the final investment weights of the portfolio assets. For example, in one embodiment, suppose the portfolio contains N assets with returns $R = [r_1, r_2, \dots, r_N]'$ and portfolio weights $w = [w_1, w_2, \dots, w_N]'$. The total return of the portfolio is $R_w = w'R$. The covariance matrix of the assets, $\Sigma = \mathrm{Cov}(R)$, is then computed from the returns of the N assets. Because $\Sigma$ is symmetric, it can be decomposed into N orthogonal eigenvectors:
$$E \Lambda E' = \Sigma,$$
where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$ is the diagonal matrix of the eigenvalues of $\Sigma$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$, and $E$ is the matrix whose columns are the eigenvectors $e_i$ corresponding to $\lambda_i$. $E$ is orthogonal, so $E' = E^{-1}$ and $E'E = I$. The covariance matrix can therefore be decomposed as
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \dots + \lambda_N e_N e_N'.$$
The eigenvectors form N orthogonal portfolios, called the principal component factors. The returns of the principal component factors are defined as $R_{PC} = E'R$, and
$$\mathrm{Cov}(R_{PC}) = \mathrm{Cov}(E'R) = E'\,\mathrm{Cov}(R)\,E = E'\Sigma E = E'E\Lambda E'E = \Lambda.$$
For a single principal component portfolio, $\mathrm{Var}(e_i'R) = e_i'\Sigma e_i = \lambda_i$. For any two distinct principal component factors $e_i'R$ and $e_j'R$, $\mathrm{Cov}(e_i'R, e_j'R) = e_i'\Sigma e_j = 0$. The N principal component factors are thus uncorrelated, and their variances equal $\lambda_1, \lambda_2, \dots, \lambda_N$ respectively. The weights of the principal component factors are therefore a linear combination of the original weights: $w_{PC} = E'w$. In this embodiment, the total return of the principal component factors is
$$R_w = w_{PC}'R_{PC} = (E'w)'(E'R) = w'EE'R = w'R.$$
For the principal component risk parity model, the definition of the risk contribution $RC_i$ gives
$$RC_i = w_{PC,i}\,\frac{\partial \sigma_p}{\partial w_{PC,i}} = \frac{w_{PC,i}^2\,\lambda_i}{\sigma_p}, \qquad \sigma_p = \sqrt{w_{PC}'\,\Lambda\,w_{PC}}.$$
The risk parity model of the principal component factors can then be written as
$$RC_i = RC_j \quad \text{for all } i, j.$$
This condition can be converted into an optimization model whose solution gives the optimal weights, which defines the principal component risk parity model:
$$\min_{w} \sum_{i=1}^{N}\sum_{j=1}^{N} \left(RC_i - RC_j\right)^2.$$
When the objective function equals 0, $RC_i = RC_j$ for all $i$ and $j$, and the numerical solution is the weight vector of the principal component risk parity portfolio.
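To make the parity condition concrete, the sketch below works directly in principal component space, taking the eigenvalues as given inputs. It assumes the risk contribution of component i is $w_{PC,i}^2 \lambda_i / \sigma_p$ with $\sigma_p = \sqrt{\sum_i w_{PC,i}^2 \lambda_i}$, and that weights are positive and normalized to sum to 1; under those assumptions equal risk contributions imply $w_{PC,i} \propto 1/\sqrt{\lambda_i}$. This closed form is a simplification for illustration; the text describes solving an optimization problem numerically.

```python
import math

def pc_risk_parity_weights(eigenvalues):
    """Positive principal-component weights with equal risk contributions.

    Assumption: weights are normalized to sum to 1.
    """
    raw = [1.0 / math.sqrt(lam) for lam in eigenvalues]
    total = sum(raw)
    return [r / total for r in raw]

def risk_contributions(weights, eigenvalues):
    """RC_i = w_i^2 * lambda_i / sigma_p for uncorrelated components."""
    sigma_p = math.sqrt(sum(w * w * lam for w, lam in zip(weights, eigenvalues)))
    return [w * w * lam / sigma_p for w, lam in zip(weights, eigenvalues)]
```

Mapping the result back to asset weights would use $w = E\,w_{PC}$, since $w_{PC} = E'w$ and $E$ is orthogonal.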
In this embodiment, the method further includes:
Step S27: re-run the major-asset-class allocation model on the portfolio assets at a preset period, so that the investment weights of the portfolio assets are updated regularly. In this embodiment, the preset period is one month, one quarter or one year.
The present solution determines an optimal classifier from the fitted classifiers and uses it to select a preset number of funds from the current fund data obtained as the portfolio assets, thereby improving the accuracy of fund selection predictions.
Embodiment 2
Fig. 2 is a structural diagram of the fund selection device 10 in an embodiment of the present invention.
In some embodiments, the fund selection device 10 runs in an electronic device. The fund selection device 10 may include multiple functional modules composed of program code segments. The program code of each segment in the fund selection device 10 can be stored in a memory and executed by at least one processor to perform the asset allocation function.
In this embodiment, the fund selection device 10 of the electronic device can be divided into multiple functional modules according to the functions it performs. As shown in Fig. 2, the fund selection device 10 may include a construction module 301, a data division module 302, a fitting module 303, a classifier determining module 304, a portfolio assets determining module 305, an asset allocation module 306 and an update module 307. A module in the sense of the present invention is a series of computer program segments that is stored in a memory, can be executed by at least one processor, and performs a fixed function. The functions of the modules are described in detail in the following embodiments.
The construction module 301 is used to read financial behavior data and to construct the factors of the financial behavior data from the data read.
In this embodiment, the construction module 301 can read the financial behavior data from the server 11 of each business system 1. The financial behavior data includes, but is not limited to, publicly offered funds, net value data of publicly offered funds, fund company data, and fund manager data. In this embodiment, the factors of the financial behavior data are constructed after the data is read. For example, in one embodiment, the factors may include, but are not limited to, annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, size of the fund company, size of the fund, and the fund manager's years of professional experience. The annualized return of a fund is the expected return from purchasing the fund product, converted to an annual rate. The maximum drawdown of a fund is the largest decline in return measured from any historical point within the selected period to the subsequent lowest point of the product's net value; it describes the worst case that could occur after purchasing the fund product. The Sharpe ratio is a standardized index for evaluating fund performance. The downside standard deviation of a fund measures how much the fund may fluctuate: the larger the standard deviation, the more the fund's future net value may change, the lower its stability, and the higher the investment risk. The Sortino ratio is a measure of a portfolio's relative performance: the higher the ratio, the higher the excess return the fund obtains per unit of downside risk.
In the present embodiment, the building module 301 is also used to clean the acquired banking operation data and to construct the factors of the cleaned banking operation data from the data after cleaning. In the present embodiment, cleaning the acquired banking operation data includes:
removing funds without a net value from the banking operation data;
removing structured (graded) funds and money market funds from the banking operation data;
removing from the banking operation data funds that, as of the current month-end, have been listed for less than 1 year;
removing from the banking operation data funds whose net value went without update on a consecutive run of more than 20% of the trading days within the year preceding the current month-end;
removing from the banking operation data funds whose net assets at the current month-end are below 10 million yuan.
In the present embodiment, removing funds that have been listed for less than 1 year as of the current month-end means removing from the banking operation data the funds for which the time from issuance to the current month-end is less than 1 year. Removing funds whose net value went without update means removing from the banking operation data the funds for which, within the year preceding the current month-end, the net value was not updated on a consecutive run of more than 20% of the trading days. For example, if the net value of a certain fund was not updated on a consecutive run of more than 20% of the trading days within the year preceding the current month-end, that fund is removed.
In the present embodiment, the banking operation data read are cleaned with respect to fund type, fund net value and the like, yielding banking operation data that can be used for research.
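The five cleaning rules can be sketched as a simple filter. The record layout is hypothetical: the field names `nav`, `fund_type`, `listed_on`, `stale_day_ratio` and `net_assets`, the type codes in `EXCLUDED_TYPES`, and the reading of the 20% rule as a precomputed share of trading days without a net value update are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical type codes for structured (graded) and money market funds.
EXCLUDED_TYPES = {"structured", "money_market"}

def clean_funds(funds, month_end):
    """Apply the five exclusion rules to a list of fund records (dicts)."""
    kept = []
    for fund in funds:
        if fund["nav"] is None:
            continue  # rule 1: no net value
        if fund["fund_type"] in EXCLUDED_TYPES:
            continue  # rule 2: structured or money market fund
        if (month_end - fund["listed_on"]).days < 365:
            continue  # rule 3: listed for less than 1 year at month-end
        if fund["stale_day_ratio"] > 0.20:
            continue  # rule 4: net value stale on more than 20% of trading days
        if fund["net_assets"] < 10_000_000:
            continue  # rule 5: net assets below 10 million yuan
        kept.append(fund)
    return kept
```

Applying the rules in this order is a design choice with no effect on the result, since each rule is an independent exclusion.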
The data division module 302 is used to divide the banking operation data and the factors of the banking operation data into a training data set and a test data set.
In the present embodiment, at a fixed time each month (for example, the 1st of each month), after the banking operation data have been obtained from the business system 1 and the factors of the banking operation data have been constructed, the data division module 302 labels the banking operation data to generate label data. In the present embodiment, the label data can be generated as follows: determine whether the next-month return of the banking operation data exceeds the index of comparable funds; if it does, label the corresponding banking operation data 1, otherwise label the corresponding banking operation data 0. The data division module 302 also randomly divides the factors of the banking operation data and the generated label data into the training data set and the test data set.
In the present embodiment, the training data set and the test data set are different. The training data set consists of one part of the factors of the banking operation data and the label data, and the test data set consists of another part of the factors of the banking operation data and the label data. In the present embodiment, the data division module 302 randomly divides the factors of the banking operation data and the label data at a preset ratio to generate the training data set and the test data set. The preset ratio is the ratio of the amount of data in the training data set to the amount of data in the test data set, and can be determined according to the specific application. In one embodiment, the preset ratio of the training data set to the test data set is 8:2; that is, the training data set consists of a randomly chosen 80% of the factors of the banking operation data and the label data, and the test data set consists of the remaining 20%.
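The labeling rule and the 8:2 random split can be sketched as below. The function names, the `seed` parameter, and the pairing of each fund with the return of its own comparable-fund index are assumptions; the text only states the rule and the ratio.

```python
import random

def make_labels(next_month_returns, peer_index_returns):
    """Label a fund 1 if its next-month return beats its peer index, else 0."""
    return [1 if fund > index else 0
            for fund, index in zip(next_month_returns, peer_index_returns)]

def split_dataset(factors, labels, train_ratio=0.8, seed=0):
    """Randomly split (factor row, label) pairs into training and test sets."""
    samples = list(zip(factors, labels))
    random.Random(seed).shuffle(samples)  # seeded only for reproducibility
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```

With ten samples and the default ratio, `split_dataset` returns eight training pairs and two test pairs, and every sample lands in exactly one of the two sets.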
The fitting module 303 fits multiple classifiers using the training data set and the test data set.
In the present embodiment, the multiple classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier. In the present embodiment, the training data set and the test data set at each fixed time point (for example, the 1st of each month) can be used to fit the logistic regression classifier, the support vector machine classifier, the Gaussian naive Bayes classifier and the random forest classifier respectively. In another embodiment, classifiers derived from each of these classifiers with different parameters can also be fitted to the training data set and the test data set at each fixed time point.
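To make concrete what fitting a classifier and reading its output probability involve, the sketch below implements one of the four named families, Gaussian naive Bayes, in plain Python. The text names no library or implementation, so the class `TinyGaussianNB` and its method names are hypothetical, and the sketch fits on training rows only, which is an assumption. In practice an established machine learning library would normally be used for all four classifiers.

```python
import math

class TinyGaussianNB:
    """Minimal Gaussian naive Bayes for binary labels 0/1 (illustrative only)."""

    def fit(self, X, y):
        self.params = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small constant keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                         for col, m in zip(cols, means)]
            self.params[c] = (len(rows) / len(y), means, variances)
        return self

    def predict_proba(self, x):
        """Return the estimated probability that factor row x has label 1."""
        log_post = {}
        for c, (prior, means, variances) in self.params.items():
            total = math.log(prior)
            for v, m, var in zip(x, means, variances):
                total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_post[c] = total
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_post.values())
        weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
        return weights.get(1, 0.0) / sum(weights.values())
```

The probability returned by `predict_proba` plays the role of the classifier output by which funds are ranked in the later steps.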
The classifier determining module 304 determines an optimal classifier from the fitted classifiers.
In the present embodiment, the classifier determining module 304 tests each fitted classifier using the test data set and the current funds, determines the final index A of each classifier, and takes the classifier with the largest final index A as the optimal classifier.
In a specific embodiment, the classifier determining module 304 tests each fitted classifier using the test data set and calculates the first stability coefficient C1 of each classifier.
In the present embodiment, when each fitted classifier is tested using the test data set, the data in the test data set are divided into 5 tiers in descending order of the probability value output by the classifier, namely a first tier, a second tier, a third tier, a fourth tier and a fifth tier. If the return of the first tier of the data divided by a given classifier is the highest, the test result of that classifier is recorded as 1; otherwise it is recorded as 0. Each classifier is then tested on the test data sets obtained over the time series of fixed time points in the past several years (for example, the 1st of each month in each year), and the test result of each classifier is marked according to the above rule. This yields an array of 1s and 0s, and averaging the array gives the first stability coefficient C1. The first stability coefficient C1 represents how stable the classifier is when applied to the test data.
The classifier determining module 304 also tests each fitted classifier using the current funds and calculates the second stability coefficient C2 of each classifier.
In the present embodiment, when the classifier determining module 304 tests each fitted classifier using the current funds, the current funds are divided into 5 tiers in descending order of the probability value output by the classifier, namely a first tier, a second tier, a third tier, a fourth tier and a fifth tier. If the return of the first tier of the current funds divided by a given classifier is the highest, the test result of that classifier is recorded as 1; otherwise it is recorded as 0. Each classifier is then tested on the current funds over the time series of fixed time points in the past several years (for example, the 1st of each month), and the test result of each classifier is marked according to the above rule. This yields an array of 1s and 0s, and averaging the array gives the second stability coefficient C2. The second stability coefficient C2 represents how stable the classifier is when used for real prediction.
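The stability coefficients C1 and C2 follow the same procedure on different data, which can be sketched as below. The function names are hypothetical, and three details are assumptions the text leaves open: a tier's return is taken as the equal-weighted mean of its members, ties for the highest tier count as a win, and any remainder after equal division goes to the last tier.

```python
def top_tier_is_best(probabilities, realized_returns, tiers=5):
    """Return 1 if the highest-probability tier has the highest mean return.

    Needs at least `tiers` funds so that no tier is empty.
    """
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    size = len(order) // tiers
    groups = [order[k * size:(k + 1) * size] for k in range(tiers - 1)]
    groups.append(order[(tiers - 1) * size:])  # last tier takes the remainder
    means = [sum(realized_returns[i] for i in g) / len(g) for g in groups]
    return 1 if means[0] == max(means) else 0

def stability_coefficient(periods, tiers=5):
    """Average the 0/1 outcomes over (probabilities, returns) pairs, one per date."""
    flags = [top_tier_is_best(p, r, tiers) for p, r in periods]
    return sum(flags) / len(flags)
```

Passing the test data sets at each fixed time point gives C1; passing the current funds at each fixed time point gives C2.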
The classifier determining module 304 also tests each fitted classifier using the current funds and calculates the information ratio IR (Information Ratio) of the head portfolio of each classifier.
In the present embodiment, when the classifier determining module 304 tests each fitted classifier using the current funds, the current funds are sorted in descending order of the probability value output by the classifier and the top N funds are taken. Each classifier is then tested on the current funds over the time series of fixed time points in the past several years; at each time point the current funds are sorted in descending order of the probability value output by the classifier according to the above rule, and the top N funds are taken to form the head portfolio. The information ratio IR of the head portfolio corresponding to each classifier is then calculated.
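A minimal sketch of the head portfolio and its information ratio follows. The text does not define the benchmark or the weighting, so the equal-weighted head portfolio, the `benchmark_returns` argument, the unannualized ratio and the function names are assumptions.

```python
import statistics

def head_portfolio_return(probabilities, realized_returns, n):
    """Equal-weighted return of the n funds with the highest probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    head = order[:n]
    return sum(realized_returns[i] for i in head) / len(head)

def information_ratio(portfolio_returns, benchmark_returns):
    """Mean active return divided by its sample standard deviation.

    Needs at least two periods, and the active returns must not all be equal.
    """
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return statistics.mean(active) / statistics.stdev(active)
```

Calling `head_portfolio_return` once per fixed time point produces the return series that `information_ratio` compares against the benchmark series.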
The classifier determining module 304 calculates the final index A of each classifier according to the formula A = C1 * C2 * IR, and takes the classifier with the largest final index A as the optimal classifier.
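The selection rule can be sketched in a few lines. The function name and the dictionary layout (classifier name mapped to a `(C1, C2, IR)` triple) are illustrative assumptions.

```python
def select_best_classifier(metrics):
    """Pick the classifier with the largest final index A = C1 * C2 * IR.

    `metrics` maps a classifier name to its (C1, C2, IR) triple.
    """
    scores = {name: c1 * c2 * ir for name, (c1, c2, ir) in metrics.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because A is a product, a classifier with a negative IR receives a negative score whenever C1 and C2 are positive, so it can only be selected if every candidate scores no higher.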
The investment portfolio assets determining module 305 obtains current fund data and uses the optimal classifier to select a preset number of funds from the current fund data obtained, thereby determining the investment portfolio assets.
In the present embodiment, the investment portfolio assets determining module 305 applies the optimal classifier to the current fund data, sorts the current fund data in descending order of the probability value output by the optimal classifier, and takes the top N funds as the investment portfolio assets.
The asset allocation module 306 performs the asset allocation of a major asset class allocation model on the investment portfolio assets, thereby determining the investment weights of the investment portfolio assets.
In the present embodiment, the major asset class allocation model can be a risk parity model. In the present embodiment, performing the asset allocation of the major asset class allocation model on the investment portfolio assets means performing the asset allocation of the risk parity model on the investment portfolio assets to determine their investment weights.
In the present embodiment, the risk parity model is based on principal component analysis: uncorrelated portfolio assets are formed by linear combination of the investment portfolio assets, the asset allocation of the risk parity model is carried out on these uncorrelated assets, and the investment weights of the investment portfolio assets are finally determined. For example, in one embodiment, suppose the investment portfolio assets comprise $N$ assets with returns $R = [r_1, r_2, \dots, r_N]'$ and portfolio weights $w = [w_1, w_2, \dots, w_N]'$. The total return of the portfolio is $R_w = w'R$. The covariance matrix of the assets, $\Sigma = \mathrm{Cov}(R)$, is then calculated from the returns of the $N$ assets. Because $\Sigma$ is symmetric, it can be decomposed into $N$ orthogonal eigenvectors: $E \Lambda E' = \Sigma$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$ is the diagonal matrix built from the eigenvalues of $\Sigma$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$; $E$ is the matrix whose columns are the eigenvectors $e_i$ corresponding to $\lambda_i$; and $E$ is orthogonal, so that $E' = E^{-1}$ and $E'E = I$. The covariance matrix can therefore be decomposed as $\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \dots + \lambda_N e_N e_N'$. The eigenvectors form $N$ orthogonal portfolios, which are called the principal component factors. The returns of the principal component factors can be defined as $R_{PC} = E'R$, and $\mathrm{Cov}(R_{PC}) = \mathrm{Cov}(E'R) = E'\,\mathrm{Cov}(R)\,E = E'\Sigma E = E'E\Lambda E'E = \Lambda$. For a single principal component portfolio, $\mathrm{Var}(e_i'R) = e_i'\Sigma e_i = \lambda_i$. For any two principal component factors $e_i'R$ and $e_j'R$ with $i \ne j$, $\mathrm{Cov}(e_i'R, e_j'R) = e_i'\Sigma e_j = 0$. The $N$ principal component factors are thus uncorrelated, and their variances equal $\lambda_1, \lambda_2, \dots, \lambda_N$ respectively. The weights of the principal component factors can therefore be formed as a linear combination of the original weights, namely $w_{PC} = E'w$. In the present embodiment, the total return of the principal component factors is $R_w = w_{PC}'R_{PC} = (E'w)'(E'R) = w'EE'R = w'R$.
For the risk parity model of the principal component factors, the risk contribution $RC_i$ of the $i$-th principal component factor is obtained from the definition of risk contribution.
The risk parity model of the principal component factors then requires that every principal component factor contribute the same risk, that is, $RC_i = RC_j$ for all $i$ and $j$.
This condition can be converted into an optimization model that is solved for the optimal weights, which defines the principal component risk parity model.
When the objective function equals 0, $RC_i = RC_j$ holds, and the numerical solution gives the weights of the principal component risk parity portfolio.
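As a rough illustration of the principal component risk parity idea, the sketch below assumes that the risk contribution of the i-th principal component factor is proportional to $w_{PC,i}^2 \lambda_i$ (the usual volatility-based definition, which the text does not spell out). Under that assumption equal contributions are reached in closed form by taking $w_{PC,i}$ proportional to $1/\sqrt{\lambda_i}$, so no numerical optimizer is used here, unlike the optimization model described above. The Jacobi eigen-solver, the sign convention for eigenvectors, the normalization to weights summing to 1, and all function names are illustrative choices; the covariance matrix is assumed positive definite and short positions are not ruled out.

```python
import math

def jacobi_eigh(matrix, max_sweeps=100, tol=1e-30):
    """Eigen-decomposition of a symmetric matrix via cyclic Jacobi rotations.

    Returns (values, vectors) sorted by descending eigenvalue, where
    vectors[k][i] is component k of the i-th eigenvector.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # right-multiply a and v by the rotation
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # left-multiply a by the transposed rotation
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for i in order] for k in range(n)]
    return values, vectors

def pc_risk_parity_weights(cov):
    """Asset weights whose principal component risk contributions are equal."""
    values, vectors = jacobi_eigh(cov)
    n = len(cov)
    weights = [0.0] * n
    for i in range(n):
        # Eigenvector signs are arbitrary; orient each to a non-negative sum.
        sign = 1.0 if sum(vectors[k][i] for k in range(n)) >= 0 else -1.0
        w_pc_i = sign / math.sqrt(values[i])  # proportional to 1/sqrt(lambda_i)
        for k in range(n):
            weights[k] += vectors[k][i] * w_pc_i  # w = E w_PC
    total = sum(weights)
    return [x / total for x in weights]

def pc_risk_contributions(cov, weights):
    """Unnormalized contributions w_PC,i^2 * lambda_i for given asset weights."""
    values, vectors = jacobi_eigh(cov)
    n = len(cov)
    w_pc = [sum(vectors[k][i] * weights[k] for k in range(n)) for i in range(n)]
    return [w_pc[i] ** 2 * values[i] for i in range(n)]
```

Because each eigenvector may be flipped in sign, there are several weight vectors with equal principal component risk contributions; the orientation used here is only one of them, and a long-only requirement would call for the constrained optimization instead.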
In the present embodiment, the fund selection device 10 further includes the update module 307.
The update module 307 performs the asset allocation of the major asset class allocation model on the investment portfolio assets at a predetermined period, so that the investment weights of the investment portfolio assets are updated regularly. In the present embodiment, the predetermined period is one month, one quarter or one year.
Embodiment three
Fig. 3 is a schematic diagram of a preferred embodiment of the electronic equipment 4 of the present invention.
The electronic equipment 4 includes a memory 41, a processor 42, and a computer program 43 that is stored in the memory 41 and can run on the processor 42. When executing the computer program 43, the processor 42 implements the steps of the fund selection method embodiment described above, for example steps S21 to S27 shown in Fig. 1. Alternatively, when executing the computer program 43, the processor 42 implements the functions of the modules/units of the fund selection device embodiment described above, for example the modules 301 to 307 in Fig. 2.
Illustratively, the computer program 43 can be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 42 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 43 in the electronic equipment 4. For example, the computer program 43 can be divided into the building module 301, the data division module 302, the fitting module 303, the classifier determining module 304, the investment portfolio assets determining module 305, the asset allocation module 306 and the update module 307 in Fig. 2; for the specific function of each module, see Embodiment two.
The electronic equipment 4 can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. Those skilled in the art will understand that the schematic diagram is only an example of the electronic equipment 4 and does not limit it; the electronic equipment 4 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the electronic equipment 4 may also include input/output devices, network access devices, buses and the like.
The processor 42 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor can be a microprocessor, or the processor 42 can be any conventional processor. The processor 42 is the control center of the electronic equipment 4 and connects the various parts of the entire electronic equipment 4 through various interfaces and lines.
The memory 41 can be used to store the computer program 43 and/or the modules/units. The processor 42 implements the various functions of the electronic equipment 4 by running or executing the computer program and/or modules/units stored in the memory 41 and by calling the data stored in the memory 41. The memory 41 can mainly include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required for at least one function (for example, a sound playback function or an image playback function); the data storage area can store data created through use of the electronic equipment 4 (for example, audio data or a phone book). In addition, the memory 41 may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
If the integrated modules/units of the electronic equipment 4 are implemented in the form of software function modules and sold or used as an independent product, they can be stored in a computer readable storage medium. On this understanding, all or part of the process of the method embodiments described above can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer readable storage medium, and when executed by a processor it implements the steps of each of the method embodiments described above. The computer program includes computer program code, which can be in source code form, object code form, an executable file, certain intermediate forms, and so on. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer readable medium can be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed electronic equipment and method can be implemented in other ways. For example, the electronic equipment embodiment described above is only schematic; the division into modules is only a division by logical function, and other divisions are possible in actual implementation.
In addition, the functional modules in the embodiments of the present invention can be integrated into the same processing module, each module can exist physically on its own, or two or more modules can be integrated into the same module. The integrated module can be implemented in the form of hardware, or in the form of hardware plus software function modules.
It is obvious to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all variations falling within the meaning and scope of equivalents of the claims are intended to be included in the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or electronic equipment stated in an electronic equipment claim can also be implemented by one and the same module or electronic equipment through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the present invention can be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A fund selection method, characterized in that the method comprises:
reading banking operation data, and constructing factors of the banking operation data from the banking operation data read, wherein the factors of the banking operation data include one or more of annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, the size of the fund company, the size of the fund, or the fund manager's length of service;
dividing the banking operation data and the factors of the banking operation data into a training data set and a test data set;
fitting multiple classifiers using the training data set and the test data set, wherein the fitted classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier;
determining an optimal classifier from the fitted classifiers;
obtaining current fund data, and using the optimal classifier to select a preset number of funds from the current fund data obtained, thereby determining investment portfolio assets; and
performing the asset allocation of a major asset class allocation model on the investment portfolio assets, and determining the investment weights of the funds in the investment portfolio assets.
2. The fund selection method according to claim 1, characterized in that reading banking operation data and constructing the factors of the banking operation data from the banking operation data read comprises: cleaning the acquired banking operation data, and constructing the factors of the cleaned banking operation data from the banking operation data after cleaning.
3. The fund selection method according to claim 1, characterized in that cleaning the acquired banking operation data comprises:
removing funds without a net value from the banking operation data;
removing structured (graded) funds and money market funds from the banking operation data;
removing from the banking operation data funds that, as of the current month-end, have been listed for less than 1 year;
removing from the banking operation data funds whose net value went without update on a consecutive run of more than 20% of the trading days within the year preceding the current month-end; and
removing from the banking operation data funds whose net assets at the current month-end are below 10,000,000 yuan.
4. The fund selection method according to claim 1, characterized in that dividing the banking operation data and the factors of the banking operation data into a training data set and a test data set comprises:
determining whether the next-month return of the banking operation data exceeds the index of comparable funds, labeling the banking operation data 1 if it does and 0 otherwise, and thereby labeling the banking operation data to generate label data; and
randomly dividing the factors of the banking operation data and the label data at a preset ratio to generate the training data set and the test data set.
5. The fund selection method according to claim 1, characterized in that determining an optimal classifier from the fitted classifiers comprises:
testing each fitted classifier using the test data set and calculating a first stability coefficient C1 of each classifier, wherein C1 represents how stable the classifier is when applied to the test data;
testing each fitted classifier using current funds and calculating a second stability coefficient C2 of each classifier, wherein C2 represents how stable the classifier is when applied to the current funds;
testing each fitted classifier using the current funds and calculating the head portfolio information ratio IR of each classifier;
calculating the final index A of each classifier according to the formula A = C1 * C2 * IR; and
taking the classifier with the largest final index A as the optimal classifier.
6. The fund selection method according to claim 1, characterized in that the method further comprises:
performing the asset allocation of the major asset class allocation model on the investment portfolio assets at a predetermined period, so that the investment weights of the investment portfolio assets are updated regularly.
7. The fund selection method according to claim 1, characterized in that the major asset class allocation model is a risk parity model.
8. A fund selection device, characterized in that the device comprises:
a building module for reading banking operation data and constructing factors of the banking operation data from the banking operation data read, wherein the factors of the banking operation data include one or more of annualized return, maximum drawdown, Sharpe ratio, downside standard deviation, Sortino ratio, the size of the fund company, the size of the fund, or the fund manager's length of service;
a data division module for dividing the banking operation data and the factors of the banking operation data into a training data set and a test data set;
a fitting module for fitting multiple classifiers using the training data set and the test data set, wherein the classifiers include a logistic regression classifier, a support vector machine classifier, a Gaussian naive Bayes classifier and a random forest classifier;
a classifier determining module for determining an optimal classifier from the fitted classifiers;
an investment portfolio assets determining module for obtaining current fund data and using the optimal classifier to select a preset number of funds from the current fund data obtained, thereby determining investment portfolio assets; and
an asset allocation module for performing the asset allocation of a major asset class allocation model on the investment portfolio assets and determining the investment weights of the funds in the investment portfolio assets.
9. An electronic equipment, characterized in that the electronic equipment includes a processor, and the processor implements the fund selection method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the fund selection method according to any one of claims 1 to 7.
CN201811536792.9A 2018-12-15 2018-12-15 Select based method, device, electronic equipment and computer readable storage medium Pending CN109767333A (en)


Publications (1)

Publication Number Publication Date
CN109767333A true CN109767333A (en) 2019-05-17

Family

ID=66451901


Country Status (1)

Country Link
CN (1) CN109767333A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination