WO2019200742A1 - Short-term profit prediction method, apparatus, computer device, and storage medium - Google Patents

Short-term profit prediction method, apparatus, computer device, and storage medium

Info

Publication number
WO2019200742A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
prediction
feature data
short
term
Prior art date
Application number
PCT/CN2018/095483
Other languages
English (en)
French (fr)
Inventor
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to JP2019570544A (patent JP6855604B2)
Publication of WO2019200742A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Definitions

  • The present application relates to the field of Internet technologies, and in particular to a short-term profit prediction method, apparatus, computer device, and storage medium.
  • Blockchain is a new, decentralized, trustless data architecture that is jointly owned, managed, and supervised by all nodes in the network and is not controlled by any single party. Because the blockchain is a new type of data architecture, the amount of data available in the early stage of a blockchain deployment is small. It is difficult for financial institutions such as banks to complete short-term profit predictions from this "small data", which leads to problems such as being unable to grant an appropriate loan amount.
  • The main purpose of the present application is to provide a method, apparatus, computer device, and storage medium for predicting the short-term profit of an enterprise when the amount of data related to that enterprise is small in the early stage of a blockchain deployment.
  • The present application proposes a short-term profit prediction method, used when the amount of data related to a loan object obtained on a blockchain is less than a preset amount. The prediction method includes: obtaining, from the blockchain, first related data related to the loan object;
  • The present application also provides a short-term profit prediction apparatus, used when the amount of data related to a loan object obtained on a blockchain is less than a preset amount. The prediction apparatus includes:
  • an obtaining unit configured to acquire, from the blockchain, first related data related to the loan object;
  • a clustering unit configured to input the first related data into a K-means algorithm and perform a first clustering calculation;
  • a regression unit configured to perform regression prediction, in a preset manner, on the clusters obtained by the first clustering calculation to obtain a first prediction result;
  • a determining unit configured to determine the short-term profitability of the loan object according to the first prediction result.
  • The application further provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions, and the processor executing the computer-readable instructions to implement the steps of any of the methods described above.
  • The present application also provides a non-transitory computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of any of the methods described above.
  • The short-term profit prediction method, apparatus, computer device, and storage medium of the present application first cluster the small amount of obtained data with the K-means algorithm, then apply a regression algorithm to the clusters to obtain a prediction result, and finally determine the short-term profitability of the loan object according to the prediction result.
  • This solves the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little related data is available in the early stage of that enterprise's blockchain deployment. The loan amount for the loan object can therefore be set relatively accurately, which reduces the lending risk of banking institutions.
  • FIG. 1 is a schematic flow chart of a method for predicting short-term profit according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a method for predicting short-term profit according to an embodiment of the present invention
  • FIG. 3 is a schematic block diagram showing the structure of a short-term profit forecasting apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic block diagram showing the structure of a regression unit according to an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram showing the structure of a clustering unit according to an embodiment of the present invention.
  • FIG. 6 is a schematic block diagram showing the structure of a short-term profit forecasting apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic block diagram showing the structure of a computer device according to an embodiment of the present invention.
  • The present application provides a short-term profit prediction method, used when the amount of data related to a loan object obtained on a blockchain is less than a preset amount.
  • Working capital loans of financial institutions such as banks are generally classified into temporary loans, short-term loans, and medium-term loans.
  • Short-term loans are generally working capital loans with a term of more than three months and less than one year (three months and one year themselves excluded).
  • Rules extracted from historical data may be correct for a certain period of time, but the probability that they remain correct decreases as time passes.
  • According to the length of the forecast time range, forecasts can be divided into short-term, medium-term, and long-term forecasts. In general, the shorter the forecast time range, the higher the prediction quality; conversely, the longer the range, the lower the accuracy of the prediction result.
  • The condition that the amount of data on the blockchain is less than the preset amount is a limiting condition. It restricts the method to the early stage of an enterprise's blockchain deployment, when the amount of each kind of data is relatively small.
  • Such an amount of data can be called "small data", in contrast to the current "big data".
  • the above prediction method includes the steps:
  • S2 input the first related data into a K-means algorithm, and perform a first clustering calculation
  • The loan object is a company or an individual who needs to obtain a loan from a financial institution such as a bank.
  • The first related data may be all data on the blockchain related to the loan object, or may be retrieved according to specified requirements. For example, different data can be obtained from the blockchain for different enterprises or projects: a procurement-agency financing enterprise can obtain financial institution block data, core enterprise block data, warehouse logistics block data, dealer block data, and so on.
  • The K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy the minimum-variance criterion.
  • The K-means algorithm accepts the input quantity k and then divides the n data objects into k clusters such that objects in the same cluster have high similarity, while objects in different clusters have low similarity.
  • The principle is as follows: first set the positions of several centers, calculate the distance from every point to these centers, and then assign each point to its nearest center. For example, if point A is closest to center 1, it belongs to center 1. Average all points belonging to center 1 to get a new center point. Iterate until the points belonging to each center no longer change; the final center positions are then obtained and the clustering of the data is complete.
  • The specific process of step S2 above is as follows:
  • Step S24: determine whether the cluster centers and the value have changed. If they have changed, return to step S22; if not, the clustering ends.
  • This application uses K-means algorithm for data clustering, which is simple and fast.
  • the algorithm maintains scalability and high efficiency. When the cluster is close to Gaussian distribution, the effect is better.
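  • The iteration described above can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the function name `kmeans`, the random choice of initial centers from the data, the squared Euclidean distance, and the `max_iter` cap are all assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point is assigned to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # memberships unchanged, so centers are stable
            break
        labels = new_labels
        # Update step: each center becomes the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, 2))
```

  • The loop stops when no point changes cluster, which is equivalent to the centers no longer moving.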
  • The first prediction result is obtained by performing regression prediction, in a preset manner, on the clusters produced by the first clustering calculation. Because the first related data is data related to the loan object, the first prediction result can reflect the profitability of the loan object over a short period of time.
  • The basic steps of regression prediction are as follows: 1. Determine the independent and dependent variables according to the prediction target. Once the specific target of the forecast is determined, the dependent variable is determined as well.
  • For example, the sales volume Y is the dependent variable.
  • The influencing factors related to the forecast target are the independent variables.
  • The regression analysis is a statistical analysis of the causal influencing factors (independent variables) and the predicted object (dependent variable). The established regression equation is meaningful only when there is a real relationship between the independent variables and the dependent variable.
  • The regression prediction model is used to calculate the predicted value, and the predicted value is comprehensively analyzed to determine the final predicted value.
  • The data is clustered first, and regression prediction is then performed on the clustered data, which makes the prediction faster.
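  • The regression steps above can be illustrated with a single independent variable. The sketch below fits an ordinary least-squares line, reports the correlation coefficient used to judge whether the regression equation is meaningful, and computes a predicted value. The function names and the sample numbers are hypothetical, and a simple linear model is an assumption chosen for brevity rather than the model used by the present application.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # correlation between independent and dependent variable
    return slope, intercept, r


def predict(slope, intercept, x):
    """Evaluate the fitted regression equation at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical history: x is an influencing factor, y is the sales volume Y.
    slope, intercept, r = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept, r, predict(slope, intercept, 5))
```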
  • the short-term profitability of the loan object is determined according to the first prediction result.
  • a financial institution such as a bank can determine the loan amount of the above-mentioned loan object based on its profitability, that is, the upper limit of the loan amount that can be given to the above-mentioned loan object.
  • the first prediction result may be a number representing a level, for example, divided into 1-10 levels. As the level increases, the short-term profitability of the loan object is stronger, and the amount of the loan is correspondingly higher.
  • the loan amount is also related to the data such as the registered capital and market value of the loan object.
  • Step S3 of performing regression prediction, in a preset manner, on the clusters obtained by the first clustering calculation includes:
  • S31 Input the calculated various types of clusters into a preset SVR prediction model for regression prediction.
  • The above SVR is an important application branch of the support vector machine (SVM).
  • The specific process is:
  • LIBSVM is a software package developed by Professor Chih-Jen Lin of National Taiwan University. It is simple, easy to use, fast, and effective for SVM pattern recognition and regression.
  • The user-specified parameter is used as C/l; that is, with the user-specified parameter, LIBSVM solves the following problem:
  • The above SVR (support vector regression) mainly maps the clustering result into a higher-dimensional space and constructs a linear decision function in that space to realize linear regression.
  • Its basis is mainly the ε-insensitive loss function.
  • If the fitted mathematical model expresses a curve in a multidimensional space, the result obtained from the ε-insensitive loss function is the "ε-tube" that encloses the curve and the training points. Of all the sample points, only those lying on the "wall" of the tube determine the position of the tube. This part of the training samples is called the "support vectors".
  • Traditional fitting methods usually add higher-order terms after the linear equation.
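  • The ε-insensitive loss mentioned above can be written as max(0, |y - f(x)| - ε): errors inside the tube of half-width ε cost nothing, and only points on or outside the tube wall contribute. A small sketch under that standard definition follows; the function names are illustrative and are not taken from LIBSVM.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def total_loss(targets, predictions, epsilon):
    """Sum the epsilon-insensitive loss over all training points."""
    return sum(epsilon_insensitive_loss(t, p, epsilon)
               for t, p in zip(targets, predictions))


if __name__ == "__main__":
    # The first point lies inside the tube and contributes nothing.
    print(total_loss([1.0, 1.0], [1.05, 1.5], epsilon=0.1))
```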
  • the step S2 of performing the first clustering calculation by inputting the first related data into the K-means algorithm includes:
  • S22 Perform correlation analysis on the extracted feature data to obtain irrelevant feature data that is not related to other feature data;
  • In steps S201 to S203, feature extraction is performed on the first related data related to the loan object, and correlation analysis is performed to find the irrelevant feature data, that is, feature data not related to any other feature data.
  • The first related data corresponding to the irrelevant feature data is then removed, and the remaining first related data is used for the clustering calculation. The resulting clustering is more accurate, and because the data corresponding to the irrelevant feature data has been removed, the clustering calculation is also more efficient.
  • Feature extraction on the first related data is performed with the Relief algorithm. Relief is a feature weighting algorithm that assigns each feature a weight according to how strongly the feature correlates with the class; features whose weights fall below a certain threshold are removed.
  • The Relief algorithm randomly selects a sample R from the training set D, then finds the nearest neighbor sample H among the samples of the same class as R, called the Near Hit, and the nearest neighbor sample M among the samples of a different class from R, called the Near Miss.
  • Let the training data set be D, the number of sampling rounds be m, and a threshold be given for the feature weights, together with the number of nearest neighbor samples; the output is the feature weight of each feature. In each round, the weight of every feature A is updated as:
  • W(A) = W(A) - diff(A, R, H)/m + diff(A, R, M)/m
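  • The weight update can be sketched for a two-class data set as follows. This is an illustrative reading of the rule above, not the implementation of the present application: it assumes numeric features scaled to [0, 1], takes diff(A, x, y) to be the absolute difference on feature A, uses a single Near Hit and a single Near Miss per round, and the function name `relief` and the sample data are hypothetical.

```python
import random


def relief(samples, labels, m, seed=0):
    """Return one Relief weight per feature (two-class, one neighbor each)."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def distance(i, j):
        return sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))

    for _ in range(m):
        r = rng.randrange(len(samples))  # randomly selected sample R
        same = [i for i in range(len(samples))
                if i != r and labels[i] == labels[r]]
        other = [i for i in range(len(samples)) if labels[i] != labels[r]]
        hit = min(same, key=lambda i: distance(r, i))    # Near Hit H
        miss = min(other, key=lambda i: distance(r, i))  # Near Miss M
        for a in range(n_features):
            diff_hit = abs(samples[r][a] - samples[hit][a])
            diff_miss = abs(samples[r][a] - samples[miss][a])
            # W(A) = W(A) - diff(A, R, H)/m + diff(A, R, M)/m
            weights[a] += (diff_miss - diff_hit) / m
    return weights


if __name__ == "__main__":
    # Feature 0 separates the two classes; feature 1 does not.
    X = [(0.0, 0.1), (0.1, 0.9), (0.05, 0.5),
         (0.9, 0.2), (1.0, 0.8), (0.95, 0.5)]
    y = [0, 0, 0, 1, 1, 1]
    print(relief(X, y, m=20))
```

  • Features whose final weight falls below the chosen threshold would then be dropped before clustering.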
  • the step S202 of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not related to other feature data includes:
  • S2021: Plot the feature data as a scatter plot, and record the feature data corresponding to the discrete points in the scatter plot as the irrelevant feature data.
  • the scatter diagram refers to the distribution map of the data points on the Cartesian coordinate system plane in the regression analysis; it is generally used to compare the aggregated data across the categories. The more data you have in a scatter plot, the better the comparison will be.
  • the feature data is generally a matrix.
  • a scatter plot matrix can be used to simultaneously draw a scatter plot between the variables, so that the main correlation between multiple variables can be quickly found.
  • Plotting the feature data as a scatter plot is a visualization process. Once the feature data is visualized, a person can identify discrete points in the plot by eye and select them, and the computer device records the feature data corresponding to the selected discrete points as irrelevant feature data.
  • the step S202 of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not related to other feature data includes:
  • S2022 Perform correlation matrix analysis on the feature data, and extract the irrelevant feature data that is not related to other feature data.
  • the above correlation matrix is also called a correlation coefficient matrix, which is composed of correlation coefficients between columns of the matrix. That is to say, the elements of the i-th row and the j-th column of the correlation matrix are the correlation coefficients of the i-th column and the j-th column of the original matrix.
  • The covariance matrix is generally used for the analysis. Covariance measures how two variables vary together. If the two variables tend to change in the same direction, the covariance is positive, indicating that they are positively correlated. If they change in opposite directions, the covariance is negative, indicating that they are negatively correlated. If the two variables are independent of each other, the covariance is 0, indicating that they are uncorrelated. When there are three or more groups of variables, the corresponding covariance matrix is used.
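  • One way to apply the correlation matrix is to flag every feature whose correlation coefficient with each other feature is close to zero. The sketch below does this in plain Python; the 0.3 cut-off, the function names, and the sample rows are assumptions made for illustration, and every column is assumed to be non-constant so that the coefficient is defined.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(rows):
    """Entry [i][j] is the correlation of column i with column j."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


def irrelevant_features(rows, threshold=0.3):
    """Indices of columns only weakly correlated with every other column."""
    corr = correlation_matrix(rows)
    n = len(corr)
    return [i for i in range(n)
            if all(abs(corr[i][j]) < threshold for j in range(n) if j != i)]


if __name__ == "__main__":
    rows = [(1, 2, 1), (2, 4, -1), (3, 6, -1), (4, 8, 1)]
    print(irrelevant_features(rows))
```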
  • the method includes:
  • the second related data on the non-blockchain refers to data that is not recorded on the blockchain, and is generally data in a big data network.
  • the clustering algorithm and the regression prediction method for the second related data are identical to the first related data described above, and will not be described again.
  • The first prediction result, obtained from the first related data, is compared with the second prediction result, obtained from the second related data. This is a verification step that determines whether the first prediction result can be used.
  • A large amount of each enterprise's historical data exists on the Internet as big data, for example on the enterprise's own servers or on the servers of other enterprises related to it.
  • The second prediction result, obtained from "big data" on the Internet, is mainly used to verify the first prediction result, obtained from "small data" on the blockchain. Only if the difference between the second prediction result and the first prediction result is less than the preset threshold is the first prediction result judged to be substantially correct and usable.
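  • The verification rule amounts to a single comparison. In the sketch below the function name and the example values are hypothetical, and both prediction results are assumed to be numbers on the same scale.

```python
def first_prediction_is_usable(first_result, second_result, threshold):
    """True when the two prediction results differ by less than the threshold."""
    return abs(first_result - second_result) < threshold


if __name__ == "__main__":
    print(first_prediction_is_usable(7, 8, threshold=2))
    print(first_prediction_is_usable(7, 3, threshold=2))
```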
  • the method before the step S2 of inputting the first related data into the K-means algorithm and performing the first clustering calculation, the method includes:
  • a data threshold is set, and when the acquired data amount of the first related data is greater than the data threshold, it has deviated from the "small data" range to which the short-term profit prediction method is applied. Therefore, the subsequent clustering, regression prediction, and the like are stopped, and the prediction method is switched.
  • the specific switching method may be: inputting the obtained first related data into a preset existing relatively mature prediction model, such as a corporate profit model based on the TD-ABC model.
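  • The switch between the two prediction paths can be sketched as a simple size check. The function name and the returned labels are hypothetical, and counting records with `len` is an assumption about how the data amount is measured.

```python
def choose_prediction_path(first_related_data, data_threshold):
    """Select the prediction path from the amount of acquired data."""
    if len(first_related_data) > data_threshold:
        # Outside the "small data" range: hand over to an existing mature model.
        return "mature_model"
    # Within the "small data" range: continue with clustering and regression.
    return "small_data_method"


if __name__ == "__main__":
    print(choose_prediction_path([1, 2, 3], data_threshold=5))
```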
  • The first related data may also be analyzed to determine whether it includes fraud data.
  • The specific method may be: perform feature extraction on the acquired first related data to obtain feature data; extract from the feature data the irrelevant feature data that is not related to other feature data; then identify the outlier data with the Voronoi algorithm to obtain the fraud data.
  • the loan credit value of the loan object can be analyzed by the amount of fraud data. Then determine the loan amount of the loan object based on the reputation value and short-term profitability.
  • Company a needs to obtain a loan from a bank, and bank P needs to evaluate company a.
  • The evaluation process is as follows:
  • 1. Collect all data related to company a through the blockchain, such as company a's sales data, production data, and financial data. Then perform feature extraction on the acquired data and delete useless data in advance, which improves the speed and efficiency of the subsequent clustering calculation. Specifically, the extracted feature data is first visualized as a scatter plot, and the discrete points in the scatter plot are then deleted.
  • 2. Cluster the data of company a obtained from the blockchain with the K-means algorithm.
  • 3. Perform SVR regression prediction on the results of the clustering calculation to obtain the profitability of the enterprise.
  • 4. Judge the reputation of the enterprise using the fraud-data identification method described above.
  • 5. Bank P determines, based on the enterprise's reputation, profitability, and so on, whether it can lend to company a and what the maximum loan limit is. Specifically, if the reputation of company a is below the preset value, the loan is refused; if the reputation reaches the preset value, the loan can be granted, and the maximum loan limit is calculated in combination with the enterprise's profitability. This effectively improves bank P's ability to avoid risk.
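  • The final decision can be sketched as follows. The text does not say how the maximum loan limit is computed from profitability, so the linear mapping from profitability level to limit, the parameter `amount_per_level`, and the function name are purely hypothetical placeholders.

```python
def decide_loan(reputation, min_reputation, profit_level, amount_per_level):
    """Return (approved, maximum loan limit) for a loan object."""
    if reputation < min_reputation:
        return False, 0.0  # reputation below the preset value: refuse the loan
    # Hypothetical rule: the limit grows linearly with the profitability level.
    return True, profit_level * amount_per_level


if __name__ == "__main__":
    print(decide_loan(reputation=0.8, min_reputation=0.6,
                      profit_level=7, amount_per_level=100000.0))
```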
  • The specific data of company a obtained from the blockchain include: the types of goods purchased and the procurement funds; customs data for exported goods and their duties and for imported goods and their duties; domestic sales data; sales product data; loan data; repayment credit data; inventory data; and logistics-related data (number of warehouses, geographical distribution of warehouses, storage data of each warehouse, distribution of sales territories).
  • The short-term profit prediction method of the present application first clusters the obtained "small data" with the K-means algorithm, then applies a regression algorithm to the clusters to obtain a prediction result, and finally determines the short-term profitability of the loan object according to the prediction result. This solves the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little related data is available in the early stage of that enterprise's blockchain deployment. The loan amount for the loan object can therefore be set relatively accurately, which reduces the lending risk of banking institutions.
  • An embodiment of the present application further provides a short-term profit prediction apparatus, used when the amount of data related to a loan object obtained on a blockchain is less than a preset amount.
  • Working capital loans of financial institutions such as banks are generally classified into temporary loans, short-term loans, and medium-term loans.
  • Short-term loans are generally working capital loans with a term of more than three months and less than one year (three months and one year themselves excluded).
  • Rules extracted from historical data may be correct for a certain period of time, but the probability that they remain correct decreases as time passes.
  • According to the length of the forecast time range, forecasts can be divided into short-term, medium-term, and long-term forecasts. In general, the shorter the forecast time range, the higher the prediction quality; conversely, the longer the range, the lower the accuracy of the prediction result.
  • The condition that the amount of data on the blockchain is less than the preset amount is a limiting condition. It restricts the apparatus to the early stage of an enterprise's blockchain deployment, when the amount of each kind of data is relatively small.
  • Such an amount of data can be called "small data", in contrast to the current "big data".
  • the above prediction device includes:
  • the obtaining unit 10 is configured to obtain, from the blockchain, first related data related to the loan object;
  • the clustering unit 20 is configured to input the first related data into a K-means algorithm and perform a first clustering calculation;
  • the regression unit 30 is configured to perform regression prediction of various types of clusters obtained by the first clustering calculation in a preset manner to obtain a first prediction result;
  • the determining unit 40 is configured to determine the short-term profitability of the loan object according to the first prediction result.
  • The loan object is a company or an individual who needs to obtain a loan from a financial institution such as a bank.
  • The first related data may be all data on the blockchain related to the loan object, or may be retrieved according to specified requirements. For example, different data can be obtained from the blockchain for different enterprises or projects: a procurement-agency financing enterprise can obtain financial institution block data, core enterprise block data, warehouse logistics block data, dealer block data, and so on.
  • The K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy the minimum-variance criterion.
  • The K-means algorithm accepts the input quantity k and then divides the n data objects into k clusters such that objects in the same cluster have high similarity, while objects in different clusters have low similarity.
  • The principle is as follows: first set the positions of several centers, calculate the distance from every point to these centers, and then assign each point to its nearest center. For example, if point A is closest to center 1, it belongs to center 1. Average all points belonging to center 1 to get a new center point. Iterate until the points belonging to each center no longer change; the final center positions are then obtained and the clustering of the data is complete.
  • This application uses K-means algorithm for data clustering, which is simple and fast.
  • the algorithm maintains scalability and high efficiency. When the cluster is close to Gaussian distribution, the effect is better.
  • As for the above regression unit 30: regression prediction is based on the principle of correlation in forecasting. The factors affecting the prediction target are identified, an approximate expression of the functional relationship between these factors and the prediction target is found, and that relationship is expressed mathematically.
  • The first prediction result is obtained by performing regression prediction, in a preset manner, on the clusters produced by the first clustering calculation. Because the first related data is data related to the loan object, the first prediction result can reflect the profitability of the loan object over a short period of time.
  • The basic steps of regression prediction are as follows: (1) Determine the independent and dependent variables according to the prediction target. Once the specific target of the forecast is determined, the dependent variable is determined as well.
  • For example, the sales volume Y is the dependent variable.
  • The influencing factors related to the forecast target are the independent variables.
  • (2) Establish a regression prediction model. Based on the historical statistics of the independent and dependent variables, a regression analysis equation, that is, a regression prediction model, is established.
  • (3) Conduct correlation analysis. The regression analysis is a statistical analysis of the causal influencing factors (independent variables) and the predicted object (dependent variable). The established regression equation is meaningful only when there is a real relationship between the independent variables and the dependent variable.
  • The regression prediction model is used to calculate the predicted value, and the predicted value is comprehensively analyzed to determine the final predicted value.
  • The data is clustered first, and regression prediction is then performed on the clustered data, which makes the prediction faster.
  • The above determining unit 40 is used to determine the short-term profitability of the loan object based on the first prediction result. A financial institution such as a bank can then determine the loan amount for the loan object based on its profitability, that is, the upper limit of the loan that can be granted to the loan object.
  • the first prediction result may be a number representing a level, for example, divided into 1-10 levels. As the level increases, the short-term profitability of the loan object is stronger, and the amount of the loan is correspondingly higher.
  • the loan amount is also related to the data such as the registered capital and market value of the loan object.
  • the foregoing regression unit 30 includes:
  • the SVR prediction module 31 is configured to input the calculated various types of clusters into a preset SVR prediction model for regression prediction.
  • The SVR is an important application branch of the support vector machine (SVM).
  • The specific process is:
  • LIBSVM solves the following problem:
  • The above SVR (support vector regression) mainly maps the clustering result into a higher-dimensional space and constructs a linear decision function in that space to realize linear regression.
  • Its basis is mainly the ε-insensitive loss function.
  • If the fitted mathematical model expresses a curve in a multidimensional space, the result obtained from the ε-insensitive loss function is the "ε-tube" that encloses the curve and the training points. Of all the sample points, only those lying on the "wall" of the tube determine the position of the tube. This part of the training samples is called the "support vectors".
  • Traditional fitting methods usually add higher-order terms after the linear equation.
  • The clustering unit 20 includes:
  • an extraction module 21, configured to perform feature extraction on the first related data;
  • an analysis module 22, configured to perform correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data;
  • a clustering module 23, configured to remove from the first related data the entries corresponding to the irrelevant feature data and to input the remaining data into the K-means algorithm for the first clustering calculation.
  • In the extraction module 21, the analysis module 22, and the clustering module 23, feature extraction is performed on the first related data of the loan object, and correlation analysis identifies the feature data that is not correlated with the other feature data. The first related data corresponding to that irrelevant feature data is then removed, and the remaining first related data is used for the clustering calculation. The resulting clusters are more accurate, and because the data corresponding to the irrelevant features has been removed, the clustering calculation is also more efficient.
  • Feature extraction on the first related data specifically uses the Relief algorithm. Relief is a feature weighting algorithm: it assigns each feature a weight according to its relevance to the class labels, and features whose weight falls below a threshold are removed.
  • The Relief algorithm randomly selects a sample R from the training set D, then finds R's nearest neighbor H among the samples of the same class (the near hit) and its nearest neighbor M among the samples of a different class (the near miss). It then updates the weight of each feature by the following rule: if the distance between R and the near hit on a feature is smaller than the distance between R and the near miss, the feature helps distinguish same-class from different-class nearest neighbors, and its weight is increased; conversely, if the distance between R and the near hit is larger than the distance between R and the near miss, the feature hinders that distinction, and its weight is decreased.
  • In one embodiment, the analysis module 22 includes a visual analysis sub-module, configured to render the feature data as a scatter plot and to record the feature data corresponding to the outlying points in the scatter plot as the irrelevant feature data.
  • In regression analysis, a scatter plot is a plot of data points on a Cartesian coordinate plane; it is generally used to compare aggregated data across categories. The more data a scatter plot contains, the better the comparison.
  • In this embodiment the feature data is generally a matrix.
  • In that case a scatter plot matrix can be used to draw the pairwise scatter plots between the variables at the same time, so that the main correlations among multiple variables can be found quickly.
  • Rendering the feature data as a scatter plot is a visualization step: once the feature data is visualized, a person can pick out the outlying points in the plot by eye and select them, and the computer device records the feature data corresponding to the selected points as irrelevant feature data.
  • In another embodiment, the analysis module 22 includes a matrix analysis sub-module, configured to perform correlation matrix analysis on the feature data and extract the irrelevant feature data that is not correlated with the other feature data.
  • The correlation matrix, also called the correlation coefficient matrix, is composed of the correlation coefficients between the columns of a matrix. That is, the element in row i and column j of the correlation matrix is the correlation coefficient between column i and column j of the original matrix.
  • In this embodiment a covariance matrix is generally used for the analysis. Covariance measures how two variables vary together. If the two variables tend to change in the same direction, the covariance is positive and the variables are positively correlated. If they tend to change in opposite directions, the covariance is negative and the variables are negatively correlated. If the two variables are independent of each other, the covariance is 0 and the variables are uncorrelated. When there are three or more variables, the corresponding covariance matrix is used.
  • The short-term profit prediction apparatus further includes:
  • a data obtaining unit 50, configured to acquire second related data that is related to the loan object and is not on the blockchain;
  • a data clustering unit 60, configured to input the second related data into the K-means algorithm and perform a second clustering calculation;
  • a clustering regression unit 70, configured to perform regression prediction in a preset manner on the clusters obtained by the second clustering calculation to obtain a second prediction result;
  • a comparing unit 80, configured to determine whether the difference between the first prediction result and the second prediction result is less than a preset threshold;
  • a determining unit 90, configured to determine, if the difference is less than the threshold, that the short-term profitability determined for the loan object from the first prediction result is a usable result.
  • The second related data not on the blockchain refers to data that is not recorded on the blockchain, generally data in a big-data network.
  • The clustering algorithm and regression prediction method applied to the second related data are identical to those applied to the first related data and are not described again.
  • In this embodiment, the first prediction result obtained from the first related data is compared with the second prediction result obtained from the second related data; this serves as a verification step that determines whether the first prediction result is usable.
  • Because the present application mainly targets the early stage of blockchain deployment, a large amount of each enterprise's historical data still resides on the big-data Internet, for example on the enterprise's own servers or on the servers of other enterprises related to it; as long as the data is in the Internet environment, it may be obtainable.
  • The second prediction result, obtained from "big data" on the Internet, is used mainly to verify the first prediction result, obtained from "small data" on the blockchain; the first prediction result is judged substantially correct and usable only if the difference between the two results is less than the preset threshold.
  • In one embodiment, the short-term profit prediction apparatus further includes:
  • a judging unit, configured to determine whether the data amount of the first related data is greater than a preset data threshold;
  • a switching unit, configured to input the first related data, if so, into a preset big-data-based prediction algorithm for prediction.
  • A data threshold is set. When the amount of first related data acquired exceeds the data threshold, the data is outside the "small data" range to which the short-term profit prediction apparatus applies, so the subsequent clustering and regression prediction are stopped and the prediction method is switched instead.
  • Specifically, the switch may consist of inputting the acquired first related data into a preset, existing, relatively mature prediction model, such as an enterprise profit model based on the TD-ABC model.
  • In one embodiment, the short-term profit prediction apparatus further includes:
  • a fraud analysis unit, configured to analyze whether the first related data contains fraudulent data. Specifically, feature extraction may be performed on the acquired first related data to obtain feature data; irrelevant feature data that is not correlated with the other feature data is extracted from the feature data; and outliers in the irrelevant feature data are then identified with the Voronoi algorithm to obtain the fraudulent data.
  • The loan credit value of the loan object can be derived from factors such as the amount of fraudulent data. The loan limit of the loan object is then determined from the credit value together with the short-term profitability.
  • In a specific embodiment, enterprise a applies to bank P for a loan, and bank P needs to evaluate enterprise a.
  • The evaluation process is as follows: 1. All data related to enterprise a, such as its sales data, production data, and financial data, is collected from the blockchain. Feature extraction is then performed on the acquired data, and useless data is deleted in advance to improve the speed and efficiency of the subsequent clustering calculation.
  • Specifically, the extracted data is first visualized as a scatter plot, and the outlying points in the scatter plot are then deleted.
  • 2. The data of enterprise a obtained from the blockchain is clustered with the K-means algorithm.
  • 3. SVR regression prediction is performed on the clustering results to obtain results such as the profitability of enterprise a.
  • 4. The reputation of enterprise a is assessed with the fraud-data identification method described above.
  • 5. Bank P decides, based on enterprise a's reputation, profitability, and so on, whether it can lend to enterprise a and what the maximum loan limit is. Specifically, if enterprise a's reputation is below a preset value, the loan is refused; if its reputation meets the preset value, the loan may be granted, and the maximum loan limit is then calculated in combination with enterprise a's profitability, which effectively improves bank P's ability to avoid risk.
  • The data specifically obtained for enterprise a on the data chain includes: the types of goods purchased and the corresponding procurement expenditure; customs export goods and duties, and import goods and duties; domestic sales data; product sales data; loan data; loan repayment credit data; goods inventory data; and logistics data (number of warehouses, geographical distribution of warehouses, storage data of each warehouse, and distribution of sales territories).
  • The short-term profit prediction apparatus of the present application first clusters the acquired "small data" with the K-means algorithm, then performs prediction with a regression algorithm to obtain a prediction result, and finally determines the short-term profitability of the loan object from that result. This addresses the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little relevant data is available in the early stage of each enterprise's data-chain deployment, making it possible to set the loan object's loan limit relatively accurately and thus reduce the lending risk of banking institutions.
  • Referring to FIG. 7, an embodiment of the present invention further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium.
  • The database of the computer device stores data such as the acquired first related data and second related data and the K-means algorithm model.
  • The network interface of the computer device is used to communicate with an external terminal over a network connection.
  • When executed by the processor, the computer-readable instructions implement the processes of the method embodiments described above.
  • An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-readable instructions that, when executed by a processor, implement the processes of the foregoing method embodiments.

Abstract

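The present application discloses a short-term profit prediction method and apparatus, a computer device, and a storage medium. The prediction method includes: acquiring, from a blockchain, first related data related to a loan object; inputting the first related data into the K-means algorithm to perform a first clustering calculation; performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result; and determining the short-term profitability of the loan object according to the first prediction result.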

Description

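Short-term profit prediction method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 2018103452579, entitled "Short-term profit prediction method and apparatus, computer device, and storage medium," filed with the Chinese Patent Office on April 17, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of Internet technology, and in particular to a short-term profit prediction method and apparatus, a computer device, and a storage medium.
Background
A blockchain is a new, decentralized, trustless data architecture that is jointly owned, managed, and supervised by all nodes in the network and is not subject to control by any single party. Because the blockchain is a new data architecture, little data is available in the early stage of blockchain deployment, and banks and other financial institutions find it difficult to complete a short-term profit prediction from the current "small data," which leads to problems such as being unable to grant an appropriate loan limit.
Technical Problem
The main purpose of the present application is to provide a method and apparatus, a computer device, and a storage medium for predicting the short-term profit of an enterprise when little enterprise-related data is available in the early stage of blockchain deployment.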
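Technical Solution
The present application proposes a short-term profit prediction method, used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount. The prediction method includes: acquiring, from the blockchain, first related data related to the loan object;
inputting the first related data into the K-means algorithm to perform a first clustering calculation;
performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result; and
determining the short-term profitability of the loan object according to the first prediction result.
The present application further provides a short-term profit prediction apparatus, used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount. The prediction apparatus includes:
an acquiring unit, configured to acquire, from the blockchain, first related data related to the loan object;
a clustering unit, configured to input the first related data into the K-means algorithm to perform a first clustering calculation;
a regression unit, configured to perform regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result; and
a determining unit, configured to determine the short-term profitability of the loan object according to the first prediction result.
The present application further provides a computer device including a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the steps of any of the methods described above when executing the computer-readable instructions.
The present application further provides a non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the steps of any of the methods described above.
Beneficial Effects
The short-term profit prediction method and apparatus, computer device, and storage medium of the present application first cluster the small amount of acquired data with the K-means algorithm, then perform prediction with a regression algorithm to obtain a prediction result, and finally determine the short-term profitability of the loan object from the prediction result. This addresses the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little relevant data is available in the early stage of each enterprise's data-chain deployment, making it possible to set the loan object's loan limit relatively accurately and thus reduce the lending risk of banking institutions.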
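Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a short-term profit prediction method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a short-term profit prediction method according to an embodiment of the present invention;
FIG. 3 is a schematic structural block diagram of a short-term profit prediction apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural block diagram of a regression unit according to an embodiment of the present invention;
FIG. 5 is a schematic structural block diagram of a clustering unit according to an embodiment of the present invention;
FIG. 6 is a schematic structural block diagram of a short-term profit prediction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural block diagram of a computer device according to an embodiment of the present invention.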
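Best Mode for Carrying Out the Invention
Referring to FIG. 1, the present application provides a short-term profit prediction method, used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount.
In the present application, working-capital loans from banks and other financial institutions are generally divided into temporary loans, short-term loans, and medium-term loans, where a short-term loan is a working-capital loan with a term of generally three months to one year (three months exclusive, one year inclusive). Because the market changes unpredictably, regularities extracted from historical data may hold for a certain period, but the probability that they remain correct decreases over time. Predictions can be divided by time horizon into short-term, medium-term, and long-term predictions. In general, the shorter the prediction horizon, the higher the prediction quality; conversely, the longer the horizon, the lower the accuracy of the result. In the present application, the requirement that the amount of data on the blockchain be less than a preset amount is a limiting condition: the method is intended for the early stage of each enterprise's data-chain deployment, when relatively little data of any kind is available. Relative to today's "big data," the "amount of data less than the preset amount" may be called "small data."
The prediction method includes the following steps:
S1. Acquire, from the blockchain, first related data related to the loan object;
S2. Input the first related data into the K-means algorithm to perform a first clustering calculation;
S3. Perform regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
S4. Determine the short-term profitability of the loan object according to the first prediction result.
As described in step S1, the loan object is an enterprise or individual that needs a loan from a bank or other financial institution. The first related data may be all data on the blockchain related to the loan object, or data retrieved according to specified requirements. For example, different data on the blockchain may be acquired for different enterprises or projects; for an enterprise seeking procurement-agency financing, block data of financial institutions, core enterprises, warehousing and logistics, and distributors may be acquired.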
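As described in step S2, the K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy the minimum-variance criterion. The algorithm accepts the input k and then partitions the n data objects into k clusters such that objects within the same cluster are highly similar while objects in different clusters are less similar. The principle is as follows: first set initial positions for several centers and compute the distance from every point to each center; then determine which points belong to which center, for example, point A belongs to center 1 if center 1 is the nearest to it. Averaging all points that belong to center 1 gives the new center. This is iterated until the center of each cluster no longer changes, which yields the final center positions and completes the clustering of the data.
In the present application, step S2 proceeds as follows:
S21. Given a data set of related data (the first related data) X = {x_1, x_2, …, x_n} containing n d-dimensional data points, where x_i ∈ R^d, select K points in the data set as the initial cluster centers; each of these objects represents the center μ_k (k = 1, 2, …, K) of one class.
S22. Compute the Euclidean distance from each point to each center μ_k and assign each point, by the nearest-distance criterion, to the class represented by the most similar cluster center, forming K clusters C = {c_k, k = 1, 2, …, K}. Each cluster c_k represents one class. Compute the sum of squared distances J(c_k) from the points of the class to the cluster center μ_k:
Figure PCTCN2018095483-appb-000001
S23. Compute the total sum of squared distances from the samples of all classes to the cluster centers μ_k of their respective classes, until it is minimized:
Figure PCTCN2018095483-appb-000002
where d_ki = 1 if x_i ∈ c_k;
Figure PCTCN2018095483-appb-000003
d_ki = 0. The mean of all objects in each class is then computed as the new cluster center of that class.
S24. Determine whether the cluster centers and the value of J have changed. If they have, return to step S22; if they no longer change, clustering ends.
The present application uses the K-means algorithm for data clustering because it is simple and fast and remains scalable and efficient; it works best when the clusters are close to Gaussian-distributed.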
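Steps S21 to S24 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the application: the names `kmeans` and `sq_dist`, the optional `init_centers` parameter, and the sample points are all hypothetical, and the loop stops when no point changes cluster, which is equivalent to the centers no longer moving.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Cluster d-dimensional points into k groups; returns (centers, labels, J)."""
    rng = random.Random(seed)
    # S21: use k data points as the initial cluster centers.
    centers = [list(c) for c in (init_centers or rng.sample(points, k))]
    labels = None
    for _ in range(max_iter):
        # S22: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # S24: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # S23: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # J: total sum of squared distances from each point to its cluster center.
    j_total = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, j_total


# Hypothetical two-dimensional sample with two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels, j_total = kmeans(data, k=2, init_centers=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing `init_centers` makes the run deterministic; without it the initial centers are drawn at random from the data, as step S21 allows.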
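As described in step S3, regression prediction takes the correlation principle of prediction as its basis: it identifies the factors that influence the prediction target and then finds, by mathematical means, an approximate expression of the functional relationship between these factors and the target. The first prediction result is the result computed by applying regression prediction in the preset manner to the clusters obtained by the first clustering calculation; because the first related data is data about the loan object, the first prediction result reflects, to some extent, the loan object's profitability in the short term. The basic steps of regression prediction are as follows. 1. Determine the independent and dependent variables according to the prediction target. Defining the specific target of the prediction determines the dependent variable; for example, if the target is next year's sales volume, then the sales volume Y is the dependent variable. The factors that influence the target, that is, the independent variables, are found through market research and reference material, and the main influencing factors are selected from them. 2. Build the regression prediction model. A regression equation, that is, the regression prediction model, is established by calculation from historical statistics of the independent and dependent variables. 3. Perform correlation analysis. Regression analysis is a mathematical-statistical analysis of influencing factors (independent variables) and a prediction object (dependent variable) that are causally related. The regression equation is meaningful only if the independent variables and the dependent variable are actually related. Whether the factors chosen as independent variables are related to the prediction object, how strongly, and with what confidence that strength can be judged are therefore questions that regression analysis must answer. Correlation analysis generally requires computing the correlation coefficient, whose magnitude indicates the degree of correlation between the independent and dependent variables. 4. Test the regression prediction model and compute the prediction error. Whether the model can be used for actual prediction depends on testing it and computing its prediction error. Only when the regression equation passes the various tests and the prediction error is small can it be used as a prediction model. 5. Compute and determine the predicted value. The regression prediction model is used to compute predicted values, which are analyzed comprehensively to determine the final predicted value. In the present application, the data is clustered first and regression prediction is then performed on the clustered data, which makes prediction faster.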
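Regression steps 2 to 5 can be illustrated for the simplest case of one independent variable and an ordinary least-squares line. This is only a sketch under that assumption; the application itself goes on to use SVR, and the function names and sample figures below are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Step 2: least-squares fit of y = a + b*x. Also returns r for step 3."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)  # step 3: correlation coefficient
    return a, b, r


def rmse(xs, ys, a, b):
    """Step 4: root-mean-square prediction error of the fitted line."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def predict(x, a, b):
    """Step 5: predicted value of the dependent variable at x."""
    return a + b * x


# Hypothetical history: x is an influencing factor, y is the sales volume.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b, r = fit_line(xs, ys)
error = rmse(xs, ys, a, b)
forecast = predict(6.0, a, b)
```

A correlation coefficient close to 1 or -1 together with a small error is what step 4 would require before the fitted line is used for prediction.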
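As described in step S4, the short-term profitability of the loan object is determined from the first prediction result. A bank or other financial institution can then determine the loan object's loan limit, that is, the maximum amount that may be lent to the loan object, from its profitability. The first prediction result may be a number representing a grade, for example on a scale of 1 to 10: the higher the grade, the stronger the loan object's short-term profitability and the higher its loan limit. In this embodiment, the loan limit also depends on data such as the loan object's registered capital and market value.
In this embodiment, step S3 of performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation includes:
S31. Input the computed clusters into a preset SVR prediction model for regression prediction.
As described in step S31, SVR (support vector regression) is an important application branch of the support vector machine (SVM). In this embodiment, the regression function, f(x) = wx + b, is determined by minimizing an objective function. The procedure is as follows:
Figure PCTCN2018095483-appb-000004
subject to: (w^T Φ(x_i) + b) − c ≤ ε + ζ_i
Figure PCTCN2018095483-appb-000005
Figure PCTCN2018095483-appb-000006
The dual problem is:
Figure PCTCN2018095483-appb-000007
subject to: e^T(α − α*) = 0, e^T(α + α*) ≤ Cν
Figure PCTCN2018095483-appb-000008
The approximate function is:
Figure PCTCN2018095483-appb-000009
As with ν-SVC, proposed in 2002, the inequality e^T(α + α*) ≤ Cν can be replaced by an equality. Moreover, because users often choose a small constant such as C = 1, C/l may be too small. Therefore, in LIBSVM (a simple, easy-to-use, fast, and effective software package for SVM pattern recognition and regression, developed by Professor Chih-Jen Lin of National Taiwan University and colleagues), the user-specified parameter is treated as C/l. That is,
Figure PCTCN2018095483-appb-000010
is what the user specifies, and LIBSVM solves the following problem:
Figure PCTCN2018095483-appb-000011
subject to:
Figure PCTCN2018095483-appb-000012
Figure PCTCN2018095483-appb-000013
ε-SVR with parameters
Figure PCTCN2018095483-appb-000014
and ν-SVR with parameters
Figure PCTCN2018095483-appb-000015
have the same solution.
In the formulas above, l is the number of training samples, here l = k; C is the weight parameter that balances the model complexity (1/2)w^T w against the training error term; ε is the parameter of the insensitive loss function; ζ is the slack variable; and K(x_i, x) is the kernel function.
The SVR (support vector regression) algorithm described above performs regression mainly by mapping the clustering result into a higher-dimensional space and constructing a linear decision function there; its basis is mainly the ε-insensitive loss function and the kernel function. If the fitted mathematical model is viewed as a curve in a multidimensional space, the result obtained with the ε-insensitive loss function is an "ε-tube" that encloses the curve and the training points. Of all the sample points, only those lying on the "tube wall" determine the position of the tube; these training samples are called "support vectors." To accommodate nonlinearity in the training set, traditional fitting methods usually add higher-order terms to the linear equation. This does work, but the added adjustable parameters increase the risk of overfitting. SVR resolves this tension with a kernel function: replacing the linear term in the linear equation with a kernel function makes the originally linear algorithm nonlinear, so that it can perform nonlinear regression. At the same time, introducing the kernel function achieves the purpose of raising the dimension, while overfitting remains controllable despite the added adjustable parameters. The present application uses the mature SVR algorithm, whose results are reliable and which can achieve accurate prediction.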
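The two ingredients named above, the ε-insensitive loss and the kernel function, can be written down directly. The sketch below is illustrative only: the Gaussian (RBF) kernel and its `gamma` parameter are an assumed choice, since the text does not say which kernel is used, and `coeffs` stands for the combined dual coefficients of an already trained model. No training procedure is shown.

```python
import math


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y - f(x)| <= eps, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, coeffs, b, gamma=1.0):
    """Evaluate f(x) = sum_i coeffs[i] * K(x_i, x) + b over the support vectors."""
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coeffs)
    ) + b


# A prediction within eps of the target costs nothing; one outside the tube
# is penalised only by the amount it exceeds eps.
inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside = eps_insensitive_loss(1.0, 1.5, eps=0.1)
```

Because points strictly inside the tube contribute zero loss, only the samples on or outside the tube wall influence the fitted function, which is why they alone become support vectors.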
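In one embodiment, step S2 of inputting the first related data into the K-means algorithm to perform the first clustering calculation includes:
S201. Perform feature extraction on the first related data;
S202. Perform correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data;
S203. Remove from the first related data the entries corresponding to the irrelevant feature data, and input the remaining data into the K-means algorithm to perform the first clustering calculation.
As described in steps S201 to S203, feature extraction is performed on the first related data of the loan object, and correlation analysis identifies the feature data that is not correlated with the other feature data. The first related data corresponding to that irrelevant feature data is then removed, and the remaining first related data is used for the clustering calculation. The resulting clusters are more accurate, and because the data corresponding to the irrelevant features has been removed, the clustering calculation is also more efficient.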
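In this embodiment, feature extraction on the first related data specifically uses the Relief algorithm. Relief is a feature weighting algorithm: it assigns each feature a weight according to its relevance to the class labels, and features whose weight falls below a threshold are removed. The Relief algorithm randomly selects a sample R from the training set D, then finds R's nearest neighbor H among the samples of the same class (the near hit) and its nearest neighbor M among the samples of a different class (the near miss). It then updates the weight of each feature by the following rule: if the distance between R and the near hit on a feature is smaller than the distance between R and the near miss, the feature helps distinguish same-class from different-class nearest neighbors, and its weight is increased; conversely, if the distance between R and the near hit on a feature is larger than the distance between R and the near miss, the feature hinders that distinction, and its weight is decreased. This process is repeated m times to obtain the average weight of each feature. The larger a feature's weight, the stronger its ability to discriminate between classes; the smaller the weight, the weaker that ability. The running time of the Relief algorithm grows linearly with the number of sampling iterations m and the number of original features N, so it is very efficient. The algorithm is as follows:
Let the training data set be D, the number of sampling iterations m, and the feature weight threshold δ, together with the number of nearest-neighbor samples; the output T holds the features selected by their weights:
1. Set all feature weights to 0 and let T be the empty set.
2. for i = 1 to m do
1) Randomly select a sample R;
2) Find R's nearest neighbor H in the set of same-class samples and its nearest neighbor M in the set of different-class samples.
3) for A = 1 to N do
W(A) = W(A) − diff(A, R, H)/m + diff(A, R, M)/m
3. for A = 1 to N do
if W(A) ≥ δ
add the A-th feature to T.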
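The pseudocode above translates almost line for line into Python. The following is a sketch under assumptions the text leaves open: features are numeric, `diff` is the absolute difference on one feature scaled by that feature's range, a single near hit and near miss are used, and every class has at least two samples. The function name and the sample data are hypothetical.

```python
import random


def relief(samples, labels, m, delta, seed=0):
    """Return (weights, selected feature indices) following the steps above."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Feature ranges scale diff() to [0, 1]; a constant feature gets range 1.
    spans = []
    for a in range(n_features):
        col = [s[a] for s in samples]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, p, q):
        return abs(p[a] - q[a]) / spans[a]

    def distance(p, q):
        return sum(diff(a, p, q) for a in range(n_features))

    weights = [0.0] * n_features  # step 1: all weights start at 0
    for _ in range(m):  # step 2: m sampling iterations
        i = rng.randrange(len(samples))
        r = samples[i]  # 1) random sample R
        hits = [s for j, s in enumerate(samples)
                if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(samples) if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda s: distance(r, s))     # 2) H
        near_miss = min(misses, key=lambda s: distance(r, s))  #    M
        for a in range(n_features):  # 3) weight update
            weights[a] += (diff(a, r, near_miss) - diff(a, r, near_hit)) / m
    # step 3: keep the features whose weight reaches the threshold delta
    selected = [a for a, w in enumerate(weights) if w >= delta]
    return weights, selected


# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
samples = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (1.0, 0.8), (0.9, 0.2), (1.1, 0.5)]
labels = [0, 0, 0, 1, 1, 1]
weights, selected = relief(samples, labels, m=20, delta=0.2)
```

On this sample the discriminating feature ends with a clearly positive weight and the other with a negative one, so only feature 0 passes the threshold.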
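In one embodiment, step S202 of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data includes:
S2021. Render the feature data as a scatter plot, and record the feature data corresponding to the outlying points in the scatter plot as the irrelevant feature data.
As described in step S2021, a scatter plot (scatter diagram) in regression analysis is a plot of data points on a Cartesian coordinate plane; it is generally used to compare aggregated data across categories. The more data a scatter plot contains, the better the comparison. In this embodiment the feature data is generally a matrix, in which case a scatter plot matrix can be used to draw the pairwise scatter plots between the variables at the same time, so that the main correlations among multiple variables can be found quickly. Rendering the feature data as a scatter plot is a visualization step: once the feature data is visualized, a person can pick out the outlying points in the plot by eye and select them, and the computer device records the feature data corresponding to the selected points as irrelevant feature data.
In another embodiment, step S202 of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data includes:
S2022. Perform correlation matrix analysis on the feature data and extract the irrelevant feature data that is not correlated with the other feature data.
As described in step S2022, the correlation matrix, also called the correlation coefficient matrix, is composed of the correlation coefficients between the columns of a matrix. That is, the element in row i and column j of the correlation matrix is the correlation coefficient between column i and column j of the original matrix. In this embodiment a covariance matrix is generally used for the analysis. Covariance measures how two variables vary together: if the two variables tend to change in the same direction, the covariance is positive and the variables are positively correlated; if they tend to change in opposite directions, the covariance is negative and the variables are negatively correlated; if the two variables are independent of each other, the covariance is 0 and the variables are uncorrelated. When there are three or more variables, the corresponding covariance matrix is used.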
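Step S2022 can be sketched as follows: build the correlation coefficient matrix of the feature columns and flag every column whose correlation with each of the other columns is weak. The cut-off of 0.3 and all names are assumptions made for illustration; the text does not specify how "not correlated" is decided.

```python
import math


def pearson(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """Element [i][j] is the correlation between column i and column j."""
    cols = list(zip(*rows))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def unrelated_features(rows, threshold=0.3):
    """Indices of columns only weakly correlated with every other column."""
    corr = correlation_matrix(rows)
    n = len(corr)
    return [
        i for i in range(n)
        if all(abs(corr[i][j]) < threshold for j in range(n) if j != i)
    ]


# Hypothetical feature matrix: column 1 is twice column 0, column 2 is unrelated.
rows = [(1, 2, 1), (2, 4, -1), (3, 6, 0), (4, 8, -1), (5, 10, 1)]
irrelevant = unrelated_features(rows)  # column indices to drop before clustering
```

The columns returned here correspond to the irrelevant feature data that step S203 removes before the first clustering calculation.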
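Referring to FIG. 2, in this embodiment, after step S4 of determining the short-term profitability of the loan object according to the first prediction result, the method includes:
S5. Acquire second related data that is related to the loan object and is not on the blockchain;
S6. Input the second related data into the K-means algorithm to perform a second clustering calculation;
S7. Perform regression prediction in a preset manner on the clusters obtained by the second clustering calculation to obtain a second prediction result;
S8. Determine whether the difference between the first prediction result and the second prediction result is less than a preset threshold;
S9. If the difference is less than the threshold, determine that the short-term profitability determined for the loan object from the first prediction result is a usable result.
As described in steps S5 to S9, the second related data not on the blockchain refers to data that is not recorded on the blockchain, generally data in a big-data network. The clustering algorithm and regression prediction method applied to the second related data are identical to those applied to the first related data and are not described again. In this embodiment, the first prediction result obtained from the first related data is compared with the second prediction result obtained from the second related data; this serves as a verification step that determines whether the first prediction result is usable. Because the present application mainly targets the early stage of blockchain deployment, a large amount of each enterprise's historical data still resides on the big-data Internet, for example on the enterprise's own servers or on the servers of other enterprises related to it; as long as the data is in the Internet environment, it may be obtainable. In this step, the second prediction result, obtained from "big data" on the Internet, is used mainly to verify the first prediction result, obtained from "small data" on the blockchain; the first prediction result is judged substantially correct and usable only if the difference between the two results is less than the preset threshold.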
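The check in steps S8 and S9 reduces to a single comparison once both prediction results are numbers, for example grades on the 1 to 10 scale mentioned earlier. The sketch below assumes numeric results and compares the absolute difference; the function name and the values are hypothetical.

```python
def first_result_usable(first_result, second_result, threshold):
    """S8/S9: usable if the two results differ by less than the threshold."""
    return abs(first_result - second_result) < threshold


# Hypothetical grades: 7.2 from blockchain data, 7.6 from off-chain data.
usable = first_result_usable(7.2, 7.6, threshold=1.0)
```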
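In one embodiment, before step S2 of inputting the first related data into the K-means algorithm to perform the first clustering calculation, the method includes:
S201. Determine whether the data amount of the first related data is greater than a preset data threshold;
S202. If so, input the first related data into a preset big-data-based prediction algorithm for prediction.
As described in steps S201 and S202, a data threshold is set. When the amount of first related data acquired exceeds the data threshold, the data is outside the "small data" range to which the short-term profit prediction method applies, so the subsequent clustering and regression prediction steps are stopped and the prediction method is switched instead. Specifically, the switch may consist of inputting the acquired first related data into a preset, existing, relatively mature prediction model, such as an enterprise profit model based on the TD-ABC model.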
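The switch described here is a simple dispatch on the data amount. In the sketch below the data amount is taken to be the number of records, which is an assumption, and both predictors are passed in as hypothetical callables rather than concrete models.

```python
def predict_short_term_profit(first_related_data, data_threshold,
                              small_data_predictor, big_data_predictor):
    """Route the data to the predictor that suits its size."""
    if len(first_related_data) > data_threshold:
        # Outside the "small data" range: switch to the big-data model.
        return big_data_predictor(first_related_data)
    # Otherwise continue with K-means clustering followed by regression.
    return small_data_predictor(first_related_data)


# Hypothetical stand-ins that only report which branch was taken.
result = predict_short_term_profit(
    [1, 2, 3], data_threshold=100,
    small_data_predictor=lambda data: "small-data path",
    big_data_predictor=lambda data: "big-data path",
)
```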
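In one embodiment, the first related data may also be analyzed for fraudulent data. Specifically, feature extraction may be performed on the acquired first related data to obtain feature data; irrelevant feature data that is not correlated with the other feature data is extracted from the feature data; and outliers in the irrelevant feature data are then identified with the Voronoi algorithm to obtain the fraudulent data. The loan credit value of the loan object can be derived from factors such as the amount of fraudulent data. The loan limit of the loan object is then determined from the credit value together with the short-term profitability.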
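In a specific embodiment, enterprise a applies to bank P for a loan, and bank P needs to evaluate enterprise a. The evaluation process is as follows: 1. All data related to enterprise a, such as its sales data, production data, and financial data, is collected from the blockchain. Feature extraction is then performed on the acquired data, and useless data is deleted in advance to improve the speed and efficiency of the subsequent clustering calculation. Specifically, the extracted data is first visualized as a scatter plot, and the outlying points in the scatter plot are then deleted. 2. The data of enterprise a obtained from the blockchain is clustered with the K-means algorithm. 3. SVR regression prediction is performed on the clustering results to obtain results such as the profitability of enterprise a. 4. The reputation of enterprise a is assessed with the fraud-data identification method described above. 5. Bank P decides, based on enterprise a's reputation, profitability, and so on, whether it can lend to enterprise a and what the maximum loan limit is. Specifically, if enterprise a's reputation is below a preset value, the loan is refused; if its reputation meets the preset value, the loan may be granted, and the maximum loan limit is then calculated in combination with enterprise a's profitability, which effectively improves bank P's ability to avoid risk. The data specifically obtained for enterprise a on the data chain includes: the types of goods purchased and the corresponding procurement expenditure; customs export goods and duties, and import goods and duties; domestic sales data; product sales data; loan data; loan repayment credit data; goods inventory data; and logistics data (number of warehouses, geographical distribution of warehouses, storage data of each warehouse, and distribution of sales territories).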
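The decision in item 5 can be sketched as a small function. The refusal rule follows the text; the limit formula, a fixed amount per profitability grade, is purely a placeholder assumption, because the text only says that the limit is calculated in combination with profitability. All names and numbers are hypothetical.

```python
def loan_decision(reputation, profitability_grade, min_reputation, amount_per_grade):
    """Return (approved, max_loan_limit) for a loan applicant."""
    if reputation < min_reputation:
        return False, 0.0  # reputation below the preset value: refuse the loan
    # Placeholder limit rule (assumption): a higher grade gives a higher limit.
    return True, profitability_grade * amount_per_grade


# Hypothetical applicant: reputation 0.8 against a preset value of 0.6, grade 7.
approved, limit = loan_decision(0.8, 7, min_reputation=0.6, amount_per_grade=100000.0)
```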
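The short-term profit prediction method of the present application first clusters the acquired "small data" with the K-means algorithm, then performs prediction with a regression algorithm to obtain a prediction result, and finally determines the short-term profitability of the loan object from that result. This addresses the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little relevant data is available in the early stage of each enterprise's data-chain deployment, making it possible to set the loan object's loan limit relatively accurately and thus reduce the lending risk of banking institutions.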
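Referring to FIG. 3, an embodiment of the present application further provides a short-term profit prediction apparatus, used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount.
In the present application, working-capital loans from banks and other financial institutions are generally divided into temporary loans, short-term loans, and medium-term loans, where a short-term loan is a working-capital loan with a term of generally three months to one year (three months exclusive, one year inclusive). Because the market changes unpredictably, regularities extracted from historical data may hold for a certain period, but the probability that they remain correct decreases over time. Predictions can be divided by time horizon into short-term, medium-term, and long-term predictions. In general, the shorter the prediction horizon, the higher the prediction quality; conversely, the longer the horizon, the lower the accuracy of the result. In the present application, the requirement that the amount of data on the blockchain be less than a preset amount is a limiting condition: the method is intended for the early stage of each enterprise's data-chain deployment, when relatively little data of any kind is available. Relative to today's "big data," the "amount of data less than the preset amount" may be called "small data."
The prediction apparatus includes:
an acquiring unit 10, configured to acquire, from the blockchain, first related data related to the loan object;
a clustering unit 20, configured to input the first related data into the K-means algorithm to perform a first clustering calculation;
a regression unit 30, configured to perform regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
a determining unit 40, configured to determine the short-term profitability of the loan object according to the first prediction result.
In the acquiring unit 10, the loan object is an enterprise or individual that needs a loan from a bank or other financial institution. The first related data may be all data on the blockchain related to the loan object, or data retrieved according to specified requirements. For example, different data on the blockchain may be acquired for different enterprises or projects; for an enterprise seeking procurement-agency financing, block data of financial institutions, core enterprises, warehousing and logistics, and distributors may be acquired.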
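In the clustering unit 20, the K-means algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy the minimum-variance criterion. The algorithm accepts the input k and then partitions the n data objects into k clusters such that objects within the same cluster are highly similar while objects in different clusters are less similar. The principle is as follows: first set initial positions for several centers and compute the distance from every point to each center; then determine which points belong to which center, for example, point A belongs to center 1 if center 1 is the nearest to it. Averaging all points that belong to center 1 gives the new center. This is iterated until the center of each cluster no longer changes, which yields the final center positions and completes the clustering of the data.
In the present application, the clustering unit 20 performs clustering as follows:
(1) Given a data set of related data (the first related data) X = {x_1, x_2, …, x_n} containing n d-dimensional data points, where x_i ∈ R^d, select K points in the data set as the initial cluster centers; each of these objects represents the center μ_k (k = 1, 2, …, K) of one class.
(2) Compute the Euclidean distance from each point to each center μ_k and assign each point, by the nearest-distance criterion, to the class represented by the most similar cluster center, forming K clusters C = {c_k, k = 1, 2, …, K}. Each cluster c_k represents one class. Compute the sum of squared distances J(c_k) from the points of the class to the cluster center μ_k:
Figure PCTCN2018095483-appb-000016
(3) Compute the total sum of squared distances from the samples of all classes to the cluster centers μ_k of their respective classes, until it is minimized:
Figure PCTCN2018095483-appb-000017
where d_ki = 1 if x_i ∈ c_k;
Figure PCTCN2018095483-appb-000018
d_ki = 0. The mean of all objects in each class is then computed as the new cluster center of that class.
(4) Determine whether the cluster centers and the value of J have changed. If they have, return to step (2); if they no longer change, clustering ends.
The present application uses the K-means algorithm for data clustering because it is simple and fast and remains scalable and efficient; it works best when the clusters are close to Gaussian-distributed.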
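In the regression unit 30, regression prediction takes the correlation principle of prediction as its basis: it identifies the factors that influence the prediction target and then finds, by mathematical means, an approximate expression of the functional relationship between these factors and the target. The first prediction result is the result computed by applying regression prediction in the preset manner to the clusters obtained by the first clustering calculation; because the first related data is data about the loan object, the first prediction result reflects, to some extent, the loan object's profitability in the short term. The basic steps of regression prediction are as follows. (1) Determine the independent and dependent variables according to the prediction target. Defining the specific target of the prediction determines the dependent variable; for example, if the target is next year's sales volume, then the sales volume Y is the dependent variable. The factors that influence the target, that is, the independent variables, are found through market research and reference material, and the main influencing factors are selected from them. (2) Build the regression prediction model. A regression equation, that is, the regression prediction model, is established by calculation from historical statistics of the independent and dependent variables. (3) Perform correlation analysis. Regression analysis is a mathematical-statistical analysis of influencing factors (independent variables) and a prediction object (dependent variable) that are causally related. The regression equation is meaningful only if the independent variables and the dependent variable are actually related. Whether the factors chosen as independent variables are related to the prediction object, how strongly, and with what confidence that strength can be judged are therefore questions that regression analysis must answer. Correlation analysis generally requires computing the correlation coefficient, whose magnitude indicates the degree of correlation between the independent and dependent variables. (4) Test the regression prediction model and compute the prediction error. Whether the model can be used for actual prediction depends on testing it and computing its prediction error. Only when the regression equation passes the various tests and the prediction error is small can it be used as a prediction model. (5) Compute and determine the predicted value. The regression prediction model is used to compute predicted values, which are analyzed comprehensively to determine the final predicted value. In the present application, the data is clustered first and regression prediction is then performed on the clustered data, which makes prediction faster.
The determining unit 40 determines the short-term profitability of the loan object from the first prediction result. A bank or other financial institution can then determine the loan object's loan limit, that is, the maximum amount that may be lent to the loan object, from its profitability. The first prediction result may be a number representing a grade, for example on a scale of 1 to 10: the higher the grade, the stronger the loan object's short-term profitability and the higher its loan limit. In this embodiment, the loan limit also depends on data such as the loan object's registered capital and market value.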
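Referring to FIG. 4, in this embodiment, the regression unit 30 includes:
an SVR prediction module 31, configured to input the computed clusters into a preset SVR prediction model for regression prediction.
In the SVR prediction module 31, SVR (support vector regression) is an important application branch of the support vector machine (SVM). In this embodiment, the regression function, f(x) = wx + b, is determined by minimizing an objective function. The procedure is as follows:
Figure PCTCN2018095483-appb-000019
subject to: (w^T Φ(x_i) + b) − c ≤ ε + ζ_i
Figure PCTCN2018095483-appb-000020
Figure PCTCN2018095483-appb-000021
The dual problem is:
Figure PCTCN2018095483-appb-000022
subject to: e^T(α − α*) = 0, e^T(α + α*) ≤ Cν
Figure PCTCN2018095483-appb-000023
The approximate function is:
Figure PCTCN2018095483-appb-000024
As with ν-SVC, proposed in 2002, the inequality e^T(α + α*) ≤ Cν can be replaced by an equality. Moreover, because users often choose a small constant such as C = 1, C/l may be too small. Therefore, in LIBSVM, the user-specified parameter is treated as C/l. That is,
Figure PCTCN2018095483-appb-000025
is what the user specifies, and LIBSVM solves the following problem:
Figure PCTCN2018095483-appb-000026
subject to:
Figure PCTCN2018095483-appb-000027
Figure PCTCN2018095483-appb-000028
ε-SVR with parameters
Figure PCTCN2018095483-appb-000029
and ν-SVR with parameters
Figure PCTCN2018095483-appb-000030
have the same solution.
In the formulas above, l is the number of training samples, here l = k; C is the weight parameter that balances the model complexity (1/2)w^T w against the training error term; ε is the parameter of the insensitive loss function; ζ is the slack variable; and K(x_i, x) is the kernel function.
The SVR (support vector regression) algorithm described above performs regression mainly by mapping the clustering result into a higher-dimensional space and constructing a linear decision function there; its basis is mainly the ε-insensitive loss function and the kernel function. If the fitted mathematical model is viewed as a curve in a multidimensional space, the result obtained with the ε-insensitive loss function is an "ε-tube" that encloses the curve and the training points. Of all the sample points, only those lying on the "tube wall" determine the position of the tube; these training samples are called "support vectors." To accommodate nonlinearity in the training set, traditional fitting methods usually add higher-order terms to the linear equation. This does work, but the added adjustable parameters increase the risk of overfitting. SVR resolves this tension with a kernel function: replacing the linear term in the linear equation with a kernel function makes the originally linear algorithm nonlinear, so that it can perform nonlinear regression. At the same time, introducing the kernel function achieves the purpose of raising the dimension, while overfitting remains controllable despite the added adjustable parameters. The present application uses the mature SVR algorithm, whose results are reliable and which can achieve accurate prediction.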
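Referring to FIG. 5, in one embodiment, the clustering unit 20 includes:
an extraction module 21, configured to perform feature extraction on the first related data;
an analysis module 22, configured to perform correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data;
a clustering module 23, configured to remove from the first related data the entries corresponding to the irrelevant feature data and to input the remaining data into the K-means algorithm to perform the first clustering calculation.
In the extraction module 21, the analysis module 22, and the clustering module 23, feature extraction is performed on the first related data of the loan object, and correlation analysis identifies the feature data that is not correlated with the other feature data. The first related data corresponding to that irrelevant feature data is then removed, and the remaining first related data is used for the clustering calculation. The resulting clusters are more accurate, and because the data corresponding to the irrelevant features has been removed, the clustering calculation is also more efficient. In this embodiment, feature extraction on the first related data specifically uses the Relief algorithm. Relief is a feature weighting algorithm: it assigns each feature a weight according to its relevance to the class labels, and features whose weight falls below a threshold are removed. The Relief algorithm randomly selects a sample R from the training set D, then finds R's nearest neighbor H among the samples of the same class (the near hit) and its nearest neighbor M among the samples of a different class (the near miss). It then updates the weight of each feature by the following rule: if the distance between R and the near hit on a feature is smaller than the distance between R and the near miss, the feature helps distinguish same-class from different-class nearest neighbors, and its weight is increased; conversely, if the distance between R and the near hit on a feature is larger than the distance between R and the near miss, the feature hinders that distinction, and its weight is decreased. This process is repeated m times to obtain the average weight of each feature. The larger a feature's weight, the stronger its ability to discriminate between classes; the smaller the weight, the weaker that ability. The running time of the Relief algorithm grows linearly with the number of sampling iterations m and the number of original features N, so it is very efficient. The algorithm itself has been described in the method embodiment and is not repeated here.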
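In one embodiment, the analysis module 22 includes a visual analysis sub-module, configured to render the feature data as a scatter plot and to record the feature data corresponding to the outlying points in the scatter plot as the irrelevant feature data.
In the visual analysis sub-module, a scatter plot (scatter diagram) in regression analysis is a plot of data points on a Cartesian coordinate plane; it is generally used to compare aggregated data across categories. The more data a scatter plot contains, the better the comparison. In this embodiment the feature data is generally a matrix, in which case a scatter plot matrix can be used to draw the pairwise scatter plots between the variables at the same time, so that the main correlations among multiple variables can be found quickly. Rendering the feature data as a scatter plot is a visualization step: once the feature data is visualized, a person can pick out the outlying points in the plot by eye and select them, and the computer device records the feature data corresponding to the selected points as irrelevant feature data.
In another embodiment, the analysis module 22 includes a matrix analysis sub-module, configured to perform correlation matrix analysis on the feature data and extract the irrelevant feature data that is not correlated with the other feature data.
In the matrix analysis sub-module, the correlation matrix, also called the correlation coefficient matrix, is composed of the correlation coefficients between the columns of a matrix. That is, the element in row i and column j of the correlation matrix is the correlation coefficient between column i and column j of the original matrix. In this embodiment a covariance matrix is generally used for the analysis. Covariance measures how two variables vary together: if the two variables tend to change in the same direction, the covariance is positive and the variables are positively correlated; if they tend to change in opposite directions, the covariance is negative and the variables are negatively correlated; if the two variables are independent of each other, the covariance is 0 and the variables are uncorrelated. When there are three or more variables, the corresponding covariance matrix is used.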
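Referring to FIG. 6, in this embodiment, the short-term profit prediction apparatus further includes:
a data obtaining unit 50, configured to acquire second related data that is related to the loan object and is not on the blockchain;
a data clustering unit 60, configured to input the second related data into the K-means algorithm to perform a second clustering calculation;
a clustering regression unit 70, configured to perform regression prediction in a preset manner on the clusters obtained by the second clustering calculation to obtain a second prediction result;
a comparing unit 80, configured to determine whether the difference between the first prediction result and the second prediction result is less than a preset threshold;
a determining unit 90, configured to determine, if the difference is less than the threshold, that the short-term profitability determined for the loan object from the first prediction result is a usable result.
The second related data not on the blockchain refers to data that is not recorded on the blockchain, generally data in a big-data network. The clustering algorithm and regression prediction method applied to the second related data are identical to those applied to the first related data and are not described again. In this embodiment, the first prediction result obtained from the first related data is compared with the second prediction result obtained from the second related data; this serves as a verification step that determines whether the first prediction result is usable. Because the present application mainly targets the early stage of blockchain deployment, a large amount of each enterprise's historical data still resides on the big-data Internet, for example on the enterprise's own servers or on the servers of other enterprises related to it; as long as the data is in the Internet environment, it may be obtainable. In this step, the second prediction result, obtained from "big data" on the Internet, is used mainly to verify the first prediction result, obtained from "small data" on the blockchain; the first prediction result is judged substantially correct and usable only if the difference between the two results is less than the preset threshold.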
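In one embodiment, the short-term profit prediction apparatus further includes:
a judging unit, configured to determine whether the data amount of the first related data is greater than a preset data threshold;
a switching unit, configured to input the first related data, if so, into a preset big-data-based prediction algorithm for prediction.
In the judging unit and the switching unit, a data threshold is set. When the amount of first related data acquired exceeds the data threshold, the data is outside the "small data" range to which the short-term profit prediction apparatus applies, so the subsequent clustering and regression prediction are stopped and the prediction method is switched instead. Specifically, the switch may consist of inputting the acquired first related data into a preset, existing, relatively mature prediction model, such as an enterprise profit model based on the TD-ABC model.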
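In one embodiment, the short-term profit prediction apparatus further includes:
a fraud analysis unit, configured to analyze whether the first related data contains fraudulent data. Specifically, feature extraction may be performed on the acquired first related data to obtain feature data; irrelevant feature data that is not correlated with the other feature data is extracted from the feature data; and outliers in the irrelevant feature data are then identified with the Voronoi algorithm to obtain the fraudulent data. The loan credit value of the loan object can be derived from factors such as the amount of fraudulent data. The loan limit of the loan object is then determined from the credit value together with the short-term profitability.
In a specific embodiment, enterprise a applies to bank P for a loan, and bank P needs to evaluate enterprise a. The evaluation process is as follows: 1. All data related to enterprise a, such as its sales data, production data, and financial data, is collected from the blockchain. Feature extraction is then performed on the acquired data, and useless data is deleted in advance to improve the speed and efficiency of the subsequent clustering calculation. Specifically, the extracted data is first visualized as a scatter plot, and the outlying points in the scatter plot are then deleted. 2. The data of enterprise a obtained from the blockchain is clustered with the K-means algorithm. 3. SVR regression prediction is performed on the clustering results to obtain results such as the profitability of enterprise a. 4. The reputation of enterprise a is assessed with the fraud-data identification method described above. 5. Bank P decides, based on enterprise a's reputation, profitability, and so on, whether it can lend to enterprise a and what the maximum loan limit is. Specifically, if enterprise a's reputation is below a preset value, the loan is refused; if its reputation meets the preset value, the loan may be granted, and the maximum loan limit is then calculated in combination with enterprise a's profitability, which effectively improves bank P's ability to avoid risk. The data specifically obtained for enterprise a on the data chain includes: the types of goods purchased and the corresponding procurement expenditure; customs export goods and duties, and import goods and duties; domestic sales data; product sales data; loan data; loan repayment credit data; goods inventory data; and logistics data (number of warehouses, geographical distribution of warehouses, storage data of each warehouse, and distribution of sales territories).
The short-term profit prediction apparatus of the present application first clusters the acquired "small data" with the K-means algorithm, then performs prediction with a regression algorithm to obtain a prediction result, and finally determines the short-term profitability of the loan object from that result. This addresses the problem that banks and other financial institutions cannot accurately predict the short-term profitability of a borrowing enterprise when little relevant data is available in the early stage of each enterprise's data-chain deployment, making it possible to set the loan object's loan limit relatively accurately and thus reduce the lending risk of banking institutions.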
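Referring to FIG. 7, an embodiment of the present invention further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores data such as the acquired first related data and second related data and the K-means algorithm model. The network interface of the computer device is used to communicate with an external terminal over a network connection. When executed by the processor, the computer-readable instructions implement the processes of the method embodiments described above.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-readable instructions that, when executed by a processor, implement the processes of the method embodiments described above.
The above are merely preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.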

Claims (20)

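  1. A short-term profit prediction method, characterized in that it is used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount, the prediction method comprising:
    acquiring, from the blockchain, first related data related to the loan object;
    inputting the first related data into the K-means algorithm to perform a first clustering calculation;
    performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
    determining the short-term profitability of the loan object according to the first prediction result.
  2. The short-term profit prediction method according to claim 1, characterized in that the step of performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation comprises:
    inputting the computed clusters into a preset SVR prediction model for regression prediction.
  3. The short-term profit prediction method according to claim 1, characterized in that the step of inputting the first related data into the K-means algorithm to perform the first clustering calculation comprises:
    performing feature extraction on the first related data;
    performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data in the feature data;
    removing from the first related data the target data corresponding to the irrelevant feature data, and then inputting the remaining data into the K-means algorithm to perform the first clustering calculation.
  4. The short-term profit prediction method according to claim 3, characterized in that the step of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data comprises:
    rendering the feature data as a scatter plot, and recording the feature data corresponding to the outlying points in the scatter plot as the irrelevant feature data.
  5. The short-term profit prediction method according to claim 3, characterized in that the step of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data comprises:
    performing correlation matrix analysis on the feature data and extracting the irrelevant feature data that is not correlated with the other feature data.
  6. The short-term profit prediction method according to claim 1, characterized in that, after the step of determining the short-term profitability of the loan object according to the first prediction result, the method comprises:
    acquiring second related data that is related to the loan object and is not on the blockchain;
    inputting the second related data into the K-means algorithm to perform a second clustering calculation;
    performing regression prediction in a preset manner on the clusters obtained by the second clustering calculation to obtain a second prediction result;
    determining whether the difference between the first prediction result and the second prediction result is less than a preset threshold;
    if the difference is less than the threshold, determining that the short-term profitability determined for the loan object from the first prediction result is a usable result.
  7. The short-term profit prediction method according to claim 1, characterized in that, before the step of inputting the first related data into the K-means algorithm to perform the first clustering calculation, the method comprises:
    determining whether the data amount of the first related data is greater than a preset data threshold;
    if so, inputting the first related data into a preset big-data-based prediction algorithm for prediction.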
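  8. A short-term profit prediction apparatus, characterized in that it is used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount, the prediction apparatus comprising:
    an acquiring unit, configured to acquire, from the blockchain, first related data related to the loan object;
    a clustering unit, configured to input the first related data into the K-means algorithm to perform a first clustering calculation;
    a regression unit, configured to perform regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
    a determining unit, configured to determine the short-term profitability of the loan object according to the first prediction result.
  9. The short-term profit prediction apparatus according to claim 8, characterized in that the regression unit comprises:
    an SVR prediction module, configured to input the computed clusters into a preset SVR prediction model for regression prediction.
  10. The short-term profit prediction apparatus according to claim 8, characterized in that the clustering unit comprises:
    an extraction module, configured to perform feature extraction on the first related data;
    an analysis module, configured to perform correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data;
    a clustering module, configured to remove from the first related data the first related data corresponding to the irrelevant feature data, and then to input the remaining data into the K-means algorithm to perform the first clustering calculation.
  11. The short-term profit prediction apparatus according to claim 10, characterized in that the analysis module comprises:
    a matrix analysis sub-module, configured to perform correlation matrix analysis on the feature data and extract the irrelevant feature data that is not correlated with the other feature data.
  12. The short-term profit prediction apparatus according to claim 10, characterized in that the step of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data comprises:
    performing correlation matrix analysis on the feature data and extracting the irrelevant feature data that is not correlated with the other feature data.
  13. The short-term profit prediction apparatus according to claim 8, characterized in that the short-term profit prediction apparatus further comprises:
    a data obtaining unit, configured to acquire second related data that is related to the loan object and is not on the blockchain;
    a data clustering unit, configured to input the second related data into the K-means algorithm to perform a second clustering calculation;
    a clustering regression unit, configured to perform regression prediction in a preset manner on the clusters obtained by the second clustering calculation to obtain a second prediction result;
    a comparing unit, configured to determine whether the difference between the first prediction result and the second prediction result is less than a preset threshold;
    a determining unit, configured to determine, if the difference is less than the threshold, that the short-term profitability determined for the loan object from the first prediction result is a usable result.
  14. The short-term profit prediction apparatus according to claim 8, characterized in that the short-term profit prediction apparatus further comprises:
    a judging unit, configured to determine whether the data amount of the first related data is greater than a preset data threshold;
    a switching unit, configured to input the first related data, if so, into a preset big-data-based prediction algorithm for prediction.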
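  15. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, characterized in that the processor, when executing the computer-readable instructions, implements a short-term profit prediction method used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount, the prediction method comprising:
    acquiring, from the blockchain, first related data related to the loan object;
    inputting the first related data into the K-means algorithm to perform a first clustering calculation;
    performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
    determining the short-term profitability of the loan object according to the first prediction result.
  16. The computer device according to claim 15, characterized in that the step of performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation comprises:
    inputting the computed clusters into a preset SVR prediction model for regression prediction.
  17. The computer device according to claim 15, characterized in that the step of inputting the first related data into the K-means algorithm to perform the first clustering calculation comprises:
    performing feature extraction on the first related data;
    performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data in the feature data;
    removing from the first related data the target data corresponding to the irrelevant feature data, and then inputting the remaining data into the K-means algorithm to perform the first clustering calculation.
  18. The computer device according to claim 17, characterized in that the step of performing correlation analysis on the extracted feature data to obtain irrelevant feature data that is not correlated with the other feature data comprises:
    rendering the feature data as a scatter plot, and recording the feature data corresponding to the outlying points in the scatter plot as the irrelevant feature data.
  19. A non-volatile computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement a short-term profit prediction method used when the amount of data related to a loan object that is acquired on a blockchain is less than a preset amount, the prediction method comprising:
    acquiring, from the blockchain, first related data related to the loan object;
    inputting the first related data into the K-means algorithm to perform a first clustering calculation;
    performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation to obtain a first prediction result;
    determining the short-term profitability of the loan object according to the first prediction result.
  20. The non-volatile computer-readable storage medium according to claim 19, characterized in that the step of performing regression prediction in a preset manner on the clusters obtained by the first clustering calculation comprises:
    inputting the computed clusters into a preset SVR prediction model for regression prediction.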
PCT/CN2018/095483 2018-04-17 2018-07-12 Short-term profit prediction method and apparatus, computer device, and storage medium WO2019200742A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019570544A JP6855604B2 (ja) 2018-04-17 2018-07-12 Method, apparatus, computer device, program, and storage medium for predicting short-term profit

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810345257.9A CN108710965A (zh) 2018-04-17 2018-04-17 Short-term profit prediction method and apparatus, computer device, and storage medium
CN201810345257.9 2018-04-17

Publications (1)

Publication Number Publication Date
WO2019200742A1 true WO2019200742A1 (zh) 2019-10-24

Family

ID=63866732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095483 WO2019200742A1 (zh) 2018-04-17 2018-07-12 Short-term profit prediction method, apparatus, computer device, and storage medium

Country Status (3)

Country Link
JP (1) JP6855604B2 (zh)
CN (1) CN108710965A (zh)
WO (1) WO2019200742A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
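CN111444963A (zh) * 2020-03-27 2020-07-24 中南大学 Method for predicting the silicon content of blast-furnace molten iron based on an SSA-SVR model
CN112199812A (zh) * 2020-08-18 2021-01-08 华电电力科学研究院有限公司 Industrial steam load prediction method for gas energy systems based on trend regression analysis
CN116166960A (zh) * 2023-02-07 2023-05-26 河南大学 Big data feature cleaning method and system for neural network training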

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
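CN110991744B (zh) * 2019-12-05 2022-07-12 中国银行股份有限公司 Method and system for setting a transaction limit
CN113037840B (zh) * 2021-03-08 2022-06-10 中国联合网络通信集团有限公司 Communication data transmission method, communication terminal, and communication platform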

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127380A (zh) * 2016-06-22 2016-11-16 北京拓明科技有限公司 Big data risk analysis method
CN106980909A (zh) * 2017-03-30 2017-07-25 重庆大学 Movie box office prediction method based on fuzzy linear regression
CN107844836A (zh) * 2017-10-24 2018-03-27 信雅达系统工程股份有限公司 Machine-learning-based system and learning method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004078435A (ja) * 2002-08-13 2004-03-11 Ibm Japan Ltd Risk management device, risk management system, risk management method, future expected profit calculation method, and program
JP5544508B2 (ja) * 2009-03-27 2014-07-09 株式会社国際電気通信基礎技術研究所 Behavior identification system
JP2011039934A (ja) * 2009-08-17 2011-02-24 Tokai Univ Emotion estimation system and learning system using the same
JP5783793B2 (ja) * 2011-05-18 2015-09-24 日本電信電話株式会社 Dialogue evaluation device, method, and program
WO2016120918A1 (ja) * 2015-01-27 2016-08-04 日本電気株式会社 Prediction system, prediction method, and computer-readable recording medium
US9418337B1 (en) * 2015-07-21 2016-08-16 Palantir Technologies Inc. Systems and models for data analytics
WO2017090329A1 (ja) * 2015-11-24 2017-06-01 ソニー株式会社 Information processing device, information processing method, and program
JP6690298B2 (ja) * 2016-02-26 2020-04-28 沖電気工業株式会社 Information processing device, information processing system, and program

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
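CN111444963A (zh) * 2020-03-27 2020-07-24 中南大学 Method for predicting the silicon content of blast-furnace molten iron based on an SSA-SVR model
CN111444963B (zh) * 2020-03-27 2023-08-25 中南大学 Method for predicting the silicon content of blast-furnace molten iron based on an SSA-SVR model
CN112199812A (zh) * 2020-08-18 2021-01-08 华电电力科学研究院有限公司 Industrial steam load prediction method for gas energy systems based on trend regression analysis
CN112199812B (zh) * 2020-08-18 2022-10-28 华电电力科学研究院有限公司 Industrial steam load prediction method for gas energy systems based on trend regression analysis
CN116166960A (zh) * 2023-02-07 2023-05-26 河南大学 Big data feature cleaning method and system for neural network training
CN116166960B (zh) * 2023-02-07 2023-09-29 山东经鼎智能科技有限公司 Big data feature cleaning method and system for neural network training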

Also Published As

Publication number Publication date
JP6855604B2 (ja) 2021-04-07
CN108710965A (zh) 2018-10-26
JP2020524346A (ja) 2020-08-13

Similar Documents

Publication Publication Date Title
WO2019200742A1 (zh) Short-term profit prediction method, apparatus, computer device, and storage medium
US11222046B2 (en) Abnormal sample prediction
Meiri et al. Using simulated annealing to optimize the feature selection problem in marketing applications
AU2019100362A4 (en) Personal Credit Rating System Based on The Logistic Regression
US11693917B2 (en) Computational model optimizations
US20100057773A1 (en) Fuzzy tagging method and apparatus
Long et al. A new approach for construction of geodemographic segmentation model and prediction analysis
KR20200075120A (ko) Corporate bankruptcy prediction system and operating method thereof
Mousavi et al. Improving customer clustering by optimal selection of cluster centroids in K-means and K-medoids algorithms
Zhang et al. Credit scoring model based on a novel group feature selection method: The case of Chinese small-sized manufacturing enterprises
Özlem et al. Predicting cash holdings using supervised machine learning algorithms
CN113674087A (zh) Enterprise credit rating assessment method, apparatus, electronic device, and medium
US20200051098A1 (en) Method and System for Predictive Modeling of Consumer Profiles
KR20110114181A (ko) Loan screening method with improved prediction accuracy
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
ELYUSUFI et al. Churn prediction analysis by combining machine learning algorithms and best features exploration
Keerthana et al. Accurate prediction of fake job offers using machine learning
US20220012613A1 (en) System and method for evaluating machine learning model behavior over data segments
Nikitin et al. Evolutionary ensemble approach for behavioral credit scoring
CN113052512A (zh) Risk prediction method, apparatus, and electronic device
Liu Design of XGBoost prediction model for financial operation fraud of listed companies
Joolfoo et al. A Systematic Review of Algorithms applied for Telecom Churn Prediction
Chang et al. PSO based time series models applied in exchange rate forecasting for business performance management
Sinha et al. Movie production investment decision system
Rodin Growing small businesses using software system for intellectual analysis of financial performance

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18915057; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2019570544; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 18915057; Country of ref document: EP; Kind code of ref document: A1