CN113052395A

CN113052395A - Method for predicting financial data by neural network fusing network characteristics

Info

Publication number: CN113052395A
Application number: CN202110405540.8A
Authority: CN
Inventors: 黄泽宇
Original assignee: Shandong Ziping Information Technology Service Co ltd
Current assignee: Shandong Ziping Information Technology Service Co ltd
Priority date: 2021-04-15
Filing date: 2021-04-15
Publication date: 2021-06-29

Abstract

The invention discloses a method for predicting financial data with a neural network that fuses network characteristics, comprising the following steps: S1: collecting data; S2: preprocessing the data set; S3: constructing a complex network of enterprise relationships from enterprise stock price time series data; S4: dividing the network into communities with the Louvain algorithm to obtain the community attribute of each node; S5: calculating the importance of each node in the enterprise relevance network with the PageRank algorithm to obtain the PageRank value of each node; S6: performing secondary calculation on the static data according to the corresponding indexes to construct dependent variable data; S7: predicting and evaluating the income capacity of the enterprise with a neural network algorithm. The method gives a more accurate final prediction, makes the evaluation work more objective, scientific and accurate, addresses the lack of quantitative analysis in income prediction during enterprise value evaluation, reduces the influence of analysts' subjective factors on income prediction, and improves the objectivity and interpretability of the evaluation work.

Description

Method for predicting financial data by neural network fusing network characteristics

Technical Field

The invention relates to a complex network and financial data mining technology, in particular to a method for predicting financial data by a neural network fusing network characteristics.

Background

In traditional enterprise investment, analysts judge the future income of an enterprise mainly by qualitative analysis. Predictions differ widely between analysts, the evaluation result depends heavily on each analyst's professional level, and accuracy is difficult to control. With the arrival of the information age, the internet finance industry has undergone sweeping change: information technology centered on the internet, cloud computing, big data, artificial intelligence and blockchain is developing rapidly, is being applied at scale across the economy and society, and has become an important driving force for the transformation and upgrading of many industries. Processing the many kinds of data in society with algorithms can give decision makers useful reference information and support their decisions. The asset assessment industry is an important participant in the market economy; faced with massive data, building an income prediction model from that data to assist analysts and provide a reference valuation forecast has become an emerging solution in the industry.

Traditional analysts usually judge the future income of an enterprise by qualitative analysis; predictions differ widely between analysts, and the evaluation result depends heavily on each analyst's professional level, an influence that is difficult to control.

The Chinese patent application CN201910237356.X provides a quantitative calculation method for investment value of listed enterprises, which comprises the steps of firstly obtaining financial data of a target enterprise to obtain a classification result, and then calculating the classification result according to a preset algorithm formed by the weight of influence factors of the investment value of the listed enterprises to obtain a quantitative calculation result for the investment value.

The Chinese patent application CN201810742756.1 proposes an income prediction method. It obtains, for a group to be predicted, the relevant information of each user who performs an operation on a target object within a prediction period, determines the feature values corresponding to the relevant features of those users, and predicts the income of the target object under each channel from the group feature value of each channel and a pre-trained channel income prediction model.

The Chinese patent application CN202010081874.X provides a method for establishing respective local evaluation models based on longitudinal federal learning to determine the investment value of an enterprise to be evaluated, so that the enterprise can be comprehensively and reliably evaluated.

Although these patent applications can evaluate enterprise value to some extent, they have several disadvantages. First, the enterprise to be forecast is not evaluated from a whole-market perspective, so the forecasting model is relatively one-sided. Second, the assessment models do not take inter-enterprise relationships into account, such as strong cohesion between enterprises. Third, the parameters of these models are obtained only by secondary calculation on common data and do not reflect the hierarchy of enterprises.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for predicting financial data by a neural network fusing network characteristics.

In order to achieve the purpose, the invention adopts the following technical scheme:

A method for predicting financial data by a neural network fusing network characteristics comprises the following steps:

S1: data acquisition, including dynamic data and static data;

S2: data set preprocessing, including handling missing values and outliers in the data;

S3: constructing an enterprise relation complex network from enterprise stock price time series data;

S4: dividing the network into communities with the Louvain algorithm (a community detection algorithm for large-scale networks) to obtain the community attribute of each node;

S5: calculating the importance of the nodes in the enterprise relevance network with the PageRank algorithm (a web page ranking algorithm) to obtain the PageRank value of each node;

S6: performing secondary calculation on the static data according to the corresponding indexes to construct dependent variable data, specifically comprising the following steps:

S6.1: adding the community attribute of each enterprise, and expanding the variable dimensions with One-Hot coding;

S6.2: adding the PageRank value of each enterprise, and expanding the variable dimensions;

S7: predicting and evaluating the income capacity of the enterprise with a neural network algorithm.

In step S1, the dynamic data is the market value of all listed enterprises in the stock market over the set period, and the static data is the financial data of the enterprises together with overall market data (that is, the macroscopic factors among the financial indexes of listed enterprises; the specific data are listed in the macroscopic factor part of Table 1).

In step S2, null values in the sample data are filled with the median, and the data are normalized by the max-min method according to formula (1):

$$z'_{ij} = \frac{z_{ij} - z_{i\min}}{z_{i\max} - z_{i\min}} \quad (1)$$

where $z'_{ij}$ is the normalized value, $z_{ij}$ is the jth row of data in the ith characteristic index of the sample data set, i and j are integers greater than or equal to 1, and $z_{i\min}$ and $z_{i\max}$ are the minimum and maximum values of the ith characteristic index of the sample data set.
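The preprocessing in step S2 can be sketched in a few lines. This is a minimal illustration only: the helper names `fill_median` and `min_max`, the use of `None` for a missing value, and the handling of a constant column are assumptions made for the example, not details taken from the patent.

```python
from statistics import median


def fill_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    m = median(observed)
    return [m if v is None else v for v in column]


def min_max(column):
    """Scale one characteristic index to [0, 1] by max-min normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical values of one index for four enterprises, one value missing.
raw = [4.0, None, 10.0, 6.0]
scaled = min_max(fill_median(raw))
```

Each characteristic index is scaled independently, so indexes with different units end up on the same [0, 1] range before they reach the prediction model.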

The step S3 includes the following sub-steps:

S3.1, the time series of stock prices is taken as the dynamic characteristic of a listed enterprise, and the correlation of this characteristic between listed enterprises is measured with the Pearson correlation coefficient. Let $v_i(t)$ be the closing price of stock i at time t; the value gain of stock i at that time, $\Delta v_i(t)$, is given by formula (2),

where t denotes time, $\Delta t$ is the period over which the gain is obtained, and i and j are integers greater than or equal to 1. The Pearson correlation coefficient $p_{ij}$ between any two stocks i and j is the covariance of the two variables $v_i$ and $v_j$ divided by the product of their standard deviations $\sigma_{v_i}$ and $\sigma_{v_j}$, as in formula (3):

$$p_{ij} = \frac{\operatorname{cov}(v_i, v_j)}{\sigma_{v_i}\,\sigma_{v_j}} \quad (3)$$

S3.2, step S3.1 yields the connection relations $E = \{e_{11}, e_{12}, \ldots, e_{ij}\}$ between every pair of listed enterprises, where $p_{ij}$ is the Pearson correlation coefficient between stocks i and j, $e_{ij}$ indicates the association between listed enterprises i and j, and E is the basic connection coefficient. The relation network G = (V, E, W) is thus obtained, where V represents the listed-enterprise node entities, E the connecting edges between pairs of listed enterprises, and W the weights between listed enterprises. The weight $w_{ij}$ between enterprises i and j is calculated by formula (4):

$$w_{ij} = e_{ij}\,p_{ij} \quad (4)$$

Under this formula, strongly correlated enterprises are joined by heavily weighted edges and weakly correlated enterprises by lightly weighted edges.
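Step S3 as a whole can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the text: the gain is taken as a log return over one period, $e_{ij}$ is treated as a 0/1 indicator that is 1 when $p_{ij}$ reaches a hypothetical `threshold`, and the price series are invented toy data.

```python
import math


def log_returns(prices, dt=1):
    """Gain over a period dt. The log-return form is an assumption."""
    return [math.log(prices[t]) - math.log(prices[t - dt])
            for t in range(dt, len(prices))]


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_weights(price_series, threshold=0.5):
    """Return {(i, j): w_ij} with w_ij = e_ij * p_ij.

    Assumption: e_ij = 1 when p_ij >= threshold, otherwise 0 (no edge).
    """
    names = sorted(price_series)
    gains = {k: log_returns(price_series[k]) for k in names}
    weights = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            i, j = names[a], names[b]
            p = pearson(gains[i], gains[j])
            e = 1 if p >= threshold else 0
            if e:
                weights[(i, j)] = e * p
    return weights


# Hypothetical weekly closing prices for three stocks.
prices = {
    "A": [10.0, 11.0, 10.5, 12.0, 12.5],
    "B": [20.0, 22.0, 21.0, 24.0, 25.0],  # moves in step with A
    "C": [8.0, 7.5, 8.2, 7.4, 7.3],       # moves against A
}
w = build_weights(prices)
```

With these toy series only the pair (A, B) receives an edge, since C is negatively correlated with both and falls below the assumed threshold.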

The step S4 specifically includes the following substeps:

S4.1, initialize the nodes and communities: regard each node in the network as an independent community, so that the number of communities equals the number of nodes;

S4.2, let m denote a node, where m is a positive integer; try in turn to assign node m to the community of each of its neighbor nodes, calculate the modularity change ΔQ before and after the assignment, and record the neighbor node with the largest ΔQ; if max ΔQ > 0, assign node m to the community of that neighbor node, otherwise leave node m unchanged;

S4.3, repeat S4.2 until the community attribute of every node no longer changes;

S4.4, treat all nodes in the same community as one new node; the weights of the edges between nodes inside the community become the weight of the self-loop of the new node, and the weights of the edges between communities become the weights of the edges between the new nodes;

S4.5, repeat from S4.1 until the modularity of the whole network no longer changes.
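The local-moving part of the procedure (S4.1 to S4.3) can be illustrated with a small sketch. It recomputes the modularity Q from scratch for every trial move, which is simple but slow, and it omits the aggregation in S4.4 and S4.5; the function names and the toy network are assumptions made for the example. A production implementation would use an incremental ΔQ formula or an existing graph library.

```python
def modularity(adj, community):
    """Weighted modularity Q; adj[u][v] is the symmetric weight of edge (u, v)."""
    degree = {u: sum(adj[u].values()) for u in adj}
    two_m = sum(degree.values())               # every edge is counted twice
    q = 0.0
    for u in adj:
        for v in adj:
            if community[u] == community[v]:
                q += adj[u].get(v, 0.0) - degree[u] * degree[v] / two_m
    return q / two_m


def local_moving(adj):
    """Steps S4.1 to S4.3: greedy node moves that increase modularity."""
    community = {u: u for u in adj}            # S4.1: one community per node
    changed = True
    while changed:                             # S4.3: repeat until nothing moves
        changed = False
        for m in sorted(adj):
            original = community[m]
            base = modularity(adj, community)
            best_gain, best_label = 0.0, original
            for neighbor in sorted(adj[m]):    # S4.2: try each neighbor's community
                label = community[neighbor]
                if label == original:
                    continue
                community[m] = label
                gain = modularity(adj, community) - base
                if gain > best_gain + 1e-12:
                    best_gain, best_label = gain, label
            community[m] = best_label          # unchanged when max gain <= 0
            if best_label != original:
                changed = True
    return community


# Two triangles joined by one edge (hypothetical toy network, unit weights).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: {} for n in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

community = local_moving(adj)
```

On this toy network the two triangles end up as two separate communities, which matches the intuition that densely connected enterprises are grouped together.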

The step S5 specifically includes the following substeps:

S5.1, initialize the PageRank value of every node as $PageRank(d_i)_1$, where $d_i$ is a node of the network;

S5.2, traverse the nodes in the network and update the PageRank value of each node according to the PageRank value of the target node and the weights of its neighbor nodes, using formula (5),

where $PageRank(d_i)_{k+1}$ is the PageRank value of node $d_i$ at iteration k+1, $M(d_i)$ is the set of neighbor nodes of node $d_i$, and $d_v$ is a neighbor node of node $d_i$;

S5.3, calculate the PageRank update amount δ over all nodes in the network, using formula (6),

where D is the set of all nodes, N is the number of nodes, and $PageRank(d_i)_{k+1}$ is the PageRank value of node $d_i$ at iteration k+1;

S5.4, stop iterating when δ ≤ ε, where ε is a constant; otherwise repeat step S5.2.
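The iteration in S5.1 to S5.4 can be sketched as a weighted PageRank on an undirected network. The exact forms of formulas (5) and (6) are not reproduced here, so the following choices are assumptions: a damping factor of 0.85, a uniform initial value of 1/N, transfer of rank in proportion to edge weight, and δ taken as the mean absolute change per node. The sketch also assumes that no node is isolated.

```python
def weighted_pagerank(adj, damping=0.85, eps=1e-8, max_iter=1000):
    """Weighted PageRank; adj[u][v] is the symmetric weight of edge (u, v)."""
    nodes = sorted(adj)
    n = len(nodes)
    strength = {u: sum(adj[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}                     # S5.1: initial values
    for _ in range(max_iter):
        new_rank = {}
        for i in nodes:                                    # S5.2: update every node
            incoming = sum(rank[v] * w / strength[v] for v, w in adj[i].items())
            new_rank[i] = (1.0 - damping) / n + damping * incoming
        # S5.3: update amount, here the mean absolute change (an assumption)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes) / n
        rank = new_rank
        if delta <= eps:                                   # S5.4: stop when small
            break
    return rank


# Hypothetical star network: one hub enterprise linked to three others.
star = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"hub": 1.0},
    "b": {"hub": 1.0},
    "c": {"hub": 1.0},
}
rank = weighted_pagerank(star)
```

In the star network the hub receives the largest PageRank value, reflecting its central position among the connected enterprises.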

In step S6, the index system is constructed by drawing on the index evaluation principle of multivariate statistical theory and by fusing in the network characteristic indexes extracted in steps S4 and S5, specifically comprising the following substeps:

S6.1, add the community attribute of each listed enterprise: convert the community attribute of each listed-enterprise node into multi-bit binary data with One-Hot coding, and expand the variable dimensions accordingly.

S6.2, add the PageRank value of each listed enterprise, and expand the variable dimensions accordingly.
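The feature fusion in S6.1 and S6.2 amounts to appending columns to each enterprise's feature row. The sketch below assumes hypothetical inputs: `financial` holds already normalized financial indexes, `community` holds the community label from S4, and `pagerank` holds the value from S5.

```python
def fuse_features(financial, community, pagerank):
    """Append a one-hot community vector (S6.1) and the PageRank value (S6.2)."""
    labels = sorted(set(community.values()))
    fused = {}
    for firm, row in financial.items():
        one_hot = [1.0 if community[firm] == c else 0.0 for c in labels]
        fused[firm] = list(row) + one_hot + [pagerank[firm]]
    return fused


# Hypothetical values for three enterprises.
financial = {"A": [0.2, 0.7], "B": [0.5, 0.1], "C": [0.9, 0.4]}
community = {"A": 0, "B": 0, "C": 3}
pagerank = {"A": 0.40, "B": 0.35, "C": 0.25}
fused = fuse_features(financial, community, pagerank)
```

Each row grows by one column per community plus one column for the PageRank value, so two financial indexes and two communities give five features per enterprise.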

In step S7, the neural network model consists of forward propagation and backward propagation. In forward propagation, the output data of the previous layer serves as the input data of the next layer; the input data is weighted and summed, a bias is added, and the result is passed through an activation function, as in formula (7),

where f is the activation function (the Sigmoid function is used in this neural network), $w_{pq}$ is the weight value for layer p and layer q of the neural network, $x_p$ is the input at layer p of the neural network, and $b_p$ is the bias of layer p of the neural network.
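The forward propagation described for step S7 can be sketched for a small fully connected network. The layer sizes, weights and biases below are invented for illustration, and the indexing convention (`weights[q][p]` connects input p to output unit q) is an assumption; backward propagation is not shown.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases):
    """One layer: weighted sum of the inputs, plus bias, through the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """The output of each layer is the input of the next layer."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_forward(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs, 2 hidden units, 1 output unit.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 1.0], layers)
```

Because the output unit also uses the Sigmoid function, the prediction lies strictly between 0 and 1, which fits targets that have been scaled to that range.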

A complex network is a way of representing various kinds of real-world relationships with abstract nodes and connecting edges. It is an important research tool in many disciplines: the topological characteristics of a specific network in a practical problem can be obtained from a data structure such as a network graph, and those characteristics are then used to solve the problem. A concrete network can be abstracted as a graph G = (V, E, W) consisting of a node set V and an edge set E, where V contains all the nodes and E is the collection of edges. $v_i \in V$ denotes a node in the network, $e_{ij} \in E$ denotes the connecting edge between node $v_i$ and node $v_j$, and $w_{ij} \in W$ denotes the weight coefficient of that edge, which measures how tightly $v_i$ and $v_j$ are connected.

Neural networks were first proposed by psychologists and neurobiologists as a relatively simple way to solve complex problems, and they have received increasing attention in recent years. There are many neural network models, which describe and simulate biological nervous systems at different levels and from different perspectives. Representative models include BP networks, RBF networks, Hopfield networks and self-organizing feature map networks.

The invention uses the stock price time series data of listed enterprises to construct an enterprise relation complex network, divides the whole enterprise network into communities following the idea of clustering, quantifies the importance of each enterprise in the whole network, and quantifies the associations between enterprises across the whole market through community division. Community division and the ranking of important nodes quantify the characteristics of enterprises in the complex network, so that an analyst can readily see the position of a target enterprise in that network. The income capacity of an enterprise is then predicted and evaluated with a neural network algorithm that combines the complex network indexes with the financial indexes of the listed enterprise.

The invention selects the stock market value and basic enterprise financial data of listed enterprises and applies the neural network prediction model after adding the network characteristic indexes. The final prediction is more accurate; the evaluation work becomes more objective, scientific and accurate; the lack of quantitative analysis in income prediction during enterprise value evaluation is addressed; the influence of analysts' subjective factors on income prediction is reduced; the objectivity and interpretability of the evaluation work are improved; and the requirements of practical use are met. In an experiment on real data, the model improved markedly after the network characteristic parameters were fused: the mean square error (MSE) of the model output, the business income growth rate (%), fell from 0.226 to 0.184, a reduction of about 18%, and FIG. 7 shows the deviation between the true values and the predicted values. This shows that the method proposed in this patent predicts financial data well.
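The reported reduction can be checked directly from the two MSE values. The short sketch below also shows how an MSE is computed; the helper names are assumptions made for the example, and only the values 0.226 and 0.184 come from the text.

```python
def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(before, after):
    """Share of the original error removed by the improved model."""
    return (before - after) / before


# MSE without and with the network characteristic indexes, as reported.
reduction = relative_reduction(0.226, 0.184)
```

The result is roughly 0.186, which is consistent with the stated reduction of about 18%.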

Drawings

FIG. 1 is a flow chart of the modeling steps of the present invention;

FIG. 2 is a diagram illustrating a complex enterprise relationship network constructed using time-series data of stock prices for a listed enterprise;

FIG. 3 is a flow chart of Louvain algorithm community discovery;

FIG. 4 is a flowchart of the PageRank algorithm;

FIG. 5 is a schematic diagram of a neural network algorithm;

FIG. 6 is a graph of the loss decreasing over the iterations of neural network fitting;

FIG. 7 is a graph of the fit before and after adding a network feature.

Detailed Description

The invention is further illustrated with reference to the following figures and examples.

The structures, proportions and sizes shown in the drawings serve only to accompany the content disclosed in the specification so that those skilled in the art can read and understand it; they do not limit the conditions under which the invention can be implemented and therefore carry no substantive technical meaning. Any structural modification, change of proportion or adjustment of size that does not affect the effects and purposes achievable by the invention still falls within the scope of the technical content disclosed herein. Likewise, terms such as "upper", "lower", "left", "right", "middle" and "one" are used in this specification for clarity of description only and do not limit the implementable scope of the invention; changes or adjustments of their relative relationships, without substantive change to the technical content, are also regarded as within the implementable scope of the invention.

Referring to FIGS. 1 to 7, the invention takes China's A-share listed enterprises as an example and performs predictive modeling analysis of the income capacity of different enterprises. A-share listed companies from 2016 to 2018 are used as the test sample (because income data for 2020 had not yet been published, the 2019 financial data of enterprises could not be used), and companies with more than 20% missing data in the index system, with major changes to their main business, or with ST status are removed.

As shown in FIG. 1, the method for predicting the income of listed enterprises by a neural network fusing network characteristics comprises the following steps:

S1: data set collection, which covers two kinds of data: dynamic data and static data. The dynamic data is the market value of the different listed enterprises, and the static data is the financial data of the listed enterprises and overall market data.

Both the dynamic data and the static data of the listed enterprises need to be collected, in the following substeps.

S1.1, collect stock market data of the listed enterprises. The data used by the model come from the Taian database in China and consist of the weekly closing stock prices of all A-share listed enterprises from 2016 to 2018.

S1.2, collect overall market data and financial data of the listed enterprises. The financial indexes of listed enterprises include market data such as gross domestic product, China's balance of international payments and completed fixed asset investment, and enterprise financial data such as accounts receivable turnover rate and accounts receivable turnover days, as shown in Table 1.

Table 1: financial index of enterprise on market

S2: data set preprocessing, which handles the missing values and abnormal values in the data. Null values in the sample data are filled with the median. In addition, because the indexes in the index system differ in measurement unit and value range, the data are normalized by Min-Max scaling so that differences in value range do not unduly influence the result of the prediction model. The calculation formula (1) is as follows, where $z'_{ij}$ is the normalized value of $z_{ij}$:

$$z'_{ij} = \frac{z_{ij} - z_{i\min}}{z_{i\max} - z_{i\min}} \quad (1)$$

S3: constructing the enterprise relation complex network from the stock price time series data of listed enterprises, comprising the following substeps:

S3.1, the time series of stock prices is taken as the dynamic characteristic of a listed enterprise, and the correlation of this characteristic between listed enterprises is measured with the Pearson correlation coefficient. Let $v_i(t)$ be the closing price of stock i at time t; the value gain of stock i at that time, $\Delta v_i(t)$, is given by formula (2),

where t denotes time, $\Delta t$ is the period over which the gain is obtained, and i and j are integers greater than or equal to 1. The Pearson correlation coefficient $p_{ij}$ between any two stocks i and j is the covariance of the two variables $v_i$ and $v_j$ divided by the product of their standard deviations $\sigma_{v_i}$ and $\sigma_{v_j}$, as in formula (3):

$$p_{ij} = \frac{\operatorname{cov}(v_i, v_j)}{\sigma_{v_i}\,\sigma_{v_j}} \quad (3)$$

S3.2, step S3.1 yields the connection relations $E = \{e_{11}, e_{12}, \ldots, e_{ij}\}$ between every pair of listed enterprises, where $p_{ij}$ is the Pearson correlation coefficient between stocks i and j, $e_{ij}$ indicates the association between listed enterprises i and j, and E is the basic connection coefficient. As shown in FIG. 2, the relation network G = (V, E, W) is obtained, where V represents the listed-enterprise node entities, E the connecting edges between pairs of listed enterprises, and W the weights between pairs of listed enterprises. The weight $w_{ij}$ between enterprises i and j is calculated by formula (4):

$$w_{ij} = e_{ij}\,p_{ij} \quad (4)$$

S4: the network is divided into communities with the Louvain algorithm to obtain the community attribute of each node. The flow is shown in FIG. 3 and specifically comprises the following substeps:

S4.1, initialize the nodes and communities: regard each node in the network as an independent community, so that the number of communities equals the number of nodes.

S4.2, let m denote a node, where m is a positive integer; try in turn to assign node m to the community of each of its neighbor nodes, calculate the modularity change ΔQ before and after the assignment, and record the neighbor node with the largest ΔQ; if max ΔQ > 0, assign node m to the community of that neighbor node, otherwise leave node m unchanged.

S4.3, repeat S4.2 until the community attribute of every node no longer changes.

S4.4, treat all nodes in the same community as one new node; the weights of the edges between nodes inside the community become the weight of the self-loop of the new node, and the weights of the edges between communities become the weights of the edges between the new nodes.

S5: based on the listed-enterprise relevance network, the importance of each node in the network is calculated with the PageRank algorithm to obtain the PageRank value of each node. The flow is shown in FIG. 4 and specifically comprises the following substeps:

S5.1, initialize the PageRank value of every node as $PageRank(d_m)_1$, where m is a node of the network;

where D is the set of all nodes, N is the number of nodes, and $PageRank(d_i)_{k+1}$ is the PageRank value of node $d_i$ at iteration k+1;

S5.4, stop iterating when δ ≤ ε, where ε is a constant; otherwise repeat step S5.2.

S6: secondary calculation is performed on the static data according to the corresponding indexes to construct dependent variable data. When the index system is constructed, the method draws on the index evaluation principle of multivariate statistical theory; the specific calculation is shown in Table 1. In addition, index fusion is performed with the network characteristic indexes extracted in S4 and S5, specifically comprising the following substeps:

S7: the income capacity of the enterprise is predicted and evaluated with a neural network algorithm. As shown in FIG. 5, the neural network model consists of forward propagation and backward propagation. In forward propagation, the output data of the previous layer serves as the input data of the next layer; the input data is weighted and summed, a bias is added, and the result is passed through an activation function, as in formula (7),

where f is the activation function (the Sigmoid function is used in this neural network), $w_{pq}$ is the weight value for layer p and layer q of the neural network, $x_p$ is the input at layer p of the neural network, and $b_p$ is the bias of layer p of the neural network.

The above describes an embodiment, in the field of financial data mining, of the method for predicting the income of listed enterprises by a neural network fusing network characteristics. The method selects the stock market value and basic enterprise financial data of listed enterprises for 2016 to 2018 and applies the neural network prediction model after adding the network characteristic indexes. The final prediction is more accurate; the evaluation work becomes more objective, scientific and accurate; the lack of quantitative analysis in income prediction during enterprise value evaluation is addressed; the influence of analysts' subjective factors on income prediction is reduced; the objectivity and interpretability of the evaluation work are improved; and the requirements of practical use are met. In an experiment on real data, the model improved markedly after the network characteristic parameters were fused: the mean square error (MSE) of the model output, the business income growth rate (%), fell from 0.226 to 0.184, and FIG. 7 shows the deviation between the true values and the predicted values.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims

1. A method for predicting financial data by a neural network fusing network characteristics, characterized by comprising the following steps:

S1: data acquisition, including dynamic data and static data;

S4: dividing the network into communities with the Louvain algorithm to obtain the community attribute of each node;

S5: calculating the importance of the nodes in the enterprise relevance network with the PageRank algorithm to obtain the PageRank value of each node;

S6: performing secondary calculation on the static data according to the corresponding indexes to construct dependent variable data, wherein the index system is constructed by drawing on the index evaluation principle of multivariate statistical theory and by fusing in the network characteristic indexes extracted in step S4 and step S5, specifically comprising the following substeps:

2. The method for predicting financial data by a neural network fusing network characteristics as claimed in claim 1, wherein in step S1 the dynamic data is the market value of all listed enterprises in the stock market over the set period, and the static data is the financial data of the enterprises together with overall market data, namely the macroscopic factors among the financial indexes of listed enterprises.

3. The method for predicting financial data according to claim 1, wherein in step S2 the null values in the sample data are filled with the median and the data are normalized by the max-min method, with formula (1) as follows:

4. The method for neural network prediction of financial data with converged network characteristics as claimed in claim 1, wherein the step S3 comprises the sub-steps of:

S3.1, the time series of the stock prices of the listed enterprises is taken as their dynamic characteristic, and the correlation of this characteristic between listed enterprises is measured with the Pearson correlation coefficient. Let $v_i(t)$ be the closing price of stock i at time t; the value gain of stock i at that time, $\Delta v_i(t)$, is given by formula (2),

where t denotes time, $\Delta t$ is the period over which the gain is obtained, and i and j are integers greater than or equal to 1. The Pearson correlation coefficient $p_{ij}$ between any two stocks i and j is the covariance of the two variables $v_i$ and $v_j$ divided by the product of their standard deviations $\sigma_{v_i}$ and $\sigma_{v_j}$, as in formula (3):

$$p_{ij} = \frac{\operatorname{cov}(v_i, v_j)}{\sigma_{v_i}\,\sigma_{v_j}} \quad (3)$$

$$w_{ij} = e_{ij}\,p_{ij} \quad (4)$$

5. The method for neural network based financial data fusion with network characteristics as claimed in claim 1, wherein said step S4 comprises the following steps:

S4.3, repeating S4.2 until the community attribute of every node no longer changes;

6. The method for neural network based financial data fusion with network characteristics as claimed in claim 1, wherein said step S5 comprises the following steps:

S5.1, initializing the PageRank value of every node as $PageRank(d_m)_1$ (m is a node of the network, as above);

where $PageRank(d_i)_{k+1}$ is the PageRank value of node $d_i$ at iteration k+1, $M(d_i)$ is the set of neighbor nodes of node $d_i$, and $d_v$ is a neighbor node of node $d_i$;

7. The method for predicting financial data by a neural network fusing network characteristics as claimed in claim 1, wherein in step S7 the neural network model consists of forward propagation and backward propagation; in forward propagation the output data of the previous layer serves as the input data of the next layer, the input data is weighted and summed, a bias is added, and the result is passed through an activation function, with the specific formula (7) as follows: