CN112734195A

CN112734195A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN112734195A
Application number: CN202011626912.1A
Authority: CN
Inventors: 田野; 郁文剑; 李泽华
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-04-30
Anticipated expiration: 2040-12-31
Also published as: CN112734195B

Abstract

A data processing method, a data processing device, electronic equipment and a storage medium are provided. The method acquires a plurality of service indexes of different dimensions; screens leading factors of an index to be predicted from the plurality of service indexes; screens secondary leading factors of each leading factor from the plurality of service indexes; predicts the value of the leading factor at the time to be predicted based on the historical data of the secondary leading factors; and obtains a first target predicted value of the index to be predicted at the time to be predicted from the predicted value of the leading factor. In this way, service indexes are mined from data of three dimensions, the directly related leading factors and the indirectly related secondary leading factors of the index to be predicted are screened out, a prediction model of each leading factor is built on its secondary leading factors, and the predicted value of the leading factor is then used to calculate the predicted value of the index to be predicted. The relations among the service indexes are thereby fully mined, the ways of predicting the index are enriched, and the prediction accuracy is improved.

Description

Data processing method and device, electronic equipment and storage medium

[ technical field ]

The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

[ background of the invention ]

In the data processing process, some important service indexes are often required to be predicted.

In the prior art, an index to be predicted is predicted either from a few related indexes selected by industry practitioners based on experience, or comprehensively from a large number of service indexes. In both cases the data is not fully mined, and the prediction accuracy needs to be improved.

[ summary of the invention ]

The invention aims to provide a data processing method, a data processing device, electronic equipment and a storage medium, and aims to solve the technical problem that the prediction accuracy of an index to be predicted is not high due to insufficient data mining in the prior art.

The technical scheme of the invention is as follows: provided is a data processing method including:

obtaining a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data;

acquiring first time sequence sample data of the service index and second time sequence sample data of an index to be predicted, and determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data to serve as a leading factor of the index to be predicted;

acquiring third time sequence sample data of the leading factor, and determining the service index related to the leading factor according to the first time sequence sample data and the third time sequence sample data to be used as a secondary leading factor of the leading factor;

predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

Preferably, the predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtaining the first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor, further includes:

acquiring key time sequence data related to the time to be predicted from second time sequence sample data of the index to be predicted by using a time sequence AR model;

acquiring a second target predicted value of the index to be predicted at the time to be predicted according to the key time sequence data;

and calculating a final target predicted value of the index to be predicted according to the first target predicted value and the second target predicted value.

predicting a third target prediction value of the index to be predicted at the time to be predicted based on the historical data of the lead factor;

and calculating a final target predicted value of the index to be predicted according to the first target predicted value and the third target predicted value.

Preferably, after the obtaining first time sequence sample data of the service index and second time sequence sample data of the index to be predicted, and determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data, as a leading factor of the index to be predicted, the method further includes:

acquiring a deletion instruction of a leading factor input by a user, and deleting the corresponding leading factor according to the deletion instruction;

and performing stepwise regression fitting on the leading factors by using a stepwise regression model, selecting an optimal model according to a regression fitting result, and removing the leading factors which are not in the optimal model.

Preferably, the determining, according to the first time sequence sample data and the second time sequence sample data, the service index related to the index to be predicted as a leading factor of the index to be predicted includes:

and inputting the first time sequence sample data and the second time sequence sample data into a leading factor screening model, and outputting the service index related to the index to be predicted as a leading factor of the index to be predicted.

Preferably, the lead factor screening model is a decision tree model; the inputting the first time sequence sample data and the second time sequence sample data into a leading factor screening model, and outputting the service index related to the index to be predicted as a leading factor of the index to be predicted, including:

combining the first time sequence sample data of each service index to form a data set;

randomly selecting the division characteristics of the data set, and calculating the information gain ratio of each division characteristic to the data set;

selecting a preset number of division features with the largest information gain ratio, constructing a decision tree according to the selected division features, taking the division features with the largest information gain ratio as root nodes of the decision tree, and dividing the data set based on the division features of the root nodes to obtain a first data subset;

calculating the information gain ratio of each remaining division characteristic to the first data subset, distributing the division characteristic with the largest information gain ratio to the internal nodes of the decision tree, and dividing the first data subset based on the division characteristics of the current internal nodes to obtain a second data subset;

according to the above steps, continuing to calculate the information gain ratios of the remaining division characteristics and to assign them to nodes until the leaf nodes are reached;

and taking a leaf node as time sequence data related to the index to be predicted, and taking each service index corresponding to the time sequence data as a leading factor of the index to be predicted.

discretizing the first time sequence sample data and the second time sequence sample data respectively to obtain discretized first time sequence sample data and second time sequence sample data;

calculating mutual information between each discretized first time sequence sample data and each discretized second time sequence sample data;

and selecting a second preset number of first time sequence sample data with maximum mutual information according to the mutual information, taking a service index corresponding to the selected first time sequence sample data as the leading factor, and determining the weight of the leading factor according to the size of the mutual information.

The other technical scheme of the invention is as follows: there is provided a data processing apparatus comprising:

the data acquisition module is used for acquiring a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data;

a leading factor screening module, configured to obtain first time sequence sample data of the service index and second time sequence sample data of an index to be predicted, and determine, according to the first time sequence sample data and the second time sequence sample data, the service index related to the index to be predicted as a leading factor of the index to be predicted;

the second-level lead factor screening module is used for acquiring third time sequence sample data of the lead factor, and determining the service index related to the lead factor according to the first time sequence sample data and the third time sequence sample data to be used as a second-level lead factor of the lead factor;

the first prediction module is used for predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor and acquiring a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

The other technical scheme of the invention is as follows: an electronic device is provided that includes a processor, and a memory coupled to the processor, the memory storing program instructions executable by the processor; the processor, when executing the program instructions stored by the memory, implements the data processing method described above.

The other technical scheme of the invention is as follows: there is provided a storage medium having stored therein program instructions which, when executed by a processor, implement the data processing method described above.

The invention has the following beneficial effects. The data processing method, the data processing device, the electronic equipment and the storage medium acquire a plurality of service indexes of different dimensions from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data; determine the service indexes related to the index to be predicted according to the first time sequence sample data of the service indexes and the second time sequence sample data of the index to be predicted, and use them as leading factors of the index to be predicted; determine the service indexes related to each leading factor according to the first time sequence sample data and the third time sequence sample data of the leading factor, and use them as secondary leading factors of that leading factor; predict the value of the leading factor at the time to be predicted based on the historical data of the secondary leading factors; and obtain a first target predicted value of the index to be predicted at the time to be predicted from the predicted value of the leading factor. In this way, service indexes are mined from data of three dimensions, the directly related leading factors and the indirectly related secondary leading factors of the index to be predicted are screened out, a prediction model of each leading factor is built on its secondary leading factors, and the predicted value of the leading factor is then used to calculate the predicted value of the index to be predicted. The relations among the service indexes are thereby fully mined, the ways of predicting the index are enriched, and the prediction accuracy is improved.

[ description of the drawings ]

FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention;

FIG. 2 is a flow chart of a data processing method according to a second embodiment of the present invention;

FIG. 3 is a flow chart of a data processing method according to a third embodiment of the present invention;

FIG. 4 is a diagram illustrating a data processing apparatus according to a fourth embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a storage medium according to a sixth embodiment of the present invention.

[ detailed description ]

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. All directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

Fig. 1 is a flow chart illustrating a data processing method according to a first embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the data processing method includes the steps of:

S101, obtaining a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data.

This step obtains a plurality of service indexes of different dimensions, namely a macroscopic economic dimension, a mesoscopic industry dimension and a microcosmic company dimension. For example, the macroscopic economic dimension includes the CPI (consumer price index), total retail sales of consumer goods, industrial added value, etc.; the mesoscopic industry dimension includes the number of insurance agents in the whole industry, the premium income of the whole industry, etc.; and the microcosmic company dimension includes the number of insurance agents of insurance company A, the first commission of the insurance agents of insurance company A, etc.

To make the results easier to interpret in business terms, a service factor map can be constructed from the service indexes to classify each of them. The service factor map organizes the service indexes by dimension and level. Specifically, the first level is the dimension level, corresponding to the macroscopic economic dimension, the mesoscopic industry dimension and the microcosmic company dimension; the second level is the major factor class under each dimension; the third level is the factor class under each major factor class; and the fourth level is the service index under each factor class. For example: macroscopic factors (macroscopic economic dimension) - macro economy (major factor class) - GDP (factor class) - cumulative year-on-year growth rate of national GDP (service index). Further, service indexes of different frequencies can be placed under one factor class; for example, a monthly CPI and a yearly CPI are both placed under the CPI factor class.
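As a non-limiting illustration, the four-level service factor map might be held in a nested mapping such as the Python sketch below. The dictionary layout, the name `FACTOR_MAP`, the helper `flatten`, and the placement of CPI under the same major factor class as GDP are assumptions of this sketch; only the example entries come from the description.

```python
# Illustrative nesting: dimension -> major factor class -> factor class -> service indexes.
# The layout and the placement of CPI under "macro economy" are assumptions of this sketch.
FACTOR_MAP = {
    "macroscopic economic dimension": {
        "macro economy": {
            "GDP": ["national GDP cumulative year-on-year growth rate"],
            "CPI": ["monthly CPI", "yearly CPI"],
        },
    },
}


def flatten(factor_map):
    """Return one (dimension, major class, factor class, service index) tuple per index."""
    rows = []
    for dimension, major_classes in factor_map.items():
        for major_class, factor_classes in major_classes.items():
            for factor_class, indexes in factor_classes.items():
                for index in indexes:
                    rows.append((dimension, major_class, factor_class, index))
    return rows


for row in flatten(FACTOR_MAP):
    print(" - ".join(row))
```

Flattening the map in this way gives each service index a full classification path, which is one possible way to report a screened leading factor together with its dimension and class.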

The macroscopic economic data comprises various macroeconomic statistical indexes and composite indexes calculated from them, and is sourced from periodic government publications and Chinese economic databases. The mesoscopic industrial economic data can be sourced from industry information published by the government, industry research reports, Chinese economic databases and industry research databases. The microcosmic company operation data is sourced from data published by the company (such as the annual reports of listed companies), and can also come from undisclosed internal data; for example, a company that needs to analyze and predict certain service indexes can compile statistics from its own operation data.

S102, obtaining first time sequence sample data of the service index and second time sequence sample data of the index to be predicted, and determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data to serve as a leading factor of the index to be predicted.

Corresponding first time sequence sample data is constructed for each service index acquired in step S101. The first time sequence sample data is a time sequence generated from the data of the service index at each historical time, and the second time sequence sample data is a time sequence generated from the data of the index to be predicted at each historical time. The more similar the distributions of the first time sequence sample and the second time sequence sample, the more strongly the service index is related to the index to be predicted; the more the distributions differ, the weaker the relation.

The correlation between the service index and the index to be predicted can be calculated based on the first time sequence sample and the second time sequence sample, and one or more service indexes with the correlation larger than or equal to a preset threshold value are selected as the leading factor. The screened leading factor can be used for predicting the index to be predicted.

In a first optional embodiment, the correlation may be calculated based on mutual information. Specifically, step S102 may be implemented by: S201, discretizing the first time sequence sample data and the second time sequence sample data respectively to obtain discretized first time sequence sample data and discretized second time sequence sample data; S202, calculating the mutual information between each discretized first time sequence sample data and the discretized second time sequence sample data; S203, according to the mutual information, selecting a second preset number of first time sequence sample data with the largest mutual information, taking the service indexes corresponding to the selected first time sequence sample data as the leading factors, and determining the weight of each leading factor according to the size of its mutual information. From the perspective of time series prediction, the greater the mutual information between two sequences, the more accurately one sequence predicts the other; accordingly, the greater the mutual information, the greater the weight given.
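As a non-limiting illustration, steps S201 to S203 might be sketched in Python as follows. The equal-width binning, the function names, the parameters `n_bins` and `top_k`, and the weighting in proportion to mutual information are assumptions of this sketch; the description only requires that the weight follow the size of the mutual information.

```python
import math
from collections import Counter


def discretize(series, n_bins=3):
    """S201 (assumed scheme): equal-width binning of a numeric series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in series]


def mutual_information(xs, ys):
    """S202: mutual information, in bits, of two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_leading_factors(indexes, target, top_k=2, n_bins=3):
    """S203: keep the top_k service indexes and weight them by mutual information."""
    t = discretize(target, n_bins)
    scores = {name: mutual_information(discretize(s, n_bins), t)
              for name, s in indexes.items()}
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    total = sum(scores[name] for name in chosen)
    # Assumed weighting: proportional to mutual information (equal weights if all are zero).
    return {name: (scores[name] / total if total > 0 else 1.0 / len(chosen))
            for name in chosen}
```

Here `indexes` maps a service index name to its first time sequence sample data and `target` is the second time sequence sample data; both are toy inputs chosen by the caller.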

In a second optional embodiment, the correlation may be calculated based on the Pearson correlation coefficient. Specifically, step S102 may be implemented by: S301, calculating the Pearson correlation coefficient between each first time sequence sample data and the second time sequence sample data; S302, according to the Pearson correlation coefficients, selecting the first time sequence sample data related to the second time sequence sample data, taking the service indexes corresponding to the selected first time sequence sample data as the leading factors, and determining the weight of each leading factor according to its Pearson correlation coefficient. From the perspective of time series prediction, the larger the Pearson correlation coefficient between two sequences, the more accurately one sequence predicts the other.
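As a non-limiting illustration, steps S301 and S302 might be sketched in Python as follows. The function names, the default `threshold` of 0.8, the use of the absolute value of the coefficient, and the normalization of the weights are assumptions of this sketch and are not fixed by the description.

```python
import math


def pearson(xs, ys):
    """S301: Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)


def screen_by_pearson(indexes, target, threshold=0.8):
    """S302: keep indexes with |r| >= threshold; weights proportional to |r| (assumed)."""
    scores = {name: pearson(series, target) for name, series in indexes.items()}
    kept = {name: r for name, r in scores.items() if abs(r) >= threshold}
    total = sum(abs(r) for r in kept.values())
    return {name: abs(r) / total for name, r in kept.items()}
```

Taking the absolute value treats strongly negative correlation as informative as well; an implementation that only accepts positive correlation would drop the `abs` calls.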

In a third optional implementation manner, a lead factor of an index to be predicted may be obtained based on a lead factor screening model, and in this step, the first time sequence sample data and the second time sequence sample data are input into the lead factor screening model, and the service index related to the index to be predicted is output as a lead factor of the index to be predicted.

Specifically, the lead factor screening model is a decision tree model, and step S102 may be further implemented by:

S401, combining the first time sequence sample data of each service index to form a data set;

S402, randomly selecting the division characteristics of the data set, and calculating the information gain ratio of each division characteristic to the data set;

The division characteristics may be chosen according to the index to be predicted; for example, a division characteristic may be whether the Pearson coefficient with the index to be predicted is greater than a first preset value, whether the variation trend is consistent with that of the index to be predicted within a first specified time period, whether the variation trend is opposite to that of the index to be predicted within a second specified time period, and the like. The division characteristics may also be chosen according to the characteristics of each service index in the data set; for example, a division characteristic may be whether the service index is influenced by macroeconomic growth, whether it is seasonal, and the like.

S403, selecting a preset number of division characteristics with the largest information gain ratios, constructing a decision tree according to the selected division characteristics, taking the division characteristic with the largest information gain ratio as the root node of the decision tree, and dividing the data set based on the division characteristic of the root node to obtain a first data subset;

S404, calculating the information gain ratio of each remaining division characteristic to the first data subset, assigning the division characteristic with the largest information gain ratio to an internal node of the decision tree, and dividing the first data subset based on the division characteristic of the current internal node to obtain a second data subset;

S405, according to the above steps, continuing to calculate the information gain ratios of the remaining division characteristics and to assign them to nodes until the leaf nodes are reached;

S406, taking a leaf node as the time sequence data related to the index to be predicted, and taking each service index corresponding to that time sequence data as a leading factor of the index to be predicted.

The information gain ratio is calculated according to an algorithm in the prior art. Let the data set be D and the division characteristic be A; the information gain g(D, A) of the division characteristic A to the data set D is obtained as follows:

first, the empirical entropy H(D) of the data set D is calculated,

$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$

where |C_k| represents the number of samples of class k, K is the number of classes, and |D| is the number of samples in D;

then, the empirical conditional entropy H(D|A) of the division characteristic A on the data set D is calculated,

$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$

where D_i is the subset of D in which A takes its i-th value;

then, the information gain is calculated,

$$g(D, A) = H(D) - H(D \mid A)$$

and finally, the information gain ratio is calculated,

$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}$$

where

$$H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$

and n is the number of values of the division characteristic A.
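As a non-limiting illustration, the information gain ratio (the information gain of a division characteristic divided by the entropy of the data set with respect to the values of that characteristic) might be computed as in the Python sketch below. The function names and the representation of the data set as two parallel lists, one of characteristic values and one of class labels, are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Empirical entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of one division characteristic on a labelled data set."""
    n = len(labels)
    # Partition the labels by the value the division characteristic takes.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_entropy = entropy(feature_values)
    # A characteristic with a single value cannot split the data set.
    return gain / split_entropy if split_entropy > 0 else 0.0


# Toy data: the characteristic separates the two classes perfectly.
print(gain_ratio(["yes", "yes", "no", "no"], [1, 1, 0, 0]))
```

Ranking the candidate division characteristics by `gain_ratio` and taking the largest as the current node corresponds to the node selection in steps S403 to S405.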

S103, obtaining third time sequence sample data of the leading factor, and determining the service index related to the leading factor according to the first time sequence sample data and the third time sequence sample data to serve as a secondary leading factor of the leading factor.

The third time sequence sample data is a time sequence generated from the data of the leading factor at each historical time. Similarly, the more similar the distributions of the first time sequence sample and the third time sequence sample, the more strongly the service index is related to the leading factor; the more the distributions differ, the weaker the relation.

And calculating the correlation between the service index and the lead factor based on the first time sequence sample and the third time sequence sample, and selecting one or more service indexes with the correlation larger than or equal to a preset threshold value as a secondary lead factor. The screened secondary lead factors can be used for predicting the lead factors related to the secondary lead factors.

The screening mode of the second-level lead factor of the lead factor is similar to the screening mode of the lead factor of the index to be predicted, which is specifically described in step S102. The secondary lead factors of the lead factors may also be screened by using the above steps S201 to S203, or by using the steps S301 to S302, or by using the steps S401 to S406, and the processes are similar and will not be described in detail herein.

S104, predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

For each lead factor, constructing a corresponding prediction model of the lead factor according to the secondary lead factors; constructing a training sample according to data of the secondary lead factor in a first historical time period and data of the lead factor in a second historical time period, wherein the first historical time period is earlier than the second historical time period; training the prediction model by using the training sample until the model converges; and inputting the historical data of the secondary lead factors into the prediction model, and obtaining the predicted value of the lead factors output by the prediction model.
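As a non-limiting illustration, the construction of training samples from an earlier period of the secondary leading factor and a later period of the leading factor might be sketched in Python as follows. The single secondary factor, the fixed `lag`, the ordinary least squares line and all function names are assumptions of this sketch; the description does not fix the form of the prediction model.

```python
def make_lagged_samples(secondary, leading, lag=1):
    """Pair the secondary factor at time t - lag with the leading factor at time t."""
    return [(secondary[t - lag], leading[t]) for t in range(lag, len(leading))]


def fit_line(samples):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_next(secondary, leading, lag=1):
    """Predict the leading factor for the period right after the last observed one."""
    slope, intercept = fit_line(make_lagged_samples(secondary, leading, lag))
    # The secondary value `lag` periods before the unseen period drives the forecast.
    return slope * secondary[len(leading) - lag] + intercept


# Toy series (not real data): leading[t] = 2 * secondary[t - 1] + 1 for t >= 1.
print(predict_next([1, 2, 3, 4], [0, 3, 5, 7]))
```

Because the secondary factor is observed before the leading factor it is paired with, the fitted model can be applied to the most recent secondary data to forecast a period of the leading factor that has not yet been observed.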

The above steps are illustrated below with a specific application example in which the index to be predicted is NBEV (new business embedded value). In step S102, the leading factors of NBEV are screened out as monthly average active manpower and NBEV per active agent. In step S103, the secondary leading factors of monthly average active manpower are screened out as monthly average manpower and activity rate, and the secondary leading factors of NBEV per active agent are screened out as policies per active agent and NBEV per policy. In step S104, monthly average active manpower is modeled using monthly average manpower and activity rate to obtain its prediction model, and NBEV per active agent is modeled using policies per active agent and NBEV per policy to obtain its prediction model; the historical data of monthly average manpower and of activity rate is input into the corresponding prediction model to obtain the predicted value of monthly average active manpower; the historical data of policies per active agent and of NBEV per policy is input into the corresponding prediction model to obtain the predicted value of NBEV per active agent; and the predicted value of monthly average active manpower is multiplied by the predicted value of NBEV per active agent to obtain the first target predicted value of NBEV.

For example, when predicting the leading factor monthly average active manpower, the following prediction model is constructed:

$$\text{monthly average active manpower}_{\text{predicted}} = \left(a_1 \times \text{monthly average manpower}_{\text{history } 1} + a_2 \times \text{monthly average manpower}_{\text{history } 2} + \dots + a_n \times \text{monthly average manpower}_{\text{history } n}\right) \times \left(b_1 \times \text{activity rate}_{\text{history } 1} + b_2 \times \text{activity rate}_{\text{history } 2} + \dots + b_n \times \text{activity rate}_{\text{history } n}\right)$$

Monthly average manpower and activity rate are strongly seasonal service indexes of the microcosmic company dimension, and a training set is constructed from historical data to train the parameters a_i and b_j of the prediction model.
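As a non-limiting illustration, evaluating this product-form model for given coefficients might look like the Python sketch below. The function name, the argument names and the numeric values are assumptions of this sketch; in practice the coefficients a_i and b_j would come from training on historical data, which is not shown here.

```python
def predict_active_manpower(a, b, manpower_history, rate_history):
    """Weighted sum of manpower history multiplied by weighted sum of activity-rate history."""
    weighted_manpower = sum(ai * m for ai, m in zip(a, manpower_history))
    weighted_rate = sum(bj * r for bj, r in zip(b, rate_history))
    return weighted_manpower * weighted_rate


# Hypothetical coefficients and toy history values, chosen only to show the arithmetic.
a = [0.5, 0.5]
b = [0.5, 0.5]
print(predict_active_manpower(a, b, [100, 120], [0.4, 0.6]))
```

The model is linear in each coefficient vector when the other is held fixed, so one possible (assumed) training approach is to alternate least squares fits of a and b.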

Further, to analyze the main driving factors of the index to be predicted in more depth, third-level leading factors related to each secondary leading factor, and fourth-level leading factors related to each third-level leading factor, may also be screened from the service indexes. Specifically, in step S103, according to a preset number of levels, fourth time sequence sample data of the leading factor of the current level is constructed, and the service indexes related to the leading factor of the current level are determined according to the first time sequence sample data and the fourth time sequence sample data and used as the next-level leading factors of that leading factor, until the last level is reached. For example, after screening, the next-level leading factors of the secondary leading factor monthly average manpower comprise month-end manpower, recruitment rate, attrition rate and assessment attrition rate; when the preset number of levels is three, the screening ends at this level.

In step S104, the predicted value of each leading factor of the second-to-last level at the time to be predicted is predicted based on the historical data of its last-level leading factors; the predicted value of each leading factor of the level above is then obtained from the predicted values of its next-level leading factors, and so on upward level by level, until the first target predicted value of the index to be predicted at the time to be predicted is obtained.

For example, when predicting the second-level leading factor "monthly average manpower", a prediction model of the monthly average manpower is constructed from its next-level leading factors, which may specifically be:

$$\text{monthly average manpower}_{\text{predicted}} = \text{month-end manpower}_{\text{history }1\text{-}N} \times \left(1 + C_1 \times \text{recruitment rate}_{\text{history }1\text{-}N} - C_2 \times \text{attrition rate}_{\text{history }1\text{-}N} - C_3 \times \text{assessment attrition rate}_{\text{history }1\text{-}N}\right) + C_4$$

A training set is constructed from historical data to train the model. The predicted value of the monthly average manpower at the time to be predicted is calculated from the month-end manpower, recruitment rate, attrition rate and assessment attrition rate of the selected historical period. The predicted value of the monthly average active manpower at the time to be predicted is then calculated from the predicted value of the monthly average manpower and the predicted value of the activity rate. Finally, the first target predicted value of the index to be predicted at the time to be predicted is calculated from the predicted value of the monthly average active manpower and the predicted value of the average NBEV per active agent.
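The level-by-level calculation described above can be sketched as a short chain of functions. Several points are assumptions made for illustration and are not stated in the text: the attrition terms are taken to enter with a minus sign, the monthly average active manpower is taken to be the monthly average manpower multiplied by the activity rate, and the index (NBEV) is taken to be the active manpower multiplied by the average NBEV per active agent. All names and numbers are hypothetical.

```python
def predict_monthly_avg_manpower(
    month_end_manpower: float,
    recruitment_rate: float,
    attrition_rate: float,
    assessment_attrition_rate: float,
    c1: float,
    c2: float,
    c3: float,
    c4: float,
) -> float:
    """Third-level factors -> second-level factor (monthly average manpower)."""
    growth = 1 + c1 * recruitment_rate - c2 * attrition_rate - c3 * assessment_attrition_rate
    return month_end_manpower * growth + c4


def predict_index(
    monthly_avg_manpower_pred: float,
    activity_rate_pred: float,
    nbev_per_active_pred: float,
) -> float:
    """Second-level factors -> leading factor -> index to be predicted.

    Both products below are assumed combination rules, used only for illustration.
    """
    active_manpower_pred = monthly_avg_manpower_pred * activity_rate_pred
    return active_manpower_pred * nbev_per_active_pred


# Illustrative values; C1..C4 would normally come from model training.
avg_manpower = predict_monthly_avg_manpower(1000.0, 0.10, 0.05, 0.02, 1.0, 1.0, 1.0, 0.0)
first_target = predict_index(avg_manpower, activity_rate_pred=0.5, nbev_per_active_pred=2.0)
```

Keeping one function per level makes it easy to replace any single level with a trained model without touching the others.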

Further, to avoid overfitting, the lead factors screened in step S102 undergo a secondary screening. After step S102 and before step S103, the method further includes the following steps:

S501, acquiring a deletion instruction for a lead factor input by a user, and deleting the corresponding lead factor according to the deletion instruction;

The user can judge the screened leading factors based on industry experience and exclude those for which no business logic supports a correlation with the index to be predicted. For example, the service index "orders on hand" of the macroscopic economic dimension may be screened out as a leading factor, but existing industry experience offers no evidence that orders on hand have a predictive effect on the NBEV growth rate, so this leading factor is excluded to avoid affecting the prediction result.

S502, performing stepwise regression fitting on the lead factors by using a stepwise regression model, selecting an optimal model according to a regression fitting result, and removing the lead factors which are not in the optimal model;

Specifically, the forward selection method is an independent variable selection method for regression models. Candidate independent variables are introduced into the regression equation one by one, which is why it is called the forward method. The specific operation steps are as follows: first, a model is fitted with the independent variable that has the largest correlation coefficient with the dependent variable y, and a significance test is performed on its regression coefficient to decide whether to introduce that variable into the model; then, among the independent variables not yet introduced, the one with the largest partial correlation coefficient with y is introduced and the significance of its regression coefficient is tested, and so on. The procedure stops when, after the influence of the selected variables on y has been excluded, the regression coefficients of the unselected independent variables on y no longer differ significantly from 0.

Fig. 2 is a flow chart illustrating a data processing method according to a second embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 2 if the results are substantially the same. As shown in fig. 2, the data processing method includes the steps of:

S601, obtaining a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data.

S602, obtaining first time sequence sample data of the service index and second time sequence sample data of the index to be predicted, and determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data to serve as a leading factor of the index to be predicted.

S603, obtaining third time sequence sample data of the leading factor, and determining the service index related to the leading factor according to the first time sequence sample data and the third time sequence sample data to be used as a secondary leading factor of the leading factor.

S604, predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

Steps S601 to S604 correspond to steps S101 to S104 of the first embodiment, respectively; see the description of the first embodiment for details.

S605, acquiring key time sequence data related to the time to be predicted from second time sequence sample data of the index to be predicted by using a time sequence AR model.

This embodiment further considers the autocorrelation of the index to be predicted. Changes in the index may be periodic; for example, the agent recruitment rate and attrition rate show clear patterns of change at the beginning and at the end of each quarter, respectively. Autocorrelation analysis is performed on the index to be predicted using a time series AR model to mine its pattern of change, so as to find the historical times whose trend is similar to that of the time to be predicted. The key time sequence data at those historical times is strongly correlated with the time to be predicted and can be used for prediction. Specifically, autocorrelation analysis of the NBEV shows that its current-period growth rate is correlated with the growth rate one quarter earlier and with the growth rate two quarters earlier, so the key time sequence data are the NBEV of the quarter preceding the time to be predicted and the NBEV of two quarters before the time to be predicted.
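One simple way to locate such key lags is to compute sample autocorrelations and keep the lags that stand out. The sketch below does this with the standard library only. It is a stand-in for the AR-model analysis named in the text, not the method itself, and the selection rule (an approximate 95% band of 1.96/sqrt(n)) as well as the function names and data are assumptions.

```python
import math
from typing import List, Optional, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return covariance / variance


def key_lags(series: Sequence[float], max_lag: int, bound: Optional[float] = None) -> List[int]:
    """Lags whose autocorrelation exceeds the bound in absolute value.

    By default the bound is the approximate 95% band 1.96 / sqrt(n) (an assumed rule).
    """
    limit = 1.96 / math.sqrt(len(series)) if bound is None else bound
    return [lag for lag in range(1, max_lag + 1) if abs(autocorrelation(series, lag)) > limit]


# Hypothetical series with a period of four observations.
series = [1.0, 2.0, 3.0, 4.0] * 5
lags = key_lags(series, max_lag=4)
```

The observations at the selected lags before the time to be predicted would then serve as the key time sequence data.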

The time series AR model is an autoregressive model that describes a recursive linear regression relation within a data series. Let $\{X_t,\ t = 0, \pm 1, \pm 2, \dots\}$ be a time series and $\{\varepsilon_t,\ t = 0, \pm 1, \pm 2, \dots\}$ be a white noise sequence with $E(X_s \varepsilon_t) = 0$ for any $s < t$. A time series satisfying the equation

$$X_t = \alpha_0 + \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + \varepsilon_t$$

is a p-order autoregressive sequence. Let $\rho_i$ be the autocorrelation coefficient of order $i$, $i = 1, 2, \dots, p$. The following system of equations is obtained:

$$\rho_1 = \alpha_1 + \alpha_2 \rho_1 + \alpha_3 \rho_2 + \dots + \alpha_p \rho_{p-1}$$

$$\rho_2 = \alpha_1 \rho_1 + \alpha_2 + \alpha_3 \rho_1 + \dots + \alpha_p \rho_{p-2}$$

$$\dots$$

$$\rho_p = \alpha_1 \rho_{p-1} + \alpha_2 \rho_{p-2} + \alpha_3 \rho_{p-3} + \dots + \alpha_p$$

Solving this system of equations gives the partial autocorrelation coefficients $\alpha_i$, after which the autocorrelation coefficients of all orders can be calculated using the recursion relation.
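For illustration, the system can be solved with the Levinson-Durbin recursion, assuming it takes the standard Yule-Walker form in which ρ_k equals the sum over j of α_j ρ_|k-j| with ρ_0 = 1. The sketch below returns the order-p coefficients together with the partial autocorrelation at each order; the function name and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def yule_walker(rho: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    rho[k - 1] holds the autocorrelation of order k, for k = 1..p.
    Returns (alphas, pacf): the AR(p) coefficients and the partial
    autocorrelation at each order 1..p.
    """
    phi: List[float] = []   # coefficients of the model of the previous order
    pacf: List[float] = []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = [phi_kk]
        else:
            numerator = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
            denominator = 1.0 - sum(phi[j] * rho[j] for j in range(k - 1))
            phi_kk = numerator / denominator
            phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return phi, pacf


# Autocorrelations implied by an AR(2) process with alpha_1 = 0.5 and alpha_2 = 0.3.
rho_1 = 0.5 / (1 - 0.3)
rho_2 = 0.5 * rho_1 + 0.3
alphas, pacf = yule_walker([rho_1, rho_2])
```

The last entry of `pacf` coincides with the last coefficient of the fitted model, which is the usual link between the AR coefficients and the partial autocorrelation.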

S606, obtaining a second target predicted value of the index to be predicted at the time to be predicted according to the key time sequence data.

The key time sequence data is $x_1, x_2, \dots, x_T$; the time series data at the time to be predicted is $X_{T+1}, X_{T+2}, \dots, X_{T+l}$; and the second predicted value obtained by prediction is the conditional mean $EX_{T+1}, EX_{T+2}, \dots, EX_{T+l}$. From $X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + \varepsilon_t$ it follows that $EX_t = \alpha_1 EX_{t-1} + \alpha_2 EX_{t-2} + \dots + \alpha_p EX_{t-p}$, where $EX_t = x_t$ for $t \le T$. Thus,

$$EX_{T+k} = \alpha_1 EX_{T+k-1} + \alpha_2 EX_{T+k-2} + \dots + \alpha_p EX_{T+k-p}, \quad k = 1, 2, \dots, l.$$
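The forecasting recursion can be sketched as below: observed values are used while t ≤ T, and each forecast is fed back in for later steps. The function name `ar_forecast` and the numbers are hypothetical, and the coefficients are assumed to be already estimated.

```python
from typing import List, Sequence


def ar_forecast(history: Sequence[float], alphas: Sequence[float], steps: int) -> List[float]:
    """Conditional-mean forecasts of an AR(p) model for the next `steps` periods.

    history holds x_1..x_T (most recent last); alphas holds alpha_1..alpha_p.
    """
    p = len(alphas)
    if len(history) < p:
        raise ValueError("need at least p observations")
    values = list(history)
    for _ in range(steps):
        # alpha_i multiplies the value i periods back, observed or already forecast.
        values.append(sum(a * values[-i] for i, a in enumerate(alphas, start=1)))
    return values[len(history):]


forecasts = ar_forecast(history=[1.0, 2.0], alphas=[0.5, 0.3], steps=2)
```

With the sample inputs the first forecast uses only observed values, while the second already depends on the first forecast.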

S607, calculating a final target predicted value of the index to be predicted according to the first target predicted value and the second target predicted value;

A first weight may be set empirically for the first target predicted value and a second weight for the second target predicted value. A first product of the first target predicted value and the first weight and a second product of the second target predicted value and the second weight are calculated, and the first product and the second product are summed to obtain the final target predicted value.
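The weighted fusion amounts to a two-term weighted sum, sketched below. The requirement that the weights be non-negative and sum to 1 is an assumption added for illustration, as are the function name and the sample values.

```python
def fuse_predictions(first: float, second: float, w_first: float, w_second: float) -> float:
    """Weighted sum of two target predicted values."""
    if w_first < 0 or w_second < 0 or abs(w_first + w_second - 1.0) > 1e-9:
        # Assumed convention: weights are non-negative and sum to 1.
        raise ValueError("weights must be non-negative and sum to 1")
    return first * w_first + second * w_second


final_value = fuse_predictions(first=0.08, second=0.06, w_first=0.6, w_second=0.4)
```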

Accordingly, for the lead factor, autocorrelation may also be considered when predicting it, so step S604 specifically includes the following steps:

S6041, predicting a first predicted value of the leading factor at the time to be predicted based on historical data of the secondary leading factor;

S6042, acquiring key time sequence data related to the time to be predicted from the first time sequence sample data of the lead factor by using a time sequence AR model, and acquiring a second predicted value of the leading factor at the time to be predicted according to the key time sequence data;

S6043, obtaining the predicted value of the lead factor at the time to be predicted according to the first predicted value and the second predicted value;

S6044, obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the lead factor.

Fig. 3 is a flow chart illustrating a data processing method according to a third embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 3 if the results are substantially the same. As shown in fig. 3, the data processing method includes the steps of:

S701, acquiring a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data.

S702, obtaining first time sequence sample data of the service index and second time sequence sample data of the index to be predicted, and determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data to serve as a leading factor of the index to be predicted.

S703, obtaining third time sequence sample data of the leading factor, and determining the service index related to the leading factor according to the first time sequence sample data and the third time sequence sample data, wherein the service index is used as a secondary leading factor of the leading factor.

S704, predicting the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

Steps S701 to S704 correspond to steps S101 to S104 of the first embodiment, respectively; see the description of the first embodiment for details.

S705, predicting a third target prediction value of the index to be predicted at the time to be predicted based on the historical data of the leading factor.

The method comprises the steps of constructing a prediction model of an index to be predicted according to a leading factor related to the index to be predicted, constructing a training set by using historical data to train the prediction model, and directly predicting the index to be predicted by using the trained model.

In this step, the index to be predicted is modeled and predicted directly to obtain the third target predicted value. In step S104, by contrast, the leading factors related to the index to be predicted are modeled and predicted first, and the first target predicted value is then obtained by a logical operation on the predicted values of the leading factors. The two predictions proceed in opposite directions.

S706, calculating a final target predicted value of the index to be predicted according to the first target predicted value and the third target predicted value.

The third weight may be empirically set for the first target predicted value, the fourth weight may be set for the third target predicted value, a third product of the first target predicted value and the third weight and a fourth product of the third target predicted value and the fourth weight are calculated, and the third product and the fourth product are summed to obtain the final target predicted value.

In this way, the predictions from the two directions are fused, which improves the prediction accuracy. At the same time, the leading factors at each level of the index to be predicted are obtained, so that the service data is mined and the index to be predicted can be better explained in business terms.

In an optional embodiment, the method may further include the following steps:

S707, uploading the first time sequence sample data of the service index and the second time sequence sample data of the index to be predicted to a blockchain, so that the blockchain encrypts and stores the first time sequence sample data of the service index and the second time sequence sample data of the index to be predicted.

Specifically, corresponding summary information is obtained by hashing the first time sequence sample data of the service index and the second time sequence sample data of the index to be predicted, for example by using the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security as well as fairness and transparency for the user. The user equipment may download the summary information from the blockchain to verify whether the first time sequence sample data of the service index and the second time sequence sample data of the index to be predicted have been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each of which contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
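The hashing and verification part of this step can be sketched with Python's standard `hashlib` module. The serialization format (canonical JSON), the field names and the function names are assumptions made for the example, and the upload to and download from the blockchain are deliberately left out because the text does not specify an interface for them.

```python
import hashlib
import json
from typing import Sequence


def summarize(first_series: Sequence[float], second_series: Sequence[float]) -> str:
    """SHA-256 digest of both time series, serialized as canonical JSON (assumed format)."""
    payload = json.dumps(
        {"service_index": list(first_series), "index_to_predict": list(second_series)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_untampered(
    first_series: Sequence[float], second_series: Sequence[float], stored_digest: str
) -> bool:
    """Recompute the digest and compare it with the one retrieved from storage."""
    return summarize(first_series, second_series) == stored_digest


digest = summarize([1.0, 2.0, 3.0], [0.10, 0.12, 0.11])
intact = is_untampered([1.0, 2.0, 3.0], [0.10, 0.12, 0.11], digest)
tampered = is_untampered([1.0, 2.0, 3.5], [0.10, 0.12, 0.11], digest)
```

A fixed serialization matters here: the same data must always produce the same byte string, otherwise verification would fail even for untouched data.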

Fig. 4 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention. As shown in fig. 4, the data processing apparatus 40 includes: the system comprises a data acquisition module 41, a lead factor screening module 42, a secondary lead factor screening module 43 and a first prediction module 44, wherein the data acquisition module 41 is used for acquiring a plurality of service indexes from macroscopic economic data, mesoscopic industrial economic data and microcosmic company operation data; a lead factor screening module 42, configured to obtain first time sequence sample data of the service index and second time sequence sample data of an index to be predicted, and determine, according to the first time sequence sample data and the second time sequence sample data, the service index related to the index to be predicted as a lead factor of the index to be predicted; a secondary lead factor screening module 43, configured to obtain third time sequence sample data of the lead factor, and determine, according to the first time sequence sample data and the third time sequence sample data, the service index related to the lead factor as a secondary lead factor of the lead factor; the first prediction module 44 is configured to predict the predicted value of the leading factor at the time to be predicted based on the historical data of the secondary leading factor, and obtain a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor.

Further, the data processing apparatus 40 further includes a second prediction module, configured to obtain, by using a time series AR model, key time series data related to the time to be predicted from second time series sample data of the indicator to be predicted; acquiring a second target predicted value of the index to be predicted at the time to be predicted according to the key time sequence data; and calculating a final target predicted value of the index to be predicted according to the first target predicted value and the second target predicted value.

Further, the data processing apparatus 40 further includes a third prediction module, configured to predict, based on the historical data of the lead factor, a third target prediction value of the index to be predicted at the time to be predicted; and calculating a final target predicted value of the index to be predicted according to the first target predicted value and the third target predicted value.

Further, the data processing apparatus 40 further includes a secondary screening module for leading factors, which is configured to obtain a deleting instruction of leading factors input by a user, and delete corresponding leading factors according to the deleting instruction; and performing stepwise regression fitting on the leading factors by using a stepwise regression model, selecting an optimal model according to a regression fitting result, and removing the leading factors which are not in the optimal model.

Further, the leading factor screening module 42 is configured to input the first time sequence sample data and the second time sequence sample data into a leading factor screening model, and output the service index related to the index to be predicted as a leading factor of the index to be predicted. Furthermore, the lead factor screening module 42 is configured to: combine the first time sequence sample data of each service index to form a data set; randomly select division features of the data set, and calculate the information gain ratio of each division feature with respect to the data set; select a preset number of division features with the largest information gain ratios, construct a decision tree from the selected division features, take the division feature with the largest information gain ratio as the root node of the decision tree, and divide the data set based on the division feature of the root node to obtain a first data subset; calculate the information gain ratio of each remaining division feature with respect to the first data subset, assign the division feature with the largest information gain ratio to an internal node of the decision tree, and divide the first data subset based on the division feature of the current internal node to obtain a second data subset; continue, in the same manner, to calculate the information gain ratios of the remaining division features and assign them until the leaf nodes are reached; and take the leaf nodes as the time sequence data related to the index to be predicted, and take each service index corresponding to that time sequence data as a leading factor of the index to be predicted.
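The information gain ratio used to rank division features can be sketched as follows, in the form commonly used by C4.5-style decision trees (information gain divided by the split information of the feature). Treating each feature and the index to be predicted as already discretized labels is an assumption of this example, and the names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Information gain of splitting target by feature, divided by the split information."""
    n = len(target)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, target):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized data: the target moves with feature_a but not with feature_b.
target = ["up", "up", "down", "down"]
feature_a = ["high", "high", "low", "low"]
feature_b = ["high", "low", "high", "low"]
ranking = sorted(
    {"feature_a": gain_ratio(feature_a, target), "feature_b": gain_ratio(feature_b, target)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

The feature at the head of `ranking` would be placed at the root node, and the calculation would be repeated on each resulting data subset.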

Further, the lead factor screening module 42 is configured to discretize the first time sequence sample data and the second time sequence sample data respectively to obtain discretized first time sequence sample data and second time sequence sample data; calculating mutual information between each discretized first time sequence sample data and each discretized second time sequence sample data; and selecting a second preset number of first time sequence sample data with maximum mutual information according to the mutual information, taking a service index corresponding to the selected first time sequence sample data as the leading factor, and determining the weight of the leading factor according to the size of the mutual information.
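A minimal sketch of the mutual-information screening is shown below. Equal-width binning, the number of bins, and normalizing the weights so that they sum to 1 are assumptions made for illustration; the text only states that the weight is determined by the size of the mutual information. All names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def discretize(series: Sequence[float], bins: int = 2) -> List[int]:
    """Equal-width binning of a numeric series into integer bin indexes."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in series]


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information, in bits, between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_leading_factors(
    factors: Dict[str, Sequence[float]], target: Sequence[float], top_k: int, bins: int = 2
) -> List[Tuple[str, float]]:
    """Keep the top_k factors by mutual information and weight them by their share of it."""
    target_bins = discretize(target, bins)
    scores = {name: mutual_information(discretize(s, bins), target_bins) for name, s in factors.items()}
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(score for _, score in top)
    if total == 0:
        return [(name, 1.0 / len(top)) for name, _ in top]
    return [(name, score / total) for name, score in top]


# Hypothetical data: factor_a tracks the target, factor_b alternates independently of it.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
factors = {
    "factor_a": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
    "factor_b": [5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0],
}
leading = select_leading_factors(factors, target, top_k=1)
```

Mutual information also captures non-linear dependence, which a plain correlation coefficient would miss.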

Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. As shown in fig. 5, the electronic device 50 includes a processor 51 and a memory 52 coupled to the processor 51.

The memory 52 stores program instructions for implementing the data processing method of any of the above embodiments.

The processor 51 is configured to execute the program instructions stored in the memory 52 to perform data processing.

The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a storage medium according to a sixth embodiment of the invention. The storage medium of the embodiment of the present invention stores program instructions 61 capable of implementing all the methods described above, and may be either non-volatile or volatile. The program instructions 61 may be stored in the storage medium in the form of a software product, and include several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices such as a computer, a server, a mobile phone, or a tablet.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

While the foregoing is directed to embodiments of the present invention, it will be understood by those skilled in the art that various changes may be made without departing from the spirit and scope of the invention.

Claims

1. A data processing method, comprising:

2. The data processing method according to claim 1, wherein the predicting a predicted value of the leading factor at a time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor, further comprises:

3. The data processing method according to claim 1, wherein the predicting a predicted value of the leading factor at a time to be predicted based on the historical data of the secondary leading factor, and obtaining a first target predicted value of the index to be predicted at the time to be predicted according to the predicted value of the leading factor, further comprises:

4. The data processing method according to claim 1, wherein the obtaining first time sequence sample data of the service index and second time sequence sample data of an index to be predicted, determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data, and after the service index is used as a leading factor of the index to be predicted, further comprising:

5. The data processing method according to claim 1, wherein the determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data as a leading factor of the index to be predicted comprises:

6. The data processing method of claim 5, wherein the lead factor screening model is a decision tree model; the inputting the first time sequence sample data and the second time sequence sample data into a leading factor screening model, and outputting the service index related to the index to be predicted as a leading factor of the index to be predicted, including:

7. The data processing method according to claim 1, wherein the determining the service index related to the index to be predicted according to the first time sequence sample data and the second time sequence sample data as a leading factor of the index to be predicted comprises:

8. A data processing apparatus, comprising:

9. An electronic device comprising a processor, and a memory coupled to the processor, the memory storing program instructions executable by the processor; the processor, when executing the program instructions stored in the memory, implements the data processing method of any of claims 1-7.

10. A storage medium having stored therein program instructions which, when executed by a processor, implement the data processing method of any one of claims 1 to 7.