CN117670519A - Data prediction method, device, equipment and medium - Google Patents
- Publication number
- CN117670519A (application CN202311667465.8A)
- Authority
- CN
- China
- Prior art keywords
- time stamp
- attribute information
- object attribute
- factor
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
Abstract
The invention discloses a data prediction method, apparatus, device and medium. The method comprises: acquiring target object attribute information of a target object at a next time stamp; identifying and extracting a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next time stamp based on initial object attribute information at a previous time stamp and a current time stamp; and inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model to obtain a target data fluctuation stream of the target object at the next time stamp. The invention eliminates the bias that confounding factors introduce into the target data fluctuation stream at the next time stamp; it also solves the prior-art problem of low accuracy caused by considering data at only one time point, thereby improving the accuracy of predicting the target data fluctuation stream.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data prediction method, apparatus, device, and medium.
Background
Existing methods for analyzing an enterprise's repayment capacity are mainly based on the income statement and the balance sheet, and yield an evaluation of the enterprise's profitability over a given period on an accrual basis. However, they cannot comprehensively and accurately reflect the enterprise's operating results and financial condition. Cash flow is more objective and less affected by subjective factors; by analyzing cash-flow data, the enterprise's repayment capacity can be further verified and corrected, and a more accurate, multi-perspective analysis can be achieved. At present, some methods in the industry predict enterprise cash flow by training a machine learning model on financial statement data. These methods acquire enterprise financial data, clean it, perform feature engineering, train a machine learning model such as logistic regression, ridge regression or an SVM, tune the model parameters, and use the trained model to predict the enterprise's cash flow.
When existing methods train a model and use it to make predictions, they often select only cross-sectional data from a single point in time. However, enterprise cash flow changes continuously over time, and the trend of the financial condition over a period helps to predict default; data from a single time point is not sufficient for an accurate judgment. In addition, confounding factors outside the data analyzed by the model cause the prediction result to deviate.
Disclosure of Invention
The invention provides a data prediction method, apparatus, device and medium, which are used to solve the prior-art technical problems of low accuracy caused by considering data at only one time point and of biased prediction results caused by confounding factors.
According to an aspect of the present invention, there is provided a data prediction method including:
acquiring target object attribute information of a target object at a next time stamp;
identifying and extracting a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp;
and inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model to obtain a target data fluctuation stream of the target object at the next time stamp.
According to another aspect of the present invention, there is provided a data prediction apparatus including:
the first acquisition module is used for acquiring target object attribute information of a target object at the next time stamp;
the identification and extraction module is used for identifying and extracting a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp;
the first prediction module is used for inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model to obtain a target data fluctuation stream of the target object at the next time stamp.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data prediction method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data prediction method according to any one of the embodiments of the present invention.
According to the technical solution of the embodiments of the present invention, the initial object attribute information of the target object at the previous time stamp and the current time stamp is analyzed; the multi-factor confusion variable and the single-factor confusion variable of the target object attribute information at the next time stamp are identified and extracted from it; and these variables, together with the target object attribute information at the next time stamp, are used as inputs to the GRU network model, thereby eliminating the bias that confounding factors introduce into the target data fluctuation stream at the next time stamp. At the same time, because data of the target object at multiple time stamps is analyzed, the prior-art problem of low accuracy caused by considering data at only one time point is solved, and the accuracy of predicting the target data fluctuation stream is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a causal graph provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a data prediction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another data prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an implementation of extracting multi-factor confusion variables according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an implementation of extracting single-factor confusion variables according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an implementation of data prediction for a next timestamp provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a data prediction apparatus according to an embodiment of the present invention;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The acquisition, storage, use and processing of data in the technical solution of the present invention all comply with the relevant provisions of national laws and regulations.
In order to facilitate understanding of the scheme, related terms involved in the embodiments of the present invention are explained as follows.
GRU: a GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN). Like LSTM (Long Short-Term Memory), it was proposed to address long-term dependencies and the gradient problems that arise in backpropagation. Compared with LSTM, the GRU has one fewer gate and fewer parameters, so it is easier to train and can considerably improve training efficiency while achieving results comparable to LSTM.
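To make the gating concrete, the following is a minimal sketch of a single GRU time step in plain Python. It is an illustration only, not the model of this invention: the helper names (`gru_step`, `init_params`), the random untrained weights and the convention h' = (1 - z) * h + z * candidate are assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h, p):
    """One GRU time step with two gates: update gate z and reset gate r."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    # Interpolate between the old state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hid, seed=0):
    """Hypothetical random (untrained) weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in "zrh":
        p["W" + g] = mat(n_hid, n_in)
        p["U" + g] = mat(n_hid, n_hid)
        p["b" + g] = [0.0] * n_hid
    return p


params = init_params(n_in=3, n_hid=4)
h = [0.0] * 4
for x_t in ([0.2, 0.1, 0.5], [0.3, 0.0, 0.4]):  # two illustrative time stamps
    h = gru_step(x_t, h, params)
```

The final `h` summarizes both inputs. An LSTM cell would need a third gate and a separate cell state, which is where the GRU's parameter saving comes from.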
Confounding factors: confounding factors are all factors other than the factor under study, known or unknown, that may affect the outcome. FIG. 1 is a schematic diagram of a causal graph provided by an embodiment of the present invention. FIG. 1 shows a causal graph of A, B and C with a fork structure, in which A and C are correlated. However, their correlation arises only through the common cause B (the "economic, educational level"), so the correlation between A and C is spurious, and B is the confounding factor that produces it. The presence of B confuses the causal relationship between A and C, because the causal relation A <-> C and the spurious correlation caused by A <- B -> C are mixed together. The confounding factor therefore needs to be removed when the causal relationship between A and C is explored.
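The fork structure can be illustrated with a small simulation. The sketch below is not part of the patented method: the coefficients, the sample size and the use of linear regression residuals to adjust for B are all assumptions chosen for the example. B drives both A and C, and there is no direct link between A and C.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of a least-squares fit of ys on xs (removes the part explained by xs)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(42)
B = [rng.gauss(0, 1) for _ in range(5000)]          # confounder
A = [2 * b + rng.gauss(0, 1) for b in B]            # B -> A
C = [3 * b + rng.gauss(0, 1) for b in B]            # B -> C, with no A -> C edge

raw = corr(A, C)                                    # spurious correlation
adjusted = corr(residuals(A, B), residuals(C, B))   # correlation after removing B
```

Here `raw` is strongly positive even though A does not influence C, while `adjusted` is close to zero once the contribution of B has been removed from both variables.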
Single-factor analysis: the analysis of a single variable at a point in time, the purpose of which is to describe facts. Examples include the age composition and gender composition of teachers and students, the distribution of student societies, and the distribution of academic scores, from which the situation of some aspect of a school or class can be analyzed.
Multi-factor analysis: multi-factor analysis is defined relative to single-factor analysis. Single-factor analysis focuses only on the difference between groups, or on the magnitude of the effect of one factor on the outcome event, without considering the effects of other factors. In practice, however, the occurrence and development of an outcome event is often affected by a combination of factors, so using single-factor analysis alone is often unreasonable. Multi-factor analysis analyzes the effects of multiple factors on the outcome while taking into account the inherent links and interactions among the variables. The two complement each other: single-factor analysis can preliminarily explore the relationship between the predictor variables and the response variable and, when the sample size is not very large, can be used to delete some unrelated predictor variables; multi-factor analysis can then further exclude the effects of other confounding factors to determine the correlation between the predictor variables and the response variable.
At present, online credit products are steadily increasing. To support the real economy, credit may be granted or loan terms extended to enterprises when their operations are in difficulty. These measures serve the public and exploit the long-tail effect of small and medium-sized enterprise customers, but they also pose new challenges for risk control. Because micro, small and medium-sized enterprises are small in scale and large in number, using technologies such as big data and artificial intelligence to predict their cash flows automatically and in batches, and thereby to evaluate their repayment capacity and credit risk, is a necessary risk-control strategy for commercial banks conducting inclusive finance business.
In the present invention, a GRU model is used to analyze an enterprise's business data information (such as financial data) and user attribute information (such as legal representative data) over a past period, so that time-series features can be extracted effectively. In addition, because confounding factors outside the data analyzed by the model bias the prediction result, the method incorporates a causal analysis model: latent multi-factor and single-factor confusion variables are extracted from the data and used as inputs to the GRU model to eliminate the bias that confounding factors introduce into the prediction result, thereby improving the accuracy of enterprise cash flow prediction.
In an embodiment, FIG. 2 is a flowchart of a data prediction method according to an embodiment of the present invention. The method may be performed by a data prediction apparatus, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in FIG. 2, the method includes:
S110, acquiring target object attribute information of the target object at the next time stamp.
In one embodiment, the object attribute information includes: user attribute information of a target user associated with the target object and business data information of the target object. Wherein, the user attribute information refers to basic information of the user, such as credit information and asset information of the user; the business data information refers to related data information of a business aspect associated with the target object, such as social value information and profitability information of the target object. In an embodiment, the related database of the target object can be directly accessed through the cloud server, and the target object attribute information of the target object at the next time stamp can be obtained by identifying the target object attribute information related to the target object. In general, the user attribute information of the target user associated with the target object is fixed, but the service data information of the target object may be changed in real time or in a timed manner.
S120, identifying and extracting multi-factor confusion variables and single-factor confusion variables of target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp.
In an embodiment, the original object attribute information of the target object at the current time stamp may be acquired first, and data preprocessing is then performed on it to obtain the corresponding initial object attribute information. The data preprocessing may include data cleaning, data screening, data normalization and other processing operations. That is, after the original object attribute information of the target object at the current time stamp is acquired, data cleaning, data screening and data normalization are performed on it, so that invalid data is screened out and all data is normalized to the same scale, which facilitates subsequent processing. Feature engineering may also be performed on the object attribute information to obtain the profitability feature and social value feature of the target object, and the credit feature and business capability feature of its legal representative.
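The three preprocessing operations named above can be sketched as follows. This is a minimal illustration under stated assumptions: the field names (`revenue`, `net_profit`), the plausibility range used for screening and the choice of min-max scaling are hypothetical, since the source does not specify which cleaning, screening or normalization rules are used.

```python
def clean(records, fields):
    """Data cleaning: drop records with a missing (None) value in any required field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def screen(records, field, low, high):
    """Data screening: keep records whose value lies in a plausible range."""
    return [r for r in records if low <= r[field] <= high]


def min_max_normalize(records, fields):
    """Data normalization: rescale each field to the range [0, 1]."""
    if not records:
        return []
    out = [dict(r) for r in records]
    for f in fields:
        values = [r[f] for r in records]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in out:
            r[f] = (r[f] - lo) / span if span else 0.0
    return out


# Hypothetical original object attribute information for one time stamp.
raw = [
    {"revenue": 100.0, "net_profit": 10.0},
    {"revenue": None, "net_profit": 5.0},      # removed by cleaning
    {"revenue": 300.0, "net_profit": 40.0},
    {"revenue": 1e12, "net_profit": 20.0},     # removed by screening
    {"revenue": 200.0, "net_profit": 25.0},
]
fields = ["revenue", "net_profit"]
initial = min_max_normalize(screen(clean(raw, fields), "revenue", 0.0, 1e9), fields)
```

After these steps every retained field lies in [0, 1], so attributes measured in different units can be fed to the same network.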
Wherein, the multi-factor confusion variable refers to a plurality of variables that are inherently linked and influence one another; the single-factor confusion variable refers to a single variable that has an effect at one point in time.
In one embodiment, identifying and extracting the multi-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp includes: acquiring the initial object attribute information of the target object at the previous time stamp and at the current time stamp; inputting the initial object attribute information of the previous time stamp and of the current time stamp into the GRU network model to obtain a corresponding initial multi-factor confusion variable; and determining, by using an ordinary differential equation (ODE) solver and the initial multi-factor confusion variable, the multi-factor confusion variables of the initial object attribute information at the previous time stamp, the current time stamp and the next time stamp, respectively.
In an embodiment, the method further includes: using a fully connected layer as a decoder, reconstructing the initial object attribute information based on the multi-factor confusion variables of the previous time stamp, the current time stamp and the next time stamp and the pre-configured weight coefficients of a first fully connected layer, to obtain the target object attribute information of the previous time stamp, the current time stamp and the next time stamp.
Each target object may correspond to a plurality of pieces of initial object attribute information. Accordingly, the pieces of initial object attribute information of the previous time stamp and those of the current time stamp are input into the GRU network model to obtain a corresponding initial multi-factor confusion variable. The initial multi-factor confusion variable is then solved by an ordinary differential equation (ODE) solver to obtain the multi-factor confusion variables at the corresponding time steps, namely those of the previous time stamp, the current time stamp and the next time stamp. Finally, using a fully connected layer as a decoder, the input initial object attribute information is reconstructed three times, each time with the pre-configured weight coefficients of the first fully connected layer: based on the multi-factor confusion variable of the previous time stamp, to obtain the target object attribute information of the previous time stamp; based on that of the current time stamp, to obtain the target object attribute information of the current time stamp; and based on that of the next time stamp, to obtain the target object attribute information of the next time stamp.
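The ODE-solver and decoder stages described above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the latent dynamics `dm/dt = tanh(W m)`, the classical Runge-Kutta integrator, the weights `W_dyn`, `W_fc`, `b_fc` and the assumption that three unit time steps from the initial variable land on the previous, current and next time stamps are all hypothetical. In a trained model the initial variable would come from the GRU encoder and every weight would be learned.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dynamics(m, W):
    """Hypothetical latent dynamics dm/dt = tanh(W m); a trained network in practice."""
    return [math.tanh(v) for v in matvec(W, m)]


def rk4_step(m, dt, W):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return [mi + scale * ki for mi, ki in zip(m, k)]

    k1 = dynamics(m, W)
    k2 = dynamics(shifted(k1, dt / 2), W)
    k3 = dynamics(shifted(k2, dt / 2), W)
    k4 = dynamics(shifted(k3, dt), W)
    return [mi + dt / 6 * (a + 2 * b + 2 * c + d)
            for mi, a, b, c, d in zip(m, k1, k2, k3, k4)]


def solve(m0, n_steps, W, dt=1.0, substeps=10):
    """Return the latent state after each of n_steps time stamps, starting from m0."""
    states, m = [], m0
    for _ in range(n_steps):
        for _ in range(substeps):
            m = rk4_step(m, dt / substeps, W)
        states.append(m)
    return states


def decode(m, W_fc, b_fc):
    """Fully connected decoder: attribute vector = W_fc m + b_fc."""
    return [v + b for v, b in zip(matvec(W_fc, m), b_fc)]


W_dyn = [[0.0, -0.5], [0.5, 0.0]]                 # illustrative dynamics weights
W_fc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]       # illustrative decoder weights (k = 3)
b_fc = [0.0, 0.0, 0.1]

M0 = [0.3, -0.1]                                  # would come from the GRU encoder
M_prev, M_cur, M_next = solve(M0, 3, W_dyn)
A_prev, A_cur, A_next = (decode(m, W_fc, b_fc) for m in (M_prev, M_cur, M_next))
```

Because the latent state is obtained by integrating forward in time, the state for the next time stamp is available even though no observation exists there yet, which is what allows the decoder to produce attribute information for that time stamp.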
In one embodiment, identifying and extracting the single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp includes: acquiring the initial object attribute information and the initial data fluctuation stream of the target object at the previous time stamp and at the current time stamp; inputting the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable of the previous time stamp, and the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable of the current time stamp, into a GRU network model to obtain a corresponding initial single-factor confusion variable; and determining, by using an ordinary differential equation (ODE) solver and the initial single-factor confusion variable, the single-factor confusion variables of the initial object attribute information at the previous time stamp, the current time stamp and the next time stamp, respectively.
In an embodiment, the method further includes: using a fully connected layer as a decoder, reconstructing the initial object attribute information based on the single-factor confusion variables and the pre-configured weight coefficients of a second fully connected layer, to obtain the target object attribute information and the target data fluctuation stream of the previous time stamp, and the target object attribute information and the target data fluctuation stream of the current time stamp.
In an embodiment, each target object may correspond to a plurality of pieces of initial object attribute information and an initial data fluctuation stream. Accordingly, the pieces of initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable of the previous time stamp, and those of the current time stamp, are input into the GRU network model to obtain a corresponding initial single-factor confusion variable. The initial single-factor confusion variable is then solved by an ordinary differential equation (ODE) solver to obtain the single-factor confusion variables at the corresponding time steps, namely those of the previous time stamp, the current time stamp and the next time stamp. Finally, using a fully connected layer as a decoder and the pre-configured weight coefficients of the second fully connected layer, the input initial object attribute information is reconstructed based on the single-factor confusion variable of the previous time stamp to obtain the target object attribute information and the target data fluctuation stream of the previous time stamp, and based on the single-factor confusion variable of the current time stamp to obtain the target object attribute information and the target data fluctuation stream of the current time stamp.
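One way to read "inputting the attribute information, the data fluctuation stream and the multi-factor confusion variable" of each time stamp into the encoder is to concatenate the three into one vector per time stamp. The sketch below shows that assembly only. The concatenation order, the helper name `build_single_factor_inputs` and all numeric values are assumptions for illustration; the source does not state how the three inputs are combined.

```python
def build_single_factor_inputs(attrs, flows, multi_factor):
    """Concatenate [attributes, data fluctuation value, multi-factor variable] per time stamp."""
    if not (len(attrs) == len(flows) == len(multi_factor)):
        raise ValueError("all sequences must cover the same time stamps")
    return [list(a) + [y] + list(m) for a, y, m in zip(attrs, flows, multi_factor)]


# Two observed time stamps: previous and current (all values illustrative).
A = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]]   # k = 3 attributes per time stamp
Y = [1.5, 1.8]                           # data fluctuation stream (e.g. cash flow)
M = [[0.29, -0.05], [0.27, 0.01]]        # multi-factor confusion variables

encoder_inputs = build_single_factor_inputs(A, Y, M)
```

Each row of `encoder_inputs` would then be fed to the encoder as one time step, so the single-factor variable is extracted conditional on the multi-factor variable that was computed first.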
S130, inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created GRU network model to obtain a target data fluctuation stream of the target object at the next time stamp.
In an embodiment, the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information are input into a pre-created GRU network model, so that a target data fluctuation stream of the target object at the next time stamp can be obtained.
According to the technical solution of this embodiment, the initial object attribute information of the target object at the previous time stamp and the current time stamp is analyzed; the multi-factor confusion variable and the single-factor confusion variable of the target object attribute information at the next time stamp are identified and extracted from it; and these variables, together with the target object attribute information at the next time stamp, are used as inputs to the GRU network model, thereby eliminating the bias that confounding factors introduce into the target data fluctuation stream at the next time stamp. At the same time, because data of the target object at multiple time stamps is analyzed, the prior-art problem of low accuracy caused by considering data at only one time point is solved, and the accuracy of predicting the target data fluctuation stream is improved.
In an embodiment, fig. 3 is a flowchart of another data prediction method according to an embodiment of the present invention; this embodiment is a preferred embodiment based on the foregoing embodiments. Illustratively, the object attribute information includes business data information and user attribute information. For example, the business data information is the enterprise's business data, the user attribute information is the enterprise's legal data, and the target data fluctuation stream is the enterprise's cash flow. As shown in fig. 3, the method includes:
S210, acquiring business data information and user attribute information of an enterprise.
S220, extracting a multi-factor confusion variable.
Fig. 4 is a schematic diagram of an implementation of extracting multi-factor confusion variables according to an embodiment of the present invention. As shown in fig. 4, it is assumed that one target object includes k pieces of initial object attribute information, where k is an integer greater than or equal to 1. Since the input is time-series data that varies over time, a GRU is used as the encoder to extract the time-series characteristics of the input data and calculate the initial multi-factor confusion variable $M_0$. An ordinary differential equation solver (ODEsolver) then calculates the multi-factor confusion variables of the corresponding time steps, namely the multi-factor confusion variable $M_{T-1}$ of the previous timestamp, $M_T$ of the current timestamp and $M_{T+1}$ of the next timestamp. Finally, a fully connected layer (FC) is used as a decoder to reconstruct the input initial object attribute information, which contains the business data information and the user attribute information, to obtain the corresponding target object attribute information, namely $A_{T-1}$ of the previous timestamp, $A_T$ of the current timestamp and $A_{T+1}$ of the next timestamp.
$$M_0 = \mathrm{GRU}(A_t; W_{f1})$$
$$M_t = \mathrm{ODEsolver}(M_0), \quad t = 1, 2, \ldots, T+1$$
$$A_t = \mathrm{FC}(M_t; W_{f2}), \quad t = 1, 2, \ldots, T+1$$
where $A_t$ is the object attribute information at each timestamp and $M_t$ is the multi-factor confusion variable at each timestamp, with $t$ an integer in $1, 2, \ldots, T+1$; $W_{f1}$ is the weight coefficient of the GRU network; and $W_{f2}$ is the weight parameter of the fully connected layer, i.e., the weight coefficient of the first fully connected layer.
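The encoder, solver and decoder steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class `GRUCell`, the helpers `encode`, `ode_rollout` and `decode`, the random weights, the tanh dynamics function and the explicit Euler integrator are all hypothetical choices, since the text does not specify the solver type or the form of the dynamics.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_matrix(rows, cols, rng):
    """Random weights, standing in for trained coefficients (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal bias-free GRU cell; its weights play the role of W_f1."""

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size
        self.Wz = init_matrix(hidden_size, n, rng)  # update gate
        self.Wr = init_matrix(hidden_size, n, rng)  # reset gate
        self.Wh = init_matrix(hidden_size, n, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(v) for v in matvec(self.Wz, x + h)]
        r = [sigmoid(v) for v in matvec(self.Wr, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.Wh, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(cell, sequence, hidden_size):
    """Run the GRU over the attribute sequence; the last state is M_0."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = cell.step(x, h)
    return h


def ode_rollout(m0, dynamics, n_steps, dt=1.0):
    """Explicit Euler integration: each new state is m + dt * f(m)."""
    states, m = [], m0
    for _ in range(n_steps):
        m = [mi + dt * di for mi, di in zip(m, dynamics(m))]
        states.append(m)
    return states


def decode(W_f2, m):
    """Fully connected decoder: reconstruct attributes from M_t."""
    return matvec(W_f2, m)


rng = random.Random(0)
k, hidden = 3, 4                      # k attributes, latent size 4 (assumed)
A_prev = [0.2, 0.5, 0.1]              # made-up attributes at T-1
A_curr = [0.3, 0.4, 0.2]              # made-up attributes at T
cell = GRUCell(k, hidden, rng)
W_dyn = init_matrix(hidden, hidden, rng)
W_f2 = init_matrix(k, hidden, rng)

M0 = encode(cell, [A_prev, A_curr], hidden)
M_states = ode_rollout(
    M0, lambda m: [math.tanh(v) for v in matvec(W_dyn, m)], n_steps=3
)
M_prev, M_curr, M_next = M_states     # M_{T-1}, M_T, M_{T+1}
A_rec = [decode(W_f2, m) for m in M_states]
```

Treating the three solver steps as the timestamps T-1, T and T+1 is one possible reading of the index range in the formulas; a trained model would also fit the weights against a reconstruction objective, which is omitted here.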
S230, extracting single-factor confusion variables.
Fig. 5 is a schematic diagram of an implementation of extracting single-factor confusion variables according to an embodiment of the present invention. As shown in fig. 5, it is assumed that one target object includes j pieces of initial object attribute information, where j is an integer greater than or equal to 1. A single-factor confusion variable affects only one dimension of the data, so it cannot be extracted from the object attribute information alone; the enterprise cash flow data $Y(A_T)$ must also be included. The process of extracting the single-factor confusion variable is similar to that of extracting the multi-factor confusion variable. Taking the previous timestamp T-1 as an example, the object attribute information $A_{T-1}$, the data fluctuation stream $Y(A_{T-1})$ and the multi-factor confusion variable $M_{T-1}$ are used as input to calculate the initial single-factor confusion variable $S_0$. An ordinary differential equation solver then calculates the single-factor confusion variables of the corresponding time steps, namely the single-factor confusion variable $S_{T-1}$ of the previous timestamp, $S_T$ of the current timestamp and $S_{T+1}$ of the next timestamp. Finally, the fully connected layer is used as a decoder to reconstruct the input business data information, user attribute information and data fluctuation stream, to obtain the corresponding business data information and user attribute information (the object attribute information $A_{T-1}$ of the previous timestamp and $A_T$ of the current timestamp) and the data fluctuation stream (the target data fluctuation stream $Y(A_{T-1})$ of the previous timestamp and $Y(A_T)$ of the current timestamp).
$$S_0 = \mathrm{GRU}(A_t, Y(A_t); W_{f3})$$
$$S_t = \mathrm{ODEsolver}(S_0), \quad t = 1, 2, \ldots, T+1$$
$$A_t = \mathrm{FC}(M_t; W_{f4}), \quad t = 1, 2, \ldots, T+1$$
where $A_t$ is the object attribute information at each timestamp, $M_t$ is the multi-factor confusion variable at each timestamp and $S_t$ is the single-factor confusion variable at each timestamp, with $t$ an integer in $1, 2, \ldots, T+1$; $W_{f3}$ is the weight coefficient of the GRU network; and $W_{f4}$ is the weight parameter of the fully connected layer, i.e., the weight coefficient of the second fully connected layer.
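The distinguishing parts of the single-factor branch are how the encoder input is assembled and how the decoder output is split. The sketch below illustrates only those two steps. The function names, the example weights and the layout of the decoder output (attributes first, data fluctuation stream last) are assumptions made for illustration. Following the prose description, the decoder here is driven by the single-factor confusion variable $S_t$; this is also an assumption of the sketch.

```python
def build_encoder_input(attributes, flow, multi_factor):
    """Concatenate A_t, Y(A_t) and M_t into one encoder input vector."""
    return list(attributes) + [flow] + list(multi_factor)


def linear_decoder(W_f4, s_t):
    """Fully connected decoder without bias: one output per row of W_f4."""
    return [sum(w * s for w, s in zip(row, s_t)) for row in W_f4]


def split_decoder_output(output, n_attributes):
    """Split the decoder output into (reconstructed A_t, reconstructed Y(A_t))."""
    if len(output) != n_attributes + 1:
        raise ValueError("expected n_attributes values plus one flow value")
    return output[:n_attributes], output[n_attributes]


# Made-up values: two attributes, one flow value, a 2-dimensional M_t.
encoder_input = build_encoder_input([1.0, 2.0], 3.0, [0.1, 0.2])

# Made-up decoder weights: 3 outputs (2 attributes + 1 flow) from a 2-dim S_t.
W_f4 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S_t = [0.5, -0.25]
A_rec, Y_rec = split_decoder_output(linear_decoder(W_f4, S_t), n_attributes=2)
```

With these example values the encoder input has five entries, and the decoder returns the two reconstructed attributes together with one reconstructed flow value.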
S240, predicting cash flow of the enterprise.
Fig. 6 is a schematic diagram of an implementation of data prediction for the next timestamp according to an embodiment of the present invention. As shown in fig. 6, the single-factor and multi-factor confusion variables calculated above are combined with the target object attribute information of the target object. At time T+1, the target data fluctuation stream $Y(A_{T+1})$ of the next timestamp is predicted from $A_{T+1}$, $M_{T+1}$ and $S_{T+1}$:
$$Y(A_{T+1}) = \mathrm{GRU}(A_{T+1}, M_{T+1}, S_{T+1}; W_{f5})$$
where $W_{f5}$ is the weight parameter of the GRU network.
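As a rough illustration of the prediction step, the sketch below concatenates the three inputs and applies a single GRU update whose hidden state is a scalar that is read directly as the prediction. The scalar hidden state, the dictionary layout of `W_f5` and the zero-valued example weights are assumptions chosen to keep the example small; because the GRU state is bounded, the sketch also assumes the data fluctuation stream has been normalized to the interval (-1, 1).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def gru_predict(a_next, m_next, s_next, h_prev, W_f5):
    """One GRU update with a scalar hidden state used as the prediction."""
    features = list(a_next) + list(m_next) + list(s_next)
    z = sigmoid(dot(W_f5["wz"], features) + W_f5["uz"] * h_prev)  # update gate
    r = sigmoid(dot(W_f5["wr"], features) + W_f5["ur"] * h_prev)  # reset gate
    cand = math.tanh(dot(W_f5["wh"], features) + W_f5["uh"] * r * h_prev)
    return (1 - z) * h_prev + z * cand


# Made-up inputs: two attributes, one multi-factor and one single-factor value.
A_next, M_next, S_next = [0.3, 0.7], [0.1], [-0.2]
n_features = len(A_next) + len(M_next) + len(S_next)

# All-zero weights make the step easy to check by hand:
# z = 0.5 and the candidate is 0, so the result is half of h_prev.
W_f5 = {
    "wz": [0.0] * n_features, "uz": 0.0,
    "wr": [0.0] * n_features, "ur": 0.0,
    "wh": [0.0] * n_features, "uh": 0.0,
}
prediction = gru_predict(A_next, M_next, S_next, h_prev=0.4, W_f5=W_f5)
```

A practical model would more likely use a vector-valued hidden state followed by an output projection; the scalar form is used here only so that the arithmetic can be followed step by step.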
According to the technical solution of this embodiment, a GRU model is adopted to extract the time-series characteristics of the object attribute information, which makes it easier to analyze the trend of the target data fluctuation stream. Compared with an LSTM model, the GRU model has fewer training parameters and higher training efficiency. In addition, the embodiment of the invention uses a causal analysis model to extract the latent multi-factor confusion variable and single-factor confusion variable from the data and takes them as input parameters of the GRU model. The single-factor confusion variable represents a confounding factor that affects the enterprise data in a single dimension, and the multi-factor confusion variable represents a confounding factor that affects the enterprise data in multiple dimensions. By taking these confounding factors into account, the multi-dimensional characteristics of the enterprise can be fully extracted, which improves the prediction accuracy of the target data fluctuation stream.
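The parameter comparison between GRU and LSTM can be made concrete with a short calculation. In the textbook formulation, a GRU layer has three weight blocks (update gate, reset gate, candidate state) and an LSTM layer has four (input, forget and output gates plus the cell candidate), each with an input matrix, a recurrent matrix and one bias vector. The function name and the layer sizes below are illustrative assumptions.

```python
def gated_rnn_params(input_size, hidden_size, n_blocks):
    """Parameter count for one recurrent layer with n_blocks weight blocks.

    Each block holds a (hidden x input) matrix, a (hidden x hidden) matrix
    and one bias vector of length hidden.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block


gru_params = gated_rnn_params(input_size=16, hidden_size=32, n_blocks=3)
lstm_params = gated_rnn_params(input_size=16, hidden_size=32, n_blocks=4)
ratio = gru_params / lstm_params
```

For these sizes the GRU layer has 4704 parameters and the LSTM layer 6272, a ratio of 0.75. Some frameworks store two bias vectors per block, which changes the absolute counts but leaves the 3:4 ratio intact.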
In an embodiment, fig. 7 is a schematic structural diagram of a data prediction apparatus according to an embodiment of the present invention. As shown in fig. 7, the apparatus includes: a first acquisition module 310, an identification extraction module 320, and a first prediction module 330.
The first acquisition module 310 is configured to acquire target object attribute information of a target object at a next timestamp;
the identification extraction module 320 is configured to identify and extract a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next timestamp based on the initial object attribute information of the previous timestamp and the current timestamp;
the first prediction module 330 is configured to input the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model, so as to obtain a target data fluctuation stream of the target object at the next timestamp.
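The division of work among the three modules can be sketched as a small pipeline object. The class name `DataPredictionApparatus`, the callable signatures, the identifier `"enterprise-001"` and the stub functions in the usage example are hypothetical; they only show how the modules hand data to one another, not how each module is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class DataPredictionApparatus:
    # Module 310: returns target object attribute information at T+1.
    acquire: Callable[[str], Vector]
    # Module 320: maps attributes at T-1 and T to (M_{T+1}, S_{T+1}).
    extract: Callable[[Vector, Vector], Tuple[Vector, Vector]]
    # Module 330: maps (M_{T+1}, S_{T+1}, A_{T+1}) to the predicted flow.
    predict: Callable[[Vector, Vector, Vector], float]

    def run(self, target_id: str, a_prev: Vector, a_curr: Vector) -> float:
        a_next = self.acquire(target_id)
        m_next, s_next = self.extract(a_prev, a_curr)
        return self.predict(m_next, s_next, a_next)


# Stub modules with made-up behavior, used only to exercise the wiring.
apparatus = DataPredictionApparatus(
    acquire=lambda target_id: [1.0, 2.0],
    extract=lambda a_prev, a_curr: ([0.5], [0.25]),
    predict=lambda m, s, a: sum(m) + sum(s) + sum(a),
)
result = apparatus.run("enterprise-001", [0.0, 1.0], [1.0, 1.5])
```

Passing the modules in as callables keeps the acquisition, extraction and prediction steps independently replaceable, which mirrors the modular structure shown in fig. 7.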
In one embodiment, the identification extraction module 320 includes:
the first acquisition unit is used for acquiring initial object attribute information of the target object at the last time stamp and the current time stamp respectively;
the first extraction unit is used for respectively inputting the initial object attribute information of the previous timestamp and the initial object attribute information of the current timestamp into the GRU network model to obtain a corresponding initial multi-factor confusion variable;
and the first determining unit is used for determining the multi-factor confusion variables of the last time stamp, the current time stamp and the next time stamp of the initial object attribute information by adopting the ordinary differential equation solver and the initial multi-factor confusion variables.
In an embodiment, the data prediction apparatus further comprises:
the first reconstruction module is used for reconstructing the initial object attribute information, using a fully connected layer as a decoder, based on the multi-factor confusion variables of the previous timestamp, the current timestamp and the next timestamp and a pre-configured weight coefficient of a first fully connected layer, to obtain the target object attribute information of the previous timestamp, the current timestamp and the next timestamp.
In one embodiment, the identification extraction module 320 includes:
the second acquisition unit is used for acquiring initial object attribute information and an initial data fluctuation stream of the target object at the previous time stamp and the current time stamp respectively;
the second extraction unit is used for respectively inputting the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable of the previous timestamp, and the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable of the current timestamp, into the GRU network model to obtain a corresponding initial single-factor confusion variable;
and the second determining unit is used for determining single-factor confusion variables of the last time stamp, the current time stamp and the next time stamp of the initial object attribute information by adopting the ordinary differential equation solver and the initial single-factor confusion variables.
In an embodiment, the data prediction apparatus further comprises:
the second reconstruction module is used for reconstructing the initial object attribute information, using a fully connected layer as a decoder, based on the single-factor confusion variable and a pre-configured weight coefficient of a second fully connected layer, to obtain the target object attribute information and the target data fluctuation stream of the previous timestamp and the target object attribute information and the target data fluctuation stream of the current timestamp.
In an embodiment, the data prediction apparatus further comprises:
the second acquisition module is used for acquiring original object attribute information of the target object at the current time stamp;
the preprocessing module is used for carrying out data preprocessing on the original object attribute information to obtain corresponding initial object attribute information.
In one embodiment, the object attribute information includes: user attribute information of a target user associated with the target object and business data information of the target object.
The data prediction device provided by the embodiment of the invention can execute the data prediction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
In one embodiment, fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention, showing a schematic structure of an electronic device 10 that may be used to implement an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data prediction method.
In some embodiments, the data prediction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more of the steps of the data prediction method described above may be performed when the computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the data prediction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method of data prediction, comprising:
acquiring target object attribute information of a target object at a next time stamp;
identifying and extracting a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp;
and inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model to obtain a target data fluctuation stream of the target object at the next time stamp.
2. The method of claim 1, wherein the identifying and extracting the multi-factor confusion variable for the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp comprises:
acquiring initial object attribute information of a target object at a previous time stamp and a current time stamp respectively;
respectively inputting the initial object attribute information at the previous timestamp and the initial object attribute information at the current timestamp into a GRU network model to obtain a corresponding initial multi-factor confusion variable;
and determining the multi-factor confusion variables of the initial object attribute information at the previous time stamp, the current time stamp and the next time stamp respectively by using an ordinary differential equation solver and the initial multi-factor confusion variable.
3. The method according to claim 2, characterized in that the method further comprises:
and reconstructing the initial object attribute information, using a fully connected layer as a decoder, based on the multi-factor confusion variables at the previous time stamp, the current time stamp and the next time stamp respectively and a pre-configured weight coefficient of a first fully connected layer, to obtain target object attribute information at the previous time stamp, the current time stamp and the next time stamp respectively.
4. The method of claim 1, wherein the identifying and extracting the single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp comprises:
acquiring initial object attribute information and an initial data fluctuation stream of a target object at a previous time stamp and a current time stamp respectively;
respectively inputting the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable at the previous time stamp, and the initial object attribute information, the initial data fluctuation stream and the multi-factor confusion variable at the current time stamp, into a GRU network model to obtain a corresponding initial single-factor confusion variable;
and determining the single-factor confusion variables of the initial object attribute information at the previous time stamp, the current time stamp and the next time stamp respectively by using an ordinary differential equation solver and the initial single-factor confusion variable.
5. The method according to claim 4, further comprising:
and reconstructing the initial object attribute information, using a fully connected layer as a decoder, based on the single-factor confusion variable and a pre-configured weight coefficient of a second fully connected layer, to obtain the target object attribute information and the target data fluctuation stream of the previous time stamp and the target object attribute information and the target data fluctuation stream of the current time stamp.
6. The method according to any one of claims 1-5, further comprising:
acquiring original object attribute information of a target object at a current time stamp;
and carrying out data preprocessing on the original object attribute information to obtain corresponding initial object attribute information.
7. The method according to any one of claims 1-5, wherein the object attribute information comprises: user attribute information of a target user associated with the target object and business data information of the target object.
8. A data prediction apparatus, comprising:
the first acquisition module is used for acquiring target object attribute information of a target object at the next time stamp;
the identification and extraction module is used for identifying and extracting a multi-factor confusion variable and a single-factor confusion variable of the target object attribute information at the next time stamp based on the initial object attribute information of the previous time stamp and the current time stamp;
the first prediction module is used for inputting the multi-factor confusion variable, the single-factor confusion variable and the target object attribute information into a pre-created gated recurrent unit (GRU) network model to obtain a target data fluctuation stream of the target object at the next time stamp.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data prediction method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data prediction method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311667465.8A CN117670519A (en) | 2023-12-06 | 2023-12-06 | Data prediction method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311667465.8A CN117670519A (en) | 2023-12-06 | 2023-12-06 | Data prediction method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117670519A (en) | 2024-03-08 |
Family
ID=90069370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311667465.8A Pending CN117670519A (en) | 2023-12-06 | 2023-12-06 | Data prediction method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117670519A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||