CN114218529A - Data processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114218529A
Authority
CN
China
Prior art keywords
interval
data structure
data
initial data
structure node
Legal status
Pending
Application number
CN202111555011.2A
Other languages
Chinese (zh)
Inventor
林鹭晖
张玉龙
Current Assignee
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Application filed by CCB Finetech Co Ltd
Priority to CN202111555011.2A
Publication of CN114218529A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Complex Calculations (AREA)

Abstract

The application discloses a data processing method, apparatus, device and storage medium. An information difference statistic is calculated for each data point; it quantifies how much the information at that data point differs from that of its neighboring data points, i.e. how much the data information changes relative to the data points immediately before and after it. Data structure nodes can therefore be located accurately from the information difference statistic and the initial nodes corrected accordingly, which improves the accuracy with which data structure nodes are identified.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present application belongs to the field of computer technology, and in particular relates to a data processing method, apparatus, device, and storage medium.
Background
Financial data have corresponding data nodes in their structure, and a change in the data structure, or the appearance of a data node, means that the business risk faced by an enterprise has changed significantly. Therefore, to control operational risk and maximize operational efficiency, an enterprise needs to analyze the data structure and accurately identify the relevant data structure nodes.
Prior-art data processing methods can identify the data structure nodes of time series data directly, but the identified nodes often suffer from lag (hysteresis) bias. For example, a prediction model must select a history sample of a certain window width to estimate its parameters (e.g., with an adaptive algorithm), and a node identified at the left edge of the window tends to lie further to the left than the actual node, so the prior art determines data structure nodes with some error.
Disclosure of Invention
Embodiments of the present application provide a data processing method, apparatus, device and storage medium that can improve the accuracy of data analysis.
In a first aspect, an embodiment of the present application provides a data processing method, including:
acquiring N initial data structure nodes of time series data in a target interval, where N is a positive integer with N ≥ 3 and a plurality of data points lie between every two adjacent initial data structure nodes; the target interval is any interval selected from the time series data;
calculating an information difference statistic corresponding to a first data point in the target interval, where the first data point is any data point in the target interval; the information difference statistic corresponding to the first data point is the information difference statistic of the subintervals within the first interval corresponding to the first data point; the first interval corresponding to the first data point is the interval between the two data points adjacent to it on either side; and a subinterval is the interval between two adjacent data points;
and correcting the N initial data structure nodes according to the information difference statistic corresponding to each first data point to obtain target data structure nodes.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
an acquisition module, configured to acquire N initial data structure nodes of time series data in a target interval, where N is a positive integer with N ≥ 3 and a plurality of data points lie between every two adjacent initial data structure nodes; the target interval is any interval selected from the time series data;
a calculation module, configured to calculate an information difference statistic corresponding to a first data point in the target interval, where the first data point is any data point in the target interval; the information difference statistic corresponding to the first data point is the information difference statistic of the subintervals within the first interval corresponding to the first data point; the first interval corresponding to the first data point is the interval between the two data points adjacent to it on either side; and a subinterval is the interval between two adjacent data points;
and a correction module, configured to correct the N initial data structure nodes according to the information difference statistic corresponding to each first data point to obtain target data structure nodes.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory storing computer program instructions;
when executing the computer program instructions, the processor implements the data processing method of the first aspect.
In a fourth aspect, the present application provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the data processing method of the first aspect.
In a fifth aspect, the present application provides a computer program product whose instructions, when executed by a processor of an electronic device, cause the electronic device to execute the data processing method of the first aspect.
In the data processing method provided by the embodiments of the application, an interval is selected from the time series data as the target interval, and N initial data structure nodes of the time series data in the target interval are obtained. An information difference statistic is then calculated for each data point, and the N initial data structure nodes are corrected according to these statistics to obtain the target data structure nodes. Because the information difference statistic quantifies how much the information at a data point differs from that of the adjacent data points before and after it, the data structure nodes can be identified accurately from it and corrected, which improves the accuracy of node identification.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data processing method according to another embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
To solve the problems of the prior art, embodiments of the present application provide a data processing method, apparatus, device, and storage medium. The data processing method provided by an embodiment of the present application is described first.
In commercial banking there is ample evidence that financial data, typified by the term structure of interest rates, have corresponding data nodes in their structure; these structure nodes are in effect an explicit expression of structural changes in the data as they develop along different dimensions (for example, over time). Such a structural change marks a significant change in the information the data convey, and may be caused by structural changes within the system, by external macro-environmental factors, or by policy. If the data express an enterprise's business condition or indicators related to it, a change in the data structure, or the appearance of a data node, means that the business risk facing the enterprise has changed considerably. Therefore, to control operational risk and maximize operational efficiency, enterprises have a strong need for data structure analysis that can accurately identify the relevant data nodes.
Prior-art data processing methods can build a data model and identify data structure nodes directly, but they often suffer from lag (hysteresis) bias. For example, a prediction model must select a history sample of a certain window width to estimate its parameters (e.g., with an adaptive algorithm), and the node identified at the left edge of the window tends to lie further left than the actual node. The surplus sample points on the left carry information inconsistent with that of the interval to their right, so model parameters estimated from them produce larger prediction errors.
Therefore, to correct the structure nodes of time series data effectively and quickly, the present application proposes a data processing method that corrects these nodes based on information entropy.
Some of the terms referred to in this application are explained below:
Data structure: in this application, a data structure is a statistical characteristic of the data along some dimension and is a concrete expression of the information in the data. For example, if the data set contains the weights of all infants under one year old, its mean is one of its data structures and expresses the average weight level of infants under one year old.
Fig. 1 shows a schematic flowchart of a data processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps 101 to 103.
Step 101: acquire N initial data structure nodes of the time series data in a target interval.
Step 102: calculate the information difference statistic corresponding to each first data point in the target interval.
Step 103: correct the N initial data structure nodes according to the information difference statistic corresponding to each first data point to obtain the target data structure nodes.
With the data processing method provided by the embodiments of the application, the information difference statistic of each data point indicates how strongly the data at that point fluctuate relative to the adjacent data points before and after it, so data structure nodes can be identified accurately from the statistic and corrected, improving the accuracy of node identification.
Specific implementations of the above steps are described below.
In step 101, N initial data structure nodes of the time series data within the target interval are acquired, where N is a positive integer with N ≥ 3, a plurality of data points lie between every two adjacent initial data structure nodes, and the target interval is any interval selected from the time series data.
A time series is the sequence of values of a variable measured in chronological order over a period of time; anything observed or measured at multiple points in time can form a time series. Time series data are an important form of structured data.
Specifically, when analyzing time series data, a data segment is selected from the time series as the target interval for data processing, and the initial data structure nodes of the time series data within the target interval are determined.
In some embodiments, the data structure nodes of the time series data may be acquired by a data structure node algorithm.
For example, a data interval $[x_1, x_t]$ is selected from the time series data as the target interval, and a data structure node algorithm is applied to the time series data $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$ within $[x_1, x_t]$ to obtain the data structure node set $\{x_1, x_4, \dots, x_t\}$.
In some embodiments, obtaining N initial data structure nodes of time series data within a target interval may include:
constructing a time series model from the time series data;
and obtaining the N initial data structure nodes of the time series data in the target interval from the time series model.
Specifically, after a set of data is acquired, it must be analyzed to find and describe its characteristics. Time series data can therefore be analyzed with statistical methods to build a data model of how the data vary, and the data structure nodes of the time series are obtained from that model.
Taking time series data of market interest rate terms as an example, in some embodiments the interest rate term data model may be expressed as:
$$y_{t+1} = E_t\left(y_{t+1} \mid \Theta, \Phi_t\right)$$
where $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$ is the historical information set, $y_t$ is the market interest rate level of the relevant product, $x_t$ is a vector of factors affecting the market interest rate level, and the parameter set $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$ includes the interest rate term parameters.
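As a deliberately simplified illustration of estimating $E_t(y_{t+1} \mid \Theta, \Phi_t)$ from a history window of fixed width, the sketch below assumes a hypothetical AR(1) model $y_{t+1} = a + b\,y_t$ (ignoring the factor vector $x_t$) fitted by least squares on the last `window` observations. The model choice and the names `fit_ar1` and `predict_next` are illustrative assumptions, not part of the application.

```python
def fit_ar1(y):
    """Least-squares fit of y[t+1] = a + b * y[t] over the whole list y."""
    xs, ys = y[:-1], y[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    if sxx == 0:
        raise ValueError("window has no variation in y[t]; slope undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_next(y, window):
    """One-step forecast using only the last `window` observations of y."""
    if window < 3:
        raise ValueError("need at least 3 points to fit intercept and slope")
    a, b = fit_ar1(y[-window:])
    return a + b * y[-1]


# Hypothetical series generated exactly by y[t+1] = 1 + 0.5 * y[t].
series = [10.0]
for _ in range(7):
    series.append(1 + 0.5 * series[-1])
print(predict_next(series, window=5))  # close to 1 + 0.5 * series[-1]
```

The window width is the trade-off the Background refers to: a window that straddles a structural change mixes data from both regimes into one parameter estimate.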
Various prior-art methods, such as adaptive algorithms, can obtain the initial data structure nodes from the time series data model; the specific method can be chosen according to the application scenario, and the present application does not limit how the initial data structure nodes are obtained.
For example, for the time series data $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$ in any interval $[x_1, x_t]$, a time series model is constructed to obtain N data structure nodes $\{x_1, x_5, \dots, x_t\}$ of the series in $[x_1, x_t]$.
In step 102, information difference statistics corresponding to the first data point in the target interval are calculated.
The information difference statistic describes the difference in information content between data, and this difference reflects the degree of dispersion of the data.
Here, the first data point may be any data point in the target interval. Its information difference statistic is the information difference statistic of the subintervals within its first interval, where the first interval is the interval between the two data points adjacent to it on either side and a subinterval is the interval between two adjacent data points. For example, if the time series data in the target interval $[x_1, x_t]$ are $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$, the first interval of data point $x_2$ is $[x_1, x_3)$, and the subintervals of $x_2$ within it are $[x_1, x_2)$ and $[x_2, x_3)$.
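The indexing convention in the example above can be made concrete with a small helper. This sketch assumes integer positions equal to the subscripts of $x_s$ and represents intervals as half-open pairs `(start, stop)`; the function name `first_interval` is illustrative only.

```python
def first_interval(s):
    """Return the first interval of data point x_s and its two subintervals.

    Intervals are half-open index pairs (start, stop): the first interval is
    [x_{s-1}, x_{s+1}) and the subintervals are [x_{s-1}, x_s) and [x_s, x_{s+1}).
    """
    if s < 1:
        raise ValueError("x_s needs a left neighbour, so s must be >= 1")
    whole = (s - 1, s + 1)
    subintervals = ((s - 1, s), (s, s + 1))
    return whole, subintervals


print(first_interval(2))  # ((1, 3), ((1, 2), (2, 3)))
```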
Common statistics of the differences (dispersion) between data include the range, the interquartile range, the variance, the standard deviation, and the coefficient of variation, which measures relative dispersion.
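The dispersion statistics listed above can be computed with Python's standard `statistics` module. This sketch uses the population variance and standard deviation and the module's default (exclusive) quartile method; other conventions give slightly different values. The helper name `dispersion_stats` is illustrative.

```python
import statistics


def dispersion_stats(data):
    """Common measures of how spread out a sample is."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)                # population standard deviation
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,                          # interquartile range
        "variance": statistics.pvariance(data),  # population variance
        "std": std,
        "cv": std / mean,                        # coefficient of variation (mean must be nonzero)
    }


print(dispersion_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```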
In some embodiments of the present application, the following method may be adopted to calculate the information difference statistic corresponding to the first data point:
calculating the information entropy of a first interval corresponding to the first data point and the information entropy of subintervals in the first interval;
calculating, from the information entropy of the first interval and of each subinterval in it, the absolute KL divergence estimate between each subinterval and the first interval;
and summing the absolute KL divergence estimates of the subintervals relative to the first interval to obtain the information difference statistic corresponding to the first data point.
In information theory, entropy measures the uncertainty of a random variable; mathematically, it is the expected amount of information. Information entropy quantifies information, i.e., the amount of information conveyed by the occurrence of an event. The information difference statistics of the subintervals on both sides of each data point in the time series can therefore be calculated from the information entropy formula and the absolute KL divergence formula, giving the information difference statistic of each data point.
In this example, the information difference statistic of the first data point is calculated based on information entropy and KL divergence maximization, which quantifies the information difference between data points and makes it explicit.
In one example, again taking interest rate term time series data as the example, the time series data model is $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$, where $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$ is the historical information set, $y_t$ is the market interest rate level of the relevant product, $x_t$ is a vector of factors affecting the market interest rate level, and the parameter set $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$ includes the interest rate term parameters.
For the target interval $I = [x_0, x_T] = \{x_1, x_2, x_3, x_4, x_5, \dots, x_s, \dots, x_T\}$: [formula image not reproduced]
The information entropy is calculated as:
$$H(\Theta_s, I) = -E\left(\log f(X) \mid \Theta_s, I\right) \qquad (1)$$
where $f(\cdot)$ is the probability density function, the sample $X \in I$, and the parameter sets are [formula image not reproduced].
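Formula (1) leaves the density $f(\cdot)$ open. As a minimal sketch, assuming a Gaussian density whose parameters are the maximum-likelihood mean and standard deviation of the interval (an assumption, not something the application specifies), the expectation can be replaced by a sample average. For this plug-in estimate the result equals the closed form $\tfrac{1}{2}\log(2\pi e\,\hat\sigma^2)$, which the example compares against. The names `gaussian_logpdf` and `entropy_estimate` are illustrative.

```python
import math
import statistics


def gaussian_logpdf(x, mu, sigma):
    """log f(x) for a normal density with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def entropy_estimate(samples):
    """Plug-in estimate of H = -E[log f(X) | Theta, I] with a fitted Gaussian f."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
    if sigma == 0:
        raise ValueError("constant interval: Gaussian entropy is undefined")
    return -statistics.fmean([gaussian_logpdf(x, mu, sigma) for x in samples])


interval = [1.0, 2.0, 3.0, 4.0, 5.0]
closed_form = 0.5 * math.log(2 * math.pi * math.e * statistics.pvariance(interval))
print(entropy_estimate(interval), closed_form)
```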
Based on the information entropy formula, the information difference $D_{KL}(\Theta_s \| \Theta_t)$ between $\Theta_s$ and $\Theta_t$ can be calculated with the absolute KL divergence formula.
The absolute KL divergence is calculated as:
[formula image not reproduced]
In some embodiments, when $\Theta_s$ and $\Theta_t$ represent the data sets of two intervals, the absolute KL divergence estimate for the two intervals can be calculated by the following formula:
[formula image not reproduced]
where $n$ is the number of data points in the interval.
When the first data point is $x_s$, the first interval is $I_s \cup I_{s+1}$ and its subintervals are $I_s = [x_{s-1}, x_s)$ and $I_{s+1} = [x_s, x_{s+1})$. Based on the absolute KL divergence estimate, the information difference statistic $T_s$ of intervals $I_s$ and $I_{s+1}$ is calculated by the following formula:
[formula image not reproduced]
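The exact estimator and statistic formulas are not reproduced above, so the following is a sketch under explicit assumptions: each parameter set is a Gaussian fitted to one interval; the absolute KL divergence between a subinterval and the first interval is estimated by the average of $|\log f(x_i \mid \Theta_s) - \log f(x_i \mid \Theta_t)|$ over the $n$ points of the subinterval; and $T_s$ is the sum of these estimates for the two subintervals, following the three calculation steps described earlier. Because a Gaussian cannot be fitted meaningfully to a single point, the sketch takes the two subintervals to be the data on either side of a split index within a wider window (such as the second interval used during correction). The names `abs_kl_estimate` and `info_diff_statistic` are hypothetical.

```python
import math
import statistics


def fit_gaussian(samples, min_sigma=1e-9):
    """MLE mean and standard deviation; sigma is floored to avoid log(0)."""
    return statistics.fmean(samples), max(statistics.pstdev(samples), min_sigma)


def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def abs_kl_estimate(sub, ref):
    """Mean of |log f(x|Theta_sub) - log f(x|Theta_ref)| over the n points of sub."""
    mu_s, sd_s = fit_gaussian(sub)
    mu_r, sd_r = fit_gaussian(ref)
    gaps = [abs(gaussian_logpdf(x, mu_s, sd_s) - gaussian_logpdf(x, mu_r, sd_r)) for x in sub]
    return sum(gaps) / len(sub)


def info_diff_statistic(y, lo, s, hi):
    """T for split index s in window y[lo:hi]: left part y[lo:s], right part y[s:hi]."""
    left, right, whole = y[lo:s], y[s:hi], y[lo:hi]
    return abs_kl_estimate(left, whole) + abs_kl_estimate(right, whole)


y = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
print(info_diff_statistic(y, 0, 5, 10), info_diff_statistic(y, 0, 3, 10))
```

The split at index 5, where the level jumps, scores higher than an off-target split; that ordering is the property the correction in step 103 relies on.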
In this example, statistics of the data in the adjacent subintervals on both sides of each data point are constructed based on information entropy and KL divergence maximization; this yields the information difference statistic of each data point, quantifying the information differences between data points and making them explicit.
It should be noted that the proposed calculation is not limited to interest rate term time series data, so the calculation of the information difference statistic is not limited to interest-rate-term data either. For different types of time series models, the information difference statistic can be adjusted accordingly. For example, to adjust the level of risk control, the absolute KL divergence can be redefined according to the data characteristics as
$$D_{KL,m}(\Theta_s \| \Theta_t) = E\left|\left(\log f(X_s) \mid \Theta_s, I_s\right) - \left(\log f(X_s) \mid \Theta_t, I_s\right)\right|^m$$
and the statistic is adjusted accordingly to
[formula image not reproduced]
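The generalized divergence $D_{KL,m}$ only changes how per-point log-density gaps are aggregated. The sketch below takes the log-densities of the same sample points under $\Theta_s$ and $\Theta_t$ as plain lists (however they were obtained) and estimates $E|\cdot|^m$ by a sample mean; `abs_kl_m` is a hypothetical name. Larger $m$ weights large gaps more heavily, and by Jensen's inequality the $m = 2$ value is never smaller than the square of the $m = 1$ value.

```python
def abs_kl_m(logf_s, logf_t, m=1.0):
    """Sample estimate of E|log f(X|Theta_s) - log f(X|Theta_t)|^m.

    logf_s[i] and logf_t[i] are log-densities of the same point x_i under the
    two parameter sets; m > 0 controls how strongly large gaps are weighted.
    """
    if len(logf_s) != len(logf_t) or not logf_s:
        raise ValueError("need two non-empty lists of equal length")
    if m <= 0:
        raise ValueError("m must be positive")
    return sum(abs(a - b) ** m for a, b in zip(logf_s, logf_t)) / len(logf_s)


ls = [-1.0, -0.5, -2.0]
lt = [-1.5, -0.5, -4.0]
print(abs_kl_m(ls, lt, 1), abs_kl_m(ls, lt, 2))
```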
In step 103, the N initial data structure nodes are corrected according to the information difference statistics corresponding to each first data point, so as to obtain a target data structure node.
Step 102 yields the information difference statistic of each data point in the target interval, so nodes where the data change can be found by comparing these values: within a segment, the data point whose information difference statistic is the largest, or exceeds a preset threshold, is a data structure node.
Specifically, to correct each of the N initial data structure nodes obtained in the target interval, a correction range is first determined for that node. Within the correction range, the first data point with the largest information difference statistic is found, and the initial data structure node is corrected to that point.
In some examples, the data points in the target interval whose information difference statistics exceed a preset threshold may be determined to be new data structure nodes.
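Both selection rules described above (the largest statistic within a correction range, and every point whose statistic exceeds a preset threshold) reduce to simple operations on a list of already-computed statistics. In this sketch `stats[i]` is assumed to hold the information difference statistic of data point `i`; the function names and the threshold value are illustrative.

```python
def argmax_in_range(stats, lo, hi):
    """Index i in [lo, hi) with the largest stats[i]; ties go to the earliest index."""
    if lo >= hi:
        raise ValueError("empty correction range")
    return max(range(lo, hi), key=lambda i: stats[i])


def above_threshold(stats, threshold):
    """Indices whose statistic exceeds the preset threshold (candidate new nodes)."""
    return [i for i, t in enumerate(stats) if t > threshold]


stats = [0.1, 0.5, 2.0, 0.3, 1.5]
print(argmax_in_range(stats, 0, 5), argmax_in_range(stats, 3, 5), above_threshold(stats, 1.0))
```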
In some examples, the target interval may instead be divided, according to the N initial data structure nodes, into a plurality of homogeneous intervals (the second interval of an initial data structure node being the interval between the two initial data structure nodes adjacent to it on either side), and within each homogeneous interval the data point with the largest information difference statistic is determined to be a new data structure node.
For example, for an initial set of data structure nodes [formula image not reproduced] (i.e., the N initial data structure nodes of the time series data in the target interval), the target interval is segmented at the initial data structure nodes into a plurality of homogeneous intervals:
[formula image not reproduced]
where
[formula image not reproduced]
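Segmenting the target interval at the initial nodes, and looking up the second interval of an interior node, can be sketched with node positions stored as a sorted list of indices that includes both endpoints of the target interval. This representation and the helper names `homogeneous_intervals` and `second_interval` are assumptions for illustration.

```python
def homogeneous_intervals(nodes):
    """Half-open index intervals between consecutive structure nodes."""
    nodes = sorted(nodes)
    return list(zip(nodes[:-1], nodes[1:]))


def second_interval(nodes, j):
    """Interval between the two neighbours of interior node j (endpoints excluded)."""
    if not 0 < j < len(nodes) - 1:
        raise ValueError("only interior nodes have a second interval")
    return nodes[j - 1], nodes[j + 1]


nodes = [0, 4, 9, 15]
print(homogeneous_intervals(nodes))  # [(0, 4), (4, 9), (9, 15)]
print(second_interval(nodes, 1))     # (0, 9)
```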
In some embodiments, correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain the target data structure node may include:
a determination step: for each initial data structure node, determining the first data point with the largest information difference statistic in the second interval corresponding to that node as its first initial data structure node, thereby obtaining a first initial data structure node for every initial data structure node; the second interval is the interval between the two initial data structure nodes adjacent to the node on either side;
a first updating step: updating the initial data structure nodes according to the first initial data structure nodes;
when a preset condition is met, determining the updated initial data structure nodes as the target data structure nodes;
and when the preset condition is not met, repeating the determination step and the first updating step until it is met.
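The determination and updating steps can be sketched as a loop that moves each interior node to the split point with the largest statistic inside its second interval, keeps the endpoints fixed, and stops when the nodes no longer change or an iteration cap is reached (one possible preset condition). Assumptions, none of which come from the application: Gaussian densities; the statistic taken as the sum, over the two sides of a split, of the mean absolute log-density gap to a Gaussian fitted to the whole second interval; at least `min_seg` points on each side of a split; and neighbours updated in place within a sweep (which keeps the nodes ordered), whereas the text may intend all nodes to be updated at once. All names are hypothetical.

```python
import math
import statistics


def _logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def _abs_kl(sub, ref, min_sigma=1e-9):
    """Mean |log f(x|Gaussian fit of sub) - log f(x|Gaussian fit of ref)| over sub."""
    mu_s, sd_s = statistics.fmean(sub), max(statistics.pstdev(sub), min_sigma)
    mu_r, sd_r = statistics.fmean(ref), max(statistics.pstdev(ref), min_sigma)
    return statistics.fmean([abs(_logpdf(x, mu_s, sd_s) - _logpdf(x, mu_r, sd_r)) for x in sub])


def info_diff(y, lo, s, hi):
    """Information difference statistic of split s inside the window y[lo:hi]."""
    whole = y[lo:hi]
    return _abs_kl(y[lo:s], whole) + _abs_kl(y[s:hi], whole)


def correct_nodes(y, nodes, max_iter=20, min_seg=2):
    """Iteratively correct interior structure nodes; the two endpoints never move."""
    nodes = list(nodes)
    for _ in range(max_iter):                       # cap on the number of iterations
        previous = list(nodes)
        for j in range(1, len(nodes) - 1):          # every node except the endpoints
            lo, hi = nodes[j - 1], nodes[j + 1]     # second interval of node j
            candidates = range(lo + min_seg, hi - min_seg + 1)
            if len(candidates) == 0:
                continue                            # interval too short to split
            nodes[j] = max(candidates, key=lambda s: info_diff(y, lo, s, hi))
        if nodes == previous:                       # nodes stable: stop iterating
            break
    return nodes


y = [0.0, 0.2, -0.2, 0.1, -0.1, 0.05,
     5.0, 5.2, 4.8, 5.1, 4.9, 5.05,
     10.0, 10.2, 9.8, 10.1, 9.9, 10.05]
print(correct_nodes(y, [0, 4, 10, 18]))
```

In this synthetic series the level shifts at indices 6 and 12, and the initial nodes 4 and 10 deliberately lag to the left, mimicking the bias described in the Background.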
In one example, as shown in Fig. 2:
S201: acquire the time series data;
S202: acquire the initial data structure nodes of the time series data and the homogeneous intervals (the intervals between adjacent initial data structure nodes):
construct the time series data model $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$, where the historical information set is $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$, the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, and the target interval is $I = [x_0, x_T]$.
An initial set of data structure nodes [formula image not reproduced] is obtained from the data model through a structure node algorithm (i.e., the N initial data structure nodes of the time series data in the target interval are acquired), and the target interval is segmented at these nodes into a plurality of homogeneous intervals:
[formula image not reproduced]
where
[formula image not reproduced]
S203: determine the number of iterations, i.e., the number of times each initial data structure node, excluding the endpoints, is corrected. The number of iterations (corrections) for each initial data structure node may be preset, or another stopping condition may be set instead. This step is optional.
S204: for the s-th node ([formula image not reproduced]), correct it to the data point with the largest information difference statistic in its second interval.
(1) Fix the structure nodes [formula image not reproduced] and [formula image not reproduced]; within the interval [formula image not reproduced] (i.e., the second interval corresponding to [formula image not reproduced]), find the structure node that maximizes the information difference statistic [formula image not reproduced]: [formula image not reproduced]. That is, each first data point in the second interval [formula image not reproduced] is substituted into formulas (1), (2) and (4) to obtain its information difference statistic; the largest of these values is recorded as [formula image not reproduced], and the first data point corresponding to [formula image not reproduced] is determined as the first initial data structure node [formula image not reproduced].
(2) Fix the structure nodes [formula image not reproduced] and [formula image not reproduced]; within the interval [formula image not reproduced], find the structure node [formula image not reproduced] that maximizes the statistic [formula image not reproduced] (that is, the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node is determined as its first initial data structure node).
(3) Proceeding as in step (2), identify the corresponding structural node within the second interval of each remaining interior node (this is the determining step: for each initial data structure node, the first data point with the largest information difference statistic in its second interval is determined as that node's first initial data structure node).
This yields the structural nodes of the first iteration, with both ends of the sample interval unchanged. The homogeneous intervals are then re-divided according to the changed structural nodes (the first updating step: the initial data structure nodes are updated according to each first initial data structure node).
S205: Determine whether every node other than the endpoints has completed this iteration. If so, go to S206; if not, return to S204.
S206: Determine whether the set number of iterations has been completed or the nodes have stabilized.
If not, return to S202 and repeat S202 to S205 for the k-th iteration: fix the two nodes adjacent to the s-th node as obtained in the (k-1)-th iteration, and within the interval between them find the structural node that maximizes the statistic. This yields the structural node set generated by the k-th iteration and the corresponding homogeneous intervals (that is, while the preset condition is not met, the determining step and the first updating step are executed cyclically until it is met).
If so, that is, once the K-th iteration has been completed, the structural nodes have stabilized, or a predetermined iteration target (such as the prediction performance on the time series data) has been reached, the data structure node correction is complete: when the preset condition is met, the updated initial data structure nodes are determined as the target data structure nodes.
In this example, the initial data structure nodes are held fixed throughout one correction pass: every node other than the endpoints completes the determining step once, and only then are all nodes updated together from the first initial data structure nodes obtained (the first updating step). Executing this correction cyclically makes the data structure nodes converge to the correct structural breakpoints, and the data structure nodes obtained with this correction method are more stable.
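The first correction scheme (all interior nodes re-estimated against a frozen node set, then replaced together) can be sketched as follows. The sketch rests on assumptions that are not part of the text: nodes are integer indices into the series with `nodes[0] = 0` and `nodes[-1] = len(data)` as the fixed endpoints; a candidate node p splits its second interval into `data[lo:p]` and `data[p:hi]`; and `mean_shift_stat` is a simple stand-in for the information difference statistic of formulas (1), (2) and (4), which are not reproduced here. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

# A statistic maps (segment, split) -> score; split divides the segment
# into segment[:split] and segment[split:].
Statistic = Callable[[Sequence[float], int], float]


def mean_shift_stat(segment: Sequence[float], split: int) -> float:
    """Stand-in score (NOT the patent's formula): scaled squared mean difference."""
    left, right = segment[:split], segment[split:]
    n1, n2 = len(left), len(right)
    diff = sum(left) / n1 - sum(right) / n2
    return n1 * n2 / (n1 + n2) * diff * diff


def best_split(data: Sequence[float], lo: int, hi: int, stat: Statistic) -> int:
    """Return the point p with lo < p < hi that maximizes the statistic on data[lo:hi]."""
    segment = data[lo:hi]
    return max(range(lo + 1, hi), key=lambda p: stat(segment, p - lo))


def correct_nodes_jacobi(data: Sequence[float], nodes: List[int],
                         stat: Statistic, max_iter: int = 50) -> List[int]:
    nodes = list(nodes)
    for _ in range(max_iter):  # preset number of corrections
        # Determining step: every interior node is re-estimated against the
        # node set of the previous iteration, which stays frozen meanwhile.
        candidates = [nodes[0]]
        for s in range(1, len(nodes) - 1):
            candidates.append(best_split(data, nodes[s - 1], nodes[s + 1], stat))
        candidates.append(nodes[-1])  # the endpoints never move
        candidates.sort()
        if len(set(candidates)) != len(candidates):
            break  # two nodes coincided; keep the last valid node set
        if candidates == nodes:
            break  # the nodes are stable
        nodes = candidates  # first updating step: replace all nodes at once
    return nodes


data = [0.0] * 30 + [4.0] * 30 + [1.0] * 30
print(correct_nodes_jacobi(data, [0, 20, 70, 90], mean_shift_stat))  # [0, 30, 60, 90]
```

Because all nodes move simultaneously, two neighbouring nodes could in principle cross or coincide. The text does not say how that case is handled, so the sketch sorts the candidates and, if two of them coincide, keeps the previous node set and stops.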
In some embodiments, correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain the target data structure node may include:
a second updating step: for each initial data structure node, performing the following operations: updating the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;
under the condition that a preset condition is met, determining the updated initial data structure node as a target data structure node;
and under the condition that the preset condition is not met, circularly executing the second updating step until the preset condition is met.
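In one example, consider the data model $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$, where the historical information set is $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$, the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, and the target interval is $I = [x_0, x_T]$.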
S301: Acquire the N initial data structure nodes of the time series data in the target interval. An initial set of data structure nodes $\{\tau_1^{(0)}, \tau_2^{(0)}, \dots, \tau_N^{(0)}\}$ is obtained from the data model through a structural node algorithm, where $\tau_j^{(0)}$ denotes the j-th node before any correction, and the target interval $I$ is divided at these nodes into a plurality of homogeneous intervals $I_j^{(0)}$, each lying between the adjacent nodes $\tau_j^{(0)}$ and $\tau_{j+1}^{(0)}$, such that $I = \bigcup_{j=1}^{N-1} I_j^{(0)}$ and $I_i^{(0)} \cap I_j^{(0)} = \emptyset$ for $i \neq j$.
S302: Update the initial data structure node to the first data point with the largest information difference statistic in its second interval, obtaining an updated initial data structure node; the next initial data structure node is then corrected on the basis of this updated node.
Fix the two structural nodes adjacent to the first interior node, and within the second interval corresponding to that node find the structural node that maximizes the statistic. That is, substitute each first data point in this second interval into formulas (1), (3) and (4) to obtain the information difference statistic of each first data point in the second interval, record the largest of these statistics, and determine the first data point that attains it as the first initial data structure node, which replaces the original node at once.
Next, fix the two structural nodes adjacent to the second interior node, and within its corresponding second interval (whose left end is the first interior node as just updated) find the structural node that maximizes the statistic.
S303: For each initial data structure node, perform the following operation: update it to the first data point with the largest information difference statistic in its corresponding second interval, obtaining an updated initial data structure node. This is the second updating step.
As in step S302, identify the corresponding structural node for each subsequent node within its second interval, whose left end is the already-updated preceding node. This yields the structural nodes of the first iteration, with both ends of the sample interval unchanged, and the homogeneous intervals are re-divided according to the changed structural nodes.
S304: While the preset condition is not met, execute the second updating step cyclically until it is met.
Repeat S302 and S303 for the k-th iteration: fix the two nodes adjacent to the s-th node, namely the (s-1)-th node already updated in the k-th iteration and the (s+1)-th node from the (k-1)-th iteration, and within the interval between them find the structural node that maximizes the statistic. This yields the structural node set generated by the k-th iteration and the corresponding homogeneous intervals.
S305: When the preset condition is met, determine the updated initial data structure nodes as the target data structure nodes.
After K iterations, once the structural nodes are stable or a predetermined iteration target (such as the prediction performance on the time series data) has been reached, the data structure node correction is considered complete.
In this example, each newly confirmed data structure node replaces the corresponding initial node immediately during a correction pass: the second updating step is completed node by node, and the next initial data structure node is corrected on the basis of the nodes already updated. Executing this correction cyclically makes the data structure nodes converge to the correct structural breakpoints, and this correction method converges faster.
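The second correction scheme differs from the first only in when a node is replaced: each corrected node is written back at once, so the next node's second interval already uses it. The sketch below makes assumptions that are not in the text: nodes are integer indices into the series with `nodes[0] = 0` and `nodes[-1] = len(data)` as fixed endpoints, a candidate node p splits its second interval into `data[lo:p]` and `data[p:hi]`, and `mean_shift_stat` is a simple stand-in for the information difference statistic of formulas (1), (3) and (4). All function names are hypothetical.

```python
from typing import Callable, List, Sequence

# A statistic maps (segment, split) -> score; split divides the segment
# into segment[:split] and segment[split:].
Statistic = Callable[[Sequence[float], int], float]


def mean_shift_stat(segment: Sequence[float], split: int) -> float:
    """Stand-in score (NOT the patent's formula): scaled squared mean difference."""
    left, right = segment[:split], segment[split:]
    n1, n2 = len(left), len(right)
    diff = sum(left) / n1 - sum(right) / n2
    return n1 * n2 / (n1 + n2) * diff * diff


def best_split(data: Sequence[float], lo: int, hi: int, stat: Statistic) -> int:
    """Return the point p with lo < p < hi that maximizes the statistic on data[lo:hi]."""
    segment = data[lo:hi]
    return max(range(lo + 1, hi), key=lambda p: stat(segment, p - lo))


def correct_nodes_gauss_seidel(data: Sequence[float], nodes: List[int],
                               stat: Statistic, max_iter: int = 50) -> List[int]:
    nodes = list(nodes)
    for _ in range(max_iter):  # preset number of corrections
        changed = False
        for s in range(1, len(nodes) - 1):
            # Second updating step: nodes[s - 1] has already been updated in
            # this sweep, while nodes[s + 1] still holds the previous value.
            p = best_split(data, nodes[s - 1], nodes[s + 1], stat)
            if p != nodes[s]:
                nodes[s] = p  # write the corrected node back immediately
                changed = True
        if not changed:
            break  # the nodes are stable
    return nodes


data = [0.0] * 30 + [4.0] * 30 + [1.0] * 30
print(correct_nodes_gauss_seidel(data, [0, 20, 70, 90], mean_shift_stat))  # [0, 30, 60, 90]
```

Since node s is always chosen strictly between its current neighbours, the in-place update keeps the nodes strictly ordered without any extra guard.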
In some embodiments, the preset condition may be that the data structure node correction has been performed a preset number of times or that the data structure nodes have stabilized.
In the data processing method provided by the embodiments of the present application, the value of the information difference statistic of each data point measures the difference in information content between that data point and the adjacent data points before and after it. The data structure nodes can therefore be identified accurately from this statistic and then corrected, which improves the accuracy of data structure node identification.
As shown in fig. 3, an embodiment of the present application further provides a data processing apparatus, which includes an obtaining module 301, a calculating module 302, and a correcting module 303.
An obtaining module 301, configured to obtain N initial data structure nodes of time series data in a target interval, where N is a positive integer greater than or equal to 3 and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any interval selected from the time series data;
the calculating module 302 is configured to calculate an information difference statistic corresponding to a first data point in a target interval, where the first data point is any one data point in the target interval, the information difference statistic corresponding to the first data point is an information difference statistic of a subinterval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points on two sides of the first data point, and the subinterval is an interval between two adjacent data points;
the correcting module 303 is configured to correct the N initial data structure nodes according to the information difference statistic corresponding to each first data point, so as to obtain a target data structure node.
In some embodiments, in order to quickly acquire the initial data structure node of the time-series data, the data processing apparatus may further include:
the model building module is used for building a time series model according to the time series data;
and the data structure node acquisition module is used for acquiring N initial data structure nodes of the time series data in the target interval according to the time series model.
In an embodiment, in order to quantify an information difference between the first data points, the calculating module 302 is specifically configured to:
calculating the information entropy of a first interval corresponding to the first data point and the information entropy of subintervals in the first interval;
respectively calculating an absolute KL divergence estimation value of each subinterval and the first interval according to the information entropy of the first interval and the information entropy of each subinterval in the first interval;
and adding the absolute KL divergence estimated values of the subintervals and the first interval to obtain the information difference statistic corresponding to the first data point.
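One possible reading of the three calculation steps above is sketched below. Formulas (1) to (4) are not reproduced in this part of the text, so the following are assumptions: the distribution of each interval is estimated with a smoothed histogram on shared bins, the KL divergence of a subinterval from the first interval is computed from entropies as cross-entropy minus entropy, and the first interval is split at the candidate point into two subintervals. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def histogram_probs(values: Sequence[float], lo: float, hi: float,
                    bins: int = 8, eps: float = 1e-9) -> List[float]:
    """Smoothed histogram estimate of a distribution on the shared range [lo, hi]."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        j = int((v - lo) / width)
        counts[min(max(j, 0), bins - 1)] += 1  # clamp v == hi into the last bin
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]  # eps keeps every bin positive


def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def information_difference(segment: Sequence[float], split: int, bins: int = 8) -> float:
    """Sum of |KL(subinterval || first interval)| over the two subintervals."""
    if not 0 < split < len(segment):
        raise ValueError("split must leave both subintervals non-empty")
    lo, hi = min(segment), max(segment)
    whole = histogram_probs(segment, lo, hi, bins)
    total = 0.0
    for sub in (segment[:split], segment[split:]):
        p = histogram_probs(sub, lo, hi, bins)
        # KL(p || whole) = H(p, whole) - H(p), estimated from entropies
        total += abs(cross_entropy(p, whole) - entropy(p))
    return total


segment = [0.0] * 20 + [5.0] * 20
scores = {k: information_difference(segment, k) for k in range(1, len(segment))}
print(max(scores, key=scores.get))  # 20, the position of the level shift
```

Taking the absolute value matters for estimators that can return small negative numbers; here that can only come from floating-point rounding in the difference of the two entropy terms, since the plug-in divergence is non-negative in exact arithmetic.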
In some embodiments, the correcting module 303 may include:
a determination unit for performing the determining step: for each initial data structure node, performing the following operations: determining a first data point with the largest information difference statistic in a second interval corresponding to the initial data structure node as a first initial data structure node to obtain a first initial data structure node of each initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;
a first updating unit for executing a first updating step: updating the initial data structure nodes according to each first initial data structure node;
under the condition that a preset condition is met, determining the updated initial data structure node as a target data structure node;
and under the condition that the preset condition is not met, circularly executing the determining step and the first updating step until the preset condition is met.
In some embodiments, the correcting module 303 may include:
a second update module for performing a second update step: for each initial data structure node, performing the following operations: updating the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;
under the condition that a preset condition is met, determining the updated initial data structure node as a target data structure node;
and under the condition that the preset condition is not met, circularly executing the second updating step until the preset condition is met.
In the data processing apparatus and method provided by the embodiments of the present application, the value of the information difference statistic of each data point measures the difference in information content between that data point and its two adjacent data points. The data structure nodes can therefore be identified accurately from this statistic and then corrected, which improves the accuracy of data structure node identification.
Fig. 4 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 401 and a memory 402 storing computer program instructions.
Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (fixed) media, where appropriate. Memory 402 may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 402 is non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement any of the data processing methods in the above embodiments.
In one example, the electronic device can also include a communication interface 403 and a bus 404. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 404 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 404 comprises hardware, software, or both that couple the components of the electronic device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 404 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, to implement the data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium having computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the data processing methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (11)

1. A data processing method, comprising:
acquiring N initial data structure nodes of time series data in a target interval, wherein N is a positive integer greater than or equal to 3 and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any interval selected from the time series data;
calculating information difference statistics corresponding to a first data point in the target interval, wherein the first data point is any one data point in the target interval, the information difference statistics corresponding to the first data point is information difference statistics of a subinterval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points at two sides of the first data point, and the subinterval is an interval between two adjacent data points;
and correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain target data structure nodes.
2. The method of claim 1, wherein the obtaining N initial data structure nodes of the time series data within the target interval comprises:
constructing a time series model according to the time series data;
and obtaining N initial data structure nodes of the time series data in the target interval according to the time series model.
3. The method of claim 1, wherein the calculating information difference statistics corresponding to each data point in the target interval comprises:
calculating the information entropy of a first interval corresponding to the first data point and the information entropy of subintervals in the first interval;
respectively calculating an absolute KL divergence estimation value of each subinterval and the first interval according to the information entropy of the first interval and the information entropy of each subinterval in the first interval;
and obtaining information difference statistics corresponding to the first data point according to the absolute KL divergence estimated value of each subinterval and the first interval.
4. The method of claim 1, wherein said correcting said N initial data structure nodes according to the information difference statistics corresponding to each of said first data points to obtain a target data structure node comprises:
a determination step: for each initial data structure node, performing the following operations: determining a first data point with the largest information difference statistic in a second interval corresponding to the initial data structure node as a first initial data structure node to obtain a first initial data structure node of each initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;
a first updating step: updating the initial data structure nodes according to each first initial data structure node;
under the condition that a preset condition is met, determining the updated initial data structure node as the target data structure node;
and under the condition that the preset condition is not met, circularly executing the determining step and the first updating step until the preset condition is met.
5. The method of claim 1, wherein said correcting said N initial data structure nodes according to the information difference statistics corresponding to said each data point to obtain a target data structure node of said target data comprises:
a second updating step: for each initial data structure node, performing the following operations: updating the first data point with the maximum information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;
under the condition that a preset condition is met, determining the updated initial data structure node as the target data structure node;
and under the condition that the preset condition is not met, circularly executing the second updating step until the preset condition is met.
6. The method according to claim 4 or 5, characterized in that the preset condition is that the data structure node correction has been performed a preset number of times or that the data structure nodes have stabilized.
7. A data processing apparatus, comprising:
the acquisition module is used for acquiring N initial data structure nodes of the time series data in the target interval, wherein N is a positive integer greater than or equal to 3 and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any interval selected from the time series data;
a calculating module, configured to calculate an information difference statistic corresponding to a first data point in the target interval, where the first data point is any one data point in the target interval, the information difference statistic corresponding to the first data point is an information difference statistic of a sub-interval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points on two sides of the first data point, and the sub-interval is an interval between two adjacent data points;
and the correcting module is used for correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain target data structure nodes.
8. The apparatus of claim 7, wherein the computing module is specifically configured to:
calculating the information entropy of a first interval corresponding to the first data point and the information entropy of subintervals in the first interval;
respectively calculating an absolute KL divergence estimation value of each subinterval and the first interval according to the information entropy of the first interval and the information entropy of each subinterval in the first interval;
and adding the absolute KL divergence estimated values of the subintervals and the first interval to obtain the information difference statistics corresponding to the first data point.
9. A data processing apparatus, characterized in that the apparatus comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method of data processing as claimed in any of claims 1-6.
10. A readable storage medium, on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the data processing method according to any one of claims 1 to 6.
11. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the data processing method according to any of claims 1-6.
CN202111555011.2A 2021-12-17 2021-12-17 Data processing method, device, equipment and storage medium Pending CN114218529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111555011.2A CN114218529A (en) 2021-12-17 2021-12-17 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114218529A true CN114218529A (en) 2022-03-22

Family

ID=80703868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555011.2A Pending CN114218529A (en) 2021-12-17 2021-12-17 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114218529A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination