CN114218529A

CN114218529A - Data processing method, device, equipment and storage medium

Info

Publication number: CN114218529A
Application number: CN202111555011.2A
Authority: CN
Inventors: 林鹭晖; 张玉龙
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2021-12-17
Filing date: 2021-12-17
Publication date: 2022-03-22

Abstract

The application discloses a data processing method, apparatus, device and storage medium. By calculating the information difference statistic corresponding to each data point, the information difference of that data point relative to its adjacent data points can be quantified, and the change in data information relative to the two adjacent data points before and after it can be judged. Data structure nodes can therefore be determined accurately from the information difference statistics and corrected, which improves the accuracy of judging data structure nodes.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present application belongs to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

The financial data has corresponding data nodes on the structure, and the change of the data structure or the generation of the data nodes means that the business risk of the enterprise is greatly changed. Therefore, in order to control the operational risk and maximize the operational efficiency of the enterprise, the data structure needs to be analyzed, and the related data structure nodes can be accurately judged.

Data processing methods in the prior art can judge the data structure nodes of time series data directly, but the judged nodes often suffer from lag (hysteresis) deviation. For example, a prediction model needs to select historical samples within a window of a certain width for model parameter estimation (as in adaptive algorithms), and the node judged at the left side of the window tends to lie further to the left than the actual node, so the data structure nodes determined in the prior art contain errors.

Disclosure of Invention

The embodiment of the application provides a data processing method, a data processing device, data processing equipment and a storage medium, and the accuracy of data analysis can be improved.

In a first aspect, an embodiment of the present application provides a data processing method, including:

acquiring N initial data structure nodes of time series data in a target interval, wherein N is more than or equal to 3 and is a positive integer, and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any one section of interval selected from the time sequence data;

calculating information difference statistics corresponding to a first data point in the target interval, wherein the first data point is any one data point in the target interval, the information difference statistics corresponding to the first data point is information difference statistics of a subinterval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points at two sides of the first data point, and the subinterval is an interval between two adjacent data points;

and correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain target data structure nodes.

In a second aspect, an embodiment of the present application provides a data processing apparatus, including:

the acquisition module is used for acquiring N initial data structure nodes of the time series data in the target interval, wherein N is more than or equal to 3 and is a positive integer, and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any one section of interval selected from the time sequence data;

a calculating module, configured to calculate an information difference statistic corresponding to a first data point in the target interval, where the first data point is any one data point in the target interval, the information difference statistic corresponding to the first data point is an information difference statistic of a sub-interval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points on two sides of the first data point, and the sub-interval is an interval between two adjacent data points;

and the correcting module is used for correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain target data structure nodes.

In a third aspect, an embodiment of the present application provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements a data processing method as shown in the first aspect.

In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, and when the program or instructions are executed by a processor, the program or instructions implement the data processing method shown in the first aspect.

In a fifth aspect, the present application provides a computer program product, and when executed by a processor of an electronic device, the instructions of the computer program product cause the electronic device to execute the data processing method according to the first aspect.

According to the data processing method provided by the embodiment of the application, a section of interval is selected from time sequence data and determined as a target interval, N initial data structure nodes of the time sequence data in the target interval are obtained, then information difference statistics corresponding to each data point are calculated, the obtained N initial data structure nodes are corrected according to the information difference statistics corresponding to each data point, and the target data structure nodes are obtained. The information difference statistic corresponding to each data point is calculated, the information difference of the data point relative to the adjacent data points can be quantized, and the information variation quantity of the data point relative to the data of the two adjacent data points can be judged through the information difference statistic, so that the data structure node can be accurately judged through the information difference statistic, the data structure node is corrected, and the accuracy of judging the data structure node is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flowchart of a data processing method provided in an embodiment of the present application;

FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present application;

FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

In order to solve the problem of the prior art, embodiments of the present application provide a data processing method, apparatus, device, and storage medium. First, a data processing method provided in an embodiment of the present application is described below.

Commercial banks have ample evidence that financial data, typified by interest rate period structures, have corresponding data nodes in their structure, and that such structural nodes are an explicit expression of structural change in the data as it develops along different dimensions (such as over time). A data structure change of this kind marks a significant change in the information conveyed by the data, which may be caused by structural change inside the system or by external macro-environmental factors or policy influence. If the data expresses the business condition of an enterprise, or indicators relevant to that condition, then a change of the data structure or the appearance of data nodes means that the business risk faced by the enterprise has changed substantially. Therefore, to control operational risk and maximize operational efficiency, enterprises have a strong demand for data structure analysis that can accurately judge the relevant data nodes.

Data processing methods in the prior art can construct a data model for the data and judge data structure nodes directly, but they often suffer from lag (hysteresis) deviation. For example, a prediction model needs to select historical samples within a window of a certain width for model parameter estimation (as in adaptive algorithms), and the node judged at the left side of the window tends to lie further to the left than the actual node. The extra sample points on the left carry information that is inconsistent with the information in the interval to the right, and using them in model parameter estimation increases the prediction error.

Therefore, in order to effectively and quickly correct the time series data structural node, the present application proposes a data processing method for correcting the time series data structural node based on the information entropy.

Some of the terms referred to in this application are explained below:

data structure: the data structure in the invention refers to the statistical characteristics of the data on different dimensions, and is the specific information expression of the data. For example, if the data set is the weight data of all infants within 1 year old, the average value is one data structure thereof, and the information of the average increase of the infants within 1 year old is expressed.

Fig. 1 shows a schematic flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 1, the method comprises the following steps 101 to 103.

Step 101, acquiring N initial data structure nodes of time series data in a target interval.

Step 102, calculating information difference statistics corresponding to the first data point in the target interval.

Step 103, correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain target data structure nodes.

Specific implementations of the above steps will be described in detail below.

According to the data processing method provided by the embodiment of the application, the data fluctuation size of the data point compared with two adjacent data points in front and back can be judged according to the information difference statistic corresponding to each data point, so that the data structure node can be accurately judged through the information difference statistic, the data structure node is corrected, and the accuracy of judging the data structure node is improved.


In step 101, N initial data structure nodes of time-series data within a target interval are acquired. N is more than or equal to 3 and is a positive integer, and a plurality of data points are arranged between two adjacent initial data structure nodes; the target interval is any interval selected from the time sequence data.

Time series refers to a sequence of values of a variable that are measured chronologically over a certain time. Anything observed or measured at multiple points in time can form a time series. Time series (time series) data is an important form of structured data.

Specifically, when analyzing time-series data, it is necessary to select one data segment from the time-series data as a target segment for data processing, and to specify an initial data structure node of the time-series data for the time-series data in the target segment.

In some embodiments, the data structure nodes of the time series data may be acquired by a data structure node algorithm.

For example, an interval $[x_1, x_t]$ is selected from the time series data as the target interval, and for the time series data $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$ within $[x_1, x_t]$, a data structure node set $\{x_1, x_4, \dots, x_t\}$ is obtained through a data structure node algorithm.

In some embodiments, obtaining N initial data structure nodes of time series data within a target interval may include:

constructing a time sequence model according to the time sequence data;

and obtaining N initial data structure nodes of the time sequence data in the target interval according to the time sequence model.

Specifically, after a set of data is acquired, the set of data is analyzed, and it is necessary to find the characteristics of the data and describe the characteristics of the data, so that analyzing the time series data may be to analyze the time series data by using a statistical means, so as to construct a data model for the variation characteristics of the data, and obtain the data structure node of the time series data according to the data model.

Taking the time series data of market interest rate periods as an example, in some embodiments, the data model of interest rate periods may be expressed as:

$y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$

where $\Phi_t$ is the historical information set, $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$; $y_t$ is the market interest rate level of the relevant product; $x_t$ is a vector of factors that affect the market interest rate level; and the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, which includes the interest rate period parameters.

The initial data structure node is obtained according to the time series data model, various methods exist in the prior art, such as an adaptive algorithm and the like, a specific operation method can be determined according to an actual application scenario, and the method for obtaining the initial data structure node is not limited in the application.

For example, for the time series data $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$ in any interval $[x_1, x_t]$, the N data structure nodes $\{x_1, x_5, \dots, x_t\}$ of the time series in the interval $[x_1, x_t]$ are obtained by constructing the time series model.

In step 102, information difference statistics corresponding to the first data point in the target interval are calculated.

The information difference statistic is a statistic describing the information amount difference between data, and the difference reflects the degree of dispersion between data.

The first data point can be any data point in the target interval. The information difference statistic corresponding to the first data point is the information difference statistic of the subintervals in the first interval corresponding to the first data point; the first interval corresponding to the first data point is the interval between the two adjacent data points on either side of the first data point, and a subinterval is an interval between two adjacent data points. For example, assume that in the target interval $[x_1, x_t]$ the time series data is $\{x_1, x_2, x_3, x_4, x_5, \dots, x_t\}$; then the first interval corresponding to data point $x_2$ is $[x_1, x_3)$, and the subintervals of $x_2$ in the first interval are $[x_1, x_2)$ and $[x_2, x_3)$.
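The interval definitions above can be illustrated with a short sketch. It assumes the time series is held in a Python list and that half-open intervals are represented as `(lo, hi)` tuples; the function name `first_interval` is hypothetical and not taken from the application.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [lo, hi)


def first_interval(points: List[float], i: int) -> Tuple[Interval, List[Interval]]:
    """Return the first interval of points[i] and its two subintervals.

    The end points of the series are rejected because they have a
    neighbour on one side only.
    """
    if i <= 0 or i >= len(points) - 1:
        raise ValueError("the data point needs a neighbour on both sides")
    left, mid, right = points[i - 1], points[i], points[i + 1]
    return (left, right), [(left, mid), (mid, right)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
interval, subintervals = first_interval(series, 1)  # index 1 plays the role of x_2
print(interval, subintervals)
```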

Statistics that describe differences between data include the range, the interquartile range, the variance, the standard deviation, the coefficient of variation (which measures relative dispersion), and the like.
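As a rough illustration of these dispersion measures, the sketch below computes them with Python's standard `statistics` module. The helper name `dispersion_summary` is hypothetical, and the choice of sample (rather than population) variance is an assumption made here for the example.

```python
import statistics


def dispersion_summary(values):
    """Compute common dispersion statistics for a sample of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)               # sample standard deviation
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,                            # interquartile range
        "variance": statistics.variance(values),   # sample variance
        "stdev": stdev,
        "cv": stdev / mean,                        # coefficient of variation; mean must be non-zero
    }


print(dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```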

In some embodiments of the present application, the following method may be adopted to calculate the information difference statistic corresponding to the first data point:

calculating the information entropy of a first interval corresponding to the first data point and the information entropy of subintervals in the first interval;

respectively calculating an absolute KL divergence estimation value of each subinterval and the first interval according to the information entropy of the first interval and the information entropy of each subinterval in the first interval;

and adding the absolute KL divergence estimated values of the subintervals and the first interval to obtain the information difference statistic corresponding to the first data point.

Entropy represents a measure of uncertainty in random variables in information theory, and mathematically, entropy is an expectation of the amount of information. Information can be quantized by using the information entropy, and the information quantity of data, namely information caused by the occurrence of an event, is calculated. Therefore, the information difference statistics of subintervals at two sides of the data point in the time series data can be calculated based on the information entropy formula and the absolute KL divergence formula, and the information difference statistics corresponding to each data point can be obtained.
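A minimal sketch of estimating the information entropy of an interval follows. It assumes, purely for illustration, that the data in the interval follow a Gaussian density fitted by maximum likelihood; the application does not fix the density, and the function names are hypothetical. The expectation in the entropy definition is replaced by an average over the sample.

```python
import math


def gaussian_fit(sample):
    """Maximum-likelihood mean and variance of a sample (Gaussian assumption)."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    return mu, max(var, 1e-12)  # floor keeps the log-density finite for constant samples


def log_density(x, mu, var):
    """Log of the Gaussian probability density at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def entropy_estimate(sample):
    """Sample analogue of H = -E[log f(X)] under the fitted density."""
    mu, var = gaussian_fit(sample)
    return -sum(log_density(x, mu, var) for x in sample) / len(sample)


calm = [1.0, 2.0, 3.0, 4.0, 5.0]
volatile = [1.0, 6.0, 11.0, 16.0, 21.0]
print(entropy_estimate(calm), entropy_estimate(volatile))
```

Under this assumption a more widely spread interval yields a larger entropy estimate, which is the sense in which entropy quantifies the information carried by the data.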

In this example, based on the information entropy and the KL divergence maximization technique, the information difference statistic corresponding to the first data point can be calculated, and the information difference between the data points can be quantized to visually display the information difference between the data points.

In one example, take the interest rate period time series data of the above example, with the time series data model $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$, where $\Phi_t$ is the historical information set, $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$; $y_t$ is the market interest rate level of the relevant product; $x_t$ is a vector of factors that affect the market interest rate level; and the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, which includes the interest rate period parameters.

For the target interval $I = [x_0, x_T] = \{x_1, x_2, x_3, x_4, x_5, \dots, x_s, \dots, x_T\}$,

The calculation formula of the information entropy is as follows:

$H(\Theta_s, I) = -E(\log f(X) \mid \Theta_s, I)$ (1)

where $f(\cdot)$ is the probability density function, the sample $X \in I$, and the parameter set

Based on the information entropy formula, the amount of information difference between $\Theta_s$ and $\Theta_t$, $D_{KL}(\Theta_s \| \Theta_t)$, can be calculated through the absolute KL divergence formula:

The absolute KL divergence is calculated as:

In some embodiments, when $\Theta_s$ and $\Theta_t$ represent the data sets of two intervals, the absolute KL divergence estimate for the two intervals can be calculated by the following equation

where n is the number of data points in the interval.

When the first data point is $x_s$, the first interval is $I_s \cup I_{s+1}$, and the subintervals in the first interval are $I_s = [x_{s-1}, x_s)$ and $I_{s+1} = [x_s, x_{s+1})$. Based on the absolute KL divergence estimate, the information difference statistic $T_s$ of interval $I_s$ and interval $I_{s+1}$ is calculated by the following formula:

In this example, based on the information entropy and the KL divergence maximization technique, statistics of data in adjacent subintervals on both sides of the data point are constructed, information difference statistics corresponding to the data point can be calculated, information differences between the data points are quantized, and the information differences between the data points are visually shown.
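The construction of the statistic can be sketched as follows. This is an illustration under stated assumptions, not the formula of the application: each interval is modelled by a Gaussian density fitted by maximum likelihood, the absolute KL divergence estimate is taken as the average absolute difference of log-densities over the points of a subinterval, and the statistic is the sum of the estimates of the two subintervals against the first interval. All function names are hypothetical.

```python
import math


def fit(sample):
    """Maximum-likelihood Gaussian parameters (assumed density family)."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    return mu, max(var, 1e-12)  # floor avoids division by zero


def logpdf(x, mu, var):
    """Log of the Gaussian probability density at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def abs_kl_estimate(sub, whole):
    """Mean |log f(x | sub fit) - log f(x | whole fit)| over the points in sub."""
    mu_s, var_s = fit(sub)
    mu_w, var_w = fit(whole)
    return sum(abs(logpdf(x, mu_s, var_s) - logpdf(x, mu_w, var_w)) for x in sub) / len(sub)


def info_diff_statistic(left, right):
    """Sum of the absolute KL estimates of both subintervals against their union."""
    whole = left + right
    return abs_kl_estimate(left, whole) + abs_kl_estimate(right, whole)


before = [1.0, 1.1, 0.9, 1.05, 0.95]
after_shift = [x + 5.0 for x in before]          # level shift, i.e. a structural change
print(info_diff_statistic(before, before))       # no change between the two sides
print(info_diff_statistic(before, after_shift))  # clear change between the two sides
```

With these assumptions the statistic is close to zero when both sides carry the same information and grows when the data on the two sides differ, which is the behaviour the text relies on when locating nodes.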

It should be noted that the calculation method proposed in the present application is not limited to interest rate period time series data, and the calculation of the information difference statistic is therefore not limited to data related to interest rate periods. For different types of time series models, the calculation of the corresponding information difference statistic may be adjusted accordingly. For example, to adjust for risk level control, the absolute KL divergence may be redefined based on different data characteristics as $D_{KL,m}(\Theta_s \| \Theta_t) = E\left|(\log f(X_s) \mid \Theta_s, I_s) - (\log f(X_s) \mid \Theta_t, I_s)\right|^m$, and the statistic is adjusted accordingly.
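As a worked reading of the generalized definition, setting $m = 1$ gives an absolute-value form, and a sample analogue can be written by replacing the expectation with an average over the $n$ points of $I_s$. The estimator $\widehat{D}_{KL,m}$ below is an assumption made for illustration and is not quoted from the application.

```latex
D_{KL,1}(\Theta_s \,\|\, \Theta_t)
  = E\left|\,(\log f(X_s) \mid \Theta_s, I_s) - (\log f(X_s) \mid \Theta_t, I_s)\,\right|,
\qquad
\widehat{D}_{KL,m}(\Theta_s \,\|\, \Theta_t)
  = \frac{1}{n} \sum_{x_i \in I_s}
    \left|\log f(x_i \mid \Theta_s) - \log f(x_i \mid \Theta_t)\right|^{m}
```

A larger $m$ weights large log-density differences more heavily, which may be one way to make the statistic more sensitive to pronounced structural changes.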

In step 103, the N initial data structure nodes are corrected according to the information difference statistics corresponding to each first data point, so as to obtain a target data structure node.

In step 102, information difference statistics corresponding to the data points in each target interval are calculated, so that according to the numerical value of the information difference statistics corresponding to each first data point, nodes where data changes can be found by comparing the information difference statistics, that is, in a section, the data point where the information difference statistics is the largest or exceeds a preset threshold is a data structure node.

Specifically, for N initial data structure nodes in the obtained target interval, each initial data structure node is corrected, a correction range of each initial data structure node needs to be determined first, after the correction range is determined, a first data point with the largest information difference statistic is found according to the numerical value of the information difference statistic corresponding to each first data point in the correction range, and the initial data structure nodes are corrected according to the first data point with the largest information difference statistic in the correction range of each initial data structure node.

In some examples, the data points corresponding to the information difference statistics exceeding the preset threshold in the target interval may be determined as new data structure nodes.
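The threshold rule can be sketched in a few lines, assuming the information difference statistics have already been computed and stored in a list indexed by data point. The function name and the threshold value are hypothetical.

```python
def nodes_above_threshold(stats, threshold):
    """Indices of the data points whose statistic exceeds the preset threshold."""
    return [i for i, value in enumerate(stats) if value > threshold]


stats = [0.1, 0.2, 3.5, 0.3, 0.1, 2.9, 0.2]
print(nodes_above_threshold(stats, threshold=1.0))
```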

In some examples, the target interval may also be divided into a plurality of homogeneous intervals (i.e., a second interval: an interval between two adjacent initial data structure nodes on both sides of the initial data structure node) according to the obtained N initial data structure nodes, and the data point with the largest information difference statistic is determined as a new data structure node for each homogeneous interval.

For example, for an initial set of data structure nodes (that is, the N initial data structure nodes acquired for the time series data in the target interval), the target interval is segmented according to the initial data structure nodes to obtain a plurality of homogeneous intervals.
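The segmentation into homogeneous intervals might look like the sketch below. It assumes nodes are stored as positions (indices) in the series and that each homogeneous interval is the half-open index range between two consecutive nodes; both the representation and the function name are assumptions made for this example.

```python
def homogeneous_intervals(nodes):
    """Pair consecutive nodes into (start, end) index ranges."""
    ordered = sorted(nodes)
    return list(zip(ordered[:-1], ordered[1:]))


series = [float(i) for i in range(15)]
nodes = [0, 4, 9, 14]                      # includes both end points of the target interval
for start, end in homogeneous_intervals(nodes):
    print((start, end), series[start:end])  # data belonging to one homogeneous interval
```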

In some embodiments, correcting the N initial data structure nodes according to the information difference statistics corresponding to each first data point to obtain the target data structure node may include:

a determination step: for each initial data structure node, performing the following operations: determining a first data point with the largest information difference statistic in a second interval corresponding to the initial data structure node as a first initial data structure node to obtain a first initial data structure node of each initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;

a first updating step: updating the initial data structure nodes according to each first initial data structure node;

under the condition that a preset condition is met, determining the updated initial data structure node as a target data structure node;

and under the condition that the preset condition is not met, circularly executing the determining step and the first updating step until the preset condition is met.
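The determining step and the first updating step can be sketched as a loop in which every interior node is re-evaluated against the node set of the previous pass, and all nodes are replaced together afterwards. The sketch assumes the statistics are precomputed per data point, nodes are strictly increasing indices, candidates lie strictly between the two neighbouring nodes, and the preset condition is either node stability or a maximum number of passes. The guard against merged nodes is an assumption added here, and all names are hypothetical.

```python
def argmax_in_second_interval(stats, left, right):
    """Index with the largest statistic strictly between two neighbouring nodes."""
    return max(range(left + 1, right), key=lambda i: stats[i])


def correct_nodes(stats, nodes, max_iter=10):
    """Repeat the determining step and the first updating step until stable."""
    nodes = list(nodes)
    for _ in range(max_iter):
        # Determining step: all proposals use the node set from the previous
        # pass; the two end points of the target interval are never moved.
        proposed = [nodes[0]] + [
            argmax_in_second_interval(stats, nodes[s - 1], nodes[s + 1])
            for s in range(1, len(nodes) - 1)
        ] + [nodes[-1]]
        if proposed == nodes:
            break  # preset condition: the nodes are stable
        if any(a >= b for a, b in zip(proposed, proposed[1:])):
            break  # assumption: reject a pass that would merge or reorder nodes
        nodes = proposed  # first updating step: replace all nodes at once
    return nodes


stats = [0.0] * 13
stats[3], stats[8] = 5.0, 6.0          # statistics peak at the true breakpoints
print(correct_nodes(stats, [0, 2, 7, 12]))
```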

In one example, as shown in FIG. 2:

S201, acquiring time series data;

S202, acquiring the initial data structure nodes of the time series data and the homogeneous intervals (the intervals between two adjacent initial data structure nodes):

A time series data model $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$ is constructed, where the historical information set is $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$, the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, and the target interval is $I = [x_0, x_T]$.

An initial set of data structure nodes, together with the corresponding homogeneous intervals, is obtained from the data model through a structural node algorithm.

S203, confirming the number of iterations, that is, the number of times each initial data structure node other than the end points completes the correction. The number of iterations (corrections) for each initial data structure node may be preset, or other conditions for stopping the iteration may be set. This step is optional.

S204, determining the s-th node: the s-th node is corrected according to the data point with the largest information difference statistic in its second interval.

(1) Fix the structural nodes on both sides of a node and, in the interval between them (i.e., the second interval corresponding to that node), find the structural node that maximizes the information difference statistic. That is, each first data point in the second interval is substituted into formulas (1), (2) and (4) to obtain the information difference statistic corresponding to each first data point in the second interval; the information difference statistic with the largest value is recorded, and the first data point corresponding to it is determined as the first initial data structure node.

(2) Likewise, fix the structural nodes on both sides of another node and, in the interval between them, find the structural node that maximizes the statistic (that is, the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node is determined as the first initial data structure node, and the first initial data structure node of the initial data structure node is obtained).

(3) Following step (2), identify the corresponding structural node in the second interval corresponding to each remaining node (namely, the determining step: for each initial data structure node, the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node is determined as the first initial data structure node, and the first initial data structure node of each initial data structure node is obtained).

The structural nodes of the first iteration are thus obtained, with the two ends of the sample interval remaining unchanged. According to the change of the structural nodes, the corresponding interval division is adjusted (i.e., the first updating step: updating the initial data structure nodes according to each first initial data structure node).

S205, judging whether all nodes except the end points have completed the iteration; if so, go to S206, and if not, go to S204;

S206, judging whether the number of iterations has been reached or the nodes are stable:

If not, go to S202 and repeat the work of S202 to S205 for the k-th iteration: fix the structural nodes on both sides of each node and, in the interval between them, find the structural node that maximizes the statistic. The structural node set generated by the k-th iteration and the corresponding homogeneous intervals generated by the k-th iteration are thus obtained (i.e., in the case that the preset condition is not met, the determining step and the first updating step are executed cyclically until the preset condition is met).

If so, that is, once the K-th iteration has been reached, the structural nodes are stable, or a predetermined iteration target (such as the time series data prediction effect) has been achieved, the data structure node correction is determined to be complete; namely, in the case that the preset condition is met, the updated initial data structure nodes are determined as the target data structure nodes.

In this example, the initial data structure nodes are kept unchanged while one round of correction is carried out, that is, while every initial data structure node other than the end points completes one determining step. After the round is complete, all the initial data structure nodes are updated according to the first initial data structure nodes obtained (the first updating step), and through cyclic execution of the correction steps the data structure nodes converge to the correct structural breakpoints. The data structure nodes obtained by the correction method of this example have better stability.

a second updating step: for each initial data structure node, performing the following operations: updating the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;

and under the condition that the preset condition is not met, circularly executing the second updating step until the preset condition is met.
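The second updating step differs from the first in that each node is replaced as soon as its best candidate is found, so the next node's second interval already reflects the update. A minimal sketch follows, assuming precomputed statistics per data point, strictly increasing node indices, candidates strictly between the neighbouring nodes, and node stability or a maximum number of passes as the preset condition. The function name is hypothetical.

```python
def correct_nodes_sequential(stats, nodes, max_iter=10):
    """Update interior nodes one by one, reusing nodes already updated in the pass."""
    nodes = list(nodes)
    for _ in range(max_iter):
        previous = list(nodes)
        for s in range(1, len(nodes) - 1):
            # The left neighbour has already been updated in this pass;
            # the right neighbour still holds its value from the previous pass.
            left, right = nodes[s - 1], nodes[s + 1]
            nodes[s] = max(range(left + 1, right), key=lambda i: stats[i])
        if nodes == previous:
            break  # preset condition: the nodes are stable
    return nodes


stats = [0.0] * 13
stats[3], stats[8] = 5.0, 4.0
print(correct_nodes_sequential(stats, [0, 2, 7, 12]))
```

Because each node is placed strictly between an updated left neighbour and its right neighbour, the node order is preserved within a pass without any extra check.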

In one example, for the data model $y_{t+1} = E_t(y_{t+1} \mid \Theta, \Phi_t)$, the historical information set is $\Phi_t = \{X_t, Y_t\} = \{x_1, x_2, \dots, x_t, y_1, y_2, \dots, y_t\}$, the parameter set is $\Theta = \{\theta_1, \theta_2, \dots, \theta_k\}$, and the target interval is $I = [x_0, x_T]$.

S301, N initial data structure nodes of the time series data in the target interval are obtained. An initial set of data structure nodes is obtained from the data model through a structural node algorithm, and the target interval is divided according to the initial data structure nodes to obtain a plurality of homogeneous intervals.

S302, updating a first data point corresponding to the initial data structure node and having the largest information difference statistic in a second interval into the initial data structure node to obtain an updated initial data structure node; the next initial data structure node is updated based on the updated initial data structure node.

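With the structural nodes on either side of the current structural node held fixed, the structural node that maximizes the statistic is sought in the second interval corresponding to the current structural node. That is, each first data point in the second interval is substituted into formulas (1), (3) and (4) to obtain the information difference statistic corresponding to each first data point in the second interval; the information difference statistic with the maximum value is recorded, and the current structural node is updated to the corresponding first data point.

With the structural nodes on either side of the next structural node held fixed, the structural node that maximizes the statistic can likewise be sought in the second interval corresponding to the next structural node, where one end point of that second interval is determined according to the updated structural node.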

S303: the following operation is performed for each initial data structure node: the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node is taken as the updated initial data structure node. This is the second updating step.

As in step S302, the corresponding structural node is identified for each node within its second interval, where the end points of that interval are given by the updated initial data structure nodes. The structural nodes of the first iteration are thereby obtained, with the two ends of the sample interval remaining unchanged. The corresponding homogeneous intervals are adjusted according to the changes in the structural nodes.

S304: when the preset condition is not met, the second updating step is executed cyclically until the preset condition is met.

S302 and S303 are repeated for the k-th iteration. With the structural nodes on either side held fixed, the structural node that maximizes the statistic is sought in the corresponding interval. The set of structural nodes generated by the k-th iteration is thereby obtained, together with the corresponding homogeneous intervals generated by the k-th iteration.

S305: when the preset condition is met, the updated initial data structure nodes are determined as the target data structure nodes.

After the K-th iteration, once the structural nodes are stable or a predetermined iteration target (such as the prediction performance on the time series data) has been reached, the data structure node correction is considered complete.

In this example, each newly confirmed data structure node replaces the corresponding initial data structure node immediately during a round of correction, that is, while the second updating step is being completed for each initial data structure node, and the next initial data structure node is corrected on the basis of the already updated nodes. By executing the correction step cyclically, the data structure nodes converge toward the correct structural breakpoints, and the data structure node correction method of this example converges faster.
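The in-place variant described in this example can be sketched as below. It is an illustration only: `correct_nodes_in_place`, `stat` and `max_rounds` are hypothetical names, data points are assumed to be integer positions, and `stat` stands in for the information difference statistic, whose formula is supplied by the caller.

```python
from typing import Callable, List, Sequence


def correct_nodes_in_place(
    nodes: Sequence[int],
    stat: Callable[[int], float],
    max_rounds: int = 10,
) -> List[int]:
    """Second updating step: each node is replaced as soon as it is confirmed.

    The left neighbour used for node i has already been updated in the same
    round, so later nodes see the corrected positions of earlier ones.
    """
    current = list(nodes)
    for _ in range(max_rounds):
        changed = False
        for i in range(1, len(current) - 1):  # end points stay fixed
            left, right = current[i - 1], current[i + 1]
            if right - left <= 1:
                continue  # no data point strictly inside the second interval
            best = max(range(left + 1, right), key=stat)
            if best != current[i]:
                current[i] = best  # update immediately, within the round
                changed = True
        if not changed:
            break  # nodes are stable (assumed stopping rule)
    return current
```

Since each node is searched strictly to the right of its already updated left neighbour, adjacent nodes cannot select the same data point within a round in this sketch.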

In some embodiments, the preset condition may be that a preset number of data structure node corrections has been completed, or that the data structure nodes tend to be stable.
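One possible reading of this preset condition is sketched below. The function name `preset_condition_met` and the parameters `max_rounds` and `tolerance` are hypothetical, and interpreting "tend to be stable" as "no node moved by more than `tolerance` positions since the previous round" is an assumption, not something the text specifies.

```python
from typing import Sequence


def preset_condition_met(
    previous: Sequence[int],
    current: Sequence[int],
    rounds_done: int,
    max_rounds: int = 20,
    tolerance: int = 0,
) -> bool:
    """Return True when the correction loop should stop."""
    # Condition 1: the preset number of corrections has been completed.
    if rounds_done >= max_rounds:
        return True
    # Condition 2 (assumed interpretation): the nodes have stabilised,
    # i.e. the largest shift of any node is within the tolerance.
    largest_shift = max(
        (abs(a - b) for a, b in zip(previous, current)), default=0
    )
    return largest_shift <= tolerance
```

A nonzero `tolerance` would let the loop stop when nodes only oscillate between neighbouring data points.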

According to the data processing method provided by the embodiment of the application, the information quantity difference between the data point and two adjacent data points in front and back can be judged according to the numerical value of the information difference statistic corresponding to each data point, so that the data structure node can be accurately judged through the information difference statistic, the data structure node is corrected, and the accuracy of judging the data structure node is improved.

As shown in fig. 3, an embodiment of the present application further provides a data processing apparatus, which includes an obtaining module 301, a calculating module 302, and a correcting module 303.

An obtaining module 301, configured to obtain N initial data structure nodes of time series data in a target interval, where N is greater than or equal to 3 and is a positive integer, and a plurality of data points are included between two adjacent initial data structure nodes; the target interval is any interval selected from the time sequence data;

the calculating module 302 is configured to calculate an information difference statistic corresponding to a first data point in a target interval, where the first data point is any one data point in the target interval, the information difference statistic corresponding to the first data point is an information difference statistic of a subinterval in a first interval corresponding to the first data point, the first interval corresponding to the first data point is an interval between two adjacent data points on two sides of the first data point, and the subinterval is an interval between two adjacent data points;

the correcting module 303 is configured to correct the N initial data structure nodes according to the information difference statistic corresponding to each first data point, so as to obtain a target data structure node.
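The division of labour among the three modules can be pictured with the following sketch. The class name `DataProcessingApparatus`, its field names, and the choice of plain callables for each module are illustrative assumptions; the text only specifies what each module is configured to do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class DataProcessingApparatus:
    # Obtaining module 301: series -> initial data structure nodes.
    obtain: Callable[[Sequence[float]], List[int]]
    # Calculating module 302: series -> statistic for each data point.
    calculate: Callable[[Sequence[float]], Dict[int, float]]
    # Correcting module 303: (initial nodes, statistics) -> target nodes.
    correct: Callable[[List[int], Dict[int, float]], List[int]]

    def run(self, series: Sequence[float]) -> List[int]:
        """Chain the three modules in the order described above."""
        nodes = self.obtain(series)
        stats = self.calculate(series)
        return self.correct(nodes, stats)
```

Keeping the modules as injected callables lets either correction scheme be plugged in as `correct` without changing the other two modules.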

In some embodiments, in order to quickly acquire the initial data structure node of the time-series data, the data processing apparatus may further include:

the model building module is used for building a time sequence model according to the time sequence data;

and the data structure node acquisition module is used for acquiring N initial data structure nodes of the time sequence data in the target interval according to the time sequence model.

In an embodiment, in order to quantify an information difference between the first data points, the calculating module 302 is specifically configured to:

In some embodiments, the correcting module 303 may include:

a determination unit for performing the determining step: for each initial data structure node, performing the following operations: determining a first data point with the largest information difference statistic in a second interval corresponding to the initial data structure node as a first initial data structure node to obtain a first initial data structure node of each initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;

a first updating unit for executing a first updating step: updating the initial data structure nodes according to each first initial data structure node;

In some embodiments, the correcting module 303 may include:

a second update module for performing a second update step: for each initial data structure node, performing the following operations: updating the first data point with the largest information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;

According to the data processing device and the data processing method, the information quantity difference between the data point and two adjacent data points can be judged according to the numerical value of the information difference statistic corresponding to each data point, so that the data structure node can be accurately judged through the information difference statistic, the data structure node is corrected, and the accuracy of judging the data structure node is improved.

Fig. 4 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.

The electronic device may include a processor 401 and a memory 402 storing computer program instructions.

Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (or fixed) media, where appropriate. Memory 402 may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 402 is a non-volatile solid-state memory.

The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.

The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement any of the data processing methods in the above embodiments.

In one example, the electronic device can also include a communication interface 403 and a bus 404. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 404 to complete communication therebetween.

The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.

Bus 404 comprises hardware, software, or both that couple the components of the electronic device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 404 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.

In addition, in combination with the data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium for implementing it. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the data processing methods in the above embodiments.

It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.

The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims

1. A data processing method, comprising:

2. The method of claim 1, wherein the obtaining N initial data structure nodes of the time series data within the target interval comprises:

constructing a time series model according to the time series data;

3. The method of claim 1, wherein the calculating information difference statistics corresponding to each data point in the target interval comprises:

and obtaining information difference statistics corresponding to the first data point according to the absolute KL divergence estimated value of each subinterval and the first interval.

4. The method of claim 1, wherein said correcting said N initial data structure nodes according to the information difference statistics corresponding to each of said first data points to obtain a target data structure node comprises:

under the condition that a preset condition is met, determining the updated initial data structure node as the target data structure node;

5. The method of claim 1, wherein said correcting said N initial data structure nodes according to the information difference statistics corresponding to said each data point to obtain a target data structure node of said target data comprises:

a second updating step: for each initial data structure node, performing the following operations: updating the first data point with the maximum information difference statistic in the second interval corresponding to the initial data structure node into the initial data structure node to obtain an updated initial data structure node; the second interval is an interval between two adjacent initial data structure nodes on two sides of the initial data structure node;

6. The method according to claim 4 or 5, wherein the preset condition is that a preset number of data structure node corrections has been completed, or that the data structure nodes tend to be stable.

7. A data processing apparatus, comprising:

8. The apparatus of claim 7, wherein the computing module is specifically configured to:

and adding the absolute KL divergence estimated values of the subintervals and the first interval to obtain the information difference statistics corresponding to the first data point.

9. A data processing apparatus, characterized in that the apparatus comprises: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements a method of data processing as claimed in any of claims 1-6.

10. A readable storage medium, on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the data processing method according to any one of claims 1 to 6.

11. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the data processing method according to any of claims 1-6.