WO2016138805A1 - Distribution data transaction (anomalous change) determination and positioning method and system - Google Patents

Distribution data transaction (anomalous change) determination and positioning method and system

Info

Publication number
WO2016138805A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimension
transaction
distribution data
level
data
Prior art date
Application number
PCT/CN2016/072348
Other languages
English (en)
French (fr)
Inventor
李亮
刘朋飞
牟川
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2016138805A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures

Definitions

  • The invention relates to the technical field of distribution data transactions, that is, anomalous changes in distribution data, and in particular to a method and system for determining and locating distribution data transactions.
  • For locating data transactions, the prior art generally compares fluctuation amplitudes against a threshold. Specifically, it computes a historical reference value for each data item on a given dimension as a weighted combination of recent data (for example, the most recent week or month), compares the latest data with the historical reference values, and examines the fluctuation amplitude of each data item. If a fluctuation amplitude exceeds a threshold, which is generally set by experience, the data is judged to have a transaction, and the data item with the largest fluctuation is taken as the main cause of the transaction.
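The prior-art check described above can be sketched roughly as follows. This is a minimal illustration, assuming the fluctuation amplitude is the relative change against the historical reference value; the function name, the data, and the threshold of 0.3 are all hypothetical.

```python
def largest_fluctuation(current, reference, threshold):
    """Return (item, relative_change) for the item with the largest
    relative fluctuation above `threshold`, or None if no item exceeds it."""
    changes = {
        item: abs(current[item] - reference[item]) / reference[item]
        for item in current
        if reference.get(item)  # skip missing or zero reference values
    }
    flagged = {item: change for item, change in changes.items() if change > threshold}
    if not flagged:
        return None
    top = max(flagged, key=flagged.get)
    return top, flagged[top]


# Hypothetical order counts per payment method.
reference = {"online": 1000.0, "cash_on_delivery": 200.0}
current = {"online": 600.0, "cash_on_delivery": 230.0}
result = largest_fluctuation(current, reference, threshold=0.3)
```

The result depends entirely on the hand-picked threshold, which is the weakness the text attributes to this approach.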
  • A distribution data transaction determination and positioning method includes:
  • a distribution data preparation step, including: acquiring multi-dimensional distribution data and multi-dimensional reference value distribution data, where the multi-dimensional reference value distribution data holds the historical reference value corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain a plurality of dimension combinations; generating, from the multi-dimensional distribution data, a plurality of current first-level dimension distribution data for the first-level dimensions and a plurality of current dimension combination distribution data for the dimension combinations; and generating, from the multi-dimensional reference value distribution data, a plurality of historical first-level dimension reference value distribution data for the first-level dimensions and a plurality of historical dimension combination reference value distribution data for the dimension combinations;
  • a transaction determination step, including: comparing each current first-level dimension distribution data with the corresponding historical first-level dimension reference value distribution data to obtain its structural change, where current first-level dimension distribution data whose structural change exceeds the transaction threshold is transaction first-level dimension distribution data; comparing each current dimension combination distribution data with the corresponding historical dimension combination reference value distribution data to obtain its structural change, where current dimension combination distribution data whose structural change exceeds the transaction threshold is transaction dimension combination distribution data; and raising an alarm if any transaction first-level dimension distribution data or transaction dimension combination distribution data exists.
  • A distribution data transaction determination and positioning system includes:
  • a distribution data preparation module, configured to: acquire multi-dimensional distribution data and multi-dimensional reference value distribution data, where the multi-dimensional reference value distribution data holds the historical reference value corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain a plurality of dimension combinations; generate, from the multi-dimensional distribution data, a plurality of current first-level dimension distribution data for the first-level dimensions and a plurality of current dimension combination distribution data for the dimension combinations; and generate, from the multi-dimensional reference value distribution data, a plurality of historical first-level dimension reference value distribution data for the first-level dimensions and a plurality of historical dimension combination reference value distribution data for the dimension combinations;
  • a transaction determination module, configured to: compare each current first-level dimension distribution data with the corresponding historical first-level dimension reference value distribution data to obtain its structural change, where current first-level dimension distribution data whose structural change exceeds the transaction threshold is transaction first-level dimension distribution data; compare each current dimension combination distribution data with the corresponding historical dimension combination reference value distribution data to obtain its structural change, where current dimension combination distribution data whose structural change exceeds the transaction threshold is transaction dimension combination distribution data; and raise an alarm if any transaction first-level dimension distribution data or transaction dimension combination distribution data exists.
  • The invention tests the multi-dimensional distribution data on the first-level dimensions and on the dimension combinations separately, which overcomes the shortcomings of existing transaction determination and positioning methods and makes transaction determination faster and more accurate.
  • FIG. 1 is a flowchart of a distribution data transaction determination and positioning method according to the present invention.
  • FIG. 2 is a structural block diagram of a distribution data transaction determination and positioning system according to the present invention.
  • FIG. 3 is a schematic illustration of a preferred embodiment of the invention.
  • FIG. 1 is a flowchart of a distribution data transaction determination and positioning method according to the present invention, including:
  • Step S101, including: acquiring multi-dimensional distribution data and multi-dimensional reference value distribution data, where the multi-dimensional reference value distribution data holds the historical reference value corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain a plurality of dimension combinations; generating, from the multi-dimensional distribution data, a plurality of current first-level dimension distribution data for the first-level dimensions and a plurality of current dimension combination distribution data for the dimension combinations; and generating, from the multi-dimensional reference value distribution data, a plurality of historical first-level dimension reference value distribution data for the first-level dimensions and a plurality of historical dimension combination reference value distribution data for the dimension combinations;
  • Step S102, including: comparing each current first-level dimension distribution data with the corresponding historical first-level dimension reference value distribution data to obtain its structural change, where current first-level dimension distribution data whose structural change exceeds the transaction threshold is transaction first-level dimension distribution data; comparing each current dimension combination distribution data with the corresponding historical dimension combination reference value distribution data to obtain its structural change, where current dimension combination distribution data whose structural change exceeds the transaction threshold is transaction dimension combination distribution data; and raising an alarm if any transaction first-level dimension distribution data or transaction dimension combination distribution data exists.
  • In step S101, the multi-dimensional distribution data is decomposed into current first-level dimension distribution data for each first-level dimension and a plurality of current dimension combination distribution data for the dimension combinations.
  • For example, province, order type, and payment method are first-level dimensions, while “province_order type”, “province_payment method”, “order type_payment method”, and “province_order type_payment method” are dimension combinations.
  • Each dimension includes a plurality of data items.
  • For example, the province dimension may include the data of province A, the data of province B, and the data of province C;
  • the order type dimension may include the data of order type D, the data of order type E, and the data of order type F;
  • the payment method dimension may include the data of payment method G, the data of payment method H, and the data of payment method I.
  • The dimension combination “province_order type” then includes the data for every pairing of a province with an order type: province A with order types D, E, and F; province B with order types D, E, and F; and province C with order types D, E, and F.
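The cross-combination of dimensions and the per-dimension aggregation can be illustrated with a short sketch. It assumes the multi-dimensional distribution data is available as a list of records with one field per dimension and an `orders` measure; the field names and sample values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dimension_combinations(dimensions):
    """All combinations of two or more first-level dimensions."""
    return [
        combo
        for size in range(2, len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def distribution(records, dims, measure="orders"):
    """Aggregate `measure` over the given dimensions.

    Keys are tuples of dimension values, e.g. ("A", "D") for
    province A and order type D.
    """
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dims)
        totals[key] += record[measure]
    return dict(totals)


dimensions = ["province", "order_type", "payment_method"]
records = [
    {"province": "A", "order_type": "D", "payment_method": "G", "orders": 5},
    {"province": "A", "order_type": "E", "payment_method": "G", "orders": 3},
    {"province": "B", "order_type": "D", "payment_method": "H", "orders": 2},
]
combos = dimension_combinations(dimensions)
by_province = distribution(records, ("province",))
by_province_order_type = distribution(records, ("province", "order_type"))
```

With three first-level dimensions this yields the four dimension combinations named in the example: three pairs and one triple.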
  • The multi-dimensional reference value distribution data holds a reference value for each data item of each dimension of the multi-dimensional distribution data, for example a reference value for province A, a reference value for province B, and so on.
  • The earlier non-transaction data corresponding to each data item in the multi-dimensional distribution data is processed, a historical reference value is generated by weighted averaging, and the result is stored as a multi-dimensional data table, which yields the multi-dimensional reference value distribution data.
  • The multi-dimensional distribution data can be stored at a time granularity such as hourly, daily, weekly, monthly, or yearly.
  • The earlier non-transaction data is data from earlier periods, without transactions, stored at the same time granularity as the multi-dimensional distribution data.
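A weighted-average historical reference value could be computed along these lines. The sketch assumes each earlier period is a mapping from data item to value and that one weight is supplied per period; the text does not specify the weights, so the ones below are hypothetical, as is the choice to treat an item missing from a period as zero.

```python
def historical_reference(history, weights):
    """Weighted average per data item.

    `history` is a list of periods, oldest first; each period maps a
    data item to its value. `weights` holds one weight per period.
    An item missing from a period counts as 0.0 (an assumption).
    """
    if not history or len(history) != len(weights):
        raise ValueError("need one weight per historical period")
    total_weight = sum(weights)
    items = set().union(*history)
    return {
        item: sum(w * period.get(item, 0.0) for period, w in zip(history, weights))
        / total_weight
        for item in items
    }


# Three earlier days for the province dimension, more recent days weighted higher.
history = [
    {"A": 100.0, "B": 50.0},
    {"A": 120.0, "B": 40.0},
    {"A": 110.0, "B": 60.0},
]
reference = historical_reference(history, weights=[1, 2, 3])
```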
  • In step S102, the structural changes of the current first-level dimension distribution data and of the current dimension combination distribution data relative to the historical first-level dimension reference value distribution data and the historical dimension combination reference value distribution data are calculated respectively.
  • Each pair of data sets is structurally diagnosed to find out whether the two sets are consistent in structure; if they are not, a transaction is considered to exist. That is, the structural change is used to decide whether the current first-level dimension distribution data is structurally consistent with the historical first-level dimension reference value distribution data, and whether the current dimension combination distribution data is structurally consistent with the historical dimension combination reference value distribution data.
  • The idea of step S102 is to first assume that the two data sets are structurally consistent and then use a statistical test to determine the probability that this hypothesis holds. If the probability is small, the hypothesis is rejected, which indicates that the structure of the two data sets has changed, so a transaction is said to exist on this dimension.
  • The technical solution of the present invention is based on the idea of hypothesis testing. By testing the structure of the data on each dimension, or of the data after dimension crossing, it can determine transactions more accurately than the method that compares fluctuation amplitudes against a threshold, and it can locate transactions quickly.
  • For example, when a fault occurs in the online payment link, the order quantity and order amount fluctuate: the data on the payment method dimension certainly fluctuates, and the data on the province dimension and the order type dimension fluctuates as well.
  • The existing method based on threshold comparison of fluctuation amplitude generally finds that the data on all three dimensions has changed, but it has difficulty attributing the transaction to the payment link.
  • With the hypothesis testing method of the present invention, the data on the three dimensions of payment method, province, and order type is tested separately. The province and order type data may all have decreased relative to the historical reference values, but their overall structure is basically unchanged (on the province dimension, for example, the share of each province changes little), so the structural test does not judge them abnormal.
  • On the payment method dimension, by contrast, the share of online payment orders or order amount is bound to drop sharply, while the share of other payment methods such as cash on delivery and postal remittance rises sharply, so the structure shows a significant anomaly. A structural test on the data captures this anomaly and thereby locates the data transaction. The present invention therefore makes up for the deficiencies of existing transaction determination and positioning methods.
  • The method further includes:
  • a transaction positioning step, including: taking the dimension corresponding to the transaction first-level dimension distribution data with the highest structural change as the key transaction dimension; taking the dimension combinations corresponding to the transaction dimension combination distribution data as transaction dimension combinations; taking the transaction dimension combinations that include the key transaction dimension as the dimension combinations affected by the key transaction dimension; taking the dimensions other than the key transaction dimension that are included in those affected dimension combinations as the dimensions affected by the key transaction dimension; and displaying the key transaction dimension and the dimensions affected by it.
  • In this embodiment, the dimension combinations affected by the key transaction dimension are found, and from them the other dimensions affected by the key transaction dimension are obtained.
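The positioning step is essentially a filter over the flagged dimension combinations. A minimal sketch, assuming dimension combinations are represented as tuples of dimension names and the key transaction dimension has already been chosen (all names below are hypothetical):

```python
def affected_by_key_dimension(key_dimension, transaction_combinations):
    """Return the flagged combinations that contain the key dimension,
    and the other dimensions that appear in those combinations."""
    affected_combos = [
        combo for combo in transaction_combinations if key_dimension in combo
    ]
    affected_dims = []
    for combo in affected_combos:
        for dim in combo:
            if dim != key_dimension and dim not in affected_dims:
                affected_dims.append(dim)
    return affected_combos, affected_dims


# Hypothetical outcome of the determination step.
transaction_combinations = [
    ("province", "payment_method"),
    ("province", "order_type"),
]
combos, dims = affected_by_key_dimension("payment_method", transaction_combinations)
```

Here only “province_payment method” contains the key dimension, so province is reported as the affected dimension.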
  • The transaction determination step specifically includes: calculating the chi-square value of each current first-level dimension distribution data against the corresponding historical first-level dimension reference value distribution data, where the transaction first-level dimension distribution data is the current first-level dimension distribution data whose chi-square value exceeds the transaction threshold; and calculating the chi-square value of each current dimension combination distribution data against the corresponding historical dimension combination reference value distribution data, where the transaction dimension combination distribution data is the current dimension combination distribution data whose chi-square value exceeds the transaction threshold;
  • the transaction positioning step specifically includes: taking the transaction first-level dimension distribution data corresponding to the minimum chi-square value as the transaction first-level dimension distribution data with the highest structural change.
  • The chi-square test is a hypothesis testing method that measures the degree of deviation between the actually observed values of a statistical sample and the theoretically inferred values; this degree of deviation determines the magnitude of the chi-square value.
  • From the chi-square value, the probability that the hypothesis holds can be derived, that is, the significance level or P value. The smaller the P value, the less likely the hypothesis is to hold.
  • This embodiment uses the minimum chi-square value to locate the transaction, which makes transaction determination and positioning more reliable.
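A structural test of this kind can be sketched with a Pearson chi-square test using only the standard library. Several choices here are assumptions rather than details from the text: the reference values are rescaled to the current total so that only the proportions are compared, the degrees of freedom are the number of data items minus one, and the decision compares the P value against a hypothetical significance level `alpha`. The text speaks of a "minimum chi-square value"; this sketch reads the compared quantity as the P value, where smaller means a stronger structural change. The series used for the P value loses precision for very large statistics, which is acceptable for an illustration.

```python
import math


def chi_square_statistic(current, reference):
    """Pearson chi-square of `current` against the structure of `reference`.

    Assumes every reference value is positive. The reference is rescaled
    to the current total so only the proportions are compared.
    """
    items = sorted(reference)
    scale = sum(current.get(i, 0.0) for i in items) / sum(reference.values())
    return sum(
        (current.get(i, 0.0) - reference[i] * scale) ** 2 / (reference[i] * scale)
        for i in items
    )


def chi_square_p_value(statistic, dof):
    """Upper-tail probability for a chi-square variable, computed from the
    series expansion of the regularized lower incomplete gamma function."""
    if statistic <= 0:
        return 1.0
    a, x = dof / 2.0, statistic / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def structure_changed(current, reference, alpha=0.05):
    """Return (changed, statistic, p_value) for one dimension."""
    statistic = chi_square_statistic(current, reference)
    p_value = chi_square_p_value(statistic, dof=len(reference) - 1)
    return p_value < alpha, statistic, p_value


# Province: the level halves but the shares are unchanged.
province = structure_changed(
    {"A": 250.0, "B": 150.0, "C": 100.0},
    {"A": 500.0, "B": 300.0, "C": 200.0},
)
# Payment method: online payment collapses, other methods gain share.
payment = structure_changed(
    {"online": 200.0, "cash_on_delivery": 220.0, "postal": 80.0},
    {"online": 800.0, "cash_on_delivery": 150.0, "postal": 50.0},
)
```

In this made-up data the province dimension is not flagged even though its totals fell by half, while the payment method dimension is flagged, which mirrors the payment-link example in the text.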
  • The transaction determination step specifically includes: calculating the chi-square value of each current first-level dimension distribution data against the corresponding historical first-level dimension reference value distribution data, where the transaction first-level dimension distribution data is the current first-level dimension distribution data whose chi-square value exceeds the transaction threshold; and calculating the chi-square value of each current dimension combination distribution data against the corresponding historical dimension combination reference value distribution data, where the transaction dimension combination distribution data is the current dimension combination distribution data whose chi-square value exceeds the transaction threshold;
  • the transaction positioning step specifically includes: selecting the transaction first-level dimension distribution data corresponding to the minimum chi-square value as the minimum transaction first-level dimension distribution data; selecting, from the other transaction first-level dimension distribution data, those whose chi-square value differs from the minimum chi-square value by less than a difference threshold; performing a goodness-of-fit test of the selected data against the corresponding historical first-level dimension reference value distribution data to obtain the coefficient of determination; and taking the transaction first-level dimension distribution data corresponding to the minimum coefficient of determination as the one with the highest structural change.
  • Goodness of fit refers to how well the regression line fits the observed values.
  • The statistic that measures goodness of fit is the coefficient of determination R^2.
  • R^2 ranges over [0, 1]. The closer R^2 is to 1, the better the regression line fits the observed values; conversely, the closer R^2 is to 0, the worse the fit.
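One way to obtain such a coefficient is to fit a least-squares line that predicts the current values from the reference values. The text does not say which regression is used, so this setup is an assumption; for a simple linear regression with an intercept, R^2 equals the squared Pearson correlation, which stays within [0, 1].

```python
def coefficient_of_determination(current, reference):
    """R^2 of the least-squares line predicting current from reference."""
    items = sorted(reference)
    xs = [reference[i] for i in items]
    ys = [current.get(i, 0.0) for i in items]
    n = len(items)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return sxy * sxy / (sxx * syy)


# Unchanged structure: current is exactly half of the reference.
r2_province = coefficient_of_determination(
    {"A": 250.0, "B": 150.0, "C": 100.0},
    {"A": 500.0, "B": 300.0, "C": 200.0},
)
# Changed structure: online payment collapses.
r2_payment = coefficient_of_determination(
    {"online": 200.0, "cash_on_delivery": 220.0, "postal": 80.0},
    {"online": 800.0, "cash_on_delivery": 150.0, "postal": 50.0},
)
```

A dimension whose structure is preserved gives R^2 close to 1, while a dimension whose structure has shifted gives a much lower value.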
  • This embodiment combines the difference between chi-square values with the goodness-of-fit test to locate the transaction, which makes transaction determination more accurate.
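The combined selection rule can be sketched as a tie-break. The sketch takes precomputed test values and coefficients of determination per flagged first-level dimension; it assumes the test value is one where smaller means a larger structural change (such as a P value), and it includes the minimum itself among the candidates for the goodness-of-fit comparison. The numbers and the difference threshold are hypothetical.

```python
def key_transaction_dimension(test_values, r_squared, difference_threshold):
    """Pick the key dimension among flagged first-level dimensions.

    test_values: dimension -> chi-square test value (smaller = larger change).
    r_squared: dimension -> coefficient of determination against the reference.
    """
    minimum = min(test_values.values())
    # Dimensions whose test value is close to the minimum are near-ties.
    candidates = [
        dim
        for dim, value in test_values.items()
        if value - minimum < difference_threshold
    ]
    if len(candidates) == 1:
        return candidates[0]
    # Among near-ties, the smallest R^2 marks the largest structural change.
    return min(candidates, key=lambda dim: r_squared[dim])


test_values = {"payment_method": 1e-6, "province": 2e-6, "order_type": 1e-3}
r_squared = {"payment_method": 0.24, "province": 0.91, "order_type": 0.97}
key = key_transaction_dimension(test_values, r_squared, difference_threshold=1e-4)
```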
  • The transaction positioning step further includes: taking the transaction first-level dimension distribution data corresponding to the key transaction dimension as the key transaction first-level dimension distribution data; calculating, for each data item, the difference between the key transaction first-level dimension distribution data and the corresponding historical first-level dimension reference value distribution data; taking the data item whose difference has the largest absolute value as the main cause of the transaction; and displaying the main cause of the transaction.
  • This embodiment can display the main cause of the transaction, which makes transaction determination and positioning more accurate.
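Finding the main cause reduces to an arg-max over absolute differences. A minimal sketch, assuming current and reference values are directly comparable (same units and granularity); the data is hypothetical.

```python
def main_cause(current, reference):
    """Return (item, difference) for the data item whose difference from
    its historical reference value has the largest absolute value."""
    differences = {
        item: current.get(item, 0.0) - reference[item] for item in reference
    }
    item = max(differences, key=lambda i: abs(differences[i]))
    return item, differences[item]


cause = main_cause(
    {"online": 200.0, "cash_on_delivery": 220.0, "postal": 80.0},
    {"online": 800.0, "cash_on_delivery": 150.0, "postal": 50.0},
)
```

If the overall level also moved, comparing shares instead of raw values may be more informative; that variant is not described in the text.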
  • FIG. 2 is a structural block diagram of a distribution data transaction determination and positioning system according to the present invention, including:
  • a distribution data preparation module 201, configured to: acquire multi-dimensional distribution data and multi-dimensional reference value distribution data, where the multi-dimensional reference value distribution data holds the historical reference value corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain a plurality of dimension combinations; generate, from the multi-dimensional distribution data, a plurality of current first-level dimension distribution data for the first-level dimensions and a plurality of current dimension combination distribution data for the dimension combinations; and generate, from the multi-dimensional reference value distribution data, a plurality of historical first-level dimension reference value distribution data for the first-level dimensions and a plurality of historical dimension combination reference value distribution data for the dimension combinations;
  • a transaction determination module 202, configured to: compare each current first-level dimension distribution data with the corresponding historical first-level dimension reference value distribution data to obtain its structural change, where current first-level dimension distribution data whose structural change exceeds the transaction threshold is transaction first-level dimension distribution data; compare each current dimension combination distribution data with the corresponding historical dimension combination reference value distribution data to obtain its structural change, where current dimension combination distribution data whose structural change exceeds the transaction threshold is transaction dimension combination distribution data; and raise an alarm if any transaction first-level dimension distribution data or transaction dimension combination distribution data exists.
  • The system further includes:
  • a transaction positioning module, configured to: take the dimension corresponding to the transaction first-level dimension distribution data with the highest structural change as the key transaction dimension; take the dimension combinations corresponding to the transaction dimension combination distribution data as transaction dimension combinations; take the transaction dimension combinations that include the key transaction dimension as the dimension combinations affected by the key transaction dimension; take the dimensions other than the key transaction dimension that are included in those affected dimension combinations as the dimensions affected by the key transaction dimension; and display the key transaction dimension and the dimensions affected by it.
  • The transaction determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against the corresponding historical first-level dimension reference value distribution data, where the transaction first-level dimension distribution data is the current first-level dimension distribution data whose chi-square value exceeds the transaction threshold; and calculate the chi-square value of each current dimension combination distribution data against the corresponding historical dimension combination reference value distribution data, where the transaction dimension combination distribution data is the current dimension combination distribution data whose chi-square value exceeds the transaction threshold;
  • the transaction positioning module is specifically configured to: take the transaction first-level dimension distribution data corresponding to the minimum chi-square value as the transaction first-level dimension distribution data with the highest structural change.
  • The transaction determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against the corresponding historical first-level dimension reference value distribution data, where the transaction first-level dimension distribution data is the current first-level dimension distribution data whose chi-square value exceeds the transaction threshold; and calculate the chi-square value of each current dimension combination distribution data against the corresponding historical dimension combination reference value distribution data, where the transaction dimension combination distribution data is the current dimension combination distribution data whose chi-square value exceeds the transaction threshold;
  • the transaction positioning module is specifically configured to: select the transaction first-level dimension distribution data corresponding to the minimum chi-square value as the minimum transaction first-level dimension distribution data; select, from the other transaction first-level dimension distribution data, those whose chi-square value differs from the minimum chi-square value by less than a difference threshold; perform a goodness-of-fit test of the selected data against the corresponding historical first-level dimension reference value distribution data to obtain the coefficient of determination; and take the transaction first-level dimension distribution data corresponding to the minimum coefficient of determination as the one with the highest structural change.
  • The transaction positioning module is further configured to: take the transaction first-level dimension distribution data corresponding to the key transaction dimension as the key transaction first-level dimension distribution data; calculate, for each data item, the difference between the key transaction first-level dimension distribution data and the corresponding historical first-level dimension reference value distribution data; take the data item whose difference has the largest absolute value as the main cause of the transaction; and display the main cause of the transaction.
  • FIG. 3 is a schematic block diagram of a preferred embodiment of the present invention, including:
  • Data preparation module 310: the main function of the data preparation module is to preprocess multi-indicator, multi-dimensional data. It specifically includes:
  • a data input sub-module 311, configured to acquire the latest data stored in the multi-dimensional data table at daily granularity;
  • a data preprocessing sub-module 312, which preprocesses the latest data: for each first-level dimension or each multi-level dimension obtained by dimension crossing, it performs data aggregation, null column handling, and handling of data columns whose share is too small on the data stored in the multi-dimensional data table at daily granularity, and so generates the distribution data of the indicator on the first-level dimensions or on the multi-level dimensions after dimension crossing.
  • For example, for the order quantity indicator, data preprocessing is performed on first-level dimensions such as province, order type, and payment method.
  • a dimension crossing sub-module 313, which forms all combinations of these dimensions to generate new multi-level dimensions for the corresponding data preprocessing, such as “province_order type”, “province_payment method”, “order type_payment method”, and “province_order type_payment method”. In this way, data changes can be examined not only from the perspective of the first-level dimensions but also at the finer grain of the multi-level dimensions, to explore whether local data has changed.
  • a historical reference value processing sub-module 314, which processes the earlier non-transaction data stored in the multi-dimensional data table at daily granularity, generates historical reference values by weighted averaging, and stores them as a multi-dimensional data table.
  • The same preprocessing flow as in the data preprocessing sub-module 312 and the dimension crossing sub-module 313 is then applied to the multi-dimensional data table containing the historical reference values, which yields the historical reference value distribution data of the indicator on the first-level dimensions or on the multi-level dimensions after dimension crossing.
  • Transaction determination module 320: after the data preparation module has preprocessed the data, two sets of data are output for each first-level dimension or multi-level dimension after dimension crossing, namely the distribution data of the current day and the historical reference value distribution data.
  • The main function of the transaction determination module is to perform a structural diagnosis on the two sets of data based on hypothesis testing, to find out whether they are consistent in structure. If they are inconsistent, a transaction is considered to exist.
  • Hypothesis testing is a form of proof by contradiction based on the small-probability principle: a small-probability event (for example P < 0.01 or P < 0.05) is taken not to occur in a single trial.
  • The transaction determination module applies this idea: it first assumes that the two data sets are structurally consistent and then uses a statistical test to determine the probability that this hypothesis holds. If the probability is small, the hypothesis is rejected, which indicates that the structure of the two data sets has changed, so a transaction is said to exist on this dimension.
  • The module includes a chi-square test sub-module 321 and a goodness-of-fit sub-module 322, which apply the chi-square test and the goodness-of-fit test respectively. In some scenarios, when the overall data fluctuates greatly, the P values obtained from the chi-square test on multiple dimensions may be approximately equal. In that case, the coefficient of determination R^2 calculated by the goodness-of-fit test can be used to help verify the size of the structural changes on these dimensions. An alarm is raised when a transaction occurs.
  • Anomaly localization module 330: the main function of this module is to extract, from all the dimensions with structural changes found by the anomaly determination module, the key anomaly dimension and the dimensions at other levels that it affects. The module includes a dimension localization sub-module 331 and a cross-dimension drill-down sub-module 332, which correspond to a dimension localization algorithm and a cross-dimension drill-down algorithm, respectively.
  • The dimension localization algorithm looks for the key anomaly dimension among the first-level and second-level dimensions. Within the same level it compares the P values first, uses the R^2 values as an aid, and takes the dimension with the smallest value as the key anomaly dimension.
  • In the key anomaly dimension, the differences between the items of the current day's distribution data and of the historical baseline distribution data are then calculated and sorted, and the data item whose difference has the largest absolute value is taken as the main cause of the anomaly.
  • Once the key anomaly dimension has been located, the cross-dimension drill-down algorithm takes the dimension combinations that contain the key anomaly dimension and have themselves been judged anomalous as the dimensions affected by the key anomaly dimension.
  • For example, suppose the hypothesis test results of "payment method" are compared with those of other dimensions such as "province" and "order type", and "payment method" is finally located as the key anomaly dimension. The fluctuation of each item in the "payment method" dimension is then compared; if the online payment data fluctuates the most, its fluctuation is taken as the main cause of the anomaly.
  • Finally, the dimensions affected by the key anomaly dimension are found among the crossed dimensions that contain payment method (that is, "province_payment method", "order type_payment method", and so on).
  • The key anomaly dimension, the dimensions affected by it, and the main cause of the anomaly are then output.
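
The dimension crossing performed by sub-module 313 can be illustrated with a minimal Python sketch. It assumes that a crossed dimension is an unordered combination of two or more first-level dimensions, which matches the four examples listed in the text; the function name `cross_dimensions` and the underscore-joined naming are illustrative choices, not something the patent specifies.

```python
from itertools import combinations


def cross_dimensions(dimensions):
    """Return every combination of two or more first-level dimensions, joined by '_'."""
    crossed = []
    for size in range(2, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            crossed.append("_".join(combo))
    return crossed


# The three first-level dimensions used as the running example in the text.
levels = cross_dimensions(["province", "order type", "payment method"])
print(levels)
```

With three first-level dimensions this yields three two-way combinations and one three-way combination; the count grows exponentially with the number of dimensions, so a real system would probably cap the combination size.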

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)
  • Software Systems (AREA)
  • Position Fixing By Use Of Radio Waves (AREA)

Abstract

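A method and system for determining and locating anomalous changes in distribution data. The method includes: cross-combining multiple dimensions to obtain multiple dimension combinations; generating multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; generating multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations; obtaining the structural change of each set of current first-level dimension distribution data and of each set of current dimension-combination distribution data; and raising an alert if any first-level dimension distribution data or dimension-combination distribution data has a structural change exceeding the anomaly threshold. Testing the multi-dimensional distribution data on the first-level dimensions and on the dimension combinations separately overcomes the shortcomings of existing anomaly determination and localization methods and makes anomaly determination faster and more accurate.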

Description

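Method and system for determining and locating anomalous changes in distribution data
Technical Field
The present invention relates to the technical field of anomalous changes in distribution data, and in particular to a system for determining and locating anomalous changes in distribution data.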
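Background Art
In the Internet industry, and especially in e-commerce website operations, massive amounts of data are generated at every moment. This data usually includes various indicators, and each indicator can be viewed from different dimensions. Indicators include order volume, order amount, and so on; dimensions include province, order type, payment method, and so on. When an indicator fluctuates, the corresponding data in each dimension fluctuates with it. For example, when the online payment system fails, indicators such as order volume and order amount are affected overall; correspondingly, the order volume and order amount for each payment method fluctuate, and data in other dimensions such as province and order type is affected as well. How, then, can one tell from the data that the abnormality was caused by a problem in the payment system?
A changing market environment, the optimization and upgrading of the business, and a succession of promotions also combine to make this data rise and fall. When data fluctuates, can the fluctuation be judged to be an anomalous change (an abnormal fluctuation)? And when it is, how can the anomaly be located accurately and quickly among large amounts of data, that is, how can one identify which dimensions the anomalous indicator mainly stems from? These are the core problems of data anomaly mining.
For locating anomalous data, existing techniques generally compare fluctuation amplitudes against a threshold. Specifically, such a method computes a historical baseline value as a weighted average of recent data (for example, the most recent week or month) for the corresponding data on a specific dimension, compares the latest data with the historical baseline value, and examines the fluctuation amplitude of each data item. If the amplitude exceeds a certain threshold (generally set manually from experience), the data is judged to have changed anomalously, and the item with the largest amplitude is selected as the main cause of the anomaly.
The main shortcomings of existing data anomaly localization schemes are as follows. In general, existing manual anomaly monitoring and localization is highly subjective; going from suspecting an anomaly to decomposing it layer by layer down to the specific anomalous dimension involves many steps, a long workflow, and a complicated, inefficient process. Specifically: first, the threshold is set subjectively by people and is not sufficiently scientific or objective; second, in some scenarios (such as the habitual dip in data during holidays), threshold-based comparison easily leads to misjudgment; finally, when several groups of data exceed their thresholds at the same time, it is usually hard to locate the main cause of the anomaly.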
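Summary of the Invention
In view of this, it is necessary to provide a method and system for determining and locating anomalous changes in distribution data, to address the technical problem that the prior art has difficulty judging data anomalies accurately.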
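A method for determining and locating anomalous changes in distribution data, comprising:
a distribution data preparation step, comprising: obtaining multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain multiple dimension combinations; generating, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generating, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
an anomaly determination step, comprising: comparing the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; comparing the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.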
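A system for determining and locating anomalous changes in distribution data, comprising:
a distribution data preparation module, configured to: obtain multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain multiple dimension combinations; generate, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generate, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
an anomaly determination module, configured to: compare the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; compare the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.
The present invention tests the multi-dimensional distribution data on the first-level dimensions and on the dimension combinations separately, which overcomes the shortcomings of existing anomaly determination and localization methods and makes anomaly determination faster and more accurate.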
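Brief Description of the Drawings
FIG. 1 is a flowchart of a method for determining and locating anomalous changes in distribution data according to the present invention;
FIG. 2 is a structural block diagram of a system for determining and locating anomalous changes in distribution data according to the present invention;
FIG. 3 is a module diagram of the preferred embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and specific embodiments.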
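FIG. 1 shows the flowchart of a method for determining and locating anomalous changes in distribution data according to the present invention, comprising:
Step S101, comprising: obtaining multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain multiple dimension combinations; generating, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generating, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
Step S102, comprising: comparing the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; comparing the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.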
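In step S101, the multi-dimensional distribution data is decomposed into the current first-level dimension distribution data for the first-level dimensions and the current dimension-combination distribution data for the dimension combinations. Take the distribution data of an online payment system for the order-volume indicator as an example: dimensions such as province, order type, and payment method are first-level dimensions, while "province_order type", "province_payment method", "order type_payment method", and "province_order type_payment method" are dimension combinations. Each dimension contains multiple data items. For example, the province dimension may include the data of province A, province B, and province C; the order type dimension may include the data of order types D, E, and F; and the payment method dimension may include the data of payment methods G, H, and I. "Province_order type" then includes the data for province A with order type D, province A with order type E, province A with order type F, province B with order type D, province B with order type E, province B with order type F, province C with order type D, province C with order type E, and province C with order type F. "Province_payment method", "order type_payment method", and "province_order type_payment method" follow by analogy. The historical first-level dimension baseline distribution data and the historical dimension-combination baseline distribution data are obtained from the multi-dimensional baseline distribution data in the same way. The multi-dimensional baseline distribution data holds the baseline value for each data item of each dimension of the multi-dimensional distribution data, for example the baseline value for province A, the baseline value for province B, and so on. It is obtained by processing the earlier non-anomalous data corresponding to each data item of the multi-dimensional distribution data, generating historical baseline values by weighted average, and storing them as a multi-dimensional data table. The multi-dimensional distribution data may be stored at an hourly, daily, weekly, monthly, yearly, or other time granularity; the earlier non-anomalous data is the anomaly-free portion of the earlier data stored at the same time granularity as the multi-dimensional distribution data. For example, if the data of province A is stored at daily granularity, the earlier non-anomalous data is the anomaly-free data of province A from the previous N days, and its weighted average is the historical baseline value of province A.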
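
As a rough illustration of how step S101 might derive distribution data, the sketch below aggregates an indicator over one dimension or over a dimension combination. The record layout, the field names (`province`, `order_type`, `payment`, `orders`), and the function `distribution` are all hypothetical; the patent only states that the data is held in a multi-dimensional data table.

```python
from collections import defaultdict


def distribution(records, dims, metric="orders"):
    """Sum `metric` over all records, grouped by the values of the dimensions in `dims`."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dims)
        totals[key] += record[metric]
    return dict(totals)


# Hypothetical rows of the multi-dimensional data table for one day.
records = [
    {"province": "A", "order_type": "D", "payment": "G", "orders": 10},
    {"province": "A", "order_type": "E", "payment": "H", "orders": 5},
    {"province": "B", "order_type": "D", "payment": "G", "orders": 7},
]

by_province = distribution(records, ["province"])                      # first-level dimension
by_province_type = distribution(records, ["province", "order_type"])   # dimension combination
print(by_province)
print(by_province_type)
```

The same function would be applied to the table of baseline values to obtain the historical baseline distribution data for each dimension and combination.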
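In step S102, the structural change of the current first-level dimension distribution data and of the current dimension-combination distribution data is calculated relative to the historical first-level dimension baseline distribution data and the historical dimension-combination baseline distribution data, respectively. A structural diagnosis of each pair of data sets is performed on the basis of hypothesis testing to find out whether their structures are consistent; if not, an anomaly is deemed to exist. That is, the structural change is used to judge whether the structure of the current first-level dimension distribution data is consistent with that of the historical first-level dimension baseline distribution data, and whether the structure of the current dimension-combination distribution data is consistent with that of the historical dimension-combination baseline distribution data. The idea behind hypothesis testing is proof by contradiction based on small probabilities: a small-probability event (for example, P<0.01 or P<0.05) essentially does not occur in a single trial. Step S102 applies this idea: it first assumes that the structures of the two groups of data are consistent, then uses a statistical test to determine how likely that assumption is to hold. If the likelihood is very small, the assumption is rejected, which indicates that the structure of the two groups of data has changed and hence that there is an anomaly on this dimension.
Based on hypothesis testing, the technical solution of the present invention tests the structure of the indicator data on each dimension, or the structure of the data after dimension crossing. Compared with judging by comparing fluctuation amplitudes against a threshold, it determines anomalies more accurately and locates them quickly.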
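Returning to the earlier example: when the online payment system fails, order volume and order amount fluctuate; data in the payment method dimension certainly fluctuates, and data in the province and order type dimensions fluctuates as well. The existing threshold-based method will generally find that the data in all three dimensions has changed anomalously, but can hardly determine that the anomaly was caused by the payment stage. With the hypothesis-testing method of the present invention, the data on the three dimensions of payment method, province, and order type is tested separately. It is then easy to see that, compared with the historical baseline values, the province and order type data may all have fallen in value, but their overall structure is essentially unchanged (in the province dimension, for example, each province's share of the data changes little), so the structural test does not flag them as abnormal. In the payment method dimension, however, when online payment fails, the share of order volume or order amount paid online inevitably falls sharply, while the shares of other payment methods such as cash on delivery and postal remittance rise substantially in turn. The structure has changed markedly, and a structural test on the data captures this abnormality, thereby locating the anomaly. The present invention thus remedies the shortcomings of existing anomaly determination and localization methods.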
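
The contrast in this example, a uniform drop that leaves the structure intact versus a shift in shares, can be reproduced with a small sketch. The order volumes are invented for illustration, and the way expected values are formed (baseline shares rescaled to the current total) is an assumption of this sketch rather than something the text specifies. The critical value 9.210 is the standard chi-square table value for 2 degrees of freedom at a significance level of 0.01.

```python
def chi_square_statistic(current, baseline):
    """Pearson chi-square of current values against baseline shares.

    Expected values are the baseline shares rescaled to the current total,
    so a uniform rise or fall across all items contributes nothing.
    Assumes every baseline value is positive.
    """
    current_total = sum(current.values())
    baseline_total = sum(baseline.values())
    statistic = 0.0
    for item, base in baseline.items():
        expected = current_total * base / baseline_total
        observed = current.get(item, 0.0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical order volumes: baseline vs. a day with an online payment outage.
province_baseline = {"A": 500, "B": 300, "C": 200}
province_today = {"A": 400, "B": 240, "C": 160}  # every province down 20%
payment_baseline = {"online": 700, "cash on delivery": 250, "postal remittance": 50}
payment_today = {"online": 300, "cash on delivery": 420, "postal remittance": 80}

CRITICAL_VALUE = 9.210  # chi-square table value, 2 degrees of freedom, alpha = 0.01

province_stat = chi_square_statistic(province_today, province_baseline)
payment_stat = chi_square_statistic(payment_today, payment_baseline)
print("province anomalous:", province_stat > CRITICAL_VALUE)
print("payment method anomalous:", payment_stat > CRITICAL_VALUE)
```

A pure amplitude threshold would flag both dimensions here, since every province is down 20%; the structural test flags only the payment method dimension.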
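In one embodiment, the method further comprises:
an anomaly localization step, comprising: taking the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural change as the key anomaly dimension, the dimension combinations corresponding to the anomalous dimension-combination distribution data being anomalous dimension combinations, the anomalous dimension combinations that include the key anomaly dimension being the dimension combinations affected by the key anomaly dimension, and the dimensions other than the key anomaly dimension included in those affected dimension combinations being the dimensions affected by the key anomaly dimension; and displaying the key anomaly dimension and the dimensions affected by the key anomaly dimension.
In this embodiment, determining the key anomaly dimension makes it possible to find the dimension combinations affected by it, and thereby the other dimensions affected by it.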
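
The lookup of affected dimensions described in this embodiment amounts to a simple filter, sketched below. Representing a dimension combination as a tuple of dimension names, and the function name `affected_dimensions`, are assumptions made for illustration.

```python
def affected_dimensions(key_dimension, anomalous_combinations):
    """Return (affected combinations, affected dimensions) for a key anomaly dimension.

    `anomalous_combinations` holds the dimension combinations already judged
    anomalous, each given as a tuple of dimension names.
    """
    affected_combos = [c for c in anomalous_combinations if key_dimension in c]
    affected = []
    for combo in affected_combos:
        for dim in combo:
            if dim != key_dimension and dim not in affected:
                affected.append(dim)
    return affected_combos, affected


# Hypothetical result of the determination step.
anomalous = [("province", "payment method"), ("province", "order type")]
combos, dims = affected_dimensions("payment method", anomalous)
print(combos)
print(dims)
```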
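In one embodiment:
the anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
the anomaly localization step specifically comprises: the anomalous first-level dimension distribution data with the highest structural change is the anomalous first-level dimension distribution data corresponding to the smallest chi-square value.
Chi-square test: the chi-square test is a hypothesis testing method that measures the degree of deviation between the actual observed values of a sample and the theoretically inferred values. This degree of deviation determines the size of the chi-square value: the larger the chi-square value, the worse the agreement; the smaller the deviation, the smaller the chi-square value and the better the agreement; if the two sets of values are exactly equal, the chi-square value is 0, indicating that the observations agree with the theoretical values completely. From the chi-square value one can obtain the probability that the hypothesis holds, that is, the significance level or P value; the smaller the P value, the less likely the hypothesis is to hold.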
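
For reference, the standard Pearson statistic behind this description can be written as follows. Here $O_i$ is the current value of data item $i$ in a dimension, $B_i$ is its historical baseline value, and $k$ is the number of data items. Rescaling the baseline to the current total to obtain $E_i$, and using $k-1$ degrees of freedom, are assumptions of this sketch, since the text does not state how the expected values are formed.

```latex
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = \Bigl(\sum_{j=1}^{k} O_j\Bigr)\,\frac{B_i}{\sum_{j=1}^{k} B_j},
\qquad
P = \Pr\bigl(X \ge \chi^2\bigr), \quad X \sim \chi^2_{k-1}
\]
```

Under these assumptions, $\chi^2 = 0$ exactly when every item keeps its baseline share, and $P$ shrinks as the shares drift apart, which matches the qualitative description in the text.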
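This embodiment uses the smallest chi-square value to judge the anomaly, which makes anomaly determination and localization more reliable.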
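In one embodiment:
the anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
the anomaly localization step specifically comprises: selecting the anomalous first-level dimension distribution data corresponding to the smallest chi-square value as the minimum anomalous first-level dimension distribution data; selecting, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the smallest chi-square value by less than a difference threshold; and performing a goodness-of-fit test against the corresponding historical first-level dimension baseline distribution data to calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural change being the one corresponding to the smallest coefficient of determination.
Goodness of fit refers to how well the regression line fits the observed values. The statistic that measures goodness of fit is the coefficient of determination R^2. R^2 takes values in [0,1]. The closer R^2 is to 1, the better the regression line fits the observed values; conversely, the closer R^2 is to 0, the worse the fit.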
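
One way to obtain such a coefficient of determination is to fit a least-squares line of the current values on the baseline values, in which case R^2 equals the squared Pearson correlation. The sketch below takes that route; the choice of regression and the sample figures are assumptions, as the text does not say which regression is used.

```python
def r_squared(current, baseline):
    """R^2 of the least-squares line that fits `current` values to `baseline` values.

    For simple linear regression this equals the squared Pearson correlation.
    Both sequences must list the same data items in the same order.
    """
    n = len(current)
    mean_x = sum(baseline) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    syy = sum((y - mean_y) ** 2 for y in current)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, current))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical order volumes per data item: baseline vs. current day.
r2_province = r_squared([400, 240, 160], [500, 300, 200])  # uniform 20% drop
r2_payment = r_squared([300, 420, 80], [700, 250, 50])     # shares have shifted
print(r2_province, r2_payment)
```

A dimension whose items all move in proportion keeps R^2 near 1, while a dimension whose shares have shifted gets a much lower value, which is why the smallest R^2 is read as the largest structural change.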
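This embodiment combines the chi-square value with the goodness-of-fit test to determine and locate anomalies, which makes anomaly determination and localization more accurate.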
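In one embodiment, the anomaly localization step further comprises: taking the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculating, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension baseline distribution data; taking the data item whose difference has the largest absolute value as the main cause of the anomaly; and displaying the main cause of the anomaly.
This embodiment can display the main cause of the anomaly, which makes anomaly determination and localization more precise.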
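
Identifying the main cause as described here reduces to ranking the per-item differences by absolute value. A minimal sketch, with invented figures and a hypothetical function name `main_cause`:

```python
def main_cause(current, baseline):
    """Return (item, ranked), where ranked lists (item, difference) by |difference| descending."""
    items = sorted(set(current) | set(baseline))
    differences = [
        (item, current.get(item, 0.0) - baseline.get(item, 0.0)) for item in items
    ]
    ranked = sorted(differences, key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[0][0], ranked


# Hypothetical order volumes in the key anomaly dimension "payment method".
payment_baseline = {"online": 700, "cash on delivery": 250, "postal remittance": 50}
payment_today = {"online": 300, "cash on delivery": 420, "postal remittance": 80}

cause, ranked = main_cause(payment_today, payment_baseline)
print(cause)
```

Items missing from either side are treated as zero here, which is a design choice of the sketch; a production system might instead report them separately.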
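FIG. 2 is a structural block diagram of a system for determining and locating anomalous changes in distribution data according to the present invention, comprising:
a distribution data preparation module 201, configured to: obtain multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain multiple dimension combinations; generate, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generate, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
an anomaly determination module 202, configured to: compare the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; compare the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.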
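In one embodiment, the system further comprises:
an anomaly localization module, configured to: take the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural change as the key anomaly dimension, the dimension combinations corresponding to the anomalous dimension-combination distribution data being anomalous dimension combinations, the anomalous dimension combinations that include the key anomaly dimension being the dimension combinations affected by the key anomaly dimension, and the dimensions other than the key anomaly dimension included in those affected dimension combinations being the dimensions affected by the key anomaly dimension; and display the key anomaly dimension and the dimensions affected by the key anomaly dimension.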
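In one embodiment:
the anomaly determination module is specifically configured to: calculate the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
the anomaly localization module is specifically configured such that: the anomalous first-level dimension distribution data with the highest structural change is the anomalous first-level dimension distribution data corresponding to the smallest chi-square value.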
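In one embodiment:
the anomaly determination module is specifically configured to: calculate the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
the anomaly localization module is specifically configured to: select the anomalous first-level dimension distribution data corresponding to the smallest chi-square value as the minimum anomalous first-level dimension distribution data; select, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the smallest chi-square value by less than a difference threshold; and perform a goodness-of-fit test against the corresponding historical first-level dimension baseline distribution data to calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural change being the one corresponding to the smallest coefficient of determination.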
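In one embodiment, the anomaly localization module is further configured to: take the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculate, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension baseline distribution data; take the data item whose difference has the largest absolute value as the main cause of the anomaly; and display the main cause of the anomaly.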
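FIG. 3 is a module diagram of the preferred embodiment of the present invention, comprising:
Data preparation module 310: the main function of the data preparation module is to preprocess multi-indicator, multi-dimensional data. It specifically includes:
Data input sub-module 311, configured to obtain the latest data stored at daily granularity in the multi-dimensional data table;
Data preprocessing sub-module 312, which preprocesses the latest data: for the data stored at daily granularity in the multi-dimensional data table, it performs data aggregation, null-column handling, and handling of columns with a small share, by each dimension or by each multi-level dimension produced by crossing, thereby generating the distribution data of the indicator on the first-level dimensions or on the multi-level dimensions produced by crossing. Specifically, taking the order-volume indicator as an example, data preprocessing is performed separately for dimensions (first-level dimensions) such as province, order type, and payment method.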
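Dimension crossing sub-module 313, which crosses these dimensions in every combination to generate new multi-level dimensions for the corresponding data preprocessing, such as "province_order type", "province_payment method", "order type_payment method", and "province_order type_payment method". This makes it possible to examine anomalous changes not only from the perspective of each first-level dimension, but also at the finer multi-level dimensions, to discover whether local data has changed anomalously.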
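Historical baseline processing sub-module 314, which processes the earlier non-anomalous data stored at daily granularity in the multi-dimensional data table, generates historical baseline values by weighted average, and stores them as a multi-dimensional data table. This table of historical baseline values is then passed through the data preprocessing sub-module 312 and the dimension crossing sub-module 313 in the same way, which yields the historical baseline distribution data of the indicator on the first-level dimensions or on the multi-level dimensions produced by crossing.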
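
The weighted average used for the historical baseline could look like the sketch below. The text does not give the weights, so the linearly increasing weights that favour recent days, and the function name `weighted_baseline`, are purely illustrative assumptions.

```python
def weighted_baseline(history, weights):
    """Weighted average of earlier non-anomalous values for one data item.

    `history` lists the values oldest first, with anomalous days already
    excluded; `weights` gives one positive weight per value.
    """
    if not history or len(history) != len(weights):
        raise ValueError("history and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(history, weights)) / sum(weights)


# Hypothetical order volumes of province A on the three previous normal days.
baseline_a = weighted_baseline([100, 110, 120], [1, 2, 3])
print(baseline_a)
```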
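Anomaly determination module 320: after the data has been preprocessed by the data preparation module, two groups of data are output, namely the current day's distribution data and the historical baseline distribution data on the first-level dimensions or on the multi-level dimensions produced by crossing. The main function of the anomaly determination module is to perform a structural diagnosis of these two groups of data on the basis of hypothesis testing, to find out whether their structures are consistent; if they are not, an anomaly is deemed to exist. The idea behind hypothesis testing is proof by contradiction based on small probabilities: a small-probability event (for example, P<0.01 or P<0.05) essentially does not occur in a single trial. The anomaly determination module applies this idea: it first assumes that the structures of the two groups of data are consistent, then uses a statistical test to determine how likely that assumption is to hold. If the likelihood is very small, the assumption is rejected, which indicates that the structure of the two groups of data has changed and hence that there is an anomaly on this dimension. The module includes a chi-square test sub-module 321 and a goodness-of-fit sub-module 322, which apply the chi-square test and the goodness-of-fit test. In some scenarios, when the overall data fluctuates heavily, the P values obtained from the chi-square test on several dimensions may be approximately equal; the coefficient of determination R^2 calculated by the goodness-of-fit test can then be used to help verify the size of the structural change on those dimensions. An alert is raised when an anomaly occurs.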
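Anomaly localization module 330: the main function of this module is to extract, from all the dimensions with structural changes found by the anomaly determination module, the key anomaly dimension and the dimensions at other levels that it affects. The module includes a dimension localization sub-module 331 and a cross-dimension drill-down sub-module 332, which correspond to a dimension localization algorithm and a cross-dimension drill-down algorithm, respectively. The dimension localization algorithm looks for the key anomaly dimension among the first-level and second-level dimensions: within the same level it compares the P values first, uses the R^2 values as an aid, and takes the dimension with the smallest value as the key anomaly dimension. In the key anomaly dimension, the differences between the items of the current day's distribution data and of the historical baseline distribution data are then calculated and sorted, and the data item whose difference has the largest absolute value is taken as the main cause of the anomaly. Once the key anomaly dimension has been located, the cross-dimension drill-down algorithm takes the dimension combinations that contain the key anomaly dimension and have themselves been judged anomalous as the dimensions affected by the key anomaly dimension. For example, suppose the hypothesis test results of "payment method" are compared with those of other dimensions such as "province" and "order type", and "payment method" is finally located as the key anomaly dimension. The fluctuation of each item in the "payment method" dimension is then compared; if the online payment data fluctuates the most, its fluctuation is taken as the main cause of the anomaly. Next, the dimensions affected by the key anomaly dimension are found among the crossed dimensions that contain payment method (that is, "province_payment method", "order type_payment method", and so on). Finally, the key anomaly dimension, the dimensions affected by it, and the main cause of the anomaly are output.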
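
The dimension localization rule described above (compare P values first, then use R^2 as a tie-breaker) might be sketched as follows. The tolerance used to decide that P values are "approximately equal" and the sample figures are assumptions; the text gives no numeric criterion.

```python
def key_dimension(results, tolerance=1e-6):
    """Pick the key anomaly dimension among same-level anomalous dimensions.

    `results` maps each dimension to (p_value, r_squared). The dimension with
    the smallest P value wins; dimensions whose P value is within `tolerance`
    of the smallest are compared by R^2, and the smallest R^2 wins.
    """
    if not results:
        raise ValueError("no anomalous dimensions to compare")
    min_p = min(p for p, _ in results.values())
    candidates = {d: r2 for d, (p, r2) in results.items() if p - min_p <= tolerance}
    return min(candidates, key=candidates.get)


# Hypothetical test results when the overall data fluctuates heavily,
# so the chi-square P values are indistinguishable.
tied = {
    "province": (0.0009, 0.97),
    "order type": (0.0009, 0.91),
    "payment method": (0.0009, 0.21),
}
print(key_dimension(tied))
```

When the P values differ clearly, the R^2 values play no role and the dimension with the smallest P value is returned directly.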
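The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present invention, all of which fall within its scope of protection. The scope of protection of this patent shall therefore be determined by the appended claims.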

Claims (10)

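  1. A method for determining and locating anomalous changes in distribution data, characterized by comprising:
    a distribution data preparation step, comprising: obtaining multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain multiple dimension combinations; generating, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generating, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
    an anomaly determination step, comprising: comparing the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; comparing the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.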
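  2. The method for determining and locating anomalous changes in distribution data according to claim 1, characterized by further comprising:
    an anomaly localization step, comprising: taking the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural change as the key anomaly dimension, the dimension combinations corresponding to the anomalous dimension-combination distribution data being anomalous dimension combinations, the anomalous dimension combinations that include the key anomaly dimension being the dimension combinations affected by the key anomaly dimension, and the dimensions other than the key anomaly dimension included in those affected dimension combinations being the dimensions affected by the key anomaly dimension; and displaying the key anomaly dimension and the dimensions affected by the key anomaly dimension.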
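  3. The method for determining and locating anomalous changes in distribution data according to claim 2, characterized in that:
    the anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
    the anomaly localization step specifically comprises: the anomalous first-level dimension distribution data with the highest structural change is the anomalous first-level dimension distribution data corresponding to the smallest chi-square value.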
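  4. The method for determining and locating anomalous changes in distribution data according to claim 2, characterized in that:
    the anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
    the anomaly localization step specifically comprises: selecting the anomalous first-level dimension distribution data corresponding to the smallest chi-square value as the minimum anomalous first-level dimension distribution data; selecting, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the smallest chi-square value by less than a difference threshold; and performing a goodness-of-fit test against the corresponding historical first-level dimension baseline distribution data to calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural change being the one corresponding to the smallest coefficient of determination.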
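  5. The method for determining and locating anomalous changes in distribution data according to claim 2, characterized in that the anomaly localization step further comprises: taking the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculating, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension baseline distribution data; taking the data item whose difference has the largest absolute value as the main cause of the anomaly; and displaying the main cause of the anomaly.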
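  6. A system for determining and locating anomalous changes in distribution data, characterized by comprising:
    a distribution data preparation module, configured to: obtain multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline value corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain multiple dimension combinations; generate, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data for the first-level dimensions and multiple sets of current dimension-combination distribution data for the dimension combinations; and generate, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data for the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data for the dimension combinations;
    an anomaly determination module, configured to: compare the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural change of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural change exceeds an anomaly threshold being anomalous first-level dimension distribution data; compare the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural change of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural change exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alert if there is any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data.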
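  7. The system for determining and locating anomalous changes in distribution data according to claim 6, characterized by further comprising:
    an anomaly localization module, configured to: take the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural change as the key anomaly dimension, the dimension combinations corresponding to the anomalous dimension-combination distribution data being anomalous dimension combinations, the anomalous dimension combinations that include the key anomaly dimension being the dimension combinations affected by the key anomaly dimension, and the dimensions other than the key anomaly dimension included in those affected dimension combinations being the dimensions affected by the key anomaly dimension; and display the key anomaly dimension and the dimensions affected by the key anomaly dimension.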
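  8. The system for determining and locating anomalous changes in distribution data according to claim 6, characterized in that:
    the anomaly determination module is specifically configured to: calculate the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
    the anomaly localization module is specifically configured such that: the anomalous first-level dimension distribution data with the highest structural change is the anomalous first-level dimension distribution data corresponding to the smallest chi-square value.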
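  9. The system for determining and locating anomalous changes in distribution data according to claim 6, characterized in that:
    the anomaly determination module is specifically configured to: calculate the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
    the anomaly localization module is specifically configured to: select the anomalous first-level dimension distribution data corresponding to the smallest chi-square value as the minimum anomalous first-level dimension distribution data; select, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the smallest chi-square value by less than a difference threshold; and perform a goodness-of-fit test against the corresponding historical first-level dimension baseline distribution data to calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural change being the one corresponding to the smallest coefficient of determination.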
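  10. The system for determining and locating anomalous changes in distribution data according to claim 6, characterized in that the anomaly localization module is further configured to: take the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculate, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension baseline distribution data; take the data item whose difference has the largest absolute value as the main cause of the anomaly; and display the main cause of the anomaly.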
PCT/CN2016/072348 2015-03-04 2016-01-27 Method and system for determining and locating anomalous changes in distribution data WO2016138805A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510096586.0 2015-03-04
CN201510096586.0A CN104715027B (zh) 2015-03-04 2015-03-04 Method and system for determining and locating anomalous changes in distribution data

Publications (1)

Publication Number Publication Date
WO2016138805A1 (zh) 2016-09-09

Family

ID=53414354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/072348 WO2016138805A1 (zh) 2015-03-04 2016-01-27 Method and system for determining and locating anomalous changes in distribution data

Country Status (3)

Country Link
CN (1) CN104715027B (zh)
HK (1) HK1208927A1 (zh)
WO (1) WO2016138805A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
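CN104715027B (zh) * 2015-03-04 2018-03-30 北京京东尚科信息技术有限公司 Method and system for determining and locating anomalous changes in distribution data
CN108880845B (zh) * 2017-05-16 2021-01-05 腾讯科技(深圳)有限公司 Information prompting method and related apparatus
CN107908533B (zh) * 2017-06-15 2019-11-12 平安科技(深圳)有限公司 Method, apparatus, computer-readable storage medium and device for monitoring database performance indicators
CN109697203B (zh) * 2017-10-23 2023-03-24 腾讯科技(深圳)有限公司 Indicator anomaly analysis method and device, computer storage medium, and computer device
CN111090644A (zh) * 2019-12-26 2020-05-01 成都康赛信息技术有限公司 Data consistency evaluation method based on the volatility of the data distribution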

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239753A1 (en) * 2006-04-06 2007-10-11 Leonard Michael J Systems And Methods For Mining Transactional And Time Series Data
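CN102129525A (zh) * 2011-03-24 2011-07-20 华北电力大学 Method for anomaly search and analysis of steam turbine unit vibration and process signals
CN103793601A (zh) * 2014-01-20 2014-05-14 广东电网公司电力科学研究院 Online fault early-warning method for steam turbine units based on anomaly search and combined prediction
CN104715027A (zh) * 2015-03-04 2015-06-17 北京京东尚科信息技术有限公司 Method and system for determining and locating anomalous changes in distribution data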

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7437307B2 (en) * 2001-02-20 2008-10-14 Telmar Group, Inc. Method of relating multiple independent databases
US20030200134A1 (en) * 2002-03-29 2003-10-23 Leonard Michael James System and method for large-scale automatic forecasting

Also Published As

Publication number Publication date
HK1208927A1 (zh) 2016-03-18
CN104715027A (zh) 2015-06-17
CN104715027B (zh) 2018-03-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16758421

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC , EPO FORM 1205A DATED 18.12.17.

122 Ep: pct application non-entry in european phase

Ref document number: 16758421

Country of ref document: EP

Kind code of ref document: A1