CN106991145B - Data monitoring method and device - Google Patents

Data monitoring method and device

Info

Publication number
CN106991145B
Authority
CN
China
Prior art keywords
index
data
determining
historical
interactive data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710178551.0A
Other languages
Chinese (zh)
Other versions
CN106991145A (en)
Inventor
张文举
陈汉
黄珍妮
张彦坤
郑瑾
陈根
覃非
戴奇波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN201710178551.0A
Publication of CN106991145A
Application granted
Publication of CN106991145B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/30 Payment architectures, schemes or protocols characterised by the use of specific devices or networks
    • G06Q20/34 Payment architectures, schemes or protocols characterised by the use of specific devices or networks using cards, e.g. integrated circuit [IC] cards or magnetic cards
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Fuzzy Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a data monitoring method and device. The method comprises: acquiring historical interaction data of each monitored object in a transaction link over a set time length; dividing the historical interaction data of each monitored object into historical interaction data sets for N indexes; and, for the historical interaction data set of a first index, where the first index is any one of the N indexes, performing the following operations: determining a fluctuation period of the data set, together with the mean and the standard deviation of the historical interaction data within that fluctuation period; determining a monitoring baseline of the first index from the mean and the standard deviation, the monitoring baseline indicating the fluctuation range of normal historical interaction data of the first index; and determining abnormal data in the interaction data set of the first index according to the monitoring baseline. The method provides a new means of monitoring abnormal states in historical data over a period of time.

Description

Data monitoring method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for monitoring data.
Background
Currently, as computer and network applications become increasingly widespread and business categories in different fields multiply, it is increasingly important to analyze information interaction data (such as response codes in transaction process data in the financial field) and to monitor for abnormal situations based on the analysis results.
In existing technical solutions, analysis of information interaction data usually targets real-time data. Specifically, massive volumes of raw real-time interaction data associated with the monitored object are first collected, a data derivation operation is then performed to generate derived index data, and an abnormality judgment is finally made based on the derived index data. Because real-time data covers short periods and changes relatively sharply, abnormality judgment designed for real-time data is unsuitable for judging anomalies in long-span historical data, chiefly because the existing solutions cannot capture the periodic changes of historical data across the whole link.
Disclosure of Invention
The embodiments of the invention provide a data monitoring method and device, which offer a new means of monitoring abnormal states in historical data over a period of time.
The data monitoring method comprises the following steps: acquiring historical interaction data of each monitored object in a transaction link over a set time length;
dividing the historical interaction data of each monitored object into historical interaction data sets for N indexes;
for the historical interaction data set of a first index, performing the following operations, where the first index is any one of the N indexes:
determining a fluctuation period of the historical interaction data set of the first index, and the mean and standard deviation of the historical interaction data within the fluctuation period;
determining a monitoring baseline of the first index according to the mean and the standard deviation, where the monitoring baseline indicates the fluctuation range of normal historical interaction data of the first index;
and determining abnormal data in the interaction data set of the first index according to the monitoring baseline.
Based on the same inventive concept, an embodiment of the present invention further provides a data monitoring device, comprising:
an acquisition unit, configured to acquire historical interaction data of each monitored object in a transaction link over a set time length;
a dividing unit, configured to divide the historical interaction data of each monitored object into historical interaction data sets for N indexes;
a determining unit, configured to perform the following operations on the historical interaction data set of a first index, where the first index is any one of the N indexes: determining a fluctuation period of the historical interaction data set of the first index, and the mean and standard deviation of the historical interaction data within the fluctuation period; and determining a monitoring baseline of the first index according to the mean and the standard deviation, where the monitoring baseline indicates the fluctuation range of normal historical interaction data of the first index;
and a judging unit, configured to determine abnormal data in the interaction data set of the first index according to the monitoring baseline.
In the embodiments of the invention, interaction data for the N indexes of an object to be monitored is obtained over a period of time (generally more than one week). The data is then classified by business index to obtain an interaction data set for each index, and the abnormal state of the data in each set is determined using the baseline of the corresponding index. The index baseline is determined from the mean and standard deviation of the index data set. Because the acquisition period for the monitored object is longer than that of real-time data, and the baseline is a pattern derived from historical interaction data over a past period, the abnormality judgment is more accurate and the probability of misjudgment is reduced.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a data monitoring method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the index division of the bank card transaction link according to an embodiment of the present invention;
Fig. 3 is a diagram of an index rule set according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a curve for determining abnormal data according to an embodiment of the present invention;
Fig. 5 is a decomposition of the causes of an absolute-quantity anomaly according to an embodiment of the present invention;
Fig. 6 is a decomposition of the causes of a relative-quantity anomaly according to an embodiment of the present invention;
Fig. 7 is a diagram of a method for predicting abnormal data from year-over-year (YoY) data according to an embodiment of the present invention;
Fig. 8 is a diagram of multi-task concurrency according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a data monitoring device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides a data monitoring method, whose specific implementation comprises:
Step S101: obtaining historical interaction data of each monitored object in the transaction link over a set time length.
Step S102: dividing the historical interaction data of each monitored object into historical interaction data sets for N indexes.
Step S103: for the historical interaction data set of a first index, where the first index is any one of the N indexes, determining a fluctuation period of the historical interaction data set of the first index, and the mean and standard deviation of the historical interaction data within the fluctuation period;
Step S104: determining a monitoring baseline of the first index according to the mean and the standard deviation, where the monitoring baseline indicates the fluctuation range of normal historical interaction data of the first index;
Step S105: determining abnormal data in the interaction data set of the first index according to the monitoring baseline.
In step S101, the monitored objects are the merchants, issuers, banks and other participants in the bank card transaction link. The indexes are data indexes relevant to analysis from financial, market, product and other perspectives. Generally, the key-path index system of the bank card transaction link is divided into three levels, as shown in Fig. 2: 1) a top level, which defines the categories of the index system, mainly according to the business types to be monitored in the key links of the transaction link; 2) a middle level, which subdivides these categories; and 3) a bottom level, in which the detailed indexes of the index system (6700 of them) make up the specific index set.
It should be noted that all data corresponding to the bank card transaction link index system must first be transferred to the own database of the Operation Data Analysis System (ODAS); the raw data in that database is then collected and filtered as necessary, so that redundancy is removed and storage pressure is reduced. The transferred data then undergoes the following operations: a1) defining data structures for different fields, levels and categories in terms of format, data magnitude, business priority and so on, according to the index set, index derivation, the rule base and the judgment requirements; a2) standardizing non-standardized data, where data with a degree of periodicity and skewness is converted to a dimensionless form using a statistical standardization formula; a3) storing the standardized data in layers by index type, field and magnitude, to avoid performance problems in subsequent index derivation, full-table scans during data queries, and the like.
After data acquisition, preliminary processing and storage are complete, the following flow is executed on the data of each index data set: step one, data cleaning; step two, establishing the index baseline of the index data set using the rules; step three, judging the data in the index data set against the index baseline; and step four, performing drill-down analysis of the causes of any abnormal data.
For the data cleaning in step one, existing data cleaning methods can be used; details are not repeated here.
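The standardization in a2 can be illustrated with a minimal sketch. The text does not name its "statistical standardization formula"; z-score standardization, (x − mean) / standard deviation, is assumed here as one common way of producing dimensionless values, and the function name `zscore_standardize` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_standardize(values):
    """Map raw index values to dimensionless z-scores (x - mean) / std.

    Assumption: z-score standardization stands in for the unnamed
    "statistical standardization formula" of step a2.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant series has no spread; map every point to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `zscore_standardize([2, 4, 4, 4, 5, 5, 7, 9])` has mean 5 and standard deviation 2, so it yields `[-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]`. Strongly skewed series might first be log-transformed; that choice is likewise an assumption.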
Establishing the index baseline in step two requires first determining the values of the parameters in the baseline formula, such as the mean and the standard deviation. To determine the mean, it is first necessary to check whether the data in the index data set is stationary. If it is, the mean can be computed directly, i.e., as the mean of the sample points within a sliding window of length L (the window corresponding to the observation period of the index: day, week, month, quarter or year). If the data sequence is not stationary, it can be transformed by differencing, taking logarithms, unit-root analysis and the like to form a new data sequence; differencing makes the mean stationary, and the mean of the differenced sequence is then computed. Specifically:
performing a T test on the data in the historical interaction data set of the first index to determine K sequences with different means;
calculating the statistical correlation coefficient between each of the K sequences and the time attribute;
when a statistical correlation coefficient is greater than a first threshold, differencing the K sequences until the correlation coefficient after differencing is not greater than the first threshold;
and calculating the mean of the differenced sequence and taking it as the mean of the historical interaction data within the fluctuation period.
That is, on top of the rule set generation module, the monitoring baseline is established and the threshold is set using statistical hypothesis testing, so that the index set is automatically matched with the rule set, the baseline, the threshold and the sequence-pattern influence factor algorithm. The specific process is shown in Fig. 3. After the business index set and the holiday table are collected, the index set and its corresponding labels are determined, and then: B1, the number of index baselines is determined by T test, e.g., 7 baselines, one for each day from Monday to Sunday, or 2 baselines, one for weekdays and one for weekends. Specifically, sequences whose sample means do not differ significantly in the T test are merged into the same monitoring sequence, while sequences whose means differ are kept as separate monitoring sequences.
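The baseline-count step B1 can be sketched as a greedy merge driven by a two-sample t statistic. This is an illustration under stated assumptions, not the patented procedure: Welch's t statistic is used, the merge order follows the input order, and the critical value `t_crit=2.0` (roughly a two-sided 5% test for moderate sample sizes) is hypothetical, as are the function names.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: equal means give t = 0, otherwise infinite.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def group_by_t_test(samples, t_crit=2.0):
    """Greedily merge labelled samples whose means do not differ significantly.

    samples: dict mapping a label (e.g. "Mon") to at least two observations.
    t_crit:  hypothetical critical value for |t|.
    Returns a list of label groups; each group would get its own baseline.
    """
    groups = []  # each entry: (labels, pooled observations)
    for label, obs in samples.items():
        for labels, pooled in groups:
            if abs(welch_t(pooled, obs)) < t_crit:
                labels.append(label)  # not significantly different: merge
                pooled.extend(obs)
                break
        else:
            groups.append(([label], list(obs)))  # different mean: new baseline
    return [labels for labels, _ in groups]
```

With weekday volumes around 100 and weekend volumes around 60, the sketch returns two groups, weekdays and weekend, which corresponds to the "2 baselines" case in B1.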
B2, the Spearman correlation coefficient Rs between time and the index is calculated to judge whether the sequence has a non-stationary growth trend. If Rs is greater than 0.9, the sequence is differenced and the Spearman coefficient Rs between the differenced sequence and time is checked again; if it is still greater than 0.9, differencing continues until Rs falls below 0.9. In verification, first-order differencing was sufficient for the index system of the invention.
The Spearman correlation coefficient is calculated as follows. First, the time values and the index values are ranked, giving two rank sequences X_i and Y_i (i = 1, 2, …, n), and the difference D_i = X_i − Y_i of each pair of ranks is obtained. Second, the correlation coefficient is calculated as
$$R_s = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$
where n is the sample size. When the monitored object has no periodic fluctuation pattern, n is the sliding-window length L. For example, when there is a weekly pattern and the monitored object is the number of Monday transactions, n is the number of Mondays in the window of length L;
B3, first-order differencing is applied, the differenced sequence ΔX_t is taken as the new monitoring sequence, and its mean and standard deviation are calculated.
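Steps B2 and B3 can be sketched as follows, using the rank-difference form of the Spearman coefficient. Assumptions: tied values receive average ranks (with ties the closed form is only an approximation), differencing is capped at a hypothetical `max_order=3`, and the function names are illustrative.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_with_time(seq):
    """Rs = 1 - 6 * sum(D_i^2) / (n(n^2 - 1)), with time ranks 1..n."""
    n = len(seq)
    d2 = sum((t - r) ** 2 for t, r in zip(range(1, n + 1), average_ranks(seq)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def stationary_mean_std(seq, threshold=0.9, max_order=3):
    """B2: difference while Rs with time exceeds the threshold.
    B3: return (difference order, mean, standard deviation) of the result."""
    order = 0
    while len(seq) > 2 and order < max_order and spearman_with_time(seq) > threshold:
        seq = [b - a for a, b in zip(seq, seq[1:])]  # first-order difference
        order += 1
    return order, mean(seq), pstdev(seq)
```

A steadily growing series such as `[100, 103, 105, 110, 112, 115, 119, 121, 126, 128]` has Rs = 1, so it is differenced once; the differences `[3, 2, 5, 2, 3, 4, 2, 5, 2]` have Rs = 0.025, and their mean 28/9 becomes the baseline mean, matching the observation that first-order differencing usually suffices.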
Further, a superposition factor of the monitoring baseline is determined from the historical interaction data of a first time period and of a second time period within the set time length, and the monitoring baseline of the first index is determined from the superposition factor, the mean and the standard deviation according to a first formula:
[First formula: equation image not reproduced in the source text]
where λ is the superposition factor, $\bar{x}$ is the mean of the first index over the fluctuation period, σ is the standard deviation of the first index over the fluctuation period, and b is the amplitude factor.
The superposition factor of the monitoring baseline may be a holiday factor λ1 or a month-position factor λ2 (beginning, middle or end of the month). The holiday factor λ1 has higher priority than the month factor λ2; that is, when the monitored day is both a holiday and the beginning (middle, end) of a month, the baseline algorithm is adjusted to
[formula image not reproduced in the source text]
When the monitored day is only the beginning (middle, end) of a month, the baseline algorithm is
[formula image not reproduced in the source text]
On an ordinary day, the baseline algorithm is
[formula image not reproduced in the source text]
This avoids applying the factors twice.
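The factor-priority rule can be sketched in code. Because the baseline formulas survive only as images, the band shape used here, λ·(x̄ ± b·σ) with λ = 1 on ordinary days, is an assumption chosen for illustration; only the priority order (holiday factor λ1 before month-position factor λ2, never both) comes from the text. The parameter names and the month-position keys are hypothetical.

```python
def monitoring_baseline(mean_value, std, b, is_holiday=False, month_position=None,
                        holiday_factor=1.0, month_factors=None):
    """Return the (lower, upper) band for one monitored day.

    Assumed band shape: lam * (mean_value +/- b * std). The source formulas
    are not reproduced, so this form is illustrative only.
    """
    month_factors = month_factors or {}
    if is_holiday:
        lam = holiday_factor                 # lambda1 has priority over lambda2
    elif month_position in month_factors:
        lam = month_factors[month_position]  # lambda2, e.g. "begin", "middle", "end"
    else:
        lam = 1.0                            # ordinary day: no superposition factor
    return lam * (mean_value - b * std), lam * (mean_value + b * std)
```

On a day that is both a holiday and the start of a month, only the holiday factor is applied, which is how the sketch avoids applying the factors twice.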
In addition, as shown in Fig. 3, the threshold b is determined iteratively by reducing it step by step: b starts from 3 [formula image not reproduced in the source text], and in each iteration b = b − 0.1 and the abnormal-point rate of the monitored object is recalculated; once the abnormal-point rate is > 0%, the current value of b is taken as the control threshold for that object.
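The threshold search above can be sketched as a loop. Assumptions: a point counts as abnormal when it lies outside mean ± b·σ, the search stops at a hypothetical floor of b = 0, and b is recomputed from an integer step counter to avoid the floating-point drift of repeatedly subtracting 0.1. The function name is illustrative.

```python
def find_threshold_b(values, mean_value, std, start=3.0, step=0.1, floor=0.0):
    """Reduce b from `start` by `step` until the abnormal-point rate is > 0%.

    A point is treated as abnormal when |v - mean_value| > b * std.
    Returns that b, or None if no point is flagged before b drops below `floor`.
    """
    k = 0
    while True:
        b = round(start - k * step, 10)  # integer counter avoids 0.1 drift
        if b < floor:
            return None
        abnormal = sum(1 for v in values if abs(v - mean_value) > b * std)
        if abnormal / len(values) > 0:
            return b
        k += 1
```

For values `[100, 95, 105, 120, 80]` with mean 100 and σ = 10, the largest deviation is 20 (2σ), so no point is flagged until b drops to 1.9, which the sketch returns.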
Further, after the abnormal data in the interaction data set of the first index is determined according to the monitoring baseline, the method further comprises:
if the attribute of the first index is of the absolute-quantity type, determining the difference between the abnormal data of the first index and the mean;
and performing drill-down analysis over the traversal factor set of the first index, determining a first target factor whose contribution to the difference is greater than a second threshold, and generating an anomaly analysis report for the first target factor.
As shown in Fig. 4, when data is judged to be abnormal, the difference between the current day's value and the historical mean within the sliding window is calculated (this increment may be positive or negative), and the causes of the increment are analyzed by examining the level values of each dimension. The decomposition of the causes of an absolute-quantity anomaly is shown in Fig. 5; the main steps are as follows:
C1, determining the traversal factor set: a default factor set is determined by an automatic algorithm, and the distribution of each factor is computed before traversal; if the TOP1 category value of a factor accounts for K% or more of its distribution (K defaults to 90), the factor does not enter the traversal. In addition, the system provides a window for user-defined factor selection, so users can choose the factors to analyze themselves, and the drill-down traversal factor set is configured automatically according to expert analysis paths.
C2, determining child nodes: a) in automatic drill-down analysis, the incremental absolute quantity of each level value is calculated and traversed over the 10 − i dimensions remaining to be explored; under each of these dimensions, the contribution of every level value to the parent node is calculated, and the dimension containing the level value with the largest contribution becomes the child node of the next layer; b) for expert and user-defined analysis paths, the child nodes of each layer are determined by the preset order.
C3, generating leaves: the level values under the current child node's dimension are sorted by contribution in descending order, and level values are selected as leaves of the current layer's decomposition until their cumulative contribution exceeds 90%, or individually where a level value's contribution exceeds 10%;
C4, pruning: pruning improves the readability and effectiveness of the analysis tree. When leaf nodes are too fine-grained, their contributions cannot summarize the root node; when the tree is too coarse, the explanation of the anomaly is too broad. The solution is to calculate each layer's leaf contribution to its parent node and to the root node, and to continue drilling down only when the contribution to the parent node exceeds 20% and the contribution to the root node exceeds 5% (both thresholds are parameterized), which makes the result easier to generalize. Dimension information cut off by pruning is shown in a display index, with the related contribution given for reference.
C5, generating the analysis tree: following steps C2 to C4, all dimensions in the factor set are traversed and decomposed under the drill-down and pruning threshold rules, completing the drill-down analysis, and the resulting analysis tree is displayed.
C6, drill-down logic for the causes of ratio indexes: when a ratio monitoring index exceeds the upper or lower control limit of the baseline and is judged abnormal, the increment (positive or negative) between the current value and the historical mean within the sliding window is calculated, and the increment is decomposed by drill-down analysis into the level values of each dimension.
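Steps C1, C2 and C4 can be sketched as three small helpers. This is an illustration under assumptions: records are plain dictionaries keyed by factor name, a level value's contribution is taken as its share of the parent's total increment (current amount minus historical mean), and the function names are hypothetical; the default thresholds (90%, 20%, 5%) are the ones stated in the text.

```python
from collections import Counter


def eligible_factors(records, factors, k=0.9):
    """C1: exclude a factor when its TOP1 value accounts for k or more of records."""
    keep = []
    for f in factors:
        top1_count = Counter(r[f] for r in records).most_common(1)[0][1]
        if top1_count / len(records) < k:
            keep.append(f)
    return keep


def level_contributions(current, reference):
    """C2: share of the parent's total increment explained by each level value.

    current/reference map level value -> amount (e.g. today vs. window mean).
    """
    levels = set(current) | set(reference)
    total = sum(current.values()) - sum(reference.values())
    if total == 0:
        return {lv: 0.0 for lv in levels}  # no increment to explain
    return {lv: (current.get(lv, 0) - reference.get(lv, 0)) / total for lv in levels}


def should_drill(contrib_to_parent, contrib_to_root, parent_min=0.2, root_min=0.05):
    """C4: keep drilling below a leaf only if both parameterized thresholds hold."""
    return contrib_to_parent > parent_min and contrib_to_root > root_min
```

For example, if level A rises from a mean of 100 to 150, B from 50 to 60 and C falls from 50 to 40, the total increment is 50 and the contributions are 1.0, 0.2 and −0.2; contributions can exceed 100% or be negative when levels move in opposite directions.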
Further, if the attribute of the first index is of the relative-quantity type, the ratio between the abnormal data of the first index and the sum of the historical interaction data within the fluctuation period is determined;
drill-down analysis is then performed over the traversal factor set of the first index, a second target factor whose contribution to the ratio is greater than a third threshold is determined, and an anomaly analysis report for the second target factor is generated.
Specifically, the main difference from absolute-quantity indexes lies in how the index value is calculated: the ratio is computed as the absolute value at the current level divided by the total for the current dimension, not relative to the relative quantity of the previous level.
Furthermore, to meet the needs of business, marketing and management decision-making, monitoring of business operation KPIs must also provide a prediction function alongside anomaly monitoring, so that operating conditions can be anticipated and decisions made earlier. This is done mainly by constructing year-over-year (YoY) indexes with a month as the monitoring period: at the beginning of each month, the real transaction data of the previous month is used to check whether the fluctuation of the previous month's YoY growth rate exceeds the normal fluctuation range, and when the monitoring algorithm judges the fluctuation to be abnormal, the cause is analyzed automatically. YoY growth rate data spans long periods and shows large YoY differences between the whole and its parts, between months, between regions and between industries.
The drill-down logic for the causes of YoY index anomalies traverses the level values of each dimension layer by layer and ranks them by contribution; the dimension of the level value with the highest contribution becomes the parent node of the next layer's decomposition, and the layer-by-layer drill-down finally yields an anomaly cause analysis tree. As shown in Fig. 7, an anomaly baseline is established for each dimension index: 1) the monthly YoY growth rates of the monitored object over the last three years are taken, and the outlier points for January and February, as well as points whose YoY growth rate exceeds 100%, are removed. After outlier removal, three parameters are calculated: 2) the mean statistic E(X); 3) the standard deviation statistic σ(X); and 4) the coefficient of variation CV = σ(X)/E(X). The YoY baselines are then established in the YoY growth-rate change detection module: baseline 1 is tied directly to the target value of the operating plan, and when the value falls below the annual plan value, an anomaly alarm is triggered and the cause drill-down process begins; baseline 2 computes a normal fluctuation range from the historical fluctuation amplitude of the monitored object, and when the monitored object's level is above the upper limit or below the lower limit of that range, an anomaly alarm is triggered and the cause drill-down process begins.
Note on handling outliers in YoY indexes: because of the Spring Festival, the YoY growth rates for January and February jump sharply, so these two months' sample points must be excluded each year when calculating E(X) and σ(X). In addition, YoY growth rates in some special regions can also jump sharply in months other than January and February, and such sample points must be removed as well.
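The YoY baseline construction can be sketched as follows. Assumptions: growth rates are fractions (0.12 means 12%), "above 100%" is read as rate > 1.0, and the band for baseline 2 is taken as E(X) ± k·σ(X) with a hypothetical width k = 2, since the text only says the range comes from the historical fluctuation amplitude. Function names are illustrative.

```python
from statistics import mean, pstdev


def yoy_baseline(monthly_growth, k=2.0, jump_limit=1.0):
    """Build the YoY anomaly baseline from (month, yoy_rate) pairs.

    Step 1 drops January/February (Spring Festival) and points above
    jump_limit (100%); steps 2-4 compute E(X), sigma(X) and CV. The band
    E +/- k*sigma for baseline 2 uses a hypothetical width k.
    """
    kept = [r for m, r in monthly_growth if m not in (1, 2) and r <= jump_limit]
    e, s = mean(kept), pstdev(kept)
    return {"E": e, "sigma": s, "CV": s / e if e else float("nan"),
            "lower": e - k * s, "upper": e + k * s}


def yoy_alarm(rate, plan_rate, baseline):
    """Alarm if baseline 1 (below plan) or baseline 2 (outside band) is violated."""
    below_plan = rate < plan_rate
    outside_band = rate < baseline["lower"] or rate > baseline["upper"]
    return below_plan or outside_band
```

An alarm from either baseline would then start the cause drill-down process; the special-region outliers mentioned above would need an additional, region-specific filter that this sketch does not model.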
In Fig. 7, the contribution is calculated as follows. To make the contributions of level values in every dimension layer comparable, and to help assess the influence of each level value on the overall anomaly, the contribution algorithm is set as:
contribution of level value j in dimension i to its own layer:
$$C^{\text{layer}}_{i,j} = \frac{g^{\text{plan}}_j \cdot A^{\text{LY}}_j - A^{\text{cur}}_j}{g^{\text{plan}}_i \cdot A^{\text{LY}}_i - A^{\text{cur}}_i}$$
total contribution of level value j in dimension i:
$$C^{\text{total}}_{i,j} = \frac{g^{\text{plan}}_j \cdot A^{\text{LY}}_j - A^{\text{cur}}_j}{g^{\text{plan}} \cdot A^{\text{LY}} - A^{\text{cur}}}$$
where g^plan is the planned (annual) YoY growth rate, A^LY is the absolute magnitude in the same period last year, and A^cur is the absolute magnitude in the current period; subscript j denotes level value j, subscript i denotes dimension i, and unsubscripted symbols refer to the whole monitored object.
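The two contribution formulas can be sketched directly. The text multiplies the planned YoY growth rate by last year's magnitude; this sketch passes that rate as a growth multiplier (for example 1.25 for +25%), so each term reads as the gap between planned and actual volume. That reading is an interpretation, and the function names are hypothetical.

```python
def shortfall(plan_factor, last_year_abs, current_abs):
    """Term in both formulas: plan x last-year magnitude - current magnitude.

    plan_factor is passed as a growth multiplier (e.g. 1.25 for +25%);
    this reading of "plan YoY growth rate" is an interpretation.
    """
    return plan_factor * last_year_abs - current_abs


def layer_contribution(level, dimension):
    """Contribution of level value j to its own layer (dimension i).

    level, dimension: (plan_factor, last_year_abs, current_abs) tuples.
    """
    return shortfall(*level) / shortfall(*dimension)


def total_contribution(level, whole):
    """Contribution of level value j to the whole monitored object."""
    return shortfall(*level) / shortfall(*whole)
```

For a level value planned at 1.25 × 80 = 100 but reaching only 90, in a dimension planned at 500 that reached 460, the layer contribution is 10 / 40 = 0.25; against a whole object planned at 2000 that reached 1950, the total contribution is 10 / 50 = 0.2.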
However, in the example of evaluating the bank card transaction-link index system provided by the embodiment of the invention, computing performance is a bottleneck: with many abnormal points, deep drill-down levels and large data packets, the processing results can hardly meet business requirements. To solve this problem, optimization is performed in two respects. (D1) Analysis depth control. The maximum depth of the analysis tree equals the number of drill-down indexes. The example analysis shows a general pattern: as the depth increases, the contribution degrees of the leaf nodes to their own layer become increasingly dispersed, and their contribution to the whole is often below 10%, which adds little to summarizing the main causes of the anomaly. The depth of the analysis tree is therefore controlled by thresholds:
1) The number n of leaf nodes is chosen so that the cumulative contribution degree of the n leaf nodes to the parent node exceeds 90% and the contribution degree of each leaf node to the parent node exceeds 10%. When even the largest contribution degree of any leaf node under a parent node is below 10%, the 4 leaves with the TOP4 contribution degrees are shown on the analysis tree.
2) A leaf node whose contribution degree to the parent node exceeds 20% is drilled down further into the next layer of child nodes.
3) The detailed data of each dimension under a leaf node is presented through display indexes, and the node window of each layer can be popped out for display.
4) New business is highly concentrated in dimensions such as transaction type, channel and region, and the key dimensions selected for the first few layers of an index's causes are those with few values. In the contribution calculations for layers 1 to 3, a single value of several dimensions often has a contribution degree above 90%, or even above 100%. Instead of traversing the contribution degrees of all n−i dimensions layer by layer (i = 0, 1, …, 10), the new-business module can therefore be optimized: if, while layer i is traversed, a value of each of K dimensions reaches a contribution degree of 90%, those K dimensions are used as filter conditions when the contribution degrees of the next layer are calculated, and the traversal is performed only on the remaining n−i−K dimensions, which improves computing efficiency.
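The depth-control rules 1) and 2) and the filter optimization in 4) can be sketched in Python as below. `contrib_fn(path, dim)` is a hypothetical callback that returns each value's contribution degree to the parent node identified by `path` (for instance computed with the layer contribution formula). Choosing, at each layer, the dimension holding the highest single contribution follows the drill-down logic described earlier; the 90%, 10%, TOP4 and 20% thresholds come from the text, while the names, data layout and tie handling are assumptions.

```python
def select_leaves(contribs, cum_target=0.9, min_each=0.1, fallback_top=4):
    """Rule 1): pick leaves in descending order of contribution to the parent."""
    ranked = sorted(contribs.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    if ranked[0][1] < min_each:
        # Even the largest contribution is below 10%: show the TOP4 values.
        return [value for value, _ in ranked[:fallback_top]]
    chosen, cumulative = [], 0.0
    for value, c in ranked:
        # Stop once 90% is covered or the next leaf is not above 10%.
        if cumulative > cum_target or c <= min_each:
            break
        chosen.append(value)
        cumulative += c
    return chosen


def split_concentrated(layer_contribs, threshold=0.9):
    """Rule 4): dimensions whose top value reaches `threshold` become filters."""
    filters, rest = [], []
    for dim, contribs in layer_contribs.items():
        top = max(contribs, key=contribs.get)
        if contribs[top] >= threshold:
            filters.append((dim, top))
        else:
            rest.append(dim)
    return filters, rest


def build_tree(path, dims, contrib_fn, drill_threshold=0.2):
    """Recursively build the anomaly-cause analysis tree (depth <= len(dims))."""
    layer = {d: contrib_fn(path, d) for d in dims}
    layer = {d: c for d, c in layer.items() if c}  # skip empty dimensions
    if not layer:
        return []
    # Parent dimension: the one holding the highest single contribution.
    best_dim = max(layer, key=lambda d: max(layer[d].values()))
    best = layer[best_dim]
    nodes = []
    for value in select_leaves(best):
        children = []
        # Rule 2): drill down only when the leaf contributes more than 20%.
        if best[value] > drill_threshold:
            remaining = [d for d in dims if d != best_dim]
            children = build_tree(path + ((best_dim, value),), remaining,
                                  contrib_fn, drill_threshold)
        nodes.append({"dim": best_dim, "value": value,
                      "contribution": best[value], "children": children})
    return nodes
```

In a fuller version, the filters returned by `split_concentrated` would be appended to `path` and their dimensions dropped from `dims` before the next layer is traversed.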
The second respect is the partitioning of data packets: based on the abnormal points produced by the abnormal point judgment module, a filter sorts the data packets by size into three processing pools, using 2G and 4G as the boundaries (these critical values were obtained from performance tests), and back-end resources of 50G, 100G and 200G are allocated to the three pools respectively. Tasks can therefore be launched in parallel, as shown in fig. 8, which greatly improves operating efficiency and removes the performance and resource bottleneck.
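A sketch of the size-based routing into three pools, with the pools processed in parallel through Python's `concurrent.futures`. It assumes that "2G" and "4G" are packet sizes in GiB and that a packet exactly at a boundary goes to the larger pool, and it treats the 50G/100G/200G back-end resources as opaque labels; `process_pool` is a hypothetical stand-in for the real job submission.

```python
from concurrent.futures import ThreadPoolExecutor

GIB = 1024 ** 3
# (exclusive upper size limit, back-end resource label) for each processing pool
POOLS = [(2 * GIB, "50G"), (4 * GIB, "100G"), (float("inf"), "200G")]


def assign_pool(size_bytes):
    """Route a data packet to the first pool whose size limit it is below."""
    for limit, resource in POOLS:
        if size_bytes < limit:
            return resource
    raise ValueError("unreachable: the last pool has no upper limit")


def partition(packets):
    """packets: iterable of (packet_id, size_bytes) produced for abnormal points."""
    pools = {resource: [] for _, resource in POOLS}
    for packet_id, size in packets:
        pools[assign_pool(size)].append(packet_id)
    return pools


def process_pool(resource, packet_ids):
    """Hypothetical placeholder for submitting one pool's drill-down tasks."""
    return resource, len(packet_ids)


def run_parallel(pools):
    """Launch one task per pool concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(pools)) as executor:
        futures = [executor.submit(process_pool, r, ids) for r, ids in pools.items()]
        return dict(f.result() for f in futures)
```

Keeping the thresholds in a single `POOLS` table makes it easy to re-tune the critical values after new performance tests.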
Based on the same technical concept, an embodiment of the invention also provides a data monitoring apparatus that can carry out the method embodiment above. As shown in fig. 9, the apparatus includes an obtaining unit 401, a dividing unit 402, a determining unit 403 and an exception judging unit 404, where:
an obtaining unit 401, configured to obtain historical interaction data of each monitored object in a transaction link within a set time length;
a dividing unit 402, configured to divide the historical interaction data of each monitored object into historical interaction data sets of N indexes;
a determining unit 403, configured to perform the following operations on the historical interaction data set of a first index, where the first index is any one of the N indexes: determining a fluctuation period of the historical interactive data set of the first index, the mean of the historical interactive data in the fluctuation period, and the standard deviation of the historical interactive data of the first index in the fluctuation period; and determining a monitoring baseline of the first index from the mean and the standard deviation, where the monitoring baseline indicates the fluctuation range of normal historical interaction data of the first index;
an exception judging unit 404, configured to determine abnormal data in the interaction data set of the first index according to the monitoring baseline.
Further, the determining unit 403 is specifically configured to:
perform a t-test on the data in the historical interactive data set of the first index to determine a sequence of K different means; calculate the statistical correlation coefficient between this sequence and the time attribute; when the statistical correlation coefficient is greater than a first threshold, difference the sequence repeatedly until the correlation coefficient of the differenced sequence is no greater than the first threshold; and calculate the mean of the differenced sequence and use it as the mean of the historical interaction data in the fluctuation period.
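The mean-estimation steps above can be sketched in Python as follows. The text does not say how the t-test yields the K means, so this sketch assumes one plausible reading: the series is cut into fixed windows of a hypothetical length `window`, and adjacent segments are merged while Welch's t statistic stays below a hypothetical critical value `t_crit`. It also compares the absolute value of the Pearson correlation with the time index against the first threshold, so that downward trends are differenced too. Both choices are assumptions, not the patent's stated procedure.

```python
import math
import statistics


def welch_t(a, b):
    """Absolute Welch t statistic for the difference of two sample means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = abs(statistics.fmean(a) - statistics.fmean(b))
    if se == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / se


def segment_means(values, window=7, t_crit=2.0):
    """Return the sequence of K segment means that differ significantly."""
    segments = [values[i:i + window] for i in range(0, len(values), window)]
    segments = [s for s in segments if len(s) >= 2]  # variance needs 2 points
    if not segments:
        raise ValueError("not enough data")
    merged = [list(segments[0])]
    for seg in segments[1:]:
        if welch_t(merged[-1], seg) < t_crit:
            merged[-1].extend(seg)  # not significantly different: merge
        else:
            merged.append(list(seg))
    return [statistics.fmean(s) for s in merged]


def corr_with_time(seq):
    """Pearson correlation between the sequence and its time index 0..n-1."""
    n = len(seq)
    t_mean, y_mean = (n - 1) / 2, statistics.fmean(seq)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seq))
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    var_y = sum((y - y_mean) ** 2 for y in seq)
    if var_t == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_t * var_y)


def fluctuation_mean(values, first_threshold=0.8, **kwargs):
    """Difference the K means while trended, then return their mean."""
    seq = segment_means(values, **kwargs)
    # Guard: a correlation needs at least 3 points to be meaningful here.
    while len(seq) >= 3 and abs(corr_with_time(seq)) > first_threshold:
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return statistics.fmean(seq)
```

For a staircase series with segment means 10, 20, 30, 40 the means correlate perfectly with time, so one differencing step yields 10, 10, 10 and the returned mean is 10.0.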
Further, the determining unit 403 is further configured to:
determining a superposition factor of the monitoring base line according to the historical interactive data of a first time period in the set time length and the historical interactive data of a second time period in the set time length;
determining the monitoring baseline of the first index according to the superposition factor, the mean value and the standard deviation and according to a first formula, wherein the first formula is as follows:
[First formula: not reproduced; it appears only as an image in the source]
wherein λ is the superposition factor, $\bar{x}$ is the mean of the first index in the fluctuation period, σ is the standard deviation of the first index in the fluctuation period, and b is the amplitude factor.
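Because the first formula survives only as an image, the exact band cannot be recovered from this text. Purely as an illustrative assumption, the Python sketch below takes the band to be λ·x̄ ± b·σ and defines the superposition factor λ as the ratio of the second period's mean to the first period's mean; neither form is confirmed by the source, and all names are hypothetical. Values outside the band are flagged as abnormal, which is the role of the exception judging unit.

```python
import statistics


def superposition_factor(first_period, second_period):
    """HYPOTHETICAL: ratio of the second period's mean to the first period's."""
    base = statistics.fmean(first_period)
    if base == 0:
        raise ValueError("first-period mean is zero")
    return statistics.fmean(second_period) / base


def monitoring_baseline(mean, std, lam, b=3.0):
    """HYPOTHETICAL band lam*mean +/- b*std (the patent's formula is an image)."""
    center = lam * mean
    return center - b * std, center + b * std


def find_anomalies(values, lower, upper):
    """Return (position, value) pairs lying outside the monitoring baseline."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


lam = superposition_factor([100, 100], [110, 110])
lower, upper = monitoring_baseline(100.0, 5.0, 1.0, b=2.0)
print(lam, (lower, upper), find_anomalies([95, 120, 85], lower, upper))
```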
Further, the determining unit 403 is further configured to: if the attribute of the first index is of the absolute-quantity type, determine the difference between the abnormal data of the first index and the mean;
the apparatus further includes a first positioning unit 405, configured to perform drill-down analysis on the traversal factor set of the first index, determine a first target factor whose contribution degree to the difference is greater than a second threshold, and generate an anomaly analysis report relating to the first target factor.
Further, the determining unit 403 is further configured to:
if the attribute of the first index is of the relative-quantity type, determine the ratio of the abnormal data of the first index to the sum of the historical interactive data in the fluctuation period;
the apparatus further includes a second positioning unit 406, configured to perform drill-down analysis on the traversal factor set of the first index, determine a second target factor whose contribution degree to the ratio is greater than a third threshold, and generate an anomaly analysis report relating to the second target factor.
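The absolute-type and relative-type handling performed by the two positioning units can be sketched as follows. The text does not specify how each factor's share of the deviation is measured, so the sketch assumes a hypothetical mapping `factor_effects` that gives each traversal factor's own part of the measure in the same units, and takes its contribution degree as that part divided by the whole. The report format is likewise illustrative.

```python
def anomaly_measure(value, mean, history, kind):
    """Difference (absolute-type index) or ratio (relative-type index)."""
    if kind == "absolute":
        return value - mean
    if kind == "relative":
        total = sum(history)  # sum of historical data in the fluctuation period
        if total == 0:
            raise ZeroDivisionError("historical total is zero")
        return value / total
    raise ValueError(f"unknown index type: {kind}")


def target_factors(factor_effects, measure, threshold):
    """Factors whose contribution degree to `measure` exceeds `threshold`."""
    if measure == 0:
        return []
    contributions = {f: effect / measure for f, effect in factor_effects.items()}
    hits = [(f, c) for f, c in contributions.items() if c > threshold]
    return sorted(hits, key=lambda fc: fc[1], reverse=True)


def anomaly_report(index_name, hits):
    """Render a plain-text anomaly analysis report for the target factors."""
    lines = [f"Anomaly analysis report for {index_name}:"]
    lines += [f"  {f}: contribution {c:.0%}" for f, c in hits]
    return "\n".join(lines)
```

For an absolute-type index with abnormal value 130 against a mean of 100, factors contributing 18, 9 and 3 to the difference of 30 have contribution degrees of 60%, 30% and 10%; with a threshold of 25% only the first two are reported.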
In summary, in the embodiments of the invention, interactive data of N indexes of the monitored object is obtained over a set period (longer than one week), and the obtained data is classified by business index to obtain an interactive data set for each index, so that the baseline of each index can be used to judge whether the data in that index's interactive data set is abnormal. The index baseline is determined from the mean and standard deviation of the index data set. Because the acquisition period for the monitored object is longer than that of real-time data, and the baseline is a rule derived from analyzing historical interactive data over a past period, the anomaly judgment is more accurate and the probability of misjudgment is reduced. The data monitoring method provided by the embodiments fills a gap in index systems covering the whole bank card transaction link. By combining business and technical measures, it solves the problem of insufficient analysis performance for large single-point data packets and multi-layer hierarchies on big data processing technologies such as Hadoop, Hive and Impala. It realizes index design for each key link along the whole bank card transaction link, rule self-adaptation for enterprise operating indexes, and an automatic cause drill-down processing system. The method is not limited to bank card transactions and can be applied in many fields such as finance, manufacturing and services.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method of monitoring data, the method comprising:
acquiring historical interaction data of each monitored object in a transaction link within a set time length;
dividing historical interactive data of each monitoring object into historical interactive data sets of N indexes;
for a historical interaction data set of a first index, performing the following operations, wherein the first index is any one of the N indexes:
determining the fluctuation period of the historical interactive data set of the first index, and performing a t-test on the data in the historical interactive data set of the first index to determine a sequence of K different means; calculating the statistical correlation coefficient between the sequence and the time attribute; when the statistical correlation coefficient is greater than a first threshold, differencing the sequence until the correlation coefficient of the differenced sequence is no greater than the first threshold; calculating the mean of the differenced sequence, using it as the mean of the historical interactive data in the fluctuation period, and determining the standard deviation of the historical interactive data in the fluctuation period of the first index;
determining a superposition factor according to the historical interactive data of the first time period in the set time length and the historical interactive data of the second time period in the set time length; determining a monitoring baseline of the first index according to the superposition factor, the mean value and the standard deviation, wherein the monitoring baseline is used for indicating the fluctuation range of normal historical interactive data of the first index;
and determining abnormal data in the interactive data set of the first index according to the monitoring baseline.
2. The method of claim 1, wherein determining a monitoring baseline for the first index based on the superposition factor, the mean and the standard deviation comprises:
determining the monitoring baseline of the first index according to the superposition factor, the mean value and the standard deviation and according to a first formula, wherein the first formula is as follows:
[First formula: not reproduced; it appears only as an image in the source]
wherein λ is the superposition factor, $\bar{x}$ is the mean of the first index in the fluctuation period, σ is the standard deviation of the first index in the fluctuation period, and b is the amplitude factor.
3. The method of claim 1, further comprising, after determining abnormal data in the interactive data set of the first index according to the monitoring baseline:
if the attribute of the first index is of an absolute quantity type, determining a difference value between the abnormal data of the first index and the average value;
performing drill-down analysis on the traversal factor set of the first index, determining a first target factor whose contribution degree to the difference is greater than a second threshold, and generating an anomaly analysis report relating to the first target factor.
4. The method of claim 1, further comprising, after determining abnormal data in the interactive data set of the first index according to the monitoring baseline:
if the attribute of the first index is of a relative quantity type, determining the ratio of the abnormal data of the first index to the sum of historical interactive data in the fluctuation period;
performing drill-down analysis on the traversal factor set of the first index, determining a second target factor whose contribution degree to the ratio is greater than a third threshold, and generating an anomaly analysis report relating to the second target factor.
5. An apparatus for monitoring data, the apparatus comprising:
the acquisition unit is used for acquiring historical interaction data of each monitored object in a transaction link within a set time length;
the dividing unit is used for dividing the historical interactive data of each monitoring object into historical interactive data sets of N indexes;
a determining unit, configured to perform the following operations on a historical interaction data set of a first index, where the first index is any one of the N indexes: determining the fluctuation period of the historical interactive data set of the first index, and performing a t-test on the data in the historical interactive data set of the first index to determine a sequence of K different means; calculating the statistical correlation coefficient between the sequence and the time attribute; when the statistical correlation coefficient is greater than a first threshold, differencing the sequence until the correlation coefficient of the differenced sequence is no greater than the first threshold; calculating the mean of the differenced sequence, using it as the mean of the historical interactive data in the fluctuation period, and determining the standard deviation of the historical interactive data in the fluctuation period of the first index; determining a superposition factor according to the historical interactive data of the first time period in the set time length and the historical interactive data of the second time period in the set time length; and determining a monitoring baseline of the first index according to the superposition factor, the mean and the standard deviation, wherein the monitoring baseline indicates the fluctuation range of normal historical interactive data of the first index;
and a judging unit, configured to determine abnormal data in the interactive data set of the first index according to the monitoring baseline.
6. The apparatus of claim 5, wherein the determination unit is specifically configured to:
determining the monitoring baseline of the first index according to the superposition factor, the mean value and the standard deviation and according to a first formula, wherein the first formula is as follows:
[First formula: not reproduced; it appears only as an image in the source]
wherein λ is the superposition factor, $\bar{x}$ is the mean of the first index in the fluctuation period, σ is the standard deviation of the first index in the fluctuation period, and b is the amplitude factor.
7. The apparatus of claim 5, wherein the determination unit is further configured to: if the attribute of the first index is of the absolute-quantity type, determine the difference between the abnormal data of the first index and the mean;
the apparatus further comprises: a first positioning unit, configured to perform drill-down analysis on the traversal factor set of the first index, determine a first target factor whose contribution degree to the difference is greater than a second threshold, and generate an anomaly analysis report relating to the first target factor.
8. The apparatus of claim 5, wherein the determination unit is further configured to:
if the attribute of the first index is of the relative-quantity type, determine the ratio of the abnormal data of the first index to the sum of the historical interactive data in the fluctuation period;
the apparatus further comprises: a second positioning unit, configured to perform drill-down analysis on the traversal factor set of the first index, determine a second target factor whose contribution degree to the ratio is greater than a third threshold, and generate an anomaly analysis report relating to the second target factor.
CN201710178551.0A 2017-03-23 2017-03-23 Data monitoring method and device Active CN106991145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710178551.0A CN106991145B (en) 2017-03-23 2017-03-23 Data monitoring method and device

Publications (2)

Publication Number Publication Date
CN106991145A CN106991145A (en) 2017-07-28
CN106991145B true CN106991145B (en) 2021-03-23

Family

ID=59411781

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant