CN110457367B - Method and system for discovering data transaction - Google Patents

Publication number
CN110457367B
Authority
CN
China
Prior art keywords
data
threshold
float
sliding window
median
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910677352.3A
Other languages
Chinese (zh)
Other versions
CN110457367A (en
Inventor
周群
毛佩瑶
杜成宝
毛德峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201910677352.3A priority Critical patent/CN110457367B/en
Publication of CN110457367A publication Critical patent/CN110457367A/en
Application granted granted Critical
Publication of CN110457367B publication Critical patent/CN110457367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/2477 Temporal data queries

Abstract

The present disclosure relates to a method and system for discovering data transactions, the method comprising: receiving a data sequence originating from a business scenario; applying a sliding window to the data sequence; determining a median of the data within the sliding window; comparing each data point of the data within the sliding window to the median to determine whether the data point is within the range defined by an up-float threshold and a down-float threshold of the median, thereby determining whether there is a data transaction; and sliding the sliding window over the data sequence with a particular step size to take a next set of data and repeating the above steps of determining a median and determining whether there is a data transaction, wherein at least one of the sliding window size, the step size, the up-float threshold, and the down-float threshold is specific to data originating from different business scenarios.

Description

Method and system for discovering data transaction
Technical Field
The invention relates to a method and a system for discovering data transaction.
Background
With the continuous progress of computer technology, it has extended into many aspects of social life, giving rise to various intelligent self-service systems, intelligent question-answering systems, and the like, so that many businesses have entered the era of digitization. For example, intelligent question-answering systems are commonly used to provide assistance and services to users, thereby conserving human resources, improving automation of information processing, and reducing operating costs.
For intelligent question-answering systems, the user consultation volume may surge under the influence of limited-time promotional services or newly launched services; it may also drop suddenly when the system behaves abnormally. Such data transactions (i.e., abrupt abnormal changes in the data) may be detrimental to the services provided by the intelligent question-answering system. Therefore, it is necessary to find data abnormalities such as a change in the consultation volume in real time and to find the cause of the abnormality accordingly, in order to take a corresponding coping strategy.
Existing real-time monitoring platforms are limited to using simple rules to discover abnormal changes, generally in one of two ways: 1. a year-over-year or period-over-period change in data volume that exceeds a certain magnitude is regarded as an abnormal change; 2. abnormal changes are found by adding several standard deviations to the mean. A common disadvantage of both approaches is that their logic is fixed and cannot be automatically optimized over time.
For the first approach (i.e., a year-over-year or period-over-period change exceeding a certain magnitude), the disadvantages are: 1. a fixed threshold must be set, and there is no reference standard for choosing it; 2. if an abnormality occurs on a given day, the rise exceeds the set threshold and is regarded as an abnormal change; however, when the data returns to normal the next day, the drop is also large enough to be regarded as abnormal, so that a false alarm is generated.
For the second approach (i.e., mean plus standard deviation), the disadvantages are: 1. this statistically common way of determining abnormal changes is only applicable to data that follows a normal distribution, so its usage scenarios are limited; 2. if a certain abnormal point deviates too far from the normal points, it pulls up the mean, so that other abnormal points that deviate less are regarded as normal points, producing missed reports (false negatives).
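The missed-report problem of the second approach can be illustrated with a small sketch. The consultation counts, the factor of 2 standard deviations, and the 50% median-based threshold below are all hypothetical values chosen for illustration; they do not come from the disclosure.

```python
from statistics import mean, median, pstdev

# Hypothetical consultation counts: one extreme spike (1000) and one milder spike (60).
counts = [25, 30, 25, 27, 24, 26, 60, 28, 25, 1000]

# Rule 2 from the background: flag values above mean + k * standard deviation (k = 2 assumed).
mu = mean(counts)
sigma = pstdev(counts)
flagged_by_mean_rule = [x for x in counts if x > mu + 2 * sigma]

# For comparison: flag values more than 50% above the median (threshold assumed).
med = median(counts)
flagged_by_median_rule = [x for x in counts if (x - med) / med > 0.5]

print(flagged_by_mean_rule)    # the extreme spike inflates mean and deviation, hiding 60
print(flagged_by_median_rule)  # the median is barely moved by the extreme spike
```

In this toy data the extreme value raises both the mean and the standard deviation so much that the milder spike falls below the cutoff, whereas the median stays close to the typical values.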
Thus, there is a need for an improvement over the above and other drawbacks of the prior art.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a method of discovering data transactions, comprising: receiving a data sequence originating from a business scenario; applying a sliding window to the data sequence; determining a median of the data within the sliding window; comparing each data point of the data within the sliding window to the median to determine whether the data point is within the range defined by an up-float threshold and a down-float threshold of the median, thereby determining whether there is a data transaction; and sliding the sliding window over the data sequence with a particular step size to take a next set of data and repeating the above steps of determining a median and determining whether there is a data transaction, wherein at least one of the sliding window size, the step size, the up-float threshold, and the down-float threshold is specific to data originating from different business scenarios.
According to an embodiment, the method comprises determining that the data point is a transaction data point when the ratio of the difference of the data point minus the median to the median is greater than the up-float threshold of the median, or the ratio of the difference of the median minus the data point to the median is greater than the down-float threshold of the median.
According to a further embodiment, the method further comprises, upon determining that a data transaction is present, recording and/or informing the user of such data transaction.
According to a further embodiment, the method further comprises receiving feedback from the user on data transactions and adjusting at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold based on the feedback.
According to a further embodiment, the method further comprises, upon receiving feedback from the user regarding data transaction recall, adjusting at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold such that the data point is no longer determined to be a transaction data point.
According to a further embodiment, the size of the sliding window, the step size, the float-up threshold and the float-down threshold are predefined or obtained by a training process based on historical data.
According to a further embodiment, the historical data is historical data tagged with data transactions, and the training process comprises: training on the historical data by using grid search over various combinations of different values of the sliding window size, step size, up-float threshold, and down-float threshold so as to mark data transactions in the historical data; comparing the marked data transactions with the data transaction tags of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, up-float threshold, and down-float threshold; and using the set of sliding window size, step size, up-float threshold, and down-float threshold that best matches the data transaction tags of the historical data, or that has the highest data transaction precision or the lowest data transaction recall.
According to a further embodiment, the data sequence is a real-time data stream and the method is performed in real-time.
According to a second aspect of the present disclosure, there is provided a system for discovering data transactions, comprising: a receiving component configured to receive a data sequence originating from a business scenario; a sliding window component configured to slide the sliding window over the data sequence in a particular step size to apply a sliding window to the data sequence; a median component configured to determine a median of data within the sliding window; and a comparison component configured to compare each data point of data within the sliding window to the median to determine whether the data point is within the range defined by an up-float threshold and a down-float threshold of the median, thereby determining whether there is a data transaction, wherein at least one of the sliding window size, the step size, the up-float threshold, and the down-float threshold differs for data originating from different business scenarios.
According to an embodiment, the comparison component is further configured to determine that the data point is a transaction data point when a ratio of the difference of the data point minus the median to the median is greater than an up-float threshold of the median or a ratio of the difference of the median minus the data point to the median is greater than a down-float threshold of the median.
According to a further embodiment, the system further comprises a notification component configured to notify the user of a data transaction after the comparison component determines that such a data transaction exists.
According to a further embodiment, the system further comprises an adjustment component configured to receive feedback on data transactions from the user and adjust at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold based on the feedback.
According to a further embodiment, the adjustment component is further configured to adjust at least one of the size of the sliding window, the step size, the float-up threshold, and the float-down threshold upon receiving feedback from the user regarding data transaction recall such that the data point is no longer determined to be a transaction data point.
According to a further embodiment, the size of the sliding window, the step size, the float-up threshold and the float-down threshold are predefined or obtained by a training process based on historical data.
According to a further embodiment, the historical data is historical data tagged with data transactions, and the training process comprises: training on the historical data by using grid search over various combinations of different values of the sliding window size, step size, up-float threshold, and down-float threshold so as to mark data transactions in the historical data; comparing the marked data transactions with the data transaction tags of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, up-float threshold, and down-float threshold; and using the set of sliding window size, step size, up-float threshold, and down-float threshold that best matches the data transaction tags of the historical data, or that has the highest data transaction precision or the lowest data transaction recall.
According to a further embodiment, the data sequence is a real-time data stream.
According to a third aspect of the present disclosure, there is provided a system for discovering data transactions, comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method according to the first aspect of the present disclosure.
Aspects generally include a method, apparatus, system, computer program product, and processing system substantially as described herein with reference to and as illustrated by the accompanying drawings.
The foregoing has outlined rather broadly the features and technical advantages of examples in accordance with the present disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter. The disclosed concepts and specific examples may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Such equivalent constructions do not depart from the scope of the appended claims. The features of the concepts disclosed herein, both as to their organization and method of operation, together with associated advantages, will be better understood from the following description when considered in connection with the accompanying drawings. Each of the figures is provided for the purpose of illustration and description and is not intended to limit the claims.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects. The same reference numbers in different drawings may identify the same or similar elements.
Fig. 1 is a flow chart illustrating an example method of discovering data transactions in accordance with aspects of the present disclosure.
Fig. 2 is a block diagram illustrating an example system for discovering data transactions in accordance with aspects of the present disclosure.
Fig. 3 is a schematic diagram illustrating another example system for discovering data transactions in accordance with aspects of the present disclosure.
Detailed Description
The detailed description set forth below in connection with FIGS. 1-3 is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. It will be apparent, however, to one skilled in the art that these concepts may be practiced without these specific details.
Fig. 1 illustrates an example method 100 for discovering data transactions in accordance with aspects of the present disclosure. The method 100 may include receiving a data sequence originating from a business scenario at block 110. For example, there may be various business scenarios such as credit card repayment, financial product revenue consultation, telecommunication/telephone packages, etc., and the intelligent question-answering system applied to these business scenarios may generate corresponding data sequences (i.e., data streams). The method 100 may receive such a data sequence. In an example, the data sequence is real-time data or historical data of a business scenario.
At block 120, the method 100 may include applying a sliding window to the data sequence. In this example, applying the sliding window to the data sequence includes applying the sliding window to the accumulated data sequence of the real-time data stream to determine data that falls within the sliding window. For example, where the size of the sliding window is N (where N is any natural number), applying the sliding window to the data sequence may result in N data in the data sequence that fall within the sliding window.
At block 130, the method 100 may include determining a median of the data within the sliding window. As those skilled in the art will appreciate, the median is the middle value of a set of data arranged in order; if the set contains an even number of values, the average of the two middle values may be returned.
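As a minimal sketch of the median rule just described (odd count: middle value; even count: average of the two middle values), the helper below is a hypothetical illustration, not code from the disclosure. Python's standard `statistics.median` behaves the same way.

```python
def window_median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("window must contain at least one value")
    ordered = sorted(values)          # arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = window_median([25, 30, 25, 40, 21])   # sorted: 21, 25, 25, 30, 40
even_result = window_median([25, 30, 25, 40])      # sorted: 25, 25, 30, 40
print(odd_result, even_result)
```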
In an embodiment, the median is determined based on a certain indicator of the data. For example, in a financial product revenue consultation scenario, the data indicator may be the consultation volume per unit time within the sliding window, in which case the median is the median of the consultation volumes per unit time within the sliding window.
At block 140, the method 100 may include comparing each data point of the data within the sliding window to the median to determine whether the data point is within the range defined by the up-float threshold and the down-float threshold of the median, thereby determining whether a data transaction exists.
In one embodiment, a data point is determined to be a transaction data point when the data point exceeds the up-float threshold or the down-float threshold of the median. For example, the up-float threshold and/or the down-float threshold may be set as thresholds on the ratio of the difference between the data point and the median to the median (e.g., 20% and 25%, respectively). In this example, the data point may be determined to be a transaction data point when the ratio of the difference of the data point minus the median to the median is greater than the up-float threshold of the median (i.e., 20% in the above example), or the ratio of the difference of the median minus the data point to the median is greater than the down-float threshold of the median (i.e., 25% in the above example).
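The criterion of this embodiment can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the disclosure: $x$ is a data point in the window, $m$ is the window median (assumed positive), and $\theta_{\text{up}}$ and $\theta_{\text{down}}$ are the up-float and down-float thresholds.

```latex
x \text{ is a transaction data point}
\iff
\frac{x - m}{m} > \theta_{\text{up}}
\quad\text{or}\quad
\frac{m - x}{m} > \theta_{\text{down}}
```

Equivalently, for $m > 0$, a data point is treated as normal when $m(1 - \theta_{\text{down}}) \le x \le m(1 + \theta_{\text{up}})$.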
As an example, suppose that in a financial product revenue consultation scenario the consultation volume per unit time within the sliding window is {25, 30, 25, 40, 21}, so that the median is 25. With the up-float threshold set to 20% and the down-float threshold set to 30%, the data point {40} may be determined to be a transaction data point because (40-25)/25 = 60%, which is greater than the 20% up-float threshold. The data point {21}, by contrast, may be determined not to be a transaction data point because (25-21)/25 = 16%, which is less than the 30% down-float threshold. Similarly, the data points {25} and {30} are not transaction data points: their ratios are 0% and (30-25)/25 = 20%, and 20% does not exceed the 20% up-float threshold.
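The worked example can be reproduced with a short sketch. The function name and signature are hypothetical, and the code assumes a positive median so that the ratios are well defined.

```python
from statistics import median


def find_transaction_points(window, up_threshold, down_threshold):
    """Return the values in `window` that lie outside the band around the median.

    Thresholds are fractions (0.20 means 20%). Assumes the median is positive.
    """
    m = median(window)
    flagged = []
    for x in window:
        if x > m and (x - m) / m > up_threshold:
            flagged.append(x)      # too far above the median
        elif x < m and (m - x) / m > down_threshold:
            flagged.append(x)      # too far below the median
    return flagged


window = [25, 30, 25, 40, 21]
result = find_transaction_points(window, up_threshold=0.20, down_threshold=0.30)
print(result)
```

With a tighter down-float threshold, for instance 10%, the value 21 would also be flagged, since (25-21)/25 = 16% exceeds 10%.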
In an embodiment, if it is determined that a data transaction exists, the method 100 may further include notifying the user of the data transaction. Those skilled in the art will appreciate that the user may be notified in various ways, such as sending the user an email or a short message, placing a phone call to the user, popping up a prompt box on the user's computing device, and so forth. In this embodiment, the user may make a determination about the data transaction based on this notification. For example, the user may determine whether the data transaction does exist by investigating the corresponding data, and the user may feed back this determination.
For example, if the user determines that the data transaction is not a true data transaction, the user may send feedback regarding recall (i.e., retraction) of the data transaction.
In this embodiment, the method 100 may further include, after receiving feedback on the data transaction from the user, adjusting at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold based on such feedback. For example, upon receiving feedback from a user regarding a data transaction recall, the method 100 may include adjusting at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold such that the data point is no longer determined to be a transaction data point, i.e., such that the data transaction is recalled.
In one example, the adjustment may be performed using a training process as described below. In this example, the adjustment may be made upon receiving user feedback or may be made after a predetermined number of user feedback is received.
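As a rough sketch of one possible adjustment policy, the code below widens whichever threshold flagged the recalled data point just enough that the point is no longer flagged. This direct update is an assumption made for illustration; it is a simpler stand-in for the training-based adjustment mentioned above, and the function name is hypothetical.

```python
def relax_thresholds(point, med, up_threshold, down_threshold):
    """Widen the threshold on the side of `point` so it is no longer flagged.

    Assumes `med` is positive and that flagging uses a strict "greater than" test.
    """
    if point > med:
        ratio = (point - med) / med
        up_threshold = max(up_threshold, ratio)
    elif point < med:
        ratio = (med - point) / med
        down_threshold = max(down_threshold, ratio)
    return up_threshold, down_threshold


# The user reports that the point 40 (median 25) was not a true data transaction.
new_up, new_down = relax_thresholds(40, 25, up_threshold=0.20, down_threshold=0.30)
still_flagged = (40 - 25) / 25 > new_up
print(new_up, new_down, still_flagged)
```

A policy like this only ever loosens the thresholds, so in practice it would likely need to be balanced against feedback about missed data transactions.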
At block 150, the method 100 may include sliding the sliding window over the data sequence by a particular step size to take the next set of data and repeating the steps in blocks 130 and 140.
In an example, where the data sequence is a real-time data stream originating from a business scenario, the method 100 may wait for this real-time data stream to fill the sliding window and then begin repeating the steps in blocks 130 and 140.
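For the real-time case, the following sketch waits until the window is full, evaluates it, and then re-evaluates after every `step` new data points. The generator name, its parameters, and the sample stream are hypothetical; the code assumes positive medians and a step of at least 1.

```python
from collections import deque
from statistics import median


def stream_detect(stream, window_size, step, up_threshold, down_threshold):
    """Yield (window contents, flagged values) each time a full window is evaluated."""
    window = deque(maxlen=window_size)   # the oldest points drop out automatically
    remaining = window_size              # points still needed before the next evaluation
    for value in stream:
        window.append(value)
        remaining -= 1
        if remaining > 0:
            continue                     # still filling the window or sliding by `step`
        remaining = step
        m = median(window)
        flagged = [
            x for x in window
            if (x - m) / m > up_threshold or (m - x) / m > down_threshold
        ]
        yield list(window), flagged


results = list(stream_detect([25, 30, 25, 40, 21, 26, 24], 5, 2, 0.20, 0.30))
for contents, flagged in results:
    print(contents, flagged)
```

Because consecutive windows overlap when the step is smaller than the window size, the same data point can be reported more than once; a caller may want to deduplicate by position in the stream.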
In the case where the data sequence is historical data of a business scenario, if the last data of the historical data does not exactly fill a sliding window (e.g., after sliding the sliding window over the historical data in a particular step size, the last data point of the historical data does not fall on the last position of the sliding window), the sliding window may be rolled back by a particular number of data points such that the last data point of the historical data falls exactly on the last position of the sliding window. Alternatively, in another example, the trailing data points of the historical data that are insufficient to fill the sliding window may simply be discarded.
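The two ways of handling the tail of the historical data can be sketched as a function that computes window start positions. The function and its `tail` parameter are hypothetical names introduced for this illustration.

```python
def window_starts(n_points, window_size, step, tail="rollback"):
    """Return the start index of each window over a series of `n_points` values.

    tail="rollback": add a final window whose last position is the last data point.
    tail="discard":  drop trailing points that do not fill a window.
    """
    if tail not in ("rollback", "discard"):
        raise ValueError("tail must be 'rollback' or 'discard'")
    if n_points < window_size:
        return []
    starts = list(range(0, n_points - window_size + 1, step))
    covered_until = starts[-1] + window_size      # exclusive end of the last window
    if covered_until < n_points and tail == "rollback":
        starts.append(n_points - window_size)     # rolled-back final window
    return starts


rollback_starts = window_starts(10, window_size=5, step=2, tail="rollback")
discard_starts = window_starts(10, window_size=5, step=2, tail="discard")
print(rollback_starts, discard_starts)
```

With 10 points, a window of 5, and a step of 2, the regular windows start at 0, 2, and 4, leaving the last point uncovered; the rollback option adds a window starting at 5 so that the last point lands on the final position.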
In an embodiment, at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold differs for data originating from different scenarios. In general, the magnitude of the data may differ between scenarios, as may the periodicity of the data, and thus the applicable sliding window size, step size, up-float threshold, and down-float threshold may also differ. For example, for Ant Financial products, since payouts are generally concentrated on the 10th of each month, the size of the sliding window may be set to around one month, and the step size of the sliding window may accordingly be set to one month or less (e.g., one day, one week, 10 days, one month, etc.); for Yu'e Bao revenue consultation, which is typically concentrated on Mondays, the size of the sliding window may be set to around one week, and the step size of the sliding window may accordingly be set to one week or less (e.g., one day, two days, one week, etc.).
In an embodiment, the size of the sliding window, the step size, the up-float threshold, and the down-float threshold are predefined. For example, when no historical data is available, the user may predefine the size of the sliding window, the step size, and the thresholds.
In another embodiment, the size of the sliding window, the step size, the up-float threshold, and the down-float threshold are obtained through a training process based on historical data. In this embodiment, the historical data is historical data tagged with data transactions, and the training process includes: training on the historical data by using grid search over various combinations of different values of the sliding window size, step size, up-float threshold, and down-float threshold so as to mark data transactions in the historical data; comparing the marked data transactions with the data transaction tags of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, up-float threshold, and down-float threshold; and using the set of sliding window size, step size, up-float threshold, and down-float threshold that best matches the data transaction tags of the historical data, or that has the highest data transaction precision or the lowest data transaction recall.
By way of illustration, one specific example of this training process is given below in connection with a consultation scenario. Those skilled in the art will appreciate that this example is provided for illustrative purposes only. It will be appreciated that for simplicity, in some examples, the step size of the sliding window is set to 2. However, as mentioned above, the step size of the sliding window may also be set to any size and may also be trained by a training process.
For given time series data (e.g., time series data of accumulated consultation volume covering 30 days or more and ending at 10:05 am), the sliding window size range is set to [7, 21], the up-float threshold range to [0.2%, 1.5%], and the down-float threshold range to [0.2%, 1.5%].
Grid search then traverses the sliding window size range, the up-float threshold range, and the down-float threshold range in increments of 1, 0.05%, and 0.05%, respectively. For each combination, the median is calculated on the given time series data, and the time series data in each sliding window is compared with the median of that window to determine whether it exceeds the up-float threshold or the down-float threshold; the corresponding data points are marked as abnormal or non-abnormal points in turn. The window is slid forward with a step size of 2, and data points continue to be marked until all data points have been marked. The marking result is then compared with the abnormal value tags of the historical time series data to obtain the anomaly recall rate and precision rate under that sliding window size, up-float threshold, and down-float threshold. The search then moves on to the next combination of sliding window size, up-float threshold, and down-float threshold (i.e., the sliding window size increases from 7 to 21 in increments of 1, and the up-float and down-float thresholds increase from 0.2% to 1.5% in increments of 0.05%) until the search is complete.
Finally, the set of parameter values (i.e., the combination of the sliding window size, the up-float threshold, and the down-float threshold) with the highest anomaly annotation precision, the lowest anomaly annotation recall, or the best consistency with the abnormal value tags of the historical time series data is used as the optimal parameters and stored together with the time, data indicator, and scenario to which the time series data corresponds, for use in the methods of the present disclosure.
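The training procedure can be sketched as a small grid search. Everything below is illustrative: the function names, the toy series, its labels, and the candidate values are hypothetical, and the selection rule (highest precision, ties broken by higher recall) is only one of the criteria the text allows. The ranges from the example above could be substituted for the toy candidate lists.

```python
from itertools import product
from statistics import median


def mark(series, window_size, step, up, down):
    """Return the set of indices marked as transaction points (positive medians assumed)."""
    marked = set()
    for start in range(0, len(series) - window_size + 1, step):
        window = series[start:start + window_size]
        m = median(window)
        for offset, x in enumerate(window):
            if (x - m) / m > up or (m - x) / m > down:
                marked.add(start + offset)
    return marked


def precision_recall(marked, labeled):
    """Compare marked indices with the labeled indices of the historical data."""
    true_positives = len(marked & labeled)
    precision = true_positives / len(marked) if marked else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


def grid_search(series, labeled, window_sizes, up_values, down_values, step=2):
    """Return (precision, recall, window_size, up, down) for the best combination."""
    best = None
    for w, up, down in product(window_sizes, up_values, down_values):
        p, r = precision_recall(mark(series, w, step, up, down), labeled)
        # Assumed criterion: highest precision, then highest recall; first found wins ties.
        if best is None or (p, r) > best[:2]:
            best = (p, r, w, up, down)
    return best


# Toy labeled history: indices 4 and 10 are tagged as data transactions.
series = [100, 101, 99, 100, 150, 100, 102, 98, 100, 101, 60, 100]
labeled = {4, 10}
best = grid_search(series, labeled, [5, 7], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6])
print(best)
```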
Fig. 2 is a block diagram illustrating an example system 200 for discovering data transactions in accordance with aspects of the present disclosure.
As shown in fig. 2, a system 200 for discovering data transactions may include a receiving component 202 configured to receive a data sequence originating from a business scenario; a sliding window component 204 configured to slide the sliding window over the data sequence in a particular step size to apply a sliding window to the data sequence; a median component 206 configured to determine a median of data within the sliding window; and a comparison component 208 configured to compare each data point of the data within the sliding window to the median to determine whether the data point is within the range defined by the up-float threshold and the down-float threshold of the median, thereby determining whether a data transaction exists.
In an embodiment, at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold differs for data originating from different business scenarios.
In another embodiment, the comparison component 208 is further configured to determine that the data point is a transaction data point when the ratio of the difference of the data point minus the median to the median is greater than an up-float threshold of the median, or the ratio of the difference of the median minus the data point to the median is greater than a down-float threshold of the median.
In yet another embodiment, the system 200 can optionally further include a notification component 210 configured to notify a user of a data transaction after the comparison component determines that such a data transaction exists.
In yet another embodiment, system 200 can also optionally include an adjustment component 212 configured to receive feedback from the user on data transactions and adjust at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold based on the feedback. In this embodiment, adjustment component 212 may be further configured to adjust at least one of the size of the sliding window, the step size, the up-float threshold, and the down-float threshold upon receiving feedback from the user regarding data transaction recall such that the data point is no longer determined to be a transaction data point.
In a further embodiment, the size of the sliding window, the step size, the up-float threshold, and the down-float threshold are predefined or obtained through a training process based on historical data. In this embodiment, the historical data is historical data tagged with data transactions, and the training process includes: training on the historical data by using grid search over various combinations of different values of the sliding window size, step size, up-float threshold, and down-float threshold so as to mark data transactions in the historical data; comparing the marked data transactions with the data transaction tags of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, up-float threshold, and down-float threshold; and using the set of sliding window size, step size, up-float threshold, and down-float threshold that best matches the data transaction tags of the historical data, or that has the highest data transaction precision or the lowest data transaction recall.
Fig. 3 is a schematic diagram illustrating an example system 300 for discovering data transactions in accordance with aspects of the present disclosure. As shown, system 300 includes a processor 305 and a memory 310. Memory 310 stores computer executable instructions that are executable by processor 305 to implement the method described above in connection with fig. 1.
As described above, in the methods and systems of the present disclosure, the sliding window size, sliding step size, float-up threshold, float-down threshold, etc. need not be fixed but may be dynamically adjusted over time, making the methods and systems well suited for multi-scenario data transaction detection.
Meanwhile, the method and system of the present disclosure use a sliding median method and train iteratively on historical data or real-time data to obtain the most suitable sliding window size, sliding step size, float-up threshold, and float-down threshold, thereby overcoming the problem that fluctuation amplitudes differ between periods because of the variance of a long time series, and greatly improving the accuracy and recall rate of local short-term (for example, the most recent month or the most recent two weeks) anomaly detection. Moreover, the method and system of the present disclosure can also learn iteratively from user feedback to optimize detection performance. Thus, the methods and systems of the present disclosure are applicable to all kinds of data and are not limited to data that follows a normal distribution. Also, the median is more stable than the mean and is not unduly affected by individual outlying values.
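By way of non-limiting illustration, the sliding median method could be sketched as follows. The function name `find_transaction_points` and its parameter names are hypothetical. The sketch assumes positive-valued data, so that the ratio of the deviation to the median is well defined, and it skips any window whose median is zero.

```python
from statistics import median
from typing import List, Sequence


def find_transaction_points(
    data: Sequence[float],
    window_size: int,
    step: int,
    up_threshold: float,
    down_threshold: float,
) -> List[int]:
    """Return the sorted indices of points flagged as data transactions."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    flagged = set()
    # Slide the window over the sequence; a short sequence yields one window.
    for start in range(0, max(len(data) - window_size, 0) + 1, step):
        window = data[start:start + window_size]
        if not window:
            break  # empty input
        med = median(window)
        if med == 0:
            continue  # ratio undefined; assumption: skip this window
        for offset, value in enumerate(window):
            floats_up = (value - med) / med > up_threshold
            floats_down = (med - value) / med > down_threshold
            if floats_up or floats_down:
                flagged.add(start + offset)
    return sorted(flagged)
```

For the sequence `[100, 102, 98, 101, 300, 99, 100, 20, 101, 100]` with a window of 5, a step of 1, and both thresholds at 0.5, the sketch flags indices 4 and 7. The first window has a median of 101 but a mean of 140.2, which shows how a single outlying value shifts the mean far more than the median.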
The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that can be practiced. These embodiments are also referred to herein as "examples". Such examples may include elements other than those shown or described. However, examples including only the elements shown or described are also contemplated. Moreover, also contemplated are examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof) or with respect to other examples (or one or more aspects thereof) shown or described herein.
In the appended claims, the terms "including" and "comprising" are open-ended; that is, a system, apparatus, article, or process that includes elements in addition to those recited after such a term in a claim is still deemed to fall within the scope of that claim. Furthermore, in the appended claims, the terms "first," "second," and "third," etc. are used merely as labels, and are not intended to indicate the numerical order of their objects.
In addition, the order of the operations illustrated in the present specification is exemplary. In alternative embodiments, the operations may be performed in a different order than shown in the figures, and the operations may be combined into a single operation or split into more operations.
The above description is intended to be illustrative, and not restrictive. For example, the examples described above (or one or more aspects thereof) may be used in connection with other embodiments. Other embodiments may be used, such as by one of ordinary skill in the art after reviewing the above description. The abstract allows the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Furthermore, in the above detailed description, various features may be grouped together to streamline the disclosure. However, the claims may not state every feature disclosed herein, as embodiments may characterize a subset of the features. Further, embodiments may include fewer features than are disclosed in the specific examples. Thus the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment. The scope of the embodiments disclosed herein should be determined with reference to the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims (10)

1. A method of discovering data transactions, comprising:
receiving a data sequence from a traffic scenario;
applying a sliding window to the data sequence;
determining a median of the data within the sliding window;
comparing each data point of the data within the sliding window to the median to determine if the data point is within a range of an up-float threshold and a down-float threshold of the median to determine if there is a data transaction; and
sliding the sliding window over the data sequence with a particular step size to remove a set of data and repeating the above steps of determining a median and determining whether there is a data transaction, wherein at least one of the sliding window size, the step size, the float-up threshold, and the float-down threshold is specific to data originating from different traffic scenarios, the sliding window size, the step size, the float-up threshold, and the float-down threshold being obtained through a training process based on historical data;
the method further comprises the steps of:
after determining that there is a data transaction, notifying the user of the data transaction;
receiving feedback from the user on data transactions and adjusting at least one of the size of the sliding window, the step size, the float-up threshold, and the float-down threshold based on the feedback;
wherein the historical data is historical data with data transaction labels, and the training process comprises:
training on the historical data using grid search over various combinations of different values of the sliding window size, step size, float-up threshold, and float-down threshold so as to mark data transactions in the historical data;
comparing the marked data transactions with the data transaction labels of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, float-up threshold, and float-down threshold; and
using the set of sliding window size, step size, float-up threshold, and float-down threshold whose marked data transactions match the data transaction labels of the historical data, or that has the highest data transaction accuracy or the lowest data transaction recall.
2. The method of claim 1, wherein the data point is determined to be a transaction data point when a ratio of the difference of the data point minus the median to the median is greater than an up-float threshold of the median or a ratio of the difference of the median minus the data point to the median is greater than a down-float threshold of the median.
3. The method of claim 2, further comprising, after determining that a data transaction exists, recording the data transaction.
4. The method of claim 3, wherein the adjusting comprises adjusting at least one of the size of the sliding window, the step size, the float-up threshold, and the float-down threshold after receiving feedback from the user regarding data recall, such that the data point is no longer determined to be a transaction data point.
5. The method of claim 1, wherein the data sequence is a real-time data stream, and the method is performed in real-time.
6. A system for discovering data transactions, comprising:
a receiving component configured to receive a data sequence originating from a traffic scenario;
a sliding window component configured to slide the sliding window over the data sequence in a particular step size to apply a sliding window to the data sequence;
a median component configured to determine a median of data within the sliding window;
a comparison component configured to compare each data point of data within the sliding window to the median to determine whether the data point is within a range of an up-float threshold and a down-float threshold of the median to determine whether there is a data transaction, wherein at least one of the sliding window size, the step size, the up-float threshold, and the down-float threshold differs for data originating from different traffic scenarios, the sliding window size, the step size, the up-float threshold, and the down-float threshold being obtained through a training process based on historical data;
the system further comprises:
a notification component configured to notify a user of a data transaction after the comparison component determines that there is such a data transaction;
an adjustment component configured to receive feedback from the user on data transactions and adjust at least one of the size of the sliding window, the step size, the float-up threshold, and the float-down threshold based on the feedback;
wherein the historical data is historical data with data transaction labels, and the training process comprises:
training on the historical data using grid search over various combinations of different values of the sliding window size, step size, float-up threshold, and float-down threshold so as to mark data transactions in the historical data;
comparing the marked data transactions with the data transaction labels of the historical data to obtain the data transaction recall rate and precision rate under the currently used sliding window size, step size, float-up threshold, and float-down threshold; and
using the set of sliding window size, step size, float-up threshold, and float-down threshold whose marked data transactions match the data transaction labels of the historical data, or that has the highest data transaction accuracy or the lowest data transaction recall.
7. The system of claim 6, wherein the comparison component is further configured to determine that the data point is a transaction data point when a ratio of a difference of the data point minus the median to the median is greater than an up-float threshold of the median or a ratio of the difference of the median minus the data point to the median is greater than a down-float threshold of the median.
8. The system of claim 7, wherein the adjustment component is further configured to adjust at least one of the size of the sliding window, the step size, the float-up threshold, and the float-down threshold upon receiving feedback from the user regarding data recall such that the data point is no longer determined to be a transaction data point.
9. The system of claim 7, wherein the data sequence is a real-time data stream.
10. A system for discovering data transactions, comprising:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910677352.3A CN110457367B (en) 2019-07-25 2019-07-25 Method and system for discovering data transaction

Publications (2)

Publication Number Publication Date
CN110457367A (en) 2019-11-15
CN110457367B (en) 2023-10-27

Family

ID=68483435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910677352.3A Active CN110457367B (en) 2019-07-25 2019-07-25 Method and system for discovering data transaction

Country Status (1)

Country Link
CN (1) CN110457367B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant