WO2021180009A1 - Data detection method and device - Google Patents

Data detection method and device

Info

Publication number
WO2021180009A1
WO2021180009A1 PCT/CN2021/079361 CN2021079361W WO2021180009A1 WO 2021180009 A1 WO2021180009 A1 WO 2021180009A1 CN 2021079361 W CN2021079361 W CN 2021079361W WO 2021180009 A1 WO2021180009 A1 WO 2021180009A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
volumes
transaction volume
interval
volume
Prior art date
Application number
PCT/CN2021/079361
Other languages
English (en)
French (fr)
Inventor
袁敏
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021180009A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452: Performance evaluation by statistical analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3495: Performance evaluation by tracing or monitoring for systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • The embodiments of the present application relate to the field of financial technology (Fintech), and in particular to a data detection method and device.
  • The direct manifestation in the data is that the transaction volume surges rapidly, or that the transaction request volume from a certain IP (Internet Protocol) address surges. The availability and stability of the current financial system services can therefore be monitored in real time through data anomaly detection. Once abnormal data is detected, an early warning is issued so that the company can take corresponding measures.
  • This application provides a data detection method and device to accurately determine whether the transaction data is abnormal.
  • In a first aspect, an embodiment of the present application provides a data detection method.
  • The method includes: obtaining a first transaction volume corresponding to a first sampling time and N second transaction volumes corresponding to N sampling times before the first sampling time, where N is an integer greater than 1; determining, according to M selected outlier determination methods and the N second transaction volumes, M detection results corresponding to the first transaction volume, where each outlier determination method corresponds to one detection result and M is an integer greater than 1; voting on the M detection results according to a preset voting method to determine the number of votes corresponding to the first transaction volume; and, if the number of votes corresponding to the first transaction volume is greater than a preset vote-count threshold, determining that the first transaction volume is abnormal.
  • Under each outlier determination method, the N second transaction volumes are used to determine a detection result for the first transaction volume; the number of votes corresponding to the first transaction volume is then determined from the detection results and the preset voting method; when the number of votes is greater than the preset vote-count threshold, the first transaction volume is determined to be abnormal.
  • Voting on the M detection results according to a preset voting method and determining the number of votes corresponding to the first transaction volume includes: determining the number of votes allotted to each of the M outlier determination methods according to the total number of votes corresponding to the voting method and the historical voting accuracy of each of the M outlier determination methods; and determining the number of votes corresponding to the first transaction volume according to the votes allotted to the M outlier determination methods and the M detection results corresponding to the first transaction volume.
  • A weight value can be determined for each outlier determination method; combined with the total number of votes corresponding to the voting method, the number of votes allotted to each outlier determination method can then be determined. Specifically, when determining whether the first transaction volume is abnormal, the detection result of the first transaction volume under each outlier determination method is determined, the number of votes corresponding to each detection result is counted, and the number of votes corresponding to the first transaction volume is thereby obtained.
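The weighted voting described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text does not give the allocation formula, so the sketch assumes that each method's votes are proportional to its historical accuracy and that only methods reporting "abnormal" contribute votes. The function names and the dictionary-based interface are hypothetical.

```python
from typing import Dict


def allocate_votes(total_votes: float, accuracy: Dict[str, float]) -> Dict[str, float]:
    """Split total_votes across methods in proportion to historical accuracy.

    Proportional allocation is an assumption; the source only says that the
    votes depend on the total number of votes and the historical accuracy.
    """
    norm = sum(accuracy.values())
    if norm <= 0:
        raise ValueError("at least one method needs a positive accuracy")
    return {name: total_votes * acc / norm for name, acc in accuracy.items()}


def tally_votes(votes: Dict[str, float], is_abnormal: Dict[str, bool]) -> float:
    """Sum the votes of every method whose detection result is 'abnormal'."""
    return sum(v for name, v in votes.items() if is_abnormal[name])


def first_volume_is_abnormal(total_votes: float,
                             accuracy: Dict[str, float],
                             is_abnormal: Dict[str, bool],
                             threshold: float) -> bool:
    """Return True when the tallied votes exceed the preset threshold."""
    votes = allocate_votes(total_votes, accuracy)
    return tally_votes(votes, is_abnormal) > threshold
```

With a total of 8 votes and accuracies of 0.5, 0.25 and 0.25, the three methods would receive 4, 2 and 2 votes respectively under this assumption.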
  • The M outlier determination methods include the Laida (3σ) criterion. Determining the detection result corresponding to the first transaction volume according to the Laida criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the mean and standard deviation of the L second transaction volumes; and, if the first transaction volume belongs to a first interval, determining that the first transaction volume is normal. The lower limit of the first interval is the mean minus the standard deviation multiplied by a preset first parameter value, and the upper limit of the first interval is the mean plus the standard deviation multiplied by the preset first parameter value.
  • The M outlier determination methods include the Tukey criterion. Determining the detection result corresponding to the first transaction volume according to the Tukey criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the one-quarter quantile and the three-quarter quantile of the L second transaction volumes; and, if the first transaction volume belongs to a second interval, determining that the first transaction volume is normal. The lower limit of the second interval is the one-quarter quantile minus the interquartile range multiplied by a preset second parameter value, and the upper limit of the second interval is the three-quarter quantile plus the interquartile range multiplied by the preset second parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile.
  • The M outlier determination methods include a ring comparison criterion. Determining the detection result corresponding to the first transaction volume according to the ring comparison criterion and the N second transaction volumes includes: determining a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the transaction volume closest to the first sampling time among the N second transaction volumes; and, if the first ratio belongs to a third interval, determining that the first transaction volume is normal, where the third interval is a normal value interval.
  • The lower limit of the third interval is the first mean minus the first standard deviation multiplied by a preset third parameter value, and the upper limit of the third interval is the first mean plus the first standard deviation multiplied by the preset third parameter value. The first standard deviation and the first mean are determined according to the N-1 ratios corresponding to the N transaction volumes, where the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • Alternatively, the lower limit of the third interval is the one-quarter quantile minus the interquartile range multiplied by a preset fourth parameter value, and the upper limit of the third interval is the three-quarter quantile plus the interquartile range multiplied by the preset fourth parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile, both of which are determined according to the N-1 ratios corresponding to the N transaction volumes, where the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • The M outlier determination methods include a year-on-year criterion. Determining the detection result corresponding to the first transaction volume according to the year-on-year criterion and the N second transaction volumes includes: determining a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same period on different days; and, if the second ratio belongs to a fourth interval, determining that the first transaction volume is normal, where the fourth interval is a normal value interval.
  • The lower limit of the fourth interval is the second mean minus the second standard deviation multiplied by a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the second standard deviation multiplied by the preset fifth parameter value. The second standard deviation and the second mean are determined according to X ratios corresponding to the N transaction volumes, where the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes in the same period on different days among the N transaction volumes.
  • Alternatively, the lower limit of the fourth interval is the one-quarter quantile minus the interquartile range multiplied by a preset sixth parameter value, and the upper limit of the fourth interval is the three-quarter quantile plus the interquartile range multiplied by the preset sixth parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile, both of which are determined according to the X ratios.
  • The abnormal value range and normal value range corresponding to each outlier determination method can thus be determined, in order to determine the detection result of the first transaction volume under each outlier determination method.
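The quartile-based variant of the year-on-year check can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the fourth transaction volume is assumed to be the most recent same-period sample, the interval is treated as closed, the default of 1.5 for the sixth parameter value is a placeholder, and the quartiles are computed with Python's `statistics.quantiles` using the inclusive method. The function name is hypothetical.

```python
from statistics import quantiles
from typing import Sequence


def year_on_year_is_normal(first_volume: float,
                           same_period: Sequence[float],
                           k: float = 1.5) -> bool:
    """Check the second ratio against a quartile-based fourth interval.

    same_period holds the Y volumes sampled in the same period on earlier
    days, oldest first. k stands in for the preset sixth parameter value.
    """
    if len(same_period) < 3 or any(v <= 0 for v in same_period):
        raise ValueError("need at least three positive same-period volumes")
    # X ratios: every two adjacent volumes among the Y same-period volumes.
    ratios = [later / earlier for earlier, later in zip(same_period, same_period[1:])]
    q1, _, q3 = quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    # Second ratio: first volume over the most recent same-period volume
    # (which sample serves as the "fourth transaction volume" is an assumption).
    second_ratio = first_volume / same_period[-1]
    return q1 - k * iqr <= second_ratio <= q3 + k * iqr
```

The mean and standard deviation variant has the same shape: only the two lines that compute the interval bounds change.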
  • In a second aspect, an embodiment of the present application provides a data detection device, which includes: a transaction volume acquisition unit, configured to acquire a first transaction volume corresponding to a first sampling time and N second transaction volumes corresponding to N sampling times before the first sampling time, where N is an integer greater than 1;
  • a detection result determination unit, configured to determine M detection results corresponding to the first transaction volume according to M selected outlier determination methods and the N second transaction volumes, where each outlier determination method corresponds to one detection result and M is an integer greater than 1; a voting number determination unit, configured to vote on the M detection results according to a preset voting method to determine the number of votes corresponding to the first transaction volume; and an abnormality determination unit, configured to determine that the first transaction volume is abnormal if the number of votes corresponding to the first transaction volume is greater than a preset vote-count threshold.
  • Under each outlier determination method, the N second transaction volumes are used to determine a detection result for the first transaction volume; the number of votes corresponding to the first transaction volume is then determined from the detection results and the preset voting method; when the number of votes is greater than the preset vote-count threshold, the first transaction volume is determined to be abnormal.
  • The voting number determination unit is specifically configured to: determine the number of votes allotted to each of the M outlier determination methods according to the total number of votes corresponding to the voting method and the historical voting accuracy of each of the M outlier determination methods; and determine the number of votes corresponding to the first transaction volume according to the votes allotted to the M outlier determination methods and the M detection results corresponding to the first transaction volume.
  • A weight value can be determined for each outlier determination method; combined with the total number of votes corresponding to the voting method, the number of votes allotted to each outlier determination method can then be determined. Specifically, when determining whether the first transaction volume is abnormal, the detection result of the first transaction volume under each outlier determination method is determined, the number of votes corresponding to each detection result is counted, and the number of votes corresponding to the first transaction volume is thereby obtained.
  • The M outlier determination methods include the Laida (3σ) criterion. The detection result determination unit is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determine the mean and standard deviation of the L second transaction volumes; and, if the first transaction volume belongs to a first interval, determine that the first transaction volume is normal. The lower limit of the first interval is the mean minus the standard deviation multiplied by a preset first parameter value, and the upper limit of the first interval is the mean plus the standard deviation multiplied by the preset first parameter value.
  • The M outlier determination methods include the Tukey criterion. The detection result determination unit is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determine the one-quarter quantile and the three-quarter quantile of the L second transaction volumes; and, if the first transaction volume belongs to a second interval, determine that the first transaction volume is normal. The lower limit of the second interval is the one-quarter quantile minus the interquartile range multiplied by a preset second parameter value, and the upper limit of the second interval is the three-quarter quantile plus the interquartile range multiplied by the preset second parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile.
  • The M outlier determination methods include a ring comparison criterion. The detection result determination unit is specifically configured to: determine a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the transaction volume closest to the first sampling time among the N second transaction volumes; and, if the first ratio belongs to a third interval, determine that the first transaction volume is normal, where the third interval is a normal value interval.
  • The lower limit of the third interval is the first mean minus the first standard deviation multiplied by a preset third parameter value, and the upper limit of the third interval is the first mean plus the first standard deviation multiplied by the preset third parameter value. The first standard deviation and the first mean are determined according to the N-1 ratios corresponding to the N transaction volumes, where the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • Alternatively, the lower limit of the third interval is the one-quarter quantile minus the interquartile range multiplied by a preset fourth parameter value, and the upper limit of the third interval is the three-quarter quantile plus the interquartile range multiplied by the preset fourth parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile, both of which are determined according to the N-1 ratios corresponding to the N transaction volumes, where the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • The M outlier determination methods include a year-on-year criterion. The detection result determination unit is specifically configured to: determine a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same period on different days; and, if the second ratio belongs to a fourth interval, determine that the first transaction volume is normal, where the fourth interval is a normal value interval.
  • The lower limit of the fourth interval is the second mean minus the second standard deviation multiplied by a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the second standard deviation multiplied by the preset fifth parameter value. The second standard deviation and the second mean are determined according to X ratios corresponding to the N transaction volumes, where the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes in the same period on different days among the N transaction volumes.
  • Alternatively, the lower limit of the fourth interval is the one-quarter quantile minus the interquartile range multiplied by a preset sixth parameter value, and the upper limit of the fourth interval is the three-quarter quantile plus the interquartile range multiplied by the preset sixth parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile, both of which are determined according to the X ratios.
  • The abnormal value range and normal value range corresponding to each outlier determination method can thus be determined, in order to determine the detection result of the first transaction volume under each outlier determination method.
  • An embodiment of the present application provides a computing device, including:
  • a memory, configured to store program instructions; and
  • a processor, configured to call the program instructions stored in the memory and, according to the obtained program, execute the method of the first aspect or of any implementation of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium that stores computer-executable instructions, where the computer-executable instructions are used to cause a computer to execute the method of the first aspect or of any implementation of the first aspect.
  • Fig. 1 shows a data detection method provided by an embodiment of the present application.
  • Fig. 2 shows a data detection device provided by an embodiment of the present application.
  • A data detection method provided by an embodiment of the present application includes the following steps:
  • Step 101: Obtain a first transaction volume corresponding to a first sampling time and N second transaction volumes corresponding to N sampling times before the first sampling time, where N is an integer greater than 1.
  • Step 102: Determine M detection results corresponding to the first transaction volume according to the M selected outlier determination methods and the N second transaction volumes, where each outlier determination method corresponds to one detection result and M is an integer greater than 1.
  • Step 103: Vote on the M detection results according to a preset voting method, and determine the number of votes corresponding to the first transaction volume.
  • Step 104: If the number of votes corresponding to the first transaction volume is greater than a preset vote-count threshold, determine that the first transaction volume is abnormal.
  • Under each outlier determination method, the N second transaction volumes are used to determine a detection result for the first transaction volume; the number of votes corresponding to the first transaction volume is then determined from the detection results and the preset voting method; when the number of votes is greater than the preset vote-count threshold, the first transaction volume is determined to be abnormal.
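Steps 101 to 104 can be sketched end to end as follows. This is a simplified illustration rather than the method as claimed: each detector casts exactly one vote (the source also allows weighting by historical accuracy), and the three toy detectors are placeholders invented for the example, not the criteria described in this application. All names are hypothetical.

```python
from typing import Callable, Sequence

# A detector returns True when it judges the first transaction volume abnormal.
Detector = Callable[[float, Sequence[float]], bool]


def detect_abnormal(first_volume: float,
                    history: Sequence[float],
                    detectors: Sequence[Detector],
                    vote_threshold: int) -> bool:
    """Run M detectors on the first volume and vote on their results.

    first_volume and history are the inputs obtained in step 101.
    """
    if len(history) < 2:
        raise ValueError("N must be an integer greater than 1")
    if len(detectors) < 2:
        raise ValueError("M must be an integer greater than 1")
    # Step 102: one detection result per outlier determination method.
    results = [detector(first_volume, history) for detector in detectors]
    # Step 103: equal-weight voting (an assumption made for this sketch).
    votes = sum(1 for abnormal in results if abnormal)
    # Step 104: abnormal only if the votes exceed the preset threshold.
    return votes > vote_threshold


# Toy detectors, purely illustrative.
def above_twice_max(x: float, h: Sequence[float]) -> bool:
    return x > 2 * max(h)


def below_half_min(x: float, h: Sequence[float]) -> bool:
    return x < 0.5 * min(h)


def far_from_last(x: float, h: Sequence[float]) -> bool:
    return abs(x - h[-1]) > 0.5 * h[-1]
```

Keeping the detectors as plain callables means any two or more of the criteria can be combined without changing the voting logic.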
  • The data generated by the financial system includes, but is not limited to, transaction data generated by the transaction system, account opening request data generated by the account opening system, and clearing data generated by the clearing system.
  • In the embodiment of the present application, transaction data generated by the transaction system is taken as an example for description.
  • The transaction data generated by the transaction system can be processed according to the dimensions and time granularity that anomaly detection is concerned with, for example the number of transaction requests initiated per minute, the proportion of failed transactions per minute, or the transaction volume initiated per minute from a certain IP address. In the embodiment of the present application, the number of transaction requests initiated per minute is taken as an example for description.
  • The sampling interval can be one minute, one hour, or any user-defined value. In the embodiment of the present application, one minute is taken as an example for description.
  • The first sampling time may be the current minute, and the first transaction volume may be the number of transaction requests initiated in the current minute.
  • The N sampling times closest to the first sampling time may be every minute in the most recent week, every minute in the most recent month, or a user-defined range.
  • In the embodiment of the present application, every minute of the most recent 100 days is taken as an example of the N sampling times closest to the first sampling time.
  • The number of transaction requests initiated at other sampling times today can be handled by analogy.
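Several of the criteria described in this application compare the current sample only with samples from the same period on earlier days. A minimal sketch of that selection is shown below, assuming the history is a gap-free list of per-minute volumes ordered oldest first, so that one day spans 1440 entries. The function name and the flat-list layout are assumptions made for illustration.

```python
from typing import List, Sequence


def same_period_samples(history: Sequence[float], period: int = 1440) -> List[float]:
    """Return the historical volumes sampled at the same slot as the next sample.

    history[i] is the volume at sampling index i, and the new (first)
    transaction volume is assumed to arrive at index len(history). With
    one-minute sampling, period=1440 selects the same minute on earlier days.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # Indices congruent to len(history) modulo period share the same slot.
    start = len(history) % period
    return list(history[start::period])
```

If fewer than `period` samples are available, the result is an empty list, which callers would need to treat as insufficient history.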
  • In step 102, a variety of outlier determination methods are available for detecting abnormal values in data.
  • For example, detection can be based on the Laida (3σ) criterion, the Tukey criterion, the ring comparison criterion, the year-on-year criterion, or other mathematical statistical methods.
  • Four outlier determination methods, namely the 3σ criterion, the Tukey criterion, the ring comparison criterion, and the year-on-year criterion, are selected to determine whether the latest data is an outlier.
  • Whether the first transaction volume at 00:00:00 today is abnormal is determined according to the 3σ criterion and the number of transaction requests initiated every minute in the last 100 days, which gives the detection result of the first transaction volume under the 3σ criterion.
  • The same determination is made according to the Tukey criterion and the number of transaction requests initiated every minute in the last 100 days, which gives the detection result of the first transaction volume under the Tukey criterion.
  • The same determination is made according to the ring comparison criterion and the number of transaction requests initiated every minute in the last 100 days, which gives the detection result of the first transaction volume under the ring comparison criterion; and according to the year-on-year criterion and the number of transaction requests initiated every minute in the last 100 days, which gives the detection result of the first transaction volume under the year-on-year criterion.
  • The selected outlier determination methods can be any two of the four methods (3σ criterion, Tukey criterion, ring comparison criterion, and year-on-year criterion), for example the 3σ criterion and the Tukey criterion, the 3σ criterion and the ring comparison criterion, or the Tukey criterion and the year-on-year criterion.
  • Alternatively, the selected outlier determination methods can be any three of the four methods, for example the 3σ criterion, the Tukey criterion, and the ring comparison criterion, or the Tukey criterion, the ring comparison criterion, and the year-on-year criterion.
  • The M detection results are voted on according to the preset voting method, so that the number of votes corresponding to the first transaction volume can be determined.
  • In step 104, the number of votes that abnormal data should receive, that is, the preset vote-count threshold, is usually set based on the experience of business personnel. When the number of votes corresponding to the first transaction volume is greater than the preset vote-count threshold, the first transaction volume is determined to be abnormal.
  • In step 102, when performing preliminary outlier detection on the first transaction volume, the embodiment of the present application uses multiple outlier determination methods at the same time. These methods are described in detail below.
  • The M outlier determination methods include the Laida (3σ) criterion. Determining the detection result corresponding to the first transaction volume according to the Laida criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the mean and standard deviation of the L second transaction volumes; and, if the first transaction volume belongs to a first interval, determining that the first transaction volume is normal. The lower limit of the first interval is the mean minus the standard deviation multiplied by a preset first parameter value, and the upper limit of the first interval is the mean plus the standard deviation multiplied by the preset first parameter value.
  • For the real-time data $X_{N+2}$, the number of transaction requests initiated at 00:01:00 today, extract the number of transaction requests initiated at 00:01:00 on each historical day and compute their mean $\mu_2$ and standard deviation $\sigma_2$.
  • The normal value range is $(\mu_2 - 3\sigma_2) \sim (\mu_2 + 3\sigma_2)$. This normal value range is the first interval; that is, when $X_{N+2}$ is within this range, it is judged to be a normal value.
  • The abnormal value range is $(-\infty,\ \mu_2 - 3\sigma_2) \cup (\mu_2 + 3\sigma_2,\ +\infty)$.
  • The outlier range is updated according to the method described above, so that the outlier threshold is calculated adaptively and no longer needs to be calculated and adjusted manually.
  • The specific value of the first parameter value is not limited in the present invention.
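The 3σ check described above can be sketched as follows. This is a minimal illustration with some choices the text leaves open: the population standard deviation is used (the source does not say whether the sample or population form is intended), the interval is treated as closed, and `k` stands in for the preset first parameter value. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def three_sigma_interval(same_period: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) = (mu - k*sigma, mu + k*sigma) for the L samples."""
    if len(same_period) < 2:
        raise ValueError("need at least two same-period samples")
    mu = mean(same_period)
    sigma = pstdev(same_period)  # population standard deviation (assumption)
    return mu - k * sigma, mu + k * sigma


def is_normal_three_sigma(x: float, same_period: Sequence[float], k: float = 3.0) -> bool:
    """True when x lies inside the first interval, boundaries included."""
    lower, upper = three_sigma_interval(same_period, k)
    return lower <= x <= upper
```

Because the interval is recomputed from the latest same-period samples on every call, the threshold adapts as new data arrives, which matches the adaptive update described above.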
  • The M outlier determination methods include the Tukey criterion. Determining the detection result corresponding to the first transaction volume according to the Tukey criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the one-quarter quantile and the three-quarter quantile of the L second transaction volumes; and, if the first transaction volume belongs to a second interval, determining that the first transaction volume is normal. The lower limit of the second interval is the one-quarter quantile minus the interquartile range multiplied by a preset second parameter value, and the upper limit of the second interval is the three-quarter quantile plus the interquartile range multiplied by the preset second parameter value. The interquartile range is the difference between the three-quarter quantile and the one-quarter quantile.
  • The normal value range is $(Q_{11} - 1.5(Q_{31} - Q_{11})) \sim (Q_{31} + 1.5(Q_{31} - Q_{11}))$, and this normal value range is the second interval. The abnormal value range is $(-\infty,\ Q_{11} - 1.5(Q_{31} - Q_{11})) \cup (Q_{31} + 1.5(Q_{31} - Q_{11}),\ +\infty)$, which means that when $X_{N+1}$ is within this range it is judged to be an abnormal value.
  • The number "1.5" in the embodiment of the present application is the second parameter value.
  • the real-time data X N+2 of the amount of requests for initiating transactions today at 00:01:00 extract the amount of requests for initiating transactions between today’s 00:01:00 and 00:01:00 in each day
  • the quantile Q 12 and the 75% quantile Q 32 for the real-time data X N+2 of the amount of requests for initiating transactions at 00:01:00 today, the normal value range is (Q 12 -1.5(Q 32 -Q 12 )) ⁇ (Q 32 +1.5(Q 32 -Q 12 )), the normal value range at this time is the second interval, that is to say, when X N+2 is within this range, it is judged Is a normal value; the range of abnormal values is
  • the outlier range is updated according to the method described above, so that the outlier threshold can be calculated adaptively, and there is no need to manually calculate and adjust it in the future.
  • the specific value of the second parameter value is not limited in the present invention.
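The Tukey check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function names, the sample data, and the choice of the "inclusive" quantile method are assumptions, since the text does not say how the quartiles are interpolated.

```python
import statistics

def tukey_interval(samples, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) for the samples.

    k plays the role of the "second parameter value" (1.5 in the embodiment).
    The "inclusive" quantile method is an assumption of this sketch.
    """
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_normal_tukey(value, samples, k=1.5):
    """True when value lies inside the second interval built from samples."""
    lower, upper = tukey_interval(samples, k)
    return lower <= value <= upper

# Hypothetical request volumes seen at the same minute on earlier days.
same_minute_history = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98]
print(tukey_interval(same_minute_history))
print(is_normal_tukey(100, same_minute_history))
print(is_normal_tukey(150, same_minute_history))
```

Because the interval is recomputed from the history on every call, the threshold adapts as new samples arrive, which is the behaviour the text describes.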
  • the M abnormal value determination methods include a ring comparison (chain-ratio) criterion; determining the detection result corresponding to the first transaction volume according to the ring comparison criterion and the N second transaction volumes includes: determining the first ratio of the first transaction volume to the third transaction volume, where the third transaction volume is the transaction volume, among the N second transaction volumes, whose sampling time is closest to the first sampling time; and, if the first ratio belongs to the third interval, determining that the first transaction volume is normal, where the third interval is a normal value interval.
  • the chain ratio is to calculate the ratio of the amount of requests for transactions initiated in the current minute to the amount of requests for transactions initiated in the previous minute.
  • for the current real-time data $X_{N+2}$, calculate the latest chain ratio $HR_{N+2} = X_{N+2} / X_{N+1}$; this latest chain ratio is the first ratio. If the calculated $HR_{N+2}$ is in the third interval, $X_{N+2}$ is determined to be a normal value; otherwise $X_{N+2}$ is determined to be an abnormal value.
  • the outlier range is updated according to the method described above, so that the outlier threshold can be calculated adaptively, and there is no need to manually calculate and adjust it in the future.
  • the normal value range of the historical chain ratio can be determined by the following methods:
  • the lower limit of the third interval is the first mean value minus the first standard deviation multiplied by the preset third parameter value, and the upper limit of the third interval is the first mean value plus the first standard deviation multiplied by the preset third parameter value, where the first standard deviation and the first mean value are determined from the N-1 ratios corresponding to the N transaction volumes, and the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes; or, the lower limit of the third interval is the first quartile minus the interquartile range multiplied by the preset fourth parameter value, and the upper limit of the third interval is the third quartile plus the interquartile range multiplied by the preset fourth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • the normal range, that is, the third interval, can be determined according to the mean $\mu_3$ and standard deviation $\sigma_3$ of these N-1 historical chain ratios.
  • the third interval is $[\mu_3 - 3\sigma_3,\ \mu_3 + 3\sigma_3]$, where the number "3" in the embodiment of the present application is the third parameter value.
  • the 75% quantile $Q_{33}$ is the third quartile; further, the normal range, that is, the third interval, can be determined from the 25% quantile $Q_{13}$ and the 75% quantile $Q_{33}$ of the N-1 historical chain ratios.
  • the third interval is $[Q_{13} - 1.5(Q_{33} - Q_{13}),\ Q_{33} + 1.5(Q_{33} - Q_{13})]$, where $(Q_{33} - Q_{13})$ is the interquartile range, and the number "1.5" in the embodiment of the present application is the fourth parameter value.
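As a rough illustration of the ring comparison (chain-ratio) criterion, the sketch below builds the N-1 historical chain ratios and tests the latest ratio against the mean-and-standard-deviation form of the third interval. The function names and sample data are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator is meant), and all volumes are assumed to be non-zero.

```python
import statistics

def chain_ratios(volumes):
    """Ratio of each volume to the one just before it (N volumes -> N-1 ratios)."""
    return [cur / prev for prev, cur in zip(volumes, volumes[1:])]

def third_interval(ratios, k=3.0):
    """(mean - k*std, mean + k*std); k is the "third parameter value"."""
    mu = statistics.mean(ratios)
    sigma = statistics.pstdev(ratios)  # population std: an assumption
    return mu - k * sigma, mu + k * sigma

def is_normal_chain_ratio(current, history, k=3.0):
    """Compare current / last historical volume against the third interval."""
    lower, upper = third_interval(chain_ratios(history), k)
    first_ratio = current / history[-1]
    return lower <= first_ratio <= upper

# Hypothetical per-minute request volumes, oldest first.
history = [100, 110, 100, 110, 100, 110]
print(is_normal_chain_ratio(105, history))
print(is_normal_chain_ratio(500, history))
```

The quartile form of the third interval could be swapped in by replacing `third_interval` with a Q1 - 1.5·IQR / Q3 + 1.5·IQR computation over the same ratios.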
  • the M abnormal value determination methods include a year-on-year criterion; according to the year-on-year criterion and the N second transaction volumes, determining the detection result corresponding to the first transaction volume includes: Determine the second ratio of the first transaction volume to the fourth transaction volume among the N second transaction volumes, and the sampling time corresponding to the fourth transaction volume and the first transaction volume is in the same period on different days If the second ratio belongs to the fourth interval, it is determined that the first transaction volume is normal; wherein, the fourth interval is a normal value interval.
  • the normal value range at this time is the fourth interval; for the current real-time data $X_{N+1}$, the request volume for initiating transactions at 00:00:00 today, the latest year-on-year ratio is calculated, and this is the second ratio. If the calculated ratio is in the fourth interval, $X_{N+1}$ is determined to be a normal value; otherwise $X_{N+1}$ is determined to be an abnormal value.
  • the normal value range at this time is the fourth interval; for the current real-time data $X_{N+2}$, the request volume for initiating transactions at 00:01:00 today, the latest year-on-year ratio is calculated, and this is the second ratio. If the calculated ratio is in the fourth interval, $X_{N+2}$ is determined to be a normal value; otherwise $X_{N+2}$ is determined to be an abnormal value.
  • the outlier range is updated according to the method described above, so that the outlier threshold can be calculated adaptively, and there is no need to manually calculate and adjust it in the future.
  • the historical year-on-year normal value range can be determined by the following methods:
  • the lower limit of the fourth interval is the second mean value minus the second standard deviation multiplied by the preset fifth parameter value, and the upper limit of the fourth interval is the second mean value plus the second standard deviation multiplied by the preset fifth parameter value, where the second standard deviation and the second mean value are determined from X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days; or, the lower limit of the fourth interval is the first quartile minus the interquartile range multiplied by the preset sixth parameter value, and the upper limit of the fourth interval is the third quartile plus the interquartile range multiplied by the preset sixth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the X ratios corresponding to the N transaction volumes.
  • a total of 99 historical year-on-year ratios; calculate the mean $\mu_4$ and standard deviation $\sigma_4$ of these 99 historical year-on-year ratios, where the mean $\mu_4$ is the second mean and the standard deviation $\sigma_4$ is the second standard deviation; further, the normal range, that is, the fourth interval, can be determined from $\mu_4$ and $\sigma_4$.
  • the fourth interval is $[\mu_4 - 3\sigma_4,\ \mu_4 + 3\sigma_4]$, where the number "3" in the embodiment of this application is the fifth parameter value.
  • the fourth interval is $[Q_{14} - 1.5(Q_{34} - Q_{14}),\ Q_{34} + 1.5(Q_{34} - Q_{14})]$, where $(Q_{34} - Q_{14})$ is the interquartile range, and the number "1.5" in the embodiment of this application is the sixth parameter value.
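The year-on-year criterion can be sketched in the same spirit, here using the quartile form of the fourth interval. The names and data are hypothetical, and two points are assumptions of this sketch rather than statements from the text: the second ratio is taken against the most recent earlier day, and the quartiles use the "inclusive" interpolation method.

```python
import statistics

def year_on_year_ratios(same_slot_volumes):
    """Ratios between the same time slot on consecutive days (Y volumes -> Y-1 ratios)."""
    return [cur / prev for prev, cur in zip(same_slot_volumes, same_slot_volumes[1:])]

def is_normal_year_on_year(current, same_slot_volumes, k=1.5):
    """Check current / previous-day volume against [Q1 - k*IQR, Q3 + k*IQR].

    k plays the role of the "sixth parameter value" (1.5 in the embodiment).
    """
    ratios = year_on_year_ratios(same_slot_volumes)
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    second_ratio = current / same_slot_volumes[-1]  # previous day: an assumption
    return q1 - k * iqr <= second_ratio <= q3 + k * iqr

# Hypothetical request volumes at 00:00:00 on eight earlier days, oldest first.
same_slot = [200, 210, 190, 205, 195, 200, 208, 198]
print(is_normal_year_on_year(202, same_slot))
print(is_normal_year_on_year(400, same_slot))
```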
  • the four outlier determination methods, namely the 3σ criterion, the Tukey criterion, the ring comparison criterion, and the year-on-year criterion, are selected at the same time, and each detects outliers on the latest data independently. Each of the four methods therefore yields its own detection result for the latest data. Since these results may differ, the influence of each detection result must be considered together in order to finally determine whether the latest data is an abnormal value.
  • a simple voting method can be used.
  • the idea of simple voting is as follows: suppose the total number of votes is 100. When each outlier determination method is considered to be equally accurate in judging the latest data, each method holds 25 votes. If a method considers the latest data to be an abnormal value, it casts all 25 of its votes for the latest data; otherwise it casts none, that is, 0 votes. If the final number of votes received by the latest data exceeds the preset vote threshold, for example 50 votes (the threshold can be set by business personnel according to actual business needs and experience), the latest data is determined to be an abnormal value.
  • the request amount for initiating a transaction is X N+1
  • after the four selected outlier determination methods (the 3σ criterion, the Tukey criterion, the ring comparison criterion, and the year-on-year criterion) have each made their determination, the detection results may be as follows: $X_{N+1}$ is judged to be an abnormal value by the 3σ criterion and by the Tukey criterion, and is judged to be a normal value by the ring comparison criterion and by the year-on-year criterion.
  • the number of votes obtained by X N+1 is 50, which does not exceed the vote threshold (set to 50 votes), and X N+1 is finally determined to be a normal value, and no warning instructions are required to remind the business personnel Perform corresponding maintenance.
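The simple voting example above (two of four methods flag $X_{N+1}$, giving 50 votes against a threshold of 50) can be reproduced with a short sketch. The function name and the dictionary keys are hypothetical labels chosen for illustration.

```python
def simple_vote(results, total_votes=100, threshold=50):
    """Tally equal-weight votes.

    results maps a method name to True when that method judged the value abnormal.
    Returns (votes received, whether the value is finally judged abnormal).
    """
    votes_per_method = total_votes / len(results)
    tally = sum(votes_per_method for flagged in results.values() if flagged)
    # Abnormal only when the tally strictly exceeds the threshold.
    return tally, tally > threshold

results = {
    "3sigma": True,         # judged abnormal
    "tukey": True,          # judged abnormal
    "chain_ratio": False,   # judged normal
    "year_on_year": False,  # judged normal
}
tally, abnormal = simple_vote(results)
print(tally, abnormal)
```

The comparison is strict, so exactly 50 votes does not trigger an alarm, which matches the outcome described in the text.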
  • the voting on the M detection results according to a preset voting method and determining the number of votes corresponding to the first transaction volume includes: according to the total number of votes corresponding to the voting method and The historical voting correct rates corresponding to the M abnormal value determination methods respectively determine the number of votes corresponding to the M abnormal value determination methods; the number of votes corresponding to the M abnormal value determination methods and the first transaction volume respectively The corresponding M detection results determine the number of votes corresponding to the first transaction volume.
  • the accuracy of the outlier determination methods is higher, so simple voting can be used at the start; once a certain number of detections (for example, 100) has been reached, the historical judgments of each outlier determination method can be compared with the final voting results to obtain the accuracy of each method. In each subsequent vote, each method can then be assigned a different number of votes based on its historical accuracy in determining abnormal values, that is, weighted voting.
  • the final total number of votes is calculated as follows:
  • the indicator function takes the value 1 when $L_i(D+1) = 1$ (the (D+1)-th judgment of the i-th method is an abnormal value) and 0 otherwise. The final weighted voting conclusion can then be made according to formula (2).
  • the current to be detected is determined by the 3σ criterion.
  • the request amount of the transaction initiated at the sampling time is an abnormal value
  • the 4 kinds of abnormal value determination methods comprehensively determine that the request amount of the transaction initiated at the current sampling time to be detected is an abnormal value
  • the accuracy of the 3σ criterion is 96%;
  • the ring ratio criterion is used to determine the to-be-detected The request amount of the transaction initiated at the current sampling time is an abnormal value, and this time, the 4 kinds of abnormal value determination methods comprehensively determine that the request amount of the transaction initiated at the current sampling time to be detected is an abnormal value; and, for 89 of these
  • the accuracy rate of the ring comparison criterion is 90%;
  • the weight of each outlier determination method can be determined:
  • the weight value is:
  • the total number of votes is 100 votes, and multiply the total number of votes by the weight value of each outlier determination method, that is, the number of votes corresponding to each outlier method can be obtained.
  • the number of votes is 25.5; for the Tukey criterion, the number of votes is 24.5; for the ring comparison criterion, the number of votes is 23.9; for the year-on-year criterion, the number of votes is 26.1.
  • a weighted voting method is used to determine whether it is an abnormal value.
  • the request amount for the 101st transaction initiated at the current sampling time is determined by the 3σ criterion to be an abnormal value
  • the Tukey criterion is determined to be a normal value
  • the ring ratio criterion is determined to be an abnormal value
  • Set the preset vote threshold to 50 votes. Since 49.4 votes is less than 50 votes, the request volume for initiating transactions at the 101st sampling time can be considered a normal value, and there is no need to generate an alarm instruction to remind business personnel to perform the corresponding maintenance.
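The weighted voting walk-through can be approximated as follows. The text states accuracies of 96% for the 3σ criterion and 90% for the ring comparison criterion; the 92% and 98% used here for the Tukey and year-on-year criteria are assumed values, chosen only because they reproduce the quoted vote counts of 24.5 and 26.1. The year-on-year result being "normal" is likewise inferred from the 49.4 total. All names are hypothetical.

```python
def vote_allocation(correct_counts, total_votes=100):
    """Split total_votes across methods in proportion to historical accuracy."""
    total = sum(correct_counts.values())
    return {name: total_votes * c / total for name, c in correct_counts.items()}

def weighted_vote(results, allocation, threshold=50):
    """Sum the votes of every method that judged the value abnormal."""
    tally = sum(allocation[name] for name, flagged in results.items() if flagged)
    return tally, tally > threshold

# Correct judgments out of 100 detections; tukey and year_on_year are assumed.
correct_counts = {"3sigma": 96, "tukey": 92, "chain_ratio": 90, "year_on_year": 98}
allocation = vote_allocation(correct_counts)

# 101st detection: 3sigma and chain ratio flag the value, the other two do not.
results = {"3sigma": True, "tukey": False, "chain_ratio": True, "year_on_year": False}
tally, abnormal = weighted_vote(results, allocation)
print({name: round(v, 1) for name, v in allocation.items()})
print(abnormal)
```

Rounded to one decimal place, the allocation matches the 25.5, 24.5, 23.9, and 26.1 votes quoted in the text; the unrounded tally is slightly above the 49.4 obtained from the rounded figures but still below the threshold of 50.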
  • an embodiment of the present application also provides a data detection device. As shown in FIG. 2, the device includes:
  • the transaction volume acquisition unit 201 acquires a first transaction volume corresponding to a first sampling time and N second transaction volumes corresponding to N sampling times before the first sampling time, where N is an integer greater than 1.
  • the detection result determination unit 202 is configured to determine the M detection results corresponding to the first transaction volume according to the selected M abnormal value determination methods and the N second transaction volumes, wherein one abnormal value determination method corresponds to one As a result of the detection, M is an integer greater than 1;
  • the number of votes determining unit 203 is configured to vote on the M detection results according to a preset voting method, and determine the number of votes corresponding to the first transaction volume;
  • the abnormality determining unit 204 is configured to determine that the first transaction volume is abnormal if the number of votes corresponding to the first transaction volume is greater than a preset vote number threshold.
  • the number of votes determining unit 203 is specifically configured to: determine the M number of votes according to the total number of votes corresponding to the voting method and the historical voting correct rate corresponding to the M abnormal value determination methods.
  • the number of votes corresponding to the abnormal value determination methods; the number of votes corresponding to the first transaction volume is determined according to the number of votes corresponding to the M abnormal value determination methods and the M detection results corresponding to the first transaction volume.
  • the M abnormal value determination methods include the Laida criterion; the detection result determining unit 202 is specifically configured to: determine L second transaction volumes among the N second transaction volumes , The sampling times corresponding to the L second transaction volume and the first transaction volume are in the same period on different days; determine the mean value and standard deviation of the L second transaction volume; if the first transaction volume belongs to In the first interval, it is determined that the first transaction volume is normal, the lower limit of the first interval is the difference between the mean value and the standard deviation multiplied by the preset first parameter value, and the upper limit of the first interval Is the sum of the mean value and the standard deviation multiplied by a preset first parameter value.
  • the M abnormal value determination methods include the Tukey criterion; the detection result determination unit 202 is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determine the first quartile and the third quartile of the L second transaction volumes; and, if the first transaction volume belongs to the second interval, determine that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the interquartile range multiplied by the preset second parameter value, the upper limit of the second interval is the third quartile plus the interquartile range multiplied by the preset second parameter value, and the interquartile range is the difference between the third quartile and the first quartile.
  • the M abnormal value determination methods include a ring comparison criterion; the detection result determining unit 202 is specifically configured to: determine a first ratio of the first transaction volume to the third transaction volume, and The third transaction volume is the transaction volume closest to the first sampling time among the N second transaction volumes; if the first ratio belongs to the third interval, it is determined that the first transaction volume is normal; where The third interval is a normal value interval.
  • the lower limit of the third interval is the first mean value minus the first standard deviation multiplied by the preset third parameter value, and the upper limit of the third interval is the first mean value plus the first standard deviation multiplied by the preset third parameter value, where the first standard deviation and the first mean value are determined from the N-1 ratios corresponding to the N transaction volumes, and the N-1 ratios include the ratio of every two adjacent transaction volumes among the N transaction volumes; or, the lower limit of the third interval is the first quartile minus the interquartile range multiplied by the preset fourth parameter value, and the upper limit of the third interval is the third quartile plus the interquartile range multiplied by the preset fourth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes.
  • the M abnormal value determination methods include a year-on-year criterion; the detection result determining unit 202 is specifically configured to: determine the second ratio of the first transaction volume to the fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same period on different days; and, if the second ratio belongs to the fourth interval, determine that the first transaction volume is normal, where the fourth interval is a normal value interval.
  • the lower limit of the fourth interval is the second mean value minus the second standard deviation multiplied by the preset fifth parameter value, and the upper limit of the fourth interval is the second mean value plus the second standard deviation multiplied by the preset fifth parameter value, where the second standard deviation and the second mean value are determined from X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days; or, the lower limit of the fourth interval is the first quartile minus the interquartile range multiplied by the preset sixth parameter value, and the upper limit of the fourth interval is the third quartile plus the interquartile range multiplied by the preset sixth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the X ratios corresponding to the N transaction volumes.
  • the embodiments of the present application also provide a computing device.
  • the computing device may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), and the like.
  • the computing device may include a central processing unit (CPU), a memory, an input/output device, etc.
  • the input device may include a keyboard, a mouse, a touch screen, etc.
  • an output device may include a display device, such as a liquid crystal display (Liquid Crystal Display, LCD), Cathode Ray Tube (CRT), etc.
  • the memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with program instructions and data stored in the memory.
  • the memory may be used to store the program instructions of the data detection method;
  • the processor is used to call the program instructions stored in the memory and execute the data detection method according to the obtained program.
  • An embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to make a computer execute a data detection method.
  • this application can be provided as methods or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

Abstract

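This application relates to the field of financial technology (Fintech) and discloses a data detection method and device. The method includes: obtaining a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time; determining M detection results corresponding to the first transaction volume according to M selected abnormal value determination methods and the N second transaction volumes; voting on the M detection results according to a preset voting method to determine the number of votes corresponding to the first transaction volume; and, if the number of votes corresponding to the first transaction volume is greater than a preset vote threshold, determining that the first transaction volume is abnormal. By adjusting the data range of the second transaction volumes in real time, the method helps improve the accuracy of the detection results for the first transaction volume; and by combining the detection results of multiple abnormal value determination methods with the preset voting method, it can accurately determine whether the first transaction volume is abnormal.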

Description

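A data detection method and device
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010177723.4, entitled "A data detection method and device", filed with the Chinese Patent Office on March 13, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of this application relate to the field of financial technology (Fintech), and in particular to a data detection method and device.
Background
With the development of computer technology, more and more technologies (for example, big data, cloud computing, or blockchain) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology; big data technology is no exception. However, the security and real-time requirements of the financial and payment industries also place higher demands on big data technology.
For financial technology companies, maintaining high availability and stability of financial system services is critical: it bears directly on the efficiency and stability of business operations and may even affect the stability of the entire financial industry. Monitoring and early warning of the availability and stability of financial system services is therefore very important. Because availability and stability are usually reflected directly in the data, anomaly detection on the data is an important tool. For example, in a bank's transaction system, reduced system stability during some period may directly cause the transaction success rate to drop significantly; or, when an external attacker floods the system with transaction requests, availability may suffer, which shows up in the data as a rapid surge in transaction volume or a sudden increase in the transaction request volume from a particular IP (Internet Protocol) address. Data anomaly detection can thus be used to monitor the availability and stability of the current financial system services in real time; once abnormal data is detected, a warning is raised so that the company can take appropriate measures.
How to detect abnormal data accurately is a problem that urgently needs to be solved.
Summary of the invention
This application provides a data detection method and device for accurately determining whether transaction data is abnormal.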
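In a first aspect, an embodiment of this application provides a data detection method, the method including: obtaining a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1; determining M detection results corresponding to the first transaction volume according to M selected abnormal value determination methods and the N second transaction volumes, where one abnormal value determination method corresponds to one detection result and M is an integer greater than 1; voting on the M detection results according to a preset voting method to determine the number of votes corresponding to the first transaction volume; and, if the number of votes corresponding to the first transaction volume is greater than a preset vote threshold, determining that the first transaction volume is abnormal.
Based on this solution, for each abnormal value determination method, the detection result corresponding to the first transaction volume is determined from the N second transaction volumes; the number of votes corresponding to the first transaction volume is determined from its detection result under each abnormal value determination method and the preset voting method; and when the number of votes is greater than the preset vote threshold, the first transaction volume is determined to be abnormal. By adjusting the data range of the second transaction volumes in real time, the solution helps improve the accuracy of the detection results for the first transaction volume; and by combining the detection results of multiple abnormal value determination methods with the preset voting method, it can accurately determine whether the first transaction volume is abnormal.
In a possible implementation, voting on the M detection results according to the preset voting method to determine the number of votes corresponding to the first transaction volume includes: determining the number of votes allotted to each of the M abnormal value determination methods according to the total number of votes of the voting method and the historical voting accuracy of each of the M abnormal value determination methods; and determining the number of votes corresponding to the first transaction volume according to the number of votes allotted to each of the M abnormal value determination methods and the M detection results corresponding to the first transaction volume.
Based on this solution, counting how many times each abnormal value determination method has voted correctly in the past yields a weight for each method; combined with the total number of votes of the voting method, this gives the number of votes allotted to each method. When determining whether the first transaction volume is abnormal, the detection result of each method for the first transaction volume is determined, and the votes corresponding to the detection results are tallied to obtain the number of votes corresponding to the first transaction volume. By taking the historical voting record of the methods into account, the solution helps establish how accurately each method judges whether the first transaction volume is abnormal, and accordingly assigns different numbers of votes to different methods, so that whether the first transaction volume is abnormal can subsequently be determined more reasonably.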
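In a possible implementation, the M abnormal value determination methods include the Laida criterion; determining the detection result corresponding to the first transaction volume according to the Laida criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the mean and standard deviation of the L second transaction volumes; and, if the first transaction volume belongs to a first interval, determining that the first transaction volume is normal, where the lower limit of the first interval is the mean minus the standard deviation multiplied by a preset first parameter value, and the upper limit of the first interval is the mean plus the standard deviation multiplied by the preset first parameter value.
In a possible implementation, the M abnormal value determination methods include the Tukey criterion; determining the detection result corresponding to the first transaction volume according to the Tukey criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determining the first quartile and the third quartile of the L second transaction volumes; and, if the first transaction volume belongs to a second interval, determining that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the interquartile range multiplied by a preset second parameter value, the upper limit of the second interval is the third quartile plus the interquartile range multiplied by the preset second parameter value, and the interquartile range is the difference between the third quartile and the first quartile.
In a possible implementation, the M abnormal value determination methods include the ring comparison (chain-ratio) criterion; determining the detection result corresponding to the first transaction volume according to the ring comparison criterion and the N second transaction volumes includes: determining a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the transaction volume, among the N second transaction volumes, whose sampling time is closest to the first sampling time; and, if the first ratio belongs to a third interval, determining that the first transaction volume is normal, where the third interval is a normal value interval.
In a possible implementation, the lower limit of the third interval is a first mean minus a first standard deviation multiplied by a preset third parameter value, and the upper limit of the third interval is the first mean plus the first standard deviation multiplied by the preset third parameter value, where the first standard deviation and the first mean are determined from N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes; or, the lower limit of the third interval is a first quartile minus an interquartile range multiplied by a preset fourth parameter value, and the upper limit of the third interval is a third quartile plus the interquartile range multiplied by the preset fourth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes.
In a possible implementation, the M abnormal value determination methods include the year-on-year criterion; determining the detection result corresponding to the first transaction volume according to the year-on-year criterion and the N second transaction volumes includes: determining a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same period on different days; and, if the second ratio belongs to a fourth interval, determining that the first transaction volume is normal, where the fourth interval is a normal value interval.
In a possible implementation, the lower limit of the fourth interval is a second mean minus a second standard deviation multiplied by a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the second standard deviation multiplied by the preset fifth parameter value, where the second standard deviation and the second mean are determined from X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days; or, the lower limit of the fourth interval is a first quartile minus an interquartile range multiplied by a preset sixth parameter value, and the upper limit of the fourth interval is a third quartile plus the interquartile range multiplied by the preset sixth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, both quartiles are determined from the X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days.
Based on the above six solutions, for the N second transaction volumes and according to the selected abnormal value determination methods, which may include the Laida criterion, the Tukey criterion, the ring comparison criterion, and the year-on-year criterion, the abnormal value range / normal value range corresponding to each method can be determined, which makes it possible to determine the detection result of the first transaction volume under each method.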
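In a second aspect, an embodiment of this application provides a data detection device, the device including: a transaction volume acquisition unit, configured to obtain a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1; a detection result determination unit, configured to determine M detection results corresponding to the first transaction volume according to M selected abnormal value determination methods and the N second transaction volumes, where one abnormal value determination method corresponds to one detection result and M is an integer greater than 1; a vote number determination unit, configured to vote on the M detection results according to a preset voting method and determine the number of votes corresponding to the first transaction volume; and an abnormality determination unit, configured to determine that the first transaction volume is abnormal if the number of votes corresponding to the first transaction volume is greater than a preset vote threshold.
Based on this solution, for each abnormal value determination method, the detection result corresponding to the first transaction volume is determined from the N second transaction volumes; the number of votes corresponding to the first transaction volume is determined from its detection result under each abnormal value determination method and the preset voting method; and when the number of votes is greater than the preset vote threshold, the first transaction volume is determined to be abnormal. By adjusting the data range of the second transaction volumes in real time, the solution helps improve the accuracy of the detection results for the first transaction volume; and by combining the detection results of multiple abnormal value determination methods with the preset voting method, it can accurately determine whether the first transaction volume is abnormal.
In a possible implementation, the vote number determination unit is specifically configured to: determine the number of votes allotted to each of the M abnormal value determination methods according to the total number of votes of the voting method and the historical voting accuracy of each of the M abnormal value determination methods; and determine the number of votes corresponding to the first transaction volume according to the number of votes allotted to each of the M abnormal value determination methods and the M detection results corresponding to the first transaction volume.
Based on this solution, counting how many times each abnormal value determination method has voted correctly in the past yields a weight for each method; combined with the total number of votes of the voting method, this gives the number of votes allotted to each method. When determining whether the first transaction volume is abnormal, the detection result of each method for the first transaction volume is determined, and the votes corresponding to the detection results are tallied to obtain the number of votes corresponding to the first transaction volume. By taking the historical voting record of the methods into account, the solution helps establish how accurately each method judges whether the first transaction volume is abnormal, and accordingly assigns different numbers of votes to different methods, so that whether the first transaction volume is abnormal can subsequently be determined more reasonably.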
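In a possible implementation, the M abnormal value determination methods include the Laida criterion; the detection result determination unit is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determine the mean and standard deviation of the L second transaction volumes; and, if the first transaction volume belongs to a first interval, determine that the first transaction volume is normal, where the lower limit of the first interval is the mean minus the standard deviation multiplied by a preset first parameter value, and the upper limit of the first interval is the mean plus the standard deviation multiplied by the preset first parameter value.
In a possible implementation, the M abnormal value determination methods include the Tukey criterion; the detection result determination unit is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and of the first transaction volume fall in the same period on different days; determine the first quartile and the third quartile of the L second transaction volumes; and, if the first transaction volume belongs to a second interval, determine that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the interquartile range multiplied by a preset second parameter value, the upper limit of the second interval is the third quartile plus the interquartile range multiplied by the preset second parameter value, and the interquartile range is the difference between the third quartile and the first quartile.
In a possible implementation, the M abnormal value determination methods include the ring comparison (chain-ratio) criterion; the detection result determination unit is specifically configured to: determine a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the transaction volume, among the N second transaction volumes, whose sampling time is closest to the first sampling time; and, if the first ratio belongs to a third interval, determine that the first transaction volume is normal, where the third interval is a normal value interval.
In a possible implementation, the lower limit of the third interval is a first mean minus a first standard deviation multiplied by a preset third parameter value, and the upper limit of the third interval is the first mean plus the first standard deviation multiplied by the preset third parameter value, where the first standard deviation and the first mean are determined from N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes; or, the lower limit of the third interval is a first quartile minus an interquartile range multiplied by a preset fourth parameter value, and the upper limit of the third interval is a third quartile plus the interquartile range multiplied by the preset fourth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, and both quartiles are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios including the ratio of every two adjacent transaction volumes among the N transaction volumes.
In a possible implementation, the M abnormal value determination methods include the year-on-year criterion; the detection result determination unit is specifically configured to: determine a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same period on different days; and, if the second ratio belongs to a fourth interval, determine that the first transaction volume is normal, where the fourth interval is a normal value interval.
In a possible implementation, the lower limit of the fourth interval is a second mean minus a second standard deviation multiplied by a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the second standard deviation multiplied by the preset fifth parameter value, where the second standard deviation and the second mean are determined from X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days; or, the lower limit of the fourth interval is a first quartile minus an interquartile range multiplied by a preset sixth parameter value, and the upper limit of the fourth interval is a third quartile plus the interquartile range multiplied by the preset sixth parameter value, where the interquartile range is the difference between the third quartile and the first quartile, both quartiles are determined from the X ratios corresponding to the N transaction volumes, the X ratios include the ratio of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes are the transaction volumes, among the N transaction volumes, that fall in the same period on different days.
Based on the above six solutions, for the N second transaction volumes and according to the selected abnormal value determination methods, which may include the Laida criterion, the Tukey criterion, the ring comparison criterion, and the year-on-year criterion, the abnormal value range / normal value range corresponding to each method can be determined, which makes it possible to determine the detection result of the first transaction volume under each method.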
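In a third aspect, an embodiment of this application provides a computing device, including:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and, according to the obtained program, execute the method of the first aspect or any implementation of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to cause a computer to execute the method of the first aspect or any implementation of the first aspect.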
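Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 shows a data detection method provided by an embodiment of this application;
FIG. 2 shows a data detection device provided by an embodiment of this application.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.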
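As shown in FIG. 1, a data detection method provided by an embodiment of this application includes the following steps:
Step 101: obtain a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1.
Step 102: determine M detection results corresponding to the first transaction volume according to M selected abnormal value determination methods and the N second transaction volumes, where one abnormal value determination method corresponds to one detection result and M is an integer greater than 1.
Step 103: vote on the M detection results according to a preset voting method and determine the number of votes corresponding to the first transaction volume.
Step 104: if the number of votes corresponding to the first transaction volume is greater than a preset vote threshold, determine that the first transaction volume is abnormal.
Based on this solution, for each abnormal value determination method, the detection result corresponding to the first transaction volume is determined from the N second transaction volumes; the number of votes corresponding to the first transaction volume is determined from its detection result under each abnormal value determination method and the preset voting method; and when the number of votes is greater than the preset vote threshold, the first transaction volume is determined to be abnormal. By adjusting the data range of the second transaction volumes in real time, the solution helps improve the accuracy of the detection results for the first transaction volume; and by combining the detection results of multiple abnormal value determination methods with the preset voting method, it can accurately determine whether the first transaction volume is abnormal.
With the arrival of the digital age and the growing complexity of business, every industry continuously produces data of all kinds, and financial technology companies are no exception. The data produced by a financial system includes, but is not limited to: transaction data produced by the transaction system, account-opening request data produced by the account-opening system, and clearing data produced by the clearing system. As an example, the embodiments of this application use the transaction data produced by the transaction system.
Further, the transaction data produced by the transaction system can be processed according to the dimension and time granularity of interest for anomaly detection, yielding, for example, the request volume for initiating transactions per minute, the proportion of failed transactions per minute, or the transaction volume initiated per minute from a particular IP. As an example, the embodiments of this application use the request volume for initiating transactions per minute.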
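In step 101 above, the sampling interval may be one minute, one hour, or any user-defined value; the embodiments of this application use a one-minute sampling interval as an example. Since the goal is to detect abnormal values in the latest data, the first sampling time may be the current minute, and the first transaction volume may be the request volume for initiating transactions in the current minute. The N sampling times closest to the first sampling time may be every minute of the most recent week, every minute of the most recent month, or any user-defined range; the embodiments of this application use every minute of the 100 days closest to the first sampling time as an example. For instance, if the first sampling time is 00:00:00 today, the per-minute request volumes for the most recent 100 days can be collected and denoted, from oldest to newest, $X_1, X_2, \ldots, X_N$ ($N = 100 \times 24 \times 60 = 144000$). The request volume at 00:00:00 today is then $X_{N+1}$, the request volume at 00:01:00 today is $X_{N+2}$, and so on for the other sampling times today.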
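Several of the criteria described in this document work on the samples taken at the same minute on different days, i.e. $X_1, X_{1441}, X_{2881}, \ldots, X_{N+1-1440}$ for the new sample $X_{N+1}$. Assuming the history is held in a flat Python list with no missing minutes (an assumption of this sketch, as is the function name), those samples can be picked out with a stride of one day:

```python
MINUTES_PER_DAY = 24 * 60  # 1440 one-minute samples per day

def same_slot_history(history):
    """Return the past samples taken at the same minute of day as the next sample.

    history[0] is X_1 and history[-1] is the most recent past sample, so the
    incoming sample has 0-based index len(history). Its same-slot predecessors
    are the entries whose index is congruent to that modulo 1440.
    """
    start = len(history) % MINUTES_PER_DAY
    return history[start::MINUTES_PER_DAY]

# Stand-in data: each value equals its own 1-based index, i.e. X_i = i.
history = list(range(1, 100 * MINUTES_PER_DAY + 1))  # X_1 .. X_144000
slot = same_slot_history(history)                    # samples for X_{N+1}
print(len(slot), slot[:3], slot[-1])
```

Appending the new sample and calling the function again yields $X_2, X_{1442}, \ldots$ for $X_{N+2}$, so the reference window slides forward automatically with each new minute.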
在上述步骤102中,市面上存在多种多样的异常值判定方法,以实现对数据进行异常值检测。比如可以是依据拉依达(3σ)准则,比如可以是依据图基(Tukey)准则,比如可以是依据环比准则,比如可以是依据同比准则,或者是其他数学统计的方法。本申请实施例中选取3σ准则、Tukey准则、环比准则和同比准则这4个异常值判定方法,以确定最新数据是否为异常值。举个例子,根据3σ准则和距离今天最近的100天内每分钟发起交易的请求量以确定今天的00:00:00时的第一交易量是否发生异常,即得到第一交易量在3σ准则下的检测结果;根据Tukey准则和距离今天最近的100天内每分钟发起交易的请求量以确定今天的00:00:00时第一交易量是否发生异常,即得到第一交易量在Tukey准则下的检测结果;根据环比准则和距离今天最近的100天内每分钟发起交易的请求量以确定今天的00:00:00时第一交易量是否发生异常,即得到第一交易量在环比准则下的检测结果;根据同比准则和距离今天最近的100天内每分钟发起交易的请求量以确定今天的00:00:00时第一交易量是否发生异常,即得到第一交易量在同比准则下的检测结果。
需要说明的是,选取的异常值判定方法可以为3σ准则、Tukey准则、环比准则和同比准则这4个异常值判定方法中的任意两种判定方法,比如可以为3σ准则和Tukey准则,可以为3σ准则和环比准则,可以为Tukey准则和同比准则,等等;或者选取的异常值判定方法可以为3σ准则、Tukey准则、环比准则和同比准则这4个异常值判定方法中的任意三种判定方法,比如可以为3σ准则、Tukey准则和环比准则,可以为Tukey准则、环比准则和同比准则,等等。
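In step 103, the M detection results are voted on according to the preset voting method, which determines the vote count of the first transaction volume.
In step 104, the number of votes that abnormal data should receive, that is, the preset vote threshold, is usually set from the experience of business staff. When the vote count of the first transaction volume is greater than the preset vote threshold, the first transaction volume is determined to be abnormal.
For step 102, the embodiments of this application apply several outlier determination methods at the same time when performing the preliminary outlier detection on the first transaction volume. These methods are described in detail below.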
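Steps 101 to 104 can be sketched as a small pipeline. This is an illustration only: the `detect` function, the `Detector` type, and the two toy detectors in the usage lines are hypothetical stand-ins, not the criteria of the application.

```python
from typing import Callable, Dict, Sequence

# A detector returns True when it judges the new value to be an outlier.
Detector = Callable[[float, Sequence[float]], bool]


def detect(x_new: float,
           history: Sequence[float],
           detectors: Dict[str, Detector],
           votes: Dict[str, float],
           threshold: float) -> bool:
    """Return True if x_new is judged abnormal by the voting scheme."""
    # Step 102: one detection result per outlier determination method.
    results = {name: fn(x_new, history) for name, fn in detectors.items()}
    # Step 103: a method casts all of its votes only when it flags an outlier.
    tally = sum(votes[name] for name, flagged in results.items() if flagged)
    # Step 104: abnormal only if the tally is strictly above the threshold.
    return tally > threshold


# Toy detectors, for illustration only.
toy_detectors = {
    "above_twice_max": lambda x, h: x > 2 * max(h),
    "jump_vs_previous": lambda x, h: x > 1.5 * h[-1],
}
toy_votes = {"above_twice_max": 50.0, "jump_vs_previous": 50.0}
```

Note the strict comparison in the last step: a tally equal to the threshold is treated as normal, which matches the wording "greater than the preset vote threshold".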
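In one possible implementation, the M outlier determination methods include the Pauta criterion. Determining the detection result for the first transaction volume according to the Pauta criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days; determining the mean and standard deviation of the L second transaction volumes; and if the first transaction volume falls within a first interval, determining that the first transaction volume is normal, where the lower limit of the first interval is the mean minus the product of the standard deviation and a preset first parameter value, and the upper limit of the first interval is the mean plus the product of the standard deviation and the preset first parameter value.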
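As an example, take the per-minute request counts of the most recent 100 days, $X_1, X_2, \dots, X_N$ ($N = 100 \times 24 \times 60 = 144000$). First extract, from each day, the request count at the same time of day as the current real-time value $X_{N+1}$ (the request count at 00:00:00 today): $X_1, X_{1441}, X_{2881}, \dots, X_{N+1-1440}$, 100 values in total. (Known outliers, if any, are removed; for simplicity, the embodiments of this application assume that there are none, and this is not repeated below.) Compute the mean $\mu_1$ and standard deviation $\sigma_1$ of these values. For $X_{N+1}$, the normal range is $[\mu_1 - 3\sigma_1,\ \mu_1 + 3\sigma_1]$, which is the first interval: $X_{N+1}$ is judged normal when it falls within this range. The outlier range is $(-\infty,\ \mu_1 - 3\sigma_1) \cup (\mu_1 + 3\sigma_1,\ +\infty)$: $X_{N+1}$ is judged an outlier when it falls within this range. The number 3 here is the first parameter value.
As another example, for the real-time value $X_{N+2}$ (the request count at 00:01:00 today), proceed in the same way: extract from each day the request count at the same time of day, $X_2, X_{1442}, X_{2882}, \dots, X_{N+2-1440}$, 100 values in total, and compute their mean $\mu_2$ and standard deviation $\sigma_2$. For $X_{N+2}$, the normal range is $[\mu_2 - 3\sigma_2,\ \mu_2 + 3\sigma_2]$, which is the first interval: $X_{N+2}$ is judged normal when it falls within this range. The outlier range is $(-\infty,\ \mu_2 - 3\sigma_2) \cup (\mu_2 + 3\sigma_2,\ +\infty)$: $X_{N+2}$ is judged an outlier when it falls within this range. The number 3 here is the first parameter value.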
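Therefore, each time a new real-time value arrives, the outlier range is updated as described above. The outlier threshold is thus computed adaptively and no longer needs to be calculated or adjusted by hand.
Note that, in statistics, the outlier range defined from the mean $\mu$ and standard deviation $\sigma$ of a data set is $(-\infty,\ \mu - k\sigma) \cup (\mu + k\sigma,\ +\infty)$. Taking $k = 3$ gives the 3σ criterion; other values may be used, depending on how far a value must deviate from normal before it counts as an outlier. If the requirement is strict, that is, an outlier must deviate strongly from normal values, take $k > 3$; if the requirement is loose, take $0 < k < 3$. This application does not limit the specific first parameter value.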
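The 3σ check can be sketched in a few lines. This is a minimal illustration, assuming the same-minute historical values have already been selected; the function name `three_sigma_is_outlier` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or the sample form is meant.

```python
import statistics


def three_sigma_is_outlier(x_new, same_period_values, k=3.0):
    """Return True if x_new lies outside [mu - k*sigma, mu + k*sigma]."""
    mu = statistics.mean(same_period_values)
    # Assumption: population standard deviation; statistics.stdev (sample
    # form) would be an equally plausible reading of the text.
    sigma = statistics.pstdev(same_period_values)
    lower, upper = mu - k * sigma, mu + k * sigma
    return x_new < lower or x_new > upper


# Same-minute request counts from five earlier days (made-up numbers).
reference = [100, 102, 98, 101, 99]   # mean 100, sigma = sqrt(2)
```

Here the normal range is roughly [95.76, 104.24], so a new value of 104 is judged normal while 105 is judged an outlier.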
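In one possible implementation, the M outlier determination methods include the Tukey criterion. Determining the detection result for the first transaction volume according to the Tukey criterion and the N second transaction volumes includes: determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days; determining the first quartile and the third quartile of the L second transaction volumes; and if the first transaction volume falls within a second interval, determining that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the product of the interquartile range and a preset second parameter value, the upper limit of the second interval is the third quartile plus the product of the interquartile range and the preset second parameter value, and the interquartile range is the third quartile minus the first quartile.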
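As an example, take the per-minute request counts of the most recent 100 days, $X_1, X_2, \dots, X_N$ ($N = 100 \times 24 \times 60 = 144000$). First extract, from each day, the request count at the same time of day as the current real-time value $X_{N+1}$ (the request count at 00:00:00 today): $X_1, X_{1441}, X_{2881}, \dots, X_{N+1-1440}$, 100 values in total. (Known outliers, if any, are removed; for simplicity, the embodiments of this application assume that there are none, and this is not repeated below.) Compute the 25% quantile $Q_{11}$ and the 75% quantile $Q_{31}$ of these values. For $X_{N+1}$, the normal range is $[Q_{11} - 1.5(Q_{31} - Q_{11}),\ Q_{31} + 1.5(Q_{31} - Q_{11})]$, which is the second interval: $X_{N+1}$ is judged normal when it falls within this range. The outlier range is $(-\infty,\ Q_{11} - 1.5(Q_{31} - Q_{11})) \cup (Q_{31} + 1.5(Q_{31} - Q_{11}),\ +\infty)$: $X_{N+1}$ is judged an outlier when it falls within this range. The number 1.5 here is the second parameter value.
As another example, for the real-time value $X_{N+2}$ (the request count at 00:01:00 today), proceed in the same way: extract from each day the request count at the same time of day, $X_2, X_{1442}, X_{2882}, \dots, X_{N+2-1440}$, 100 values in total, and compute their 25% quantile $Q_{12}$ and 75% quantile $Q_{32}$. For $X_{N+2}$, the normal range is $[Q_{12} - 1.5(Q_{32} - Q_{12}),\ Q_{32} + 1.5(Q_{32} - Q_{12})]$, which is the second interval: $X_{N+2}$ is judged normal when it falls within this range. The outlier range is $(-\infty,\ Q_{12} - 1.5(Q_{32} - Q_{12})) \cup (Q_{32} + 1.5(Q_{32} - Q_{12}),\ +\infty)$: $X_{N+2}$ is judged an outlier when it falls within this range. The number 1.5 here is the second parameter value.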
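Therefore, each time a new real-time value arrives, the outlier range is updated as described above. The outlier threshold is thus computed adaptively and no longer needs to be calculated or adjusted by hand.
Note that, in statistics, the outlier range defined from the 25% quantile $Q_1$ and the 75% quantile $Q_3$ of a data set is $(-\infty,\ Q_1 - k(Q_3 - Q_1)) \cup (Q_3 + k(Q_3 - Q_1),\ +\infty)$, usually with $k = 1.5$. Other values may likewise be used, depending on how far a value must deviate from normal before it counts as an outlier. If the requirement is strict, that is, an outlier must deviate strongly from normal values, take $k > 1.5$; if the requirement is loose, take $0 < k < 1.5$. This application does not limit the specific second parameter value.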
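The Tukey check can be sketched in the same style. This is a minimal illustration; the function name `tukey_is_outlier` is hypothetical, and the quantile convention (`method="inclusive"` of `statistics.quantiles`) is an assumption, since the text does not specify how the 25% and 75% quantiles are interpolated.

```python
import statistics


def tukey_is_outlier(x_new, same_period_values, k=1.5):
    """Return True if x_new lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(
        same_period_values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return x_new < q1 - k * iqr or x_new > q3 + k * iqr


# Made-up reference values: Q1 = 3, Q3 = 7, IQR = 4, normal range [-3, 13].
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the fences are built from quartiles rather than the mean, a single extreme historical value shifts them far less than it would shift a 3σ range.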
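In one possible implementation, the M outlier determination methods include the chain-ratio criterion. Determining the detection result for the first transaction volume according to the chain-ratio criterion and the N second transaction volumes includes: determining a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the one among the N second transaction volumes whose sampling time is closest to the first sampling time; and if the first ratio falls within a third interval, determining that the first transaction volume is normal, where the third interval is a normal-value interval.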
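As an example, take the per-minute request counts of the most recent 100 days, $X_1, X_2, \dots, X_N$ ($N = 100 \times 24 \times 60 = 144000$). (Known outliers, if any, are removed; for simplicity, the embodiments of this application assume that there are none, and this is not repeated below.) Compute the historical chain ratios
$$HR_i = \frac{X_i}{X_{i-1}}, \quad i = 2, 3, \dots, N,$$
where a chain ratio is the ratio of the request count in the current minute to the request count in the previous minute. From these $N-1$ historical chain ratios $HR_2, HR_3, \dots, HR_N$, determine their normal range, which is the third interval. Then, for the current real-time value $X_{N+1}$ (the request count at 00:00:00 today), compute the latest chain ratio
$$HR_{N+1} = \frac{X_{N+1}}{X_N},$$
which is the first ratio. If $HR_{N+1}$ falls within the third interval, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.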
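As another example, for the current real-time value $X_{N+2}$ (the request count at 00:01:00 today), compute the historical chain ratios $HR_i$ ($i = 3, 4, \dots, N+1$) of the data $X_2, \dots, X_N, X_{N+1}$ and determine their normal range, which is the third interval. Then compute the latest chain ratio
$$HR_{N+2} = \frac{X_{N+2}}{X_{N+1}},$$
which is the first ratio. If $HR_{N+2}$ falls within the third interval, $X_{N+2}$ is judged normal; otherwise $X_{N+2}$ is judged an outlier.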
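Therefore, each time a new real-time value arrives, the outlier range is updated as described above. The outlier threshold is thus computed adaptively and no longer needs to be calculated or adjusted by hand.
The normal range of the historical chain ratios can be determined as follows.
In one possible implementation, the lower limit of the third interval is a first mean minus the product of a first standard deviation and a preset third parameter value, and the upper limit of the third interval is the first mean plus the product of the first standard deviation and the preset third parameter value, where the first standard deviation and the first mean are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes. Alternatively, the lower limit of the third interval is a first quartile minus the product of an interquartile range and a preset fourth parameter value, and the upper limit of the third interval is a third quartile plus the product of the interquartile range and the preset fourth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes.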
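As an example, for the $N-1$ historical chain ratios $HR_2, HR_3, \dots, HR_N$, compute their mean $\mu_3$ and standard deviation $\sigma_3$, where $\mu_3$ is the first mean and $\sigma_3$ is the first standard deviation. The normal range, that is, the third interval, is then $[\mu_3 - 3\sigma_3,\ \mu_3 + 3\sigma_3]$, where the number 3 is the third parameter value. For the current real-time value $X_{N+1}$ (the request count at 00:00:00 today), compute the latest chain ratio $HR_{N+1} = X_{N+1} / X_N$. If $HR_{N+1}$ falls within $[\mu_3 - 3\sigma_3,\ \mu_3 + 3\sigma_3]$, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.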
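Alternatively, for the $N-1$ historical chain ratios $HR_2, HR_3, \dots, HR_N$, compute their 25% quantile $Q_{13}$ and 75% quantile $Q_{33}$, where $Q_{13}$ is the first quartile and $Q_{33}$ is the third quartile. The normal range, that is, the third interval, is then $[Q_{13} - 1.5(Q_{33} - Q_{13}),\ Q_{33} + 1.5(Q_{33} - Q_{13})]$, where $(Q_{33} - Q_{13})$ is the interquartile range and the number 1.5 is the fourth parameter value. For the current real-time value $X_{N+1}$ (the request count at 00:00:00 today), compute the latest chain ratio $HR_{N+1} = X_{N+1} / X_N$. If $HR_{N+1}$ falls within $[Q_{13} - 1.5(Q_{33} - Q_{13}),\ Q_{33} + 1.5(Q_{33} - Q_{13})]$, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.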
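The chain-ratio check with a 3σ interval on the ratios can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, all historical values are assumed to be non-zero (a zero request count would make a ratio undefined, a case the text does not discuss), and the population standard deviation is again an assumed choice.

```python
import statistics


def chain_ratios(values):
    """HR_i = X_i / X_{i-1} for every pair of adjacent values."""
    return [cur / prev for prev, cur in zip(values, values[1:])]


def chain_ratio_is_outlier(x_new, history, k=3.0):
    """Return True if x_new / history[-1] leaves the 3-sigma band of the
    historical chain ratios. Assumes every value in history is non-zero."""
    ratios = chain_ratios(history)            # HR_2 .. HR_N
    mu = statistics.mean(ratios)
    sigma = statistics.pstdev(ratios)
    latest = x_new / history[-1]              # HR_{N+1}
    return not (mu - k * sigma <= latest <= mu + k * sigma)


# Made-up history that alternates between 100 and 110 requests per minute.
history = [100, 110, 100, 110, 100, 110]
```

For this history the ratios alternate between about 0.909 and 1.1, giving a normal band of roughly [0.74, 1.30]. Swapping the band for Tukey fences on the same ratios gives the quartile-based variant described above.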
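In one possible implementation, the M outlier determination methods include the same-period-ratio criterion. Determining the detection result for the first transaction volume according to the same-period-ratio criterion and the N second transaction volumes includes: determining a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same time period on different days; and if the second ratio falls within a fourth interval, determining that the first transaction volume is normal, where the fourth interval is a normal-value interval.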
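As an example, for the request count $X_{N+1}$ at 00:00:00 today and the per-minute request counts $X_1, X_2, \dots, X_N$ ($N = 100 \times 24 \times 60 = 144000$) of the most recent 100 days, take the subset $X_1, X_{1441}, X_{2881}, \dots, X_{N+1}$, 101 values in total, and compute the same-period ratio of each value to the value one day (1440 samples) earlier:
$$\frac{X_{1+1440k}}{X_{1+1440(k-1)}}, \quad k = 1, 2, \dots, 100.$$
This gives 100 same-period ratios, where a same-period ratio is the ratio of the request count in a given minute to the request count in the same minute of the previous trading day. The first 99 of them ($k = 1, \dots, 99$) are the historical same-period ratios; determine their normal range, which is the fourth interval. For the current real-time value $X_{N+1}$, the latest same-period ratio (the case $k = 100$),
$$\frac{X_{N+1}}{X_{N+1-1440}},$$
is the second ratio. If it falls within the fourth interval, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.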
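As another example, for the request count $X_{N+2}$ at 00:01:00 today and the per-minute request counts $X_1, X_2, \dots, X_N$ ($N = 100 \times 24 \times 60 = 144000$) of the most recent 100 days, take the subset $X_2, X_{1442}, X_{2882}, \dots, X_{N+2}$, 101 values in total, and compute the same-period ratios
$$\frac{X_{2+1440k}}{X_{2+1440(k-1)}}, \quad k = 1, 2, \dots, 100,$$
which gives 100 same-period ratios. The first 99 of them ($k = 1, \dots, 99$) are the historical same-period ratios; determine their normal range, which is the fourth interval. For the current real-time value $X_{N+2}$, the latest same-period ratio (the case $k = 100$),
$$\frac{X_{N+2}}{X_{N+2-1440}},$$
is the second ratio. If it falls within the fourth interval, $X_{N+2}$ is judged normal; otherwise $X_{N+2}$ is judged an outlier.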
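Therefore, each time a new real-time value arrives, the outlier range is updated as described above. The outlier threshold is thus computed adaptively and no longer needs to be calculated or adjusted by hand.
The normal range of the historical same-period ratios can be determined as follows.
In one possible implementation, the lower limit of the fourth interval is a second mean minus the product of a second standard deviation and a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the product of the second standard deviation and the preset fifth parameter value, where the second standard deviation and the second mean are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days. Alternatively, the lower limit of the fourth interval is a first quartile minus the product of an interquartile range and a preset sixth parameter value, and the upper limit of the fourth interval is a third quartile plus the product of the interquartile range and the preset sixth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days.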
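As an example, for the 99 historical same-period ratios, compute their mean $\mu_4$ and standard deviation $\sigma_4$, where $\mu_4$ is the second mean and $\sigma_4$ is the second standard deviation. The normal range, that is, the fourth interval, is then $[\mu_4 - 3\sigma_4,\ \mu_4 + 3\sigma_4]$, where the number 3 is the fifth parameter value. For the current real-time value $X_{N+1}$ (the request count at 00:00:00 today), compute the latest same-period ratio $X_{N+1} / X_{N+1-1440}$. If it falls within $[\mu_4 - 3\sigma_4,\ \mu_4 + 3\sigma_4]$, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.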
Alternatively, for the 99 historical same-period ratios, compute their 25% quantile $Q_{14}$ and 75% quantile $Q_{34}$, where $Q_{14}$ is the first quartile and $Q_{34}$ is the third quartile. The normal range, that is, the fourth interval, is then $[Q_{14} - 1.5(Q_{34} - Q_{14}),\ Q_{34} + 1.5(Q_{34} - Q_{14})]$, where $(Q_{34} - Q_{14})$ is the interquartile range and the number 1.5 is the sixth parameter value. For the current real-time value $X_{N+1}$ (the request count at 00:00:00 today), compute the latest same-period ratio $X_{N+1} / X_{N+1-1440}$. If it falls within $[Q_{14} - 1.5(Q_{34} - Q_{14}),\ Q_{34} + 1.5(Q_{34} - Q_{14})]$, $X_{N+1}$ is judged normal; otherwise $X_{N+1}$ is judged an outlier.
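The same-period-ratio check with Tukey fences on the ratios can be sketched as follows. This is an illustration only: the function name and the `period` parameter are hypothetical, all selected values are assumed to be non-zero, at least three same-period values are assumed to be available, and the quantile convention (`method="inclusive"`) is an assumed choice. A tiny `period` is used in the example data so that the selection is easy to follow; the text uses 1440 samples per day.

```python
import statistics


def same_period_ratio_is_outlier(x_new, history, period=1440, k=1.5):
    """Compare x_new with the value one period earlier, using Tukey fences
    built from the historical same-period ratios.

    history is oldest-first; x_new is the sample right after history[-1].
    Assumes non-zero values and at least three same-period samples.
    """
    # Values sampled at the same position of each period as x_new.
    same_period = history[len(history) % period::period]
    ratios = [cur / prev for prev, cur in zip(same_period, same_period[1:])]
    q1, _median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    latest = x_new / same_period[-1]
    return not (q1 - k * iqr <= latest <= q3 + k * iqr)


# Toy data with period=2: even positions carry the series of interest
# (100, 110, 100, 110, 100); odd positions are unrelated filler.
history = [100, 5, 110, 5, 100, 5, 110, 5, 100, 5]
```

For this data the historical ratios are about 1.1 and 0.909, the fences are roughly [0.62, 1.39], and the new value is compared with the last same-period value, 100.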
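The embodiments of this application select all four outlier determination methods, the 3σ criterion, the Tukey criterion, the chain-ratio criterion, and the same-period-ratio criterion, and apply each of them to the newest data value. Each of the four methods therefore performs one outlier check on the newest value and produces its own detection result. Because the detection results may differ, their influence on the outlier decision must be considered together in order to finally determine whether the newest value is an outlier.
In one possible implementation, simple voting can be used. The idea is as follows. Let the total number of votes be 100. If every outlier determination method is assumed to be equally accurate, each method holds 25 votes. A method that considers the newest value an outlier casts all 25 of its votes for it; otherwise it casts none, that is, 0 votes. If the final number of votes received by the newest value exceeds the preset vote threshold, for example 50 votes (the threshold can be set by business staff according to actual business needs and experience), the newest value is judged an outlier.
For example, suppose the request count at 00:00:00 today is $X_{N+1}$, and the four selected methods give the following results: the 3σ criterion judges $X_{N+1}$ an outlier, the Tukey criterion judges it an outlier, the chain-ratio criterion judges it normal, and the same-period-ratio criterion judges it normal. Under simple voting, $X_{N+1}$ receives 50 votes, which does not exceed the vote threshold (set to 50), so $X_{N+1}$ is finally determined to be normal, and no alert prompting business staff to carry out maintenance needs to be generated.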
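Simple voting as described above can be sketched as follows. This is a minimal illustration; the function name `simple_vote` and the method labels used as dictionary keys are hypothetical.

```python
def simple_vote(detections, total_votes=100.0, threshold=50.0):
    """Equal-weight voting.

    detections maps a method name to True when that method flags an outlier.
    Returns (votes received, whether the value is judged abnormal).
    """
    votes_per_method = total_votes / len(detections)   # 25 with four methods
    tally = sum(votes_per_method
                for flagged in detections.values() if flagged)
    return tally, tally > threshold


# The case from the text: two methods flag the value, two do not.
example = {
    "three_sigma": True,
    "tukey": True,
    "chain_ratio": False,
    "same_period_ratio": False,
}
```

With the threshold at 50 and a strict comparison, at least three of the four methods must agree before a value is reported as abnormal.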
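In one possible implementation, voting on the M detection results according to the preset voting method to determine the vote count of the first transaction volume includes: determining the number of votes held by each of the M outlier determination methods according to the total number of votes of the voting method and the historical voting accuracy of each method; and determining the vote count of the first transaction volume according to the votes held by each method and the M detection results for the first transaction volume.
For example, because it is not known in advance which of the four selected methods (3σ, Tukey, chain ratio, same-period ratio) are more accurate, simple voting can be used at the start. Once a certain number of detections has been reached (for example 100), the historical judgments of each method can be compared with the final voting results to obtain the accuracy of each method. Every subsequent vote can then allot a different number of votes to each method according to its historical accuracy, which is weighted voting.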
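The accuracy bookkeeping described here can be sketched as follows. This is an illustration under assumptions: the function name `method_accuracies` and the data layout (one list of 0/1 judgments per method, plus one list of final 0/1 decisions) are hypothetical.

```python
def method_accuracies(method_history, final_history):
    """Share of past detections in which each method agreed with the final
    voting decision.

    method_history: {method name: [0/1 judgment per past detection]}
    final_history:  [0/1 final decision per past detection], same length.
    """
    d = len(final_history)
    return {
        name: sum(1 for own, final in zip(judgments, final_history)
                  if own == final) / d
        for name, judgments in method_history.items()
    }


# Four past detections (1 = outlier, 0 = normal); made-up data.
final = [1, 0, 0, 1]
per_method = {
    "three_sigma": [1, 0, 0, 1],   # always agreed with the final decision
    "tukey":       [1, 1, 0, 0],   # agreed in two of four detections
}
```

Note that "accuracy" here is agreement with the combined vote, not with a ground-truth label, which is how the text defines it.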
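To make the simple voting method easier to implement, the following mathematical expressions are given.
The final vote total is computed as:
Figure PCTCN2021079361-appb-000025
where
Figure PCTCN2021079361-appb-000026
The final decision on whether the value is an outlier ($E = 1$ means it is judged an outlier, $E = 0$ means it is not) is:
Figure PCTCN2021079361-appb-000027
Suppose outlier detection is performed on the $(D+1)$-th data value ($D \geq 100$). Over the $D+1$ detections, the judgment of the $i$-th method is $L_{ij}$ ($i = 1, 2, 3, 4$; $j = 1, 2, \dots, D+1$), and the final outlier decision is $E_j$, where:
Figure PCTCN2021079361-appb-000028
Figure PCTCN2021079361-appb-000029
Then, based on the past $D$ judgments, the accuracy of each method is:
Figure PCTCN2021079361-appb-000030
where
Figure PCTCN2021079361-appb-000031
is the indicator function, which equals 1 when $L_{ij} = E_j$ (the $j$-th judgment of the $i$-th method agrees with the final voting decision) and 0 otherwise. From the historical accuracy, the weight of each method is obtained:
Figure PCTCN2021079361-appb-000032
With 100 votes in total, the number of votes allotted to each method is:
Figure PCTCN2021079361-appb-000033
So the vote total in the $(D+1)$-th detection is:
Figure PCTCN2021079361-appb-000034
where
Figure PCTCN2021079361-appb-000035
is the indicator function, which equals 1 when $L_{i(D+1)} = 1$ (the $(D+1)$-th judgment of the $i$-th method is "outlier") and 0 otherwise. The final weighted-voting decision then follows from formula (2).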
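The expressions in this passage survive only as figure references. One reconstruction that is consistent with the surrounding text is sketched below. The symbols $V$, $T$, $a_i$, $w_i$, and $v_i$ and the equation numbers are assumptions introduced here for illustration; $L_i$, $L_{ij}$, $E$, $E_j$, and $D$ follow the text.

```latex
% Simple voting: each of the four methods holds 25 of the 100 votes.
V = \sum_{i=1}^{4} 25\, L_i , \qquad
L_i = \begin{cases} 1, & \text{method } i \text{ judges an outlier} \\
                    0, & \text{otherwise} \end{cases}
\tag{1}

% Final decision against the preset vote threshold T.
E = \begin{cases} 1, & V > T \\ 0, & V \le T \end{cases}
\tag{2}

% Historical accuracy of method i over the past D detections.
a_i = \frac{1}{D} \sum_{j=1}^{D} \mathbf{1}\{ L_{ij} = E_j \}

% Weight and allotted votes (100 votes in total).
w_i = \frac{a_i}{\sum_{m=1}^{4} a_m} , \qquad v_i = 100\, w_i

% Weighted vote total in the (D+1)-th detection, then decided by (2).
V_{D+1} = \sum_{i=1}^{4} v_i \, \mathbf{1}\{ L_{i(D+1)} = 1 \}
```

Under this reading the weights sum to 1, so the allotted votes always sum to 100 and the threshold keeps the same meaning under simple and weighted voting.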
For ease of understanding, an example follows.
Suppose that, in 100 rounds of simple voting, the 3σ criterion agreed with the final combined decision in 96 rounds: in 2 of them, the 3σ criterion judged the request count at the sampling time under test to be an outlier and the combined decision of the four methods was also "outlier"; in the other 94, the 3σ criterion judged it normal and the combined decision was also "normal". The accuracy of the 3σ criterion is therefore 96%.
Likewise, suppose the Tukey criterion agreed with the final combined decision in 92 rounds: in 3 of them both the Tukey criterion and the combined decision were "outlier", and in the other 89 both were "normal". The accuracy of the Tukey criterion is therefore 92%.
Likewise, suppose the chain-ratio criterion agreed with the final combined decision in 90 rounds: in 1 of them both the chain-ratio criterion and the combined decision were "outlier", and in the other 89 both were "normal". The accuracy of the chain-ratio criterion is therefore 90%.
Likewise, suppose the same-period-ratio criterion agreed with the final combined decision in 98 rounds: in 4 of them both the same-period-ratio criterion and the combined decision were "outlier", and in the other 94 both were "normal". The accuracy of the same-period-ratio criterion is therefore 98%.
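From the accuracies of these four outlier determination methods, the weight of each method can be determined as its accuracy divided by the sum of the four accuracies:
For the 3σ criterion, the weight is $96\% / (96\% + 92\% + 90\% + 98\%) \approx 0.255$.
For the Tukey criterion, the weight is $92\% / (96\% + 92\% + 90\% + 98\%) \approx 0.245$.
For the chain-ratio criterion, the weight is $90\% / (96\% + 92\% + 90\% + 98\%) \approx 0.239$.
For the same-period-ratio criterion, the weight is $98\% / (96\% + 92\% + 90\% + 98\%) \approx 0.261$.
Let the total number of votes be 100. Multiplying the total by the weight of each method gives the number of votes allotted to that method: 25.5 votes for the 3σ criterion, 24.5 votes for the Tukey criterion, 23.9 votes for the chain-ratio criterion, and 26.1 votes for the same-period-ratio criterion.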
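Weighted voting is then used to decide whether the request count at the current sampling time in the 101st detection is an outlier. Suppose that, for this value, the 3σ criterion judges it an outlier, the Tukey criterion judges it normal, the chain-ratio criterion judges it an outlier, and the same-period-ratio criterion judges it normal. Its final vote count is then 49.4 votes (25.5 + 0 + 23.9 + 0 = 49.4). With the preset vote threshold at 50 votes, 49.4 is below 50, so the request count in the 101st detection is considered normal, and no alert prompting business staff to carry out maintenance needs to be generated.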
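The worked example can be reproduced with a short sketch. The function name `weighted_vote` and the dictionary keys are hypothetical; the accuracies and judgments are the ones from the example. Without intermediate rounding the tally is about 49.47 rather than the 49.4 obtained by adding the rounded allotments 25.5 and 23.9; either way it stays below the threshold of 50.

```python
def weighted_vote(accuracies, detections, total_votes=100.0, threshold=50.0):
    """Accuracy-weighted voting.

    accuracies: {method: historical accuracy}
    detections: {method: True when the method flags an outlier}
    Returns (allotted votes per method, votes received, abnormal?).
    """
    total_accuracy = sum(accuracies.values())
    allotted = {name: total_votes * acc / total_accuracy
                for name, acc in accuracies.items()}
    tally = sum(allotted[name]
                for name, flagged in detections.items() if flagged)
    return allotted, tally, tally > threshold


accuracies = {"three_sigma": 0.96, "tukey": 0.92,
              "chain_ratio": 0.90, "same_period_ratio": 0.98}
detections = {"three_sigma": True, "tukey": False,
              "chain_ratio": True, "same_period_ratio": False}
allotted, tally, abnormal = weighted_vote(accuracies, detections)
```

Rounded to one decimal place, `allotted` matches the 25.5, 24.5, 23.9, and 26.1 votes given in the text.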
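Based on the same concept, an embodiment of this application further provides a data detection apparatus. As shown in FIG. 2, the apparatus includes:
a transaction volume obtaining unit 201, configured to obtain a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1;
a detection result determining unit 202, configured to determine, according to M selected outlier determination methods and the N second transaction volumes, M detection results for the first transaction volume, where each outlier determination method yields one detection result and M is an integer greater than 1;
a vote count determining unit 203, configured to vote on the M detection results according to a preset voting method to determine the vote count of the first transaction volume; and
an anomaly determining unit 204, configured to determine that the first transaction volume is abnormal if the vote count of the first transaction volume is greater than a preset vote threshold.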
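Further, in the apparatus, the vote count determining unit 203 is specifically configured to: determine the number of votes held by each of the M outlier determination methods according to the total number of votes of the voting method and the historical voting accuracy of each method; and determine the vote count of the first transaction volume according to the votes held by each method and the M detection results for the first transaction volume.
Further, in the apparatus, the M outlier determination methods include the Pauta criterion, and the detection result determining unit 202 is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days; determine the mean and standard deviation of the L second transaction volumes; and if the first transaction volume falls within a first interval, determine that the first transaction volume is normal, where the lower limit of the first interval is the mean minus the product of the standard deviation and a preset first parameter value, and the upper limit of the first interval is the mean plus the product of the standard deviation and the preset first parameter value.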
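Further, in the apparatus, the M outlier determination methods include the Tukey criterion, and the detection result determining unit 202 is specifically configured to: determine L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days; determine the first quartile and the third quartile of the L second transaction volumes; and if the first transaction volume falls within a second interval, determine that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the product of the interquartile range and a preset second parameter value, the upper limit of the second interval is the third quartile plus the product of the interquartile range and the preset second parameter value, and the interquartile range is the third quartile minus the first quartile.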
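Further, in the apparatus, the M outlier determination methods include the chain-ratio criterion, and the detection result determining unit 202 is specifically configured to: determine a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the one among the N second transaction volumes whose sampling time is closest to the first sampling time; and if the first ratio falls within a third interval, determine that the first transaction volume is normal, where the third interval is a normal-value interval.
Further, in the apparatus, the lower limit of the third interval is a first mean minus the product of a first standard deviation and a preset third parameter value, and the upper limit of the third interval is the first mean plus the product of the first standard deviation and the preset third parameter value, where the first standard deviation and the first mean are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes. Alternatively, the lower limit of the third interval is a first quartile minus the product of an interquartile range and a preset fourth parameter value, and the upper limit of the third interval is a third quartile plus the product of the interquartile range and the preset fourth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes.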
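Further, in the apparatus, the M outlier determination methods include the same-period-ratio criterion, and the detection result determining unit 202 is specifically configured to: determine a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same time period on different days; and if the second ratio falls within a fourth interval, determine that the first transaction volume is normal, where the fourth interval is a normal-value interval.
Further, in the apparatus, the lower limit of the fourth interval is a second mean minus the product of a second standard deviation and a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the product of the second standard deviation and the preset fifth parameter value, where the second standard deviation and the second mean are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days. Alternatively, the lower limit of the fourth interval is a first quartile minus the product of an interquartile range and a preset sixth parameter value, and the upper limit of the fourth interval is a third quartile plus the product of the interquartile range and the preset sixth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days.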
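An embodiment of this application further provides a computing device, which may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (PDA), or the like. The computing device may include a central processing unit (CPU), a memory, input/output devices, and so on. Input devices may include a keyboard, a mouse, a touchscreen, and the like; output devices may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with the program instructions and data stored in it. In the embodiments of this application, the memory may be used to store the program instructions of the data detection method.
The processor is configured to invoke the program instructions stored in the memory and to execute the data detection method according to the obtained program.
An embodiment of this application further provides a computer-readable storage medium that stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the data detection method.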
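A person skilled in the art should understand that the embodiments of this application may be provided as a method or as a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.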
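Although preferred embodiments of this application have been described, a person skilled in the art can make further changes and modifications to these embodiments once the basic inventive concept is known. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Obviously, a person skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to cover them as well.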

Claims (11)

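  1. A data detection method, comprising:
    obtaining a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1;
    determining, according to M selected outlier determination methods and the N second transaction volumes, M detection results for the first transaction volume, where each outlier determination method yields one detection result and M is an integer greater than 1;
    voting on the M detection results according to a preset voting method to determine the vote count of the first transaction volume; and
    if the vote count of the first transaction volume is greater than a preset vote threshold, determining that the first transaction volume is abnormal.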
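  2. The method according to claim 1, wherein voting on the M detection results according to the preset voting method to determine the vote count of the first transaction volume comprises:
    determining the number of votes held by each of the M outlier determination methods according to the total number of votes of the voting method and the historical voting accuracy of each of the M outlier determination methods; and
    determining the vote count of the first transaction volume according to the votes held by each of the M outlier determination methods and the M detection results for the first transaction volume.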
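  3. The method according to claim 1 or 2, wherein the M outlier determination methods include the Pauta criterion;
    determining the detection result for the first transaction volume according to the Pauta criterion and the N second transaction volumes comprises:
    determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days;
    determining the mean and standard deviation of the L second transaction volumes; and
    if the first transaction volume falls within a first interval, determining that the first transaction volume is normal, where the lower limit of the first interval is the mean minus the product of the standard deviation and a preset first parameter value, and the upper limit of the first interval is the mean plus the product of the standard deviation and the preset first parameter value.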
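  4. The method according to claim 1 or 2, wherein the M outlier determination methods include the Tukey criterion;
    determining the detection result for the first transaction volume according to the Tukey criterion and the N second transaction volumes comprises:
    determining L second transaction volumes among the N second transaction volumes, where the sampling times of the L second transaction volumes and that of the first transaction volume fall in the same time period on different days;
    determining the first quartile and the third quartile of the L second transaction volumes; and
    if the first transaction volume falls within a second interval, determining that the first transaction volume is normal, where the lower limit of the second interval is the first quartile minus the product of the interquartile range and a preset second parameter value, the upper limit of the second interval is the third quartile plus the product of the interquartile range and the preset second parameter value, and the interquartile range is the third quartile minus the first quartile.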
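  5. The method according to claim 1 or 2, wherein the M outlier determination methods include the chain-ratio criterion;
    determining the detection result for the first transaction volume according to the chain-ratio criterion and the N second transaction volumes comprises:
    determining a first ratio of the first transaction volume to a third transaction volume, where the third transaction volume is the one among the N second transaction volumes whose sampling time is closest to the first sampling time; and
    if the first ratio falls within a third interval, determining that the first transaction volume is normal;
    where the third interval is a normal-value interval.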
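  6. The method according to claim 5, wherein the lower limit of the third interval is a first mean minus the product of a first standard deviation and a preset third parameter value, and the upper limit of the third interval is the first mean plus the product of the first standard deviation and the preset third parameter value, where the first standard deviation and the first mean are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes; or
    the lower limit of the third interval is a first quartile minus the product of an interquartile range and a preset fourth parameter value, and the upper limit of the third interval is a third quartile plus the product of the interquartile range and the preset fourth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from the N-1 ratios corresponding to the N transaction volumes, the N-1 ratios being the ratios of every two adjacent transaction volumes among the N transaction volumes.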
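  7. The method according to claim 1 or 2, wherein the M outlier determination methods include the same-period-ratio criterion;
    determining the detection result for the first transaction volume according to the same-period-ratio criterion and the N second transaction volumes comprises:
    determining a second ratio of the first transaction volume to a fourth transaction volume among the N second transaction volumes, where the sampling times of the fourth transaction volume and of the first transaction volume fall in the same time period on different days; and
    if the second ratio falls within a fourth interval, determining that the first transaction volume is normal;
    where the fourth interval is a normal-value interval.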
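  8. The method according to claim 7, wherein the lower limit of the fourth interval is a second mean minus the product of a second standard deviation and a preset fifth parameter value, and the upper limit of the fourth interval is the second mean plus the product of the second standard deviation and the preset fifth parameter value, where the second standard deviation and the second mean are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days; or
    the lower limit of the fourth interval is a first quartile minus the product of an interquartile range and a preset sixth parameter value, and the upper limit of the fourth interval is a third quartile plus the product of the interquartile range and the preset sixth parameter value, where the interquartile range is the third quartile minus the first quartile, and the third quartile and the first quartile are determined from X ratios corresponding to the N transaction volumes, the X ratios being the ratios of every two adjacent transaction volumes among Y transaction volumes, and the Y transaction volumes being those of the N transaction volumes that fall in the same time period on different days.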
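  9. A data detection apparatus, comprising:
    a transaction volume obtaining unit, configured to obtain a first transaction volume corresponding to a first sampling time and N second transaction volumes respectively corresponding to N sampling times before the first sampling time, where N is an integer greater than 1;
    a detection result determining unit, configured to determine, according to M selected outlier determination methods and the N second transaction volumes, M detection results for the first transaction volume, where each outlier determination method yields one detection result and M is an integer greater than 1;
    a vote count determining unit, configured to vote on the M detection results according to a preset voting method to determine the vote count of the first transaction volume; and
    an anomaly determining unit, configured to determine that the first transaction volume is abnormal if the vote count of the first transaction volume is greater than a preset vote threshold.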
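  10. A computing device, comprising:
    a memory, configured to store program instructions; and
    a processor, configured to invoke the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, wherein the storage medium stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method according to any one of claims 1 to 8.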
PCT/CN2021/079361 2020-03-13 2021-03-05 一种数据检测方法及装置 WO2021180009A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010177723.4 2020-03-13
CN202010177723.4A CN111400155B (zh) 2020-03-13 2020-03-13 一种数据检测方法及装置

Publications (1)

Publication Number Publication Date
WO2021180009A1 true WO2021180009A1 (zh) 2021-09-16

Family

ID=71434955

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079361 WO2021180009A1 (zh) 2020-03-13 2021-03-05 一种数据检测方法及装置

Country Status (2)

Country Link
CN (1) CN111400155B (zh)
WO (1) WO2021180009A1 (zh)


Also Published As

Publication number Publication date
CN111400155A (zh) 2020-07-10
CN111400155B (zh) 2021-08-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21767249

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160123)

122 Ep: pct application non-entry in european phase

Ref document number: 21767249

Country of ref document: EP

Kind code of ref document: A1