CN114493230A - Abnormal service data processing method and device and storage medium - Google Patents

Abnormal service data processing method and device and storage medium

Info

Publication number
CN114493230A
Authority
CN
China
Prior art keywords
service data
data
sequence
service
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210068428.4A
Other languages
Chinese (zh)
Inventor
宋韶旭
王浩宇
张小健
王建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210068428.4A priority Critical patent/CN114493230A/en
Publication of CN114493230A publication Critical patent/CN114493230A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G06Q10/06316 Sequencing of tasks or work
    • G06Q10/0633 Workflow analysis
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method, a device and a storage medium for processing abnormal service data. The method comprises: acquiring a plurality of service data of a service, the cycle length T of each service data and the acquisition time of each service data; grouping service data that have the same time sequence position into one extended set, based on the time sequence position of the acquisition time of each service data within its data cycle; for each service data y_i, forming the extended neighborhood set N_i of y_i from (a) every service data in the extended sets to which the service data at the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong, (b) every service data in the extended set to which y_i belongs, and (c) the service data at the W_s acquisition times before and the W_s acquisition times after t_i; and determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data.

Description

Abnormal service data processing method and device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing abnormal service data, and a storage medium.
Background
A service (such as a website application) inevitably runs into service problems during operation. Troubleshooting mainly consists of identifying abnormal data among the service data generated while the service runs, and then analyzing that abnormal data to determine and solve the service problem.
At present, abnormal data is generally determined from the service data by the traditional K-Sigma outlier detection method. Specifically, as shown in fig. 1, the service data acquisition device 11 acquires service data (such as the access volume of a website), and sends the plurality of service data acquired within a period of time (such as one week), together with the acquisition time corresponding to each service data, to the abnormal data identification device 12. The abnormality detection unit 121 in the abnormal data identification device 12 determines whether service data y_j is abnormal data using the following K-Sigma outlier detection method: service data y_j that satisfies the formula |y_j - μ_j| > K·σ_j is determined to be abnormal data, where
[Formula image BDA0003481042800000011, not reproduced: definitions of μ_j and σ_j, the mean and standard deviation of the service data in N_j]
N_j is the neighborhood set of service data y_j, |N_j| is the cardinality of N_j (the number of elements in N_j), and K is a preset value (e.g., K = 3). The neighborhood set N_j of y_j is the set formed by y_j and the service data at the W_s acquisition times before and the W_s acquisition times after the acquisition time t_j of y_j, where W_s is a preset window value and j is a natural number greater than zero.
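The prior-art K-Sigma check described above can be sketched as follows. This is a minimal illustration, not the implementation used by the abnormality detection unit 121: the function name `k_sigma_baseline` is hypothetical, the data are assumed to be evenly sampled and already sorted by acquisition time, μ_j and σ_j are assumed to be the mean and the population standard deviation over N_j, and windows are simply clipped at the ends of the series.

```python
import math


def k_sigma_baseline(values, w_s, k=3.0):
    """Return the indices j whose value satisfies |y_j - mu_j| > k * sigma_j.

    The neighborhood N_j is y_j plus up to w_s points on each side
    (clipped at the series boundaries, which is an assumption).
    """
    n = len(values)
    anomalies = []
    for j in range(n):
        lo, hi = max(0, j - w_s), min(n, j + w_s + 1)
        neighborhood = values[lo:hi]
        mu = sum(neighborhood) / len(neighborhood)
        # Population standard deviation (divide by |N_j|); assumed here.
        sigma = math.sqrt(
            sum((y - mu) ** 2 for y in neighborhood) / len(neighborhood)
        )
        if abs(values[j] - mu) > k * sigma:
            anomalies.append(j)
    return anomalies
```

Because the neighborhood only looks at adjacent acquisition times, any point that differs sharply from its immediate neighbors can exceed the threshold, whether or not the same jump recurs in every cycle.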
When abnormal data is determined by the prior-art method, the abnormal data obtained is inaccurate. Inaccurate abnormal data makes it impossible to determine the service problem from it, or to solve the problem efficiently.
Disclosure of Invention
The application provides a processing method and device of abnormal business data and a storage medium, which are used for solving the problem that obtained abnormal data are not accurate when the abnormal data are determined by adopting the method in the prior art.
In a first aspect, the present application provides a method for processing abnormal service data, including:
acquiring a plurality of service data of a service, the cycle length T of each service data and the acquisition time of each service data; the cycle length is the number of the service data contained in the data cycle corresponding to the service data;
classifying the service data with the same time sequence position into an expansion set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data;
for each service data y_i, forming the extended neighborhood set N_i of y_i from: every service data in the extended sets to which the service data at the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; every service data in the extended set to which y_i belongs; and the service data at the W_s acquisition times before and the W_s acquisition times after t_i;
determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data;
wherein i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
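One way to read the construction of the extended neighborhood set N_i is sketched below. It is an illustration under stated assumptions rather than the claimed implementation: the function name `extended_neighborhood` is hypothetical, indices are 0-based, the data are evenly sampled and sorted by acquisition time, the first sample sits at the first position of a cycle (so the cycle position of index j is `j % period`), and windows are clipped at the series boundaries.

```python
def extended_neighborhood(i, n, period, w_d, w_s):
    """Return the sorted indices that make up N_i for the i-th service data.

    n      -- number of service data
    period -- cycle length T (number of service data per cycle)
    w_d    -- half-width of the window whose extended sets are merged in
    w_s    -- half-width of the ordinary time window (w_d < w_s)
    """
    members = set()
    # Extended sets of y_i and of the w_d points on each side of it:
    # every index that shares a cycle position with one of those points.
    for j in range(max(0, i - w_d), min(n, i + w_d + 1)):
        members.update(range(j % period, n, period))
    # Ordinary neighborhood: the w_s points before and after t_i.
    members.update(range(max(0, i - w_s), min(n, i + w_s + 1)))
    return sorted(members)
```

With n = 16, period = 8, w_d = 1 and w_s = 2, the neighborhood of index 9 merges the extended sets of cycle positions 0, 1 and 2 with the time window 7 to 11.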
Optionally, the classifying the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of each service data in the data cycle of the service data includes:
sequentially assigning each service data, in a plurality of service data arranged in time order, to the sequentially arranged extended sets Z_h, to obtain T extended sets;
wherein the acquisition times of the service data in each extended set have the same time sequence position in the data cycles to which those service data belong; h = 1, 2, 3, …, T.
Optionally, the classifying the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of each service data in the data cycle of the service data includes:
marking position identification on each service data, wherein the position identification is the identification of a time sequence position of the acquisition time of the service data in the data cycle of the service data;
and grouping the service data with the same position identification into an extended set.
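Both optional groupings above amount to collecting the service data that share a position within the cycle. A minimal sketch, assuming evenly sampled data sorted by acquisition time with the first sample at the first position of a cycle; the function name `build_extended_sets` is hypothetical, and positions are 0-based here whereas the text numbers the sets Z_h from h = 1.

```python
from collections import defaultdict


def build_extended_sets(n, period):
    """Map each cycle position (0 .. period-1) to the indices at that position."""
    extended_sets = defaultdict(list)
    for i in range(n):
        # The position identifier of the i-th service data is i modulo T.
        extended_sets[i % period].append(i)
    return dict(extended_sets)
```

For seven service data and a cycle length of three, position 0 collects indices 0, 3 and 6, position 1 collects 1 and 4, and position 2 collects 2 and 5.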
Optionally, determining abnormal service data from the multiple service data based on the extended neighborhood set corresponding to each service data includes:
determining abnormal business data from a plurality of business data according to the following modes based on the expansion neighborhood set corresponding to each business data:
service data y_i that satisfies the formula |y_i - μ_i| > K·σ_i is determined to be abnormal data, where
[Formula image BDA0003481042800000031, not reproduced: definitions of μ_i and σ_i, the mean and standard deviation of the service data in N_i]
N_i is the extended neighborhood set of service data y_i, |N_i| is the cardinality of N_i (the number of elements in N_i), and K is a preset value.
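The K-Sigma test over the extended neighborhood set can be sketched as follows. The helper names are hypothetical, and the sketch assumes evenly sampled data sorted by acquisition time, a first sample at the first position of a cycle, windows clipped at the series boundaries, and μ_i and σ_i taken as the mean and population standard deviation over N_i.

```python
import math


def extended_neighborhood(i, n, period, w_d, w_s):
    """Indices in N_i: merged extended sets plus the ordinary time window."""
    members = set()
    for j in range(max(0, i - w_d), min(n, i + w_d + 1)):
        members.update(range(j % period, n, period))
    members.update(range(max(0, i - w_s), min(n, i + w_s + 1)))
    return members


def detect_anomalies(values, period, w_d, w_s, k=3.0):
    """Return the indices i with |y_i - mu_i| > k * sigma_i over N_i."""
    n = len(values)
    flagged = []
    for i in range(n):
        neigh = [values[m] for m in extended_neighborhood(i, n, period, w_d, w_s)]
        mu = sum(neigh) / len(neigh)
        sigma = math.sqrt(sum((y - mu) ** 2 for y in neigh) / len(neigh))
        if abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Synthetic example: six cycles of length 8, a recurring jump at cycle
# position 3, and a single non-recurring jump at index 21.
series = [10] * 48
for cycle in range(6):
    series[cycle * 8 + 3] = 50
series[21] = 90
print(detect_anomalies(series, period=8, w_d=1, w_s=2))
```

In this synthetic series the recurring jump appears six times in the extended neighborhood of each of its occurrences, which raises σ_i enough that it stays under the threshold, while the one-off value at index 21 does not recur in its extended set and is flagged.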
Optionally, the obtaining the cycle length T of each service data includes:
based on the plurality of service data and the acquisition time of each service data, determining the period length T of each service data by adopting the following mode:
arranging the n service data according to the acquisition time sequence of each service data to form a service data sequence X;
calculating the Pearson correlation coefficient c_k between sequence A, formed by the 1st to the (n-k)-th service data in the service data sequence X, and sequence B, formed by the (k+1)-th to the n-th service data in the service data sequence X, to obtain n Pearson correlation coefficients c_k;
arranging the Pearson correlation coefficients c_k in order of the value of k to form an autocorrelation sequence C; the order is either from the largest value of k to the smallest, or from the smallest to the largest;
determining a plurality of peaks from the autocorrelation sequence C based on a preset peak threshold C_th, as follows: for the peak set formed by a Pearson correlation coefficient c_k and the W_p Pearson correlation coefficients before and the W_p after c_k, determining as a peak the Pearson correlation coefficient that is the maximum of the peak set and is not less than C_th;
sorting the position numbers q of the peaks in the autocorrelation sequence C in ascending order to form a number set, and subtracting each pair of adjacent position numbers in the number set to obtain the absolute value of their difference;
forming a difference sequence from the absolute differences, and taking the median of the difference sequence, the median representing the cycle length T of each service data;
wherein W_p is a preset value; i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n-1); q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
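The period-estimation steps above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names and the `max_lag` parameter are hypothetical, lags are limited to n // 2 by default (the text lets k run to n-1, but very short overlaps give unreliable or undefined coefficients), an undefined coefficient (zero variance) is treated as 0, and lags are 0-based, which does not affect the differences between peak positions.

```python
import math
from statistics import median


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: an undefined coefficient counts as 0
    return cov / math.sqrt(var_a * var_b)


def estimate_period(values, w_p, c_th, max_lag=None):
    """Estimate the cycle length T from peaks of the autocorrelation sequence."""
    n = len(values)
    if max_lag is None:
        max_lag = n // 2  # assumption, see lead-in
    # Autocorrelation sequence C, ordered by increasing lag k.
    c = [pearson(values[: n - k], values[k:]) for k in range(max_lag + 1)]
    # A peak is the maximum of its (2 * w_p + 1)-wide window and >= c_th.
    peaks = [
        k
        for k in range(len(c))
        if c[k] >= c_th and c[k] == max(c[max(0, k - w_p) : k + w_p + 1])
    ]
    # Median of the gaps between adjacent peak positions.
    gaps = [abs(later - earlier) for earlier, later in zip(peaks, peaks[1:])]
    return median(gaps) if gaps else None
```

Taking the median of the gaps, rather than the first gap, makes the estimate less sensitive to an occasional missed or spurious peak.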
In a second aspect, the present application provides an abnormal data processing apparatus, comprising: a period determining unit and a data identification unit;
the period determining unit is used for acquiring a plurality of service data of a service, the period length T of each service data and the acquisition time of each service data; the cycle length is the number of the service data contained in the data cycle corresponding to the service data;
the data identification unit is used for grouping service data that have the same time sequence position into one extended set, based on the time sequence position of the acquisition time of each service data within its data cycle; for each service data y_i, forming the extended neighborhood set N_i of y_i from: every service data in the extended sets to which the service data at the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; every service data in the extended set to which y_i belongs; and the service data at the W_s acquisition times before and the W_s acquisition times after t_i; and determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data;
wherein i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
Optionally, the period determining unit includes a processing module, an autocorrelation calculating module, a peak searching module, and a period determining module, and the data identifying unit includes a data grouping module, a neighborhood determining module, and an anomaly detecting module;
the processing module is used for acquiring a plurality of service data of a service, the cycle length T of each service data and the acquisition time of each service data; it is also used for acquiring a plurality of service data of a service and the acquisition time of each service data;
the autocorrelation calculating module is used for arranging the n service data in order of acquisition time to form a service data sequence X, and then calculating the Pearson correlation coefficient between sequence A, formed by the 1st to the (n-k)-th service data in the service data sequence X, and sequence B, formed by the (k+1)-th to the n-th service data in the service data sequence X, to obtain n Pearson correlation coefficients c_k; and for arranging the Pearson correlation coefficients c_k in order of the value of k to form an autocorrelation sequence C; the order is either from the largest value of k to the smallest, or from the smallest to the largest;
the peak searching module is used for determining a plurality of peaks from the autocorrelation sequence C based on a preset peak threshold C_th, as follows: for the peak set formed by a Pearson correlation coefficient c_k and the W_p Pearson correlation coefficients before and the W_p after c_k, determining as a peak the Pearson correlation coefficient that is the maximum of the peak set and is not less than C_th;
the period determining module is used for sorting the position numbers q of the peaks in the autocorrelation sequence C in ascending order to form a number set, and subtracting each pair of adjacent position numbers in the number set to obtain the absolute value of their difference; and, after forming a difference sequence from the absolute differences, taking the median of the difference sequence, the median representing the cycle length T of each service data;
the data grouping module is used for grouping the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data;
the neighborhood determination module is used for forming, for each service data y_i, the extended neighborhood set N_i of y_i from: every service data in the extended sets to which the service data at the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; every service data in the extended set to which y_i belongs; and the service data at the W_s acquisition times before and the W_s acquisition times after t_i;
the abnormality detection module is used for determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data;
wherein W_p is a preset value; i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n-1); q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
In a third aspect, the present application provides an abnormal data processing apparatus, including:
a processor and a memory;
the memory stores executable instructions executable by the processor;
wherein execution of the executable instructions stored by the memory by the processor causes the processor to perform the method as described above.
In a fourth aspect, the present application provides a storage medium having stored therein computer-executable instructions for implementing the method as described above when executed by a processor.
In a fifth aspect, the present application provides a program product comprising a computer program which, when executed by a processor, implements the method as described above.
According to the abnormal business data processing method, device and storage medium, the neighborhood set of each business data determined by the prior art method is periodically expanded to obtain the expanded neighborhood set of each business data, and accurate abnormal business data are determined from a plurality of business data based on the expanded neighborhood set of each business data. According to the method, in the process of determining the abnormal business data, the periodic mutation data in the periodic business data are filtered, so that the accuracy of the determined abnormal business data is ensured, and the problem that the abnormal data determined by the method in the prior art is not accurate is solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a prior art anomaly data determination system architecture diagram;
FIG. 2 is a schematic diagram of website visitation volume of a website within a week according to an embodiment of the present application;
FIG. 3 is a diagram of an architecture of a system for processing abnormal business data according to an embodiment of the present application;
fig. 4 is a schematic diagram of a processing method for abnormal service data according to an embodiment of the present application;
fig. 5 is a structural diagram of an abnormal data processing apparatus according to an embodiment of the present application;
fig. 6 is a structural diagram of an abnormal data processing apparatus according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a prior art abnormal data determination system architecture diagram. As shown in fig. 1, the service data acquisition device 11 acquires service data, and sends a plurality of service data acquired within a period of time and the acquisition time corresponding to each service data to the abnormal data identification device 12. The abnormality detection unit 121 in the abnormal data identification device 12 determines abnormal data from the plurality of service data using the K-Sigma outlier detection method, and sends the determined abnormal data to the processing device 10. The processing device 10 uses the abnormal data to handle the service problem of the service or service system.
For example, when a website access problem occurs in an application process of a website, abnormal data is usually determined from traffic data (e.g., the amount of website access) of a time period (e.g., one week) including the time when the website access problem occurs, and then the traffic problem is determined based on the abnormal data and solved. Fig. 2 is a schematic view of website visitation amount of a certain website in one week according to an embodiment of the present application. As shown in fig. 2, the visit data of the website is periodic data with a period of one day, and sudden changes of the data (such as the periodic sudden change data shown in fig. 2) occur at certain fixed times of each day. The periodic mutation data is generated when the website is maintained every day, is normal business data, and does not represent that business problems occur in corresponding business (namely, the website application).
However, as shown in fig. 1, the prior-art method determines whether service data y_j is abnormal data by applying the K-Sigma outlier detection method to the neighborhood set N_j of y_j, and the neighborhood set N_j does not take the periodicity of the service data into account.
When the prior-art method is used to determine abnormal data in periodic service data such as that shown in fig. 2, the periodic mutation data shown in fig. 2 is generally determined to be abnormal data. That is, the abnormal data obtained from the service data of fig. 2 by the prior-art method consists of a plurality of periodic mutation data and one real abnormal data. Of this determined abnormal data, only the real abnormal data shown in fig. 2 is associated with, or can characterize, a service problem; the periodic mutation data is redundant data unrelated to any service problem. Therefore, when the prior-art method is used to determine abnormal data in periodic service data, the abnormal data obtained is inaccurate, and inaccurate abnormal data makes it impossible to determine the service problem from it or to solve the problem efficiently. In addition, if a technician must analyze the determined abnormal data to determine and solve the service problem of the service or service system, the inaccuracy of the abnormal data greatly hinders that work and greatly reduces the efficiency and accuracy with which the technician analyzes and solves the service problem.
In view of the above, the present application provides a method for processing abnormal service data, which includes periodically expanding a neighborhood set of each service data to obtain an expanded neighborhood set of each service data, and determining abnormal service data based on the expanded neighborhood set of each service data, where the determined abnormal service data does not include periodic mutation data (i.e., the periodic mutation data is filtered out), so as to ensure that the obtained abnormal service data is accurate data associated with a service problem. When the abnormal business data is used for analyzing and determining the business problems, the high efficiency and the accuracy of the business problem analyzing and solving work are greatly improved, and the data volume for analyzing and solving the business problems is reduced.
The following describes a method for processing abnormal service data provided by the present application with reference to some embodiments.
Fig. 3 is a diagram of an architecture of a system for processing abnormal service data according to an embodiment of the present application. As shown in fig. 3, the system includes: the system comprises a processing device 10, a business data acquisition device 11 and an abnormal data processing device 13 connected with the processing device, wherein the abnormal data processing device 13 comprises a period determining unit 131 and a data identifying unit 132.
Illustratively, the service data collecting device 11 collects service data of a service, and sends a plurality of service data of the service, a cycle length T of each service data, and a collecting time of each service data to the cycle determining unit 131 in the abnormal data processing device 13. The period length T is the number of service data included in the data period corresponding to the service data. The cycle determining unit 131 sends the obtained plurality of pieces of service data, the cycle length T of each piece of service data, and the acquisition time of each piece of service data to the data identifying unit 132.
The data identification unit 132 groups service data that have the same time sequence position into one extended set, based on the time sequence position of the acquisition time of each service data within its data cycle. For each service data y_i, the data identification unit 132 also forms the extended neighborhood set N_i of y_i from: every service data in the extended sets to which the service data at the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; every service data in the extended set to which y_i belongs; and the service data at the W_s acquisition times before and the W_s acquisition times after t_i. Then, the data identification unit 132 determines abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data, and transmits the abnormal service data to the processing device 10. The processing device 10 handles the service problem of the service or service system based on the received abnormal service data.
Wherein i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
Alternatively, after the service data acquisition device 11 acquires the service data of a service, the service data acquisition device 11 may send only the plurality of service data of the service and the acquisition time of each service data to the period determination unit 131 in the abnormal data processing device 13. The cycle determining unit 131 determines the cycle length T of each service data based on the obtained plurality of service data and the acquisition time of each service data. The cycle determining unit 131 sends the obtained plurality of pieces of service data and the acquisition time of each piece of service data, and the determined cycle length T of each piece of service data to the data identifying unit 132.
According to the abnormal business data processing method provided by the embodiment of the application, all the business data with the same time sequence position are classified into one expansion set, the expansion neighborhood set of all the business data is determined and obtained based on the expansion set to which all the business data belong, and then accurate abnormal business data is determined from the business data based on the expansion neighborhood set corresponding to all the business data. By the adoption of the abnormal business data processing method, when abnormal business data are determined from a plurality of business data, the periodic mutation data are filtered by expanding the neighborhood set, the periodic mutation data are prevented from being determined as abnormal data, and the accuracy of the determined abnormal business data is guaranteed. The processing method for the abnormal business data, provided by the embodiment of the application, solves the problem that the obtained abnormal data is not accurate when the abnormal data is determined by adopting the method in the prior art.
The following describes a method for processing abnormal service data provided in this embodiment with reference to fig. 4. Fig. 4 is a schematic diagram of a method for processing abnormal service data according to an embodiment of the present application. The execution subject of the embodiment of the present application is the abnormal data processing apparatus 13 in the embodiment shown in fig. 3. As shown in fig. 4, the method includes:
S401, acquiring a plurality of service data of a service, the cycle length T of each service data and the acquisition time of each service data; the cycle length is the number of service data contained in the data cycle corresponding to the service data.
Specifically, the cycle determining unit 131 in the abnormal data processing apparatus 13 acquires a plurality of service data of a service, the cycle length T of each service data, and the acquisition time of each service data from the service data acquisition device 11. The cycle length is the number of service data contained in the data cycle corresponding to the service data; T ≥ 2 and T is a natural number.
Alternatively, the period determining unit 131 may also obtain a plurality of service data of a service and the acquisition time of each service data from the service data acquisition device 11, and determine the period length T of each service data based on the obtained plurality of service data and the acquisition time of each service data.
Next, the cycle determining unit 131 sends the obtained plurality of pieces of service data, the cycle length T of each piece of service data, and the acquisition time of each piece of service data to the data identifying unit 132.
The cycle determining unit 131 determines the cycle length T of each service data based on the obtained plurality of service data and the acquisition time of each service data as follows. Assuming that the plurality of service data are n service data, the cycle determining unit 131 determines the cycle length T of each service data according to steps S4011 to S4016 based on the n service data and the acquisition time of each service data.
S4011, arranging the n service data according to the collection time sequence of each service data to form a service data sequence X.
Illustratively, the period determination unit 131 arranges n service data in the collection time sequence of each service data to form a service data sequence X shown in table 1.
Table 1. Service data sequence X
Sequence number i of the service data in X:   1     2     3     4     …     n
Service data y_i:                             y_1   y_2   y_3   y_4   …     y_n
S4012, for each k, calculating the Pearson correlation coefficient c_k between a sequence A composed of the 1st to (n−k)-th service data in the service data sequence X and a sequence B composed of the (k+1)-th to n-th service data in the service data sequence X, obtaining n Pearson correlation coefficients c_k. Where i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n−1); n ≥ 2 and n is a natural number.
Illustratively, the period determining unit 131 calculates the Pearson correlation coefficient c_k of the sequence A and the sequence B according to equations (1) and (2):
Figure BDA0003481042800000101
Figure BDA0003481042800000102
Where m is the number of service data in each of the sequence A and the sequence B; a_f is the service data in the sequence A and b_f is the service data in the sequence B; f = 1, 2, 3, …, m; m ≥ 1 and m is a natural number.
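As an illustrative sketch only, the calculation of step S4012 may be expressed in Python as follows. The sketch assumes that equations (1) and (2) are the standard Pearson correlation coefficient; the function names `pearson` and `autocorrelation_sequence`, and the choice to return 0.0 when the denominator is zero, are assumptions that do not appear in this embodiment.

```python
import math


def pearson(a, b):
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # Assumption: a constant or single-element sequence has no defined
    # correlation, so 0.0 is returned instead of dividing by zero.
    return cov / denom if denom > 0 else 0.0


def autocorrelation_sequence(x):
    """Return [c_0, c_1, ..., c_(n-1)] for the service data sequence x.

    Sequence A is the 1st to (n-k)-th data and sequence B is the
    (k+1)-th to n-th data, so both contain m = n - k values.
    """
    n = len(x)
    return [pearson(x[: n - k], x[k:]) for k in range(n)]
```

For a hypothetical sequence that repeats every 4 values, c_0 and c_4 are both equal to 1, because the sequences A and B coincide at those shifts.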
S4013, arranging the Pearson correlation coefficients c_k in the numerical order of k to form the autocorrelation sequence C.
Illustratively, the period determining unit 131 arranges the Pearson correlation coefficients c_k in the numerical order of k to form the autocorrelation sequence C, an example of which is shown in Table 2. The numerical order is either descending or ascending.
TABLE 2 autocorrelation sequences C
Figure BDA0003481042800000103
Where q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
S4014, based on a preset peak threshold C_th, determining a plurality of peak values from the autocorrelation sequence C as follows: for each Pearson correlation coefficient c_k, forming a peak set from c_k and the W_p Pearson correlation coefficients before and the W_p Pearson correlation coefficients after c_k, and determining the Pearson correlation coefficient that has the maximum value in the peak set and is not less than C_th to be a peak value. Where W_p is a preset value.
Illustratively, the period determining unit 131 determines, for each Pearson correlation coefficient in the autocorrelation sequence C, whether it is a peak value as follows: a peak set is formed from c_k and the W_p Pearson correlation coefficients before and the W_p Pearson correlation coefficients after c_k, and the Pearson correlation coefficient that has the maximum value in the peak set and is not less than C_th is determined to be a peak value.
For example, suppose W_p = 1. The period determining unit 131 sequentially determines whether each Pearson correlation coefficient shown in Table 2 is a peak value as follows:
c_0 and the 1 Pearson correlation coefficient before and after c_0 (i.e., c_0, c_1) constitute a peak set [c_0, c_1], and the Pearson correlation coefficient that has the maximum value in [c_0, c_1] and is not less than C_th is determined to be a peak value;
c_1 and the 1 Pearson correlation coefficient before and after c_1 (i.e., c_0, c_1, c_2) constitute a peak set [c_0, c_1, c_2], and the Pearson correlation coefficient that has the maximum value in [c_0, c_1, c_2] and is not less than C_th is determined to be a peak value;
c_2 and the 1 Pearson correlation coefficient before and after c_2 (i.e., c_1, c_2, c_3) constitute a peak set [c_1, c_2, c_3], and the Pearson correlation coefficient that has the maximum value in [c_1, c_2, c_3] and is not less than C_th is determined to be a peak value;
…;
and so on, until it has been determined whether each Pearson correlation coefficient in the autocorrelation sequence C is a peak value.
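The peak search of step S4014 may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `find_peaks`, the zero-based indexing by k, and the treatment of ties (equal maxima within one window are all reported) are not specified by this embodiment, and the coefficient values in the usage note are hypothetical.

```python
def find_peaks(c, w_p, c_th):
    """Return the indices k for which c[k] is a peak value.

    c[k] is a peak value when it is the maximum of the window made of c[k]
    and the w_p coefficients on each side, and it is not less than c_th.
    Windows are truncated at the two ends of the sequence.
    """
    peaks = []
    for k, value in enumerate(c):
        lo = max(0, k - w_p)
        window = c[lo : k + w_p + 1]
        if value == max(window) and value >= c_th:
            peaks.append(k)
    return peaks
```

With the hypothetical coefficients `[1.0, 0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.8, 0.1, 0.7, 0.2]`, `w_p = 1` and `c_th = 0.5`, the function returns the indices 0, 4, 8 and 10. The local maximum at index 2 is rejected because it is below the threshold.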
S4015, sorting the position sequence numbers q of the peak values in the autocorrelation sequence C in ascending order to form a sequence number set, and subtracting each two adjacent position sequence numbers in the sequence number set to obtain the absolute value of their difference.
Illustratively, suppose the period determining unit 131 determines, according to step S4014, that the peak values in the autocorrelation sequence C shown in Table 2 are c_0, c_4, c_8 and c_10. The period determining unit 131 sorts the position sequence numbers 1, 5, 9 and 11 of the peak values c_0, c_4, c_8 and c_10 in the autocorrelation sequence C in ascending order to form the sequence number set [1, 5, 9, 11], and subtracts each two adjacent position sequence numbers in [1, 5, 9, 11] to obtain the absolute values of the differences, namely 4, 4 and 2.
Alternatively, the value of the position sequence number q may be the same as the value of k in the Pearson correlation coefficient c_k constituting the autocorrelation sequence C.
S4016, forming a difference sequence by the absolute values of the differences, and performing median calculation on the difference sequence to obtain a median of the difference sequence, wherein the median represents the cycle length T of each service data.
Illustratively, the period determining unit 131 makes the absolute difference values 4,4, and 2 of the differences between the sequence numbers of two adjacent positions determined in step S4015 form a difference sequence [4,4,2], and performs median calculation on the difference sequence [4,4,2], so as to obtain a median of the difference sequence [4,4,2] as 4, where the median represents the period length T of each service data, that is, T is 4.
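Steps S4015 and S4016 may be sketched in Python as follows. The function name `period_from_peaks` and the error raised when fewer than two peak values are available are assumptions made for illustration only.

```python
from statistics import median


def period_from_peaks(positions):
    """Estimate the cycle length T from the position sequence numbers of the peak values."""
    ordered = sorted(positions)
    # Absolute differences between each two adjacent position sequence numbers.
    diffs = [abs(b - a) for a, b in zip(ordered, ordered[1:])]
    if not diffs:
        raise ValueError("at least two peak values are needed")
    return median(diffs)
```

For the sequence number set [1, 5, 9, 11] of the example above, the difference sequence is [4, 4, 2] and the function returns 4. Note that `statistics.median` averages the two middle values when the difference sequence has an even length, so the result may then not be an integer; `statistics.median_low` could be used instead if an integer cycle length is required.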
S402, classifying the service data with the same time sequence position into an expansion set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data.
For example, after the data identification unit 132 receives the plurality of service data, the period length T of each service data, and the collection time of each service data sent by the period determination unit 131, the data identification unit 132 classifies each service data with the same time sequence position into an extended set based on the time sequence position of the collection time of each service data in the data period of the service data.
Alternatively, the data identification unit 132 may sequentially classify each of the plurality of service data arranged in time sequence into the sequentially arranged extended sets Z_h, obtaining T extended sets. The acquisition times of the service data in each extended set have the same time sequence position in the data cycles to which the service data belong; h = 1, 2, 3, …, T.
For example, the data identification unit 132 obtains a plurality of service data and arranges them in the order of the acquisition time of each service data to obtain the service data sequence X shown in Table 1. Assuming that the cycle length T of each service data is 4 and n is 12, the data identification unit 132 sequentially puts each service data in the service data sequence X into the sequentially arranged extended sets Z_h, obtaining the T extended sets shown in Table 3.
Table 3. Extended sets Z_h
Extended set Z_1: [y_1, y_5, y_9]
Extended set Z_2: [y_2, y_6, y_10]
Extended set Z_3: [y_3, y_7, y_11]
Extended set Z_4: [y_4, y_8, y_12]
Optionally, the data identification unit 132 may also mark a location identifier for each service data, where the location identifier is an identifier of a time sequence location where the collection time of the service data is located in the data cycle of the service data. Then, the data identification unit 132 classifies the service data with the same location identity into an extended set.
For example, the data identification unit 132 obtains a plurality of service data in the service data sequence X as shown in table 1. Assuming that the service data in the service data sequence X shown in table 1 are arranged in time sequence, and the cycle length T of each service data is 4 and n is 12, the data identification unit 132 marks the position identifier shown in table 4 for each service data, and classifies the service data with the same position identifier into an extended set, so as to obtain an extended set shown in table 3.
TABLE 4 data period and location identification for each service data
Figure BDA0003481042800000131
The cycle length of each service data is the same as the cycle lengths of data cycle 1, data cycle 2 and data cycle 3 shown in Table 4.
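The grouping of step S402 may be sketched in Python as follows. The sketch assumes that the service data are equally spaced with no missing values and that the first service data occupies the first position of a data cycle; the function name `extended_sets` is hypothetical. Service data are represented by their sequence numbers i.

```python
def extended_sets(n, period):
    """Group the sequence numbers 1..n by their time sequence position in the data cycle."""
    sets = {h: [] for h in range(1, period + 1)}
    for i in range(1, n + 1):
        h = (i - 1) % period + 1  # position identifier of y_i within its data cycle
        sets[h].append(i)
    return sets
```

With n = 12 and T = 4, `extended_sets(12, 4)` gives `{1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11], 4: [4, 8, 12]}`, which corresponds to the extended sets of Table 3.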
S403, for each service data y_i, forming the extended neighborhood set N_i of the service data y_i from: each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; each service data in the extended set to which y_i belongs; and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after t_i.
Specifically, the data identification unit 132 forms, for each service data y_i, the extended neighborhood set N_i of the service data y_i from: each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong; each service data in the extended set to which y_i belongs; and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after t_i. Where i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s.
Illustratively, assume that the plurality of service data obtained by the data identification unit 132 are as shown in Table 4, where T = 4, W_d = 0 and W_s = 1, and that the extended set corresponding to each service data is as shown in Table 3. Taking the extended neighborhood set N_2 of the service data y_2 as an example, the data identification unit 132 forms N_2 from each service data in the extended sets to which the service data corresponding to the 0 acquisition times before and after the acquisition time t_2 of y_2 belong, each service data in the extended set Z_2 to which y_2 belongs (i.e., y_2, y_6, y_10), and the service data corresponding to the 1 acquisition time before and after t_2 (i.e., y_1 and y_3), giving N_2: [y_2, y_6, y_10, y_1, y_3].
In contrast, if the prior-art method is adopted under the same conditions (the same service data, T = 4 and W_s = 1), the neighborhood set N_2c determined for the service data y_2 is [y_2, y_1, y_3].
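The construction of the extended neighborhood set in step S403 may be sketched in Python as follows. This is a sketch under assumptions: service data are represented by their sequence numbers 1..n, they are equally spaced, the first service data occupies the first position of a data cycle, and acquisition times outside the sequence are simply skipped. The function name `extended_neighborhood` is hypothetical.

```python
def extended_neighborhood(i, n, period, w_d, w_s):
    """Return the sequence numbers of the service data forming N_i."""
    members = []

    def add(j):
        # Keep each sequence number once and ignore numbers outside 1..n.
        if 1 <= j <= n and j not in members:
            members.append(j)

    # Extended set of y_i itself and of the data within w_d acquisition times of t_i.
    for j in range(i - w_d, i + w_d + 1):
        if 1 <= j <= n:
            first = (j - 1) % period + 1
            for member in range(first, n + 1, period):
                add(member)

    # Service data within w_s acquisition times before and after t_i.
    for j in range(i - w_s, i + w_s + 1):
        add(j)
    return members
```

With n = 12, T = 4, W_d = 0 and W_s = 1, `extended_neighborhood(2, 12, 4, 0, 1)` returns `[2, 6, 10, 1, 3]`, matching the set [y_2, y_6, y_10, y_1, y_3] of the example.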
S404, determining abnormal business data from the plurality of business data based on the expansion neighborhood set corresponding to each business data.
Illustratively, the data identification unit 132 determines abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data.
For example, the data identification unit 132 determines abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data as follows:
the data identification unit 132 determines the service data y_i satisfying equation (3) to be abnormal data:
|y_i − μ_i| > K·σ_i (3);
where
Figure BDA0003481042800000141
N_i is the extended neighborhood set of the service data y_i, |N_i| is the cardinality of the set N_i, i.e., the number of elements constituting the set N_i, and K is a preset value.
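The decision rule of equation (3) may be sketched in Python as follows. The defining expression for the two statistics is given only as an image in this publication, so the sketch assumes that they are the arithmetic mean and the population standard deviation (division by |N_i|) of the values in N_i; this assumption, the function name `is_abnormal` and the values in the usage note are illustrative only.

```python
import math


def is_abnormal(y_i, neighborhood, k):
    """Return True when |y_i - mu_i| > k * sigma_i over the given neighborhood values."""
    size = len(neighborhood)  # |N_i|
    mu = sum(neighborhood) / size
    # Assumption: population standard deviation, i.e. division by |N_i|.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in neighborhood) / size)
    return abs(y_i - mu) > k * sigma
```

For the hypothetical neighborhood values [-3, 10, 10, 10, 10] and K = 1.5, the mean is 7.4 and the standard deviation is 5.2, so the value -3 is reported as abnormal (10.4 > 7.8) while the value 10 is not (2.6 < 7.8).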
Further, the data recognition unit 132 transmits the determined abnormal traffic data to the processing device 10. The processing device 10 processes the service problem of the service or the service system based on the obtained abnormal service data.
From the above, as shown in step S403, the extended neighborhood set N_2 of the service data y_2 determined by the method of the present application is [y_2, y_6, y_10, y_1, y_3], whereas the neighborhood set N_2c of the service data y_2 determined by the prior-art method is [y_2, y_1, y_3]. It is assumed that the specific values of the service data shown in Table 4 are as shown in Table 5 below.
TABLE 5 Business data
Figure BDA0003481042800000142
As can be seen from Table 5, y_2 is periodic mutation data, and the service data y_1 and y_3 before and after y_2 are both normal data. When the abnormal service data processing method provided by the present application is adopted, y_2 can be determined not to be abnormal service data based on the extended neighborhood set N_2 of y_2; however, when the prior-art method is used to determine abnormal data, y_2 is determined to be abnormal data based on the neighborhood set N_2c of y_2.
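The difference between the two neighborhood sets can be illustrated with the following Python sketch. The values of Table 5 are not reproduced here, so the series below is hypothetical (a value of 50 at the second position of every data cycle and 10 elsewhere), K = 1 is an arbitrary choice, and the mean and population standard deviation are assumed as the statistics of the decision rule.

```python
import math


def is_abnormal(y_i, neighborhood, k):
    """Return True when |y_i - mean| > k * (population standard deviation)."""
    mu = sum(neighborhood) / len(neighborhood)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in neighborhood) / len(neighborhood))
    return abs(y_i - mu) > k * sigma


# Hypothetical series with T = 4 and n = 12: y_2, y_6 and y_10 are periodic spikes.
y = {i: (50 if i % 4 == 2 else 10) for i in range(1, 13)}

plain = [y[j] for j in (2, 1, 3)]            # N_2c = [y_2, y_1, y_3]
extended = [y[j] for j in (2, 6, 10, 1, 3)]  # N_2  = [y_2, y_6, y_10, y_1, y_3]

flag_plain = is_abnormal(y[2], plain, k=1.0)
flag_extended = is_abnormal(y[2], extended, k=1.0)
```

Under these assumptions `flag_plain` is `True` and `flag_extended` is `False`: within [50, 10, 10] the value 50 stands apart from its neighbors, whereas within [50, 50, 50, 10, 10] it is consistent with the values at the same position of the other data cycles.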
The following example is combined to specifically compare and explain technical effects achieved by the processing method of abnormal service data provided by the present application and the method in the prior art. It is assumed that the time-series service data and related information generated by the service D in the period L are as shown in table 6.
Table 6 service data and related information generated by service D during time period L
Figure BDA0003481042800000143
By respectively adopting the abnormal service data processing method provided by the present application and the method in the prior art, the results shown in table 7 are obtained when the abnormal service data is determined from the service data shown in table 6.
TABLE 7 abnormal service data determined by different processing methods
Figure BDA0003481042800000151
As can be seen from table 6 and table 7, the service data "-3" in table 6 is abnormal service data associated with the service problem occurring in the service D, and the other service data in table 6 is normal service data not associated with the service problem occurring in the service D. Therefore, as can be seen from the results of the abnormal service data determined in table 7, the abnormal service data associated with the service problem can be more accurately determined by using the method for processing abnormal service data provided by the present application.
Therefore, the abnormal business data processing method provided by the application periodically expands the neighborhood set of each business data to obtain the expanded neighborhood set of each business data. Therefore, by adopting the processing method of the abnormal business data, the periodic mutation data can be filtered when the abnormal business data is determined. The abnormal business data processing method avoids that the periodic mutation data is mistaken for abnormal data associated with business problems, and further the business problem determination and solving work is interfered.
According to the abnormal business data processing method, the neighborhood set of each business data is periodically expanded to obtain the expanded neighborhood set of each business data, the abnormal business data is determined based on the expanded neighborhood set of each business data, and the accurate abnormal business data associated with business problems are obtained. The processing method for the abnormal business data greatly improves the efficiency and accuracy of business problem analysis and solution work. In addition, when the period information such as the data period and the period length of the service data is unknown, the processing method of the abnormal service data provided by the application can be used for determining the period information of the service data.
The embodiment of the present application further provides an abnormal data processing device, which is described below with reference to fig. 3 and fig. 5. Fig. 5 is a structural diagram of an abnormal data processing apparatus according to an embodiment of the present application. As shown in fig. 3, the abnormal data processing apparatus 13 includes: a cycle determination unit 131 and a data identification unit 132;
a period determining unit 131, configured to obtain multiple service data of a service, a period length T of each service data, and an acquisition time of each service data; the cycle length is the number of the service data contained in the data cycle corresponding to the service data.
The data identification unit 132 is configured to classify the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data; for each service data y_i, form the extended neighborhood set N_i of the service data y_i from each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong, each service data in the extended set to which y_i belongs, and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after t_i; and determine abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data; where i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
Optionally, as shown in fig. 5, the period determining unit 131 includes a processing module 1311, an autocorrelation calculating module 1312, a spike search module 1313, and a period determining module 1314.
The data recognition unit 132 includes a data grouping module 1321, a neighborhood determination module 1322, and an anomaly detection module 1323.
The processing module 1311 is configured to obtain multiple service data of a service, a cycle length T of each service data, and an acquisition time of each service data; the method is also used for acquiring a plurality of service data of one service and the acquisition time of each service data.
An autocorrelation calculating module 1312, configured to arrange the n service data in the order of the acquisition time of each service data to form a service data sequence X, and then calculate the Pearson correlation coefficient c_k between a sequence A formed by the 1st to (n−k)-th service data in the service data sequence X and a sequence B formed by the (k+1)-th to n-th service data in the service data sequence X, so as to obtain n Pearson correlation coefficients c_k; and to arrange the Pearson correlation coefficients c_k in the numerical order of k to form the autocorrelation sequence C. The numerical order is either descending or ascending.
A peak search module 1313, configured to determine, based on a preset peak threshold C_th, a plurality of peak values from the autocorrelation sequence C as follows: for each Pearson correlation coefficient c_k, a peak set is formed from c_k and the W_p Pearson correlation coefficients before and the W_p Pearson correlation coefficients after c_k, and the Pearson correlation coefficient that has the maximum value in the peak set and is not less than C_th is determined to be a peak value.
The period determining module 1314 is configured to sort the position sequence numbers q of the peak values in the autocorrelation sequence C in ascending order to form a sequence number set, and subtract each two adjacent position sequence numbers in the sequence number set to obtain the absolute value of their difference; and, after the absolute values of the differences are formed into a difference sequence, to carry out median calculation on the difference sequence to obtain the median of the difference sequence. The median represents the cycle length T of each service data.
The data grouping module 1321 is configured to group the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data.
The neighborhood determination module 1322 is configured to form, for each service data y_i, the extended neighborhood set N_i of the service data y_i from each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong, each service data in the extended set to which y_i belongs, and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after t_i.
The anomaly detection module 1323 is configured to determine anomalous service data from the multiple service data based on the extended neighborhood set corresponding to each service data.
Where W_p is a preset value; i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n−1); q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
The specific implementation principle and the implemented technical effect of the abnormal data processing device provided in the embodiment of the present application are similar to the specific implementation principle and the implemented technical effect of the embodiment shown in fig. 4, and are not described herein again.
The embodiment of the application also provides an abnormal data processing device. Fig. 6 is a structural diagram of an abnormal data processing apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus includes a processor 61 and a memory 62, and the memory 62 stores executable instructions of the processor 61, so that the processor 61 can be used to execute the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again. It should be understood that the Processor 61 may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor. The Memory 62 may include a high-speed Random Access Memory (RAM), a Non-volatile Memory (NVM), at least one disk Memory, a usb disk, a removable hard disk, a read-only Memory, a magnetic disk, or an optical disk.
The embodiment of the present application further provides a storage medium, where computer-executable instructions are stored in the storage medium, and when the computer-executable instructions are executed by the processor, the method for processing abnormal service data is implemented. The storage medium may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may reside as discrete components in an electronic device or host device.
The embodiments of the present application also provide a program product, such as a computer program, which when executed by a processor, implements the method for processing abnormal service data covered by the present application.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing abnormal service data is characterized by comprising the following steps:
acquiring a plurality of service data of a service, the cycle length T of each service data and the acquisition time of each service data; the cycle length is the number of the service data contained in the data cycle corresponding to the service data;
classifying the service data with the same time sequence position into an expansion set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data;
for each service data y_i, forming the extended neighborhood set N_i of the service data y_i from each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of y_i belong, each service data in the extended set to which y_i belongs, and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after t_i;
Determining abnormal business data from the plurality of business data based on the expansion neighborhood set corresponding to each business data;
wherein i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
2. The method of claim 1, wherein grouping the service data with the same time sequence position into an extended set based on the time sequence position of the collection time of the service data in the data cycle of the service data comprises:
sequentially classifying each service data in a plurality of service data arranged in time sequence into the sequentially arranged extended sets Z_h, obtaining T extended sets;
the time sequence positions of the acquisition time of each service data in each expansion set in the data cycle to which the service data belongs are the same; h is 1,2,3, …, T.
3. The method of claim 1, wherein grouping the service data with the same time sequence position into an extended set based on the time sequence position of the collection time of the service data in the data cycle of the service data comprises:
marking position identification on each service data, wherein the position identification is the identification of a time sequence position of the acquisition time of the service data in the data cycle of the service data;
and grouping the service data with the same position identification into an extended set.
4. The method according to any one of claims 1 to 3, wherein determining abnormal traffic data from the plurality of traffic data based on the extended neighborhood set corresponding to each traffic data comprises:
determining abnormal business data from a plurality of business data according to the following modes based on the expansion neighborhood set corresponding to each business data:
determining the service data y_i satisfying the formula |y_i − μ_i| > K·σ_i to be abnormal data; wherein
Figure FDA0003481042790000021
N_i is the extended neighborhood set of the service data y_i, |N_i| is the cardinality of the set N_i, i.e., the number of elements constituting the set N_i, and K is a preset value.
5. The method according to any one of claims 1 to 3, wherein the obtaining of the period length T of each service data comprises:
based on the plurality of service data and the acquisition time of each service data, determining the period length T of each service data by adopting the following mode:
arranging the n service data according to the acquisition time sequence of each service data to form a service data sequence X;
calculating a sequence A formed by the 1 st service data to the n-k service data in the service data sequence X and a Pearson correlation coefficient c of a sequence B formed by the k +1 st service data to the n service data in the service data sequence XkObtaining n Pearson correlation coefficients ck
The correlation coefficient of each Pearson is calculated according to the correlation coefficient of the Pearson ckThe numerical values of k are sequentially arranged to form an autocorrelation sequence C; the numerical value sequence is the sequence of numerical values from large to small or the sequence of numerical values from small to large;
based on a preset peak threshold C_th, determining a plurality of peaks from the autocorrelation sequence C as follows: in the peak set formed by a Pearson correlation coefficient c_k and the W_p Pearson correlation coefficients before and the W_p Pearson correlation coefficients after c_k, determining the Pearson correlation coefficient that has the largest value and is not less than C_th as a peak;
sorting the position serial numbers q of the peaks in the autocorrelation sequence C in ascending numerical order to form a serial number set, and subtracting each pair of adjacent position serial numbers in the serial number set to obtain the absolute value of their difference;
forming a difference sequence from the absolute values of the differences, and computing the median of the difference sequence, wherein the median represents the period length T of each service data;
wherein W_p is a preset value; i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n-1); q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
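The steps of this claim can be sketched roughly as below, for illustration only. The sketch takes the sequence X as a plain list already ordered by acquisition time, uses 0-based lags, and treats a coefficient as 0.0 when it is undefined (fewer than two overlapping points, or a constant segment); the function names and that fallback are assumptions made for this example.

```python
import math
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if undefined (assumption)."""
    n = len(a)
    if n < 2:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_period(x, w_p, c_th):
    """Estimate the period length T from the peaks of the autocorrelation sequence C."""
    n = len(x)
    # c_k correlates the first n-k values (sequence A) with the last n-k values (sequence B).
    c = [pearson(x[: n - k], x[k:]) for k in range(n)]
    peaks = []
    for k, c_k in enumerate(c):
        # Peak set: c_k together with up to w_p coefficients on each side.
        window = c[max(0, k - w_p): k + w_p + 1]
        if c_k >= c_th and c_k == max(window):
            peaks.append(k)
    if len(peaks) < 2:
        return None  # not enough peaks to measure a gap
    # Peak positions are already ascending; take the gaps between neighbors.
    diffs = [abs(later - earlier) for earlier, later in zip(peaks, peaks[1:])]
    return median(diffs)


# A sawtooth that repeats every four samples; w_p and c_th are preset values.
T = estimate_period([0, 1, 2, 3] * 6, w_p=1, c_th=0.9)
```

Lags close to n leave only a few overlapping points, so their coefficients can be unreliable; taking the median of the gaps, rather than the mean, limits the effect of such stray peaks on the estimate.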
6. An exception data handling apparatus, comprising: a cycle determination unit and a data identification unit;
the period determining unit is used for acquiring a plurality of service data of a service, the period length T of each service data and the acquisition time of each service data; the period length is the number of service data contained in the data cycle corresponding to the service data;
the data identification unit is used for grouping the service data having the same time sequence position into one extended set, based on the time sequence position of the acquisition time of each service data within the data cycle of the service data; for each service data y_i, forming the extended neighborhood set N_i of the service data y_i from each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of the service data y_i belong, each service data in the extended set to which the service data y_i belongs, and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after the acquisition time t_i; and determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data;
wherein i = 1, 2, 3, …, n; n ≥ 2 and n is a natural number; W_d and W_s are both preset values, and W_d < W_s; T ≥ 2 and T is a natural number.
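One possible reading of how the extended neighborhood set N_i is assembled is sketched below, for illustration only. It works on 0-based indices, assumes evenly sampled data ordered by acquisition time, and takes the cycle position of index j to be j mod T; the function name `extended_neighborhood` and these conventions are assumptions made for this example.

```python
def extended_neighborhood(i, n, T, w_d, w_s):
    """Return the sorted indices that make up N_i for the item at index i.

    Assumptions: n evenly sampled items ordered by acquisition time, and the
    position of index j inside its data cycle is j % T.
    """
    def extended_set(j):
        # All indices that share the cycle position of index j.
        return range(j % T, n, T)

    # Start with the extended set that the item itself belongs to.
    neighborhood = set(extended_set(i))
    # Add the extended sets of the w_d items on each side of i.
    for j in range(max(0, i - w_d), min(n, i + w_d + 1)):
        neighborhood.update(extended_set(j))
    # Add the w_s items on each side of i themselves.
    neighborhood.update(range(max(0, i - w_s), min(n, i + w_s + 1)))
    return sorted(neighborhood)


# Three cycles of length 6; w_d < w_s as the claim requires.
N_7 = extended_neighborhood(i=7, n=18, T=6, w_d=1, w_s=2)
```

The result mixes values from the same and adjacent cycle positions in other cycles with the values immediately around the item, which is what lets a later deviation check compare y_i against both its periodic and its local context.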
7. The apparatus of claim 6, wherein the period determination unit comprises a processing module, an autocorrelation calculation module, a spike search module, a period determination module, and the data identification unit comprises a data grouping module, a neighborhood determination module, and an anomaly detection module;
the processing module is used for acquiring a plurality of service data of a service, the period length T of each service data and the acquisition time of each service data; and is also used for acquiring a plurality of service data of a service and the acquisition time of each service data;
the autocorrelation calculating module is used for arranging the n service data according to the acquisition time sequence of each service data to form a service data sequence X, and then calculating the Pearson correlation coefficient c_k between a sequence A formed by the 1st to the (n-k)-th service data in the service data sequence X and a sequence B formed by the (k+1)-th to the n-th service data in the service data sequence X, to obtain n Pearson correlation coefficients c_k; and for arranging the Pearson correlation coefficients c_k in the numerical order of k to form an autocorrelation sequence C; the numerical order is from largest to smallest or from smallest to largest;
the peak searching module is used for determining, based on a preset peak threshold C_th, a plurality of peaks from the autocorrelation sequence C as follows: in the peak set formed by a Pearson correlation coefficient c_k and the W_p Pearson correlation coefficients before and the W_p Pearson correlation coefficients after c_k, determining the Pearson correlation coefficient that has the largest value and is not less than C_th as a peak;
the period determining module is used for sorting the position serial numbers q of the peaks in the autocorrelation sequence C in ascending numerical order to form a serial number set, and subtracting each pair of adjacent position serial numbers in the serial number set to obtain the absolute value of their difference; and, after the absolute values of the differences are formed into a difference sequence, computing the median of the difference sequence, wherein the median represents the period length T of each service data;
the data grouping module is used for grouping the service data with the same time sequence position into an extended set based on the time sequence position of the acquisition time of the service data in the data cycle of the service data;
the neighborhood determination module is used for, for each service data y_i, forming the extended neighborhood set N_i of the service data y_i from each service data in the extended sets to which the service data corresponding to the W_d acquisition times before and the W_d acquisition times after the acquisition time t_i of the service data y_i belong, each service data in the extended set to which the service data y_i belongs, and the service data corresponding to the W_s acquisition times before and the W_s acquisition times after the acquisition time t_i;
the anomaly detection module is used for determining abnormal service data from the plurality of service data based on the extended neighborhood set corresponding to each service data;
wherein W_p is a preset value; i = 1, 2, 3, …, n; k = 0, 1, 2, 3, …, (n-1); q = 1, 2, 3, …, n; n ≥ 2 and n is a natural number.
8. An exception data handling apparatus, comprising:
a processor and a memory;
the memory stores executable instructions executable by the processor;
wherein execution of the executable instructions stored by the memory by the processor causes the processor to perform the method of any of claims 1-5.
9. A storage medium having stored therein computer executable instructions for performing the method of any one of claims 1-5 when executed by a processor.
10. A program product comprising a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 5.
CN202210068428.4A 2022-01-20 2022-01-20 Abnormal service data processing method and device and storage medium Pending CN114493230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210068428.4A CN114493230A (en) 2022-01-20 2022-01-20 Abnormal service data processing method and device and storage medium

Publications (1)

Publication Number Publication Date
CN114493230A true CN114493230A (en) 2022-05-13

Family

ID=81472291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210068428.4A Pending CN114493230A (en) 2022-01-20 2022-01-20 Abnormal service data processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114493230A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915846A (en) * 2015-06-18 2015-09-16 北京京东尚科信息技术有限公司 Electronic commerce time sequence data anomaly detection method and system
US20190138643A1 (en) * 2017-11-06 2019-05-09 Adobe Systems Incorporated Extracting seasonal, level, and spike components from a time series of metrics data
CN110059904A (en) * 2017-12-13 2019-07-26 罗伯特·博世有限公司 The automatic method for working out the rule of rule-based anomalous identification in a stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination