CN111552684A - Abnormal data positioning method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111552684A
Authority
CN
China
Prior art keywords
frequent
dimension data
data
combination
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010368783.4A
Other languages
Chinese (zh)
Inventor
黄荣庚
张戎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010368783.4A
Publication of CN111552684A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Abstract

The application relates to an abnormal data positioning method, an abnormal data positioning device, computer equipment and a storage medium. The method comprises the following steps: acquiring abnormal log data and normal log data; converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set; calculating according to the normal log set to obtain a corresponding first frequent dimensional data set, and calculating according to the abnormal log set to obtain a corresponding second frequent dimensional data set; obtaining a target frequent dimension data set according to the first frequent dimension data set and the second frequent dimension data set; calculating the difference support degree corresponding to the target frequent dimension data combination in the target frequent dimension data set; and sorting the target frequent dimension data combinations according to the difference support degree and the dimension data quantity of the target frequent dimension data combinations, and determining abnormal target frequent dimension data combinations according to sorting results. By adopting the method, the accuracy of positioning the abnormal data can be improved.

Description

Abnormal data positioning method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for locating abnormal data, a computer device, and a storage medium.
Background
With the development of intelligent operation and maintenance technology, whether users can access a web page or website normally is monitored through indicators recorded in logs. When a monitored indicator becomes abnormal, the root cause of the abnormality is analyzed so that it can be handled. For example, if the total number of accesses suddenly increases within a period of time, the total-access indicator is abnormal and the root cause of the sudden increase needs to be analyzed. If the root cause is an increase in access requests from a certain region, access requests from that region can be limited so that access returns to normal.
In the conventional technology, when an indicator in a monitoring log is abnormal, the monitoring log containing the abnormal indicator is usually analyzed according to preset analysis rules to obtain the root cause of the abnormality.
However, analyzing abnormal monitoring logs with preset analysis rules locates the root cause of an abnormality with low accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide an abnormal data positioning method, an abnormal data positioning apparatus, a computer device, and a storage medium, which can improve the accuracy of abnormal data positioning.
A method of anomaly data location, the method comprising:
acquiring abnormal log data and normal log data;
respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
calculating to obtain a corresponding first frequent dimension data set according to dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree;
deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
calculating to obtain a difference support degree corresponding to the target frequent dimensional data combination in the target frequent dimensional data set according to the first support degree corresponding to each first frequent dimensional data combination and the second support degree corresponding to each second frequent dimensional data combination;
and sorting the target frequent dimensional data combinations in the target frequent dimensional data set according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations, and determining abnormal target frequent dimensional data combinations according to sorting results.
An anomaly data locating apparatus, the apparatus comprising:
the log processing module is used for acquiring abnormal log data and normal log data; respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
the frequent set calculation module is used for calculating to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree;
the target frequent set obtaining module is used for deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
the difference calculation module is used for calculating and obtaining the difference support degree corresponding to the target frequent dimensional data combination in the target frequent dimensional data set according to the first support degree corresponding to each first frequent dimensional data combination and the second support degree corresponding to each second frequent dimensional data combination;
and the anomaly determination module is used for sequencing the target frequent dimension data combinations in the target frequent dimension data set according to the difference support degree corresponding to the target frequent dimension data combinations in the target frequent dimension data set and the dimension data quantity of the target frequent dimension data combinations, and determining the anomalous target frequent dimension data combinations according to the sequencing result.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring abnormal log data and normal log data;
respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
calculating to obtain a corresponding first frequent dimension data set according to dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree;
deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
calculating to obtain a difference support degree corresponding to the target frequent dimensional data combination in the target frequent dimensional data set according to the first support degree corresponding to each first frequent dimensional data combination and the second support degree corresponding to each second frequent dimensional data combination;
and sorting the target frequent dimensional data combinations in the target frequent dimensional data set according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations, and determining abnormal target frequent dimensional data combinations according to sorting results.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring abnormal log data and normal log data;
respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
calculating to obtain a corresponding first frequent dimension data set according to dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree;
deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
calculating to obtain a difference support degree corresponding to the target frequent dimensional data combination in the target frequent dimensional data set according to the first support degree corresponding to each first frequent dimensional data combination and the second support degree corresponding to each second frequent dimensional data combination;
and sorting the target frequent dimensional data combinations in the target frequent dimensional data set according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations, and determining abnormal target frequent dimensional data combinations according to sorting results.
According to the abnormal data positioning method and apparatus, computer device, and storage medium described above, a first frequent dimension data set is calculated from the dimension data in the normal log set, and a second frequent dimension data set is calculated from the dimension data in the abnormal log set. A target frequent dimension data set is then obtained from the first and second frequent dimension data sets, and the difference support degree of each target frequent dimension data combination in the target frequent dimension data set is calculated. The target frequent dimension data combinations are sorted according to their difference support degrees and their numbers of dimension data, and the abnormal target frequent dimension data combinations are determined from the sorting result. Because the frequent dimension data combinations of the normal logs are compared against those of the abnormal logs to obtain the difference support degree, and the root cause of the abnormality is determined from both the difference support degree and the number of dimension data in each combination, the abnormal target frequent dimension data combinations obtained are more accurate, which improves the accuracy of abnormal data positioning.
Drawings
FIG. 1 is a diagram of an exemplary embodiment of an application environment for a method for locating anomalous data;
FIG. 2 is a flowchart illustrating an abnormal data locating method according to an embodiment;
FIG. 3 is a diagram illustrating time segments divided in one embodiment;
FIG. 4 is a schematic flow diagram for computing a frequent dimension dataset in one embodiment;
FIG. 5 is a flowchart illustrating an abnormal data locating method according to another embodiment;
FIG. 6 is a schematic flow diagram illustrating the calculation of a first frequent dimension dataset, under an embodiment;
FIG. 7 is a schematic diagram of a process for computing a first frequent dimension data set in accordance with another embodiment;
FIG. 8 is a flowchart illustrating an abnormal data locating method according to another embodiment;
FIG. 9 is a diagram of a third set of exceptional dimensional data obtained in one embodiment;
FIG. 10 is a flowchart illustrating a method for locating anomalous data in an exemplary embodiment;
FIG. 11 is a block diagram illustrating an exemplary method for locating abnormal data;
FIG. 12 is a diagram illustrating basic information of an anomaly obtained in the embodiment of FIG. 11;
FIG. 13 is a schematic diagram of the root cause of the calculated anomaly in the embodiment of FIG. 11;
FIG. 14 is a block diagram of an abnormal data locating device in one embodiment;
FIG. 15 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The abnormal data positioning method provided by the application can be applied to the application environment shown in FIG. 1. The management terminal 102 communicates with the server 104 via a network. The server 104 acquires abnormal log data and normal log data, and converts the normal log data and the abnormal log data respectively to obtain a normal log set and an abnormal log set. The server 104 calculates a first frequent dimension data set from the dimension data in the normal log set, the first frequent dimension data set comprising each first frequent dimension data combination and its first support degree, and calculates a second frequent dimension data set from the dimension data in the abnormal log set, the second frequent dimension data set comprising each second frequent dimension data combination and its second support degree. It then deduplicates the frequent dimension data combinations in the first and second frequent dimension data sets to obtain a target frequent dimension data set. The server 104 calculates the difference support degree of each target frequent dimension data combination in the target frequent dimension data set from the first support degree of each first frequent dimension data combination and the second support degree of each second frequent dimension data combination.
The server 104 sorts the target frequent dimensional data combinations in the target frequent dimensional data set according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations, determines abnormal target frequent dimensional data combinations according to the sorting result, and then sends the abnormal target frequent dimensional data combinations to the management terminal 102 for display, so that managers can conveniently check the abnormal target frequent dimensional data combinations. The management terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, an abnormal data positioning method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
Step 202, acquiring abnormal log data and normal log data.
The log data is a data sequence arranged in chronological order and records users' accesses to a web page or website. Each time point corresponds to one piece of log data, and each piece of log data records dimension data for different dimensions. The dimensions in the log data may include user location, network operator, data center, data request source, monitoring KPI (Key Performance Indicator), and the like. Each dimension has its own dimension data. For example, the user location may include locations such as Beijing, Shanghai, and Guangzhou, and the network operator may include China Mobile, China Unicom, China Telecom, and the like. The abnormal log data refers to log data recorded when the monitoring KPI is abnormal, and the normal log data refers to log data recorded when the monitoring KPI is normal. The monitoring KPI is abnormal when it falls outside a preset normal KPI range.
Specifically, the server may acquire abnormal log data and normal log data from the historical log data. The server can also collect log data in real time, and abnormal log data and normal log data are obtained by monitoring the KPI.
In one embodiment, step 202, comprises the steps of:
acquiring monitoring log data, determining abnormal monitoring index data from the monitoring log data, acquiring a corresponding abnormal time period according to the abnormal monitoring index data, and acquiring log data corresponding to the abnormal time period from the monitoring log data to obtain abnormal log data; and determining a corresponding historical time period according to the abnormal time period, acquiring a preset transition time period, calculating to obtain a historical normal time period according to the historical time period and the preset transition time period, and acquiring log data corresponding to the historical normal time period from the monitoring log data to obtain normal log data.
The abnormal monitoring index data refers to the KPI data recorded once the monitoring KPI has become abnormal. The abnormal time period refers to the time period in which abnormal monitoring index data appears in the log. The historical time period refers to a preset time period before the abnormal time period; for example, the three hours before the starting time point of the abnormal time period may be set as the historical time period. The preset transition time period refers to a preset length of the transition period. The monitoring KPI generally changes from normal to abnormal gradually; the time period over which this gradual change takes place is called the transition time period, and its length can be set according to experience. FIG. 3 is a schematic diagram of the time periods divided according to the monitoring KPI.
Specifically, the server obtains monitoring log data, finds abnormal monitoring index data that falls outside a preset monitoring index data range, obtains the abnormal time period corresponding to the abnormal monitoring index data, and obtains the abnormal log data according to the abnormal time period. The server then obtains a historical time period according to the abnormal time period, calculates a historical normal time period from the historical time period and the preset transition time period, and obtains the log data corresponding to the historical normal time period from the monitoring log data as the normal log data. By excluding the log data that falls in the preset transition time period, the normal log data obtained is more accurate.
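The time-period division described above can be sketched as follows. This is a minimal illustration, not the procedure of the application: the function names `split_periods` and `select_logs`, the `"time"` field, and the default 30-minute transition length are hypothetical, and the sketch assumes that the transition time period is the tail of the historical time period immediately before the abnormal time period.

```python
from datetime import datetime, timedelta


def split_periods(abnormal_start, abnormal_end,
                  history=timedelta(hours=3),
                  transition=timedelta(minutes=30)):
    """Return (historical_normal_period, abnormal_period) as (start, end) tuples."""
    history_start = abnormal_start - history
    # Assumption: the transition period directly precedes the abnormal period,
    # so the historical normal period is the history minus that tail.
    normal_end = abnormal_start - transition
    if normal_end <= history_start:
        raise ValueError("transition period must be shorter than the historical period")
    return (history_start, normal_end), (abnormal_start, abnormal_end)


def select_logs(logs, period):
    """Keep the logs whose timestamp falls in the half-open interval [start, end)."""
    start, end = period
    return [log for log in logs if start <= log["time"] < end]


normal_period, abnormal_period = split_periods(
    datetime(2020, 4, 30, 12, 0), datetime(2020, 4, 30, 12, 20))
print(normal_period)
print(abnormal_period)
```

Logs that fall between the end of the historical normal period and the start of the abnormal period belong to neither set, which is the effect of excluding the transition time period.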
In a specific embodiment, the monitoring log data is as shown in Table 1.
TABLE 1 Monitoring log data table
(Table 1 is reproduced only as an image, BDA0002477481260000071, in the original publication.)
Here, API denotes an application program interface. The monitoring KPI comprises indexes such as access success rate and total number of accesses. When the monitoring KPI is abnormal during a period of time, the log data obtained in that abnormal time period is abnormal log data; when the monitoring KPI is not abnormal, the log data obtained in the normal time period is normal log data.
Step 204, respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set.
The normal log set is a set of log transactions in which each piece of log data in the normal log data is taken as one log transaction. For example, if the normal log data includes three logs, namely log A, log B, and log C, the normal log set obtained by conversion is (A, B, C). The abnormal log set is likewise a set of log transactions in which each piece of log data in the abnormal log data is taken as one log transaction. For example, if the abnormal log data includes three logs, namely log D, log E, and log F, the abnormal log set obtained by conversion is (D, E, F).
Specifically, the normal log data is converted to obtain a normal log set, and the abnormal log data is converted to obtain an abnormal log set. The normal log data and the abnormal log data can also be converted in parallel to obtain the normal log set and the abnormal log set.
In one embodiment, the normal log data and the abnormal log data comprise continuous log data and discrete log data, and step 204 comprises the steps of:
acquiring the continuous log data in the normal log data and the abnormal log data, discretizing the continuous log data to obtain discretized normal log data and discretized abnormal log data, and converting the discretized normal log data and the discretized abnormal log data into log sets to obtain a normal log set and an abnormal log set.
The continuous log data refers to data obtained by counting whose values are expressed as integers or floating-point numbers, that is, data whose values are numerically continuous, such as access volume, total number of accesses, and access success rate. The discrete log data refers to data whose values are not numerically continuous, such as geographic location and network operator.
Specifically, the server obtains the continuous log data in the normal log data and in the abnormal log data and discretizes it to obtain discretized normal log data and discretized abnormal log data. Different discretization methods may be used. For example, the continuous log data may be discretized with equal width: the range between the minimum value and the maximum value is divided into a preset number of intervals of the same width, and each value is then assigned to the interval it falls in to obtain the discretized log data. For example, if the access success rate lies in [0, 100%], equal-width discretization yields four intervals, [low (0-25%), general (25-50%), relatively high (50-75%), high (75-100%)], and an access success rate of 87% in the normal log data is discretized to "high". Alternatively, the continuous log data may be discretized with equal frequency, by clustering, based on information entropy, based on the chi-square statistic, and so on. The discretized normal log data and the discretized abnormal log data are then converted into log sets to obtain a normal log set and an abnormal log set.
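Equal-width discretization as described above can be sketched in a few lines. The function name `equal_width_label` and the label names in `LABELS` are illustrative placeholders, not terms fixed by the application; the sketch also assumes that a value lying exactly on a boundary is assigned to the upper interval.

```python
def equal_width_label(value, low, high, labels):
    """Map a continuous value in [low, high] to one of len(labels) equal-width bins."""
    if not low <= value <= high:
        raise ValueError("value lies outside the discretization range")
    width = (high - low) / len(labels)
    # Clamp so that value == high falls in the last bin instead of one past it.
    index = min(int((value - low) / width), len(labels) - 1)
    return labels[index]


# Hypothetical labels for the four intervals 0-25%, 25-50%, 50-75%, 75-100%.
LABELS = ["low", "general", "relatively high", "high"]

print(equal_width_label(0.87, 0.0, 1.0, LABELS))
```

After this step every dimension of a log holds a discrete value, so each log can be treated as a transaction of discrete items.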
Step 206, calculating a corresponding first frequent dimension data set according to the dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, wherein the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree.
The dimension data refers to the data corresponding to a dimension in a log of the log set; for example, the dimension data may be the data "Guangzhou" corresponding to the user location dimension in a log. A frequent dimension data set is a set consisting of frequent dimension data combinations and their support degrees. A dimension data combination may be a single piece of dimension data or a combination of several pieces of dimension data, and each dimension data combination has a corresponding support degree. A frequent dimension data combination is a dimension data combination that appears frequently in the log data, that is, one whose support degree is larger than a preset support degree threshold. The support degree is the frequency with which the dimension data combination appears in the log set, that is, the ratio of the number of logs in the log set that contain all the dimension data of the combination to the total number of logs in the log set. For example, if the log set contains 4 logs and the dimension data combination (Shanghai, MC) appears in one log but not in the other three, its support degree is 25%.
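The support degree defined above can be computed directly. In this minimal sketch each log is represented as a `frozenset` of its dimension data; the function name `support` and the sample values other than (Shanghai, MC) are hypothetical. A fuller representation would probably keep (dimension, value) pairs so that equal values in different dimensions stay distinct.

```python
def support(combination, log_set):
    """Fraction of logs in log_set that contain every item of the combination."""
    combination = frozenset(combination)
    if not log_set:
        return 0.0
    hits = sum(1 for log in log_set if combination <= log)
    return hits / len(log_set)


# Four logs; only the first one contains both "Shanghai" and "MC".
log_set = [
    frozenset({"Shanghai", "MC", "API"}),
    frozenset({"Beijing", "MC"}),
    frozenset({"Shanghai", "CT"}),
    frozenset({"Guangzhou", "CU"}),
]

print(support({"Shanghai", "MC"}, log_set))  # 0.25, i.e. 25%
```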
The first frequent dimension data set is a set consisting of various first frequent dimension data combinations obtained through calculation according to the dimension data in the normal log set and corresponding first support degrees, the first frequent dimension data combinations are frequent dimension data combinations in the dimension data in the normal log set, and the first support degrees are support degrees corresponding to the first frequent dimension data combinations.
The second frequent dimension data set is a set consisting of each second frequent dimension data combination obtained through calculation according to the dimension data in the abnormal log set and a corresponding second support degree, the second frequent dimension data combination is a frequent dimension data combination in the dimension data in the abnormal log set, and the second support degree is a support degree corresponding to the second frequent dimension data combination.
Specifically, the server calculates to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and calculates to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree.
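To make the notion of a frequent dimension data set concrete, the sketch below enumerates dimension data combinations by brute force and keeps those whose support degree exceeds a threshold. Brute-force enumeration is a stand-in chosen for brevity and is not claimed to be the procedure used in the application; the names `frequent_combinations`, `min_support`, and `max_size`, and the sample logs, are hypothetical.

```python
from itertools import combinations


def frequent_combinations(log_set, min_support, max_size=2):
    """Return {combination: support} for combinations with support > min_support."""
    total = len(log_set)
    counts = {}
    for log in log_set:
        items = sorted(log)
        # Count every combination of up to max_size items present in this log.
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {combo: n / total for combo, n in counts.items() if n / total > min_support}


normal_log_set = [
    frozenset({"Shanghai", "API"}),
    frozenset({"Shanghai", "API"}),
    frozenset({"Shanghai", "MC"}),
    frozenset({"Beijing", "MC"}),
]

for combo, value in frequent_combinations(normal_log_set, min_support=0.4).items():
    print(sorted(combo), value)
```

Running the same function on the abnormal log set would give the second frequent dimension data set. The number of candidate combinations grows quickly with the number of dimensions, so a real implementation would likely rely on a dedicated frequent-itemset mining algorithm.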
In a specific embodiment, the resulting first frequent dimension data set is shown in Table 2:
TABLE 2 First frequent dimension data set
Serial number | First frequent dimension data combination | First support degree
1 | (Shanghai) | 30%
2 | (Shanghai, API) | 60%
N | (Success rate, MC) | 65%
The resulting second frequent dimension data set is shown in Table 3:
TABLE 3 Second frequent dimension data set
Serial number | Second frequent dimension data combination | Second support degree
1 | (Shanghai) | 99%
2 | (Shanghai, API) | 85%
N | (Success rate, MC) | 50%
Step 208, deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set.
The target frequent dimension data set refers to the set of all non-repeating frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set.
Specifically, all frequent dimension data combinations are obtained from the first frequent dimension data set and the second frequent dimension data set, and identical frequent dimension data combinations are deduplicated to obtain the target frequent dimension data set. The target frequent dimension data set comprises all the deduplicated frequent dimension data combinations together with the first support degree and the second support degree corresponding to each of them. For example, the first frequent dimension data combination (Shanghai) and the second frequent dimension data combination (Shanghai) are deduplicated to obtain the target frequent dimension data combination (Shanghai) in the target frequent dimension data set, with a corresponding first support degree of 30% and second support degree of 99%.
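The deduplication step amounts to taking the union of the combinations in the two sets while carrying both support degrees along. The sketch below is illustrative only: `merge_frequent_sets` is a hypothetical name, the (Beijing) entry is invented for the example, and `None` marks a combination that is not frequent in one of the sets, since this passage does not state what support degree should be assumed in that case.

```python
def merge_frequent_sets(first, second):
    """Union of combinations mapped to (first_support, second_support)."""
    merged = {}
    for combo in list(first) + list(second):
        if combo not in merged:
            # dict.get returns None when the combination is absent from a set.
            merged[combo] = (first.get(combo), second.get(combo))
    return merged


first_set = {
    frozenset({"Shanghai"}): 0.30,
    frozenset({"Shanghai", "API"}): 0.60,
}
second_set = {
    frozenset({"Shanghai"}): 0.99,
    frozenset({"Beijing"}): 0.50,  # hypothetical entry, frequent only in the abnormal logs
}

target_set = merge_frequent_sets(first_set, second_set)
for combo, supports in target_set.items():
    print(sorted(combo), supports)
```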
Step 210, calculating the difference support degree corresponding to each target frequent dimension data combination in the target frequent dimension data set according to the first support degree corresponding to each first frequent dimension data combination and the second support degree corresponding to each second frequent dimension data combination.
The target frequent dimension data combination refers to a frequent dimension data combination in the target frequent dimension data set. The difference support degree is a difference value between a first support degree and a second support degree corresponding to the target frequent dimension data combination.
Specifically, the server obtains a first support degree and a second support degree corresponding to the target frequent dimensional data combination in the target frequent dimensional data set according to the first support degree corresponding to each first frequent dimensional data combination and the second support degree corresponding to each second frequent dimensional data combination, and then calculates a difference between the first support degree and the second support degree to obtain a difference support degree corresponding to the target frequent dimensional data combination.
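A short worked example of the difference support degree, using the support degrees from the example tables expressed in percent. The description does not fix the sign of the difference; the sketch below assumes the second (abnormal) support degree minus the first (normal) support degree, so that combinations over-represented in the abnormal logs receive large positive values. The function name `difference_support` is hypothetical.

```python
def difference_support(first_support: int, second_support: int) -> int:
    # Assumption: abnormal-side support minus normal-side support.
    return second_support - first_support


# (first support degree, second support degree) in percent, from the example tables
supports = {
    ("Shanghai",): (30, 99),
    ("Shanghai", "API"): (60, 85),
    ("Success rate", "MC"): (65, 50),
}

diffs = {combo: difference_support(s1, s2) for combo, (s1, s2) in supports.items()}
print(diffs)
# {('Shanghai',): 69, ('Shanghai', 'API'): 25, ('Success rate', 'MC'): -15}
```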
Step 212, sorting the target frequent dimension data combinations in the target frequent dimension data set according to the difference support degree corresponding to each target frequent dimension data combination and the dimension data quantity of each target frequent dimension data combination, and determining abnormal target frequent dimension data combinations according to the sorting result.
The dimension data quantity refers to the number of dimension data in a target frequent dimension data combination. For example, if the target frequent dimension data combination is (Shanghai), the dimension data quantity is 1; if the target frequent dimension data combination is (Shanghai, API), the dimension data quantity is 2. An abnormal target frequent dimension data combination refers to a target frequent dimension data combination corresponding to the root cause of the abnormality in the monitored KPI.
Specifically, the target frequent dimension data combinations in the target frequent dimension data set are sorted according to their difference support degrees and their dimension data quantities, and the top-ranked target frequent dimension data combination in the sorting result can then be selected as the abnormal target frequent dimension data combination.
In one embodiment, step 212 includes the steps of:
determining each target frequent dimension data combination in the target frequent dimension data set, acquiring the dimension data quantity corresponding to each target frequent dimension data combination, and sorting the target frequent dimension data combinations by using a preset sorting rule according to the difference support degree and the dimension data quantity to obtain the sorted target frequent dimension data combinations; and selecting a preset number of target frequent dimension data combinations from the sorted target frequent dimension data combinations to obtain the abnormal target frequent dimension data combinations.
The preset ordering rule is a preset rule capable of ordering each target frequent dimension data combination, for example, the target frequent dimension data combinations are ordered according to the difference support degree from large to small and the dimension data quantity from small to large. For another example, the target frequent dimension data combinations are sorted according to the difference support degree from large to small and the dimension data quantity from large to small. The preset number refers to the number of preset abnormal target frequent dimension data combinations needing to be selected.
Specifically, the server determines each target frequent dimension data combination in the target frequent dimension data set, obtains the dimension data quantity corresponding to each target frequent dimension data combination, sorts the target frequent dimension data combinations according to the difference support degree and the dimension data quantity by using the preset sorting rule, selects a preset number of target frequent dimension data combinations in order from the sorted target frequent dimension data combinations, and takes the selected target frequent dimension data combinations as the abnormal target frequent dimension data combinations. For example, the three top-ranked target frequent dimension data combinations may be selected as the abnormal target frequent dimension data combinations.
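The preset sorting rule and the selection of a preset number of combinations can be sketched as a two-key sort. The function name `rank_combinations`, the flag `fewer_dims_first`, and the difference values in the sample rows are illustrative assumptions; the two orderings of the dimension data quantity correspond to the two example rules mentioned above.

```python
from typing import List, Tuple

Row = Tuple[Tuple[str, ...], float]  # (combination, difference support degree)


def rank_combinations(rows: List[Row], top_n: int = 3, fewer_dims_first: bool = True) -> List[Row]:
    """Sort by difference support degree (descending), break ties by the
    dimension data quantity, and keep the first top_n combinations."""
    sign = 1 if fewer_dims_first else -1
    ordered = sorted(rows, key=lambda r: (-r[1], sign * len(r[0])))
    return ordered[:top_n]


rows = [
    (("Success rate", "MC"), -15),
    (("Shanghai", "API"), 25),
    (("Shanghai",), 69),
]

top = rank_combinations(rows, top_n=2)
print(top)  # [(('Shanghai',), 69), (('Shanghai', 'API'), 25)]
```

Because the dimension data quantity is only the secondary key, it affects the result only when two combinations have the same difference support degree.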
In one embodiment, the obtained abnormal target frequent dimension data combinations can be returned to the monitoring terminal for display, so that monitoring personnel can conveniently review them and carry out subsequent processing.
In a specific embodiment, the information table of the sorted target frequent dimension data combinations is shown in table 4 below:
TABLE 4 information table of target frequent dimension data combinations
[Table 4 is provided as an image (BDA0002477481260000121) in the original publication.]
Here, the target frequent dimension data combinations (Shanghai) and (Shanghai, API) can be selected as the abnormal target frequent dimension data combinations.
In the above abnormal data positioning method, a corresponding first frequent dimension data set is calculated from the dimension data in the normal log set, and a corresponding second frequent dimension data set is calculated from the dimension data in the abnormal log set. A target frequent dimension data set is then obtained from the first frequent dimension data set and the second frequent dimension data set, and the difference support degree corresponding to each target frequent dimension data combination in the target frequent dimension data set is calculated. The target frequent dimension data combinations are sorted according to their difference support degrees and their dimension data quantities, and the abnormal target frequent dimension data combinations are determined according to the sorting result. In this way, the frequent dimension data combinations of the normal logs and those of the abnormal logs are compared to obtain the difference support degree, and the root cause of the abnormality is determined according to the difference support degree and the dimension data quantity, so that the obtained abnormal target frequent dimension data combinations are more accurate and the accuracy of abnormal data positioning is improved.
In one embodiment, the server further screens the obtained abnormal target frequent dimension data combinations according to preset expert experience to obtain the screened abnormal target frequent dimension data combinations, which further improves their accuracy.
In an embodiment, as shown in fig. 4, in step 206, calculating to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, where the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree, including the steps of:
Step 402, determining each first dimension data in the normal log set, combining the first dimension data to obtain each to-be-selected first dimension data combination, calculating a first support degree corresponding to each to-be-selected first dimension data combination, screening the to-be-selected first dimension data combinations according to the first support degree to obtain each first frequent dimension data combination, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree.
The first dimension data refers to dimension data of logs in a normal log set, and the first dimension data combination to be selected refers to all possible combinations obtained by combining the first dimension data. Each of the first dimension data combinations to be selected may be all subsets corresponding to the normal log set. The first dimension data combination to be selected is the generated candidate item set. The first support degree is the support degree corresponding to the first dimension data combination to be selected.
Specifically, the server determines the first dimension data corresponding to each log in the normal log set to obtain each first dimension data, and then forms all combinations of the first dimension data to obtain each to-be-selected first dimension data combination. The server calculates the first support degree corresponding to each to-be-selected first dimension data combination, that is, the frequency with which the to-be-selected first dimension data combination appears in the logs of the normal log set. The to-be-selected first dimension data combinations are then screened according to the first support degree; for example, the to-be-selected first dimension data combinations whose first support degree exceeds a preset support degree threshold are selected as the first frequent dimension data combinations, and a first frequent dimension data set is obtained according to each first frequent dimension data combination and the corresponding first support degree.
Step 404, determining each second dimension data in the abnormal log set, combining each second dimension data to obtain each to-be-selected second dimension data combination, calculating a second support degree corresponding to each to-be-selected second dimension data combination, screening from each to-be-selected second dimension data combination according to the second support degree to obtain each second frequent dimension data combination, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
The second dimension data refers to dimension data of logs in the abnormal log set, and the to-be-selected second dimension data combination refers to all possible combinations obtained by combining the second dimension data. Each to-be-selected second dimension data combination may refer to all subsets corresponding to the abnormal log set. The second support degree is the support degree corresponding to the to-be-selected second dimension data combination, that is, the frequency with which the to-be-selected second dimension data combination appears in the logs of the abnormal log set.
Specifically, the server determines the dimension data included in the logs in the abnormal log set to obtain each second dimension data, and combines the second dimension data to obtain each to-be-selected second dimension data combination. The server then calculates a second support degree corresponding to each to-be-selected second dimension data combination, selects the to-be-selected second dimension data combinations whose second support degree exceeds a preset support degree threshold to obtain each second frequent dimension data combination, and obtains a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree. The support degree threshold used to obtain the second frequent dimension data combinations may be the same as or different from the support degree threshold used to obtain the first frequent dimension data combinations, and may be set freely.
In the embodiment, the accuracy of obtaining the frequent dimension data set can be improved by obtaining the to-be-selected dimension data combinations, then calculating the support degree corresponding to each to-be-selected dimension data combination, then screening out the frequent dimension data combinations through the support degrees, and finally obtaining the frequent dimension data set.
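The exhaustive procedure of steps 402 and 404 can be sketched as below, assuming each log has already been reduced to a set of dimension values. The function name `frequent_combinations` and the sample logs are hypothetical. Because every combination of the observed dimension values is enumerated, the cost grows exponentially with the number of distinct values, so this form is only practical for small inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_combinations(logs: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Enumerate every non-empty combination of the observed dimension values
    and keep those whose support degree exceeds min_support."""
    if not logs:
        return {}
    items = sorted(set().union(*logs))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Support degree: fraction of logs that contain the whole combination
            support = sum(candidate <= log for log in logs) / len(logs)
            if support > min_support:
                result[candidate] = support
    return result


# Hypothetical log set: one set of dimension values per log
logs = [
    {"Shanghai", "API"},
    {"Shanghai", "API"},
    {"Shanghai", "MC"},
    {"Beijing", "MC"},
]
result = frequent_combinations(logs, min_support=0.4)
print(result[frozenset({"Shanghai"})])         # 0.75
print(result[frozenset({"Shanghai", "API"})])  # 0.5
```

The same function can be applied to the normal log set and to the abnormal log set, with the same or different thresholds.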
In an embodiment, as shown in fig. 5, in step 206, calculating to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, where the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree, including:
Step 502, calculating according to the dimension data in the normal log set by using an association rule algorithm to obtain each first frequent dimension data combination and a corresponding first support degree, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree.
Step 504, calculating according to the dimension data in the abnormal log set by using an association rule algorithm to obtain each second frequent dimension data combination and a corresponding second support degree, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
The association rule algorithm is used for analyzing the dimension data in the log set to obtain a frequent dimension data combination, and calculating the support degree corresponding to the frequent dimension data combination. Association rule algorithms include, but are not limited to, Apriori algorithm (a frequent item set algorithm that mines association rules) and FP-Growth algorithm (a frequent item set mining algorithm based on frequent pattern trees, FP-trees).
Specifically, the server may use an association rule algorithm to calculate each first frequent dimension data combination from the dimension data in the normal log set, then calculate the frequency with which each first frequent dimension data combination appears in the logs of the normal log set to obtain the corresponding first support degree, and then obtain a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree. In parallel, the server may use an association rule algorithm to calculate each second frequent dimension data combination from the dimension data in the abnormal log set, then calculate the frequency with which each second frequent dimension data combination appears in the logs of the abnormal log set to obtain the corresponding second support degree, and then obtain a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree. The association rule algorithms used for the two log sets may be the same or different. For example, the Apriori algorithm is used to calculate each first frequent dimension data combination and first support degree corresponding to the normal log set, and the FP-Growth algorithm is used to calculate each second frequent dimension data combination and second support degree corresponding to the abnormal log set.
In the above embodiment, the association rule algorithm may be used to calculate the frequent dimension data combination and the corresponding support degree, so that the calculation efficiency can be improved, and then the frequent dimension data set is obtained, thereby improving the efficiency of obtaining the frequent dimension data set.
In an embodiment, as shown in fig. 6, step 502, calculating, by using an association rule algorithm, each first frequent dimension data combination and a corresponding first support degree according to the dimension data in the normal log set, includes:
Step 602, determining each candidate one-dimensional data corresponding to the normal log set, calculating a support degree corresponding to each candidate one-dimensional data, screening the candidate one-dimensional data according to the corresponding support degrees to obtain frequent one-dimensional data, and taking the frequent one-dimensional data as the current frequent dimension data combinations.
The candidate one-dimensional data refers to a combination of dimensional data with only one piece of dimensional data. The frequent one-dimensional data refers to frequent dimension data in the candidate one-dimensional data.
Specifically, the server takes all dimension data in the logs of the normal log set as each candidate one-dimension data, and the same dimension data is taken as the same candidate one-dimension data. And then calculating the frequency of each candidate one-dimensional data appearing in the logs of the normal log set to obtain the support degree corresponding to each candidate one-dimensional data, screening each candidate one-dimensional data to obtain frequent one-dimensional data, and taking the frequent one-dimensional data as the current frequent dimension data combination.
Step 604, connecting the current frequent dimension data combinations to obtain each candidate two-dimensional data combination, and, when a subset of a target candidate two-dimensional data combination among the candidate two-dimensional data combinations is not in the current frequent dimension data combinations, deleting the target candidate two-dimensional data combination from the candidate two-dimensional data combinations to obtain each primarily selected frequent two-dimensional data combination.
The candidate two-dimensional data combination refers to a dimension data combination composed of two dimension data. The target candidate two-dimensional data combination refers to a candidate two-dimensional data combination that has a subset not in the current frequent dimension data combinations. A subset of the target candidate two-dimensional data combination refers to one-dimensional data in the candidate two-dimensional data combination; such a subset that is not in the current frequent dimension data combinations is a non-frequent dimension data combination.
Specifically, each current frequent dimensional data combination is subjected to connection operation to obtain each candidate two-dimensional data combination, and a target candidate two-dimensional data combination is deleted from each candidate two-dimensional data combination to obtain each primary-selection frequent two-dimensional data combination. And when the target candidate two-dimensional data combination does not exist in each candidate two-dimensional data combination, directly taking each candidate two-dimensional data combination as each primary selection frequent two-dimensional data combination.
Step 606, calculating the support degree corresponding to each primarily selected frequent two-dimensional data combination, and screening the primarily selected frequent two-dimensional data combinations according to the corresponding support degrees to obtain the frequent two-dimensional data combinations.
Step 608, taking the frequent two-dimensional data combinations as the current frequent dimension data combinations, returning to the step of connecting the current frequent dimension data combinations, and repeating until the current frequent dimension data combination is empty; then taking all the current frequent dimension data combinations obtained so far as the first frequent dimension data combinations, and obtaining the first support degree corresponding to each first frequent dimension data combination.
Specifically, the server calculates the support degree corresponding to each primarily selected frequent two-dimensional data combination, screens them to obtain the frequent two-dimensional data combinations, takes the frequent two-dimensional data combinations as the current frequent dimension data combinations, and returns to step 604. The iteration continues until the current frequent dimension data combination is empty, that is, until connecting the current frequent dimension data combinations no longer produces any new frequent dimension data combination. For example, when every dimension data combination obtained by connection has a subset that is a non-frequent dimension data combination, all of them are deleted and the current frequent dimension data combination becomes empty. At this time, the server takes all the current frequent dimension data combinations that were not deleted in the preceding iterations as the first frequent dimension data combinations, and determines the first support degree corresponding to each first frequent dimension data combination.
In one embodiment, the same method may be used to calculate respective second frequent-dimension data combinations and corresponding second support degrees corresponding to the anomaly log set. I.e. using Apriori algorithm to calculate each second frequent dimension data combination and the corresponding second support degree.
In the above embodiment, the frequent one-dimensional data is first obtained by screening, the frequent one-dimensional data is then spliced to obtain the candidate two-dimensional data combinations, and the candidate two-dimensional data combinations are screened in turn. These steps are repeated until the current frequent dimension data combination is empty, all the frequent dimension data combinations obtained by screening are taken as the first frequent dimension data combinations, and the first support degree corresponding to each first frequent dimension data combination is determined. Because only frequent dimension data combinations are spliced before screening, the amount of calculation is reduced, and the efficiency of obtaining each first frequent dimension data combination and the corresponding first support degree is improved.
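Steps 602 to 608 follow the usual level-wise Apriori scheme, which can be sketched as below. This is a simplified illustration and not the patented implementation: the function name `apriori`, the sample logs, the strict `>` comparison against the threshold, and the assumption of a non-empty log set are all choices made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(logs: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise frequent combination mining; assumes logs is non-empty."""

    def support(candidate: FrozenSet[str]) -> float:
        return sum(candidate <= log for log in logs) / len(logs)

    # Step 602: frequent one-dimensional data
    current = {}
    for item in {value for log in logs for value in log}:
        s = support(frozenset([item]))
        if s > min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Step 604: connect the current frequent combinations, then delete
        # any candidate that has a (k-1)-sized subset which is not frequent
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Step 606: compute support degrees and screen
        current = {}
        for c in candidates:
            s = support(c)
            if s > min_support:
                current[c] = s
        # Step 608: keep the survivors and iterate until nothing is left
        frequent.update(current)
        k += 1
    return frequent


logs = [
    {"Shanghai", "API"},
    {"Shanghai", "API"},
    {"Shanghai", "MC"},
    {"Beijing", "MC"},
]
mined = apriori(logs, min_support=0.4)
print(mined[frozenset({"Shanghai", "API"})])  # 0.5
```

The pruning in step 604 relies on the fact that a combination cannot be frequent if any of its subsets is infrequent, which is what keeps the candidate set small compared with exhaustive enumeration.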
In an embodiment, as shown in fig. 7, step 502, calculating, by using an association rule algorithm, each first frequent dimension data combination and a corresponding first support degree according to the dimension data in the normal log set, includes:
Step 702, determining each candidate one-dimensional data corresponding to the normal log set, calculating the support degree corresponding to each candidate one-dimensional data, screening the candidate one-dimensional data according to the corresponding support degrees to obtain frequent one-dimensional data, and sorting the frequent one-dimensional data according to the corresponding support degrees to obtain the sorted frequent one-dimensional data combination.
Step 704, sorting the dimension data in the normal log set according to the sorted frequent one-dimensional data combination to obtain a sorted normal log set.
Specifically, the server sorts the obtained frequent one-dimensional data by support degree from large to small to obtain the sorted frequent one-dimensional data combination. The dimension data in each log of the normal log set is then sorted according to the sorted frequent one-dimensional data combination, that is, the dimension data within each log is arranged in order of support degree; once every log has been sorted, the sorted normal log set is obtained.
Step 706, generating a corresponding frequent pattern tree from the sorted normal log set, and using the frequent pattern tree to calculate each first frequent dimension data combination and the corresponding first support degree.
Specifically, the frequent pattern tree refers to an FP-tree generated according to the sorted normal log set. The FP-tree is a special prefix tree composed of a frequent item head table and an item prefix tree. The frequent item head table stores each frequent one-dimensional data and its corresponding support degree.
Each first frequent dimension data combination and the corresponding first support degree are then mined from the frequent pattern tree. The frequent item head table is traversed, and for each entry a conditional pattern base is obtained from the frequent pattern tree; a conditional FP-tree is built from each conditional pattern base and mined recursively. When the mining reaches the root node, each first frequent dimension data combination and the corresponding first support degree are obtained.
In one embodiment, the same method may also be used to calculate the respective second frequent-dimension data combinations and the corresponding second support degrees corresponding to the anomaly log sets. Namely, the FP-Growth algorithm is used for calculating each second frequent dimension data combination and the corresponding second support degree.
In the above embodiment, by constructing the frequent pattern tree and calculating to obtain each first frequent dimension data combination and the corresponding first support degree using the frequent pattern tree, the efficiency of obtaining each first frequent dimension data combination and the corresponding first support degree can be further improved.
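A compact sketch of the FP-Growth procedure of steps 702 to 706 is given below. It is an illustration under stated assumptions rather than the implementation described here: the names `Node`, `build_tree` and `fp_growth` are hypothetical, the threshold is expressed as an absolute count (`min_count`) instead of a support degree, and the header table is a plain dictionary from each item to its tree nodes. The support degree of a combination is its returned count divided by the number of logs.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.
    Returns the header table (item -> nodes) and the frequent item counts."""
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    header = defaultdict(list)
    root = Node(None, None)
    for items, n in transactions:
        # Steps 702-704: keep frequent items, ordered by descending count
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_count, suffix=frozenset(), out=None):
    """Step 706: recursively mine combinations via conditional pattern bases."""
    out = {} if out is None else out
    header, frequent = build_tree(transactions, min_count)
    for item, nodes in header.items():
        combo = suffix | {item}
        out[combo] = frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_count, combo, out)
    return out


logs = [["Shanghai", "API"], ["Shanghai", "API"], ["Shanghai", "MC"], ["Beijing", "MC"]]
counts = fp_growth([(log, 1) for log in logs], min_count=2)
supports = {combo: c / len(logs) for combo, c in counts.items()}
print(supports[frozenset({"Shanghai", "API"})])  # 0.5
```

Unlike the level-wise approach, the log set is scanned only while building trees, and no candidate combinations are generated and tested against the full log set.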
In an embodiment, after step 206, that is, after obtaining a corresponding first frequent dimension data set by calculation according to the dimension data in the normal log set, where the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and obtaining a corresponding second frequent dimension data set by calculation according to the dimension data in the abnormal log set, where the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree, the method further includes:
deleting a first parent frequent dimension data combination in each first frequent dimension data combination, wherein the first parent frequent dimension data combination refers to a first frequent dimension data combination with a child frequent dimension data combination in each first frequent dimension data combination; and deleting a second parent frequent dimension data combination in each second frequent dimension data combination, wherein the second parent frequent dimension data combination refers to a second frequent dimension data combination with a child frequent dimension data combination in each second frequent dimension data combination.
The child frequent dimension data combination refers to a frequent dimension data combination with a parent frequent dimension data combination in each frequent dimension data combination. Namely, the child frequent dimension data combination is included in the parent frequent dimension data combination, and the number of dimension data in the parent frequent dimension data combination is larger.
Specifically, the server searches the obtained first frequent dimension data combinations for first parent frequent dimension data combinations, that is, determines whether a first frequent dimension data combination has a corresponding child frequent dimension data combination, and if so, deletes that first parent frequent dimension data combination from the first frequent dimension data combinations. Likewise, the server searches the obtained second frequent dimension data combinations for second parent frequent dimension data combinations, that is, determines whether a second frequent dimension data combination has a corresponding child frequent dimension data combination, and if so, deletes that second parent frequent dimension data combination from the second frequent dimension data combinations.
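The deletion of parent frequent dimension data combinations can be sketched with a strict-subset test, taking a parent to be any combination that strictly contains another combination of the same set, as defined above. The function name `drop_parent_combinations` and the sample values are illustrative assumptions.

```python
from typing import Dict, FrozenSet


def drop_parent_combinations(freq: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Remove every combination that strictly contains another combination
    present in the same frequent dimension data set."""
    return {
        combo: support
        for combo, support in freq.items()
        if not any(other < combo for other in freq)  # `<` is the strict-subset test
    }


second_set = {
    frozenset({"Shanghai"}): 0.99,
    frozenset({"Shanghai", "API"}): 0.85,
    frozenset({"Success rate", "MC"}): 0.50,
}
kept = drop_parent_combinations(second_set)
print(sorted(sorted(c) for c in kept))  # [['MC', 'Success rate'], ['Shanghai']]
```

This pairwise check is quadratic in the number of combinations, which is usually acceptable because the frequent dimension data sets are small after thresholding.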
In an embodiment, in step 212, after the target frequent dimensional data combinations in the target frequent dimensional data set are sorted according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations, the server may find the parent target frequent dimensional data combination from the sorting result, and delete the parent target frequent dimensional data combination in the target frequent dimensional data combination to obtain the deleted target frequent dimensional data combination. The parent target frequent dimension data combination refers to a target frequent dimension data combination with sub-target frequent dimension data combinations in each target frequent dimension data combination. The parent target frequent dimension data combination is searched after the sorting, so that the efficiency of searching the parent target frequent dimension data combination can be improved, and the efficiency of deleting the parent target frequent dimension data combination is improved.
In an embodiment, a first target frequent dimension data combination to be merged and a second target frequent dimension data combination to be merged may also be determined from the deleted target frequent dimension data combinations, and the first target frequent dimension data combination to be merged and the second target frequent dimension data combination to be merged are merged to obtain the merged target frequent dimension data combination, where the first target frequent dimension data combination to be merged and the second target frequent dimension data combination to be merged have more than a preset number of same dimension data, and a support difference between the first target frequent dimension data combination to be merged and the second target frequent dimension data combination to be merged is within a preset difference range. For example, the difference in the support degree is within 10%.
In the above embodiment, by deleting the parent frequent dimension data combination in each frequent dimension data combination, each obtained frequent dimension data combination can be more accurate.
In one embodiment, as shown in fig. 8, the method for locating abnormal data further includes:
Step 802, obtaining target abnormal log data, where the number of dimensions in the target abnormal log data is less than a preset number of dimensions, and the number of types of dimension data corresponding to each dimension is less than a preset number of types.
The target abnormal log data refers to abnormal log data in which the number of dimensions is less than the preset number of dimensions and the number of types of dimension data corresponding to each dimension is less than the preset number of types. The preset number of dimensions is a preset upper bound on the number of dimensions in the target abnormal log data. For example, the target abnormal log data has fewer than five dimensions. The preset number of types is a preset upper bound on the number of dimension data types per dimension in the target abnormal log data. For example, each dimension has fewer than five dimension data types.
Specifically, when the number of dimensions in the obtained abnormal log data is less than the preset number of dimensions and the number of types of dimension data corresponding to each dimension is less than the preset number of types, the obtained abnormal log data is the target abnormal log data.
In a specific embodiment, the monitored abnormal log data only includes two dimensions, namely a region dimension and an operator dimension. The dimension data types corresponding to the region dimensions are only three types, including Guangzhou, Shanghai and Beijing. The dimension data types corresponding to the operator dimensions are only three types, including Unicom, Mobile and Telecommunications. At this time, the abnormality log data is the target abnormality log data.
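The check described for step 802 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `is_target_abnormal_log`, the row layout (one dict per log line, mapping a dimension name to its dimension data), and the default thresholds of five are assumptions drawn from the examples in the text.

```python
from typing import Dict, List, Set


def is_target_abnormal_log(rows: List[Dict[str, str]],
                           max_dims: int = 5,
                           max_types: int = 5) -> bool:
    """Return True when the log has fewer than max_dims dimensions and
    every dimension takes fewer than max_types distinct values."""
    values: Dict[str, Set[str]] = {}
    for row in rows:
        for dim, val in row.items():
            values.setdefault(dim, set()).add(val)
    if len(values) >= max_dims:
        return False
    return all(len(v) < max_types for v in values.values())


# Hypothetical abnormal log with a region dimension and an operator dimension.
rows = [
    {"region": "Guangzhou", "operator": "Unicom"},
    {"region": "Shanghai", "operator": "Mobile"},
    {"region": "Beijing", "operator": "Telecom"},
]
is_target = is_target_abnormal_log(rows)
```

With two dimensions of three types each, the sample rows fall under both assumed thresholds, so the log would be treated as target abnormal log data.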
Step 804, converting the target abnormal log data into a target abnormal log set.
Step 806, calculating to obtain a corresponding third frequent dimension data set according to the dimension data in the target abnormal log set, where the third frequent dimension data set includes each third frequent dimension data combination and a corresponding third support degree.
Step 808, determining the number of target dimension data corresponding to each third frequent dimension data combination, sorting each third frequent dimension data combination according to the third support degree and the number of dimension data, and determining the target third frequent dimension data combination according to the sorting result.
The third frequent dimension data combination is a frequent dimension data combination obtained by performing frequent item set mining on the dimension data in the target abnormal log set. The third support degree is the support degree corresponding to the third frequent dimension data combination and represents how frequently the third frequent dimension data combination appears in the logs of the target abnormal log set. The target dimension data number refers to the number of dimension data in the third frequent dimension data combination. The target third frequent dimension data combination refers to the dimension data corresponding to the root cause of the abnormality in the target abnormal log data, namely, the combination of dimension data suspected of causing the abnormality.
Specifically, when target abnormal log data is obtained, it is used directly for mining frequent dimension data combinations. That is, the target abnormal log data is converted into a target abnormal log set, and a corresponding third frequent dimension data set is calculated from the dimension data in the target abnormal log set. The number of target dimension data corresponding to each third frequent dimension data combination is then determined, and the third frequent dimension data combinations are sorted by third support degree from high to low and by number of dimension data from large to small. A preset number of third frequent dimension data combinations is selected from the top of the sorting result. For example, if the preset number is 3, the first 3 third frequent dimension data combinations in the sorting result are selected. The selected third frequent dimension data combinations are taken as the target third frequent dimension data combinations, that is, the root cause of the abnormality is obtained.
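Steps 804 to 808 can be illustrated with a small sketch. It assumes a brute-force enumeration of combinations, which is only practical because the target abnormal log data has few dimensions; the helper names `to_log_set`, `mine_frequent`, and `top_combinations`, the minimum support of 0.5, and the sample rows are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]          # (dimension, dimension data)
Combo = FrozenSet[Item]


def to_log_set(rows: List[Dict[str, str]]) -> List[Combo]:
    """Step 804: turn each log line into a set of (dimension, data) items."""
    return [frozenset(row.items()) for row in rows]


def mine_frequent(log_set: List[Combo], min_support: float) -> Dict[Combo, float]:
    """Step 806: count every combination and keep those meeting min_support."""
    counts: Counter = Counter()
    for log in log_set:
        items = sorted(log)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(log_set)
    return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}


def top_combinations(frequent: Dict[Combo, float],
                     top_n: int = 3) -> List[Tuple[Combo, float]]:
    """Step 808: support high to low, then number of dimension data large to small."""
    ranked = sorted(frequent.items(), key=lambda kv: (-kv[1], -len(kv[0])))
    return ranked[:top_n]


rows = [
    {"region": "Guangzhou", "operator": "Unicom"},
    {"region": "Guangzhou", "operator": "Unicom"},
    {"region": "Guangzhou", "operator": "Mobile"},
    {"region": "Beijing", "operator": "Telecom"},
]
frequent = mine_frequent(to_log_set(rows), min_support=0.5)
ranked = top_combinations(frequent, top_n=3)
```

The sort key mirrors the ordering stated in the text (support first, then combination size); a different tie-breaking rule would only require changing the key.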
In the embodiment, when the target abnormal log data is obtained, the target abnormal log data is directly used for mining the frequent dimension data combination, so that the efficiency of obtaining the target third frequent dimension data combination can be improved.
In one embodiment, after the step 808, that is, after sorting the third frequent dimension data combinations according to the third support degree and the number of dimension data, the method further includes:
Obtaining each sorted third frequent dimension data combination, and determining a third parent frequent dimension data combination and a third child frequent dimension data combination from the sorted third frequent dimension data combinations. The third child frequent dimension data combination is a subset of the third parent frequent dimension data combination, and the abnormal difference between the support degree corresponding to the third child frequent dimension data combination and the support degree corresponding to the third parent frequent dimension data combination is smaller than a preset abnormal difference threshold. The third child frequent dimension data combination is deleted from the sorted abnormal frequent dimension data combinations to obtain each deleted third frequent dimension data combination (that is, the combinations remaining after deletion).
Specifically, the third child frequent dimension data combination being a subset of the third parent frequent dimension data combination means that the third parent frequent dimension data combination contains the third child frequent dimension data combination. For example, the third frequent dimension data combinations include (a1, b1 → anomaly) and (a1 → anomaly). If the abnormal difference is smaller than the preset abnormal difference threshold, then, since (a1 → anomaly) is contained in (a1, b1 → anomaly), (a1, b1 → anomaly) is the third parent frequent dimension data combination and (a1 → anomaly) is the third child frequent dimension data combination. After the server determines the third parent frequent dimension data combination and the third child frequent dimension data combination from the sorted third frequent dimension data combinations, the server deletes the third child frequent dimension data combination from the sorted abnormal frequent dimension data combinations to obtain each deleted third frequent dimension data combination, and then screens the deleted third frequent dimension data combinations to obtain the abnormal third frequent dimension data combination. This can further improve the accuracy of the obtained abnormal third frequent dimension data combination.
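The deletion of child combinations can be sketched as below. The function name `prune_children`, the threshold value 0.05, and the sample support degrees are assumptions for illustration; the text only states that the support difference must be smaller than a preset abnormal difference threshold.

```python
from typing import Dict, FrozenSet

Combo = FrozenSet[str]


def prune_children(frequent: Dict[Combo, float],
                   max_gap: float = 0.05) -> Dict[Combo, float]:
    """Drop a combination when it is a proper subset of another combination
    and their support degrees differ by less than max_gap."""
    kept = dict(frequent)
    for child, child_sup in frequent.items():
        for parent, parent_sup in frequent.items():
            if child < parent and abs(child_sup - parent_sup) < max_gap:
                kept.pop(child, None)
                break
    return kept


frequent = {
    frozenset({"a1", "b1"}): 0.60,   # parent
    frozenset({"a1"}): 0.62,         # child with nearly the same support
    frozenset({"b1"}): 0.90,         # subset, but support differs too much
}
remaining = prune_children(frequent)
```

Here (a1) is removed because (a1, b1) explains almost all of its occurrences, while (b1) stays because its support differs from that of (a1, b1) by more than the assumed threshold.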
In an embodiment, after deleting the third child frequent dimension data combination from the sorted abnormal frequent dimension data combinations to obtain each deleted third frequent dimension data combination, the method further includes:
Determining a first frequent dimension data combination to be merged and a second frequent dimension data combination to be merged from the deleted third frequent dimension data combinations, where a preset number of identical dimension data exists between the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged, and the difference between the support degree corresponding to the first frequent dimension data combination to be merged and the support degree corresponding to the second frequent dimension data combination to be merged does not exceed a preset merging threshold; and merging the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged to obtain a merged third frequent dimension data combination.
The preset number refers to the preset number of identical dimension data. For example, the preset number is 2 or more identical dimension data.
Specifically, the server determines all the first frequent dimension data combinations to be merged and all the second frequent dimension data combinations to be merged from the deleted third frequent dimension data combinations, and then merges all the first frequent dimension data combinations to be merged and all the second frequent dimension data combinations to be merged to obtain merged third frequent dimension data combinations.
For example, suppose the first frequent dimension data combination to be merged is (a1, b1, c1 → anomaly) and the second frequent dimension data combination to be merged is (a2, b1, c1 → anomaly). Two identical dimension data, namely b1 and c1, exist between them. If the difference between the support degrees does not exceed the preset merging threshold, the two combinations are merged, and the resulting merged third frequent dimension data combination is (a1&a2, b1, c1 → anomaly).
In an embodiment, when the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged have only one identical dimension data and the difference between their corresponding support degrees does not exceed the preset merging threshold, the two combinations may also be merged to obtain a merged third frequent dimension data combination. For example, suppose the first frequent dimension data combination to be merged is (a1, b1, c1 → anomaly) and the second frequent dimension data combination to be merged is (a2, b1, c2 → anomaly). Only b1 is identical between them. If the difference between the support degrees does not exceed the preset merging threshold, the two combinations are merged, and the resulting merged third frequent dimension data combination is (a1&a2, b1, c1&c2 → anomaly).
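The two merging examples above can be reproduced with a short sketch. It assumes each combination is stored as a mapping from dimension to a set of dimension data (so that a1&a2 becomes the set {a1, a2}) and that both combinations cover the same dimensions; the name `try_merge`, the parameter names, and the support values are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

Combo = Dict[str, FrozenSet[str]]   # dimension -> set of dimension data


def try_merge(first: Combo, first_sup: float,
              second: Combo, second_sup: float,
              min_shared: int = 2,
              merge_threshold: float = 0.1) -> Optional[Combo]:
    """Merge two combinations when they share at least min_shared identical
    dimension data and their supports differ by at most merge_threshold."""
    if first.keys() != second.keys():
        return None  # assumption: only combinations over the same dimensions merge
    shared = sum(1 for dim in first if first[dim] == second[dim])
    if shared < min_shared or abs(first_sup - second_sup) > merge_threshold:
        return None
    return {dim: first[dim] | second[dim] for dim in first}


c1 = {"a": frozenset({"a1"}), "b": frozenset({"b1"}), "c": frozenset({"c1"})}
c2 = {"a": frozenset({"a2"}), "b": frozenset({"b1"}), "c": frozenset({"c1"})}
c3 = {"a": frozenset({"a2"}), "b": frozenset({"b1"}), "c": frozenset({"c2"})}

merged_two_shared = try_merge(c1, 0.42, c2, 0.40)                # (a1&a2, b1, c1)
merged_one_shared = try_merge(c1, 0.42, c3, 0.40, min_shared=1)  # (a1&a2, b1, c1&c2)
```

The text does not say what support degree the merged combination carries, so the sketch returns only the merged dimension data.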
In a specific embodiment, the merged third frequent dimension data combinations can be screened to obtain an abnormal third frequent dimension data combination, which is sent to the management terminal for display. Fig. 9 is a schematic diagram of an abnormal third frequent dimension data combination, where the combination is (a1, b1&b2, c1&c2), and the dimensions corresponding to the dimension data are the client IP, the reporting machine, and the user number, respectively.
In the above embodiment, the frequent dimension data combinations with the same dimension data are combined to obtain a combined third frequent dimension data combination, and then the combined third frequent dimension data combination can be used for screening to obtain an abnormal third frequent dimension data combination, so that the obtained abnormal third frequent dimension data combination is more accurate.
In a specific embodiment, as shown in fig. 10, the method for locating abnormal data specifically includes the following steps:
Step 1002, acquiring monitoring log data, determining abnormal monitoring index data from the monitoring log data, acquiring a corresponding abnormal time period according to the abnormal monitoring index data, and acquiring log data corresponding to the abnormal time period from the monitoring log data to obtain abnormal log data.
Step 1004, determining a corresponding historical time period according to the abnormal time period, acquiring a preset transition time period, calculating to obtain a historical normal time period according to the historical time period and the preset transition time period, and acquiring log data corresponding to the historical normal time period from the monitoring log data to obtain normal log data.
Step 1006, obtaining continuous log data in the normal log data and the abnormal log data, discretizing the continuous log data to obtain discretized normal log data and discretized abnormal log data, and converting the discretized normal log data and the discretized abnormal log data into a log set to obtain a normal log set and an abnormal log set.
Step 1008, calculating according to the dimension data in the normal log set by using an association rule algorithm to obtain each first frequent dimension data combination and a corresponding first support degree, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree.
Step 1010, calculating according to the dimension data in the abnormal log set by using an association rule algorithm to obtain each second frequent dimension data combination and a corresponding second support degree, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
Step 1012, deduplicating the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set.
Step 1014, calculating to obtain a difference support degree corresponding to the target frequent dimension data combination in the target frequent dimension data set according to the first support degree corresponding to each first frequent dimension data combination and the second support degree corresponding to each second frequent dimension data combination.
Step 1016, sorting the target frequent dimensional data combinations in the target frequent dimensional data set according to the difference support degree corresponding to the target frequent dimensional data combinations in the target frequent dimensional data set and the dimensional data number of the target frequent dimensional data combinations.
Step 1020, finding the parent target frequent dimension data combination from the sorting result, and deleting the parent target frequent dimension data combination from the target frequent dimension data combinations to obtain a deleted target frequent dimension data set.
Step 1022, determining a first to-be-merged frequent dimension data combination and a second to-be-merged frequent dimension data combination from the deleted target frequent dimension data set, merging the first to-be-merged frequent dimension data combination and the second to-be-merged frequent dimension data combination to obtain a merged target frequent dimension data set, and screening out a target frequent dimension data combination corresponding to the maximum support degree and the minimum dimension data quantity from the merged target frequent dimension data set as an abnormal target frequent dimension data combination.
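Steps 1012 to 1016 can be sketched as follows. The sketch assumes that the difference support degree is the abnormal support minus the normal support, that a combination missing from one set has support 0 there, and that ties are broken by the smaller number of dimension data; the function name `rank_by_difference` and the sample values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Combo = FrozenSet[str]


def rank_by_difference(normal: Dict[Combo, float],
                       abnormal: Dict[Combo, float]) -> List[Tuple[Combo, float]]:
    """Deduplicate the combinations of both sets, score each by
    (abnormal support - normal support), and sort by score descending,
    then by number of dimension data ascending."""
    target = set(normal) | set(abnormal)          # step 1012
    scored = [(c, abnormal.get(c, 0.0) - normal.get(c, 0.0))  # step 1014
              for c in target]
    scored.sort(key=lambda cd: (-cd[1], len(cd[0]), sorted(cd[0])))  # step 1016
    return scored


normal = {frozenset({"a1"}): 0.25, frozenset({"b1"}): 0.5}
abnormal = {
    frozenset({"a1"}): 0.75,
    frozenset({"a1", "b1"}): 0.5,
    frozenset({"b1"}): 0.5,
}
ranked = rank_by_difference(normal, abnormal)
```

In this sample, (b1) is equally frequent in both periods and falls to the bottom, while (a1) rises to the top because it is far more frequent during the abnormal period and has the fewest dimension data among the tied candidates.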
The application also provides an application scenario to which the above abnormal data positioning method is applied. Specifically, the abnormal data positioning method is applied in the application scenario as follows:
Fig. 11 is a schematic diagram of the abnormal data positioning method. Specifically:
The server acquires the log through the service layer and divides it into normal and abnormal logs. That is, the server detects that the log of an accessed web page is abnormal and obtains basic information about the abnormality. Fig. 12 is a schematic diagram of the basic information of the abnormality, which includes the occurrence time, duration, and recovery time. The time period in which the abnormal log appears is obtained from the basic information, and the page access log data corresponding to that time period is acquired to obtain the abnormal log data. The time period of the normal log is then determined according to the time period of the abnormal log, and the page access log data corresponding to the time period of the normal log is acquired to obtain the normal log data.
The server then preprocesses the obtained abnormal log data and normal log data through the data layer, that is, it discretizes the continuous log data and converts the discretized log data into log sets to obtain a normal log set and an abnormal log set.
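The preprocessing performed by the data layer can be illustrated with a small sketch. The text does not state how continuous values are discretized, so fixed bin edges are assumed here; the names `discretize` and `build_log_set`, the `latency_ms` dimension, and the bin edges are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, FrozenSet, List, Sequence, Tuple, Union

Value = Union[str, float]
Bins = Tuple[Sequence[float], Sequence[str]]   # (edges, labels), len(labels) == len(edges) + 1


def discretize(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Map a continuous value to the label of the bin it falls into."""
    return labels[bisect_right(edges, value)]


def build_log_set(rows: List[Dict[str, Value]],
                  continuous: Dict[str, Bins]) -> List[FrozenSet[str]]:
    """Discretize continuous dimensions and turn each log line into a set
    of 'dimension=data' items suitable for frequent item set mining."""
    log_set = []
    for row in rows:
        items = set()
        for dim, val in row.items():
            if dim in continuous:
                edges, labels = continuous[dim]
                val = discretize(float(val), edges, labels)
            items.add(f"{dim}={val}")
        log_set.append(frozenset(items))
    return log_set


bins = {"latency_ms": ([100, 500], ["low", "mid", "high"])}
rows = [
    {"client_ip": "10.0.0.1", "latency_ms": 42},
    {"client_ip": "10.0.0.2", "latency_ms": 730},
]
log_set = build_log_set(rows, bins)
```

Discrete dimensions pass through unchanged, so the same function can be applied to both the normal log data and the abnormal log data.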
Then, the algorithm layer uses a frequent item set mining algorithm to mine frequent item sets from the normal log set and the abnormal log set, obtaining the normal frequent item sets and normal support degrees corresponding to the normal log set, and the abnormal frequent item sets and abnormal support degrees corresponding to the abnormal log set. The normal frequent item sets may be referred to as normal patterns, and the abnormal frequent item sets as abnormal patterns. The number of elements in each normal frequent item set and each abnormal frequent item set is determined. The normal frequent item sets are then sorted by normal support degree and number of elements to obtain the sorted normal frequent item sets, and the abnormal frequent item sets are likewise sorted by abnormal support degree and number of elements to obtain the sorted abnormal frequent item sets. The sorted normal frequent item sets and the sorted abnormal frequent item sets are then pruned, that is, parent frequent item sets that have child frequent item sets are deleted, yielding the pruned normal frequent item sets and the pruned abnormal frequent item sets.
Finally, a decision is made through the decision layer: the difference support degree between the normal support degree and the abnormal support degree corresponding to the pruned normal frequent item sets and the pruned abnormal frequent item sets is calculated, and the number of elements in each frequent item set is determined. That is, the normal support degree, the abnormal support degree, the support degree difference, and the number of elements are obtained for all the pruned frequent item sets. The frequent item sets are then sorted by support degree difference from large to small and by number of elements from small to large, and the top three frequent item sets are output. This yields the dimensions and dimension data corresponding to the root cause of the abnormal index, which are sent to a monitoring terminal for display. Fig. 13 is a schematic diagram of the resulting root cause of the abnormality. The suspicious dimensions include the client IP, the reporting machine, and the user number. The client IP is 10.210.XXX.XXX, the reporting machine has two addresses, 10.210.XXX.XXX and 10.210.XXX.XXX, and the user number also has two values, "1001" and "1003".
It should be understood that although the steps in the flowcharts of fig. 2, 4-8 and 10 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2, 4-8 and 10 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 14, an abnormal data locating apparatus 1400 is provided. The apparatus may be implemented as a software module, a hardware module, or a combination of the two, forming part of a computer device, and specifically includes: a log processing module 1402, a frequent set calculation module 1404, a target frequent set obtaining module 1406, a difference calculation module 1408, and an anomaly determination module 1410, wherein:
a log processing module 1402, configured to obtain abnormal log data and normal log data; respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
a frequent set calculation module 1404, configured to calculate to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, where the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and calculate to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree;
a target frequent set obtaining module 1406, configured to deduplicate the frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
a difference calculation module 1408, configured to calculate, according to the first support degree corresponding to each first frequent dimension data combination and the second support degree corresponding to each second frequent dimension data combination, a difference support degree corresponding to a target frequent dimension data combination in the target frequent dimension data set;
an anomaly determination module 1410, configured to sort the target frequent dimension data combinations in the target frequent dimension data set according to the difference support degrees corresponding to the target frequent dimension data combinations in the target frequent dimension data set and the dimension data number of the target frequent dimension data combinations, and determine the abnormal target frequent dimension data combinations according to the sorting result.
In one embodiment, the log processing module 1402 includes: the log obtaining unit is used for obtaining monitoring log data, determining abnormal monitoring index data from the monitoring log data, obtaining a corresponding abnormal time period according to the abnormal monitoring index data, and obtaining log data corresponding to the abnormal time period from the monitoring log data to obtain abnormal log data; and determining a corresponding historical time period according to the abnormal time period, acquiring a preset transition time period, calculating to obtain a historical normal time period according to the historical time period and the preset transition time period, and acquiring log data corresponding to the historical normal time period from the monitoring log data to obtain normal log data.
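The calculation of the historical normal time period can be sketched under explicit assumptions, since this passage does not define how the historical time period or the transition time period relate to the abnormal time period. The sketch assumes the historical time period is a window of fixed length ending where the abnormal time period starts, and that the transition time period is trimmed from the end of that window; the function name and all parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def historical_normal_period(abnormal_start: datetime,
                             history_length: timedelta,
                             transition: timedelta) -> Tuple[datetime, datetime]:
    """Return (start, end) of the assumed historical normal time period:
    the window of history_length before the anomaly, minus the transition
    period immediately preceding the anomaly."""
    if transition >= history_length:
        raise ValueError("transition period must be shorter than the history window")
    history_start = abnormal_start - history_length
    normal_end = abnormal_start - transition
    return history_start, normal_end


start, end = historical_normal_period(
    abnormal_start=datetime(2020, 4, 30, 10, 0),
    history_length=timedelta(hours=1),
    transition=timedelta(minutes=5),
)
```

Excluding a transition period keeps log lines that may already be partially affected by the abnormality out of the normal log data.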
In one embodiment, the log processing module 1402 includes: the log dispersing unit is used for acquiring continuous log data in the normal log data and the abnormal log data, discretizing the continuous log data and obtaining discretized normal log data and discretized abnormal log data; and converting the discretized normal log data and the discretized abnormal log data into a log set to obtain a normal log set and an abnormal log set.
In one embodiment, the frequent set calculation module 1404 includes:
the first normal calculation unit is used for determining each first dimension data in the normal log set, combining each first dimension data to obtain each first dimension data combination to be selected, calculating a first support degree corresponding to each first dimension data combination to be selected, screening from each first dimension data combination to be selected according to the first support degree to obtain each first frequent dimension data combination, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree;
the first anomaly calculation unit is used for determining each second dimension data in the anomaly log set, combining the second dimension data to obtain each to-be-selected second dimension data combination, calculating a second support degree corresponding to each to-be-selected second dimension data combination, screening from each to-be-selected second dimension data combination according to the second support degree to obtain each second frequent dimension data combination, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
In one embodiment, the frequent set calculation module 1404 includes:
the second normal calculation unit is used for calculating to obtain each first frequent dimension data combination and corresponding first support degree according to the dimension data in the normal log set by using an association rule algorithm, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and corresponding first support degree;
and the second anomaly calculation unit is used for calculating each second frequent dimension data combination and a corresponding second support degree according to the dimension data in the anomaly log set by using an association rule algorithm, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
In an embodiment, the second normal calculation unit is further configured to: determine each candidate one-dimension data corresponding to the normal log set, calculate the support degree corresponding to each candidate one-dimension data, screen the candidate one-dimension data according to the corresponding support degrees to obtain frequent one-dimension data, and take the frequent one-dimension data as the current frequent dimension data combinations; join the current frequent dimension data combinations to obtain candidate two-dimension data combinations, and, when a subset of a target candidate two-dimension data combination is not among the current frequent dimension data combinations, delete the target candidate two-dimension data combination from the candidate two-dimension data combinations to obtain the primarily selected frequent two-dimension data combinations; calculate the support degree corresponding to each primarily selected frequent two-dimension data combination, and screen the primarily selected frequent two-dimension data combinations according to the corresponding support degrees to obtain the frequent two-dimension data combinations; and take the frequent two-dimension data combinations as the current frequent dimension data combinations and return to the step of joining the current frequent dimension data combinations, until the current frequent dimension data combinations are empty, at which point all the frequent dimension data combinations obtained are taken as the first frequent dimension data combinations and the first support degree corresponding to each first frequent dimension data combination is acquired.
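The iteration described above follows the general shape of the Apriori algorithm: screen single items, join, prune by subsets, screen again, and repeat until no combination survives. A compact sketch is given below; the function name `apriori`, the item strings, and the minimum support of 0.5 are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Combo = FrozenSet[str]


def apriori(log_set: List[Combo], min_support: float) -> Dict[Combo, float]:
    """Return every frequent combination together with its support degree."""
    n = len(log_set)

    def support(combo: Combo) -> float:
        return sum(1 for log in log_set if combo <= log) / n

    # Frequent one-dimension data become the first "current" combinations.
    items = {item for log in log_set for item in log}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent: Dict[Combo, float] = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unite pairs of current combinations into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Screen step: keep candidates that meet the minimum support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


logs = [frozenset({"a1", "b1"}), frozenset({"a1", "b1"}),
        frozenset({"a1", "b2"}), frozenset({"a2", "b1"})]
result = apriori(logs, min_support=0.5)
```

The prune step is what allows a candidate to be discarded without scanning the log set, since any combination with an infrequent subset cannot itself be frequent.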
In an embodiment, the second anomaly calculation unit is further configured to: determine each candidate one-dimensional data corresponding to the normal log set, calculate the support degree corresponding to each candidate one-dimensional data, screen the candidate one-dimensional data according to the corresponding support degrees to obtain frequent one-dimensional data, and sort the frequent one-dimensional data combinations according to the corresponding support degrees to obtain sorted frequent one-dimensional data combinations; sort the dimension data in the normal log set according to the sorted frequent one-dimensional data combinations to obtain a sorted normal log set; and generate a corresponding frequent pattern tree from the sorted normal log set, and calculate each first frequent dimension data combination and the corresponding first support degree by using the frequent pattern tree.
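The first half of this procedure, up to the generation of the frequent pattern tree, can be sketched as below. The sketch stops at tree construction and does not perform the subsequent mining; the class name `FPNode`, the function name `build_fp_tree`, and the use of an absolute minimum count instead of a support ratio are assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple


class FPNode:
    """One node of the frequent pattern tree: an item and a pass-through count."""

    def __init__(self, item: Optional[str] = None) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(log_set: List[FrozenSet[str]],
                  min_count: int) -> Tuple[FPNode, List[str]]:
    """Screen frequent single items, sort them by count (descending),
    reorder every log accordingly, and insert the logs into a prefix tree."""
    counts = Counter(item for log in log_set for item in log)
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Ties are broken by item name so the ordering is deterministic.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: pos for pos, item in enumerate(order)}

    root = FPNode()
    for log in log_set:
        path = sorted((i for i in log if i in rank), key=rank.__getitem__)
        node = root
        for item in path:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, order


logs = [frozenset({"a1", "b1"}), frozenset({"a1", "b1"}),
        frozenset({"a1", "b2"}), frozenset({"a2", "b1"})]
root, order = build_fp_tree(logs, min_count=2)
```

Because every log is inserted in the same global order, logs that share frequent dimension data share a prefix in the tree, which is what lets the supports of combinations be read off the tree instead of rescanning the log set.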
In one embodiment, the anomaly data locating device 1400 further comprises:
the first deleting module is used for deleting a first parent frequent dimension data combination in each first frequent dimension data combination, wherein the first parent frequent dimension data combination refers to a first frequent dimension data combination of which a child frequent dimension data combination exists in each first frequent dimension data combination; and deleting a second parent frequent dimension data combination in each second frequent dimension data combination, wherein the second parent frequent dimension data combination refers to a second frequent dimension data combination with a child frequent dimension data combination in each second frequent dimension data combination.
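One possible reading of this deletion is sketched below: a combination counts as a parent when some other combination in the same collection is a proper subset of it, and such parents are removed. The function name `delete_parents` and the sample combinations are hypothetical, and no support condition is applied because none is stated for this module.

```python
from typing import FrozenSet, Set

Combo = FrozenSet[str]


def delete_parents(combos: Set[Combo]) -> Set[Combo]:
    """Keep only combinations that have no child (proper subset) in combos."""
    return {c for c in combos if not any(other < c for other in combos)}


first_frequent = {
    frozenset({"a1"}),
    frozenset({"a1", "b1"}),   # parent of (a1), removed
    frozenset({"c1", "d1"}),   # no child present, kept
}
after_deletion = delete_parents(first_frequent)
```

The same function would be applied separately to the first and the second frequent dimension data combinations.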
In one embodiment, the anomaly determination module 1410 includes:
the sorting unit is used for determining each target frequent dimension data combination in the target frequent dimension data set, acquiring the dimension data quantity corresponding to each target frequent dimension data combination, and sorting each target frequent dimension data combination by using a preset sorting rule according to the difference support degree and the dimension data quantity to obtain each sorted target frequent dimension data combination; and selecting a preset number of target frequent dimension data combinations from the sorted target frequent dimension data combinations to obtain abnormal target frequent dimension data combinations.
In one embodiment, the anomaly data locating device 1400 further comprises:
the target acquisition module is used for acquiring target abnormal log data, the number of dimensions in the target abnormal log data is less than the preset number of dimensions, and the type of the dimension data corresponding to the dimensions is less than the preset number of types;
the target conversion module is used for converting the target abnormal log data into a target abnormal log set;
the calculation module is used for calculating to obtain a corresponding third frequent dimension data set according to the dimension data in the target abnormal log set, wherein the third frequent dimension data set comprises each third frequent dimension data combination and a corresponding third support degree;
and the combination determining module is used for determining the target dimension data quantity corresponding to each third frequent dimension data combination, sequencing each third frequent dimension data combination according to a third support degree and the dimension data quantity, and determining the target third frequent dimension data combination according to the sequencing result.
In one embodiment, the anomaly data locating device 1400 further comprises:
the second deleting module is used for acquiring each sorted third frequent dimension data combination and determining a third parent frequent dimension data combination and a third child frequent dimension data combination from each sorted third frequent dimension data combination; the third child frequent dimension data combination is a subset of the third parent frequent dimension data combination, and the abnormal difference between the support degree corresponding to the third child frequent dimension data combination and the support degree corresponding to the third parent frequent dimension data combination is smaller than a preset abnormal difference threshold; and deleting the third child frequent dimension data combination from each sorted abnormal frequent dimension data combination to obtain each deleted third frequent dimension data combination.
In one embodiment, the anomaly data locating device 1400 further comprises:
the merging module is used for determining a first frequent dimension data combination to be merged and a second frequent dimension data combination to be merged from the deleted third frequent dimension data combinations, where a preset number of identical dimension data exists between the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged, and the difference between the support degree corresponding to the first frequent dimension data combination to be merged and the support degree corresponding to the second frequent dimension data combination to be merged does not exceed a preset merging threshold; and merging the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged to obtain a merged third frequent dimension data combination.
For specific limitations of the abnormal data positioning device, reference may be made to the limitations of the abnormal data positioning method above, which are not repeated here. The modules in the abnormal data positioning device can be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 15. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing log data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an abnormal data positioning method.
Those skilled in the art will appreciate that the architecture shown in fig. 15 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. An abnormal data positioning method, characterized in that the method comprises:
acquiring abnormal log data and normal log data;
respectively converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
calculating to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, wherein the first frequent dimension data set comprises each first frequent dimension data combination and a corresponding first support degree, and calculating to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, and the second frequent dimension data set comprises each second frequent dimension data combination and a corresponding second support degree;
performing de-coincidence on frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set to obtain a target frequent dimension data set;
calculating to obtain a difference support degree corresponding to each target frequent dimension data combination in the target frequent dimension data set according to the first support degree corresponding to each first frequent dimension data combination and the second support degree corresponding to each second frequent dimension data combination;
and sorting the target frequent dimension data combinations in the target frequent dimension data set according to the difference support degree corresponding to each target frequent dimension data combination and the dimension data quantity of each target frequent dimension data combination, and determining abnormal target frequent dimension data combinations according to the sorting result.
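The last three steps of claim 1 can be illustrated with a minimal sketch. Several points are assumptions rather than statements of the claim: de-coincidence is read as taking the union of both sets of combinations with duplicates removed, the difference support degree is read as the second (abnormal) support minus the first (normal) support with a missing support counted as 0, and the sort order is descending difference support, then descending number of dimension data. The name `rank_by_difference_support` and the parameter `top_k` are hypothetical.

```python
def rank_by_difference_support(normal, abnormal, top_k=3):
    """normal / abnormal map frozenset combinations to support degrees.
    Returns the top_k (combination, difference support) pairs."""
    # De-coincidence (assumed): each combination appears once in the target set.
    targets = set(normal) | set(abnormal)
    # Difference support (assumed): abnormal support minus normal support.
    diff = {c: abnormal.get(c, 0.0) - normal.get(c, 0.0) for c in targets}
    # Larger difference first; among ties, combinations with more items first.
    ranked = sorted(targets, key=lambda c: (-diff[c], -len(c), sorted(c)))
    return [(c, diff[c]) for c in ranked[:top_k]]
```

A combination that is frequent only in the abnormal log set therefore ranks above one that is equally frequent in both sets.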
2. The method of claim 1, wherein obtaining abnormal log data and normal log data comprises:
acquiring monitoring log data, determining abnormal monitoring index data from the monitoring log data, acquiring a corresponding abnormal time period according to the abnormal monitoring index data, and acquiring log data corresponding to the abnormal time period from the monitoring log data to obtain abnormal log data;
determining a corresponding historical time period according to the abnormal time period, acquiring a preset transition time period, calculating to obtain a historical normal time period according to the historical time period and the preset transition time period, and acquiring log data corresponding to the historical normal time period from the monitoring log data to obtain normal log data.
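One possible way to derive the historical normal time period is sketched below. The claim does not fix the arithmetic, so the sketch assumes that the historical time period is a window of length `history` ending at the start of the abnormal time period, and that the preset transition time period is cut off from the end of that window. The names `historical_normal_window` and `select_logs`, the `"time"` field, and the default durations are all hypothetical.

```python
from datetime import datetime, timedelta


def historical_normal_window(abnormal_start,
                             history=timedelta(hours=1),
                             transition=timedelta(minutes=5)):
    """Return (start, end) of the historical normal period (assumed layout):
    the `history` span before the anomaly, minus the final `transition` span."""
    if transition >= history:
        raise ValueError("transition period must be shorter than the historical period")
    return abnormal_start - history, abnormal_start - transition


def select_logs(logs, start, end):
    """Keep the log records whose timestamp falls in the half-open range [start, end)."""
    return [row for row in logs if start <= row["time"] < end]
```

Excluding the transition period keeps records from the interval in which the monitored index may already be drifting out of the normal log data.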
3. The method according to claim 1, wherein the normal log data and the abnormal log data include continuous log data and discrete log data, and the converting the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set comprises:
acquiring continuous log data in the normal log data and the abnormal log data, and discretizing the continuous log data to obtain discretized normal log data and discretized abnormal log data;
and converting the discretized normal log data and the discretized abnormal log data into a log set to obtain a normal log set and an abnormal log set.
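The conversion of claim 3 can be sketched as follows, assuming that discretization means bucketing a continuous value into half-open ranges and that a log set is a list of item sets, one per log record. The bucket edges, the field names in the usage note, and the function names `discretize` and `to_log_set` are hypothetical.

```python
from bisect import bisect_right


def discretize(value, edges):
    """Map a number to the label of the half-open bucket it falls into;
    `edges` must be sorted in ascending order."""
    bounds = ["-inf", *map(str, edges), "+inf"]
    i = bisect_right(edges, value)
    return f"[{bounds[i]}, {bounds[i + 1]})"


def to_log_set(records, continuous_edges):
    """Turn each record (a dict) into a frozenset of (dimension, value) items,
    discretizing the dimensions listed in continuous_edges."""
    log_set = []
    for record in records:
        items = set()
        for dim, value in record.items():
            if dim in continuous_edges:
                value = discretize(value, continuous_edges[dim])
            items.add((dim, value))
        log_set.append(frozenset(items))
    return log_set
```

For example, with `continuous_edges={"latency_ms": [100, 500]}`, a record `{"isp": "A", "latency_ms": 250}` becomes the item set containing `("isp", "A")` and `("latency_ms", "[100, 500)")`.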
4. The method according to claim 1, wherein the calculating according to the dimension data in the normal log set to obtain a corresponding first frequent dimension data set, the first frequent dimension data set including each first frequent dimension data combination and a corresponding first support degree, and calculating according to the dimension data in the abnormal log set to obtain a corresponding second frequent dimension data set, the second frequent dimension data set including each second frequent dimension data combination and a corresponding second support degree, includes:
determining each first-dimension data in the normal log set, combining the first-dimension data to obtain each first-dimension data combination to be selected, calculating a first support degree corresponding to each first-dimension data combination to be selected, screening from the first-dimension data combinations to be selected according to the first support degree to obtain each first frequent-dimension data combination, and obtaining the first frequent-dimension data set according to each first frequent-dimension data combination and the corresponding first support degree;
determining each second dimension data in the abnormal log set, combining the second dimension data to obtain each to-be-selected second dimension data combination, calculating a second support degree corresponding to each to-be-selected second dimension data combination, screening from each to-be-selected second dimension data combination according to the second support degree to obtain each second frequent dimension data combination, and obtaining the second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
5. The method according to claim 1, wherein the calculating according to the dimension data in the normal log set to obtain a corresponding first frequent dimension data set, the first frequent dimension data set including each first frequent dimension data combination and a corresponding first support degree, and calculating according to the dimension data in the abnormal log set to obtain a corresponding second frequent dimension data set, the second frequent dimension data set including each second frequent dimension data combination and a corresponding second support degree, includes:
calculating to obtain each first frequent dimension data combination and a corresponding first support degree according to the dimension data in the normal log set by using an association rule algorithm, and obtaining a first frequent dimension data set according to each first frequent dimension data combination and the corresponding first support degree;
and calculating to obtain each second frequent dimension data combination and a corresponding second support degree by using the association rule algorithm according to the dimension data in the abnormal log set, and obtaining a second frequent dimension data set according to each second frequent dimension data combination and the corresponding second support degree.
6. The method of claim 5, wherein the calculating the respective first frequent dimension data combinations and the corresponding first support degrees according to the dimension data in the normal log set by using an association rule algorithm comprises:
determining each candidate one-dimension data corresponding to the normal log set, calculating the support degree corresponding to each candidate one-dimension data, screening the candidate one-dimension data according to the corresponding support degrees to obtain frequent one-dimension data, and taking the frequent one-dimension data as the current frequent dimension data combinations;
connecting the current frequent dimension data combinations to obtain candidate two-dimension data combinations, and, when a subset of a target candidate two-dimension data combination among the candidate two-dimension data combinations is not among the current frequent dimension data combinations, deleting the target candidate two-dimension data combination from the candidate two-dimension data combinations to obtain preliminarily selected frequent two-dimension data combinations;
calculating the support degree corresponding to each preliminarily selected frequent two-dimension data combination, and screening the preliminarily selected frequent two-dimension data combinations according to the corresponding support degrees to obtain frequent two-dimension data combinations;
and taking the frequent two-dimension data combinations as the current frequent dimension data combinations and returning to the step of connecting the current frequent dimension data combinations, until the current frequent dimension data combinations are empty; then taking all the current frequent dimension data combinations obtained as the first frequent dimension data combinations, and acquiring the first support degree corresponding to each first frequent dimension data combination.
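The iteration recited in claim 6 follows the pattern of the Apriori algorithm and can be sketched as follows. The sketch assumes that the support degree of a combination is the fraction of log records containing it and that the screening keeps combinations whose support is at least `min_support`; the name `apriori` and its parameters are illustrative only.

```python
from itertools import combinations


def apriori(log_set, min_support):
    """log_set: list of frozensets of items. Returns {combination: support}."""
    if not log_set:
        return {}
    total = len(log_set)

    def keep_frequent(candidates):
        # Support (assumed): share of log records that contain the combination.
        scored = {c: sum(c <= row for row in log_set) / total for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Frequent one-dimension data become the first "current" combinations.
    current = keep_frequent({frozenset([item]) for row in log_set for item in row})
    frequent = dict(current)
    size = 2
    while current:
        # Connect step: join pairs whose union has exactly `size` items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Delete candidates that have a (size - 1)-item subset that is not frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent
```

The loop ends when no combination of the current size survives the screening, which corresponds to the current frequent dimension data combinations being empty.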
7. The method of claim 5, wherein the calculating the respective first frequent dimension data combinations and the corresponding first support degrees according to the dimension data in the normal log set by using an association rule algorithm comprises:
determining each candidate one-dimension data corresponding to the normal log set, calculating the support degree corresponding to each candidate one-dimension data, screening the candidate one-dimension data according to the corresponding support degrees to obtain frequent one-dimension data combinations, and sorting the frequent one-dimension data combinations according to the corresponding support degrees to obtain sorted frequent one-dimension data combinations;
sorting the dimension data in the normal log set according to the sorted frequent one-dimension data combinations to obtain a sorted normal log set;
and generating a corresponding frequent pattern tree from the sorted normal log set, and calculating each first frequent dimension data combination and the corresponding first support degree by using the frequent pattern tree.
8. The method according to claim 1, wherein after obtaining a first frequent dimension data set by calculation according to the dimension data in the normal log set, the first frequent dimension data set including each first frequent dimension data combination and a corresponding first support degree, and obtaining a second frequent dimension data set by calculation according to the dimension data in the abnormal log set, the second frequent dimension data set including each second frequent dimension data combination and a corresponding second support degree, the method further comprises:
deleting a first parent frequent dimension data combination from the first frequent dimension data combinations, wherein the first parent frequent dimension data combination refers to a first frequent dimension data combination that has a child frequent dimension data combination among the first frequent dimension data combinations;
deleting a second parent frequent dimension data combination from the second frequent dimension data combinations, wherein the second parent frequent dimension data combination refers to a second frequent dimension data combination that has a child frequent dimension data combination among the second frequent dimension data combinations.
9. The method of claim 1, wherein the step of sorting the target frequent dimension data combinations in the target frequent dimension data set according to the difference support corresponding to the target frequent dimension data combinations in the target frequent dimension data set and the dimension data number of the target frequent dimension data combinations and determining abnormal target frequent dimension data combinations according to the sorting result comprises:
determining each target frequent dimension data combination in the target frequent dimension data set, acquiring the dimension data quantity corresponding to each target frequent dimension data combination, and sorting each target frequent dimension data combination by using a preset sorting rule according to the difference support degree and the dimension data quantity to obtain each sorted target frequent dimension data combination;
and selecting a preset number of target frequent dimension data combinations from the sorted target frequent dimension data combinations to obtain abnormal target frequent dimension data combinations.
10. The method of claim 1, further comprising:
acquiring target abnormal log data, wherein the number of dimensions in the target abnormal log data is less than a preset number of dimensions, and the number of types of dimension data corresponding to the dimensions is less than a preset number of types;
converting the target abnormal log data into a target abnormal log set;
calculating to obtain a corresponding third frequent dimension data set according to the dimension data in the target abnormal log set, wherein the third frequent dimension data set comprises each third frequent dimension data combination and a corresponding third support degree;
and determining the number of target dimension data corresponding to each third frequent dimension data combination, sorting the third frequent dimension data combinations according to the third support degree and the number of the dimension data, and determining the target third frequent dimension data combination according to a sorting result.
11. The method according to claim 10, further comprising, after said sorting the respective third frequent dimension data combinations according to the third support degree and the number of dimension data:
obtaining the sorted third frequent dimension data combinations, and determining a third parent frequent dimension data combination and a third child frequent dimension data combination from the sorted third frequent dimension data combinations;
the third child frequent dimension data combination is a subset of the third parent frequent dimension data combination, and the abnormal difference between the support degree corresponding to the third child frequent dimension data combination and the support degree corresponding to the third parent frequent dimension data combination is smaller than a preset abnormal difference threshold;
and deleting the third child frequent dimension data combination from the sorted abnormal frequent dimension data combinations to obtain the deleted third frequent dimension data combinations.
12. The method according to claim 11, further comprising, after the deleting the third child frequent dimension data combination from the sorted abnormal frequent dimension data combinations to obtain the deleted third frequent dimension data combinations:
determining a first frequent dimension data combination to be merged and a second frequent dimension data combination to be merged from the deleted third frequent dimension data combinations, wherein a preset amount of same dimension data exists between the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged, and the difference between the support degree corresponding to the first frequent dimension data combination to be merged and the support degree corresponding to the second frequent dimension data combination to be merged does not exceed a preset merging threshold;
and merging the first frequent dimension data combination to be merged and the second frequent dimension data combination to be merged to obtain a merged third frequent dimension data combination.
13. An abnormal data positioning device, the device comprising:
a log processing module, configured to acquire abnormal log data and normal log data, and to respectively convert the normal log data and the abnormal log data to obtain a normal log set and an abnormal log set;
a frequent set calculation module, configured to calculate to obtain a corresponding first frequent dimension data set according to the dimension data in the normal log set, where the first frequent dimension data set includes each first frequent dimension data combination and a corresponding first support degree, and calculate to obtain a corresponding second frequent dimension data set according to the dimension data in the abnormal log set, where the second frequent dimension data set includes each second frequent dimension data combination and a corresponding second support degree;
a target frequent set obtaining module, configured to perform de-coincidence on frequent dimension data combinations in the first frequent dimension data set and the second frequent dimension data set, and obtain a target frequent dimension data set;
a difference calculation module, configured to calculate to obtain the difference support degree corresponding to each target frequent dimension data combination in the target frequent dimension data set according to the first support degree corresponding to each first frequent dimension data combination and the second support degree corresponding to each second frequent dimension data combination;
and an anomaly determination module, configured to sort the target frequent dimension data combinations in the target frequent dimension data set according to the difference support degree corresponding to each target frequent dimension data combination and the dimension data quantity of each target frequent dimension data combination, and to determine the abnormal target frequent dimension data combinations according to the sorting result.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202010368783.4A 2020-04-29 2020-04-29 Abnormal data positioning method and device, computer equipment and storage medium Pending CN111552684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010368783.4A CN111552684A (en) 2020-04-29 2020-04-29 Abnormal data positioning method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010368783.4A CN111552684A (en) 2020-04-29 2020-04-29 Abnormal data positioning method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111552684A true CN111552684A (en) 2020-08-18

Family

ID=72004878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010368783.4A Pending CN111552684A (en) 2020-04-29 2020-04-29 Abnormal data positioning method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111552684A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448761A (en) * 2021-06-17 2021-09-28 新浪网技术(中国)有限公司 Root cause positioning method and device

Similar Documents

Publication Publication Date Title
CN111614690B (en) Abnormal behavior detection method and device
CN108650684B (en) Association rule determination method and device
CN105224636A (en) A kind of data access method and device
CN110555164B (en) Method, device, computer equipment and storage medium for generating group interest labels
CN109783457B (en) CGI interface management method, device, computer equipment and storage medium
CN115392799B (en) Attribution analysis method and device, computer equipment and storage medium
CN113254255B (en) Cloud platform log analysis method, system, device and medium
CN103678293A (en) Data storage method and device
CN112181931A (en) Big data system link tracking method and electronic equipment
CN113688288A (en) Data association analysis method and device, computer equipment and storage medium
Sujatha Improved user navigation pattern prediction technique from web log data
CN115544519A (en) Method for carrying out security association analysis on threat information of metering automation system
CN110535686B (en) Abnormal event processing method and device
CN113283502B (en) Device state threshold determining method and device based on clustering
CN111552684A (en) Abnormal data positioning method and device, computer equipment and storage medium
CN114817243A (en) Method, device and equipment for establishing database joint index and storage medium
Basik et al. Slim: Scalable linkage of mobility data
CN107920067A (en) A kind of intrusion detection method in active objects storage system
CN112241410A (en) Data storage method, data index construction method and device and computer equipment
CN104794234A (en) Data processing method and device for benchmarking
CN106776704B (en) Statistical information collection method and device
CN115129724A (en) Statistical report paging method, system, equipment and medium
US11704315B1 (en) Trimming blackhole clusters
CN114064723A (en) Association rule mining method and device, computer equipment and storage medium
CN109582806B (en) Personal information processing method and system based on graph calculation

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40027876; Country of ref document: HK)

SE01 Entry into force of request for substantive examination