CN108683530A - Data analysing method, device and the storage medium of multi-dimensional data - Google Patents
Data analysis method, device and storage medium for multi-dimensional data
- Publication number
- CN108683530A CN201810400910.7A
- Authority
- CN
- China
- Prior art keywords
- dimension
- dimensional data
- stream magnitude
- data
- time period
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/0636—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis based on a decision tree analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the present invention provide a data analysis method and device for multi-dimensional data, and a computer-readable storage medium. The data analysis method includes: obtaining the normal flow value and the abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data; inputting the dimension combinations of the multi-dimensional data, together with their normal and abnormal flow values, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations; calculating the contribution degree and the sub-dimension loss consistency of each suspected root-cause dimension; and identifying, from the calculated contribution degree and sub-dimension loss consistency, whether the suspected root-cause dimension is a root-cause dimension, where a root-cause dimension is the data dimension corresponding to the root cause of the flow loss. When a fault occurs, the embodiments can quickly determine the root-cause dimension from the multi-dimensional data of the fault indicator, which saves operations personnel time in locating the fault and reduces the loss the fault causes.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a data analysis method and device for multi-dimensional data, and a computer-readable storage medium.
Background art
To better understand and analyze the running state of their services in real time, Internet companies usually attach as many attribute labels as possible when collecting monitoring data, such as UA (User Agent), network type, and geographic location. A label describes the data from a particular angle, or dimension. Descriptions along different dimensions give the collected data strong expressive power and together make up the multi-dimensional data of the collected data.
At present, fault localization with multi-dimensional data relies mainly on manually inspecting and comparing the data of different dimensions to find, among all dimensions, the one whose degree of abnormality is pronounced. Judging manually from multi-dimensional data when a fault occurs requires experienced staff, and because the judgment involves reviewing the trend charts of many data series before reaching an overall conclusion, the process takes a long time. When there are many data dimensions, the localization time rises sharply; the fault cannot be located quickly enough to stop the loss, and a larger loss results.
Summary of the invention
Embodiments of the present invention provide a data analysis method and device for multi-dimensional data, and a computer-readable storage medium, to solve at least one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a data analysis method for multi-dimensional data, including: obtaining the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data; inputting the dimension combinations of the multi-dimensional data, together with the normal flow values and abnormal flow values of the dimension combinations, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; calculating the contribution degree and the sub-dimension loss consistency of the suspected root-cause dimension; and identifying, according to the calculated contribution degree and sub-dimension loss consistency, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
With reference to the first aspect, in a first implementation of the first aspect, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data includes: monitoring the total flow of the multi-dimensional data; and, if a flow loss is detected in the total flow of the multi-dimensional data within a preset time period, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: determining, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the flow data value of that dimension within a specified time period.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses of each dimension as the abnormal flow value of that dimension.
With reference to the first implementation of the first aspect, in a fourth implementation of the first aspect, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: predicting the flow data value of each dimension within the preset time period; and determining, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the predicted flow data value of that dimension within the preset time period.
With reference to the first aspect, or the first, second, third, or fourth implementation of the first aspect, in a fifth implementation of the first aspect, screening out suspected root-cause dimensions using the decision tree includes: taking the abnormal flow value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as its weight in the negative example set; balancing the positive and negative sample weights so that they are comparable in the initial state; calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, and selecting the dimension with the largest information gain ratio to split on, thereby constructing the decision tree; and determining the paths of the constructed decision tree as the suspected root-cause dimensions.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal flow value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
With reference to the first aspect, or the first, second, third, or fourth implementation of the first aspect, in a seventh implementation of the first aspect, identifying, according to the calculated contribution degree and sub-dimension loss consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension includes: inputting the calculated contribution degree and sub-dimension loss consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a second aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: a flow acquisition unit, configured to obtain the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data; a dimension screening unit, configured to input the dimension combinations of the multi-dimensional data, together with the normal flow values and abnormal flow values of the dimension combinations, into a decision tree, and to use the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; a feature calculation unit, configured to calculate the contribution degree and the sub-dimension loss consistency of the suspected root-cause dimension; and an identification unit, configured to identify, according to the calculated contribution degree and sub-dimension loss consistency, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
With reference to the second aspect, in a first implementation of the second aspect, the flow acquisition unit includes: a monitoring subunit, configured to monitor the total flow of the multi-dimensional data; and an acquisition subunit, configured to obtain, if a flow loss is detected in the total flow of the multi-dimensional data within a preset time period, the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the acquisition subunit is further configured to: determine, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the flow data value of that dimension within a specified time period.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the acquisition subunit is further configured to: count the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determine the number of failed accesses of each dimension as the abnormal flow value of that dimension.
With reference to the first implementation of the second aspect, in a fourth implementation of the second aspect, the acquisition subunit is further configured to: predict the flow data value of each dimension within the preset time period; and determine, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the predicted flow data value of that dimension within the preset time period.
With reference to the second aspect, or the first, second, third, or fourth implementation of the second aspect, in a fifth implementation of the second aspect, the dimension screening unit is further configured to: take the abnormal flow value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and take the normal flow value of the dimension combination as its weight in the negative example set; balance the positive and negative sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension from the balanced positive and negative sample weights, and select the dimension with the largest information gain ratio to split on, thereby constructing the decision tree; and determine the paths of the constructed decision tree as the suspected root-cause dimensions.
With reference to the fifth implementation of the second aspect, in a sixth implementation of the second aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal flow value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
With reference to the second aspect, or the first, second, third, or fourth implementation of the second aspect, in a seventh implementation of the second aspect, the identification unit is further configured to: input the calculated contribution degree and sub-dimension loss consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a third aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: one or more processors; and a storage device for storing one or more programs; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any implementation of the first aspect above.
In one possible design, the structure of the data analysis device for multi-dimensional data includes a processor and a memory. The memory stores a program that supports the device in executing the data analysis method for multi-dimensional data of the first aspect above, and the processor is configured to execute the program stored in the memory. The device may also include a communication interface for communication between the device and other devices or communication networks.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any implementation of the first aspect above.
The above technical solution has the following advantages or beneficial effects: when a fault occurs, the root-cause dimension can be determined quickly from the multi-dimensional data of the fault indicator, which saves operations personnel time in locating the fault and reduces the loss the fault causes.
The above summary is for the purpose of the description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will become readily apparent by reference to the drawings and the following detailed description.
Brief description of the drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several drawings. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed in accordance with the present invention and should not be regarded as limiting the scope of the present invention.
Fig. 1 is an overall block diagram of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a preferred embodiment of the data analysis method for multi-dimensional data provided by the present invention;
Fig. 3 shows a schematic diagram of the decision tree of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 4a and Fig. 4b show schematic diagrams of the decision tree splitting process of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the full set of suspected root-cause dimension combinations of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 6 is an overall block diagram of the data analysis device for multi-dimensional data according to an embodiment of the present invention;
Fig. 7 shows a structural block diagram of the data analysis device for multi-dimensional data according to another embodiment of the present invention;
Fig. 8 shows a structural block diagram of the data analysis device for multi-dimensional data according to another embodiment of the present invention.
Detailed description
Hereinafter, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and the description are to be regarded as illustrative in nature and not restrictive.
An embodiment of the present invention provides a data analysis method for multi-dimensional data. Fig. 1 is an overall block diagram of the data analysis method for multi-dimensional data according to an embodiment of the present invention. As shown in Fig. 1, the method includes: step S110, obtaining the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data; step S120, inputting the dimension combinations of the multi-dimensional data, together with the normal flow values and abnormal flow values of the dimension combinations, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; step S130, calculating the contribution degree and the sub-dimension loss consistency of the suspected root-cause dimension; and step S140, identifying, according to the calculated contribution degree and sub-dimension loss consistency, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
The data analysis method for multi-dimensional data of the embodiment of the present invention can be used, when a fault occurs, to find the root-cause dimension among all dimensions, where the root-cause dimension is the dimension whose degree of abnormality is pronounced. The following are two examples of locating the root-cause dimension in multi-dimensional data.
Example one: the dimension combination includes province and operator, where the operator is, for example, China Unicom, China Mobile, or China Telecom. When service flow is lost, the flow data of each dimension at the time of the fault is read in, and the root-cause dimension is quickly located from that data. For example, if the data flow loss of China Telecom is larger, the localization result is that the root-cause dimension with pronounced abnormality is the operator dimension.
Example two: the dimension combination includes operating system, browser, and mobile communication technology, where the operating system is, for example, Apple or Android; the browser is, for example, Google Chrome, 360 Browser, or UC Browser; and the mobile communication technology is, for example, 3G or 4G. After an application is released, the total data flow is monitored. When the total flow is lost, a fault is judged to have occurred, the flow data of each dimension at the time of the fault is read in, and the root-cause dimension is quickly located from that data. For example, if the flow loss is pronounced when the application is used with Google Chrome, the root-cause dimension is the browser.
In a specific application, traffic monitoring software can be used to monitor network flow. When service flow is lost, the data analysis method for multi-dimensional data of the embodiment of the present invention can be used to quickly locate the root-cause dimension, thereby shortening the stop-loss time and reducing the loss caused by the fault.
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data includes: monitoring the total flow of the multi-dimensional data; and, if a flow loss is detected in the total flow of the multi-dimensional data within a preset time period, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period.
In this embodiment, the total data flow is monitored; when the total flow is lost, a fault is judged to have occurred and the flow data value of each dimension at the time of the fault is read in. The flow data value includes both the normal flow value and the abnormal flow value; that is, the flow data value is the sum of the two. The abnormal flow value of each dimension, i.e. the lost flow data value, needs to be obtained in some way, for example by collection or by prediction.
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: determining, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the flow data value of that dimension within a specified time period.
In this embodiment, the abnormal flow value of each dimension is obtained by collection, which includes collecting the flow that actually occurred. How much the flow has dropped can be calculated from the actual flow by taking the difference from the flow data value of each dimension within a specified time period. For example, the difference between the flow data value of each dimension in the current time period and that in the previous time period can be calculated. Optionally, the difference between the flow data value of each dimension in the current time period and that in the same time period of the previous day can be calculated. In another optional embodiment, the difference between the flow data value of each dimension in the current time period and that in the same time period of another day can be calculated, where the offset of the "other day" can be specified, for example one week or one month.
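The difference-based acquisition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the `(dimension, value)` key layout, the sample numbers, and the choice to floor negative differences at zero are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical key layout: (dimension name, dimension value).
DimKey = Tuple[str, str]


def abnormal_flow_by_difference(
    current: Dict[DimKey, float],
    reference: Dict[DimKey, float],
) -> Dict[DimKey, float]:
    """Lost flow per dimension value: reference-period flow minus current flow.

    The reference period may be the previous period, or the same period of the
    previous day or of an earlier day. Flooring at zero is an assumption of
    this sketch, so that a dimension whose flow grew reports no loss.
    """
    return {
        key: max(reference.get(key, 0.0) - flow, 0.0)
        for key, flow in current.items()
    }


# Made-up numbers for illustration only.
current = {("operator", "telecom"): 400.0, ("operator", "mobile"): 980.0}
reference = {("operator", "telecom"): 1000.0, ("operator", "mobile"): 1000.0}
lost = abnormal_flow_by_difference(current, reference)
```

With these sample values, the telecom entry shows a much larger loss than the mobile entry, which is the kind of contrast the later screening step relies on.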
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses of each dimension as the abnormal flow value of that dimension.
Specifically, the abnormal flow value of each dimension can also be obtained by collection by counting how many requests were not processed; the number of unprocessed requests is the number of failed accesses. If an access receives no reply, i.e. the access request was not processed, it is considered a failed access. The number of failed accesses of each dimension can be determined as the abnormal flow value of that dimension. Similarly, an access that receives a reply is considered a successful access, and the number of successful accesses of each dimension is determined as the normal flow value of that dimension.
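Counting successful and failed accesses per dimension combination might look like the sketch below. The `Access` record, its field names, and the sample log are hypothetical; the only rule taken from the text is that an access without a reply counts as failed and one with a reply counts as successful.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Access(NamedTuple):
    """Hypothetical access record labeled with two dimensions."""
    province: str
    operator: str
    got_reply: bool  # False means no reply was received in the period


def count_flows(accesses: Iterable[Access]) -> Tuple[Counter, Counter]:
    """Return (normal, abnormal) access counts keyed by (province, operator)."""
    normal: Counter = Counter()
    abnormal: Counter = Counter()
    for access in accesses:
        key = (access.province, access.operator)
        if access.got_reply:
            normal[key] += 1    # successful access: normal flow
        else:
            abnormal[key] += 1  # failed access: abnormal flow
    return normal, abnormal


# Made-up log for illustration only.
log = [
    Access("beijing", "telecom", False),
    Access("beijing", "telecom", False),
    Access("beijing", "telecom", True),
    Access("shanghai", "mobile", True),
]
normal, abnormal = count_flows(log)
```

A `Counter` returns zero for keys it has never seen, so dimension combinations without failures need no special handling.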
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period includes: predicting the flow data value of each dimension within the preset time period; and determining, as the abnormal flow value of each dimension, the difference between the obtained flow data value of that dimension within the preset time period and the predicted flow data value of that dimension within the preset time period.
Obtaining the abnormal flow value of each dimension by prediction includes: predicting the flow that would have occurred had there been no fault; the difference between this predicted flow and the collected actual flow is the abnormal flow value, i.e. the lost flow. Specifically, the periodic variation pattern of the network flow can be analyzed statistically, and the flow data value of each dimension in the current time period can be predicted from information such as the time period and/or user browsing behavior patterns. The difference between the predicted flow data value and the actually collected flow data value is taken as the abnormal flow value.
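A prediction-based variant can be sketched with a deliberately simple predictor. The text does not specify a prediction model, so the averaging of the same time slot over earlier days used here is only a stand-in assumption, as are the function names, the sample values, and the zero floor.

```python
from statistics import mean
from typing import Sequence


def predict_flow(history_same_slot: Sequence[float]) -> float:
    """Predict the fault-free flow for one dimension as the mean of the same
    time slot on earlier days (a stand-in for any periodicity-aware model)."""
    return mean(history_same_slot)


def abnormal_flow_by_prediction(
    observed: float, history_same_slot: Sequence[float]
) -> float:
    """Lost flow: predicted flow minus observed flow, floored at zero
    (the floor is an assumption of this sketch)."""
    return max(predict_flow(history_same_slot) - observed, 0.0)


# Made-up numbers: the same slot on three earlier days, then the observed
# value during the fault.
lost_flow = abnormal_flow_by_prediction(400.0, [1000.0, 1020.0, 980.0])
```

Any forecasting model that captures the periodic pattern could replace `predict_flow` without changing how the abnormal flow value is derived.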
Fig. 2 is a flowchart of the steps of a preferred embodiment of the data analysis method for multi-dimensional data provided by the present invention. As shown in Fig. 2, according to an embodiment of the data analysis method for multi-dimensional data of the present invention, screening out suspected root-cause dimensions using the decision tree in step S120 of Fig. 1 includes: step S210, taking the abnormal flow value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as its weight in the negative example set; step S220, balancing the positive and negative sample weights so that they are comparable in the initial state; step S230, calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, and selecting the dimension with the largest information gain ratio to split on, thereby constructing the decision tree; and step S240, determining the paths of the constructed decision tree as the suspected root-cause dimensions.
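Steps S210 to S230 can be illustrated for a single split with the sketch below. It assumes the balance coefficient defined in the sixth implementation of the first aspect (sum of normal flow values divided by sum of abnormal flow values) and a weighted, C4.5-style gain ratio; the function names, the row layout, and the sample data are hypothetical, and recursive tree growth and path extraction (S240) are omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (dimension combination, normal flow, abnormal flow).
Row = Tuple[Dict[str, str], float, float]
# After weighting: (dimension combination, positive weight, negative weight).
Weighted = Tuple[Dict[str, str], float, float]


def entropy(pos: float, neg: float) -> float:
    """Binary entropy of a weighted positive/negative split."""
    total = pos + neg
    h = 0.0
    for w in (pos, neg):
        if total > 0 and w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def balanced_weights(rows: List[Row]) -> List[Weighted]:
    """S210 + S220: positive weight = abnormal flow * balance coefficient,
    negative weight = normal flow."""
    total_normal = sum(n for _, n, _ in rows)
    total_abnormal = sum(a for _, _, a in rows)
    coeff = total_normal / total_abnormal if total_abnormal else 1.0
    return [(combo, a * coeff, n) for combo, n, a in rows]


def gain_ratio(weighted: List[Weighted], dim: str) -> float:
    """S230: weighted information gain ratio of splitting on one dimension."""
    pos_total = sum(p for _, p, _ in weighted)
    neg_total = sum(n for _, _, n in weighted)
    total = pos_total + neg_total
    if total == 0:
        return 0.0
    groups = defaultdict(lambda: [0.0, 0.0])
    for combo, p, n in weighted:
        groups[combo[dim]][0] += p
        groups[combo[dim]][1] += n
    conditional, split_info = 0.0, 0.0
    for p, n in groups.values():
        frac = (p + n) / total
        if frac > 0:
            conditional += frac * entropy(p, n)
            split_info -= frac * math.log2(frac)
    gain = entropy(pos_total, neg_total) - conditional
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows: List[Row], dims: List[str]) -> str:
    """Pick the dimension with the largest gain ratio for the first split."""
    weighted = balanced_weights(rows)
    return max(dims, key=lambda d: gain_ratio(weighted, d))


# Made-up data: telecom loses most of its flow in both provinces.
rows: List[Row] = [
    ({"province": "beijing", "operator": "telecom"}, 100.0, 400.0),
    ({"province": "shanghai", "operator": "telecom"}, 120.0, 380.0),
    ({"province": "beijing", "operator": "mobile"}, 490.0, 10.0),
    ({"province": "shanghai", "operator": "mobile"}, 480.0, 20.0),
]
first_split = best_split(rows, ["province", "operator"])
```

In this sample the loss is concentrated in one operator and spread evenly across provinces, so splitting on the operator separates positive from negative weight far better than splitting on the province. Balancing first keeps the parent node at equal positive and negative weight, so a small absolute loss is not drowned out by the much larger normal flow.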
A decision tree is a flowchart-like tree structure in which each internal node (non-leaf node) represents a test on an attribute, each branch represents an outcome of the test, and each leaf node stores a class label. Once the decision tree is built, for a tuple whose class label is not yet known, a path is traced from the root node to a leaf node, and that leaf node stores the prediction for the tuple.
The embodiment of the present invention filters out suspected root-cause dimensions through the process of constructing a decision tree. The input features of the decision tree are the accessed dimension combinations, such as province and operator, together with their normal flow values and abnormal flow values; the output is whether a dimension combination is a positive example, that is, a suspected root-cause dimension. Model training yields a decision tree with good discrimination, from which the complete set of suspected root-cause dimension combinations, namely the decision tree paths, is obtained. The suspected root-cause dimensions can be screened based on the decision tree construction process of the C4.5 algorithm. Filtering out suspected root-cause dimensions reduces the computation required for the subsequent dimension feature calculation and root-cause identification.
In step S210, a dimension combination d in the multi-dimensional data is regarded as a sample point. The number of failed accesses of d, pvlost_d (the abnormal flow value), is used as the weight of d in the positive example set, weight_positive_d; the number of successful accesses of d, pv_d (the normal flow value), is used as the weight of d in the negative example set, weight_negative_d.
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, balancing the positive and negative example sample weights in step S220 includes: taking the product of the abnormal flow value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as the weight of that dimension combination in the negative example set, where the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
Screening suspected root-cause dimensions by information gain ratio assumes that the information entropy of the initial state is maximal. To satisfy this assumption, the positive and negative example sample weights need to be balanced so that they are comparable in the initial state. In this embodiment, the final positive example weight is weight_positive_d' = pvlost_d * (pv_total / pvlost_total), and the final negative example weight is weight_negative_d' = pv_d.
For example, suppose there are only two dimension combinations, d1 and d2, with pvlost_total = 1, pv_total = 100, pvlost_d1 = 1, pv_d1 = 10, pvlost_d2 = 0 and pv_d2 = 90. The calculation is as follows:
The positive example weight of sample point d1, weight_positive_d1, is pvlost_d1 * (pv_total / pvlost_total) = 100, and its negative example weight, weight_negative_d1, is pv_d1 = 10.
Similarly, the positive example weight of d2 is 0 and its negative example weight is 90. Overall, the positive example weight in the initial state is 100 and the negative example weight is 100, so the information entropy of the initial state is maximal.
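The weight balancing of steps S210 and S220 can be sketched in a few lines of Python. The function name and the dictionary layout are illustrative assumptions; the arithmetic follows the formulas weight_positive_d' = pvlost_d * (pv_total / pvlost_total) and weight_negative_d' = pv_d given above.

```python
def balance_weights(samples):
    """Return {combination: (positive weight, negative weight)}.

    samples maps each dimension combination to (pv, pvlost), i.e. its
    normal flow value and abnormal flow value.
    """
    pv_total = sum(pv for pv, _ in samples.values())
    pvlost_total = sum(lost for _, lost in samples.values())
    if pvlost_total == 0:
        raise ValueError("no flow loss: nothing to balance")
    coefficient = pv_total / pvlost_total  # balance coefficient
    return {d: (lost * coefficient, pv) for d, (pv, lost) in samples.items()}

# The two-combination example from the text: d1 = (pv 10, pvlost 1), d2 = (pv 90, pvlost 0).
weights = balance_weights({"d1": (10, 1), "d2": (90, 0)})
positive_total = sum(p for p, _ in weights.values())
negative_total = sum(n for _, n in weights.values())
```

Because the balance coefficient is pv_total / pvlost_total, the summed positive weight always equals the summed negative weight, which is what makes the initial entropy maximal.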
In step S230, the training stage of the decision tree constructs a decision tree from the given training data set. Training may be based on the C4.5 algorithm. Each split screens on only one dimension: at each split, the information gain ratio brought by each dimension is calculated, and the feature (that is, the dimension) whose information gain ratio is the largest and greater than 0 is greedily selected for splitting. Subtree generation stops when the entropy gain is negative, which saves the computation for that subtree. In the resulting decision tree, the paths of nodes that are not negative examples are the suspected root-cause dimensions, and these node paths include non-leaf nodes.
For example, consider a case with only two dimensions, where province takes the values Beijing and Shanghai and operator takes the values Telecom and Unicom. Take the situation in which Telecom is abnormal. The Telecom anomaly makes the positive example weight of Telecom very high (it is positively correlated with pvlost), so Telecom deviates from the balanced position and its information entropy is lower than that of other, relatively balanced dimensions. The negative example weight of Unicom is very high, so it also deviates from the balanced position and its information entropy is also low. This makes the information gain ratio of the operator dimension higher than that of the province dimension, so operator is selected for splitting, and the two classes of dimension combinations <province> and <province, operator> are no longer considered. Here the information gain ratio reflects the degree of reduction in mean information entropy. By analogy, the greedy method yields a set of dimension combinations that distinguish normal from abnormal well, and the pruning effect is significant.
As another example, again with only two dimensions, province takes the values Beijing and Hebei and operator takes the values Unicom and Telecom. Table 1 gives the flow data values and weight values of the multi-dimensional data in this example. Table 1 shows four sample points in total: sample point d11, Beijing Unicom; sample point d12, Beijing Telecom; sample point d21, Hebei Unicom; and sample point d22, Hebei Telecom. According to the data in Table 1, the total abnormal flow value pvlost_total is 100, the total normal flow value pv_total is 1000, pvlost_d11 is 90 and pv_d11 is 100. The positive example weight of sample point d11, weight_positive_d11, is therefore pvlost_d11 * (pv_total / pvlost_total) = 900, and its negative example weight, weight_negative_d11, is pv_d11 = 100. Similarly, the positive example weight of d12 is 100 and its negative example weight is 80; the positive example weight of d21 is 0 and its negative example weight is 200; the positive example weight of d22 is 0 and its negative example weight is 620.
Table 1. Flow data values and weight values of the multi-dimensional data
| Province | Operator | Normal flow value | Abnormal flow value | Positive example weight | Negative example weight |
|---|---|---|---|---|---|
| Beijing | Unicom | 100 | 90 | 900 | 100 |
| Beijing | Telecom | 80 | 10 | 100 | 80 |
| Hebei | Unicom | 200 | 0 | 0 | 200 |
| Hebei | Telecom | 620 | 0 | 0 | 620 |
| Total | | 1000 | 100 | 1000 | 1000 |
Fig. 3 is a schematic diagram of the decision tree of the data analysis method for multi-dimensional data according to an embodiment of the present invention. Fig. 4a and Fig. 4b are schematic diagrams of the splitting process during decision tree construction according to an embodiment of the present invention. Fig. 3 shows the decision tree constructed from the sample set data in Table 1, and Fig. 4a and Fig. 4b show the specific splitting process of that decision tree.
Fig. 4a is a schematic diagram of the first split of the decision tree. As shown in Fig. 4a, the first split divides node (1), the root node, by province into node (2) Beijing and node (3) Hebei. Specifically, the calculation uses the sample set data, that is, the normal flow values and abnormal flow values of sample points d11, d12, d21 and d22 shown in Table 1. If the split is by province, the ratio of positive to negative example weight is 1000/180 for Beijing and 0/820 for Hebei. If the split is by operator, the ratio of positive to negative example weight is 100/700 for Telecom and 900/300 for Unicom.
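The positive/negative ratios quoted for the two candidate splits can be reproduced by summing the Table 1 weights per dimension value. The sketch below is illustrative only; the tuple layout and the function name are assumptions.

```python
from collections import defaultdict

# Rows of Table 1: (province, operator, positive example weight, negative example weight).
TABLE_1 = [
    ("Beijing", "Unicom", 900, 100),
    ("Beijing", "Telecom", 100, 80),
    ("Hebei", "Unicom", 0, 200),
    ("Hebei", "Telecom", 0, 620),
]

def split_weights(rows, column):
    """Sum positive and negative weights per value of one dimension (0 = province, 1 = operator)."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[column]][0] += row[2]
        totals[row[column]][1] += row[3]
    return {value: tuple(pair) for value, pair in totals.items()}

by_province = split_weights(TABLE_1, 0)
by_operator = split_weights(TABLE_1, 1)
```

Each resulting pair is (positive weight, negative weight) for one child node of the candidate split.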
In this embodiment, the decision tree is built by training based on the C4.5 algorithm, which selects attributes by information gain ratio. Attribute selection measures are also called splitting rules, because they determine how the tuples at a given node are split. An attribute selection measure ranks the attributes that describe the given training tuples, and the attribute with the best score is chosen as the splitting attribute for those tuples. When a decision tree is built, many branches reflect anomalies in the training data; pruning methods address this problem of overfitting the data. Pruning is performed during decision tree construction because nodes with very few elements may cause the constructed tree to overfit, and the tree may be better if these nodes are not considered.
In machine learning and feature engineering, the uncertainty of information can be expressed by entropy. For a random variable X that takes a finite number of values, if its probability distribution is:

$$P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n$$

then the entropy of the random variable X can be described by the following formula:

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

For example, if in a classification system the class label is c, with possible values c_1, c_2, …, c_n, where n is the total number of classes, then the entropy of the classification system is:

$$H(c) = -\sum_{i=1}^{n} P(c_i) \log P(c_i)$$

Information gain is the reduction in entropy: the difference between the entropy of the sample set before a split and the entropy of the data subsets after splitting on some feature, that is, the information gained by the system once a feature X is fixed. When the overall distribution of feature X is fixed, the conditional entropy is H(c | X). The information gain brought to the system by fixing feature X is therefore:

$$IG(X) = H(c) - H(c \mid X)$$

The information gain ratio is defined jointly by the above information gain and a split information measure. The split information measure is the entropy H(X) of feature X, so the information gain ratio is:

$$\mathrm{GainRatio}(X) = \frac{IG(X)}{H(X)}$$
In the first split shown in Fig. 4a, the information gain ratios after splitting by province and after splitting by operator are calculated separately. Because the information gain ratio for splitting by province is greater than that for splitting by operator, splitting by province is selected, and node (1) splits into child node (2) Beijing and child node (3) Hebei.
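The comparison made in the first split can be checked numerically with a short Python sketch that treats the positive and negative example weights of Table 1 as class masses. The function names and row layout are assumptions, and the base-2 logarithm is a choice made for this sketch; the gain ratio itself does not depend on the logarithm base.

```python
import math
from collections import defaultdict

def entropy(weights):
    """Shannon entropy (base 2) of a list of non-negative weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)

def gain_ratio(rows, column):
    """Information gain ratio of splitting weighted samples on one column."""
    groups = defaultdict(lambda: [0, 0])
    for row in rows:
        groups[row[column]][0] += row[2]  # positive example weight
        groups[row[column]][1] += row[3]  # negative example weight
    pos = sum(p for p, _ in groups.values())
    neg = sum(n for _, n in groups.values())
    total = pos + neg
    if total == 0:
        return 0.0
    conditional = sum((p + n) / total * entropy([p, n]) for p, n in groups.values())
    info_gain = entropy([pos, neg]) - conditional          # IG(X) = H(c) - H(c | X)
    split_info = entropy([p + n for p, n in groups.values()])  # H(X)
    return info_gain / split_info if split_info > 0 else 0.0

# Rows of Table 1: (province, operator, positive weight, negative weight).
TABLE_1 = [
    ("Beijing", "Unicom", 900, 100),
    ("Beijing", "Telecom", 100, 80),
    ("Hebei", "Unicom", 0, 200),
    ("Hebei", "Telecom", 0, 620),
]
province_ratio = gain_ratio(TABLE_1, 0)
operator_ratio = gain_ratio(TABLE_1, 1)
```

Under these assumptions the province ratio comes out at roughly 0.65 and the operator ratio at roughly 0.30, which agrees with the choice of province for the first split.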
The second split, shown in Fig. 4b, is calculated in the same way as the first: the splitting of node (2) and node (3) is determined by calculating information gain ratios. For node (2), splitting by operator is selected, so node (2) splits into child node (4) Beijing Telecom and child node (5) Beijing Unicom. For node (3), the information gain ratio of splitting by operator is 0, so it is not split further. The resulting complete set of suspected root-cause dimension combinations, that is, the decision tree paths, is shown in Fig. 5.
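The two splits described above can be sketched as a small recursive procedure in Python. This is a simplified illustration rather than a full C4.5 implementation: it omits pruning, and it assumes that a node counts as "not a negative example" when its positive example weight is greater than 0. All function names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(weights):
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)

def gain_ratio(rows, column):
    groups = defaultdict(lambda: [0, 0])
    for row in rows:
        groups[row[column]][0] += row[2]
        groups[row[column]][1] += row[3]
    sizes = [p + n for p, n in groups.values()]
    total = sum(sizes)
    if total == 0:
        return 0.0
    parent = entropy([sum(p for p, _ in groups.values()), sum(n for _, n in groups.values())])
    children = sum(s / total * entropy(pair) for s, pair in zip(sizes, groups.values()))
    split_info = entropy(sizes)
    return (parent - children) / split_info if split_info > 0 else 0.0

def suspect_paths(rows, columns, path=()):
    """Greedily split on the column with the largest positive gain ratio.

    Returns the paths of all non-root nodes whose positive weight is above 0
    (this sketch's reading of "not a negative example").
    """
    paths = [path] if path and sum(r[2] for r in rows) > 0 else []
    if not columns:
        return paths
    best = max(columns, key=lambda c: gain_ratio(rows, c))
    if gain_ratio(rows, best) <= 0:
        return paths  # no useful split: stop growing this subtree
    remaining = [c for c in columns if c != best]
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        paths += suspect_paths(subset, remaining, path + (value,))
    return paths

# Rows of Table 1: (province, operator, positive weight, negative weight).
TABLE_1 = [
    ("Beijing", "Unicom", 900, 100),
    ("Beijing", "Telecom", 100, 80),
    ("Hebei", "Unicom", 0, 200),
    ("Hebei", "Telecom", 0, 620),
]
paths = suspect_paths(TABLE_1, [0, 1])
```

On the Table 1 data this sketch splits the root by province, splits Beijing by operator, and leaves Hebei unsplit, mirroring the description of Fig. 4a and Fig. 4b.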
After the suspected root-cause dimensions are filtered out using the decision tree in step S120 of Fig. 1, step S130, dimension feature value calculation, is performed. Two features are calculated for every suspected root-cause dimension: the contribution degree and the sub-dimension loss-degree consistency. The contribution degree can be calculated according to Formula 1, and the sub-dimension loss-degree consistency can be measured by the coefficient of variation, as shown in Formula 2.

Formula 1:

$$\mathrm{contribution}_d = \frac{pvlost_d}{pvlost_{total}}$$

In Formula 1, pvlost_d is the loss value of dimension d and pvlost_total is the loss value over all dimensions. The loss value is the abnormal flow value.

Formula 2:

$$CV_d = \frac{\sigma\left(r_{t_1}, r_{t_2}, \ldots, r_{t_n}\right)}{\mu\left(r_{t_1}, r_{t_2}, \ldots, r_{t_n}\right)}$$

In Formula 2, σ and μ denote the standard deviation and the mean; r_d is the degree of abnormality of dimension d, determined by its number of successes pv_d (normal flow value) and number of failures pvlost_d (abnormal flow value); and the dimensions {t_1, t_2, t_3, …, t_n} are the sub-dimensions of dimension d. For example, the sub-dimensions of the Beijing dimension are Beijing Unicom, Beijing Mobile and Beijing Telecom.
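The two features can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not confirm: the degree of abnormality of a dimension is taken to be pvlost / (pv + pvlost), and the coefficient of variation uses the unweighted population standard deviation. The function names are hypothetical.

```python
from statistics import mean, pstdev

def contribution(pvlost_d, pvlost_total):
    """Share of the total lost flow attributable to dimension d."""
    return pvlost_d / pvlost_total

def abnormality(pv, pvlost):
    # Assumption of this sketch: degree of abnormality = failed share of all accesses.
    return pvlost / (pv + pvlost)

def loss_consistency(sub_dimensions):
    """Coefficient of variation of the sub-dimensions' degrees of abnormality.

    sub_dimensions is a list of (pv, pvlost) pairs; at least one pair must
    have pvlost > 0, otherwise the mean is zero.
    """
    rates = [abnormality(pv, lost) for pv, lost in sub_dimensions]
    return pstdev(rates) / mean(rates)

# Beijing in Table 1: sub-dimensions Beijing Unicom (pv 100, pvlost 90)
# and Beijing Telecom (pv 80, pvlost 10); total loss is 100.
beijing_contribution = contribution(90 + 10, 100)
beijing_consistency = loss_consistency([(100, 90), (80, 10)])
```

A smaller coefficient of variation means the sub-dimensions are losing flow to a similar degree, which is the pattern expected when the parent dimension itself is the root cause.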
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, step S140, identifying whether the suspected root-cause dimension is a root-cause dimension according to its calculated contribution degree and sub-dimension loss-degree consistency, includes: inputting the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In step S140, the contribution degree and sub-dimension loss-degree consistency of each suspected root-cause dimension are input into a linear binary classifier trained on historical data, which identifies root-cause dimensions by classifying whether each dimension is a root-cause dimension. The classifier is trained on historical data as follows: data from historical faults is obtained, and each dimension is labeled as one of two classes according to whether it is a root-cause dimension, for example 0 for non-root-cause and 1 for root-cause. The two features of each dimension are calculated according to the steps above, and a machine learning classification algorithm, such as a decision tree or logistic regression, is used to train the binary classifier.
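As one possible illustration of the training step, the sketch below fits a logistic regression on the two features by batch gradient descent. The training samples are synthetic placeholders invented for this sketch, not historical fault data from the embodiment, and the learning rate, epoch count and function names are likewise assumptions.

```python
import math

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit (w1, w2, b) by batch gradient descent on the log loss."""
    w1 = w2 = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            g1 += (p - y) * x1
            g2 += (p - y) * x2
            gb += p - y
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def is_root_cause(model, contribution, consistency):
    """Linear decision rule: positive score means 'root-cause dimension'."""
    w1, w2, b = model
    return w1 * contribution + w2 * consistency + b > 0

# Synthetic placeholder data: (contribution degree, loss-degree consistency).
# Label 1 = root cause, 0 = non-root cause.
samples = [(0.9, 0.1), (0.8, 0.2), (0.95, 0.05), (0.1, 0.8), (0.2, 0.9), (0.05, 0.6)]
labels = [1, 1, 1, 0, 0, 0]
model = train_logistic(samples, labels)
```

In practice a library classifier trained on labeled historical faults would take the place of this hand-written loop.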
The multi-dimensional data analysis method of the embodiments of the present invention can be applied not only to fault localization but also to the analysis of any additive multi-dimensional data. Additive multi-dimensional data means that the data of the overall dimension equals the sum of the data of its component dimensions; for example, the data of the operator dimension equals the sum of the data for Unicom, Mobile, Telecom and so on.
In another aspect, an embodiment of the present invention provides a data analysis apparatus for multi-dimensional data. Fig. 6 is an overall block diagram of the data analysis apparatus for multi-dimensional data according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes: a flow acquiring unit 100, configured to obtain the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data; a dimension screening unit 200, configured to input the dimension combinations of the multi-dimensional data, together with their normal flow values and abnormal flow values, into a decision tree, and to use the decision tree to filter out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; a feature calculation unit 300, configured to calculate the contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension; and a recognition unit 400, configured to identify, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
Fig. 7 is a structural block diagram of a data analysis apparatus for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 7, according to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the flow acquiring unit 100 includes: a monitoring subunit 110, configured to monitor the total flow of the multi-dimensional data; and an acquiring subunit 120, configured to obtain the normal flow value and abnormal flow value of each dimension of the multi-dimensional data in a preset time period if a flow loss is detected in the total flow of the multi-dimensional data in that preset time period.
According to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the acquiring subunit 120 is further configured to determine, as the abnormal flow value of each dimension, the difference between the collected flow data value of that dimension in the preset time period and the flow data value of that dimension in a designated time period.
According to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the acquiring subunit 120 is further configured to: count the number of failed accesses of each dimension in the preset time period, where an access for which no response is received within the preset time period is treated as a failed access; and determine the number of failed accesses of each dimension as the abnormal flow value of that dimension.
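The counting rule above can be sketched as follows. The record layout, a pair of dimension combination and a flag saying whether a response was received within the preset time period, is a hypothetical format chosen for this illustration, as are the sample records.

```python
from collections import Counter

def count_flows(accesses):
    """Count successful and failed accesses per dimension combination.

    accesses is an iterable of (dimension_combination, got_response) pairs
    for one preset time period; an access without a response is a failure.
    """
    normal, abnormal = Counter(), Counter()
    for dimension, got_response in accesses:
        (normal if got_response else abnormal)[dimension] += 1
    return normal, abnormal

# Hypothetical access records for one time period.
log = [
    (("Beijing", "Unicom"), True),
    (("Beijing", "Unicom"), False),
    (("Beijing", "Unicom"), False),
    (("Hebei", "Telecom"), True),
]
normal, abnormal = count_flows(log)
```

The two counters correspond to the normal flow value and abnormal flow value of each dimension combination.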
According to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the acquiring subunit 120 is further configured to: predict the flow data value of each dimension in the preset time period; and determine, as the abnormal flow value of each dimension, the difference between the collected flow data value of that dimension in the preset time period and the predicted flow data value of that dimension in the preset time period.
According to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the dimension screening unit 200 is further configured to: take the abnormal flow value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and take the normal flow value of the dimension combination as the weight of that dimension combination in the negative example set; balance the positive and negative example sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension from the balanced positive and negative example sample weights, and select the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and determine the paths of the constructed decision tree as the suspected root-cause dimensions.
According to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, balancing the positive and negative example sample weights includes: taking the product of the abnormal flow value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of the dimension combination as the weight of that dimension combination in the negative example set, where the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
Referring to Fig. 6, according to one embodiment of the data analysis apparatus for multi-dimensional data of the present invention, the recognition unit 400 is further configured to input the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
For the functions of the modules in the apparatus of the embodiments of the present invention, refer to the corresponding description of the method above; they are not repeated here.
In another aspect, an embodiment of the present invention provides a data analysis apparatus for multi-dimensional data, including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement any of the data analysis methods for multi-dimensional data described above.
In one possible design, the structure of the data analysis apparatus for multi-dimensional data includes a processor and a memory. The memory stores a program that supports the apparatus in executing the above data analysis method for multi-dimensional data, and the processor is configured to execute the program stored in the memory. The apparatus may further include a communication interface through which it communicates with other devices or a communication network.
Fig. 8 is a structural block diagram of a data analysis apparatus for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 8, the apparatus includes a memory 910 and a processor 920, and the memory 910 stores a computer program that can run on the processor 920. When executing the computer program, the processor 920 implements the data analysis method for multi-dimensional data of the above embodiments. There may be one or more memories 910 and one or more processors 920.
The data analysis apparatus for multi-dimensional data further includes:
a communication interface 930, configured to communicate with external devices and exchange data.
The memory 910 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to one another by a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, the bus is represented by a single thick line in Fig. 8, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on one chip, they may communicate with one another through an internal interface.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the methods of the above embodiments.
The above technical solution has the following advantage or beneficial effect: when a fault occurs, the root-cause dimension can be quickly identified from the multi-dimensional data of the fault indicators, which saves operation and maintenance personnel time in locating the fault and reduces the loss caused by the fault.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature defined with "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be executed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, may each exist physically on their own, or two or more units may be integrated in one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.
Claims (18)
1. A data analysis method for multi-dimensional data, characterized by comprising:
obtaining the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data;
inputting the dimension combinations of the multi-dimensional data and the normal flow values and abnormal flow values of the dimension combinations into a decision tree, and using the decision tree to filter out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data;
calculating the contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension; and
identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
2. The method according to claim 1, characterized in that obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data comprises:
monitoring the total flow of the multi-dimensional data; and
if a flow loss is detected in the total flow of the multi-dimensional data in a preset time period, obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data in the preset time period.
3. The method according to claim 2, characterized in that obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data in the preset time period comprises:
determining, as the abnormal flow value of each dimension, the difference between the collected flow data value of that dimension in the preset time period and the flow data value of that dimension in a designated time period.
4. The method according to claim 2, characterized in that obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data in the preset time period comprises:
counting the number of failed accesses of each dimension in the preset time period, wherein an access for which no response is received within the preset time period is treated as a failed access; and
determining the number of failed accesses of each dimension as the abnormal flow value of that dimension.
5. The method according to claim 2, wherein obtaining the normal flow value and the abnormal flow value of each dimension of the multi-dimensional data within the preset time period comprises:
predicting the flow data value of each dimension within the preset time period;
determining, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension within the preset time period and the predicted flow data value of that dimension within the preset time period.
6. The method according to any one of claims 1-5, wherein screening out suspected root-cause dimensions using the decision tree comprises:
using the abnormal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in a positive-example set, and using the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in a negative-example set;
balancing the positive and negative sample weights so that they are comparable in the initial state;
calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, and selecting the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and
determining the paths of the constructed decision tree as suspected root-cause dimensions.
7. The method according to claim 6, wherein balancing the positive and negative sample weights comprises: using the product of the abnormal flow value of each dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive-example set, and using the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative-example set, wherein the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
8. The method according to any one of claims 1-5, wherein identifying, according to the calculated contribution degree and sub-dimension damage-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension comprises:
inputting the calculated contribution degree and sub-dimension damage-degree consistency of the suspected root-cause dimension into a classifier, to classify whether the suspected root-cause dimension is a root-cause dimension.
9. A data analysis device for multi-dimensional data, comprising:
a flow acquiring unit, configured to obtain the normal flow value and the abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data;
a dimension screening unit, configured to input the dimension combinations of the multi-dimensional data and the normal and abnormal flow values of the dimension combinations into a decision tree, and to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data using the decision tree;
a feature calculation unit, configured to calculate a contribution degree and a sub-dimension damage-degree consistency of each suspected root-cause dimension; and
a recognition unit, configured to identify, according to the calculated contribution degree and sub-dimension damage-degree consistency, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
10. The device according to claim 9, wherein the flow acquiring unit comprises:
a monitoring subunit, configured to monitor the total flow of the multi-dimensional data; and
an obtaining subunit, configured to: if a flow loss is detected in the total flow of the multi-dimensional data within a preset time period, obtain the normal flow value and the abnormal flow value of each dimension of the multi-dimensional data within the preset time period.
11. The device according to claim 10, wherein the obtaining subunit is further configured to: determine, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension within the preset time period and the flow data value of that dimension within a designated time period.
12. The device according to claim 10, wherein the obtaining subunit is further configured to:
count the number of failed accesses of each dimension within the preset time period, wherein an access for which no return information is received within the preset time period is treated as a failed access; and
determine the number of failed accesses of each dimension as the abnormal flow value of that dimension.
13. The device according to claim 10, wherein the obtaining subunit is further configured to:
predict the flow data value of each dimension within the preset time period;
determine, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension within the preset time period and the predicted flow data value of that dimension within the preset time period.
14. The device according to any one of claims 9-13, wherein the dimension screening unit is further configured to:
use the abnormal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in a positive-example set, and use the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in a negative-example set;
balance the positive and negative sample weights so that they are comparable in the initial state;
calculate the information gain ratio of each dimension from the balanced positive and negative sample weights, and select the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and
determine the paths of the constructed decision tree as suspected root-cause dimensions.
15. The device according to claim 14, wherein balancing the positive and negative sample weights comprises: using the product of the abnormal flow value of each dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive-example set, and using the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative-example set, wherein the balance coefficient is the ratio of the sum of the normal flow values of all dimensions of the multi-dimensional data to the sum of the abnormal flow values of all dimensions.
16. The device according to any one of claims 9-13, wherein the recognition unit is further configured to: input the calculated contribution degree and sub-dimension damage-degree consistency of the suspected root-cause dimension into a classifier, to classify whether the suspected root-cause dimension is a root-cause dimension.
17. A data analysis device for multi-dimensional data, comprising:
one or more processors; and
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
18. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
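As an illustration of the sample weighting and split selection recited in claims 6 and 7, the following is a minimal Python sketch and not the patented implementation. It weights each dimension combination by its abnormal flow value (positive examples, scaled by the balance coefficient) and by its normal flow value (negative examples), then picks the dimension with the largest information gain ratio. The row layout (`dims`, `normal`, `abnormal`), the function names, and the ISP/region example data are assumptions made for illustration; the gain ratio uses the usual C4.5 definition (information gain divided by split information), which the claims themselves do not spell out.

```python
import math
from collections import defaultdict


def balance_coefficient(rows):
    """Sum of normal flow values divided by sum of abnormal flow values."""
    total_normal = sum(r["normal"] for r in rows)
    total_abnormal = sum(r["abnormal"] for r in rows)
    if total_abnormal == 0:
        raise ValueError("no abnormal flow to analyse")
    return total_normal / total_abnormal


def entropy(pos, neg):
    """Binary entropy (in bits) of a weighted positive/negative set."""
    total = pos + neg
    h = 0.0
    for w in (pos, neg):
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def gain_ratio(rows, dimension, coeff):
    """Information gain ratio of splitting the weighted samples on one dimension."""
    # Accumulate [positive weight, negative weight] per value of the dimension.
    groups = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        g = groups[r["dims"][dimension]]
        g[0] += r["abnormal"] * coeff  # positive-example weight, balanced
        g[1] += r["normal"]            # negative-example weight
    pos = sum(g[0] for g in groups.values())
    neg = sum(g[1] for g in groups.values())
    total = pos + neg
    conditional, split_info = 0.0, 0.0
    for p, n in groups.values():
        frac = (p + n) / total
        if frac > 0:
            conditional += frac * entropy(p, n)
            split_info -= frac * math.log2(frac)
    gain = entropy(pos, neg) - conditional
    return gain / split_info if split_info > 0 else 0.0


def best_dimension(rows, dimensions):
    """Dimension with the largest gain ratio under balanced weights."""
    coeff = balance_coefficient(rows)
    return max(dimensions, key=lambda d: gain_ratio(rows, d, coeff))


# Hypothetical flow data: all abnormal flow is concentrated in ISP "A".
EXAMPLE_ROWS = [
    {"dims": {"isp": "A", "region": "north"}, "normal": 100, "abnormal": 50},
    {"dims": {"isp": "A", "region": "south"}, "normal": 100, "abnormal": 50},
    {"dims": {"isp": "B", "region": "north"}, "normal": 200, "abnormal": 0},
    {"dims": {"isp": "B", "region": "south"}, "normal": 200, "abnormal": 0},
]

print(best_dimension(EXAMPLE_ROWS, ["isp", "region"]))  # isp
```

In this example the balance coefficient is 600 / 100 = 6, so the positive and negative weights both total 600 before the first split. Splitting on `region` leaves both branches evenly mixed and yields zero gain, whereas splitting on `isp` isolates the abnormal flow, so `isp` would be the first node on a path toward a suspected root-cause dimension.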
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810400910.7A CN108683530B (en) | 2018-04-28 | 2018-04-28 | Data analysis method and device for multi-dimensional data and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810400910.7A CN108683530B (en) | 2018-04-28 | 2018-04-28 | Data analysis method and device for multi-dimensional data and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108683530A (en) | 2018-10-19 |
CN108683530B (en) | 2021-06-01 |
Family
ID=63802628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810400910.7A Active CN108683530B (en) | 2018-04-28 | 2018-04-28 | Data analysis method and device for multi-dimensional data and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108683530B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3110198A2 (en) * | 2015-06-22 | 2016-12-28 | Accenture Global Services Limited | Wi-fi access points performance management |
CN106874574A (en) * | 2017-01-22 | 2017-06-20 | 清华大学 | Mobile solution performance bottleneck analysis method and device based on decision tree |
CN107025154A (en) * | 2016-01-29 | 2017-08-08 | 阿里巴巴集团控股有限公司 | The failure prediction method and device of disk |
CN107154880A (en) * | 2016-03-03 | 2017-09-12 | 阿里巴巴集团控股有限公司 | system monitoring method and device |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858821A (en) * | 2019-02-14 | 2019-06-07 | 金瓜子科技发展(北京)有限公司 | A kind of influence feature determines method, apparatus, equipment and medium |
CN110009012A (en) * | 2019-03-20 | 2019-07-12 | 阿里巴巴集团控股有限公司 | A kind of risk specimen discerning method, apparatus and electronic equipment |
CN110995524A (en) * | 2019-10-28 | 2020-04-10 | 北京三快在线科技有限公司 | Flow data monitoring method and device, electronic equipment and computer readable medium |
CN110995524B (en) * | 2019-10-28 | 2022-06-14 | 北京三快在线科技有限公司 | Flow data monitoring method and device, electronic equipment and computer readable medium |
CN111064614A (en) * | 2019-12-17 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Fault root cause positioning method, device, equipment and storage medium |
CN111314173B (en) * | 2020-01-20 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Monitoring information abnormity positioning method and device, computer equipment and storage medium |
CN111314173A (en) * | 2020-01-20 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Monitoring information abnormity positioning method and device, computer equipment and storage medium |
CN113220796A (en) * | 2020-01-21 | 2021-08-06 | 北京达佳互联信息技术有限公司 | Abnormal business index analysis method and device |
CN111241128A (en) * | 2020-01-21 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Data processing method and device and electronic equipment |
CN113535444A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Transaction detection method, transaction detection device, computing equipment and computer storage medium |
CN113535444B (en) * | 2020-04-14 | 2023-11-03 | 中国移动通信集团浙江有限公司 | Abnormal motion detection method, device, computing equipment and computer storage medium |
CN111209179A (en) * | 2020-04-23 | 2020-05-29 | 成都四方伟业软件股份有限公司 | Method, device and system for collecting and analyzing system operation and maintenance data |
CN112015995A (en) * | 2020-09-29 | 2020-12-01 | 北京百度网讯科技有限公司 | Data analysis method, device, equipment and storage medium |
CN113746798A (en) * | 2021-07-14 | 2021-12-03 | 清华大学 | Cloud network shared resource abnormal root cause positioning method based on multi-dimensional analysis |
CN114900835A (en) * | 2022-04-20 | 2022-08-12 | 广州爱浦路网络技术有限公司 | Malicious traffic intelligent detection method and device and storage medium |
CN115578078A (en) * | 2022-11-15 | 2023-01-06 | 云智慧(北京)科技有限公司 | Data processing method, device and equipment of operation and maintenance system |
CN116227995A (en) * | 2023-02-06 | 2023-06-06 | 北京三维天地科技股份有限公司 | Index analysis method and system based on machine learning |
CN116227995B (en) * | 2023-02-06 | 2023-09-12 | 北京三维天地科技股份有限公司 | Index analysis method and system based on machine learning |
Also Published As
Publication number | Publication date |
---|---|
CN108683530B (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108683530A (en) | Data analysing method, device and the storage medium of multi-dimensional data | |
EP3743859A1 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
US20160055044A1 (en) | Fault analysis method, fault analysis system, and storage medium | |
EP3418910A1 (en) | Big data-based method and device for calculating relationship between development objects | |
CN112258093A (en) | Risk level data processing method and device, storage medium and electronic equipment | |
US10616040B2 (en) | Managing network alarms | |
CN109960839B (en) | Service link discovery method and system of service support system based on machine learning | |
CN114170002A (en) | Method and device for predicting access frequency | |
CN110674104B (en) | Feature combination screening method, device, computer equipment and storage medium | |
CN110633304B (en) | Combined feature screening method, device, computer equipment and storage medium | |
CN111783883A (en) | Abnormal data detection method and device | |
CN113835947A (en) | Method and system for determining abnormality reason based on abnormality identification result | |
CN111008871A (en) | Real estate repurchase customer follow-up quantity calculation method, device and storage medium | |
CN110264306B (en) | Big data-based product recommendation method, device, server and medium | |
CN110619406A (en) | Method and device for determining business abnormity | |
CN112529428A (en) | Method and equipment for evaluating operation efficiency of bank outlet equipment | |
US20210373987A1 (en) | Reinforcement learning approach to root cause analysis | |
CN113762421A (en) | Training method of classification model, traffic analysis method, device and equipment | |
CN113535522A (en) | Abnormal condition detection method, device and equipment | |
CN108804640B (en) | Data grouping method, device, storage medium and equipment based on maximized IV | |
CN113205363A (en) | Service index monitoring method and device based on big data | |
CN112631892B (en) | Method, computing device, and computer medium for predicting server health status | |
CN112395179A (en) | Model training method, disk prediction method, device and electronic equipment | |
CN110008100A (en) | Method and device for web page access amount abnormality detection | |
CN109726084A (en) | The analysis method and device of the failure problems of data center |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||