CN108683530A - Data analysis method, device and storage medium for multi-dimensional data

Info

Publication number
CN108683530A
Authority
CN
China
Prior art keywords
dimension
dimensional data
traffic value
data
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810400910.7A
Other languages
Chinese (zh)
Other versions
CN108683530B (en)
Inventor
陈云
陈宇
李聪
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810400910.7A
Publication of CN108683530A
Application granted
Publication of CN108683530B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0636 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis based on a decision tree analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the present invention provide a data analysis method and device for multi-dimensional data, and a computer-readable storage medium. The data analysis method for multi-dimensional data includes: obtaining the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; inputting the dimension combinations of the multi-dimensional data, together with their normal and abnormal traffic values, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations; calculating the contribution degree and the sub-dimension loss-degree consistency of each suspected root-cause dimension; and identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss. When a failure occurs, embodiments of the present invention can quickly identify the root-cause dimension from the multi-dimensional data of the failure indicator, saving operations personnel time in locating the failure and reducing the loss the failure causes.

Description

Data analysis method, device and storage medium for multi-dimensional data
Technical field
The present invention relates to the field of information technology, and in particular to a data analysis method and device for multi-dimensional data and a computer-readable storage medium.
Background
To better understand and analyze the running state of services in real time, Internet companies usually attach as many attribute labels as possible when collecting monitoring data, such as UA (User Agent), network type, and geographic location. A label describes the data from a particular angle or dimension. The descriptive information of the different dimensions gives the collected data strong expressive power and constitutes the multi-dimensional data of the collected data.
At present, fault localization with multi-dimensional data relies mainly on manually inspecting and comparing the data of different dimensions to find, among all dimensions, the one whose degree of abnormality is pronounced. Manual judgment from multi-dimensional data when a failure occurs requires staff with a certain amount of experience, and because the judgment involves reviewing the trend charts of many data series before reaching an overall conclusion, the process takes a long time. When there are many data dimensions, the localization time rises sharply, so the failure cannot be located quickly, the loss cannot be stopped in time, and larger losses result.
Summary of the invention
Embodiments of the present invention provide a data analysis method and device for multi-dimensional data and a computer-readable storage medium, to solve at least one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a data analysis method for multi-dimensional data, including: obtaining the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; inputting the dimension combinations of the multi-dimensional data, together with the normal and abnormal traffic values of the dimension combinations, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; calculating the contribution degree and the sub-dimension loss-degree consistency of the suspected root-cause dimension; and identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
With reference to the first aspect, in a first implementation of the first aspect, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data includes: monitoring the total traffic of the multi-dimensional data; and, if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: determining, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the traffic data value of that dimension within a specified time period.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses of each dimension as the abnormal traffic value of that dimension.
With reference to the first implementation of the first aspect, in a fourth implementation of the first aspect, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: predicting the traffic data value of each dimension within the preset time period; and determining, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the predicted traffic data value of that dimension within the preset time period.
With reference to the first aspect or any of the first through fourth implementations of the first aspect, in a fifth implementation of the first aspect, screening out suspected root-cause dimensions using the decision tree includes: taking the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set; balancing the positive and negative sample weights so that they are comparable in the initial state; calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, and selecting the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and determining the paths of the constructed decision tree as the suspected root-cause dimensions.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
With reference to the first aspect or any of the first through fourth implementations of the first aspect, in a seventh implementation of the first aspect, identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension includes: inputting the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a second aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: a traffic obtaining unit, configured to obtain the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; a dimension screening unit, configured to input the dimension combinations of the multi-dimensional data, together with the normal and abnormal traffic values of the dimension combinations, into a decision tree, and to use the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; a feature calculation unit, configured to calculate the contribution degree and the sub-dimension loss-degree consistency of the suspected root-cause dimension; and an identification unit, configured to identify, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
With reference to the second aspect, in a first implementation of the second aspect, the traffic obtaining unit includes: a monitoring subunit, configured to monitor the total traffic of the multi-dimensional data; and an obtaining subunit, configured to: if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, obtain the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the obtaining subunit is further configured to: determine, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the traffic data value of that dimension within a specified time period.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the obtaining subunit is further configured to: count the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determine the number of failed accesses of each dimension as the abnormal traffic value of that dimension.
With reference to the first implementation of the second aspect, in a fourth implementation of the second aspect, the obtaining subunit is further configured to: predict the traffic data value of each dimension within the preset time period; and determine, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the predicted traffic data value of that dimension within the preset time period.
With reference to the second aspect or any of the first through fourth implementations of the second aspect, in a fifth implementation of the second aspect, the dimension screening unit is further configured to: take the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and take the normal traffic value of the dimension combination as its weight in the negative example set; balance the positive and negative sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension from the balanced positive and negative sample weights, and select the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and determine the paths of the constructed decision tree as the suspected root-cause dimensions.
With reference to the fifth implementation of the second aspect, in a sixth implementation of the second aspect, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal traffic value of the dimension combination as its weight in the negative example set, where the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
With reference to the second aspect or any of the first through fourth implementations of the second aspect, in a seventh implementation of the second aspect, the identification unit is further configured to: input the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, which classifies whether the suspected root-cause dimension is a root-cause dimension.
In a third aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: one or more processors; and a storage device for storing one or more programs; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any implementation of the first aspect.
In one possible design, the structure of the data analysis device for multi-dimensional data includes a processor and a memory. The memory stores a program that supports the device in executing the data analysis method for multi-dimensional data of the first aspect, and the processor is configured to execute the program stored in the memory. The device may further include a communication interface for communication between the device and other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to any implementation of the first aspect.
The above technical solutions have the following advantages or beneficial effects: when a failure occurs, the root-cause dimension can be quickly identified from the multi-dimensional data of the failure indicator, saving operations personnel time in locating the failure and reducing the loss the failure causes.
The above summary is provided for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Brief description of the drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several drawings. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed according to the present invention and should not be taken as limiting the scope of the present invention.
Fig. 1 is an overall framework diagram of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a preferred embodiment of the data analysis method for multi-dimensional data provided by the present invention;
Fig. 3 is a schematic diagram of the decision tree of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 4a and Fig. 4b are schematic diagrams of the splitting process of the decision tree structure of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the full set of suspected root-cause dimension combinations of the data analysis method for multi-dimensional data according to an embodiment of the present invention;
Fig. 6 is an overall framework diagram of the data analysis device for multi-dimensional data according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention;
Fig. 8 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention.
Detailed description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and the description are to be regarded as illustrative in nature and not restrictive.
An embodiment of the present invention provides a data analysis method for multi-dimensional data. Fig. 1 is an overall framework diagram of the data analysis method for multi-dimensional data according to an embodiment of the present invention. As shown in Fig. 1, the method includes: step S110, obtaining the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; step S120, inputting the dimension combinations of the multi-dimensional data, together with the normal and abnormal traffic values of the dimension combinations, into a decision tree, and using the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data; step S130, calculating the contribution degree and the sub-dimension loss-degree consistency of the suspected root-cause dimension; and step S140, identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, where the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
The data analysis method for multi-dimensional data according to the embodiment of the present invention can be used, when a failure occurs, to find the root-cause dimension among all dimensions, where the root-cause dimension is the dimension whose degree of abnormality is pronounced. The following are two examples of locating the root-cause dimension in multi-dimensional data.
Example 1: the dimension combination includes province and operator, where the operators are, for example, China Unicom, China Mobile, and China Telecom. When service traffic suffers a loss, the traffic data of each dimension at the time of the failure is read in, and the root-cause dimension is quickly located from that data. For example, if China Telecom shows the greater traffic loss, the localization result is that the root-cause dimension with the pronounced abnormality is the operator dimension.
Example 2: the dimension combination includes operating system, browser, and mobile communication technology, where the operating systems are, for example, Apple and Android; the browsers are, for example, Google's browser, 360 Browser, and UC Browser; and the mobile communication technologies are, for example, 3G and 4G. After an application is released, its total traffic is monitored. When the total traffic suffers a loss, a failure is judged to have occurred, the traffic data of each dimension at the time of the failure is read in, and the root-cause dimension is quickly located from that data. For example, if the traffic loss is clearly abnormal when the application is used with Google's browser, the localization result is that the root-cause dimension is the browser.
In a specific application, traffic monitoring software can be used to monitor network traffic. When service traffic suffers a loss, the data analysis method for multi-dimensional data according to the embodiment of the present invention can be used to quickly locate the root-cause dimension, thereby shortening the time needed to stop the loss and reducing the loss caused by the failure.
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data includes: monitoring the total traffic of the multi-dimensional data; and, if a traffic loss is detected in the total traffic of the multi-dimensional data within a preset time period, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period.
In this embodiment, the total traffic is monitored; when the total traffic suffers a loss, a failure is judged to have occurred, and the traffic data value of each dimension at the time of the failure is read in. The traffic data value comprises both the normal traffic value and the abnormal traffic value; that is, the traffic data value is the sum of the normal traffic value and the abnormal traffic value. The abnormal traffic value of each dimension, that is, the lost traffic, must be obtained in some way, for example by collection or by prediction.
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: determining, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the traffic data value of that dimension within a specified time period.
In this embodiment, the abnormal traffic value of each dimension is obtained by collection, which includes collecting the traffic actually observed. The size of the traffic drop can be calculated from the traffic actually observed by taking the difference from the traffic data value of each dimension in a specified time period. For example, the difference between the traffic data value of each dimension in the current time period and that in the previous time period can be calculated. Optionally, the difference between the traffic data value of each dimension in the current time period and that in the same time period of the previous day can be calculated. In another optional embodiment, the difference between the traffic data value of each dimension in the current time period and that in the same time period of another day can be calculated, where the offset of the "other day" can be specified, for example one week or one month.
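As an illustration of the difference-based approach described above, the following minimal Python sketch computes a per-dimension abnormal traffic value as the drop from a baseline period to the current period. The function name `abnormal_by_baseline`, the dictionary layout, the sample numbers, and the choice to clamp negative differences to zero are assumptions made for this example; the text itself only specifies taking the difference.

```python
from typing import Dict


def abnormal_by_baseline(current: Dict[str, float],
                         baseline: Dict[str, float]) -> Dict[str, float]:
    """Return the traffic drop per dimension value.

    current  : traffic data value of each dimension value in the preset period
    baseline : traffic data value in the specified period (for example the
               previous period, or the same period one day or one week earlier)

    Assumption: a dimension value whose traffic rose is given an abnormal
    traffic value of 0 rather than a negative number.
    """
    return {
        key: max(baseline[key] - current.get(key, 0.0), 0.0)
        for key in baseline
    }


# Illustrative numbers only (not taken from the source document).
baseline = {"telecom": 1000.0, "unicom": 950.0, "mobile": 1200.0}
current = {"telecom": 400.0, "unicom": 900.0, "mobile": 1250.0}
print(abnormal_by_baseline(current, baseline))
```

Swapping the `baseline` argument between the previous period, the same period of the previous day, or the same period a week earlier corresponds to the optional variants listed in the paragraph above.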
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: counting the number of failed accesses of each dimension within the preset time period, where an access that receives no reply within the preset time period is treated as a failed access; and determining the number of failed accesses of each dimension as the abnormal traffic value of that dimension.
Specifically, as a concrete way of obtaining the abnormal traffic value of each dimension by collection, the number of requests that were not processed can also be counted; the number of unprocessed requests is the number of failed accesses. If an access receives no reply, that is, the access request was not processed, it is regarded as a failed access. The number of failed accesses of each dimension can be determined as the abnormal traffic value of that dimension. Similarly, an access that receives a reply is regarded as a successful access, and the number of successful accesses of each dimension is determined as the normal traffic value of that dimension.
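The counting approach above can be sketched in a few lines of Python. The record layout (a dimension combination paired with a flag saying whether a reply was received) and the name `count_traffic` are hypothetical, chosen only to make the example self-contained.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def count_traffic(
    accesses: Iterable[Tuple[Hashable, bool]],
) -> Tuple[Counter, Counter]:
    """Count successful and failed accesses per dimension combination.

    accesses: (dimension_combination, got_reply) pairs collected within
              the preset time period.
    Returns (normal, abnormal): accesses that received a reply are counted
    as normal traffic, accesses without a reply as abnormal traffic.
    """
    normal: Counter = Counter()
    abnormal: Counter = Counter()
    for combination, got_reply in accesses:
        if got_reply:
            normal[combination] += 1
        else:
            abnormal[combination] += 1
    return normal, abnormal


# Illustrative access log (hypothetical data).
log = [
    (("Beijing", "telecom"), False),
    (("Beijing", "telecom"), True),
    (("Hebei", "unicom"), True),
    (("Beijing", "telecom"), False),
]
normal, abnormal = count_traffic(log)
print(normal, abnormal)
```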
According to one embodiment of the data analysis method for multi-dimensional data of the present invention, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data within the preset time period includes: predicting the traffic data value of each dimension within the preset time period; and determining, as the abnormal traffic value of each dimension, the difference between the obtained traffic data value of that dimension within the preset time period and the predicted traffic data value of that dimension within the preset time period.
Obtaining the abnormal traffic value of each dimension by prediction includes: predicting the traffic that would have occurred had no failure taken place; the difference between this prediction and the collected traffic actually observed is the abnormal traffic value, that is, the lost traffic. Specifically, the periodic variation pattern of network traffic can be computed statistically, and the traffic data value of each dimension in the current time period can be predicted from information such as the time period and/or user browsing behavior patterns. The difference between the predicted traffic data value and the actually collected traffic data value is taken as the abnormal traffic value.
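A minimal sketch of the prediction-based approach follows. The source does not specify a prediction model, so this example assumes the simplest periodic predictor: the mean of the traffic observed in the same time period on previous days. The names `predict_traffic` and `abnormal_by_prediction` and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def predict_traffic(history: Dict[str, List[float]]) -> Dict[str, float]:
    """Predict traffic per dimension value for the current period.

    history maps each dimension value to the traffic observed in the same
    period on previous days. Assumption: the mean of those observations
    stands in for whatever periodic model a real system would use.
    """
    return {key: mean(values) for key, values in history.items()}


def abnormal_by_prediction(predicted: Dict[str, float],
                           observed: Dict[str, float]) -> Dict[str, float]:
    """Abnormal traffic value = predicted traffic - actually observed traffic."""
    return {key: predicted[key] - observed.get(key, 0.0) for key in predicted}


# Illustrative numbers only (not taken from the source document).
history = {"4G": [100, 110, 90], "3G": [50, 50, 50]}
observed = {"4G": 40, "3G": 48}
print(abnormal_by_prediction(predict_traffic(history), observed))
```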
Fig. 2 is a kind of step flow of preferred embodiment of the data analysing method of multi-dimensional data provided by the invention Figure.As shown in Fig. 2, according to a kind of embodiment of the data analysing method of multi-dimensional data of the present invention, the step in Fig. 1 S120 filters out doubtful because of dimension using the decision tree, including:Step S210 combines the dimension of multi-dimensional data Exception stream magnitude combines the weight in positive example set as the dimension, the normal stream magnitude that the dimension of multi-dimensional data is combined As dimension combination in the weight for bearing example set;Step S220 balances positive and negative example sample weights, so that under original state just Negative example sample weights are suitable;Step S230 calculates the information gain of each dimension according to the positive and negative example sample weights after balance Rate selects the maximum dimension of information gain-ratio to be divided, constructs the decision tree;And step S240, described in construction The path of decision tree is determined as doubtful because of dimension.
A decision tree is a flowchart-like tree structure in which each internal node (non-leaf node) represents a test on an attribute, each branch represents an outcome of the test, and each leaf node stores a class label. Once the decision tree has been built, for a given tuple whose class label is unknown, a path is traced from the root node to a leaf node, and that leaf node stores the prediction for the tuple.
The embodiment of the present invention screens out suspected root-cause dimensions through the process of constructing a decision tree. The input features of the decision tree are the accessed dimension combinations, such as province and operator, together with their normal traffic values and abnormal traffic values; the output is whether the dimension combination is a positive example, that is, a suspected root-cause dimension. A decision tree with good discrimination is obtained through model training, which yields the complete set of suspected root-cause dimension combinations, namely the decision tree paths. The decision tree may be constructed based on the C4.5 algorithm. Screening out suspected root-cause dimensions reduces the amount of computation in the subsequent dimension feature calculation and root-cause identification.
In step S210, a dimension combination d in the multi-dimensional data is regarded as a sample point. The number of failed accesses of dimension combination d, pvlost_d, that is, the abnormal traffic value, is taken as the weight of d in the positive sample set, weight_positive_d; the number of successful accesses of dimension combination d, pv_d, that is, the normal traffic value, is taken as the weight of d in the negative sample set, weight_negative_d.
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, step S220, balancing the positive and negative sample weights, includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive sample set, and taking the normal traffic value of the dimension combination as the weight of that dimension combination in the negative sample set, wherein the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
Screening suspected root-cause dimensions by information gain ratio rests on the assumption that the information entropy of the initial state is maximal. To satisfy this assumption, the positive and negative sample weights need to be balanced so that they are comparable in the initial state. In this embodiment, the final positive weight is weight_positive_d' = pvlost_d * (pv_total / pvlost_total), and the final negative weight is weight_negative_d' = pv_d.
For example, when there are only two dimension combinations, with pvlost_total = 1, pv_total = 100, pvlost_d1 = 1, pv_d1 = 10, pvlost_d2 = 0 and pv_d2 = 90, the calculation is as follows:
The positive weight of sample point d1, weight_positive_d1, is pvlost_d1 * (pv_total / pvlost_total) = 100, and its negative weight, weight_negative_d1, is pv_d1 = 10;
Similarly, the positive weight of d2 is 0 and its negative weight is 90. Overall, the positive weight in the initial state is 100 and the negative weight is 100, so the information entropy of the initial state is maximal.
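The balancing step in this example can be reproduced with a short sketch. The function name and data layout are illustrative assumptions; only the formula and the numbers come from the text.

```python
def balanced_weights(samples):
    """Return {name: (positive_weight, negative_weight)} after balancing.

    `samples` maps a dimension combination to (pv, pvlost), its normal and
    abnormal traffic values.
    """
    pv_total = sum(pv for pv, _ in samples.values())
    pvlost_total = sum(lost for _, lost in samples.values())
    if pvlost_total == 0:
        raise ValueError("no abnormal traffic, nothing to balance")
    balance = pv_total / pvlost_total  # balance coefficient
    return {name: (lost * balance, pv) for name, (pv, lost) in samples.items()}

weights = balanced_weights({"d1": (10, 1), "d2": (90, 0)})
print(weights)  # {'d1': (100.0, 10), 'd2': (0.0, 90)}
```

After balancing, the positive weights and the negative weights each sum to 100, which is the equal-weight initial state the text requires.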
In step S230, in the training stage of the decision tree, a decision tree is constructed from the given training data set. The decision tree may be built by training based on the C4.5 algorithm. Each split uses only one dimension for screening. At each split, the information gain ratio brought by each dimension is calculated, and the feature (i.e., dimension) whose information gain ratio is the largest and greater than 0 is greedily selected for splitting. Subtree generation stops when the entropy gain is negative, which saves the computation of that subtree. In the resulting decision tree, the paths to nodes that are not negative examples are the suspected root-cause dimensions, where such nodes include non-leaf nodes.
For example, suppose there are only two dimensions: province takes the values Beijing and Shanghai, and operator takes the values Telecom and Unicom. Consider the case where Telecom is abnormal. The Telecom anomaly makes the positive weight of Telecom very high (it is positively correlated with pvlost), so Telecom deviates from the balanced position and its information entropy is lower than that of other, relatively balanced dimensions. The negative weight of Unicom is very high, so Unicom also deviates from the balanced position and its information entropy is also low. As a result, the information gain ratio of the operator dimension is higher than that of the province dimension. The operator dimension is therefore selected for splitting, and the two kinds of dimension combination <province> and <province, operator> are no longer considered. Here the information gain ratio measures the degree to which the mean information entropy is reduced. Proceeding in this way, the greedy method yields a set of dimension combinations that distinguishes normal from abnormal well, and the pruning effect is significant.
As another example, again with only two dimensions, province takes the values Beijing and Hebei, and operator takes the values Unicom and Telecom. Table 1 lists the traffic data values and weight values of the multi-dimensional data in this example. Table 1 shows four sample points: sample point d11, Beijing Unicom; sample point d12, Beijing Telecom; sample point d21, Hebei Unicom; and sample point d22, Hebei Telecom. According to the data in Table 1, the total abnormal traffic value pvlost_total is 100, the total normal traffic value pv_total is 1000, pvlost_d11 is 90 and pv_d11 is 100. The calculation gives: the positive weight of sample point d11, weight_positive_d11, is pvlost_d11 * (pv_total / pvlost_total) = 900, and its negative weight, weight_negative_d11, is pv_d11 = 100. Similarly, the positive weight of d12 is 100 and its negative weight is 80; the positive weight of d21 is 0 and its negative weight is 200; the positive weight of d22 is 0 and its negative weight is 620.
Table 1. Traffic data values and weight values of the multi-dimensional data

| Province | Operator | Normal traffic value | Abnormal traffic value | Positive weight | Negative weight |
|---|---|---|---|---|---|
| Beijing | Unicom | 100 | 90 | 900 | 100 |
| Beijing | Telecom | 80 | 10 | 100 | 80 |
| Hebei | Unicom | 200 | 0 | 0 | 200 |
| Hebei | Telecom | 620 | 0 | 0 | 620 |
| Total | | 1000 | 100 | 1000 | 1000 |
Fig. 3 is a schematic diagram of a decision tree of the data analysis method for multi-dimensional data according to an embodiment of the present invention; Fig. 4a and Fig. 4b are schematic diagrams of the splitting process during decision tree construction in the data analysis method for multi-dimensional data according to an embodiment of the present invention. Fig. 3 shows the decision tree constructed from the sample set data in Table 1. The specific splitting process of the decision tree in Fig. 3 is shown in Fig. 4a and Fig. 4b.
Fig. 4a is a schematic diagram of the first split of the decision tree. As shown in Fig. 4a, the first split divides node (1), the root node, by province into node (2) Beijing and node (3) Hebei. Specifically, the calculation uses the sample set data, that is, the normal traffic values and abnormal traffic values of sample points d11, d12, d21 and d22 shown in Table 1. If the split is made on the province dimension, the positive/negative weight ratio of Beijing is 1000/180 and that of Hebei is 0/820. If the split is made on the operator dimension, the positive/negative weight ratio of Telecom is 100/700 and that of Unicom is 900/300.
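The per-value totals quoted above follow directly from the balanced weights in Table 1. A minimal sketch that aggregates them (the row layout and function name are illustrative assumptions):

```python
from collections import defaultdict

# Table 1 rows: (province, operator, positive_weight, negative_weight)
ROWS = [
    ("Beijing", "Unicom", 900, 100),
    ("Beijing", "Telecom", 100, 80),
    ("Hebei", "Unicom", 0, 200),
    ("Hebei", "Telecom", 0, 620),
]

def split_totals(rows, column):
    """Sum positive and negative weights for each value of one dimension."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[column]][0] += row[2]
        totals[row[column]][1] += row[3]
    return {value: tuple(pair) for value, pair in totals.items()}

by_province = split_totals(ROWS, 0)
by_operator = split_totals(ROWS, 1)
print(by_province)  # {'Beijing': (1000, 180), 'Hebei': (0, 820)}
print(by_operator)  # {'Unicom': (900, 300), 'Telecom': (100, 700)}
```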
In this embodiment, the decision tree is built by training based on the C4.5 algorithm, which selects attributes by information gain ratio. An attribute selection measure is also called a splitting rule, because it determines how the tuples at a given node are split. The attribute selection measure provides a ranking of the attributes that describe the given training tuples, and the attribute with the best score is chosen as the splitting attribute for those tuples. When a decision tree is created, many branches reflect anomalies in the training data; pruning methods are used to handle this overfitting problem. Pruning is performed during decision tree construction because certain nodes with very few elements may cause the constructed tree to overfit, and the result may be better if these nodes are not considered.
In machine learning and feature engineering, the uncertainty of information can be expressed by entropy. For a random variable X that takes a finite number of values, if its probability distribution is:
$$P(X = x_i) = p_i,\quad i = 1, 2, \ldots, n$$
then the entropy of the random variable X can be described by the following formula:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
For example, in a classification system where the class label is c and takes the values c_1, c_2, ..., c_n, with n the total number of classes, the entropy of the classification system is:
$$H(c) = -\sum_{i=1}^{n} P(c_i) \log_2 P(c_i)$$
Information gain is the reduction in entropy: the difference between the entropy of the sample set before splitting and the entropy of the data subsets after splitting on some feature, that is, the information gained by the system once a feature X is fixed. When the overall distribution of feature X is fixed, the conditional entropy is H(c | X). The information gain brought to the system by fixing feature X is therefore IG(X) = H(c) - H(c | X).
The information gain ratio is defined jointly by the above information gain and the split information measure, as the former divided by the latter. The split information measure is the entropy H(X) of feature X, so the information gain ratio is:
$$\mathrm{GainRatio}(X) = \frac{IG(X)}{H(X)}$$
In the first split shown in Fig. 4a, the information gain ratio after splitting by province and the information gain ratio after splitting by operator are calculated separately. Because the information gain ratio after splitting by province is greater than that after splitting by operator, the split is made by province, and node (1) is split into child node (2) Beijing and child node (3) Hebei.
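The comparison for the first split can be checked numerically. The sketch below assumes base-2 logarithms and treats the balanced weights as sample counts when computing both the class entropy and the split information; the patent text does not spell out these details, and the function names are illustrative.

```python
from math import log2

def entropy(weights):
    """Entropy of a weight distribution; zero weights contribute nothing."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)

def gain_ratio(groups):
    """groups: {value: (positive_weight, negative_weight)} for one dimension."""
    sizes = [pos + neg for pos, neg in groups.values()]
    total = sum(sizes)
    parent = entropy([sum(p for p, _ in groups.values()),
                      sum(n for _, n in groups.values())])
    conditional = sum(size / total * entropy(pair)
                      for size, pair in zip(sizes, groups.values()))
    split_info = entropy(sizes)  # entropy of the dimension itself
    return (parent - conditional) / split_info if split_info > 0 else 0.0

# Aggregated balanced weights from Table 1.
by_province = {"Beijing": (1000, 180), "Hebei": (0, 820)}
by_operator = {"Unicom": (900, 300), "Telecom": (100, 700)}

ratio_province = gain_ratio(by_province)
ratio_operator = gain_ratio(by_operator)
print(ratio_province > ratio_operator)  # True
```

Under these assumptions the gain ratio is roughly 0.65 for province and roughly 0.30 for operator, which agrees with the choice of province for the first split.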
In the second split shown in Fig. 4b, the calculation is the same as in the first split: the way node (2) and node (3) are split is determined by calculating information gain ratios. Node (2) is split by operator into child node (4) Beijing Telecom and child node (5) Beijing Unicom. Node (3) is not split further, because the information gain ratio of splitting it by operator is 0. The resulting complete set of suspected root-cause dimension combinations, that is, the decision tree paths, is shown in Fig. 5.
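The whole two-level construction can be sketched as a small recursive procedure. This is one plausible reading of the described process, not the patented implementation: it reports the path to every node that carries positive weight, stops when the best gain ratio is not greater than 0, and uses base-2 entropy on the balanced weights. All names are illustrative.

```python
from math import log2

def entropy(weights):
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)

def totals(rows):
    """(positive_weight, negative_weight) summed over rows."""
    return sum(r["pos"] for r in rows), sum(r["neg"] for r in rows)

def group_by(rows, dim):
    groups = {}
    for r in rows:
        groups.setdefault(r[dim], []).append(r)
    return groups

def gain_ratio(rows, dim):
    groups = group_by(rows, dim)
    sizes = [sum(totals(g)) for g in groups.values()]
    whole = sum(sizes)
    conditional = sum(s / whole * entropy(totals(g))
                      for s, g in zip(sizes, groups.values()))
    split_info = entropy(sizes)
    gain = entropy(totals(rows)) - conditional
    return gain / split_info if split_info > 0 else 0.0

def suspected_paths(rows, dims, path=None):
    """Greedily split on the best dimension; collect paths with positive weight."""
    path = path or {}
    if not dims:
        return []
    best = max(dims, key=lambda d: gain_ratio(rows, d))
    if gain_ratio(rows, best) <= 0:
        return []  # no useful split: stop growing this subtree
    rest = [d for d in dims if d != best]
    found = []
    for value, group in group_by(rows, best).items():
        child = {**path, best: value}
        if totals(group)[0] > 0:  # node is not a pure negative example
            found.append(child)
        found.extend(suspected_paths(group, rest, child))
    return found

ROWS = [  # balanced weights from Table 1
    {"province": "Beijing", "operator": "Unicom", "pos": 900, "neg": 100},
    {"province": "Beijing", "operator": "Telecom", "pos": 100, "neg": 80},
    {"province": "Hebei", "operator": "Unicom", "pos": 0, "neg": 200},
    {"province": "Hebei", "operator": "Telecom", "pos": 0, "neg": 620},
]
paths = suspected_paths(ROWS, ["province", "operator"])
print(paths)
```

On the Table 1 data this sketch splits first by province, then splits Beijing by operator, and leaves Hebei unsplit, yielding the candidates Beijing, Beijing Unicom and Beijing Telecom.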
After step S120 in Fig. 1 has screened out the suspected root-cause dimensions using the decision tree, step S130, dimension feature calculation, is executed. Two features are calculated for every suspected root-cause dimension: contribution and sub-dimension loss-degree consistency. Contribution can be calculated according to Formula 1, and sub-dimension loss-degree consistency can be measured by the coefficient of variation, as shown in Formula 2:
$$\mathrm{contribution}_d = \frac{pvlost_d}{pvlost_{total}} \qquad \text{(Formula 1)}$$
In Formula 1, pvlost_d is the loss value of dimension d and pvlost_total is the total loss value over all dimensions, where the loss value is the abnormal traffic value.
In Formula 2, pv_d and pvlost_d are respectively the number of successes (normal traffic value) and the number of failures (abnormal traffic value) of dimension d; r_d is the degree of abnormality of dimension d; and the dimensions {t_1, t_2, t_3, ..., t_n} are the sub-dimensions of dimension d. For example, the sub-dimensions of the Beijing dimension are Beijing Unicom, Beijing Mobile and Beijing Telecom.
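The two features can be illustrated with a short sketch. Several details are assumptions made here because the formulas are not legible in this text: the degree of abnormality is taken as pvlost / (pv + pvlost), and consistency is computed as the plain population coefficient of variation (standard deviation divided by mean) of the sub-dimensions' abnormality degrees. The original Formula 2 may differ, for instance by centering on r_d.

```python
from statistics import mean, pstdev

def anomaly_rate(pv, pvlost):
    """Assumed degree of abnormality: share of traffic that was lost."""
    total = pv + pvlost
    return pvlost / total if total else 0.0

def contribution(pvlost_d, pvlost_total):
    """Share of the total lost traffic attributable to dimension d."""
    return pvlost_d / pvlost_total

def loss_consistency(children):
    """Coefficient of variation of the sub-dimensions' anomaly rates.

    `children` is a list of (pv, pvlost) pairs, one per sub-dimension.
    A small value means the sub-dimensions are damaged to a similar degree.
    """
    rates = [anomaly_rate(pv, lost) for pv, lost in children]
    avg = mean(rates)
    return pstdev(rates) / avg if avg else 0.0

# Beijing in Table 1: sub-dimensions Beijing Unicom and Beijing Telecom.
beijing_contribution = contribution(90 + 10, 100)
beijing_consistency = loss_consistency([(100, 90), (80, 10)])
print(beijing_contribution, round(beijing_consistency, 2))
```

In this reading, Beijing accounts for all of the lost traffic, but its two sub-dimensions are damaged quite unevenly, which is the kind of evidence the later classification step can weigh.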
According to an embodiment of the data analysis method for multi-dimensional data of the present invention, step S140, identifying whether the suspected root-cause dimension is a root-cause dimension according to the calculated contribution and sub-dimension loss-degree consistency of the suspected root-cause dimension, includes: inputting the calculated contribution and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, so as to classify whether the suspected root-cause dimension is a root-cause dimension.
In step S140, the contribution and sub-dimension loss-degree consistency of each suspected root-cause dimension are input into a linear binary classifier trained on historical data, which identifies root-cause dimensions by classifying whether each dimension is a root-cause dimension. The classifier is trained on historical data as follows: data from historical faults is obtained, and each dimension is labeled as one of two classes according to whether it is a root-cause dimension, for example 0 for non-root-cause and 1 for root-cause. The two features of each dimension are calculated according to the above steps, and a machine learning classification algorithm, such as a decision tree or logistic regression, is used to train the binary classifier.
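As an illustration of this training step, the sketch below fits a logistic regression, one of the algorithms named above, by plain batch gradient descent. The labeled samples are invented toy values, not historical fault data, and the learning rate, epoch count and function names are assumptions.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(samples, labels, lr=1.0, epochs=2000):
    """Fit a linear binary classifier on (contribution, consistency) features."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), y in zip(samples, labels):
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

def is_root_cause(model, features):
    (w1, w2), b = model
    return sigmoid(w1 * features[0] + w2 * features[1] + b) >= 0.5

# Hypothetical labeled history: (contribution, consistency) -> 1 root cause, 0 not.
samples = [(0.9, 0.1), (0.8, 0.2), (0.95, 0.05), (0.1, 0.8), (0.2, 0.9), (0.15, 0.3)]
labels = [1, 1, 1, 0, 0, 0]

model = train(samples, labels)
print(is_root_cause(model, (0.9, 0.1)))  # True
print(is_root_cause(model, (0.1, 0.8)))  # False
```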
The multi-dimensional data analysis method of the embodiment of the present invention is not limited to fault location scenarios; it is applicable to the analysis of any additive multi-dimensional data. Additive multi-dimensional data means that the data of the total dimension equals the sum of the data of its constituent dimensions. For example, the data of the operator dimension equals the sum of the data of Unicom, Mobile, Telecom and so on.
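Whether a metric is additive in this sense can be checked mechanically. A minimal sketch, with hypothetical numbers and an illustrative function name:

```python
def is_additive(total, parts, tolerance=0):
    """True if the total equals the sum of its constituent dimensions."""
    return abs(total - sum(parts.values())) <= tolerance

# Hypothetical access counts per operator: counts add up to the total.
operator_traffic = {"Unicom": 300, "Mobile": 450, "Telecom": 250}
print(is_additive(1000, operator_traffic))  # True

# A ratio metric such as a success rate does not add up this way.
success_rate = {"Unicom": 0.95, "Telecom": 0.85}
print(is_additive(0.9, success_rate))  # False
```

Counts such as successful and failed accesses pass this check, which is why they can serve as normal and abnormal traffic values; ratio metrics would first have to be decomposed into their underlying counts.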
In another aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data. Fig. 6 is an overall block diagram of the data analysis device for multi-dimensional data of the embodiment of the present invention. As shown in Fig. 6, the data analysis device for multi-dimensional data of the embodiment of the present invention includes: a traffic acquisition unit 100, configured to obtain the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data; a dimension screening unit 200, configured to input the dimension combinations of the multi-dimensional data and the normal traffic values and abnormal traffic values of the dimension combinations into a decision tree, and to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data using the decision tree; a feature calculation unit 300, configured to calculate the contribution and sub-dimension loss-degree consistency of the suspected root-cause dimensions; and an identification unit 400, configured to identify whether a suspected root-cause dimension is a root-cause dimension according to its calculated contribution and sub-dimension loss-degree consistency, wherein the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
Fig. 7 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 7, according to an embodiment of the data analysis device for multi-dimensional data of the present invention, the traffic acquisition unit 100 includes: a monitoring subunit 110, configured to monitor the total traffic of the multi-dimensional data; and an acquisition subunit 120, configured to: if a traffic loss is detected in the total traffic of the multi-dimensional data in a preset time period, obtain the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data in the preset time period.
According to an embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: determine the difference between the collected traffic data value of each dimension in the preset time period and the traffic data value of each dimension in a designated time period as the abnormal traffic value of each dimension.
According to an embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: count the number of failed accesses of each dimension in the preset time period, wherein an access for which no return information is received in the preset time period is treated as a failed access; and determine the number of failed accesses of each dimension as the abnormal traffic value of each dimension.
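Counting failed accesses per dimension combination can be sketched as follows. The record format, a pair of a dimension combination and a flag indicating whether return information was received, is an assumption made for illustration.

```python
from collections import Counter

def count_accesses(records):
    """Count normal and failed accesses per dimension combination.

    `records` is an iterable of (dimension_combination, got_response) pairs
    collected within the preset time period.
    """
    normal, abnormal = Counter(), Counter()
    for dims, got_response in records:
        (normal if got_response else abnormal)[dims] += 1
    return normal, abnormal

# Hypothetical access records.
records = [
    (("Beijing", "Unicom"), True),
    (("Beijing", "Unicom"), False),
    (("Beijing", "Unicom"), False),
    (("Hebei", "Telecom"), True),
]
normal, abnormal = count_accesses(records)
print(abnormal[("Beijing", "Unicom")])  # 2
print(abnormal[("Hebei", "Telecom")])   # 0
```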
According to an embodiment of the data analysis device for multi-dimensional data of the present invention, the acquisition subunit 120 is further configured to: predict the traffic data value of each dimension in the preset time period; and determine the difference between the collected traffic data value of each dimension in the preset time period and the predicted traffic data value of each dimension in the preset time period as the abnormal traffic value of each dimension.
According to an embodiment of the data analysis device for multi-dimensional data of the present invention, the dimension screening unit 200 is further configured to: take the abnormal traffic value of a dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive sample set, and take the normal traffic value of the dimension combination as the weight of that dimension combination in the negative sample set; balance the positive and negative sample weights so that they are comparable in the initial state; calculate the information gain ratio of each dimension according to the balanced positive and negative sample weights, select the dimension with the largest information gain ratio for splitting, and construct the decision tree; and determine the paths of the constructed decision tree as suspected root-cause dimensions.
According to an embodiment of the data analysis device for multi-dimensional data of the present invention, balancing the positive and negative sample weights includes: taking the product of the abnormal traffic value of a dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive sample set, and taking the normal traffic value of the dimension combination as the weight of that dimension combination in the negative sample set, wherein the balance coefficient is the ratio of the sum of the normal traffic values of all dimensions of the multi-dimensional data to the sum of the abnormal traffic values of all dimensions.
Referring to Fig. 6, according to an embodiment of the data analysis device for multi-dimensional data of the present invention, the identification unit 400 is further configured to: input the calculated contribution and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, so as to classify whether the suspected root-cause dimension is a root-cause dimension.
For the functions of the modules in the device of the embodiment of the present invention, refer to the corresponding description of the above method; they are not repeated here.
In another aspect, an embodiment of the present invention provides a data analysis device for multi-dimensional data, including: one or more processors; and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement any one of the above data analysis methods for multi-dimensional data.
In one possible design, the structure of the data analysis device for multi-dimensional data includes a processor and a memory. The memory is configured to store a program that supports the data analysis device in executing the above data analysis method for multi-dimensional data, and the processor is configured to execute the program stored in the memory. The data analysis device for multi-dimensional data may further include a communication interface, used for communication between the data analysis device and other devices or a communication network.
Fig. 8 is a structural block diagram of a data analysis device for multi-dimensional data according to another embodiment of the present invention. As shown in Fig. 8, the device includes a memory 910 and a processor 920; the memory 910 stores a computer program that can run on the processor 920. When executing the computer program, the processor 920 implements the data analysis method for multi-dimensional data in the above embodiments. There may be one or more memories 910 and one or more processors 920.
The data analysis device for multi-dimensional data further includes:
a communication interface 930, configured to communicate with external devices for data exchange.
The memory 910 may include high-speed RAM, and may further include non-volatile memory, for example at least one disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, they may be connected to one another by a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is used in Fig. 8, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on one chip, they may communicate with one another through an internal interface.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the methods in the above embodiments.
The above technical solution has the following advantages or beneficial effects: when a fault occurs, the root-cause dimension can be analyzed quickly from the multi-dimensional data of the fault indicators, which saves operations and maintenance personnel time in locating the fault and reduces the loss caused by the fault.
In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, device or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, device or apparatus and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in combination with, an instruction execution system, device or apparatus. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any person skilled in the art can readily conceive of various changes or substitutions within the technical scope disclosed by the present invention, and these shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (18)

1. A data analysis method for multi-dimensional data, characterized by comprising:
obtaining the normal traffic value and the abnormal traffic value of each dimension in the dimension combinations of the multi-dimensional data;
inputting the dimension combinations of the multi-dimensional data and the normal traffic values and abnormal traffic values of the dimension combinations into a decision tree, and screening out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data using the decision tree;
calculating the contribution and sub-dimension loss-degree consistency of the suspected root-cause dimension; and
identifying whether the suspected root-cause dimension is a root-cause dimension according to the calculated contribution and sub-dimension loss-degree consistency of the suspected root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the traffic loss.
2. The method according to claim 1, characterized in that obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data comprises:
monitoring the total traffic of the multi-dimensional data; and
if a traffic loss is detected in the total traffic of the multi-dimensional data in a preset time period, obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data in the preset time period.
3. The method according to claim 2, characterized in that obtaining the normal traffic value and the abnormal traffic value of each dimension of the multi-dimensional data in the preset time period comprises:
determining the difference between the collected traffic data value of each dimension in the preset time period and the traffic data value of each dimension in a designated time period as the abnormal traffic value of each dimension.
4. The method according to claim 2, wherein obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period comprises:
Counting the number of failed accesses of each dimension within the preset time period, wherein an access for which no return information is received within the preset time period is treated as a failed access; and
Determining the number of failed accesses of each dimension as the abnormal flow value of that dimension.
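The counting step in claim 4 could look like the sketch below. The record layout (a dict with a `response` field that is `None` when no return information arrived) and the function name `count_failed_accesses` are assumptions for illustration only.

```python
from collections import Counter


def count_failed_accesses(access_log, dimension):
    """Count, per value of `dimension`, the accesses that got no response.

    access_log: iterable of dicts; a record whose "response" is None is
                treated as a failed access (hypothetical log format).
    """
    failures = Counter()
    for record in access_log:
        if record.get("response") is None:
            failures[record[dimension]] += 1
    return dict(failures)
```

The resulting counts are then used directly as the abnormal flow values of the corresponding dimension values.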
5. The method according to claim 2, wherein obtaining the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period comprises:
Predicting the flow data value of each dimension within the preset time period; and
Determining, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension in the preset time period and the predicted flow data value of that dimension in the preset time period.
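The prediction-based variant in claim 5 can be sketched as below. The claim does not name a prediction model, so a plain average of historical values is used here purely as a placeholder; `predict_flow` and `abnormal_by_prediction` are hypothetical names, and flooring the shortfall at zero is an assumption.

```python
def predict_flow(history):
    """Placeholder predictor: the mean of past observations."""
    return sum(history) / len(history)


def abnormal_by_prediction(observed, history_by_dim):
    """Return the abnormal flow value per dimension value.

    observed:       flow measured in the preset time period
    history_by_dim: past flow values used to predict that period
    """
    abnormal = {}
    for dim, value in observed.items():
        expected = predict_flow(history_by_dim[dim])
        # Abnormal flow is the shortfall of observed against predicted flow.
        abnormal[dim] = max(expected - value, 0.0)
    return abnormal
```

Any time-series forecaster could replace `predict_flow` without changing the rest of the computation.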
6. The method according to any one of claims 1-5, wherein screening out the suspected root-cause dimensions using the decision tree comprises:
Taking the abnormal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and taking the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative example set;
Balancing the positive and negative sample weights so that the positive and negative sample weights are comparable in the initial state;
Calculating the information gain ratio of each dimension from the balanced positive and negative sample weights, and selecting the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and
Determining the paths of the constructed decision tree as the suspected root-cause dimensions.
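The split-selection step of claim 6 can be illustrated with the standard weighted information gain ratio. This is a sketch of a single split only, not of the full tree construction or path extraction; the row layout and the names `entropy`, `gain_ratio`, and `best_dimension` are assumptions, and the weights are taken to be already balanced.

```python
import math


def entropy(pos, neg):
    """Binary entropy of a node holding `pos` positive and `neg` negative weight."""
    total = pos + neg
    h = 0.0
    for w in (pos, neg):
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def gain_ratio(rows, dim):
    """Weighted information gain ratio of splitting `rows` on `dim`.

    rows: list of (attrs, pos_weight, neg_weight), where attrs maps a
          dimension name to its value for one dimension combination.
    """
    total_pos = sum(pos for _, pos, _ in rows)
    total_neg = sum(neg for _, _, neg in rows)
    total = total_pos + total_neg
    if total == 0:
        return 0.0
    # Accumulate positive and negative weight per value of the dimension.
    groups = {}
    for attrs, pos, neg in rows:
        group = groups.setdefault(attrs[dim], [0.0, 0.0])
        group[0] += pos
        group[1] += neg
    conditional = 0.0
    split_info = 0.0
    for pos, neg in groups.values():
        frac = (pos + neg) / total
        if frac > 0:
            conditional += frac * entropy(pos, neg)
            split_info -= frac * math.log2(frac)
    if split_info == 0:
        return 0.0
    return (entropy(total_pos, total_neg) - conditional) / split_info


def best_dimension(rows, dims):
    """Pick the dimension with the largest information gain ratio."""
    return max(dims, key=lambda d: gain_ratio(rows, d))
```

Applying `best_dimension` recursively to each resulting subset would grow the tree, and each root-to-node path then names a candidate dimension combination.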
7. The method according to claim 6, wherein balancing the positive and negative sample weights comprises: taking the product of the abnormal flow value of each dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative example set, wherein the balance coefficient is the ratio of the sum of the normal flow values of the dimensions of the multi-dimensional data to the sum of the abnormal flow values of the dimensions.
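The balancing rule of claim 7 translates directly into code. The sketch below follows the claim's definition of the balance coefficient; the input layout and the function name `balance_weights` are assumptions.

```python
def balance_weights(flows):
    """Return {combination: (positive_weight, negative_weight)}.

    flows: {combination: (normal_flow, abnormal_flow)}
    The balance coefficient is sum(normal) / sum(abnormal), so that the
    total positive weight equals the total negative weight after scaling.
    """
    total_normal = sum(normal for normal, _ in flows.values())
    total_abnormal = sum(abnormal for _, abnormal in flows.values())
    if total_abnormal == 0:
        raise ValueError("no abnormal flow: nothing to balance")
    coefficient = total_normal / total_abnormal
    return {
        combo: (abnormal * coefficient, normal)
        for combo, (normal, abnormal) in flows.items()
    }
```

Without this scaling, the usually much larger normal flow would dominate the entropy calculation and hide the dimensions where the loss is concentrated.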
8. The method according to any one of claims 1-5, wherein identifying, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension comprises:
Inputting the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, so as to classify whether the suspected root-cause dimension is a root-cause dimension.
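The classification step of claim 8 takes two features per suspected root-cause dimension. The claim does not specify the classifier, so the two-threshold rule below is only a hypothetical stand-in for a trained model; the function name `is_root_cause` and both default thresholds are invented for illustration.

```python
def is_root_cause(contribution, consistency,
                  min_contribution=0.5, min_consistency=0.5):
    """Classify a suspected root-cause dimension from its two features.

    contribution: contribution degree of the suspected dimension
    consistency:  sub-dimension loss-degree consistency
    Both thresholds are hypothetical; a trained classifier would
    normally take the place of this rule.
    """
    return contribution >= min_contribution and consistency >= min_consistency
```

In practice the decision boundary would be learned from labeled incidents rather than fixed by hand.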
9. A data analysis device for multi-dimensional data, comprising:
A flow acquiring unit, configured to obtain the normal flow value and abnormal flow value of each dimension in the dimension combinations of the multi-dimensional data;
A dimension screening unit, configured to input the dimension combinations of the multi-dimensional data, together with the normal flow value and abnormal flow value of each dimension combination, into a decision tree, and to use the decision tree to screen out suspected root-cause dimensions from the dimension combinations of the multi-dimensional data;
A feature calculation unit, configured to calculate the contribution degree and the sub-dimension loss-degree consistency of each suspected root-cause dimension; and
A recognition unit, configured to identify, according to the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension, whether the suspected root-cause dimension is a root-cause dimension, wherein the root-cause dimension is the data dimension corresponding to the root cause of the flow loss.
10. The device according to claim 9, wherein the flow acquiring unit comprises:
A monitoring subunit, configured to monitor the total flow of the multi-dimensional data; and
An obtaining subunit, configured to: if a flow loss is detected in the total flow of the multi-dimensional data within a preset time period, obtain the normal flow value and abnormal flow value of each dimension of the multi-dimensional data within the preset time period.
11. The device according to claim 10, wherein the obtaining subunit is further configured to: determine, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension in the preset time period and the flow data value of that dimension in a specified time period.
12. The device according to claim 10, wherein the obtaining subunit is further configured to:
Count the number of failed accesses of each dimension within the preset time period, wherein an access for which no return information is received within the preset time period is treated as a failed access; and
Determine the number of failed accesses of each dimension as the abnormal flow value of that dimension.
13. The device according to claim 10, wherein the obtaining subunit is further configured to:
Predict the flow data value of each dimension within the preset time period; and
Determine, as the abnormal flow value of each dimension, the difference between the acquired flow data value of that dimension in the preset time period and the predicted flow data value of that dimension in the preset time period.
14. The device according to any one of claims 9-13, wherein the dimension screening unit is further configured to:
Take the abnormal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the positive example set, and take the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative example set;
Balance the positive and negative sample weights so that the positive and negative sample weights are comparable in the initial state;
Calculate the information gain ratio of each dimension from the balanced positive and negative sample weights, and select the dimension with the largest information gain ratio for splitting, thereby constructing the decision tree; and
Determine the paths of the constructed decision tree as the suspected root-cause dimensions.
15. The device according to claim 14, wherein balancing the positive and negative sample weights comprises: taking the product of the abnormal flow value of each dimension combination of the multi-dimensional data and a balance coefficient as the weight of that dimension combination in the positive example set, and taking the normal flow value of each dimension combination of the multi-dimensional data as the weight of that dimension combination in the negative example set, wherein the balance coefficient is the ratio of the sum of the normal flow values of the dimensions of the multi-dimensional data to the sum of the abnormal flow values of the dimensions.
16. The device according to any one of claims 9-13, wherein the recognition unit is further configured to: input the calculated contribution degree and sub-dimension loss-degree consistency of the suspected root-cause dimension into a classifier, so as to classify whether the suspected root-cause dimension is a root-cause dimension.
17. A data analysis device for multi-dimensional data, comprising:
One or more processors; and
A storage device, configured to store one or more programs;
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
18. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201810400910.7A 2018-04-28 2018-04-28 Data analysis method and device for multi-dimensional data and storage medium Active CN108683530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810400910.7A CN108683530B (en) 2018-04-28 2018-04-28 Data analysis method and device for multi-dimensional data and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810400910.7A CN108683530B (en) 2018-04-28 2018-04-28 Data analysis method and device for multi-dimensional data and storage medium

Publications (2)

Publication Number Publication Date
CN108683530A true CN108683530A (en) 2018-10-19
CN108683530B CN108683530B (en) 2021-06-01

Family

ID=63802628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810400910.7A Active CN108683530B (en) 2018-04-28 2018-04-28 Data analysis method and device for multi-dimensional data and storage medium

Country Status (1)

Country Link
CN (1) CN108683530B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3110198A2 (en) * 2015-06-22 2016-12-28 Accenture Global Services Limited Wi-fi access points performance management
CN106874574A (en) * 2017-01-22 2017-06-20 清华大学 Mobile solution performance bottleneck analysis method and device based on decision tree
CN107025154A (en) * 2016-01-29 2017-08-08 阿里巴巴集团控股有限公司 The failure prediction method and device of disk
CN107154880A (en) * 2016-03-03 2017-09-12 阿里巴巴集团控股有限公司 system monitoring method and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858821A (en) * 2019-02-14 2019-06-07 金瓜子科技发展(北京)有限公司 A kind of influence feature determines method, apparatus, equipment and medium
CN110009012A (en) * 2019-03-20 2019-07-12 阿里巴巴集团控股有限公司 A kind of risk specimen discerning method, apparatus and electronic equipment
CN110995524A (en) * 2019-10-28 2020-04-10 北京三快在线科技有限公司 Flow data monitoring method and device, electronic equipment and computer readable medium
CN110995524B (en) * 2019-10-28 2022-06-14 北京三快在线科技有限公司 Flow data monitoring method and device, electronic equipment and computer readable medium
CN111064614A (en) * 2019-12-17 2020-04-24 腾讯科技(深圳)有限公司 Fault root cause positioning method, device, equipment and storage medium
CN111314173B (en) * 2020-01-20 2022-04-08 腾讯科技(深圳)有限公司 Monitoring information abnormity positioning method and device, computer equipment and storage medium
CN111314173A (en) * 2020-01-20 2020-06-19 腾讯科技(深圳)有限公司 Monitoring information abnormity positioning method and device, computer equipment and storage medium
CN113220796A (en) * 2020-01-21 2021-08-06 北京达佳互联信息技术有限公司 Abnormal business index analysis method and device
CN111241128A (en) * 2020-01-21 2020-06-05 北京字节跳动网络技术有限公司 Data processing method and device and electronic equipment
CN113535444A (en) * 2020-04-14 2021-10-22 中国移动通信集团浙江有限公司 Transaction detection method, transaction detection device, computing equipment and computer storage medium
CN113535444B (en) * 2020-04-14 2023-11-03 中国移动通信集团浙江有限公司 Abnormal motion detection method, device, computing equipment and computer storage medium
CN111209179A (en) * 2020-04-23 2020-05-29 成都四方伟业软件股份有限公司 Method, device and system for collecting and analyzing system operation and maintenance data
CN112015995A (en) * 2020-09-29 2020-12-01 北京百度网讯科技有限公司 Data analysis method, device, equipment and storage medium
CN113746798A (en) * 2021-07-14 2021-12-03 清华大学 Cloud network shared resource abnormal root cause positioning method based on multi-dimensional analysis
CN114900835A (en) * 2022-04-20 2022-08-12 广州爱浦路网络技术有限公司 Malicious traffic intelligent detection method and device and storage medium
CN115578078A (en) * 2022-11-15 2023-01-06 云智慧(北京)科技有限公司 Data processing method, device and equipment of operation and maintenance system
CN116227995A (en) * 2023-02-06 2023-06-06 北京三维天地科技股份有限公司 Index analysis method and system based on machine learning
CN116227995B (en) * 2023-02-06 2023-09-12 北京三维天地科技股份有限公司 Index analysis method and system based on machine learning

Also Published As

Publication number Publication date
CN108683530B (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN108683530A (en) Data analysing method, device and the storage medium of multi-dimensional data
EP3743859A1 (en) Systems and methods for preparing data for use by machine learning algorithms
US20160055044A1 (en) Fault analysis method, fault analysis system, and storage medium
EP3418910A1 (en) Big data-based method and device for calculating relationship between development objects
CN112258093A (en) Risk level data processing method and device, storage medium and electronic equipment
US10616040B2 (en) Managing network alarms
CN109960839B (en) Service link discovery method and system of service support system based on machine learning
CN114170002A (en) Method and device for predicting access frequency
CN110674104B (en) Feature combination screening method, device, computer equipment and storage medium
CN110633304B (en) Combined feature screening method, device, computer equipment and storage medium
CN111783883A (en) Abnormal data detection method and device
CN113835947A (en) Method and system for determining abnormality reason based on abnormality identification result
CN111008871A (en) Real estate repurchase customer follow-up quantity calculation method, device and storage medium
CN110264306B (en) Big data-based product recommendation method, device, server and medium
CN110619406A (en) Method and device for determining business abnormity
CN112529428A (en) Method and equipment for evaluating operation efficiency of bank outlet equipment
US20210373987A1 (en) Reinforcement learning approach to root cause analysis
CN113762421A (en) Training method of classification model, traffic analysis method, device and equipment
CN113535522A (en) Abnormal condition detection method, device and equipment
CN108804640B (en) Data grouping method, device, storage medium and equipment based on maximized IV
CN113205363A (en) Service index monitoring method and device based on big data
CN112631892B (en) Method, computing device, and computer medium for predicting server health status
CN112395179A (en) Model training method, disk prediction method, device and electronic equipment
CN110008100A (en) Method and device for web page access amount abnormality detection
CN109726084A (en) The analysis method and device of the failure problems of data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant