CN110046179A - Method, apparatus, and device for mining alarm dimensions - Google Patents

Info

Publication number
CN110046179A
Authority
CN
China
Prior art keywords
business
service feature
node
information
sample
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811588986.3A
Other languages
Chinese (zh)
Other versions
CN110046179B (en)
Inventor
赵孝松
陈治
王少华
游永胜
张文洪
曹峻
庄里
周扬
霍扬扬
杨树波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201811588986.3A
Publication of CN110046179A
Application granted
Publication of CN110046179B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

Disclosed are a method, apparatus, and device for mining alarm dimensions. A decision tree with the best classification effect on the sample data is constructed, and the paths in that tree whose abnormal probability meets the actual alarm condition are identified, so that the selected alarm dimensions locally maximize the abnormal probability and can be used for classification. This avoids selecting alarm dimensions by human experience, improves the accuracy of alarm dimension configuration, and improves the efficiency of alarm dimension selection.

Description

Method, apparatus, and device for mining alarm dimensions
Technical field
The embodiments of this specification relate to the field of information technology, and in particular to a method, apparatus, and device for mining alarm dimensions.
Background
As businesses and platforms develop, so does the need to monitor them to guarantee normal operation. A common monitoring approach is to watch data indicators from different angles, that is, along different alarm dimensions, and to raise an alarm once an indicator is found to be abnormal.
At present, alarm dimensions are determined within operations and maintenance. Selecting dimensions by brute force is unacceptably inefficient. If instead a professional with rich business experience configures them whenever a new business scenario appears, accuracy depends on that person's experience and on changes in the scenario. The configured alarm dimensions often diverge from reality, are not accurate enough, and must be adjusted repeatedly according to the actual situation.
A more accurate scheme for mining alarm dimensions is therefore needed.
Summary of the invention
To address the inaccuracy of existing alarm dimension selection and to find more accurate alarm dimensions from business samples, the embodiments of this specification provide a method, apparatus, and device for mining alarm dimensions. The method comprises:
Obtaining a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple service features;
Constructing a decision tree to classify the business sample set, where each node in the decision tree represents a business sample set, the root node corresponds to the whole business sample set, and splitting is performed with the feature values of a service feature as edges; at each non-leaf node the split is the one that maximizes an information gain parameter, which includes information gain, information gain ratio, or Gini coefficient and indicates how much information is gained when the decision tree splits on the values of a service feature;
Determining a path from a leaf node to the root node in the decision tree that satisfies an alarm condition, the path characterizing a combination of service feature values;
Determining the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
In another aspect, the embodiments of this specification also provide a business data classification method based on the above alarm dimension, comprising:
Determining the service feature values of the business data;
If the service feature values of the business data contain the alarm dimension, determining that the business data is abnormal business data.
Correspondingly, the embodiments of this specification also provide an apparatus for mining alarm dimensions, comprising:
An obtaining module, which obtains a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple service features;
A construction module, which constructs a decision tree to classify the business sample set, where each node in the decision tree represents a business sample set, the root node corresponds to the whole business sample set, and splitting is performed with the feature values of a service feature as edges; at each non-leaf node the split is the one that maximizes an information gain parameter, which includes information gain, information gain ratio, or Gini coefficient and indicates how much information is gained when the decision tree splits on the values of a service feature;
A path determination module, which determines a path from a leaf node to the root node in the decision tree that satisfies an alarm condition, the path characterizing a combination of service feature values;
A dimension determination module, which determines the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
In another aspect, the embodiments of this specification also provide a business data classification apparatus based on the above alarm dimension, comprising:
A determining module, which determines the service feature values of the business data;
A judgment module, which determines that the business data is abnormal business data if the service feature values of the business data contain the alarm dimension.
By constructing a decision tree with the best classification effect on the sample data and finding in it the paths whose abnormal probability meets the actual alarm condition, the selected alarm dimensions locally maximize the abnormal probability and can be used for classification. This avoids selecting alarm dimensions by human experience, improves the accuracy of alarm dimension configuration, and improves the efficiency of alarm dimension selection.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the embodiments of this specification.
In addition, no single embodiment of this specification needs to achieve all of the above effects.
Brief description of the drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can obtain other drawings from them.
Fig. 1 is a flow diagram of a method for mining alarm dimensions provided by an embodiment of this specification;
Fig. 2 is a schematic diagram of path selection in a decision tree provided by an embodiment of this specification;
Fig. 3 is a flow diagram of obtaining alarm dimensions provided by an embodiment of this specification;
Fig. 4 is a schematic structural diagram of an apparatus for mining alarm dimensions provided by an embodiment of this specification;
Fig. 5 is a schematic structural diagram of a device configured to carry out the method of the embodiments of this specification.
Detailed description
So that those skilled in the art can better understand the technical solutions in the embodiments of this specification, those solutions are described in detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification shall fall within the scope of protection.
A business sample in this specification is a sample that contains multiple service features, for example a service instance. The interfaces of a system continuously receive processing requests initiated by callers, and the request input parameters and returned results vary. Across multiple service feature dimensions, the calls that share identical service feature values within a unit of time are aggregated, which yields the distinct and unique call patterns within that unit of time. These are the service instance samples.
The service features of a service instance include, but are not limited to, "interface, interface request parameters, interface return parameters, request magnitude, directed acyclic structure of the system's internal nodes, upstream and downstream systems called, deployment unit", and each service feature generally has multiple values. For example, for the service feature "age", the possible values may be "teenager", "youth", "middle age", and "old age". Put simply, a service instance is an abstraction of one class of similar business calls. The business system can judge by some means (for example, by comparison with an established library of normal instances) whether the state of an instance is normal or abnormal.
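The aggregation of calls into service instance samples described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`ts`, `interface`, `return_code`, `unit`), the 60-second unit of time, and the function name `aggregate_instances` are hypothetical and do not come from this specification.

```python
from collections import Counter

# Hypothetical raw call records; the field names are illustrative only.
calls = [
    {"ts": 3, "interface": "pay", "return_code": "0", "unit": "A"},
    {"ts": 10, "interface": "pay", "return_code": "0", "unit": "A"},
    {"ts": 15, "interface": "pay", "return_code": "1", "unit": "A"},
    {"ts": 65, "interface": "pay", "return_code": "0", "unit": "A"},
]

FEATURES = ("interface", "return_code", "unit")  # assumed service features
WINDOW_SECONDS = 60  # assumed unit of time


def aggregate_instances(records, features=FEATURES, window=WINDOW_SECONDS):
    """Count calls that share the same time window and the same feature values."""
    counts = Counter()
    for rec in records:
        bucket = rec["ts"] // window  # index of the unit-time window
        key = (bucket, tuple(rec[f] for f in features))
        counts[key] += 1
    return counts


instances = aggregate_instances(calls)
```

Each key of `instances` stands for one service instance sample: a unique combination of service feature values within one unit of time, together with the number of calls it aggregates.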
Detection can therefore be based on the service feature values contained in an instance. For example, a combination of several specific service feature values, such as "interface = 1, return parameter = 0, request magnitude = 3", is used as an alarm dimension; once a business sample contains this combination of service feature values, the business sample is determined to be abnormal. At present, determining alarm dimensions relies on configuration by professionals with rich business experience, and accuracy is affected by personal experience and by changes in the scenario. For this reason, the embodiments of this specification provide a method for mining alarm dimensions that automatically mines more accurate alarm dimensions.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings. As shown in Fig. 1, which is a flow diagram of a method for mining alarm dimensions provided by an embodiment of this specification, the process comprises the following steps:
S101: obtain a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple service features.
Business samples can be obtained by extracting a portion of the historical data from a recent period to serve as training samples. In the historical data, whether a business sample is normal or abnormal can be judged by comparison with a pre-established library of normal business samples.
S103: construct a decision tree to classify the business sample set.
The root node of the decision tree corresponds to the business sample set. Each business sample contains multiple service features, and each service feature may take multiple values. All business samples can therefore be classified by the values of some service feature, and once every business sample has been classified a decision tree is obtained. On the tree, this appears as splitting that starts from the root node and yields multiple child nodes; child nodes can be split further until a stopping condition is met, which yields leaf nodes. Each child node corresponds to a business sample set.
Because there are multiple service features, each split requires choosing one service feature to split on, so that the resulting decision tree classifies better. That is, the child nodes should be as pure as possible, where pure means that the labels of the business samples in a child node are as far as possible all normal or all abnormal.
There are many ways to choose the splitting service feature. A common one is to decide which service feature to split on based on an information gain parameter. For a business sample set to be classified, the information gain parameter is computed from the set before and after the split and indicates how much information is gained when the decision tree splits on the values of a service feature. The larger the information gain parameter, the faster purity rises when the business sample set is split on the current feature. Information gain parameters include information gain, information gain ratio, Gini coefficient, and so on.
For each business sample set to be classified (the set itself can also be regarded as its own subset), its information content can be expressed by some specific calculation. Information gain is the difference between the information content before and after classification. Information gain ratio can be taken as the ratio of the information gain to the information content before the split. The Gini coefficient characterizes the purity of a set: the smaller the Gini index, the lower the probability that a sample drawn from the set is misclassified, that is, the higher the purity of the business sample subsets after the split. The specific calculation of the information gain parameter can be user-defined.
S105: determine a path from a leaf node to the root node in the decision tree that satisfies an alarm condition, the path characterizing a combination of service feature values.
The constructed decision tree is the one with the best classification effect for the business sample set. A path in the decision tree characterizes a combination of service feature values and corresponds to an optimal classification strategy or rule. Alarm conditions can therefore be set according to actual needs. Specifically, the alarm condition may include, for example, a limit on the number of nodes in the path, a limit on the proportion of abnormal business samples in the nodes of the path, and a statistical limit on the abnormal ratio of each node in the path. Paths that classify abnormal business samples well can thus be filtered out of the decision tree. As shown in Fig. 2, which is a schematic diagram of path selection in a decision tree provided by an embodiment of this specification, the dotted part represents a path that satisfies the alarm condition.
S107: determine the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
Specifically, the combination of service feature values corresponding to the path is determined. For example, in Fig. 2 the selected alarm dimension is "E → C → D", and the corresponding combination of feature values is "D = D2" & "C = C2". In general, the alarm dimension does not need to include the direction of the nodes.
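Turning a leaf-to-root path into a combination of service feature values can be sketched as follows. This is a minimal illustration that assumes each node stores its parent and the (feature, value) pair on the edge leading to it; the `Node` class and `path_to_dimension` function are hypothetical names, and the tiny tree only mirrors the "E → C → D" example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # (service feature, value) on the edge from the parent to this node
    edge: Optional[Tuple[str, str]] = None


def path_to_dimension(leaf):
    """Walk from a leaf up to the root and collect the feature values on the edges."""
    dimension = {}
    node = leaf
    while node.parent is not None:
        feature, value = node.edge
        dimension[feature] = value
        node = node.parent
    return dimension


# Root D splits on feature D; its D2 branch leads to node C, whose C2 branch leads to leaf E.
root = Node("D")
mid = Node("C", parent=root, edge=("D", "D2"))
leaf = Node("E", parent=mid, edge=("C", "C2"))

alarm_dimension = path_to_dimension(leaf)
```

The result is stored as an unordered mapping, which reflects the remark that the alarm dimension does not need to carry the direction of the nodes.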
By constructing a decision tree with the best classification effect on the sample data and finding in it the paths whose abnormal probability meets the actual alarm condition, the selected alarm dimensions locally maximize the abnormal probability and can be used for classification. This avoids selecting alarm dimensions by human experience, improves the accuracy of alarm dimension configuration, and improves the efficiency of alarm dimension selection.
While constructing the decision tree, classification can proceed on the values of all service features until every service feature has been used. Alternatively, the splitting of a child node can be terminated early and a leaf node generated directly, which speeds up construction and improves the generalization ability of the decision tree.
For example, if the labels of the business samples in a node are all identical, the node is determined to be a leaf node. That is, when the business samples in a child node are all normal or all abnormal, the node does not need to be split further.
As another example, if the proportion of abnormal business samples in a node exceeds a threshold, the node is determined to be a leaf node. That is, when a node already shows clearly that the path containing it is likely to classify abnormal business samples well, the node is determined to be a leaf node.
In addition, the service features contained in different service instance samples may not be exactly the same. For example, a sample set contains instances A and B. Suppose instance A contains the service features (a, b, c, d) and instance B contains the service features (a, b, c, e). If service feature e is used as the splitting service feature, B can be assigned to a child node according to its value. Because sample A does not have service feature e, a leaf node can be generated that corresponds to the samples lacking service feature e, and A is assigned to it.
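The handling of samples that lack the splitting service feature can be sketched as follows. This is only one possible reading of the passage above: the `MISSING` marker and the function name `split_on_feature` are hypothetical, and the feature values are made up for the example.

```python
MISSING = "<missing>"  # hypothetical branch key for samples that lack the feature


def split_on_feature(samples, feature):
    """Group samples by their value of `feature`; samples without it share one branch."""
    groups = {}
    for sample in samples:
        key = sample.get(feature, MISSING)
        groups.setdefault(key, []).append(sample)
    return groups


# Instance A has features (a, b, c, d); instance B has features (a, b, c, e).
A = {"a": 1, "b": 2, "c": 3, "d": 4}
B = {"a": 1, "b": 2, "c": 3, "e": 5}

groups = split_on_feature([A, B], "e")
```

Here B goes to the child node for its value of e, while A lands in the `MISSING` group, which would become the leaf node for samples without service feature e.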
Maximizing the information gain parameter when constructing the decision tree can be implemented in various ways. One way is as follows: for the business sample set corresponding to any non-leaf node, determine the service features contained in the business samples of that set, each service feature having multiple values; calculate the first information content of the non-leaf node, and calculate the second information content of the nodes obtained after splitting the business sample set on any service feature; determine the information gain parameter of every service feature in the sample set from the first information content and the second information content; determine the service feature with the largest information gain parameter as the splitting service feature, and split the non-leaf node according to the values of the splitting service feature.
Specifically, the information content of the node to be split (the first information content) is determined first. For one of its service features, the node is split according to the values of that service feature, the information content after the split (the second information content) is calculated from the child nodes, and an information gain parameter is then determined. The service features contained in the node are traversed in this way, and the maximum among all information gain parameters is selected; the corresponding service feature classifies the node to be split best and can serve as the splitting service feature. For the child nodes obtained after the split, the splitting service feature is removed from the service features, so that a splitting service feature that has already been used is no longer considered when the child nodes are split again.
The information gain parameter can be calculated in various ways. For example, for a business sample set D, the information content is defined as:
$$\mathrm{info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
where $p_i$ is the probability that the i-th service feature occurs in the business sample set, which can be estimated as the number of business samples containing that service feature divided by the total number of business samples in the set. If the business sample set D is split on attribute A, the information after splitting D by A is:
$$\mathrm{info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$$
where $D_j$ is the j-th business sample subset, obtained from the j-th value of service feature A. The information gain is the difference between the two: $\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$. Each time the decision tree needs to split, the information gain of each service feature is calculated, and the service feature with the largest information gain is selected for the split.
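The information gain calculation can be sketched as follows. This is a minimal illustration, assuming that the probabilities are estimated from the normal/abnormal labels of the samples, as in the standard ID3 formulation; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def info(labels):
    """Information content (entropy) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_after_split(samples, labels, feature):
    """Weighted information content of the subsets obtained by splitting on `feature`."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    total = len(labels)
    return sum(len(g) / total * info(g) for g in groups.values())


def gain(samples, labels, feature):
    """Information gain: information before the split minus information after it."""
    return info(labels) - info_after_split(samples, labels, feature)


# Made-up samples: "interface" separates the labels perfectly, "unit" not at all.
samples = [
    {"interface": 1, "unit": "x"},
    {"interface": 1, "unit": "y"},
    {"interface": 2, "unit": "x"},
    {"interface": 2, "unit": "y"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]
best_feature = max(["interface", "unit"], key=lambda f: gain(samples, labels, f))
```

In this toy data the split on `interface` removes all uncertainty about the label, so it would be chosen as the splitting service feature.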
As another example, when the information gain ratio is used for classification, a split information content is defined as:
$$\mathrm{split\_info}(A) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$
and the gain ratio is defined as:
$$\mathrm{gain\_ratio}(A) = \frac{\mathrm{gain}(A)}{\mathrm{split\_info}(A)}$$
That is, the information gain of a service feature is first determined from the information content before the split (the first information content) and the information content after the split (the second information content), and the ratio of the information gain to the split information content is then determined as the information gain ratio. Some smoothing can also be applied to the information content after the split. For example, a smoothing term Ave(split_info(A)), representing the average information content of the child nodes, can be added to the denominator of the gain ratio, so that the denominator becomes split_info(A) + Ave(split_info(A)).
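The gain ratio with an optional smoothing term can be sketched as follows. This is an illustration under assumptions: the split information is computed from subset sizes as in the standard C4.5 formulation, the information gain is passed in as a number, and the `smoothing` argument merely stands in for the Ave(split_info(A)) term, whose exact calculation is not spelled out here.

```python
import math


def split_info(subset_sizes):
    """Split information of a partition, computed from the sizes of its subsets."""
    total = sum(subset_sizes)
    return -sum(n / total * math.log2(n / total) for n in subset_sizes if n)


def gain_ratio(gain_value, subset_sizes, smoothing=0.0):
    """Information gain divided by the (optionally smoothed) split information."""
    return gain_value / (split_info(subset_sizes) + smoothing)


plain = gain_ratio(1.0, [2, 2])
smoothed = gain_ratio(1.0, [2, 2], smoothing=1.0)
```

Without smoothing, a split that produces a single subset has zero split information and the ratio is undefined, which is one practical reason to add a positive smoothing term to the denominator.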
As another example, for a given business sample set D, suppose there are K service features and the number of samples for the k-th service feature is $C_k$. The Gini coefficient of sample set D is then:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left(\frac{|C_k|}{|D|}\right)^2$$
The split with the smallest Gini coefficient is then found among all possible splits, and the service feature corresponding to that split point is the best splitting service feature for sample set D.
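Choosing a split by the Gini coefficient can be sketched as follows. This is a minimal illustration, assuming that the counts are taken over the normal/abnormal labels and that the Gini coefficient of a split is the size-weighted average over its subsets, as in the standard CART formulation; the function names and sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a list of labels: 1 minus the sum of squared label shares."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_after_split(samples, labels, feature):
    """Size-weighted Gini coefficient of the subsets obtained by splitting on `feature`."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    total = len(labels)
    return sum(len(g) / total * gini(g) for g in groups.values())


samples = [
    {"interface": 1, "unit": "x"},
    {"interface": 1, "unit": "y"},
    {"interface": 2, "unit": "x"},
    {"interface": 2, "unit": "y"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]

# The best splitting feature is the one with the smallest Gini coefficient after the split.
best_feature = min(["interface", "unit"], key=lambda f: gini_after_split(samples, labels, f))
```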
In summary, after a business sample data set is input, the decision tree can be built as expressed by the following pseudocode:
1. Construct a root node N, where N corresponds to the business sample set;
2. If the labels of all business samples in the current node are identical, count the abnormal probability under the node and generate a leaf node;
3. If the service feature list is empty, count the abnormal probability under the node and generate a leaf node;
4. If the abnormal probability of the current node is greater than the threshold, count the abnormal probability under the node and generate a leaf node;
5. Select the service feature A with the largest information gain parameter from the candidate service feature set;
6. Split the samples according to each value j of A, remove the splitting service feature A from the service feature list, and denote the data of the j-th branch as A_j;
7. If A_j is empty, create a leaf node and count the abnormal probability under that node;
8. Otherwise, call recursively to obtain the subtree node N_j.
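The pseudocode above can be sketched in Python as follows. This is only an illustration under stated assumptions: information gain is used as the information gain parameter, every sample is assumed to carry every candidate feature, nodes are plain dictionaries, and the names `build_tree`, `ABNORMAL`, and the default threshold of 0.9 are hypothetical. Because branches are created only for values that actually occur, the empty-branch case of step 7 does not arise in this sketch.

```python
import math
from collections import Counter

ABNORMAL = "abnormal"  # assumed label value for abnormal samples


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after


def build_tree(samples, labels, features, threshold=0.9):
    # Every node records the abnormal probability of its sample set.
    abnormal_prob = labels.count(ABNORMAL) / len(labels)
    node = {"abnormal_prob": abnormal_prob, "size": len(labels), "children": {}}
    # Steps 2-4: identical labels, no features left, or abnormal probability above threshold.
    if len(set(labels)) == 1 or not features or abnormal_prob > threshold:
        return node
    # Step 5: pick the feature with the largest information gain.
    best = max(features, key=lambda f: information_gain(samples, labels, f))
    node["feature"] = best
    # Step 6: split on each observed value and drop the used feature.
    remaining = [f for f in features if f != best]
    branches = {}
    for sample, label in zip(samples, labels):
        branch = branches.setdefault(sample[best], ([], []))
        branch[0].append(sample)
        branch[1].append(label)
    # Step 8: recurse into each branch to obtain the subtree.
    for value, (sub_samples, sub_labels) in branches.items():
        node["children"][value] = build_tree(sub_samples, sub_labels, remaining, threshold)
    return node


samples = [
    {"interface": 1, "unit": "x"},
    {"interface": 1, "unit": "y"},
    {"interface": 2, "unit": "x"},
    {"interface": 2, "unit": "y"},
]
labels = ["abnormal", "abnormal", "normal", "normal"]
tree = build_tree(samples, labels, ["interface", "unit"])
```

A node without a `"feature"` key is a leaf; its `abnormal_prob` is the value later compared against the alarm condition.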
In this way, for an input business sample set, a decision tree with the best classification effect can be obtained. Consider mining alarm dimensions with the decision tree construction described above, and suppose the business sample set contains d samples with n service features. If alarm dimensions were instead mined by brute-force enumeration of alarm combinations, the complexity would be determined by the number of service feature value combinations to be enumerated. With the approach above, the time complexity is reduced to O(n·d·log(d)) ≈ O(N^3). It can be seen that, when alarm dimensions are selected automatically from large data samples, the scheme provided by the embodiments of this specification can greatly improve efficiency.
In one embodiment, the alarm condition can be chosen as follows: the abnormal sample ratio in the nodes of the path exceeds a ratio threshold; and, in the business sample sequence over multiple consecutive unit time intervals, the standard score of the abnormal sample ratio sequence exceeds a standard score threshold.
A path from a leaf node to the root node contains multiple nodes, and each node holds normal samples and abnormal samples. The ratio of the number of abnormal samples to the total number of samples over all nodes on the path is taken as the abnormal sample ratio of the nodes in the path. The threshold for the abnormal ratio can be preset.
The unit time interval can be set according to the actual situation, for example to 30 s or 60 s. When data is obtained in real time, the service instance samples containing the path in multiple consecutive unit time intervals can be counted, which yields a sample sequence containing the path and, in turn, the abnormal ratio sequence of that sample sequence. The abnormal ratio can be calculated as the ratio of the number of abnormal samples to the total number of samples. Suppose the abnormal ratio sequence under the alarm dimension is X = {X_1, X_2, ..., X_n}. The mean u and the standard deviation σ of the sequence can then be computed, and the standard score of the sequence can be calculated as Z = (Max(X_i) - u) / σ. Calculated this way, the standard score reflects how far the largest abnormal ratio in the sample sequence containing the path deviates from the normal abnormal ratio. If that degree exceeds a preset threshold, the business samples containing the path are considered to have deviated from normal conditions. In other words, the path can be used as an alarm dimension.
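The standard score Z = (Max(X_i) - u) / σ can be sketched as follows. This is a minimal illustration, assuming the population standard deviation is intended and that a sequence with zero deviation is given a score of 0; the function name and the sample sequence are hypothetical.

```python
import statistics


def standard_score(abnormal_ratios):
    """Z = (max(X) - mean(X)) / stdev(X) for a sequence of abnormal ratios."""
    u = statistics.mean(abnormal_ratios)
    sigma = statistics.pstdev(abnormal_ratios)  # population standard deviation (assumed)
    if sigma == 0:
        return 0.0  # all ratios are equal, so there is no deviation to report
    return (max(abnormal_ratios) - u) / sigma


# Abnormal ratio per unit time interval for samples that contain the path.
z = standard_score([0.0, 0.0, 0.0, 1.0])
```

The path would then be kept as an alarm dimension candidate only if `z` exceeds the chosen standard score threshold.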
In general, the alarm dimension does not need to change and can be used directly to raise alarms on unlabeled business samples. If the business data changes as the business develops and the alarm effect under the alarm dimension deteriorates, it is only necessary to collect another batch of current business samples in time and mine another batch of alarm dimensions from them as a replacement.
Further, when the standard score is computed, if the path contains too few nodes, the alarm dimension tends to correspond to too much data and cannot accurately reflect abnormal business samples. Another condition can therefore be added to the alarm condition used to screen alarm dimensions: the number of nodes in the path exceeds a quantity threshold. After the decision tree has been screened by the alarm condition, the combination of service feature values corresponding to each qualifying path can be taken as an alarm dimension. As shown in Fig. 3, which is a flow diagram of obtaining alarm dimensions provided by an embodiment of this specification.
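Checking a path against the three alarm conditions can be sketched as follows. This is only an illustration under assumptions: each node on the path is represented by its normal and abnormal sample counts, the standard score is supplied by the caller, and the `AlertCondition` class, its field names, and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertCondition:
    ratio_threshold: float  # minimum abnormal sample ratio over the path
    score_threshold: float  # minimum standard score of the abnormal ratio sequence
    min_nodes: int          # the path must contain more nodes than this


def path_abnormal_ratio(path_nodes):
    """Abnormal samples divided by all samples, counted over every node on the path."""
    abnormal = sum(n["abnormal"] for n in path_nodes)
    total = sum(n["abnormal"] + n["normal"] for n in path_nodes)
    return abnormal / total


def satisfies(path_nodes, standard_score, cond):
    """True only if the path meets all three alarm conditions."""
    return (
        len(path_nodes) > cond.min_nodes
        and path_abnormal_ratio(path_nodes) > cond.ratio_threshold
        and standard_score > cond.score_threshold
    )


cond = AlertCondition(ratio_threshold=0.6, score_threshold=1.5, min_nodes=2)
path = [
    {"abnormal": 5, "normal": 5},  # root
    {"abnormal": 4, "normal": 1},
    {"abnormal": 4, "normal": 0},  # leaf
]
selected = satisfies(path, standard_score=2.0, cond=cond)
```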
In another aspect, after the alarm dimensions have been obtained, the embodiments of this specification can also classify data based on the alarm dimensions obtained by the above scheme, which specifically includes: determining the service feature values of the business data to be classified; and, if the service feature values of the business data contain the alarm dimension, determining that the business data is abnormal business data.
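Classification by alarm dimension can be sketched as follows. This is a minimal illustration, assuming business data and alarm dimensions are both represented as mappings from service feature to value; the function names are hypothetical, and returning "normal" when no alarm dimension matches is an assumption, since the text above only specifies the abnormal case.

```python
def matches(business_data, alarm_dimension):
    """True if the data contains every feature value of the alarm dimension."""
    return all(business_data.get(f) == v for f, v in alarm_dimension.items())


def classify(business_data, alarm_dimensions):
    """Label the data abnormal if it contains any of the mined alarm dimensions."""
    if any(matches(business_data, d) for d in alarm_dimensions):
        return "abnormal"
    return "normal"  # assumed default when no alarm dimension matches


alarm_dimensions = [{"interface": 1, "return_parameter": 0, "request_magnitude": 3}]
hit = classify(
    {"interface": 1, "return_parameter": 0, "request_magnitude": 3, "unit": "A"},
    alarm_dimensions,
)
miss = classify(
    {"interface": 1, "return_parameter": 1, "request_magnitude": 3, "unit": "A"},
    alarm_dimensions,
)
```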
Correspondingly, the embodiments of this specification also provide an apparatus for mining alarm dimensions. As shown in Fig. 4, which is a schematic structural diagram of an apparatus for mining alarm dimensions provided by an embodiment of this specification, the apparatus comprises:
An obtaining module 401, which obtains a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple service features;
A construction module 403, which constructs a decision tree to classify the business sample set, where each node in the decision tree represents a business sample set, the root node corresponds to the whole business sample set, and splitting is performed with the feature values of a service feature as edges; at each non-leaf node the split is the one that maximizes an information gain parameter, which includes information gain, information gain ratio, or Gini coefficient and indicates how much information is gained when the decision tree splits on the values of a service feature;
A path determination module 405, which determines a path from a leaf node to the root node in the decision tree that satisfies an alarm condition, the path characterizing a combination of service feature values;
A dimension determination module 407, which determines the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
Further, the construction module 403 determines that a node is a leaf node if the labels of the business samples in the node are all identical; or determines that a node is a leaf node if the proportion of abnormal business samples in the node exceeds a threshold; or, when splitting with the feature values of a business feature as edges, if the business sample set contains business samples that do not include that business feature, generates a leaf node corresponding to the business samples that do not include the business feature.
Further, the construction module 403, for the business sample subset corresponding to any non-leaf node, determines the business features contained in the business samples of the subset, each business feature having multiple values; calculates a first information amount of the non-leaf node, and calculates a second information amount of the nodes obtained after splitting the business sample subset by any business feature; determines, from the first information amount and the second information amount, the information gain parameter of every business feature in the sample subset; and determines the business feature with the largest information gain parameter as the splitting business feature, and splits the non-leaf node according to the values of the splitting business feature.
Further, the construction module 403 determines the difference between the first information amount and the second information amount as the information gain of the business feature; or determines the information gain of the business feature from the difference between the first information amount and the second information amount, determines the split information amount contained in the nodes after splitting, and determines the ratio of the information gain to the split information amount as the information gain ratio.
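The calculation described above can be sketched as follows, under the assumption that the first information amount is the Shannon entropy of the labels in the node, the second information amount is the size-weighted entropy of the child nodes, and the split information amount is the entropy of the feature-value distribution (as in the well-known ID3 and C4.5 algorithms). The function names and sample data are hypothetical and given for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def split_by(samples, feature):
    """Group sample labels by the value each sample takes for `feature`."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[features[feature]].append(label)
    return groups


def information_gain(samples, feature):
    """First information amount minus second information amount."""
    first = entropy([label for _, label in samples])
    groups = split_by(samples, feature)
    second = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return first - second


def gain_ratio(samples, feature):
    """Information gain divided by the split information amount."""
    split_info = entropy([features[feature] for features, _ in samples])
    return information_gain(samples, feature) / split_info if split_info > 0 else 0.0


def best_split_feature(samples, features, score=information_gain):
    """Pick the feature whose information gain parameter is largest."""
    return max(features, key=lambda f: score(samples, f))


# Labels: 1 = abnormal, 0 = normal.
samples = [
    ({"channel": "app", "city": "A"}, 1),
    ({"channel": "app", "city": "B"}, 1),
    ({"channel": "web", "city": "A"}, 0),
    ({"channel": "web", "city": "B"}, 0),
]
print(best_split_feature(samples, ["channel", "city"]))
```

Here splitting on `channel` separates the abnormal samples from the normal ones completely, while splitting on `city` leaves both child nodes evenly mixed, so `channel` would be chosen as the splitting business feature.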
Further, the alarm condition in the apparatus includes: the proportion of abnormal samples in the nodes on the path exceeds a proportion threshold; and, in the business sample sequence over multiple consecutive unit time periods, the standard score of the abnormal-sample proportion sequence exceeds a standard score threshold.
Further, the alarm condition in the apparatus also includes that the number of nodes on the path exceeds a quantity threshold.
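One possible reading of this alarm condition is sketched below. It assumes the abnormal-sample proportion is recorded once per unit time period and that the standard score compares the latest proportion with the mean and population standard deviation of the earlier periods. The specification does not fix how the standard score is computed, so this reading, the function names, and the threshold values are all assumptions.

```python
from statistics import mean, pstdev


def standard_score(history, current):
    """Z-score of `current` relative to earlier observations."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # no variation in history, so no score can be formed
    return (current - mean(history)) / sigma


def meets_alarm_condition(ratios, ratio_threshold, z_threshold,
                          path_nodes=None, count_threshold=None):
    """`ratios` holds the abnormal-sample proportion per unit time
    period, oldest first. The optional node-count check applies only
    when both `path_nodes` and `count_threshold` are given."""
    if len(ratios) < 3:
        return False  # too short a sequence to judge
    *history, current = ratios
    if current <= ratio_threshold:
        return False
    if standard_score(history, current) <= z_threshold:
        return False
    if count_threshold is not None and path_nodes is not None:
        return path_nodes > count_threshold
    return True


spike = [0.10, 0.12, 0.11, 0.09, 0.60]
flat = [0.60, 0.60, 0.60, 0.60, 0.60]
print(meets_alarm_condition(spike, ratio_threshold=0.5, z_threshold=3.0))
print(meets_alarm_condition(flat, ratio_threshold=0.5, z_threshold=3.0))
```

Combining the two checks means a path whose abnormal proportion is persistently high but stable does not raise an alarm under this reading; only a proportion that is both high and a sharp departure from recent periods does.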
In another aspect, an embodiment of this specification also provides an apparatus for classifying business data based on the above alarm dimension, comprising:
a determination module, which determines the business feature values of the business data;
a judgment module, which determines that the business data is abnormal business data if the business feature values of the business data include the alarm dimension.
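A minimal sketch of the judgment step, assuming a piece of business data is represented as a mapping from business feature to value and that an alarm dimension is "included" when every feature-value pair of the dimension appears in the data. The function name and the feature names are hypothetical.

```python
def is_abnormal(record, alarm_dimensions):
    """Return True if the record contains every feature-value pair of
    at least one alarm dimension."""
    return any(
        all(feature in record and record[feature] == value
            for feature, value in dimension.items())
        for dimension in alarm_dimensions
    )


# Alarm dimensions as mined from the decision tree (illustrative values).
dimensions = [{"channel": "app", "city": "A"}]

print(is_abnormal({"channel": "app", "city": "A", "amount": "high"}, dimensions))
print(is_abnormal({"channel": "app", "city": "B", "amount": "high"}, dimensions))
```

Features of the business data that are not part of an alarm dimension, such as `amount` here, do not affect the match.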
An embodiment of this specification also provides a computer device comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the alarm-dimension mining method shown in Fig. 1 when executing the program.
Fig. 5 shows a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively connected to one another inside the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 1020 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 connects to an input/output module to enable information input and output. The input/output module may be configured as a component within the device (not shown in the figure) or may be externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, and indicator lights.
The communication interface 1040 connects to a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or network cable) or wirelessly (for example, mobile network, Wi-Fi, or Bluetooth).
The bus 1050 includes a path that transfers information between the components of the device (for example, the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although only the processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may also contain only the components necessary to implement the solutions of the embodiments of this specification, without containing all the components shown in the figure.
An embodiment of this specification also provides a computer-readable storage medium on which a computer program is stored, where the program implements the alarm-dimension mining method shown in Fig. 1 when executed by a processor.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
The systems, methods, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail transceiver, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for the parts that are the same or similar between embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts may be understood by referring to the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the embodiments of this specification, and these improvements and modifications shall also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (15)

1. A method for mining alarm dimensions, comprising:
obtaining a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple business features;
building a decision tree to classify the business sample set, where each node in the decision tree represents a set of business samples, the root node corresponds to the full business sample set, the tree is split using the feature values of business features as edges, and the information gain parameter is maximized at each split of a non-leaf node, the information gain parameter including the information gain or the information gain ratio and indicating the degree of information increase when the decision tree is split on the values of a business feature;
determining the paths from a leaf node to the root node in the decision tree that satisfy an alarm condition, where each path characterizes one combination of business feature values;
determining the combination of business feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
2. The method of claim 1, wherein building a decision tree to classify the business sample set comprises:
if the labels of the business samples in a node are all identical, determining that the node is a leaf node; or
if the proportion of abnormal business samples in a node exceeds a threshold, determining that the node is a leaf node; or
when splitting with the feature values of a business feature as edges, if the business sample set contains business samples that do not include that business feature, generating a leaf node corresponding to the business samples that do not include the business feature.
3. The method of claim 1, wherein building a decision tree to classify the business sample set, with the information gain parameter maximized at each split of a non-leaf node, comprises:
for the business sample subset corresponding to any non-leaf node, determining the business features contained in the business samples of the subset, each business feature having multiple values;
calculating a first information amount of the non-leaf node, and calculating a second information amount of the nodes obtained after splitting the business sample subset by any business feature;
determining, from the first information amount and the second information amount, the information gain parameter of every business feature in the sample subset;
determining the business feature with the largest information gain parameter as the splitting business feature, and splitting the non-leaf node according to the values of the splitting business feature.
4. The method of claim 3, wherein determining, from the first information amount and the second information amount, the information gain parameter of every business feature in the sample subset comprises:
determining the difference between the first information amount and the second information amount as the information gain of the business feature; or
determining the information gain of the business feature from the difference between the first information amount and the second information amount, determining the split information amount contained in the nodes after splitting, and determining the ratio of the information gain to the split information amount as the information gain ratio.
5. The method of claim 1, wherein the alarm condition comprises:
the proportion of abnormal samples in the nodes on the path exceeds a proportion threshold; and
in the business sample sequence over multiple consecutive unit time periods, the standard score of the abnormal-sample proportion sequence exceeds a standard score threshold.
6. The method of claim 5, wherein the alarm condition further comprises: the number of nodes on the path exceeds a quantity threshold.
7. A method for classifying business data based on the alarm dimension of any one of claims 1 to 6, comprising:
determining the business feature values of the business data;
if the business feature values of the business data include the alarm dimension, determining that the business data is abnormal business data.
8. An apparatus for mining alarm dimensions, comprising:
an acquisition module, which obtains a business sample set containing normal business samples and abnormal business samples, where each business sample contains feature values of multiple business features;
a construction module, which builds a decision tree to classify the business sample set, where each node in the decision tree represents a set of business samples, the root node corresponds to the full business sample set, the tree is split using the feature values of business features as edges, and the information gain parameter is maximized at each split of a non-leaf node, the information gain parameter including the information gain, the information gain ratio, or the Gini coefficient and indicating the degree of information increase when the decision tree is split on the values of a business feature;
a path determination module, which determines the paths from a leaf node to the root node in the decision tree that satisfy an alarm condition, where each path characterizes one combination of business feature values;
a dimension determination module, which determines the combination of business feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
9. The apparatus of claim 8, wherein the construction module determines that a node is a leaf node if the labels of the business samples in the node are all identical; or determines that a node is a leaf node if the proportion of abnormal business samples in the node exceeds a threshold; or, when splitting with the feature values of a business feature as edges, if the business sample set contains business samples that do not include that business feature, generates a leaf node corresponding to the business samples that do not include the business feature.
10. The apparatus of claim 8, wherein the construction module, for the business sample subset corresponding to any non-leaf node, determines the business features contained in the business samples of the subset, each business feature having multiple values; calculates a first information amount of the non-leaf node, and calculates a second information amount of the nodes obtained after splitting the business sample subset by any business feature; determines, from the first information amount and the second information amount, the information gain parameter of every business feature in the sample subset; and determines the business feature with the largest information gain parameter as the splitting business feature, and splits the non-leaf node according to the values of the splitting business feature.
11. The apparatus of claim 10, wherein the construction module determines the difference between the first information amount and the second information amount as the information gain of the business feature; or determines the information gain of the business feature from the difference between the first information amount and the second information amount, determines the split information amount contained in the nodes after splitting, and determines the ratio of the information gain to the split information amount as the information gain ratio.
12. The apparatus of claim 8, wherein the alarm condition comprises: the proportion of abnormal samples in the nodes on the path exceeds a proportion threshold; and, in the business sample sequence over multiple consecutive unit time periods, the standard score of the abnormal-sample proportion sequence exceeds a standard score threshold.
13. The apparatus of claim 12, wherein the alarm condition further comprises: the number of nodes on the path exceeds a quantity threshold.
14. An apparatus for classifying business data based on the alarm dimension of any one of claims 8 to 12, comprising:
a determination module, which determines the business feature values of the business data;
a judgment module, which determines that the business data is abnormal business data if the business feature values of the business data include the alarm dimension.
15. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of any one of claims 1 to 7 when executing the program.
CN201811588986.3A 2018-12-25 2018-12-25 Mining method, device and equipment for alarm dimension Active CN110046179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811588986.3A CN110046179B (en) 2018-12-25 2018-12-25 Mining method, device and equipment for alarm dimension

Publications (2)

Publication Number Publication Date
CN110046179A true CN110046179A (en) 2019-07-23
CN110046179B CN110046179B (en) 2023-09-08

Family

ID=67274017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811588986.3A Active CN110046179B (en) 2018-12-25 2018-12-25 Mining method, device and equipment for alarm dimension

Country Status (1)

Country Link
CN (1) CN110046179B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7197504B1 (en) * 1999-04-23 2007-03-27 Oracle International Corporation System and method for generating decision trees
CN108629413A (en) * 2017-03-15 2018-10-09 阿里巴巴集团控股有限公司 Neural network model training, trading activity Risk Identification Method and device
CN108733966A (en) * 2017-04-14 2018-11-02 国网重庆市电力公司 A kind of multidimensional electric energy meter field thermodynamic state verification method based on decision woodlot
CN107526666A (en) * 2017-07-17 2017-12-29 阿里巴巴集团控股有限公司 Alarm method, system, device and electronic equipment based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102920A (en) * 2020-07-30 2020-12-18 苏州因顿医学检验实验室有限公司 Alcohol consumption prediction system based on gene screening
CN112102920B (en) * 2020-07-30 2023-11-10 苏州因顿医学检验实验室有限公司 Drinking volume prediction system based on gene screening
CN112948608A (en) * 2021-02-01 2021-06-11 北京百度网讯科技有限公司 Picture searching method and device, electronic equipment and computer readable storage medium
CN112948608B (en) * 2021-02-01 2023-08-22 北京百度网讯科技有限公司 Picture searching method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110046179B (en) 2023-09-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant