Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of this specification, the technical solutions are described in detail below with reference to the accompanying drawings of the embodiments. Evidently, the described embodiments are only some, rather than all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification shall fall within the scope of protection.
A business sample in this specification is a sample comprising multiple service features, for example a service instance. The interfaces of a system continuously receive processing requests initiated by callers, and the input parameters and returned results of these requests differ. Across multiple service feature dimensions, calls that share the same service feature values within a unit time are aggregated, which yields the distinct and unique call forms within that unit time; these serve as service instance samples.
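The aggregation described above can be sketched as follows. This is a minimal illustration only: the record layout (a `ts` timestamp in seconds plus one key per service feature), the function name `aggregate_calls`, and the 60-second window are assumptions made for the example and are not specified in this specification.

```python
from collections import Counter

def aggregate_calls(calls, feature_names, unit_seconds=60):
    """Group raw calls by (time window, service feature values) and count them.

    Each distinct key is one service instance sample for that unit time.
    """
    samples = Counter()
    for call in calls:
        window = int(call["ts"] // unit_seconds)           # index of the unit time window
        values = tuple(call[name] for name in feature_names)
        samples[(window, values)] += 1
    return samples

# Hypothetical calls: timestamps in seconds, two service features.
calls = [
    {"ts": 1, "interface": "pay", "result": "ok"},
    {"ts": 20, "interface": "pay", "result": "ok"},
    {"ts": 45, "interface": "pay", "result": "fail"},
    {"ts": 70, "interface": "pay", "result": "ok"},
]
samples = aggregate_calls(calls, ["interface", "result"])
```

The first two calls fall in the same window with identical feature values, so they collapse into a single sample with a count of 2.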
The service features of a service instance include, but are not limited to, the interface, the interface request parameters, the interface return parameters, the request magnitude, the directed acyclic structure of the system's internal nodes, the upstream and downstream systems of the call, and the deployment unit. Each service feature generally has multiple values. For example, the service feature "age" may take the values "teenager", "youth", "middle age" and "old age". In short, a service instance is an abstraction of similar business calls. A business system can judge whether the state of an instance is normal or abnormal by some means (for example, by establishing a library of normal instances and comparing against it).
Detection can therefore be based on the service feature values contained in an instance. For example, a combination of several specific service feature values is used as an alarm dimension, such as "interface = 1, return parameter = 0, request magnitude = 3"; once a business sample contains this combination of service feature values, the business sample is determined to be abnormal. At present, determining alarm dimensions relies on business experience: they are configured by experienced professionals, and their accuracy is affected by personnel experience and by changes in the scenario. On this basis, the embodiments of this specification provide a method for mining alarm dimensions that automatically mines more accurate alarm dimensions.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the accompanying drawings. Fig. 1 is a schematic flowchart of a method for mining alarm dimensions provided by an embodiment of this specification. As shown in Fig. 1, the process comprises the following steps:
S101: obtain a business sample set comprising normal business samples and abnormal business samples, where each business sample contains the feature values of multiple service features.
Business samples may be obtained by extracting a portion of the historical data from a recent period of time to serve as training samples. Within the historical data, whether a business sample is normal or abnormal can be judged by comparison against a pre-established library of normal business samples.
S103: construct a decision tree to classify the business sample set.
The root node of the decision tree corresponds to the business sample set. Each business sample contains multiple service features, and each service feature may have multiple values. All business samples can therefore be classified by the value of some service feature, until every business sample has been classified and a decision tree is obtained. On the decision tree, this appears as splitting that starts from the root node and yields multiple child nodes; a child node can be split further until some stopping condition is met and a leaf node is obtained. Each child node corresponds to a business sample set.
Because there are multiple service features, each split of the decision tree requires selecting one service feature to split on, so that the resulting decision tree classifies well. That is, the classes within each child node should be as pure as possible, where "pure" means that the labels of the business samples in a child node are, as far as possible, all normal or all abnormal.
There are many methods for selecting the splitting service feature. A common method is to decide which service feature to split on based on an information gain parameter. For a business sample set to be classified, the information gain parameter compares the set before and after a split and indicates the degree of information increase when the decision tree splits on the values of a service feature. The larger the information gain parameter, the faster the purity rises if the business sample set is split on the current feature. The information gain parameter includes the information gain, the information gain ratio, the Gini coefficient, and the like.
For each business sample set to be classified (the business sample set itself can also be regarded as a subset of itself), its information content can be expressed by some specific calculation. The information gain is the difference between the information content before and after classification. The information gain ratio may be taken as the ratio of the information gain to the information content before the split. The Gini coefficient characterizes the purity of a set: the smaller the Gini index, the lower the probability that a sample selected from the set is misclassified, that is, the higher the purity of the subsets of business samples after the split. The specific calculation of the information gain parameter can be user-defined.
S105: determine a path, from a leaf node to the root node of the decision tree, that meets the alarm condition, where the path characterizes a combination of service feature values.
The constructed decision tree is the decision tree with the best classification effect for the business sample set. A path in the decision tree characterizes a combination of service feature values and corresponds to an optimal classification strategy or rule. Alarm conditions can therefore be set according to actual needs. Specifically, an alarm condition may include, for example, a limit on the number of nodes in the path, a limit on the proportion of abnormal business samples in the nodes of the path, and a statistical limit on the abnormal proportion of each node in the path. Paths that classify abnormal business samples well can thus be screened out of the decision tree. Fig. 2 is a schematic diagram of path selection in a decision tree provided by an embodiment of this specification; the dashed part represents a path that meets the alarm condition.
S107: determine the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
Specifically, the combination of service feature values corresponding to the path is determined. For example, the alarm dimension selected in Fig. 2 is "E → C → D", and the corresponding combination of feature values is "D = D2" & "C = C2". In general, an alarm dimension does not need to include the direction of the nodes in it.
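To illustrate how a root-to-leaf path maps to a combination of service feature values, the following minimal sketch walks a tree and collects the (feature, value) pairs along each path. The nested-dictionary tree layout, the function name `leaf_paths`, and the toy tree (loosely modelled on the "D = D2" & "C = C2" example, since Fig. 2 itself is not reproduced here) are assumptions made for illustration.

```python
def leaf_paths(node, conditions=()):
    """Yield (combination, leaf) for every leaf under `node`.

    `combination` maps each splitting service feature on the path to the
    value taken on that path; the order of the nodes is not kept.
    """
    children = node.get("children", {})
    if not children:
        yield dict(conditions), node
        return
    for value, child in children.items():
        yield from leaf_paths(child, conditions + ((node["feature"], value),))

# Hypothetical tree: the root splits on D, and the D2 branch splits on C.
tree = {
    "feature": "D",
    "children": {
        "D1": {"abnormal_ratio": 0.0},
        "D2": {
            "feature": "C",
            "children": {
                "C1": {"abnormal_ratio": 0.1},
                "C2": {"abnormal_ratio": 0.95},
            },
        },
    },
}
combinations = [combo for combo, _ in leaf_paths(tree)]
```

Each combination is a plain mapping, which reflects the remark that an alarm dimension need not carry the direction of the nodes.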
In general, a decision tree with the best classification effect is constructed on the sample data, and paths whose abnormal rate meets the actual alarm condition are found in that decision tree, so that the chosen alarm dimension locally maximizes the abnormal probability and can be used for classification. This avoids selecting alarm dimensions by manual experience, improves the accuracy of alarm dimension configuration, and improves the efficiency of alarm dimension selection.
When constructing the decision tree, classification can proceed on the values of all service features until all service features are used up. Alternatively, some measures can terminate the splitting of a child node early and directly generate a leaf node, which speeds up construction and improves the generalization ability of the decision tree.
For example, if the labels of the business samples in a node are identical, the node is determined to be a leaf node. That is, when the business samples in a child node are all normal or all abnormal, the node need not be split further.
As another example, if the proportion of abnormal business samples in a node exceeds a threshold, the node is determined to be a leaf node. That is, when a node already shows clearly that the path containing it may classify abnormal business samples well, the node is determined to be a leaf node.
In addition, the following situation may arise: the service features contained in the service instance samples are not exactly the same. For example, a sample set contains instances A and B. Suppose instance A contains the service features (a, b, c, d) and instance B contains the service features (a, b, c, e). If service feature e is used as the splitting service feature, B can be assigned to the next child node according to its value. Because sample A does not have service feature e, a leaf node corresponding to the samples lacking service feature e can be generated, and A is assigned to it.
When constructing the decision tree, maximizing the information gain parameter can be achieved in several ways. One way is as follows:
For the business sample set corresponding to any non-leaf node, determine the service features contained in the business samples of that set, each service feature having multiple values; calculate the first information content of the non-leaf node, and calculate the second information content of the nodes obtained after splitting the business sample set on any service feature; determine the information gain parameters of all service features in the sample set from the first information content and the second information content; and determine the service feature with the largest information gain parameter as the splitting service feature, splitting the non-leaf node according to the values of the splitting service feature.
Specifically, the information content of the node to be split (the first information content) is determined first. For one of its service features, the node is split according to the values of that service feature, the information content after the split (the second information content) is calculated from the child nodes, and an information gain parameter is then determined. The service features contained in the node are traversed in this way, and the maximum is selected from all the information gain parameters; the corresponding service feature has the best classification effect on the node to be split and can be used as the splitting service feature. For the child nodes obtained after the split, the splitting service feature is removed from the service features, so that a splitting service feature that has already been used is not considered again when a child node is split.
The information gain parameter can be calculated in various ways. For example, for a business sample set D, the information content is defined as:

$$\mathrm{info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where $p_i$ denotes the probability that the i-th service feature occurs in the business sample set, which can be estimated as the number of business samples containing the service feature divided by the total number of business samples in the set. If the business sample set D is split on attribute A, the information content after splitting D on A is:

$$\mathrm{info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$$

where $D_j$ is the j-th of the $v$ business sample subsets obtained from the $v$ values of service feature A. The information gain is the difference between the two: $\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$. Each time the decision tree needs to split, the information gain of each service feature is calculated, and the service feature with the largest information gain is selected for the split.
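As a worked illustration of the information gain calculation, the sketch below computes info(D), the information content after a split, and gain(A) on a toy sample set. It assumes, for the purpose of the example, that each sample is a pair of a feature dictionary and a label, and that the probabilities p_i are estimated from the share of each label ("normal" or "abnormal") in the set; the function names are hypothetical.

```python
import math
from collections import Counter

def info(labels):
    """Information content (entropy, in bits) of a list of labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def info_after_split(samples, feature):
    """Weighted information content of the subsets obtained by splitting on `feature`."""
    groups = {}
    for features, label in samples:
        groups.setdefault(features[feature], []).append(label)
    return sum(len(g) / len(samples) * info(g) for g in groups.values())

def gain(samples, feature):
    """gain(A) = info(D) - info_A(D)."""
    return info([label for _, label in samples]) - info_after_split(samples, feature)

# Toy data: feature "a" separates the labels perfectly, feature "b" not at all.
samples = [
    ({"a": 1, "b": 0}, "abnormal"),
    ({"a": 1, "b": 1}, "abnormal"),
    ({"a": 0, "b": 0}, "normal"),
    ({"a": 0, "b": 1}, "normal"),
]
best = max(["a", "b"], key=lambda f: gain(samples, f))
```

On this data, splitting on "a" removes all uncertainty, so "a" is the feature selected for the split.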
As another example, when classification uses the information gain ratio, the information content is defined in a different way, as the split information content:

$$\mathrm{split\_info}(A) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$

The gain ratio is then defined as:

$$\mathrm{gain\_ratio}(A) = \frac{\mathrm{gain}(A)}{\mathrm{split\_info}(A)}$$

That is, the information gain of the service feature is first determined from the information content before the split (the first information content) and after the split (the second information content), and the ratio of the information gain to the split information content is then determined as the information gain ratio. A smoothing term can also be added to the information content after the split. For example, a smoothing term Ave(split_info(A)), representing the average information content of the child nodes, is added to the denominator of the gain ratio, so that the denominator becomes split_info(A) + Ave(split_info(A)).
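The gain ratio and its smoothed variant can be illustrated with a short sketch. Here the information gain and the sizes of the subsets produced by the split are taken as given inputs, and the smoothing term is passed in by the caller; how Ave(split_info(A)) is actually computed is not detailed in this specification, so the `smooth` parameter is an assumption of the example.

```python
import math

def split_info(subset_sizes):
    """Split information content for a split producing subsets of the given sizes."""
    total = sum(subset_sizes)
    return -sum(s / total * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain_value, subset_sizes, smooth=0.0):
    """Information gain divided by (split_info + smoothing term).

    Returns 0.0 when the denominator is zero, for example when the split
    produces a single subset and no smoothing is applied.
    """
    denominator = split_info(subset_sizes) + smooth
    return gain_value / denominator if denominator else 0.0

plain = gain_ratio(1.0, [2, 2])                  # split_info is 1 bit
smoothed = gain_ratio(1.0, [2, 2], smooth=1.0)   # denominator becomes 2
```

A positive smoothing term keeps the ratio from blowing up when a feature splits the set into very uneven subsets and split_info is close to zero.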
As yet another example, for a given business sample set D, suppose there are K service features and the number of samples of the k-th service feature is $|C_k|$; the Gini coefficient of sample set D is then:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$

The split with the smallest Gini coefficient is then found among all possible splits, and the service feature corresponding to this split point is the best splitting service feature for sample set D.
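A minimal sketch of Gini-based selection follows. It assumes that the groups counted in the Gini coefficient are the sample labels, and that the Gini coefficient of a split is the size-weighted sum over the resulting subsets (a common convention in CART-style trees, not stated explicitly in this specification). Function names and data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini coefficient of a list of labels: 1 - sum of squared label shares."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_after_split(samples, feature):
    """Size-weighted Gini coefficient of the subsets obtained by splitting on `feature`."""
    groups = {}
    for features, label in samples:
        groups.setdefault(features[feature], []).append(label)
    return sum(len(g) / len(samples) * gini(g) for g in groups.values())

def best_split_feature(samples, features):
    """Pick the feature whose split has the smallest weighted Gini coefficient."""
    return min(features, key=lambda f: gini_after_split(samples, f))

samples = [
    ({"a": 1, "b": 0}, "abnormal"),
    ({"a": 1, "b": 1}, "abnormal"),
    ({"a": 0, "b": 0}, "normal"),
    ({"a": 0, "b": 1}, "normal"),
]
chosen = best_split_feature(samples, ["a", "b"])
```

Unlike the information gain, the Gini coefficient is minimized rather than maximized when choosing the split.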
In summary, once the business sample data set has been input, the construction of the decision tree can be expressed by the following pseudocode:
1. Construct a root node N corresponding to the business sample set.
2. If the labels of all business samples in the current node are identical, compute the abnormal probability of the node and generate a leaf node.
3. If the attribute list is empty, compute the abnormal probability of the node and generate a leaf node.
4. If the abnormal probability of the current node is greater than the threshold, generate a leaf node and record the abnormal probability of the node.
5. Select the service feature A with the largest information gain parameter from the candidate service feature set.
6. Split the samples according to each value j of A, remove the splitting service feature A from the service feature list, and denote the data of the j-th branch as A_j.
7. If A_j is empty, create a leaf node and compute the abnormal probability of the node.
8. Otherwise, recurse to obtain the subtree node N_j.
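The pseudocode above can be turned into a small runnable sketch. This is one possible reading under stated assumptions, not the definitive construction: samples are (feature dictionary, label) pairs, the information gain is used as the information gain parameter, the threshold of 0.9 is a placeholder, and samples that lack the splitting feature are placed in a leaf of their own. Because branches are created only for observed values, the empty-branch case of step 7 does not arise here.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def make_node(samples):
    """Create a node that records its size and abnormal probability."""
    abnormal = sum(1 for _, label in samples if label == "abnormal")
    return {"size": len(samples), "abnormal_ratio": abnormal / len(samples), "children": {}}

def split_by(samples, feature):
    """Group samples by their value of `feature`; missing values go under None."""
    groups = {}
    for features, label in samples:
        groups.setdefault(features.get(feature), []).append((features, label))
    return groups

def gain(samples, feature):
    before = entropy([label for _, label in samples])
    after = sum(len(g) / len(samples) * entropy([label for _, label in g])
                for g in split_by(samples, feature).values())
    return before - after

def build_tree(samples, features, threshold=0.9):
    node = make_node(samples)                                   # step 1
    labels = {label for _, label in samples}
    # Steps 2-4: identical labels, no features left, or abnormal ratio above threshold.
    if len(labels) == 1 or not features or node["abnormal_ratio"] > threshold:
        return node
    best = max(features, key=lambda f: gain(samples, f))        # step 5
    node["feature"] = best
    remaining = [f for f in features if f != best]              # step 6
    for value, subset in split_by(samples, best).items():
        if value is None:
            # Samples without the splitting feature become a leaf directly.
            node["children"][value] = make_node(subset)
        else:
            node["children"][value] = build_tree(subset, remaining, threshold)  # step 8
    return node

samples = [
    ({"a": 1, "b": 0}, "abnormal"),
    ({"a": 1, "b": 1}, "abnormal"),
    ({"a": 0, "b": 0}, "normal"),
    ({"a": 0, "b": 1}, "normal"),
]
tree = build_tree(samples, ["a", "b"])
```

The resulting tree splits once on "a" and stops, because both children already contain samples with a single label.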
In this way, for an input business sample set, a decision tree with the best classification effect can be obtained. When alarm dimensions are mined with the decision tree construction described above, suppose the business sample set has d samples containing n service features. If alarm dimensions were mined by brute-force enumeration of combinations, the complexity would be [formula not reproduced]. When alarm dimensions are selected in the manner described above, the time complexity is reduced to $O(n \cdot d \cdot \log d) \approx O(N^3)$. It can be seen that, in the process of automatically selecting alarm dimensions from large data samples, the solution provided by the embodiments of this specification can greatly improve efficiency.
In one embodiment, the alarm condition can be chosen as follows: the proportion of abnormal samples in the nodes of the path exceeds a proportion threshold; and, in the business sample sequence over multiple consecutive unit time intervals, the standard score of the abnormal sample proportion sequence exceeds a standard score threshold.
A path from a leaf node to the root node contains multiple nodes, each of which has normal samples and abnormal samples. The ratio of the number of abnormal samples to the total number of samples on the path is computed as the abnormal sample proportion of the nodes in the path; the threshold for the abnormal proportion can be preset.
The unit time interval can be set according to the actual situation, for example 30 s or 60 s. When data is obtained in real time, the service instance samples containing the path can be counted over multiple consecutive unit time intervals to obtain a sample sequence containing the path, from which the abnormal proportion sequence of the sample sequence is obtained. The abnormal proportion can be calculated as the ratio of the number of abnormal samples to the total number of samples. Suppose the abnormal proportion sequence under the alarm dimension is X = {X1, X2, ..., Xn}; the mean u and standard deviation σ of the sequence can then be computed, and the standard score of the sequence can be calculated as Z = (Max(Xi) - u)/σ. Under this calculation, the standard score reflects how far the sample with the largest abnormal proportion in the sample sequence containing the path deviates from the normal abnormal proportion. If this degree exceeds a preset threshold, the business samples containing the path are considered to have deviated from the normal condition. In other words, the path can be used as an alarm dimension.
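The standard score Z = (Max(Xi) - u)/σ can be computed as in the following sketch. The specification does not say whether σ is the population or the sample standard deviation; the population form (`statistics.pstdev`) is assumed here, and a constant sequence is given a score of 0.0 to avoid dividing by zero. The sequence values are invented for the example.

```python
import statistics

def standard_score(abnormal_ratios):
    """Z = (max(X) - mean(X)) / stdev(X) for an abnormal proportion sequence."""
    mean = statistics.fmean(abnormal_ratios)
    sigma = statistics.pstdev(abnormal_ratios)   # population standard deviation (assumed)
    if sigma == 0:
        return 0.0                               # constant sequence: no deviation
    return (max(abnormal_ratios) - mean) / sigma

# Hypothetical abnormal proportions over four consecutive unit time intervals.
z = standard_score([0.0, 0.0, 0.0, 1.0])
```

A single interval that stands out from an otherwise quiet sequence produces a large score, which is what the threshold comparison is meant to catch.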
In general, the alarm dimension does not need to change, and it can be used directly to raise alarms on unlabeled business samples. If the business data changes as the business develops and the alarm effect under the alarm dimension deteriorates, it is only necessary to promptly collect another batch of current business samples and mine another batch of alarm dimensions from them as a replacement.
Further, when computing the standard score, if the number of nodes in the path is too small, the alarm dimension tends to correspond to too much data and cannot accurately reflect the abnormal business samples. Another condition can therefore be added to the alarm conditions used to screen alarm dimensions: the number of nodes in the path exceeds a quantity threshold. After the decision tree has been screened by the alarm conditions, the combination of service feature values corresponding to the path can be taken as the alarm dimension. Fig. 3 is a schematic flowchart of obtaining an alarm dimension provided by an embodiment of this specification.
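The three alarm conditions discussed above (abnormal proportion, standard score, and number of nodes) can be combined into one screening predicate, as in this sketch. The `PathStats` container and every threshold value are placeholders chosen for illustration; the specification leaves the thresholds to be preset according to the actual situation.

```python
from dataclasses import dataclass

@dataclass
class PathStats:
    """Statistics gathered for one leaf-to-root path (hypothetical container)."""
    node_count: int
    abnormal_ratio: float
    standard_score: float

def meets_alarm_condition(stats, ratio_threshold=0.8, score_threshold=3.0, min_nodes=2):
    """A path qualifies only if all three conditions hold at the same time."""
    return (
        stats.abnormal_ratio > ratio_threshold
        and stats.standard_score > score_threshold
        and stats.node_count > min_nodes
    )

candidates = [
    PathStats(node_count=3, abnormal_ratio=0.95, standard_score=4.2),
    PathStats(node_count=1, abnormal_ratio=0.95, standard_score=4.2),  # too few nodes
    PathStats(node_count=3, abnormal_ratio=0.50, standard_score=4.2),  # ratio too low
]
selected = [p for p in candidates if meets_alarm_condition(p)]
```

Only the first candidate passes, since the other two each fail one of the conditions.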
In another aspect, after the alarm dimension is obtained, the embodiments of this specification can also classify data based on the alarm dimension obtained by the above solution, specifically: determine the service feature values of the business data to be classified; and, if the service feature values of the business data include the alarm dimension, determine that the business data is abnormal business data.
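Classification against mined alarm dimensions reduces to a containment check, sketched below. The representation of an alarm dimension as a mapping from service feature to value, the feature names, and the function names are assumptions made for the example; the values follow the earlier "interface = 1, return parameter = 0, request magnitude = 3" illustration.

```python
def matches_alarm_dimension(record, alarm_dimension):
    """True if the record contains every feature value of the alarm dimension."""
    return all(record.get(feature) == value for feature, value in alarm_dimension.items())

def is_abnormal(record, alarm_dimensions):
    """A record is treated as abnormal if it matches any alarm dimension."""
    return any(matches_alarm_dimension(record, dim) for dim in alarm_dimensions)

alarm_dimensions = [
    {"interface": 1, "return_parameter": 0, "request_magnitude": 3},
]
suspicious = {"interface": 1, "return_parameter": 0, "request_magnitude": 3, "deployment_unit": "u7"}
ordinary = {"interface": 1, "return_parameter": 1, "request_magnitude": 3, "deployment_unit": "u7"}
flags = (is_abnormal(suspicious, alarm_dimensions), is_abnormal(ordinary, alarm_dimensions))
```

Features that are not part of the alarm dimension, such as the deployment unit here, do not affect the result.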
Correspondingly, the embodiments of this specification also provide an apparatus for mining alarm dimensions. Fig. 4 is a schematic structural diagram of an apparatus for mining alarm dimensions provided by an embodiment of this specification. As shown in Fig. 4, the apparatus comprises:
an obtaining module 401, which obtains a business sample set comprising normal business samples and abnormal business samples, each business sample containing the feature values of multiple service features;
a construction module 403, which constructs a decision tree to classify the business sample set, wherein each node in the decision tree represents a business sample set, the root node of the decision tree corresponds to the business sample set, splits are made with the feature values of service features as edges, and the information gain parameter is maximized at the split of each non-leaf node, the information gain parameter including the information gain, the information gain ratio or the Gini coefficient and indicating the degree of information increase when the decision tree splits on the values of a service feature;
a path determination module 405, which determines a path, from a leaf node to the root node of the decision tree, that meets the alarm condition, the path characterizing a combination of service feature values;
a dimension determination module 407, which determines the combination of service feature values corresponding to the path as an alarm dimension, so that unlabeled business data can be classified according to the alarm dimension.
Further, the construction module 403 determines that a node is a leaf node if the labels of the business samples in the node are identical; or determines that a node is a leaf node if the proportion of abnormal business samples in the node exceeds a threshold; or, when splitting with the feature values of a service feature as edges, if the business sample set contains business samples that do not include the service feature, generates a leaf node corresponding to the business samples that do not include the service feature.
Further, for the business sample set corresponding to any non-leaf node, the construction module 403 determines the service features contained in the business samples of that set, each service feature having multiple values; calculates the first information content of the non-leaf node, and calculates the second information content of the nodes obtained after splitting the business sample set on any service feature; determines the information gain parameters of all service features in the sample set from the first information content and the second information content; and determines the service feature with the largest information gain parameter as the splitting service feature, splitting the non-leaf node according to the values of the splitting service feature.
Further, the construction module 403 determines the difference between the first information content and the second information content as the information gain of the service feature; or determines the information gain of the service feature from the difference between the first information content and the second information content, determines the split information content of the nodes obtained after the split, and determines the ratio of the information gain to the split information content as the information gain ratio.
Further, the alarm condition in the apparatus includes: the proportion of abnormal samples in the nodes of the path exceeds a proportion threshold; and, in the business sample sequence over multiple consecutive unit time intervals, the standard score of the abnormal sample proportion sequence exceeds a standard score threshold.
Further, the alarm condition in the apparatus also includes that the number of nodes in the path exceeds a quantity threshold.
In another aspect, the embodiments of this specification also provide a business data classification apparatus based on the above alarm dimension, comprising:
a determination module, which determines the service feature values of the business data;
a judgment module, which determines that the business data is abnormal business data if the service feature values of the business data include the alarm dimension.
The embodiments of this specification also provide a computer device comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for mining alarm dimensions shown in Fig. 1 when executing the program.
Fig. 5 shows a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050, wherein the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 are communicatively connected to one another inside the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 1020 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 can store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules for information input and output. The input/output modules can be configured in the device as components (not shown in the figure) or connected externally to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone and various sensors; output devices may include a display, speaker, vibrator and indicator lights.
The communication interface 1040 is used to connect a communication module (not shown in the figure) for communication between this device and other devices. The communication module can communicate in a wired manner (for example USB or network cable) or wirelessly (for example mobile network, Wi-Fi or Bluetooth).
The bus 1050 includes a path that transfers information between the components of the device (for example the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040).
It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, without including all the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for mining alarm dimensions shown in Fig. 1.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.
The systems, methods, modules or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts can be found in the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules can be realized in one or more pieces of software and/or hardware. Some or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements shall also be regarded as falling within the scope of protection of the embodiments of this specification.