Detailed Description of Embodiments
Existing business call monitoring methods need to examine and analyze the log records of a business system one by one. The volume of data to be processed is very large, which consumes a great deal of time and resources and hinders timely anomaly alerting. In view of this, the embodiments of this specification obtain multiple pieces of business call data under a preset business scenario, where each piece of business call data includes parameter data of multiple dimensions; cluster the multiple pieces of business call data with a pre-trained clustering algorithm to obtain a target clustering result; and then, for each cluster in the target clustering result, determine the feature dimensions of that cluster from the multiple dimensions, based on the parameter data of each piece of business call data in the cluster under each dimension. The feature dimensions determined for each cluster can serve as a business call example under the business scenario and are used to monitor abnormal calls under that scenario.
In the embodiments of this specification, the business call data collected from the business system is learned by a clustering algorithm, which extracts the feature dimensions that characterize the distinctive call form shared by similar business call data, that is, the business call example. The business processing of the business system can then be monitored according to the feature dimensions, and the location where an abnormal call occurs can also be pinpointed according to the feature dimensions.
For example, in one application scenario, the feature dimensions extracted for a certain cluster obtained by the clustering algorithm include dimension W1, dimension W2, and dimension W3. In this case, the process of monitoring abnormal calls according to these feature dimensions, namely dimension W1, dimension W2, and dimension W3, may include the following.
Multiple groups of normal business call data under the business scenario are obtained in advance, where each group of normal business call data includes multiple pieces of business call data. For each group of normal business call data, the parameter distribution data corresponding to dimension W1, the parameter distribution data corresponding to dimension W2, and the parameter distribution data corresponding to dimension W3 are analyzed separately. For example, suppose dimension W3 is the dimension corresponding to the payment method, and the parameter data corresponding to dimension W3 includes payment method 1, payment method 2, and payment method 3. In a certain group of normal business call data, there are a pieces of business call data corresponding to payment method 1, b pieces corresponding to payment method 2, and c pieces corresponding to payment method 3. The share of each payment method's business call data in this group of normal business call data is calculated, and the share of each payment method is taken as the parameter distribution data corresponding to dimension W3.
Statistics are then computed over the distribution data of each kind of parameter data under the above feature dimensions in every group of normal business call data, yielding the normal share threshold range corresponding to each feature dimension. If, in the business call data to be analyzed under the business scenario, the parameter distribution data corresponding to a feature dimension does not fall within the normal share threshold range, it can be determined that the business call data under the business scenario exhibits an abnormal call in that feature dimension.
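The monitoring flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method prescribed by this specification: the function names, the `payment_method` field, the use of the minimum and maximum observed share as the normal range, and the `tolerance` margin are all hypothetical choices made for the example.

```python
from collections import Counter

def parameter_distribution(calls, dimension):
    """Share of each parameter value under one dimension within a group of calls."""
    counts = Counter(call[dimension] for call in calls)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def normal_share_ranges(normal_groups, dimension):
    """Assumed rule: the normal range of a value is the (min, max) share
    observed across all groups of normal call data."""
    dists = [parameter_distribution(g, dimension) for g in normal_groups]
    values = set().union(*dists)
    return {
        v: (min(d.get(v, 0.0) for d in dists), max(d.get(v, 0.0) for d in dists))
        for v in values
    }

def is_abnormal(calls, dimension, ranges, tolerance=0.05):
    """True if any value's share falls outside its normal range (plus a margin)."""
    dist = parameter_distribution(calls, dimension)
    for value in set(ranges) | set(dist):
        if value not in ranges:
            return True  # a parameter value never seen in normal data
        low, high = ranges[value]
        share = dist.get(value, 0.0)
        if share < low - tolerance or share > high + tolerance:
            return True
    return False

def make_group(n1, n2, n3):
    """Build a group with n1/n2/n3 calls for payment methods 1/2/3."""
    return ([{"payment_method": "pay1"}] * n1
            + [{"payment_method": "pay2"}] * n2
            + [{"payment_method": "pay3"}] * n3)

normal_groups = [make_group(6, 3, 1), make_group(5, 4, 1)]
ranges = normal_share_ranges(normal_groups, "payment_method")
usual = is_abnormal(make_group(11, 7, 2), "payment_method", ranges)
skewed = is_abnormal(make_group(1, 1, 8), "payment_method", ranges)
```

Here `usual` is `False` because every share stays inside its range, while `skewed` is `True` because payment method 3 accounts for far more of the calls than in any normal group.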
Compared with analyzing call logs one by one, monitoring abnormal calls of the business system through the feature dimensions obtained by the business data processing method of the embodiments of this specification, and analyzing anomalies in business call data at the level of feature dimensions, can save a large amount of time and system resources. Moreover, the location of an anomaly can be further pinned down to a feature dimension, which helps narrow the scope of troubleshooting and reduces the resource consumption of the computer.
Fig. 1 is a schematic diagram of an operating environment suitable for the business data processing method provided by the embodiments of this specification. As shown in Fig. 1, the business data processing method provided by the embodiments of this specification can be applied to a system architecture that includes multiple servers. Some of the servers may serve as the business system and carry out specific business processing. Others may serve as monitoring servers that execute the business data processing method provided by the embodiments of this specification, abstracting similar business call data of the business system into business call examples and then using the business call examples to monitor the business processing of the business system for anomalies.
Here, a server may be an electronic device with data computation, storage, and network interaction functions, or software that runs on such an electronic device and supports data processing, storage, and network interaction. The number of servers is not specifically limited in this embodiment. The server may be a single server, several servers, or a server cluster formed by several servers.
For a better understanding of the business data processing method provided by the embodiments of this specification, the technical solutions of the embodiments are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of this specification and the specific features in the embodiments are detailed descriptions of the technical solutions, not limitations on them; where no conflict arises, the technical features in the embodiments of this specification may be combined with one another. It should be noted that, in the embodiments of this specification, "multiple" means "two or more", and the term "two or more" covers both the case of two and the case of more than two.
In a first aspect, the embodiments of this specification provide a business data processing method, which may be executed by the monitoring server described above. As shown in Fig. 2, the method may include at least the following steps S200 to S204.
Step S200: obtain multiple pieces of business call data under a preset business scenario, where each piece of business call data includes parameter data of multiple dimensions.
In this embodiment, the business system may provide multiple interfaces, and the corresponding business can be executed by calling these interfaces. The interfaces of the business system continuously receive processing requests initiated by callers in order to execute the corresponding business, but the business call data may differ each time an interface is called. The business call data can be understood as the parameter data involved in the call process that characterizes business data processing, including request parameters, return parameters, and data describing the specific call flow during the call. With this business call data, the specific call process involved in business data processing can be reconstructed fairly completely and clearly.
Each piece of business call data includes parameter data of multiple dimensions. For example, the business call data may include: the interface, interface request parameters, interface return parameters, request volume, the directed acyclic structure of the system's internal nodes, the upstream and downstream systems of the call, the deployment unit, and so on. The directed acyclic structure of the system's internal nodes can be understood as the order in which the calling and called functional modules involved in the call flow are invoked. The upstream system can be understood as the caller of any specific call in the call process. The downstream system can be understood as the callee of any specific call in the call process. The deployment unit refers to the structural unit or functional module in which the calling and called functional modules involved in the call flow are deployed.
It should be noted that the dimensions listed above are given only to better illustrate the embodiments of this specification. In a specific implementation, other types of business call parameters may also be introduced as dimensions of the business call data as the situation requires, and this specification places no limitation on this.
In a practical application scenario, the interface call logs of the business system can be collected and parsed to obtain the business call data corresponding to each call under the corresponding business scenario. Specifically, a time period can be configured in advance, and the multiple pieces of business call data within that preset time period are obtained as the sample data for extracting feature dimensions. The preset time period can be set according to the practical application scenario and processing requirements. For example, it can be set to a recent period, such as the previous hour, the previous 10 minutes, or the previous minute; the embodiments of this specification place no limitation on this.
In this embodiment, the preset business scenario may be a single business scenario, or it may include multiple different business scenarios. A business scenario can be understood as the business processing to which a call process corresponds. In one application scenario, the business system can distinguish business scenarios by application name (Appname) and interface. Calls with different application names and/or different interfaces belong to different business scenarios; the call processes of different business scenarios correspond to different business processing, so the dimensions included in the corresponding business call data also differ.
For example, the business system of a certain network platform includes two business scenarios. The application name corresponding to one business scenario is A1 and its interface is B1; the application name corresponding to the other business scenario is A2 and its interface is also B1. Table 1 shows M pieces of business call data of application A1 at interface B1, where each piece of business call data includes q dimensions. In each piece of business call data under this business scenario, X_ij denotes the parameter data under the corresponding dimension, where i is an integer from 1 to q and j is an integer from 1 to M. For example, if a certain dimension is "city", the corresponding parameter data is the city name or a feature code that represents the city name. Similarly, Table 2 shows N pieces of business call data of application A2 at interface B1, where each piece of business call data includes p dimensions. In each piece of business call data under this business scenario, Y_gh denotes the parameter data under the corresponding dimension, where g is an integer from 1 to p and h is an integer from 1 to N. It can be understood that, between two different business scenarios, at least one of the dimensions included in the business call data differs, and the numbers of dimensions included in the business call data, q and p, may be the same or different.
Table 1
Table 2
Step S202: cluster the multiple pieces of business call data based on a pre-trained clustering algorithm to obtain a target clustering result.
It can be understood that a clustering algorithm groups samples based on the similarity between them. In one embodiment of this specification, the pre-trained clustering algorithm may be an algorithm that clusters based on the distance between samples. In this case, the similarity between the pieces of business call data to be learned can be characterized by the distance between them: the smaller the distance, the higher the similarity.
Each dimension included in the business call data is either a parameter involved in the business call process, such as a request parameter or return parameter, or part of the call structure, such as the upstream system, downstream system, or deployment unit. The distance between two pieces of business call data therefore has to be calculated by comparing, dimension by dimension, whether their parameter data under the same dimension is identical. As an optional implementation, while the business call data obtained in step S200 is being clustered by the trained clustering algorithm, the distance between any two pieces of business call data can be obtained as follows: detect whether the parameter data of the two pieces of business call data under the same dimension is identical, obtaining a detection result for each dimension; then, based on the detection result of each dimension, obtain the distance between the two pieces of business call data.
It can be understood that the business call data under the same business scenario includes the same dimensions. When calculating the distance between two pieces of business call data, it is necessary to judge, for each dimension, whether the parameter data of the two pieces of business call data is identical. For example, in a certain business scenario, one dimension of the business call data is the city where the user is located when initiating the call; if the parameter data of both pieces of business call data under this dimension is "Chengdu", the parameter data of the two pieces of business call data under this dimension is identical.
Specifically, obtaining the distance between the two pieces of business call data based on the detection result of each dimension may include: obtaining the total number of dimensions included in the business call data under the business scenario; then, according to the detection result of each dimension, obtaining the number of dimensions in which the parameter data of the two pieces of business call data differs; and taking the ratio of the number of differing dimensions to the total number of dimensions as the distance between the two pieces of business call data. The distance is a value greater than or equal to 0 and less than or equal to 1. Of course, in other embodiments of this specification, the distance between any two pieces of business call data may also be calculated in other ways, and no limitation is imposed here.
As another implementation, the similarity between pieces of business call data can also be characterized by a similarity score: the larger the score, the more alike the pieces of business call data are. In this case, the number of dimensions in which the parameter data of the two pieces of business call data is identical can be obtained from the above detection results, and the ratio of that number to the total number of dimensions is taken as the similarity between the two pieces of business call data. The similarity is likewise a value greater than or equal to 0 and less than or equal to 1.
For example, suppose the business call data under the business scenario includes parameter data of 10 dimensions. When calculating the similarity or distance between two pieces of business call data, if the parameter data is identical under 3 dimensions and different under 7 dimensions, the similarity between the two pieces of business call data is 3/10 = 0.3 and the distance is 7/10 = 0.7.
In the above process, to make it easy to record under which dimensions the parameter data of two pieces of business call data is identical and under which it differs, the detection results can be represented by preset characteristic values. For example, in one application scenario, if the parameter data of two pieces of business call data is identical under some dimension, for instance the upstream system is the same, the detection result under that dimension can be recorded as 1; otherwise it is recorded as 0. The number of dimensions whose detection result is 1 is then the number of dimensions in which the parameter data of the two pieces of business call data is identical, and the number of dimensions whose detection result is 0 is the number of dimensions in which the parameter data differs.
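The per-dimension detection results, the distance, and the similarity described above can be sketched in Python as follows. This is an illustrative sketch only; the function names and the dimension names `d0` to `d9` are hypothetical, and each piece of business call data is assumed to be a dictionary keyed by dimension.

```python
def detect_dimensions(call_a, call_b, dimensions):
    """Detection result per dimension: 1 if the parameter data is identical, else 0."""
    return {d: int(call_a[d] == call_b[d]) for d in dimensions}

def similarity(call_a, call_b, dimensions):
    """Share of dimensions whose parameter data is identical (between 0 and 1)."""
    result = detect_dimensions(call_a, call_b, dimensions)
    return sum(result.values()) / len(dimensions)

def distance(call_a, call_b, dimensions):
    """Share of dimensions whose parameter data differs (between 0 and 1)."""
    result = detect_dimensions(call_a, call_b, dimensions)
    different = len(dimensions) - sum(result.values())
    return different / len(dimensions)

# Ten hypothetical dimensions; the two calls agree on the first three only.
dimensions = [f"d{k}" for k in range(10)]
call_a = {d: "same" if k < 3 else f"a{k}" for k, d in enumerate(dimensions)}
call_b = {d: "same" if k < 3 else f"b{k}" for k, d in enumerate(dimensions)}

sim = similarity(call_a, call_b, dimensions)   # 3 of 10 dimensions match
dist = distance(call_a, call_b, dimensions)    # 7 of 10 dimensions differ
```

With these inputs `sim` is 0.3 and `dist` is 0.7, matching the 10-dimension example above.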
Specifically, in step S202, a density-based clustering algorithm can be used, such as the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. This algorithm groups regions of sufficient density into clusters and can discover clusters of arbitrary shape in a spatial database that contains noise. Of course, in other embodiments of this specification, other clustering algorithms such as k-means, hierarchical clustering, or manifold clustering may also be used.
In addition, the trained clustering algorithm needs to be obtained before step S202 is executed. In this embodiment, the process of training the clustering algorithm may include a parameter training process. Specifically, the parameter training process may include the following steps S300 and S302.
Step S300: obtain a training sample set, where the training sample set includes multiple business call data samples.
It should be noted that, when the algorithm is trained for one specific business scenario, the business call data samples in the training sample set are collected from that specific business scenario, and all business call data samples in the training sample set include the same dimensions. When the algorithm is trained for multiple specific business scenarios at the same time, the business call data samples in the training sample set are collected from each of those business scenarios. In that case, the business call data samples collected from the same business scenario can be placed in one subset of the training sample set, so that the business call data samples within a subset include the same dimensions. Training can then be carried out on each subset separately to obtain the trained clustering algorithm corresponding to each business scenario.
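As a sketch of the subset division just described, the following Python groups samples by business scenario. It assumes, as in the application scenario described earlier, that a scenario is identified by application name and interface; the field names `app_name` and `interface` and the function name are hypothetical.

```python
from collections import defaultdict

def split_by_scenario(samples):
    """Group training samples into one subset per (application name, interface) pair."""
    subsets = defaultdict(list)
    for sample in samples:
        key = (sample["app_name"], sample["interface"])
        subsets[key].append(sample)
    return dict(subsets)

samples = [
    {"app_name": "A1", "interface": "B1", "city": "Chengdu"},
    {"app_name": "A2", "interface": "B1", "channel": "web"},
    {"app_name": "A1", "interface": "B1", "city": "Hangzhou"},
]
subsets = split_by_scenario(samples)
```

Each subset can then be passed to a separate training run, so that the samples compared with one another always share the same dimensions.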
Step S302: cluster the training sample set based on a preset clustering algorithm; when the clustering result does not satisfy a preset aggregation condition, adjust the configuration parameters of the preset clustering algorithm according to preset rules until the clustering result satisfies the preset aggregation condition, thereby obtaining the trained clustering algorithm.
Taking the DBSCAN algorithm as an example, when the above distance is used to measure the similarity between samples, the configuration parameters of the DBSCAN algorithm include the radius and the minimum number of neighborhood points. It should be noted that, when the preset clustering algorithm is being trained, the initial radius and the initial minimum number of neighborhood points can be set empirically. In this case, clustering the training sample set based on the preset clustering algorithm to obtain a clustering result may include: obtaining the distance between any two business call data samples in the training sample set; then, based on the preset configuration parameters, namely the initial radius and the initial minimum number of neighborhood points, together with the distance between any two business call data samples, determining all core objects in the training sample set; and obtaining the directly density-reachable samples of each core object in the training sample set and deriving the clustering result from them.
For example, in one specific application scenario, suppose the radius is denoted E and the minimum number of neighborhood points required within the E-neighborhood for a sample to be a core object is MinPts. It should be noted that, in this embodiment, the region within radius E of a given object is called the E-neighborhood of that object.
During clustering with the DBSCAN algorithm, detecting whether each business call data sample in the sample set is a core object may proceed as follows. The business call data samples in the sample set are traversed, and each one in turn is taken as the target sample. Among the other samples in the sample set, the number of samples whose distance to the target sample is less than or equal to the radius E is counted; this is the number of samples in the E-neighborhood of the target sample. When the number of samples in the E-neighborhood of the target sample is greater than or equal to MinPts, the target sample is determined to be a core object; otherwise, it is not a core object. The next business call data sample is then taken as the target sample, until the traversal is complete.
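The traversal above can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: the function names are hypothetical, samples are assumed to be equal-length tuples of parameter values, and the distance is the share of differing dimensions described earlier.

```python
def mismatch_distance(a, b):
    """Share of positions (dimensions) at which two samples differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def find_core_objects(samples, distance_fn, radius, min_pts):
    """Return indices of samples with at least min_pts other samples within radius."""
    core = []
    for i, target in enumerate(samples):
        # Count the other samples that fall inside the E-neighborhood of the target.
        neighbours = sum(
            1
            for j, other in enumerate(samples)
            if j != i and distance_fn(target, other) <= radius
        )
        if neighbours >= min_pts:
            core.append(i)
    return core

samples = [
    ("a", "x", "1"),
    ("a", "x", "2"),
    ("a", "x", "3"),
    ("b", "y", "9"),
]
core = find_core_objects(samples, mismatch_distance, radius=0.4, min_pts=2)
```

The first three samples differ from one another in only one of three dimensions, so each has two neighbours within radius 0.4 and is a core object; the fourth sample differs from all others in every dimension and is not.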
After all core objects in the sample set have been determined, all directly density-reachable samples within the E-neighborhood of each core object need to be determined, and the density-connected sample sets are then found from the directly density-reachable samples within the E-neighborhoods of all core objects. This process naturally involves merging some density-reachable samples. It should be noted that, given a sample set D with x_i and x_j both belonging to D, if x_i lies in the E-neighborhood of x_j and x_j is a core object, then sample x_i is said to be directly density-reachable from sample x_j. Density reachability is the transitive closure of direct density reachability. This relation is asymmetric; only core objects are mutually density-reachable. Density connectivity, by contrast, is a symmetric relation, and the purpose of the DBSCAN algorithm is to find the maximal sets of density-connected objects.
For example, suppose the sample set contains 12 core objects, denoted P1 to P12. P2 is directly density-reachable from P1, P3 from P2, P4 from P3, P5 from P4, P6 from P5, and P7 from P6; P9 is directly density-reachable from P8, P10 from P9, P11 from P10, and P12 from P11. Two density-connected sample sets are then obtained. Specifically, P1 to P7, together with the directly density-reachable objects of each core object among P1 to P7, are merged into one density-connected sample set, and P8 to P12, together with the directly density-reachable objects of each core object among P8 to P12, are merged into another density-connected sample set. Each density-connected sample set is one cluster in the clustering result.
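The merging of core objects into density-connected sample sets can be sketched in Python as a breadth-first traversal. This is an illustrative sketch under stated assumptions: the function name and the `reachable` mapping (core object to the samples directly density-reachable from it) are hypothetical inputs, and links between core objects are treated as symmetric because core objects are mutually density-reachable.

```python
from collections import deque

def density_connected_sets(core_objects, reachable):
    """Merge core objects and their directly density-reachable samples into clusters."""
    core = list(core_objects)
    core_set = set(core)
    # Build symmetric links between core objects.
    links = {c: set() for c in core}
    for c in core:
        for n in reachable.get(c, ()):
            if n in core_set:
                links[c].add(n)
                links[n].add(c)
    visited, clusters = set(), []
    for start in core:
        if start in visited:
            continue
        visited.add(start)
        queue, cluster = deque([start]), set()
        while queue:
            current = queue.popleft()
            cluster.add(current)
            # Non-core samples reachable from a core object join its cluster.
            cluster.update(reachable.get(current, ()))
            for n in links[current]:
                if n not in visited:
                    visited.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return clusters

# The chain from the example: P1..P7 form one chain, P8..P12 another.
core = [f"P{i}" for i in range(1, 13)]
reachable = {f"P{i}": {f"P{i + 1}"} for i in list(range(1, 7)) + list(range(8, 12))}
clusters = density_connected_sets(core, reachable)
```

For the P1 to P12 example this yields two clusters, one containing P1 to P7 and the other containing P8 to P12.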
During training, after each iteration produces a clustering result, it is necessary to judge whether the clustering result satisfies the preset aggregation condition. When it does not, the configuration parameters of the preset clustering algorithm are adjusted according to preset rules, and the next iteration of the algorithm is run with the adjusted configuration parameters, until the clustering result satisfies the preset aggregation condition.
In this embodiment, the preset aggregation condition can be set according to practical application requirements. For example, it can be set as a requirement on the proportion of clustered business call data samples in the total number of samples. Equivalently, it can be set as a requirement on the proportion of noise points in the clustering result, where the proportion of noise points is the ratio of the number of business call data samples not included in any cluster to the total number of samples in the training sample set. As another example, in addition to the above requirements, the preset aggregation condition may also include a requirement on the number of clusters produced by clustering, such as judging whether the number of clusters in the clustering result falls within a preset range.
In one implementation, the parameter adjustment process may include: obtaining the ratio of the number of business call data samples in the clustering result to the total number of samples in the training sample set; when the ratio is less than a first preset threshold, adjusting the configuration parameters according to preset rules and clustering the training sample set with the clustering algorithm under the adjusted configuration parameters; and repeating this until the ratio is greater than or equal to the first preset threshold, at which point the trained clustering algorithm is obtained.
In the above implementation, the number of business call data samples in the clustering result is the sum of the numbers of samples included in each cluster of the clustering result, and the resulting ratio reflects the degree of aggregation achieved by the clustering algorithm under the corresponding configuration parameters. The first preset threshold can be set according to the practical application scenario and processing requirements; for example, it can be set to 0.9 or 0.95. For instance, suppose the training sample set includes 1000 samples and the clusters in the clustering result include 910 samples in total. The ratio is then 910/1000 = 0.91; if the first preset threshold is 0.9, the clustering result satisfies the preset aggregation condition.
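The ratio check in this example can be sketched in Python as follows. The function names and the individual cluster sizes (500, 300, and 110, chosen so that they sum to 910) are hypothetical.

```python
def clustered_ratio(clusters, total_samples):
    """Share of training samples that ended up in some cluster."""
    return sum(len(cluster) for cluster in clusters) / total_samples

def meets_aggregation_condition(clusters, total_samples, threshold=0.9):
    """True when the clustered share reaches the first preset threshold."""
    return clustered_ratio(clusters, total_samples) >= threshold

# Three hypothetical clusters holding 910 of 1000 samples in total.
clusters = [range(500), range(300), range(110)]
ratio = clustered_ratio(clusters, 1000)
ok = meets_aggregation_condition(clusters, 1000, threshold=0.9)
strict = meets_aggregation_condition(clusters, 1000, threshold=0.95)
```

Here `ratio` is 0.91, so the condition holds for a threshold of 0.9 but not for 0.95.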
In addition, in the above training process, adjusting the configuration parameters according to preset rules may include: adjusting the radius and/or the minimum number of neighborhood points according to the preset rules until a target radius and a target minimum number of neighborhood points are obtained, such that the clustering result of the current iteration satisfies the preset aggregation condition. Specifically, the preset rules can be set according to the preset aggregation condition of the practical application scenario and repeated testing. It can be understood that, as the radius increases, each resulting cluster includes more business call data samples and the number of clusters decreases accordingly, and vice versa. As MinPts decreases, more clusters can form, and vice versa. For example, a first step size and a second step size can be set, where the first step size is the adjustment step for the radius and the second step size is the adjustment step for the minimum number of neighborhood points. In the above implementation, when the ratio of the total number of business call data samples included in the clusters of the clustering result to the total number of samples in the training sample set is less than the first preset threshold, the radius can be adjusted by the first step size and/or the minimum number of neighborhood points can be adjusted by the second step size, starting from the current radius and minimum number of neighborhood points; for example, the radius and/or the minimum number of neighborhood points may be reduced. The specific adjustment rule is set according to the chosen preset aggregation condition and repeated testing.
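The iterative adjustment can be sketched in Python as a simple loop. This is only one possible shape for the preset rules, under stated assumptions: `cluster_fn` is a hypothetical callable that returns a list of clusters for a given radius and MinPts, the signs and sizes of the two steps are left to the caller because the text leaves the adjustment rule to testing, and `max_iterations` is an added safeguard.

```python
def tune_parameters(cluster_fn, samples, radius, min_pts,
                    radius_step, min_pts_step,
                    threshold=0.9, max_iterations=50):
    """Adjust radius and MinPts step by step until the clustered share reaches threshold."""
    for _ in range(max_iterations):
        clusters = cluster_fn(samples, radius, min_pts)
        ratio = sum(len(c) for c in clusters) / len(samples)
        if ratio >= threshold:
            return radius, min_pts, ratio
        radius += radius_step                      # first step size (may be negative)
        min_pts = max(1, min_pts + min_pts_step)   # second step size (may be negative)
    raise RuntimeError("aggregation condition not met within max_iterations")

def toy_cluster_fn(samples, radius, min_pts):
    """Stand-in for a real clustering run: clusters everything once radius >= 0.3."""
    if radius >= 0.3:
        return [list(samples)]
    return [list(samples)[: len(samples) // 2]]

best_radius, best_min_pts, best_ratio = tune_parameters(
    toy_cluster_fn, list(range(100)), radius=0.1, min_pts=5,
    radius_step=0.1, min_pts_step=-1,
)
```

In this toy run the loop stops on the third iteration, once the radius has grown to about 0.3 and every sample is clustered.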
It should be noted that, in other embodiments of this specification, the preset clustering algorithm may also measure the similarity between two samples using the similarity score described above. For example, the configuration parameters of the DBSCAN algorithm can be set to a similarity threshold and a minimum number of samples. Accordingly, after the similarity threshold and the minimum number of samples have been set according to the practical application scenario, a target sample is determined to be a core object when the number of samples whose similarity to the target sample is greater than or equal to the similarity threshold exceeds the minimum number of samples, and the directly density-reachable samples of a core object are the samples whose similarity to that core object is greater than or equal to the similarity threshold.
Optionally, to ensure the stability of the resulting clustering algorithm, the process of training the clustering algorithm may further include an algorithm testing process after the parameter training process is complete.
As one implementation, the algorithm testing process may include: obtaining a test sample set within a preset test period, where the test sample set also includes multiple business call data samples; inputting the test sample set into the clustering algorithm trained in step S302 to obtain a test clustering result; and judging whether the test clustering result satisfies a preset test condition. When the test clustering result satisfies the preset test condition, the clustering algorithm trained in step S302 is determined to be the trained clustering algorithm.
In the above algorithm testing process, the test period can be set according to the practical application scenario and processing requirements. For example, the test period can be set to the one, two, or three days following completion of parameter training in step S302. In one implementation, multiple specific time periods can be set within the test period. For example, when the test period is three days, 10:00 to 11:00 in the morning, 17:00 to 18:00 in the afternoon, and 21:00 to 22:00 in the evening of each of the three days can be set as specific time periods. In this case, obtaining the test sample set within the preset test period means obtaining the business call data within the specific time periods of the preset test period to form the test sample set. Accordingly, the samples from each specific time period of each day in the test period can be input in turn into the clustering algorithm trained in step S302 to obtain the corresponding test clustering results, and it is then judged whether the test clustering results obtained within the test period satisfy the preset test condition.
Specifically, the preset test condition can be set according to the practical application scenario and processing requirements. For example, it can be judged whether the number of clusters in the test clustering result is consistent with the number of clusters in the clustering result that satisfied the preset aggregation condition in step S302; when they are consistent, the test clustering result is determined to satisfy the preset test condition, and when they are inconsistent, it is determined not to satisfy the preset test condition. Alternatively, when multiple specific time periods are set within the preset test period, the degree of consistency of the test clustering results corresponding to the test samples in all specific time periods of the test period can be calculated. When the degree of consistency reaches a preset consistency condition, the clustering algorithm trained in step S302 meets the stability requirement and can be determined to be the trained clustering algorithm.
The degree of consistency can be determined from the distribution of the numbers of class clusters contained in all test clustering results obtained within the test period. For example, the ratio of the largest number of test clustering results that share the same class cluster count to the total number of test clustering results can be used to characterize the degree of consistency; when this ratio exceeds a preset consistency threshold, the degree of consistency is judged to reach the preset consistency condition. For example, suppose a total of 30 test clustering results are obtained within the test period, of which 28 contain 10 class clusters and 2 contain 9 class clusters. The degree of consistency of the test clustering results is then 28/30 ≈ 0.933. Assuming the preset consistency threshold is 0.9, the clustering algorithm trained in step S302 meets the stability requirement.
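The ratio described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `consistency_degree` and the representation of each test clustering result by its class cluster count are assumptions made for the example, not part of the described method.

```python
from collections import Counter


def consistency_degree(cluster_counts):
    """Share of test clustering results that have the most common class cluster count.

    cluster_counts: one integer per test clustering result, giving the number
    of class clusters that result contains (hypothetical representation).
    """
    if not cluster_counts:
        raise ValueError("at least one test clustering result is required")
    # Size of the largest group of results sharing the same class cluster count.
    largest_group = Counter(cluster_counts).most_common(1)[0][1]
    return largest_group / len(cluster_counts)


# The example from the text: 28 results with 10 class clusters, 2 with 9.
counts = [10] * 28 + [9] * 2
degree = consistency_degree(counts)
is_stable = degree > 0.9  # 0.9 is the consistency threshold assumed in the text
```

With the figures from the text, `degree` is 28/30 and `is_stable` is true.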
It should be noted that when the test clustering result does not meet the preset test condition, the business call data in the test sample set can be added to the training sample set to form a new training sample set, the configuration parameters of the clustering algorithm can be adjusted, and the above parameter training process and test process can be repeated until the adjusted configuration parameters make the clustering result of the clustering algorithm meet the preset aggregation condition and the test clustering result meet the preset test condition.
The trained clustering algorithm can then be used to cluster the multiple business call data under the corresponding business scenario to obtain the target clustering result. Step S204 below is then performed on the target clustering result to extract the characteristic dimension corresponding to each class cluster as a business call example under that business scenario.
Step S204: for each class cluster in the target clustering result, determine the characteristic dimension of the class cluster from the multiple dimensions based on the parameter data of each business call data in the class cluster under each dimension, wherein the characteristic dimension of each class cluster is used to monitor abnormal calls under the preset business scenario.
The target clustering result obtained by the trained clustering algorithm includes two or more class clusters, and each class cluster is a set of business call data with a certain similarity. By comparing the parameter data of each business call data in the set under each dimension, the characteristic dimension that reflects the similarity of the set can be obtained. This characteristic dimension is the aggregation dimension of the business call data set corresponding to the class cluster.
In this embodiment, each class cluster corresponds to one group of characteristic dimensions. For example, if the target clustering result includes 5 class clusters, 5 groups of characteristic dimensions are obtained correspondingly. One group of characteristic dimensions may include two or more dimensions. For example, in one application scenario, a group of characteristic dimensions includes four dimensions, namely "city & product code & order scene & payment method", where "&" denotes the combination of the four dimensions. It should be noted that in other embodiments of this specification, the characteristic dimension may also be a single dimension.
In step S204, determining the characteristic dimension of a class cluster from the multiple dimensions means determining which aggregation dimensions can aggregate the business call data set of that class cluster. As an optional embodiment, the process of determining the characteristic dimension of the class cluster from the multiple dimensions may include: determining, from the multiple dimensions, the dimension combinations whose coverage rate exceeds a second preset threshold by comparing the parameter data of each business call data in the class cluster under each dimension, wherein a dimension combination includes one or more of the multiple dimensions; and obtaining the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold. The coverage rate of a dimension combination characterizes the proportion, within the class cluster, of business call data that have identical parameter data under that dimension combination. That is, it can be obtained as the ratio of the number of business call data in the class cluster that have identical parameter data under the dimension combination to the total number of business call data in the class cluster.
Specifically, the second preset threshold can be set according to actual needs, for example to 0.8 or 0.9. Taking a second preset threshold of 0.9 as an example, for each class cluster, the dimension combinations whose coverage rate in the class cluster exceeds 0.9 are determined first, that is, combinations for which more than 90% of the business call data in the class cluster have correspondingly identical parameter data under every dimension of the combination. Taking the dimension combination "city & product code & order scene & payment method" as an example, if the coverage rate of this combination in a class cluster exceeds 0.9, then more than 90% of the business call data in that class cluster contain the same city, the same product code, the same order scene, and the same payment method.
In one implementation of this embodiment, obtaining the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold may include: comparing the number of dimensions contained in each of those dimension combinations, and taking the dimensions of the combination with the largest number of dimensions as the characteristic dimension. For example, if there are 5 dimension combinations whose coverage rate exceeds the second preset threshold, containing 1, 1, 2, 2, and 4 dimensions respectively, the combination containing 4 dimensions is taken as the characteristic dimension. It should be noted that when two of the dimension combinations whose coverage rate exceeds the second preset threshold contain the same number of dimensions and both have the largest number of dimensions, the dimensions of the combination with the higher coverage rate are taken as the characteristic dimension, or alternatively the dimensions of either of the two combinations are taken as the characteristic dimension.
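The selection rule above can be sketched as follows. The sketch enumerates every dimension combination exhaustively, which is only practical for a small number of dimensions; the text does not say how candidate combinations are generated, so the enumeration, the function names, and the sample data are assumptions. Ties on the number of dimensions are broken by the higher coverage rate, one of the two options the text allows.

```python
from collections import Counter
from itertools import combinations


def coverage_rate(cluster, dims):
    """Fraction of calls sharing the most common value tuple under dims."""
    value_counts = Counter(tuple(call[d] for d in dims) for call in cluster)
    return value_counts.most_common(1)[0][1] / len(cluster)


def select_characteristic_dimensions(cluster, all_dims, threshold=0.9):
    """Return the qualifying combination with the most dimensions, or None."""
    qualifying = []
    for size in range(1, len(all_dims) + 1):
        for dims in combinations(all_dims, size):
            rate = coverage_rate(cluster, dims)
            if rate > threshold:
                qualifying.append((len(dims), rate, dims))
    if not qualifying:
        return None
    # Most dimensions first; on a tie, the higher coverage rate wins.
    return max(qualifying, key=lambda item: (item[0], item[1]))[2]


all_dims = ("city", "product_code", "order_scene", "payment_method")
# 19 calls agree on city, product code and order scene but alternate
# between two payment methods; 1 call differs on everything else.
cluster = [
    {"city": "A", "product_code": "P1", "order_scene": "S1",
     "payment_method": "card" if i % 2 == 0 else "balance"}
    for i in range(19)
] + [{"city": "B", "product_code": "P2", "order_scene": "S2", "payment_method": "card"}]

selected = select_characteristic_dimensions(cluster, all_dims)
```

In this sample, no combination that includes the payment method reaches the threshold, so the three remaining dimensions are selected.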
In addition, in other embodiments of this specification, a key dimension set can be preset. The key dimension set includes multiple specified business call parameters, for example "city" and "payment method". From the dimension combinations whose coverage rate exceeds the second preset threshold and that correspond to at least one business call parameter in the key dimension set, the dimensions of the combination with the largest number of dimensions are selected as the characteristic dimension.
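A minimal sketch of the key dimension filter follows. It assumes the dimension combinations whose coverage rate exceeds the second preset threshold have already been determined and are passed in as tuples of dimension names; the function name and the sample combinations are hypothetical.

```python
def select_with_key_dimensions(qualifying_combos, key_dimensions):
    """Among combinations that contain at least one key dimension,
    return the one with the most dimensions (None if there is none)."""
    candidates = [
        combo for combo in qualifying_combos
        if key_dimensions & set(combo)  # shares at least one key dimension
    ]
    if not candidates:
        return None
    return max(candidates, key=len)


key_dimensions = {"city", "payment_method"}
qualifying_combos = [
    ("product_code", "order_scene"),  # no key dimension, filtered out
    ("city",),
    ("city", "payment_method"),
]
chosen = select_with_key_dimensions(qualifying_combos, key_dimensions)
```

The combination without any key dimension is ignored even though it has as many dimensions as the chosen one.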
After the group of characteristic dimensions corresponding to each class cluster is obtained through step S204, abnormal calls during business processing can be monitored according to each group of characteristic dimensions.
In an optional embodiment of this specification, after the characteristic dimension of each class cluster is determined from the multiple dimensions of the business call data, the business data processing method may further include an abnormal call monitoring step for monitoring abnormal calls during business processing. Specifically, the abnormal call monitoring step may include: identifying, based on the characteristic dimension of each class cluster, abnormal call data in the business call data to be analyzed under the preset business scenario. That is, after the business call data under a business scenario are clustered and the characteristic dimensions are extracted, the obtained characteristic dimensions can be used to identify whether the business call data to be analyzed under that scenario are abnormal. Monitoring the business processing of the business system at the level of characteristic dimensions, that is, business call examples, helps improve monitoring efficiency and saves a large amount of time and system resources. Moreover, the location of an abnormality can be pinned down to a characteristic dimension, which helps narrow the scope of troubleshooting and reduces the computer's resource consumption.
In this embodiment, there are multiple ways to use the obtained characteristic dimensions to identify whether the business call data to be analyzed under the preset business scenario are abnormal. Two of them are described below; of course, specific implementations are not limited to these two.
First, the following detection process can be performed for the characteristic dimension of each class cluster: determining the abnormal parameter data corresponding to the characteristic dimension; obtaining the occurrence frequency of the abnormal parameter data in the business call data to be analyzed, where the occurrence frequency characterizes the number of business call data, among the business call data to be analyzed, that contain the abnormal parameter data; and when the occurrence frequency exceeds a third preset threshold, identifying the business call data that contain the abnormal parameter data as abnormal call data. Such abnormal calls may, for example, be caused by attacks from black-market groups, so the abnormal situation can be handled in time. Moreover, the location of the abnormality can be further narrowed to a specific dimension included in the characteristic dimension, which helps narrow the scope of troubleshooting.
The business call data to be analyzed can be the business call data within a specified time period, which can be set according to actual needs, for example one hour, one day, or 7 days. For example, suppose the characteristic dimension includes "city & product code & order scene & payment method", and the business call data have multiple kinds of parameter data under the "payment method" dimension, one of which is an error message and is therefore the abnormal parameter data. If there are S business call data to be analyzed in total and W of them contain the abnormal parameter data, the abnormal parameter data corresponding to the characteristic dimension occur W times in the S data. In this case, the occurrence frequency of the abnormal parameter data can be W, or alternatively W/S.
The third preset threshold can be obtained from the specific business scenario and repeated tests, and serves as the threshold for how often the abnormal parameter data occur under normal circumstances. It can be understood that different groups of characteristic dimensions, that is, characteristic dimensions extracted from different class clusters, can have different third preset thresholds. For example, if the abnormal parameter data under a characteristic dimension occur at most 100 times per day under normal circumstances, the third preset threshold for that characteristic dimension can be set to 10 times the highest occurrence frequency, that is, 1000 times. If on some day the occurrence frequency of the abnormal parameter data under that characteristic dimension exceeds 1000, the business call data containing the abnormal parameter data are identified as abnormal.
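The first detection approach can be sketched as below, with each business call data again represented as a dict. The function name, the `use_ratio` switch between the count W and the ratio W/S, and the sample values such as "ERROR" are assumptions made for illustration.

```python
def detect_abnormal_calls(calls, dimension, abnormal_value, threshold, use_ratio=False):
    """Return the calls containing the abnormal parameter data if their
    occurrence frequency exceeds the threshold, otherwise an empty list."""
    matching = [call for call in calls if call.get(dimension) == abnormal_value]
    if use_ratio and calls:
        frequency = len(matching) / len(calls)  # W / S
    else:
        frequency = len(matching)  # W
    return matching if frequency > threshold else []


# S = 5000 calls in one day, W = 1200 of them carry an error in "payment_method".
calls = [{"payment_method": "ERROR"}] * 1200 + [{"payment_method": "card"}] * 3800

# Third preset threshold of 1000 occurrences per day, as in the text's example.
abnormal = detect_abnormal_calls(calls, "payment_method", "ERROR", threshold=1000)
```

With 1200 occurrences against a threshold of 1000, all 1200 matching calls are returned as abnormal call data; with a threshold of 2000, nothing would be flagged.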
Second, after the characteristic dimensions under the business scenario are obtained through steps S200 to S204, a normal instance library can be built for each group of characteristic dimensions by means of labeling. That is, the following process is performed for each group of characteristic dimensions: a normal instance library is built in advance, and the parameter data labeled as normal among the parameter data corresponding to the characteristic dimension are added to it. For example, if the characteristic dimension includes "city & product code & order scene & payment method", the specific cities, product codes, order scenes, and payment methods labeled as normal are added to the normal instance library. After the normal instance library corresponding to each group of characteristic dimensions under the business scenario has been built, when a group of characteristic dimensions is used to judge whether the business call data to be analyzed are abnormal, it can be judged whether the parameter data of the business call data to be analyzed under that characteristic dimension are in the corresponding normal instance library. If not, the business call data are determined to be abnormal and the abnormality is located at that characteristic dimension; otherwise, no abnormality exists.
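The normal instance library approach can be sketched with a Python set. The sketch stores each normal instance as one tuple of values under the group of characteristic dimensions; that storage format, the function names, and the sample values are assumptions, since the text does not specify how the library is organized.

```python
def build_normal_library(labeled_calls, feature_dims):
    """Collect the value tuples of calls labeled as normal.

    labeled_calls: iterable of (call, is_normal) pairs, where call is a dict.
    """
    return {
        tuple(call[d] for d in feature_dims)
        for call, is_normal in labeled_calls
        if is_normal
    }


def is_abnormal(call, feature_dims, library):
    """A call is abnormal when its values under the characteristic
    dimensions are not found in the normal instance library."""
    return tuple(call[d] for d in feature_dims) not in library


feature_dims = ("city", "product_code", "order_scene", "payment_method")
labeled = [
    ({"city": "A", "product_code": "P1", "order_scene": "S1", "payment_method": "card"}, True),
    ({"city": "A", "product_code": "P1", "order_scene": "S1", "payment_method": "ERROR"}, False),
]
library = build_normal_library(labeled, feature_dims)

known = {"city": "A", "product_code": "P1", "order_scene": "S1", "payment_method": "card"}
unknown = {"city": "Z", "product_code": "P1", "order_scene": "S1", "payment_method": "card"}
```

A set lookup keeps the check constant-time per call, which suits monitoring at the level of characteristic dimensions rather than individual logs.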
The business data processing method provided by the embodiments of this specification uses a pre-trained clustering algorithm to learn from the business call data under a preset business scenario and obtains characteristic dimensions that characterize the unique call form of similar business call data. Compared with extracting characteristic dimensions based on experience, this helps improve the accuracy and efficiency of characteristic dimension extraction, avoids spending a large amount of time repeatedly adjusting and verifying characteristic dimensions, and reduces the system's resource consumption for characteristic dimension extraction. The method is also applicable to characteristic dimension extraction for newly added business scenarios and is therefore highly scalable.
Further, the determined characteristic dimensions can be used to monitor abnormal calls under the preset business scenario. Monitoring abnormal calls of the business system at the level of characteristic dimensions, rather than analyzing call logs one by one, can save a large amount of time and system resources. In addition, the location of an abnormality can be pinned down to a characteristic dimension, which helps narrow the scope of troubleshooting and reduces the system's resource consumption for locating abnormalities.
In a second aspect, based on the same inventive concept as the business data processing method provided by the embodiment of the first aspect, the embodiments of this specification further provide a business data processing apparatus. Referring to Fig. 3, the business data processing apparatus 30 includes:
a data acquisition module 31, configured to obtain multiple business call data under a preset business scenario, each business call data including parameter data of multiple dimensions;
a clustering module 32, configured to cluster the multiple business call data based on a pre-trained clustering algorithm to obtain a target clustering result; and
a dimension determining module 33, configured to, for each class cluster in the target clustering result, determine the characteristic dimension of the class cluster from the multiple dimensions based on the parameter data of each business call data in the class cluster under each dimension, wherein the characteristic dimension of each class cluster is used to monitor abnormal calls under the preset business scenario.
As an optional embodiment, the clustering algorithm is an algorithm that clusters based on the distance between business call data, and the distance between any two business call data is obtained through the following steps:
detecting whether the parameter data of the two business call data under the same dimension are identical, to obtain a detection result for each dimension; and
obtaining the distance between the two business call data based on the detection results of the dimensions.
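One possible way to turn the per-dimension detection results into a distance is sketched below. The text above does not state how the results are combined; using the fraction of dimensions whose parameter data differ (a normalized Hamming distance) is an assumption made for this example, as are the function name and the sample values.

```python
def call_distance(call_a, call_b, dimensions):
    """Fraction of dimensions under which the two calls differ (0.0 to 1.0)."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    # Detection result per dimension: True when the parameter data are identical.
    results = [call_a[d] == call_b[d] for d in dimensions]
    return results.count(False) / len(dimensions)


dimensions = ("city", "product_code", "order_scene", "payment_method")
a = {"city": "A", "product_code": "P1", "order_scene": "S1", "payment_method": "card"}
b = {"city": "A", "product_code": "P1", "order_scene": "S2", "payment_method": "balance"}

d_ab = call_distance(a, b, dimensions)  # the calls differ in 2 of 4 dimensions
```

Because the distance relies only on equality checks, it works for categorical parameter data such as cities or product codes without any numeric encoding.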
As an optional embodiment, the business data processing apparatus 30 further includes a training module. The training module includes:
a sample acquisition submodule, configured to obtain a training sample set, the training sample set including multiple business call data samples;
a clustering submodule, configured to cluster the training sample set based on a preset clustering algorithm; and
a parameter adjusting submodule, configured to, when the clustering result does not meet a preset aggregation condition, adjust the configuration parameters of the preset clustering algorithm according to a preset rule until the clustering result obtained by the clustering submodule meets the preset aggregation condition, thereby obtaining the trained clustering algorithm.
As an optional embodiment, the clustering submodule is configured to:
obtain the distance between any two business call data samples in the training sample set;
determine all core objects in the training sample set based on preset configuration parameters and the distance between any two business call data samples; and
obtain the directly density-reachable samples of each core object in the training sample set, and obtain the clustering result based on the directly density-reachable samples of each core object.
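The terms "core object" and "directly density-reachable" suggest a density-based method in the style of DBSCAN; that reading is an assumption, and the sketch below is a plain textbook version rather than the implementation described here. The parameter names `eps` and `min_samples` stand in for the unspecified configuration parameters, and the numeric sample positions are only a stand-in for business call data.

```python
def cluster_by_density(distance, eps, min_samples):
    """Density-based clustering on a precomputed symmetric distance matrix.

    Returns one label per sample; -1 marks samples left outside every class cluster.
    """
    n = len(distance)
    # Directly density-reachable candidates: samples within eps (including itself).
    neighbors = [[j for j in range(n) if distance[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a new class cluster outward from an unassigned core object.
        labels[i] = cluster_id
        pending = [i]
        while pending:
            p = pending.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if is_core[q]:  # only core objects keep expanding
                        pending.append(q)
        cluster_id += 1
    return labels


# Stand-in data: positions on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 12, 50]
distance = [[abs(p - q) for q in points] for p in points]
labels = cluster_by_density(distance, eps=1.5, min_samples=2)
```

The two dense groups become class clusters 0 and 1, while the isolated sample at 50 keeps the label -1.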
As an optional embodiment, the parameter adjusting submodule is configured to:
obtain the proportion of the number of business call data samples in the clustering result to the total number of samples in the training sample set; and
when the proportion is less than a first preset threshold, adjust the configuration parameters according to the preset rule and cluster the training sample set based on the clustering algorithm with the adjusted configuration parameters, until the proportion is greater than or equal to the first preset threshold, thereby obtaining the trained clustering algorithm.
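The adjustment loop can be sketched as follows. The preset rule is not specified above, so enlarging a neighborhood radius `eps` by a fixed `step` is a hypothetical rule chosen for the example; `toy_cluster` is likewise only a stand-in that marks a sample as clustered (label 0) when enough samples lie within `eps`, and is not a real clustering algorithm.

```python
def toy_cluster(samples, eps, min_samples):
    """Stand-in clusterer: label 0 if a sample has at least min_samples
    samples within eps (itself included), otherwise -1."""
    labels = []
    for p in samples:
        close = sum(1 for q in samples if abs(p - q) <= eps)
        labels.append(0 if close >= min_samples else -1)
    return labels


def tune_until_covered(cluster_fn, samples, eps, min_samples,
                       first_threshold, step=0.5, max_rounds=50):
    """Adjust eps until the share of clustered samples reaches first_threshold."""
    for _ in range(max_rounds):
        labels = cluster_fn(samples, eps, min_samples)
        # Proportion of samples that ended up in some class cluster.
        ratio = sum(1 for label in labels if label != -1) / len(labels)
        if ratio >= first_threshold:
            return eps, labels, ratio
        eps += step  # hypothetical preset rule: widen the neighborhood
    raise RuntimeError("threshold not reached within max_rounds")


samples = [0, 1, 2, 3, 10]
eps, labels, ratio = tune_until_covered(
    toy_cluster, samples, eps=0.5, min_samples=2, first_threshold=0.8
)
```

The `max_rounds` guard is an addition for safety, so that a threshold that can never be reached does not cause an endless loop.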
As an optional embodiment, the dimension determining module 33 includes:
a first determining submodule 331, configured to determine, from the multiple dimensions, the dimension combinations whose coverage rate exceeds a second preset threshold by comparing the parameter data of each business call data in the class cluster under each dimension, wherein the coverage rate of a dimension combination characterizes the proportion, within the class cluster, of business call data that have identical parameter data under the dimension combination, and a dimension combination includes one or more of the multiple dimensions; and
a second determining submodule 332, configured to obtain the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold.
As an optional embodiment, the second determining submodule 332 is configured to take, among the dimension combinations whose coverage rate exceeds the second preset threshold, the dimension combination containing the most dimensions as the characteristic dimension.
As an optional embodiment, the business data processing apparatus 30 further includes an abnormality identification module, configured to identify, based on the characteristic dimension of each class cluster, abnormal call data in the business call data to be analyzed under the preset business scenario.
It should be noted that, for the business data processing apparatus 30 provided by the embodiments of this specification, the specific manner in which each module performs its operations has been described in detail in the method embodiment provided in the first aspect and will not be elaborated here.
In a third aspect, based on the same inventive concept as the business data processing method provided by the foregoing embodiments, the embodiments of this specification further provide an electronic device. As shown in Fig. 4, the electronic device includes a memory 404, one or more processors 402, and a computer program stored on the memory 404 and executable on the processor 402; when the processor 402 executes the program, the steps of the business data processing method provided in the first aspect are implemented.
In Fig. 4, a bus architecture is represented by bus 400. Bus 400 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors represented by processor 402 and memory represented by memory 404. Bus 400 can also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further here. Bus interface 405 provides an interface between bus 400 and receiver 401 and transmitter 403. Receiver 401 and transmitter 403 may be the same element, namely a transceiver, providing a unit for communicating with various other apparatuses over a transmission medium. Processor 402 is responsible for managing bus 400 and general processing, and memory 404 can be used to store data used by processor 402 when performing operations.
It can be understood that the structure shown in Fig. 4 is only illustrative. The electronic device provided by the embodiments of this specification may include more or fewer components than shown in Fig. 4, or have a configuration different from that shown in Fig. 4. Each component shown in Fig. 4 can be implemented in hardware, software, or a combination thereof.
In a fourth aspect, based on the same inventive concept as the business data processing method provided in the foregoing embodiments, the embodiments of this specification further provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the business data processing method provided in the first aspect are implemented.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of this specification.
Obviously, those skilled in the art can make various changes and variations to this specification without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this specification and their equivalent technologies, this specification is also intended to include them.