CN110457175A - Business data processing method, device, electronic equipment and medium - Google Patents

Business data processing method, device, electronic equipment and medium

Info

Publication number
CN110457175A
Authority
CN
China
Prior art keywords
dimension
service data
calling service
cluster
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910609962.XA
Other languages
Chinese (zh)
Other versions
CN110457175B (en
Inventor
赵孝松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910609962.XA
Publication of CN110457175A
Application granted
Publication of CN110457175B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a business data processing method. Multiple pieces of business call data under a preset business scenario are clustered by a pre-trained clustering algorithm to obtain a target clustering result. Then, for each cluster in the target clustering result, the feature dimensions of that cluster are determined from the multiple dimensions of the business call data, based on the parameter data of each piece of business call data in the cluster under each dimension. Abnormal calls under the business scenario can then be monitored at the level of feature dimensions, which helps save a large amount of time and system resources.

Description

Business data processing method, device, electronic equipment and medium
Technical field
The embodiments of this specification relate to the field of Internet technology, and in particular to a business data processing method, device, electronic equipment and medium.
Background
With the development of Internet technology, business systems that provide users with services such as shopping, payment and leasing have emerged. To keep a business system running normally and to guarantee service quality, the business processing of the system needs to be monitored, so that an alert can be raised when an abnormal call occurs and the abnormal location can be repaired in time. However, as the types of business and the number of users grow rapidly, business processing log records also multiply. Monitoring abnormal calls by inspecting and analyzing the log records of the business system one by one means processing a very large volume of data, which consumes a large amount of time and system resources.
Summary of the invention
The embodiments of this specification provide a business data processing method, device, electronic equipment and medium.
In a first aspect, an embodiment of this specification provides a business data processing method, comprising: obtaining multiple pieces of business call data under a preset business scenario, each piece of business call data including parameter data of multiple dimensions; clustering the multiple pieces of business call data based on a pre-trained clustering algorithm to obtain a target clustering result; and, for each cluster in the target clustering result, determining the feature dimensions of the cluster from the multiple dimensions based on the parameter data of each piece of business call data in the cluster under each dimension, wherein the feature dimensions of each cluster are used to monitor abnormal calls under the preset business scenario.
In a second aspect, an embodiment of this specification provides a business data processing device, comprising: a data acquisition module, configured to obtain multiple pieces of business call data under a preset business scenario, each piece of business call data including parameter data of multiple dimensions; a clustering module, configured to cluster the multiple pieces of business call data based on a pre-trained clustering algorithm to obtain a target clustering result; and a dimension determining module, configured to determine, for each cluster in the target clustering result, the feature dimensions of the cluster from the multiple dimensions based on the parameter data of each piece of business call data in the cluster under each dimension, wherein the feature dimensions of each cluster are used to monitor abnormal calls under the preset business scenario.
In a third aspect, an embodiment of this specification provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the business data processing method provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the business data processing method provided in the first aspect.
The embodiments of this specification have the following beneficial effects:
In the business data processing method provided by the embodiments of this specification, multiple pieces of business call data under a preset business scenario are clustered by a pre-trained clustering algorithm to obtain a target clustering result. Then, for each cluster in the target clustering result, the feature dimensions of that cluster are determined from the multiple dimensions of the business call data, based on the parameter data of each piece of business call data in the cluster under each dimension, so that the feature dimensions of every cluster are obtained. Using a clustering algorithm to obtain the feature dimensions that characterize the unique call form of similar business call data, rather than extracting them from experience, helps improve the accuracy and efficiency of feature dimension extraction. There is no need to spend a large amount of time repeatedly adjusting and verifying the feature dimensions, which reduces the system resources consumed by feature dimension extraction.
Furthermore, abnormal calls under the preset business scenario can be monitored according to the feature dimensions of each cluster. Monitoring abnormal calls of the business system at the level of feature dimensions in this way saves a large amount of time and system resources compared with analyzing call logs one by one.
Brief description of the drawings
Fig. 1 is a schematic diagram of an operating environment applicable to the embodiments of this specification;
Fig. 2 is a flowchart of a business data processing method provided in the first aspect of the embodiments of this specification;
Fig. 3 is a module block diagram of a business data processing device provided in the second aspect of the embodiments of this specification;
Fig. 4 is a schematic structural diagram of an electronic device provided in the third aspect of the embodiments of this specification.
Detailed description
Existing business call monitoring methods need to inspect and analyze the log records of the business system one by one. The volume of data to process is very large, a great deal of time and resources is consumed, and timely abnormality alerts are hindered. In view of this, the embodiments of this specification obtain multiple pieces of business call data under a preset business scenario, each piece including parameter data of multiple dimensions, and cluster them with a pre-trained clustering algorithm to obtain a target clustering result. Then, for each cluster in the target clustering result, the feature dimensions of that cluster are determined from the multiple dimensions based on the parameter data of each piece of business call data in the cluster under each dimension. The feature dimensions determined for each cluster can serve as a business call instance under the business scenario and are used to monitor abnormal calls under that scenario.
In the embodiments of this specification, the business call data collected by the business system is learned by a clustering algorithm, and the feature dimensions that characterize the unique call form of similar business call data, i.e. the business call instance, are extracted. The business processing of the business system can then be monitored according to the feature dimensions, and the location where an abnormal call occurs can also be pinpointed according to the feature dimensions.
For example, in one application scenario, the feature dimensions extracted for a certain cluster obtained by the clustering algorithm include dimension W1, dimension W2 and dimension W3. In this case, monitoring abnormal calls according to these feature dimensions, namely W1, W2 and W3, may proceed as follows:
Multiple groups of normal business call data under the business scenario are obtained in advance, each group containing multiple pieces of business call data. For each group, the parameter distribution data corresponding to dimension W1, to dimension W2 and to dimension W3 is analyzed separately. For example, suppose dimension W3 corresponds to the payment method, and the parameter data of dimension W3 includes payment method 1, payment method 2 and payment method 3. In a given group of normal business call data, a pieces correspond to payment method 1, b pieces to payment method 2, and c pieces to payment method 3. The share of each payment method's business call data in that group is calculated, and the shares of the payment methods are taken as the parameter distribution data corresponding to dimension W3.
The distribution data of each kind of parameter data of the feature dimensions is collected over all groups of normal business call data to obtain the normal share threshold range corresponding to the feature dimensions. If, in the business call data to be analyzed under the business scenario, the parameter distribution data corresponding to a feature dimension does not fall within the normal share threshold range, it can be determined that the business call data under the business scenario is abnormal in that feature dimension.
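The monitoring flow described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record layout (a dict per call), the function names, and the way the normal range is derived (the minimum and maximum share seen across the normal groups, widened by a `margin`) are all assumptions, since the text does not say how the threshold range is computed.

```python
from collections import Counter

def value_shares(calls, dimension):
    """Share of each parameter value of `dimension` within one group of calls."""
    counts = Counter(call[dimension] for call in calls)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def normal_share_ranges(normal_groups, dimension, margin=0.05):
    """Per-value (low, high) share range across the normal groups.

    Assumption: the range is [min share - margin, max share + margin],
    clipped to [0, 1]. The source only says a normal range is derived.
    """
    per_group = [value_shares(group, dimension) for group in normal_groups]
    values = set().union(*per_group)
    ranges = {}
    for value in values:
        shares = [g.get(value, 0.0) for g in per_group]
        ranges[value] = (max(0.0, min(shares) - margin),
                         min(1.0, max(shares) + margin))
    return ranges

def abnormal_values(calls, dimension, ranges):
    """Values whose share in `calls` lies outside the normal range.

    A value never seen in the normal groups gets the range (0, 0),
    so any occurrence of it is flagged.
    """
    shares = value_shares(calls, dimension)
    flagged = []
    for value in set(shares) | set(ranges):
        low, high = ranges.get(value, (0.0, 0.0))
        if not (low <= shares.get(value, 0.0) <= high):
            flagged.append(value)
    return sorted(flagged)
```

A non-empty result from `abnormal_values` would correspond to the business call data being judged abnormal in that feature dimension, and the flagged values indicate where to start troubleshooting.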
Compared with analyzing call logs one by one, monitoring abnormal calls of the business system with the feature dimensions obtained by the business data processing method of the embodiments of this specification analyzes business call data for abnormalities at the level of feature dimensions and can save a large amount of time and system resources. Moreover, the abnormal location can be further narrowed down to a feature dimension, which helps reduce the scope of troubleshooting and the resource consumption of the computer.
Fig. 1 shows a schematic diagram of an operating environment to which the business data processing method provided by the embodiments of this specification is applicable. As shown in Fig. 1, the method can be applied to a system architecture that includes multiple servers. Some of the servers can act as the business system and carry out the actual business processing. The others can act as monitoring servers that execute the business data processing method provided by the embodiments of this specification, abstracting similar business call data of the business system into business call instances, which are then used to monitor the business processing of the business system for abnormalities.
A server may be an electronic device with data computation, storage and network interaction capabilities, or software that runs on such a device and supports data processing, storage and network interaction. This embodiment does not specifically limit the number of servers: the server may be a single server, several servers, or a server cluster formed by several servers.
To aid understanding of the business data processing method provided by the embodiments of this specification, the technical solutions of the embodiments are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of this specification and the specific features in them are detailed descriptions of the technical solutions, not limitations on them; where no conflict arises, the technical features of the embodiments can be combined with one another. It should be noted that, in the embodiments of this specification, "multiple" means "two or more", and "two or more" covers both exactly two and more than two.
In a first aspect, an embodiment of this specification provides a business data processing method, which can be executed by the monitoring server described above. As shown in Fig. 2, the method may include at least the following steps S200 to S204.
Step S200: obtain multiple pieces of business call data under a preset business scenario, each piece of business call data including parameter data of multiple dimensions.
In this embodiment, the business system can expose multiple interfaces, and the corresponding business is executed by calling these interfaces. The interfaces of the business system continuously receive processing requests initiated by callers in order to execute the corresponding business, but the business call data may differ each time an interface is called. Business call data can be understood as the parameter data involved in the call process of business data processing, including request parameters, return parameters, and data describing the specific call flow during the call. With such business call data, the specific call process involved in business data processing can be reconstructed fairly completely and clearly.
Each piece of business call data includes parameter data of multiple dimensions. For example, business call data may include: the interface, interface request parameters, interface return parameters, request volume, the directed acyclic structure of the system's internal nodes, the upstream and downstream systems of the call, and the deployment unit. The directed acyclic structure of the system's internal nodes can be understood as the order in which the calling and called functional modules are invoked in the call flow. An upstream system can be understood as the caller in any single call within the call process, and a downstream system as the callee in any single call. The deployment unit refers to the structural unit or functional module in which the calling and called functional modules of the call flow are deployed.
It should be noted that the dimensions listed above serve only to illustrate the embodiments of this specification. In a specific implementation, other kinds of business call parameters may also be introduced as dimensions of the business call data as circumstances require; this specification places no limitation on this.
In a practical application scenario, the interface call logs of the business system can be collected and parsed to obtain the business call data of each call under the corresponding business scenario. Specifically, a time period can be configured in advance, and the business call data within that preset period is obtained as the sample data for extracting feature dimensions. The preset period can be set according to the application scenario and processing requirements; for example, it can be a recent period such as the previous hour, the previous 10 minutes or the previous minute. The embodiments of this specification do not limit this.
In this embodiment, the preset business scenario may be a single business scenario or may include multiple different business scenarios. A business scenario can be understood as the business processing that a call process corresponds to. In one application scenario, the business system distinguishes business scenarios by application name (Appname) and interface: calls with different application names and/or different interfaces belong to different business scenarios. The call processes of different business scenarios correspond to different business processing, so the dimensions contained in their business call data also differ.
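As a small illustration of distinguishing scenarios by application name and interface, the sketch below groups call records by that pair. The dict-based record and the field names `app_name` and `interface` are hypothetical; the source names only the concepts (Appname and interface), not a data format.

```python
from collections import defaultdict

def group_by_scenario(calls):
    """Group call records by (app_name, interface).

    Each distinct pair is treated as one business scenario, so records
    in the same group are expected to share the same set of dimensions.
    """
    scenarios = defaultdict(list)
    for call in calls:
        scenarios[(call["app_name"], call["interface"])].append(call)
    return dict(scenarios)
```

Grouping first keeps every later step (distance calculation, clustering, training) within a single scenario, where all records have comparable dimensions.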
For example, the business system of a certain network platform includes two business scenarios. The application name of one scenario is A1 and its interface is B1; the application name of the other scenario is A2 and its interface is also B1. Table 1 shows M pieces of business call data of application A1 at interface B1, each including q dimensions. In each piece of business call data under this scenario, X_ij denotes the parameter data under the corresponding dimension, where i is an integer from 1 to q and j is an integer from 1 to M. For example, if a dimension is "city", the corresponding parameter data is the city name or a feature code representing the city name. Similarly, Table 2 shows N pieces of business call data of application A2 at interface B1, each including p dimensions. In each piece of business call data under this scenario, Y_gh denotes the parameter data under the corresponding dimension, where g is an integer from 1 to p and h is an integer from 1 to N. It can be understood that, between two different business scenarios, at least one of the dimensions contained in the business call data differs, and the numbers of dimensions, q and p, may be equal or different.
Table 1
Table 2
Step S202: cluster the multiple pieces of business call data based on a pre-trained clustering algorithm to obtain a target clustering result.
It can be understood that a clustering algorithm groups samples based on the similarity between them. In one embodiment of this specification, the pre-trained clustering algorithm may cluster based on the distance between samples. In that case, the similarity between the pieces of business call data to be learned is characterized by the distance between them: the smaller the distance, the higher the similarity.
Each dimension of the business call data is either a parameter involved in the business call process, such as a request parameter or return parameter, or part of the call structure, such as the upstream and downstream systems or the deployment unit. The distance between two pieces of business call data therefore has to be calculated by comparing, dimension by dimension, whether their parameter data under the same dimension is identical. In an optional implementation, while the trained clustering algorithm clusters the business call data obtained in step S200, the distance between any two pieces of business call data can be obtained as follows: detect whether the parameter data of the two pieces under the same dimension is identical, obtaining a detection result for each dimension; then obtain the distance between the two pieces based on the detection results of all dimensions.
It can be understood that business call data under the same business scenario contains the same dimensions. When calculating the distance between two pieces of business call data, it must be determined for each dimension whether their parameter data is identical. For instance, in a certain business scenario, one dimension of the business call data is the city in which the user initiated the call; if the parameter data of both pieces under that dimension is "Chengdu", their parameter data under that dimension is identical.
Specifically, obtaining the distance between two pieces of business call data based on the detection results of each dimension may include: obtaining the total number of dimensions contained in the business call data under the business scenario; obtaining, from the detection results of each dimension, the number of dimensions in which the parameter data of the two pieces differs; and taking the ratio of that number to the total number of dimensions as the distance between the two pieces. The distance is a value greater than or equal to 0 and less than or equal to 1. Of course, in other embodiments of this specification, the distance between any two pieces of business call data may also be calculated in other ways; no limitation is imposed here.
In another implementation, the similarity between pieces of business call data can be characterized directly by a similarity value: the larger the value, the more similar the two pieces are. In this case, the number of dimensions in which the parameter data of the two pieces is identical is obtained from the detection results, and the ratio of that number to the total number of dimensions is taken as the similarity between the two pieces. The similarity is likewise a value greater than or equal to 0 and less than or equal to 1.
For example, suppose the business call data under the business scenario includes parameter data of 10 dimensions. When the similarity or distance between two pieces of business call data is calculated, if their parameter data is identical in 3 dimensions and different in 7, the similarity between them is 3/10 = 0.3 and the distance is 0.7.
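Written out as formulas, the distance and similarity described above could be expressed as follows. The notation is an assumption introduced here for illustration: x and y are two pieces of business call data under the same scenario, n is the total number of dimensions, x_k is the parameter data of x under dimension k, and \mathbf{1}[\cdot] is the indicator function.

```latex
d(x, y) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}[x_k \neq y_k],
\qquad
s(x, y) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}[x_k = y_k] = 1 - d(x, y),
\qquad
0 \le d(x, y),\, s(x, y) \le 1 .
```

With n = 10 and 3 matching dimensions, this gives s = 3/10 = 0.3 and d = 7/10 = 0.7, in agreement with the example. This ratio is the same quantity as the normalized Hamming distance between the two records.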
In the above process, to make it easy to record in which dimensions the parameter data of two pieces of business call data is identical and in which it differs, the detection result can be represented by a preset characteristic value. For example, in one application scenario, if the parameter data of two pieces of business call data is identical in some dimension (for instance, the upstream system is the same), the detection result for that dimension is recorded as 1; otherwise it is recorded as 0. The number of dimensions whose detection result is 1 is then the number of dimensions with identical parameter data, and the number of dimensions whose detection result is 0 is the number of dimensions with different parameter data.
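A minimal sketch of the 0/1 detection results and the distance and similarity derived from them is given below. The dict-based records and the function names are illustrative assumptions, not part of the source.

```python
def match_vector(a, b, dimensions):
    """Per-dimension detection result: 1 if the parameter data matches, else 0."""
    return [1 if a[d] == b[d] else 0 for d in dimensions]

def distance(a, b, dimensions):
    """Share of dimensions whose parameter data differs (a value in [0, 1])."""
    flags = match_vector(a, b, dimensions)
    return flags.count(0) / len(flags)

def similarity(a, b, dimensions):
    """Share of dimensions whose parameter data is identical (a value in [0, 1])."""
    flags = match_vector(a, b, dimensions)
    return flags.count(1) / len(flags)
```

Because the comparison is plain equality, the same functions work for categorical values such as city names, upstream system identifiers, or serialized call structures, without any numeric encoding.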
Specifically, in step S202, a density-based clustering algorithm can be used, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise). This algorithm divides regions of sufficient density into clusters and can discover clusters of arbitrary shape in spatial databases that contain noise. Of course, in other embodiments of this specification, other clustering algorithms such as k-means, hierarchical clustering or manifold clustering can also be used.
In addition, the trained clustering algorithm must be obtained before step S202 is executed. In this embodiment, training the clustering algorithm may specifically include a parameter training process, which may include the following steps S300 and S302.
Step S300: obtain a training sample set, the training sample set including multiple business call data samples.
It should be noted that when the algorithm is trained for one specific business scenario, the business call data samples in the training sample set are collected from that scenario, and all samples in the set contain the same dimensions. When the algorithm is trained for multiple specific business scenarios at the same time, the samples are collected from each of those scenarios. In that case, the samples collected from the same business scenario can be placed in one subset, so that samples within a subset contain the same dimensions. Training can then be performed separately on each subset to obtain a trained clustering algorithm for each business scenario.
Step S302: cluster the training sample set based on a preset clustering algorithm; when the clustering result does not meet a preset aggregation condition, adjust the configuration parameters of the preset clustering algorithm according to preset rules until the clustering result meets the preset aggregation condition, thereby obtaining the trained clustering algorithm.
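The adjust-and-retry loop of step S302 can be sketched as below. The passage does not define the aggregation condition or the adjustment rules, so both are passed in as caller-supplied functions; `cluster_fn`, `meets_condition`, `adjust` and `max_rounds` are hypothetical names introduced only for this illustration.

```python
def train_clustering(samples, cluster_fn, meets_condition, radius, min_pts,
                     adjust, max_rounds=50):
    """Re-cluster with adjusted parameters until the result is acceptable.

    cluster_fn(samples, radius, min_pts) -> clustering result
    meets_condition(result) -> bool   (the preset aggregation condition)
    adjust(radius, min_pts) -> (radius, min_pts)   (the preset rule)
    Returns the accepted (radius, min_pts, result).
    """
    for _ in range(max_rounds):
        result = cluster_fn(samples, radius, min_pts)
        if meets_condition(result):
            return radius, min_pts, result
        radius, min_pts = adjust(radius, min_pts)
    raise RuntimeError("no parameter setting met the aggregation condition")
```

The `max_rounds` cap is a defensive choice of this sketch so that an unreachable condition ends in an error rather than an endless loop; the source does not mention such a limit.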
By taking DBSCAN algorithm as an example, when measuring the similitude between sample using above-mentioned distance, DBSCAN algorithm Configuration parameter includes: radius and minimum neighborhood points.It should be noted that in the mistake being trained to preset clustering algorithm Cheng Zhong, initial radius and minimum neighborhood points can be rule of thumb arranged.At this point, above-mentioned be based on preset clustering algorithm to institute It states training sample set to be clustered, the process for obtaining cluster result may include: to obtain training sample to concentrate any two business Call the distance between data sample;In turn, based on preset configuration parameter, that is, above-mentioned initial radius, above-mentioned minimum neighborhood point The distance between several and above-mentioned any two calling service data sample, determines that training sample concentrates all kernel objects; Obtaining training sample concentrates the direct density of each kernel object up to sample, and the direct density based on each kernel object can Up to sample, cluster result is obtained.
For example, in a kind of concrete application scene, it is assumed that radius is expressed as E, as kernel object in E neighborhood Minimum neighborhood points are MinPts.It should be noted that the region in given object radius E is known as the object in the present embodiment E neighborhood.
During clustering with the above DBSCAN algorithm, the process of detecting whether each business call data sample in the sample set is a core object may specifically include: traversing the business call data samples in the sample set and taking any one of them as the target sample; counting, among the other samples in the sample set, the samples whose distance to the target sample is less than or equal to the radius E, which gives the number of samples in the E-neighborhood of the target sample; when the number of samples in the E-neighborhood of the target sample is greater than or equal to MinPts, determining that the target sample is a core object; otherwise, the target sample is not a core object. The next business call data sample is then taken as the target sample, until the traversal is complete.
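The core-object traversal described above can be sketched as follows. This is a minimal illustration only: the distance used here (the number of dimensions whose parameter data differ) is an assumption of the sketch, and the names `mismatch_distance` and `find_core_objects` as well as the sample records are hypothetical.

```python
def mismatch_distance(a, b):
    """Assumed metric: number of dimensions whose parameter data differ."""
    return sum(1 for dim in a if a[dim] != b[dim])


def find_core_objects(samples, eps, min_pts, distance=mismatch_distance):
    """Return indices of samples with at least min_pts other samples within eps."""
    core = []
    for i, target in enumerate(samples):
        # Count the other samples that lie in the E-neighborhood of the target.
        neighbors = sum(
            1
            for j, other in enumerate(samples)
            if j != i and distance(target, other) <= eps
        )
        if neighbors >= min_pts:
            core.append(i)
    return core


samples = [
    {"city": "A", "product": "p1", "pay": "card"},
    {"city": "A", "product": "p1", "pay": "cash"},
    {"city": "A", "product": "p1", "pay": "card"},
    {"city": "B", "product": "p9", "pay": "coin"},
]
core_indices = find_core_objects(samples, eps=1, min_pts=2)
```

With these toy records the first three samples each have two neighbors within distance 1, while the last sample has none and is therefore not a core object.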
After all core objects in the sample set have been determined, all directly density-reachable samples in the E-neighborhood of each core object need to be determined; then, from the directly density-reachable samples in the E-neighborhoods of all core objects, the density-connected sample sets are found. Of course, this process involves merging some density-reachable samples. It should be noted that, given a sample set D with x_i and x_j both belonging to D, if x_i is in the E-neighborhood of x_j and x_j is a core object, then sample x_i is said to be directly density-reachable from sample x_j. Density-reachability is the transitive closure of direct density-reachability; this relation is asymmetric, and only core objects are mutually density-reachable. Density-connectivity, by contrast, is a symmetric relation. The purpose of the DBSCAN algorithm is to find the maximal sets of density-connected objects.
For example, the sample set contains 12 core objects, denoted P1 to P12. P2 is directly density-reachable from P1, P3 from P2, P4 from P3, P5 from P4, P6 from P5, and P7 from P6; P9 is directly density-reachable from P8, P10 from P9, P11 from P10, and P12 from P11. In this case, two density-connected sample sets are obtained: P1 to P7, together with the directly density-reachable objects of each core object in P1 to P7, are merged into one density-connected sample set, and P8 to P12, together with the directly density-reachable objects of each core object in P8 to P12, are merged into the other density-connected sample set. Each density-connected sample set is one cluster in the cluster result.
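The merging of directly density-reachable samples into density-connected sets can be sketched as a breadth-first expansion over core objects. The function names and the dictionary representation below are assumptions made for illustration; the example reproduces two chains of core objects, P1 to P7 and P8 to P12.

```python
from collections import deque


def density_connected_sets(reachable):
    """Merge samples into density-connected sets.

    `reachable` maps each core object to the samples that are directly
    density-reachable from it; non-core samples never appear as keys.
    """
    visited, clusters = set(), []
    for start in reachable:
        if start in visited:
            continue
        cluster, queue = set(), deque([start])
        visited.add(start)
        while queue:
            current = queue.popleft()
            cluster.add(current)
            # Only core objects (keys of `reachable`) are expanded further.
            for neighbor in reachable.get(current, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


def chain(first, last):
    """Adjacent core objects P<first>..P<last> are directly density-reachable."""
    links = {f"P{i}": set() for i in range(first, last + 1)}
    for i in range(first, last):
        links[f"P{i}"].add(f"P{i + 1}")
        links[f"P{i + 1}"].add(f"P{i}")
    return links


reachable = {**chain(1, 7), **chain(8, 12)}
clusters = density_connected_sets(reachable)
```

Because no object in P1 to P7 is reachable from any object in P8 to P12, the expansion yields two separate sets, one per cluster.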
During training, after each iteration produces a cluster result, it is necessary to judge whether the cluster result satisfies the preset aggregation condition. When it does not, the configuration parameters of the preset clustering algorithm are adjusted according to the preset rules, and the next iteration of the algorithm is run with the adjusted configuration parameters, until the cluster result satisfies the preset aggregation condition.
In this embodiment, the preset aggregation condition may be set according to actual application requirements. For example, it may be set as a requirement on the proportion of clustered business call data samples in the total number of samples; correspondingly, it may also be set as a requirement on the proportion of noise points in the cluster result, where the proportion of noise points is the proportion of business call data samples not included in any cluster in the total number of samples of the training sample set. As another example, in addition to the above requirement, the preset aggregation condition may also include a requirement on the number of clusters produced, such as judging whether the number of clusters in the cluster result falls within a preset range.
In one embodiment, the above parameter adjustment process may include: obtaining the proportion of the business call data samples in the cluster result in the total number of samples of the training sample set; when the proportion is less than a first preset threshold, adjusting the configuration parameters according to the preset rules and clustering the training sample set with the clustering algorithm using the adjusted configuration parameters, until the proportion is greater than or equal to the first preset threshold, at which point the trained clustering algorithm is obtained.
In the above implementation, the number of business call data samples in the cluster result is the sum of the numbers of samples contained in the clusters of the cluster result, and the resulting proportion reflects the degree of aggregation of the clustering algorithm under the corresponding configuration parameters. The first preset threshold may be set according to the actual application scenario and processing requirements, for example 0.9 or 0.95. For instance, suppose the training sample set contains 1000 samples and the clusters in the cluster result contain 910 samples in total; the proportion is then 910/1000 = 0.91, and if the first preset threshold is 0.9, the cluster result satisfies the preset aggregation condition.
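The proportion check in this example can be written as a short Python sketch. The function names and the split of the 910 clustered samples into three clusters are hypothetical; only the totals (910 of 1000) and the threshold 0.9 come from the text.

```python
def clustered_ratio(cluster_sizes, total_samples):
    """Proportion of samples that were assigned to some cluster."""
    return sum(cluster_sizes) / total_samples


def meets_aggregation_condition(cluster_sizes, total_samples, threshold=0.9):
    """True when the clustered proportion reaches the first preset threshold."""
    return clustered_ratio(cluster_sizes, total_samples) >= threshold


# Hypothetical cluster sizes that sum to the 910 clustered samples.
sizes = [400, 310, 200]
ratio = clustered_ratio(sizes, 1000)
ok = meets_aggregation_condition(sizes, 1000, threshold=0.9)
```

The remaining 90 samples in this example are noise points, so the noise proportion is 1 minus `ratio`.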
In addition, in the above training process, adjusting the configuration parameters according to the preset rules may specifically include: adjusting the radius and/or the minimum number of neighborhood points according to the preset rules until a target radius and a target minimum number of neighborhood points are obtained, such that the cluster result of the current iteration satisfies the preset aggregation condition. Specifically, the preset rules may be set according to the preset aggregation condition of the actual application scenario and repeated experiments. It can be understood that as the radius increases, each resulting cluster contains more business call data samples and the number of clusters correspondingly decreases, and vice versa; as MinPts decreases, more clusters can form, and vice versa. For example, a first step size and a second step size may be set, where the first step size is the adjustment step for the radius and the second step size is the adjustment step for the minimum number of neighborhood points. In the above implementation, when the proportion of the total number of business call data samples contained in the clusters of the cluster result in the total number of samples of the training sample set is less than the first preset threshold, the radius may be adjusted by the first step size and/or the minimum number of neighborhood points may be adjusted by the second step size, starting from the current radius and minimum number of neighborhood points, for example by reducing the radius and/or reducing the minimum number of neighborhood points; the specific adjustment rule is set according to the chosen preset aggregation condition and repeated experiments.
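The step-wise adjustment can be sketched as a loop around any clustering routine. Everything named here is an assumption of the sketch: `tune_parameters`, the convention that noise is labeled -1, the iteration cap, and the stand-in `toy_cluster` that replaces a real DBSCAN run. The step sizes are signed, so the same loop can express either direction of adjustment.

```python
def tune_parameters(cluster_fn, samples, eps, min_pts,
                    eps_step, min_pts_step, threshold=0.9, max_iter=50):
    """Adjust eps and min_pts until the clustered proportion reaches threshold.

    cluster_fn(samples, eps, min_pts) returns one label per sample and uses
    -1 for noise (a convention assumed for this sketch).
    """
    for _ in range(max_iter):
        labels = cluster_fn(samples, eps, min_pts)
        ratio = sum(1 for label in labels if label != -1) / len(labels)
        if ratio >= threshold:
            return eps, min_pts, ratio
        # Signed steps: the caller decides the direction of each adjustment.
        eps += eps_step
        min_pts = max(1, min_pts + min_pts_step)
    raise RuntimeError("aggregation condition not met within max_iter")


def toy_cluster(samples, eps, min_pts):
    """Stand-in clusterer: label 0 if enough neighbors lie within eps, else -1."""
    labels = []
    for i, x in enumerate(samples):
        near = sum(1 for j, y in enumerate(samples) if j != i and abs(x - y) <= eps)
        labels.append(0 if near >= min_pts else -1)
    return labels


points = [float(i) for i in range(10)]
eps, min_pts, ratio = tune_parameters(
    toy_cluster, points, eps=0.5, min_pts=2, eps_step=0.5, min_pts_step=0
)
```

In this toy run the radius is the only parameter that changes; the loop stops once every point has at least two neighbors inside the radius.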
It should be noted that, in other embodiments of this specification, the preset clustering algorithm may instead measure the similarity between two samples using the similarity described above. For example, the configuration parameters of the DBSCAN algorithm may be set to a similarity threshold and a minimum number of samples. Accordingly, after the similarity threshold and the minimum number of samples have been set according to the actual application scenario, when the number of samples whose similarity to the target sample is greater than or equal to the similarity threshold is greater than the minimum number of samples, the target sample is determined to be a core object, and the directly density-reachable samples of a core object are the samples whose similarity to that core object is greater than or equal to the similarity threshold.
Optionally, to ensure the stability of the resulting clustering algorithm, the process of training the clustering algorithm may further include an algorithm testing process after the parameter training process is complete.
As one implementation, the algorithm testing process may include: obtaining a test sample set within a preset test period, the test sample set also including multiple business call data samples; inputting the test sample set into the clustering algorithm obtained by training in step S302 to obtain a test cluster result; and judging whether the test cluster result satisfies a preset test condition, and when it does, determining that the clustering algorithm obtained by training in step S302 is the trained clustering algorithm.
In the above algorithm testing process, the test period may be set according to the actual application scenario and processing requirements. For example, the test period may be set to the one, two, or three days after parameter training is completed in step S302. In one embodiment, multiple specific time periods may be set within the test period. For example, when the test period is three days, 10:00 to 11:00, 17:00 to 18:00, and 21:00 to 22:00 on each of the three days may be set as specific time periods. In this case, obtaining the test sample set within the preset test period specifically means obtaining the business call data within the specific time periods of the preset test period to form the test sample set. Accordingly, the samples obtained in each specific time period of each day of the test period may be input in turn into the clustering algorithm obtained by training in step S302 to obtain the corresponding test cluster results, and it is then judged whether the test cluster results obtained within the test period satisfy the preset test condition.
Specifically, the preset test condition may be set according to the actual application scenario and processing requirements. For example, it may be judged whether the number of clusters in the test cluster result is consistent with the number of clusters in the cluster result that satisfied the preset aggregation condition in step S302; if consistent, the test cluster result is judged to satisfy the preset test condition, and if inconsistent, it is judged not to satisfy it. Alternatively, when multiple specific time periods are set in the preset test period, the degree of consistency of the test cluster results corresponding to the test samples in all specific time periods of the test period may be calculated; when the degree of consistency reaches a preset consistency condition, the clustering algorithm trained in step S302 meets the stability requirement, and it can be determined that the clustering algorithm obtained by training in step S302 is the trained clustering algorithm.
The degree of consistency may be determined from the distribution of the numbers of clusters in all test cluster results obtained within the test period. For example, the ratio of the largest number of test cluster results that share the same number of clusters to the total number of test cluster results may be used to characterize the degree of consistency of the test cluster results; when this ratio exceeds a preset consistency threshold, the degree of consistency is judged to reach the preset consistency condition. For instance, suppose 30 test cluster results are obtained within the test period, of which 28 contain 10 clusters and 2 contain 9 clusters; the degree of consistency of the test cluster results is then 28/30 = 0.933, and assuming the preset consistency threshold is 0.9, the clustering algorithm trained in step S302 meets the stability requirement.
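The consistency ratio in this example can be computed with a few lines of Python. The function name `consistency_degree` is hypothetical; the input is simply the number of clusters found in each test cluster result.

```python
from collections import Counter


def consistency_degree(cluster_counts):
    """Share of test results that agree on the most frequent number of clusters."""
    most_frequent = Counter(cluster_counts).most_common(1)[0][1]
    return most_frequent / len(cluster_counts)


# 28 test results with 10 clusters and 2 test results with 9 clusters.
counts = [10] * 28 + [9] * 2
degree = consistency_degree(counts)
is_stable = degree > 0.9  # 0.9 is the consistency threshold from the example
```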
It should be noted that when the test cluster result does not satisfy the preset test condition, the business call data contained in the test sample set may be added to the training sample set to form a new training sample set, the configuration parameters of the clustering algorithm are adjusted, and the above parameter training process and testing process are repeated until the adjusted configuration parameters make the cluster result of the clustering algorithm satisfy the preset aggregation condition and the test cluster result satisfy the preset test condition.
Further, the trained clustering algorithm can then be used to cluster the multiple business call data under the corresponding business scenario to obtain the target cluster result. Step S204 below is then executed on the target cluster result to extract the feature dimensions corresponding to each cluster, which serve as the business call instances under that business scenario.
Step S204: for each cluster in the target cluster result, determine the feature dimensions of the cluster from the multiple dimensions based on the parameter data of each business call data item in the cluster under each dimension, where the feature dimensions of each cluster are used to monitor abnormal calls under the preset business scenario.
The target cluster result obtained by the trained clustering algorithm includes two or more clusters, and each cluster is a set of business call data with a certain similarity. By comparing the parameter data of each business call data item in the set under each dimension, the feature dimensions that reflect the similarity of the business call data set can be obtained. These feature dimensions are the aggregation dimensions of the business call data set corresponding to the cluster.
In this embodiment, each cluster corresponds to one group of feature dimensions. For example, if the target cluster result includes 5 clusters, 5 groups of feature dimensions are obtained accordingly. One group of feature dimensions may include two or more dimensions. For example, in one application scenario, a group of feature dimensions includes four dimensions, namely "city & product code & order scene & payment method", where "&" denotes the combination of these four dimensions. It should be noted that in other embodiments of this specification, a group of feature dimensions may also consist of a single dimension.
In step S204, determining the feature dimensions of a cluster from the multiple dimensions means determining which aggregation dimensions yield the business call data set of that cluster. As an optional embodiment, the process of determining the feature dimensions of a cluster from the multiple dimensions may include: by comparing the parameter data of each business call data item in the cluster under each dimension, determining from the multiple dimensions the dimension combinations whose coverage exceeds a second preset threshold, where a dimension combination includes one or more of the multiple dimensions; and obtaining the feature dimensions based on the dimension combinations whose coverage exceeds the second preset threshold. The coverage of a dimension combination characterizes the proportion of business call data in the cluster that have identical parameter data under that dimension combination; it can be obtained by calculating the ratio of the number of business call data in the cluster that have identical parameter data under the dimension combination to the total number of business call data in the cluster.
Specifically, the second preset threshold may be set according to actual needs, for example 0.8 or 0.9. Taking a second preset threshold of 0.9 as an example, for each cluster it is first necessary to determine the dimension combinations whose coverage in that cluster exceeds 0.9, that is, combinations for which 90% or more of the business call data in the cluster have correspondingly identical parameter data under every dimension of the combination. Taking the dimension combination "city & product code & order scene & payment method" as an example, if the coverage of this combination in a cluster exceeds 0.9, then 90% or more of the business call data in that cluster contain the same city, the same product code, the same order scene, and the same payment method.
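A coverage calculation of this kind can be sketched in Python as follows. The sketch interprets "identical parameter data" as the most frequent value tuple within the cluster, which is an assumption; the function name `coverage`, the dimension names, and the records are hypothetical.

```python
from collections import Counter


def coverage(cluster, dims):
    """Share of records carrying the most common value tuple on `dims`."""
    values = Counter(tuple(record[d] for d in dims) for record in cluster)
    return values.most_common(1)[0][1] / len(cluster)


# Hypothetical cluster of 10 business call records.
cluster = (
    [{"city": "Hangzhou", "payment_method": "card", "device": "ios"}] * 9
    + [{"city": "Hangzhou", "payment_method": "cash", "device": "android"}]
)
city_cov = coverage(cluster, ("city",))
combo_cov = coverage(cluster, ("city", "payment_method"))
```

Here every record shares the same city, while nine of the ten records share the same city and payment method together.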
In one implementation of this embodiment, obtaining the feature dimensions based on the dimension combinations whose coverage exceeds the second preset threshold may include: comparing the number of dimensions in each dimension combination whose coverage exceeds the second preset threshold, and taking the dimensions of the combination with the most dimensions as the feature dimensions. For example, if 5 dimension combinations have coverage exceeding the second preset threshold and contain 1, 1, 2, 2, and 4 dimensions respectively, the combination containing 4 dimensions is taken as the feature dimensions. It should be noted that when, among the dimension combinations whose coverage exceeds the second preset threshold, two combinations contain the same number of dimensions and both have the largest number of dimensions, the dimensions of the combination with the higher coverage are taken as the feature dimensions; alternatively, the dimensions of either of the two combinations are taken as the feature dimensions.
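The selection rule (most dimensions first, higher coverage as the tie-breaker) can be sketched by enumerating dimension combinations. This is an illustrative sketch only: exhaustive enumeration grows exponentially with the number of dimensions, the strict `>` comparison follows the word "exceeds", and all names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coverage(cluster, dims):
    """Share of records carrying the most common value tuple on `dims`."""
    values = Counter(tuple(record[d] for d in dims) for record in cluster)
    return values.most_common(1)[0][1] / len(cluster)


def feature_dimensions(cluster, dimensions, threshold=0.9):
    """Pick the qualifying combination with the most dimensions.

    Ties on the number of dimensions are broken by higher coverage.
    """
    best = None  # (number of dimensions, coverage, combination)
    for size in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cov = coverage(cluster, dims)
            if cov > threshold:
                candidate = (size, cov, dims)
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
    return best[2] if best else None


dimensions = ("city", "product_code", "payment_method", "device")
# 19 of 20 records share city, product code, and payment method;
# the device value alternates and therefore never reaches the threshold.
cluster = [
    {"city": "Hangzhou", "product_code": "P01", "payment_method": "card",
     "device": "ios" if i % 2 == 0 else "android"}
    for i in range(19)
] + [{"city": "Beijing", "product_code": "P02", "payment_method": "cash",
      "device": "android"}]
selected = feature_dimensions(cluster, dimensions)
```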
In addition, in other embodiments of this specification, a key dimension set may be preset. The key dimension set includes multiple specified business call parameters, for example "city" and "payment method". From the dimension combinations whose coverage exceeds the second preset threshold and that correspond to at least one business call parameter in the key dimension set, the dimensions of the combination with the most dimensions are selected as the feature dimensions.
After the group of feature dimensions corresponding to each cluster is obtained through step S204, abnormal calls during business processing can be monitored according to each group of feature dimensions.
In an optional embodiment of this specification, after the feature dimensions of the clusters are determined from the multiple dimensions of the business call data, the business data processing method may further include an abnormal call monitoring step for monitoring abnormal calls during business processing. Specifically, the abnormal call monitoring step may include: identifying, based on the feature dimensions of each cluster, abnormal call data in the business call data to be analyzed under the preset business scenario. That is, after the business call data under a business scenario are clustered and the feature dimensions are extracted, the obtained feature dimensions can be used to identify whether the business call data to be analyzed under that business scenario are abnormal. Monitoring the business processing of the business system at the level of feature dimensions, that is, business call instances, helps improve monitoring efficiency and can save a large amount of time and system resources. Moreover, the location of an anomaly can be further narrowed down to a feature dimension, which helps narrow the scope of troubleshooting and reduces the resource consumption of the computer.
In this embodiment, there are many ways to use the obtained feature dimensions to identify whether the business call data to be analyzed under the preset business scenario are abnormal. Two of them are described below; of course, specific implementations are not limited to these two.
First, the following detection process may be executed for the feature dimensions of each cluster: determining the abnormal parameter data corresponding to the feature dimensions; obtaining the number of occurrences of the abnormal parameter data in the business call data to be analyzed, where the number of occurrences characterizes the quantity of business call data to be analyzed that contain the abnormal parameter data; and, when the number of occurrences exceeds a third preset threshold, identifying the business call data containing the abnormal parameter data as abnormal call data. Such abnormal calls may, for example, be caused by an attack from an underground-industry (black market) group, and identifying them allows the abnormal situation to be handled in time. Moreover, the location of the anomaly can be further narrowed down to a specific dimension among the feature dimensions, which helps narrow the scope of troubleshooting.
The business call data to be analyzed may be the business call data within a specified time period, which may be set according to actual needs, for example one hour, one day, or 7 days. For example, taking feature dimensions that include "city & product code & order scene & payment method", suppose the business call data have multiple kinds of parameter data under the "payment method" dimension, one of which is an error message and is therefore the abnormal parameter data. If there are S business call data items to be analyzed in total and W of them contain the abnormal parameter data, then the abnormal parameter data corresponding to the feature dimensions occur W times in the S items. In this case, the number of occurrences of the abnormal parameter data may be W, or alternatively W/S.
The third preset threshold may be obtained from the specific business scenario and repeated experiments; it is the threshold on the number of times the abnormal parameter data occur under normal conditions. It can be understood that different groups of feature dimensions, that is, feature dimensions extracted from different clusters, may be given different third preset thresholds. For example, if the abnormal parameter data under a certain group of feature dimensions occur at most 100 times per day under normal conditions, the third preset threshold for that group may be set to 10 times the maximum number of occurrences, that is, 1000. If on some day the abnormal parameter data under that group of feature dimensions occur more than 1000 times, the business call data containing the abnormal parameter data are identified as abnormal.
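The occurrence-count check can be sketched as follows. The function name `abnormal_calls`, the dimension name `payment_method`, and the value `"ERROR"` standing in for an error message are all hypothetical; the threshold of 1000 follows the example in the text.

```python
def abnormal_calls(calls, dimension, abnormal_value, threshold):
    """Return the calls carrying abnormal_value when their count exceeds threshold."""
    hits = [call for call in calls if call.get(dimension) == abnormal_value]
    return hits if len(hits) > threshold else []


normal_daily_max = 100
threshold = 10 * normal_daily_max  # 1000, as in the example above

# Hypothetical day of traffic: 1001 calls with an error in "payment_method".
calls = [{"payment_method": "ERROR"}] * 1001 + [{"payment_method": "card"}] * 5000
flagged = abnormal_calls(calls, "payment_method", "ERROR", threshold)
```

Using the ratio W/S instead of the raw count W would only require dividing `len(hits)` by `len(calls)` and choosing a fractional threshold.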
Second, after the feature dimensions under the business scenario are obtained through steps S200 to S204, a normal instance library may be built for each group of feature dimensions by labeling. That is, the following process is executed for each group of feature dimensions: a normal instance library is built in advance, and the parameter data labeled as normal among the parameter data corresponding to the feature dimensions are added to it. For example, if the feature dimensions include "city & product code & order scene & payment method", the specific combinations of city, product code, order scene, and payment method labeled as normal are added to the normal instance library. After the normal instance library for each group of feature dimensions under the business scenario has been built, when a group of feature dimensions under the business scenario is used to judge whether the business call data to be analyzed are abnormal, it can be judged whether the parameter data of the business call data under those feature dimensions are in the corresponding normal instance library. If not, the business call data are determined to be abnormal and the anomaly is located at those feature dimensions; otherwise, no anomaly exists.
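A normal instance library of this kind can be sketched as a set of value tuples over the feature dimensions. The identifiers `FEATURE_DIMS`, `build_normal_library`, and `is_abnormal`, and the sample records, are hypothetical.

```python
FEATURE_DIMS = ("city", "product_code", "order_scene", "payment_method")


def instance_key(call, dims=FEATURE_DIMS):
    """Parameter data of a call under the feature dimensions, as a tuple."""
    return tuple(call[d] for d in dims)


def build_normal_library(normal_calls, dims=FEATURE_DIMS):
    """Collect the value combinations that were labeled as normal."""
    return {instance_key(call, dims) for call in normal_calls}


def is_abnormal(call, library, dims=FEATURE_DIMS):
    """A call is abnormal when its combination is not in the library."""
    return instance_key(call, dims) not in library


labeled_normal = [
    {"city": "Hangzhou", "product_code": "P01", "order_scene": "app",
     "payment_method": "card"},
    {"city": "Beijing", "product_code": "P01", "order_scene": "web",
     "payment_method": "balance"},
]
library = build_normal_library(labeled_normal)
suspect = {"city": "Hangzhou", "product_code": "P01", "order_scene": "app",
           "payment_method": "ERROR"}
```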
The business data processing method provided by the embodiments of this specification uses a pre-trained clustering algorithm to learn from the business call data under a preset business scenario and obtains feature dimensions that characterize the distinctive call pattern of similar business call data. Compared with extracting feature dimensions empirically, this helps improve the accuracy and efficiency of feature dimension extraction, avoids spending a large amount of time repeatedly adjusting and verifying the feature dimensions, and reduces the system resources consumed by feature dimension extraction. The method is also applicable to feature dimension extraction for newly added business scenarios and is therefore highly extensible.
Further, the determined feature dimensions can be used to monitor abnormal calls under the preset business scenario. Monitoring the abnormal calls of the business system at the level of feature dimensions, rather than analyzing call logs one by one, can save a large amount of time and system resources. In addition, the location of an anomaly can be further narrowed down to a feature dimension, which helps narrow the scope of troubleshooting and reduces the system resources consumed in locating the anomaly.
In a second aspect, based on the same inventive concept as the business data processing method provided by the embodiments of the first aspect, the embodiments of this specification further provide a business data processing apparatus. Referring to Fig. 3, the business data processing apparatus 30 includes:
A data acquisition module 31, configured to obtain multiple business call data under a preset business scenario, each business call data item including parameter data of multiple dimensions;
A clustering module 32, configured to cluster the multiple business call data based on a pre-trained clustering algorithm to obtain a target cluster result;
A dimension determination module 33, configured to, for each cluster in the target cluster result, determine the feature dimensions of the cluster from the multiple dimensions based on the parameter data of each business call data item in the cluster under each dimension, where the feature dimensions of each cluster are used to monitor abnormal calls under the preset business scenario.
As an optional embodiment, the clustering algorithm is an algorithm that clusters based on the distance between business call data, and the distance between any two business call data items is obtained through the following steps:
Detecting whether the parameter data of the two business call data items under the same dimension are identical, to obtain a detection result for each dimension;
Obtaining the distance between the two business call data items based on the detection result of each dimension.
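The two steps above can be sketched in Python. How the per-dimension detection results are combined into a distance is not stated in this passage, so counting the mismatching dimensions is an assumption of the sketch; the function names and records are hypothetical.

```python
def per_dimension_match(a, b):
    """Detection result per dimension: True when the parameter data are identical."""
    return {dim: a[dim] == b[dim] for dim in a}


def call_distance(a, b):
    """Assumed combination rule: count the dimensions that do not match."""
    results = per_dimension_match(a, b)
    return sum(1 for same in results.values() if not same)


call_a = {"city": "Hangzhou", "product_code": "P01", "payment_method": "card"}
call_b = {"city": "Hangzhou", "product_code": "P01", "payment_method": "cash"}
distance = call_distance(call_a, call_b)
```

Dividing the count by the number of dimensions would give a normalized distance between 0 and 1, which may be easier to pair with a radius that is independent of the number of dimensions.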
As an optional embodiment, the business data processing apparatus 30 further includes a training module. The training module includes:
A sample acquisition submodule, configured to obtain a training sample set, the training sample set including multiple business call data samples;
A clustering submodule, configured to cluster the training sample set based on a preset clustering algorithm;
A parameter adjustment submodule, configured to, when the cluster result does not satisfy the preset aggregation condition, adjust the configuration parameters of the preset clustering algorithm according to preset rules until the cluster result obtained by the clustering submodule satisfies the preset aggregation condition, thereby obtaining the trained clustering algorithm.
As an optional embodiment, the clustering submodule is configured to:
Obtain the distance between any two business call data samples in the training sample set;
Determine all core objects in the training sample set based on the preset configuration parameters and the distance between any two business call data samples;
Obtain the directly density-reachable samples of each core object in the training sample set, and obtain the cluster result based on the directly density-reachable samples of each core object.
As an optional embodiment, the parameter adjustment submodule is configured to:
Obtain the proportion of the business call data samples in the cluster result in the total number of samples of the training sample set;
When the proportion is less than the first preset threshold, adjust the configuration parameters according to the preset rules and cluster the training sample set with the clustering algorithm using the adjusted configuration parameters, until the proportion is greater than or equal to the first preset threshold, at which point the trained clustering algorithm is obtained.
As an optional embodiment, the dimension determination module 33 includes:
A first determination submodule 331, configured to determine, from the multiple dimensions, the dimension combinations whose coverage exceeds a second preset threshold by comparing the parameter data of each business call data item in the cluster under each dimension, where the coverage of a dimension combination characterizes the proportion of business call data in the cluster that have identical parameter data under that dimension combination, and a dimension combination includes one or more of the multiple dimensions;
A second determination submodule 332, configured to obtain the feature dimensions based on the dimension combinations whose coverage exceeds the second preset threshold.
As an optional embodiment, the second determination submodule 332 is configured to take, among the dimension combinations whose coverage exceeds the second preset threshold, the combination containing the most dimensions as the feature dimensions.
As an optional embodiment, the business data processing apparatus 30 further includes an anomaly identification module, configured to identify, based on the feature dimensions of each cluster, abnormal call data in the business call data to be analyzed under the preset business scenario.
It should be noted that, for the service data processing apparatus 30 provided in the embodiments of this specification, the specific manner in which each module performs its operations has been described in detail in the method embodiments of the first aspect above and is not elaborated here.
In a third aspect, based on the same inventive concept as the service data processing method provided in the foregoing embodiments, an embodiment of this specification further provides an electronic device. As shown in Fig. 4, the device includes a memory 404, one or more processors 402, and a computer program stored on the memory 404 and executable on the processor 402. When the processor 402 executes the program, the steps of the service data processing method provided in the first aspect above are implemented.
In Fig. 4, the bus architecture is represented by bus 400. Bus 400 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by processor 402 and memory represented by memory 404. Bus 400 may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. Bus interface 405 provides an interface between bus 400 and the receiver 401 and transmitter 403. The receiver 401 and transmitter 403 may be the same element, namely a transceiver, which provides a unit for communicating with various other apparatuses over a transmission medium. The processor 402 is responsible for managing bus 400 and general processing, and the memory 404 may be used to store data used by the processor 402 when performing operations.
It can be understood that the structure shown in Fig. 4 is only illustrative. The electronic device provided in the embodiments of this specification may include more or fewer components than shown in Fig. 4, or have a configuration different from that shown in Fig. 4. Each component shown in Fig. 4 may be implemented in hardware, software, or a combination thereof.
In a fourth aspect, based on the same inventive concept as the service data processing method provided in the foregoing embodiments, an embodiment of this specification further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the service data processing method provided in the first aspect above are implemented.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this specification. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and any combination of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this specification.
Obviously, those skilled in the art can make various modifications and variations to this specification without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is intended to include them as well.

Claims (18)

1. A service data processing method, comprising:
Obtaining multiple service call data items under a preset service scenario, each service call data item including parameter data of multiple dimensions;
Clustering the multiple service call data items based on a pre-trained clustering algorithm to obtain a target clustering result;
For each cluster in the target clustering result, determining the characteristic dimension of the cluster from the multiple dimensions based on the parameter data of each service call data item in the cluster under each dimension, wherein the characteristic dimension of each cluster is used to monitor abnormal calls under the preset service scenario.
2. The method according to claim 1, wherein the clustering algorithm is an algorithm that clusters based on the distance between service call data items, and the distance between any two service call data items is obtained by the following steps:
Detecting whether the parameter data of the two service call data items under the same dimension are identical, to obtain a detection result for each dimension;
Obtaining the distance between the two service call data items based on the detection result of each dimension.
3. The method according to claim 1, wherein the pre-trained clustering algorithm is obtained by the following steps:
Obtaining a training sample set, the training sample set including multiple service call data samples;
Clustering the training sample set based on a preset clustering algorithm and, when the clustering result does not satisfy a preset aggregation condition, adjusting a configuration parameter of the preset clustering algorithm according to a preset rule until the clustering result satisfies the preset aggregation condition, thereby obtaining the trained clustering algorithm.
4. The method according to claim 3, wherein clustering the training sample set based on the preset clustering algorithm comprises:
Obtaining the distance between any two service call data samples in the training sample set;
Determining all core objects in the training sample set based on a preset configuration parameter and the distance between any two service call data samples;
Obtaining the directly density-reachable samples of each core object in the training sample set, and obtaining the clustering result based on the directly density-reachable samples of each core object.
5. The method according to claim 3, wherein, when the clustering result does not satisfy the preset aggregation condition, adjusting the configuration parameter of the preset clustering algorithm according to the preset rule until the clustering result satisfies the preset aggregation condition, thereby obtaining the trained clustering algorithm, comprises:
Obtaining the proportion of the number of service call data samples in the clustering result relative to the total number of samples in the training sample set;
When the proportion is less than a first preset threshold, adjusting the configuration parameter according to the preset rule and clustering the training sample set using the clustering algorithm with the adjusted configuration parameter, until the proportion is greater than or equal to the first preset threshold, thereby obtaining the trained clustering algorithm.
6. The method according to claim 1, wherein determining the characteristic dimension of the cluster from the multiple dimensions based on the parameter data of each service call data item in the cluster under each dimension comprises:
Comparing the parameter data of each service call data item in the cluster under each dimension, and determining, from the multiple dimensions, the dimension combinations whose coverage rate exceeds a second preset threshold, wherein the coverage rate of a dimension combination characterizes the proportion, within the cluster, of service call data having the same parameter data under that dimension combination, and a dimension combination includes one or more of the multiple dimensions;
Obtaining the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold.
7. The method according to claim 6, wherein obtaining the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold comprises:
Comparing the number of dimensions included in each dimension combination whose coverage rate exceeds the second preset threshold, and taking the dimensions included in the dimension combination with the largest number of dimensions as the characteristic dimension.
8. The method according to claim 1, further comprising, after determining the characteristic dimension of the cluster from the multiple dimensions based on the parameter data of each service call data item in the cluster under each dimension:
Identifying, based on the characteristic dimension of each cluster, abnormal call data among the service call data to be analyzed under the preset service scenario.
9. A service data processing apparatus, comprising:
A data acquisition module, configured to obtain multiple service call data items under a preset service scenario, each service call data item including parameter data of multiple dimensions;
A clustering module, configured to cluster the multiple service call data items based on a pre-trained clustering algorithm to obtain a target clustering result;
A dimension determining module, configured to determine, for each cluster in the target clustering result, the characteristic dimension of the cluster from the multiple dimensions based on the parameter data of each service call data item in the cluster under each dimension, wherein the characteristic dimension of each cluster is used to monitor abnormal calls under the preset service scenario.
10. The apparatus according to claim 9, wherein the clustering algorithm is an algorithm that clusters based on the distance between service call data items, and the distance between any two service call data items is obtained by the following steps:
Detecting whether the parameter data of the two service call data items under the same dimension are identical, to obtain a detection result for each dimension;
Obtaining the distance between the two service call data items based on the detection result of each dimension.
11. The apparatus according to claim 9, further comprising a training module, the training module including:
A sample acquisition submodule, configured to obtain a training sample set, the training sample set including multiple service call data samples;
A clustering submodule, configured to cluster the training sample set based on a preset clustering algorithm;
A parameter adjusting submodule, configured to adjust, when the clustering result does not satisfy a preset aggregation condition, a configuration parameter of the preset clustering algorithm according to a preset rule until the clustering result obtained by the clustering submodule satisfies the preset aggregation condition, thereby obtaining the trained clustering algorithm.
12. The apparatus according to claim 11, wherein the clustering submodule is configured to:
Obtain the distance between any two service call data samples in the training sample set;
Determine all core objects in the training sample set based on a preset configuration parameter and the distance between any two service call data samples;
Obtain the directly density-reachable samples of each core object in the training sample set, and obtain the clustering result based on the directly density-reachable samples of each core object.
13. The apparatus according to claim 11, wherein the parameter adjusting submodule is configured to:
Obtain the proportion of the number of service call data samples in the clustering result relative to the total number of samples in the training sample set;
When the proportion is less than a first preset threshold, adjust the configuration parameter according to the preset rule and cluster the training sample set using the clustering algorithm with the adjusted configuration parameter, until the proportion is greater than or equal to the first preset threshold, thereby obtaining the trained clustering algorithm.
14. The apparatus according to claim 9, wherein the dimension determining module includes:
A first determining submodule, configured to compare the parameter data of each service call data item in the cluster under each dimension and to determine, from the multiple dimensions, the dimension combinations whose coverage rate exceeds a second preset threshold, wherein the coverage rate of a dimension combination characterizes the proportion, within the cluster, of service call data having the same parameter data under that dimension combination, and a dimension combination includes one or more of the multiple dimensions;
A second determining submodule, configured to obtain the characteristic dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold.
15. The apparatus according to claim 14, wherein the second determining submodule is configured to:
Compare the number of dimensions included in each dimension combination whose coverage rate exceeds the second preset threshold, and take the dimensions included in the dimension combination with the largest number of dimensions as the characteristic dimension.
16. The apparatus according to claim 9, further comprising an anomaly identification module, configured to:
Identify, based on the characteristic dimension of each cluster, abnormal call data among the service call data to be analyzed under the preset service scenario.
17. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-8 when executing the program.
18. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1-8.
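As an illustrative aside to the distance and core-object steps recited in claims 2 and 4, the sketch below shows one way they could fit together. It rests on assumptions that the claims do not state: the per-dimension detection results are combined by counting the dimensions on which two records differ, the configuration parameters are a neighborhood radius `eps` and a minimum neighbor count `min_pts` as in density-based clustering, and a sample counts itself as a neighbor. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence


def mismatch_distance(
    a: Dict[str, str], b: Dict[str, str], dimensions: Sequence[str]
) -> int:
    """Number of dimensions on which the two records carry different values."""
    return sum(1 for dim in dimensions if a.get(dim) != b.get(dim))


def core_objects(
    samples: Sequence[Dict[str, str]],
    dimensions: Sequence[str],
    eps: int,
    min_pts: int,
) -> List[int]:
    """Indices of samples with at least `min_pts` samples within distance `eps`."""
    cores = []
    for index, sample in enumerate(samples):
        neighbours = sum(
            1
            for other in samples
            if mismatch_distance(sample, other, dimensions) <= eps
        )
        if neighbours >= min_pts:
            cores.append(index)
    return cores
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for a sketch but would likely need indexing or sampling on large call volumes.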
CN201910609962.XA 2019-07-08 2019-07-08 Service data processing method and device, electronic equipment and medium Active CN110457175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910609962.XA CN110457175B (en) 2019-07-08 2019-07-08 Service data processing method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910609962.XA CN110457175B (en) 2019-07-08 2019-07-08 Service data processing method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN110457175A true CN110457175A (en) 2019-11-15
CN110457175B CN110457175B (en) 2023-04-18

Family

ID=68482427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910609962.XA Active CN110457175B (en) 2019-07-08 2019-07-08 Service data processing method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN110457175B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507397A (en) * 2020-04-16 2020-08-07 深圳前海微众银行股份有限公司 Abnormal data analysis method and device
CN111581508A (en) * 2020-04-30 2020-08-25 广州市百果园信息技术有限公司 Service monitoring method, device, equipment and storage medium
CN111815361A (en) * 2020-07-10 2020-10-23 北京思特奇信息技术股份有限公司 Region boundary calculation method and device, electronic equipment and storage medium
CN115081642A (en) * 2022-07-19 2022-09-20 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner
CN115981910A (en) * 2023-03-20 2023-04-18 建信金融科技有限责任公司 Method, device, electronic equipment and computer readable medium for processing exception request
CN116340504A (en) * 2023-03-23 2023-06-27 深圳市申甲网格科技有限公司 Method for realizing digital visualization of plans
CN116484230A (en) * 2023-06-20 2023-07-25 世优(北京)科技有限公司 Method for identifying abnormal business data and training method of AI digital person

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015090241A1 (en) * 2013-12-18 2015-06-25 腾讯科技(深圳)有限公司 Method for monitoring business operations data storage, and related device and system
CN107168995A (en) * 2017-03-29 2017-09-15 联想(北京)有限公司 A kind of data processing method and server
CN107204894A (en) * 2017-05-18 2017-09-26 华为技术有限公司 The monitoring method and device of network servicequality
CN107741955A (en) * 2017-09-15 2018-02-27 平安科技(深圳)有限公司 Business datum monitoring method, device, terminal device and storage medium
CN108322363A (en) * 2018-02-12 2018-07-24 腾讯科技(深圳)有限公司 Propelling data abnormality monitoring method, device, computer equipment and storage medium
CN108880845A (en) * 2017-05-16 2018-11-23 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of information alert
CN109118043A (en) * 2018-06-29 2019-01-01 阿里巴巴集团控股有限公司 A kind of online data quality control method, device, server and storage medium
CN109146381A (en) * 2018-08-23 2019-01-04 北京顺丰同城科技有限公司 Logistics data monitoring method, device, electronic equipment and computer storage medium
CN109495291A (en) * 2018-09-30 2019-03-19 阿里巴巴集团控股有限公司 Call abnormal localization method, device and server

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015090241A1 (en) * 2013-12-18 2015-06-25 腾讯科技(深圳)有限公司 Method for monitoring business operations data storage, and related device and system
CN107168995A (en) * 2017-03-29 2017-09-15 联想(北京)有限公司 A kind of data processing method and server
CN108880845A (en) * 2017-05-16 2018-11-23 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of information alert
CN107204894A (en) * 2017-05-18 2017-09-26 华为技术有限公司 The monitoring method and device of network servicequality
CN107741955A (en) * 2017-09-15 2018-02-27 平安科技(深圳)有限公司 Business datum monitoring method, device, terminal device and storage medium
CN108322363A (en) * 2018-02-12 2018-07-24 腾讯科技(深圳)有限公司 Propelling data abnormality monitoring method, device, computer equipment and storage medium
CN109118043A (en) * 2018-06-29 2019-01-01 阿里巴巴集团控股有限公司 A kind of online data quality control method, device, server and storage medium
CN109146381A (en) * 2018-08-23 2019-01-04 北京顺丰同城科技有限公司 Logistics data monitoring method, device, electronic equipment and computer storage medium
CN109495291A (en) * 2018-09-30 2019-03-19 阿里巴巴集团控股有限公司 Call abnormal localization method, device and server

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUANHONG LIANG ET AL.: "Power Transformer Abnormal State Recognition Model Based on Improved K-Means Clustering" *
洪斌等: "基于PCA降维的云资源状态监控数据压缩技术", 《计算机科学》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507397A (en) * 2020-04-16 2020-08-07 深圳前海微众银行股份有限公司 Abnormal data analysis method and device
CN111581508A (en) * 2020-04-30 2020-08-25 广州市百果园信息技术有限公司 Service monitoring method, device, equipment and storage medium
CN111815361A (en) * 2020-07-10 2020-10-23 北京思特奇信息技术股份有限公司 Region boundary calculation method and device, electronic equipment and storage medium
CN111815361B (en) * 2020-07-10 2024-06-18 北京思特奇信息技术股份有限公司 Region boundary calculation method, device, electronic equipment and storage medium
CN115081642A (en) * 2022-07-19 2022-09-20 浙江大学 Method and system for updating service prediction model in multi-party cooperation manner
CN115981910A (en) * 2023-03-20 2023-04-18 建信金融科技有限责任公司 Method, device, electronic equipment and computer readable medium for processing exception request
CN115981910B (en) * 2023-03-20 2023-06-16 建信金融科技有限责任公司 Method, apparatus, electronic device and computer readable medium for processing exception request
CN116340504A (en) * 2023-03-23 2023-06-27 深圳市申甲网格科技有限公司 Method for realizing digital visualization of plans
CN116484230A (en) * 2023-06-20 2023-07-25 世优(北京)科技有限公司 Method for identifying abnormal business data and training method of AI digital person
CN116484230B (en) * 2023-06-20 2023-09-01 世优(北京)科技有限公司 Method for identifying abnormal business data and training method of AI digital person

Also Published As

Publication number Publication date
CN110457175B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110457175A (en) Business data processing method, device, electronic equipment and medium
EP0897566B1 (en) Monitoring and retraining neural network
EP0894378B1 (en) Signature based fraud detection system
CN110992167A (en) Bank client business intention identification method and device
CN107301119A (en) The method and device of IT failure root cause analysis is carried out using timing dependence
GB2321362A (en) Generic processing capability
CN106327324B (en) A kind of quick calculation method and system of network behavior feature
CN106803799B (en) Performance test method and device
CN110390584A (en) A kind of recognition methods of abnormal user, identification device and readable storage medium storing program for executing
CN111181757B (en) Information security risk prediction method and device, computing equipment and storage medium
CN110471821A (en) Abnormal alteration detection method, server and computer readable storage medium
CN109582452A (en) A kind of container dispatching method, dispatching device and electronic equipment
CN110262951A (en) A kind of business second grade monitoring method and system, storage medium and client
CN109873790A (en) Network security detection method, device and computer readable storage medium
WO2023165271A1 (en) Knowledge graph construction and graph calculation
CN117519951B (en) Real-time data processing method and system based on message center
CN110825589B (en) Abnormality detection method and device for micro-service system and electronic equipment
Saravanan et al. A graph-based churn prediction model for mobile telecom networks
CN109963292A (en) Complain method, apparatus, electronic equipment and the storage medium of prediction
CN109359034A (en) A kind of operation system test method, computer readable storage medium and terminal device
CN109241511A (en) A kind of generation method and equipment of electronic report
CN110209713A (en) Abnormal grid structure recognition methods and device
CN107025227A (en) User is to the determination of the familiarity of product, information sifting, processing method and processing device
WO2021083144A1 (en) Service quality testing for multi-service system
CN114913015A (en) Hot account identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200927

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200927

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant