CN108959493A - Detection method, device and the equipment of Indexes Abnormality fluctuation - Google Patents

Detection method, device and the equipment of Indexes Abnormality fluctuation Download PDF

Info

Publication number
CN108959493A
CN108959493A CN201810662139.0A CN201810662139A CN108959493A CN 108959493 A CN108959493 A CN 108959493A CN 201810662139 A CN201810662139 A CN 201810662139A CN 108959493 A CN108959493 A CN 108959493A
Authority
CN
China
Prior art keywords
dimension
detected
ratio
index
change rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810662139.0A
Other languages
Chinese (zh)
Inventor
王蓬金
赵坤
张冠男
邹润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810662139.0A priority Critical patent/CN108959493A/en
Publication of CN108959493A publication Critical patent/CN108959493A/en
Pending legal-status Critical Current

Links

Abstract

This specification provides detection method, device and the equipment of a kind of Indexes Abnormality fluctuation, and the present embodiment provides have Data Detection configuration interface that can input the change rate of dimension and index to be detected for user by the interface;For dimension to be detected, analysis model can be called to calculate one or more parameters in comentropy, information gain-ratio or the sixteen principle parameters of dimension to be detected, the influence degree that each dimension changes Indexes Abnormality be can detecte out by parameter calculated.

Description

Detection method, device and the equipment of Indexes Abnormality fluctuation
Technical field
This specification is related to the detection method in data analysis technique field more particularly to Indexes Abnormality fluctuation, device and sets It is standby.
Background technique
With the development of information technology, all trades and professions can all generate a large amount of data daily during operation.For These data, Data Analyst usually have the index much paid close attention to, such as the quantity that Adds User, any active ues quantity etc.. The fluctuation situation of these indexs is usually characterized using year-on-year or ring than Equal variation.Pass through the change rate of these indexs, data Analyst can check whether index is abnormal fluctuation, for example, may illustrate that exception occurs in the index if change rate is larger Fluctuation.
In practical business scene, the unusual fluctuations of index may be generated by many reasons, by taking number of users is year-on-year as an example, shadow The dimension for ringing number of users, city, age or gender etc. where may include occupation, user, different dimensions are for number of users The influence degree for measuring fluctuation may be different.Based on this, it is desirable to provide one kind is able to detect Indexes Abnormality fluctuation, determines unusual fluctuations The scheme of reason.
Summary of the invention
To overcome the problems in correlation technique, present description provides the detection methods of Indexes Abnormality fluctuation, device And equipment.
According to this specification embodiment in a first aspect, providing a kind of detection method of Indexes Abnormality fluctuation, the method Include:
Configuration interface is provided, Data Detection configuration information is obtained by the configuration interface, the Data Detection matches confidence Breath includes: the change rate of dimension and index to be detected;
Data to be tested are loaded, using the change rate of the dimension to be detected and the index as the input of analysis model, It calls the analysis model, wherein the analysis model is used for: calculating the dimension to be detected using the data to be tested Following one or more parameters: comentropy, information gain-ratio or sixteen principle parameters, and institute is determined based on the parameter being calculated Dimension to be detected is stated to the influence degree of the change rate of the index;
Export the testing result.
Optionally, the change rate includes: year-on-year or ring ratio.
Optionally, the comentropy is determined based on such as under type: the chance event in the comentropy is divided into two classes: described The enumerated value of dimension to be detected corresponds to the change rate whether change rate is greater than the index;The probability base that the chance event occurs Determine in such as under type: the enumerated value number that the corresponding change rate is greater than the change rate of the index accounts for piece of dimension to be detected The ratio of act value total number.
Optionally, the comentropy is calculated in the following way:
Wherein, gm(D) comentropy of dimension m is indicated, n is equal to 2, piIndicate the ratio, D indicates the overall variation rate.
Optionally, ratio of the information gain-ratio based on the comentropy Yu the enumerated value total number of the dimension to be detected It is worth and determines.
Optionally, the sixteen principles parameter accounts for the enumerated value total number of the dimension to be detected based on target number Q Ratio and determine;Wherein, the target number Q is indicated: the enumerated value of the dimension to be detected is absolute according to corresponding variation Value from high to low sequence, preceding Q enumerated value corresponding change absolute value and value set more than the change absolute value of the index Certainty ratio, the setting ratio are determined based on 80%.
Optionally, the influence degree is positively correlated with the comentropy or information gain-ratio, joins with sixteen principle Number is negatively correlated.
Optionally, the analysis model is run in Hive platform.
According to the second aspect of this specification embodiment, a kind of detection device of Indexes Abnormality fluctuation is provided, comprising:
Configuration module is used for: configuration interface is provided, Data Detection configuration information is obtained by the configuration interface, it is described Data Detection configuration information includes: the change rate of dimension and index to be detected;
Computing module is used for: load data to be tested, using the change rate of the dimension to be detected and the index as divide The input for analysing model, calls the analysis model, wherein the analysis model is used for: calculating institute using the data to be tested Following one or more parameters of dimension to be detected: comentropy, information gain-ratio or sixteen principle parameters are stated, and are based on calculating To parameter determine the dimension to be detected to the influence degree of the change rate of the index;
Output module is used for: exporting the testing result.
Optionally, the change rate includes: year-on-year or ring ratio.
Optionally, the comentropy is determined based on such as under type: the chance event in the comentropy is divided into two classes: described The enumerated value of dimension to be detected corresponds to the change rate whether change rate is greater than the index;The probability base that the chance event occurs Determine in such as under type: the enumerated value number that the corresponding change rate is greater than the change rate of the index accounts for piece of dimension to be detected The ratio of act value total number.
Optionally, the comentropy is calculated in the following way:
Wherein, gm(D) comentropy of dimension m is indicated, n is equal to 2, piIndicate the ratio, D indicates the overall variation rate.
Optionally, ratio of the information gain-ratio based on the comentropy Yu the enumerated value total number of the dimension to be detected It is worth and determines.
Optionally, the sixteen principles parameter accounts for the enumerated value total number of the dimension to be detected based on target number Q Ratio and determine;Wherein, the target number Q is indicated: the enumerated value of the dimension to be detected is absolute according to corresponding variation Value from high to low sequence, preceding Q enumerated value corresponding change absolute value and value set more than the change absolute value of the index Certainty ratio, the setting ratio are determined based on 80%.
Optionally, the influence degree is positively correlated with the comentropy or information gain-ratio, joins with sixteen principle Number is negatively correlated.
Optionally, the analysis model is run in Hive platform.
According to the third aspect of this specification embodiment, a kind of detection device of Indexes Abnormality fluctuation is provided, comprising:
Processor;
Memory for storage processor executable instruction;
Wherein, the processor is configured to:
Configuration interface is provided, Data Detection configuration information is obtained by the configuration interface, the Data Detection matches confidence Breath includes: the change rate of dimension and index to be detected;
Data to be tested are loaded, using the change rate of the dimension to be detected and the index as the input of analysis model, It calls the analysis model, wherein the analysis model is used for: calculating the dimension to be detected using the data to be tested Following one or more parameters: comentropy, information gain-ratio or sixteen principle parameters, and institute is determined based on the parameter being calculated Dimension to be detected is stated to the influence degree of the change rate of the index;
Export the testing result.
The technical solution that the embodiment of this specification provides can include the following benefits:
It is provided with Data Detection configuration interface in this specification embodiment, by the interface, can be inputted for user to be checked Survey the change rate of dimension and index;For dimension to be detected, analysis model can be called to calculate the comentropy of dimension to be detected, letter One or more parameters in ratio of profit increase or sixteen principle parameters are ceased, each dimension pair be can detecte out by parameter calculated The influence degree that Indexes Abnormality changes.
It should be understood that above general description and following detailed description be only it is exemplary and explanatory, not This specification can be limited.
Detailed description of the invention
The drawings herein are incorporated into the specification and forms part of this specification, and shows the reality for meeting this specification Example is applied, and is used to explain the principle of this specification together with specification.
Figure 1A is a kind of this specification stream of the detection method of Indexes Abnormality fluctuation shown according to an exemplary embodiment Cheng Tu.
Figure 1B is a kind of answering for this specification detection method of Indexes Abnormality fluctuation shown according to an exemplary embodiment With scene figure.
Fig. 2 is a kind of hardware structure diagram of equipment where the detection device of this specification embodiment Indexes Abnormality fluctuation.
Fig. 3 is a kind of this specification frame of the detection device of Indexes Abnormality fluctuation shown according to an exemplary embodiment Figure.
Specific embodiment
Example embodiments are described in detail here, and the example is illustrated in the accompanying drawings.Following description is related to When attached drawing, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements.Following exemplary embodiment Described in embodiment do not represent all embodiments consistent with this specification.On the contrary, they are only and such as institute The example of the consistent device and method of some aspects be described in detail in attached claims, this specification.
It is only to be not intended to be limiting this explanation merely for for the purpose of describing particular embodiments in the term that this specification uses Book.The "an" of used singular, " described " and "the" are also intended to packet in this specification and in the appended claims Most forms are included, unless the context clearly indicates other meaning.It is also understood that term "and/or" used herein is Refer to and includes that one or more associated any or all of project listed may combine.
It will be appreciated that though various information may be described using term first, second, third, etc. in this specification, but These information should not necessarily be limited by these terms.These terms are only used to for same type of information being distinguished from each other out.For example, not taking off In the case where this specification range, the first information can also be referred to as the second information, and similarly, the second information can also be claimed For the first information.Depending on context, word as used in this " if " can be construed to " ... when " or " when ... " or " in response to determination ".
Under the trend of big data, whether analysis indexes are abnormal fluctuation, have analyzed the producing cause of unusual fluctuations As being engaged in datamation, person needs work taken up daily.Under many business scenarios, Data Analyst can pay close attention to business number According to many indexes.As an example, the index of data may include: light exposure, ad click rate, channel conversion ratio, add day Amount, active users, user conversation number or user's value index nember etc..Data Analyst can pay close attention to the wave of these indexs Dynamic, the fluctuation of index is characterized using the change rate of index, such as year-on-year or ring ratio etc.;Indicate on year-on-year basis by the current statistic period and History compares with the period, and ring is than indicating to compare current statistic period and a upper measurement period.
The unusual fluctuations of index may be generated by many reasons, by way of example, if index is number of users, change rate is same Than, influence the dimension of number of users, may include occupation, city, age or gender etc. where user, different dimensions for The influence degree of family fluctuation quantity may be different.If the related causes for causing Indexes Abnormality to fluctuate can be identified in time, Corresponding analysis and solution can be made in time for abnormal data, be so as to make service platform more stably User provides service.
Year-on-year or ring belongs to ratio shape parameter than Equal variation, which is calculated by molecule divided by denominator, change rate Height, it may be possible to molecule, which becomes larger, to be caused, is likely to be denominator and becomes smaller and cause, it is also possible to be that molecule becomes larger and denominator becomes It is small, it could also be possible that molecule is constant and denominator becomes smaller etc. many reasons, therefore using on year-on-year basis or ring ratio etc. belongs to Ratio-type ginseng It counts to find that Indexes Abnormality changes, the producing cause that analysis indexes change extremely is complex.How according to the change rate of index Fluctuation is influenced, how by technology and algorithm progress automated analysis to find which dimension, and analysis knot is accurately provided The technical issues of fruit is urgent need to resolve.
Based on this, the present embodiment provides a kind of schemes for being able to detect Indexes Abnormality fluctuation, and the present embodiment provides have data Detection configuration interface can input the change rate of dimension and index to be detected, for dimension to be detected by the interface for user Degree, calculates one or more parameters in comentropy, information gain-ratio or the sixteen principle parameters of dimension to be detected, by being counted The parameter of calculation can detecte out the influence degree that each dimension changes Indexes Abnormality.
It as shown in Figure 1A, is a kind of this specification detection side of Indexes Abnormality fluctuation shown according to an exemplary embodiment The flow chart of method, includes the following steps:
In a step 102, configuration interface is provided, Data Detection configuration information, the number are obtained by the configuration interface It include: the change rate of dimension and index to be detected according to detection configuration information.
At step 104, data to be tested are loaded, using the change rate of the dimension to be detected and the index as analysis The analysis model is called in the input of model, wherein the analysis model is used for: using described in data to be tested calculating Following one or more parameters of dimension to be detected: comentropy, information gain-ratio or sixteen principle parameters, and be based on being calculated Parameter determine the dimension to be detected to the influence degree of the change rate of the index.
In step 106, the testing result is exported.
A kind of application scenario diagram of the detection method of the fluctuation of the Indexes Abnormality in conjunction with shown in Figure 1B is illustrated, the present embodiment In data to be tested can be various report datas, for example, KPI Key Performance Indicator (K e y Performance Indicators, KPI) data, various businesses data etc..In practical application, data to be tested can be obtained from database, it should Data to be tested can be the business datum of operation system real-time Transmission, be also possible to off-line data.
Index, for measuring the unit or method of things development degree, such as: population, GDP, income, number of users, benefit Profit rate, retention ratio, coverage rate etc..Index can summarize calculation by adduction, average etc. and obtain, and can be certain Precondition carries out summarizing calculating, such as time, place, range, that is, Statistical Criteria and range.Index can be divided into absolute number Index and relative number index, the reflection of absolute number index is scale index, such as population, GDP, income, number of users, and Relative number index is mainly used to reflect the index of quality, such as profit margin, retention ratio, coverage rate.
Dimension, for characterizing certain feature of things or phenomenon, such as gender, area, time are all dimensions.The wherein time It is a kind of common, special dimension, passes through the comparison before and after the time, so that it may know that the development of things improves or degenerates, Such as user's ring of numbers than increasing by 10% last month, with increasing by 20% compared with the same period of last year, here it is temporal comparisons, also referred to as vertical ratio; Another kind is relatively horizontal ratio, such as country variant population, the comparison of GDP, different province incomes, the comparison of number of users, different public affairs Comparison etc. between department, different departments, referred to as horizontal ratio.Dimension can be divided into qualitative dimension and quantitation dimension, that is, according to number It is divided according to type, it is exactly qualitative dimension that data type, which is character type (text-type) data, and such as area, gender are all qualitative dimensions Degree;Data type is numeric type data, is just quantitation dimension, such as income, age, consumption.
All values of dimension are known as enumerated value, for example, the value of dimension gender includes male and female, then the dimension is enumerated Value is male and female, and there are two the numbers of enumerated value.
In the present embodiment, for data to be tested, it can be divided using a variety of dimensions, for the ease of customer analysis, The present embodiment provides there is configuration interface, in some instances, which can be a visualization interface, can wrap containing with The function of user's interaction, by the Data Detection configuration information of the available user's input of the configuration interface, Data Detection is matched Confidence breath may include: the change rate of dimension and index to be detected.In practical application, it can be divided in advance for data to be tested There are many dimensions, and for the dimension divided, the dimension to be detected of its expectation detection can be chosen for user;On the other hand, Index and corresponding a variety of change rates there are many summarizing can be counted for data to be tested in advance, for the index counted Change rate chooses the change rate of the index of its expectation detection for user.
Be provided with analysis model in the present embodiment in advance, which is used for based on described in data to be tested calculating Following one or more parameters of dimension to be detected: comentropy, information gain-ratio or sixteen principle parameters.
As an example, have a business datum, the index of this part of data is number of users, index change rate using on year-on-year basis and As an example, the dimension to be detected of this part of data includes: occupation, city, gender and age bracket to ring ratio.In order to which example is convenient, this Embodiment can according to need by taking 4 dimensions as an example, in practical application is arranged other a variety of dimensions, and the present embodiment does not limit this It is fixed.Wherein, professional enumerated value includes following university student, teaching and administrative staff or blue collar etc. 11, the corresponding change rate of each enumerated value It is as follows:
Occupation User is year-on-year User's ring ratio
It can not identify 1.21% 0.13%
University student 3.29% 0.35%
Famous enterprise employee 21.87% 6.1%
Teaching and administrative staff 1.44% 0.62%
Self-employed worker 2.03% 0.27%
Blue collar 0.69% 0.07%
Civil servant 0.81% 0.05%
Medical worker 1.88% 0.19%
White collar 0.13% - 0.02%
Listed company employee - 1.04% - 0.35%
Listed Company 0.43% 0.03%
In this part of data of the present embodiment, using ring ratio as change rate, the ring ratio of number of users is 0.18% (expression use The change rate of amount amount, i.e., the average rate of change for the number of users being calculated with whole part data), data analysis target be, Influence of each dimension to user's ring ratio is analyzed from 4 occupation, city, gender and age bracket dimensions, that is to say, that detection is made The influence of the main reason at user's ring than abnormal variation, each dimension of detection have much or detect which dimension influences maximum Etc..
In the present embodiment, comentropy be in information theory be used for metric amount a concept, comentropy be defined as from The probability of occurrence of chance event is dissipated, higher entropy indicates biggish information content.As soon as a system is more ordered into, comentropy is got over It is low;As soon as comentropy is higher conversely, system is chaotic.Under normal circumstances, the data distribution of each dimension has consistent Property, index does not have unusual fluctuations;And when unusual fluctuations occurs in index, it may be possible to due to the number of wherein one or more dimensions According to there is consistency that is abnormal and destroying data distribution, therefore, the comentropy of dimension can be used as judge the dimension whether shadow The effectively mark of one of snap mark unusual fluctuations.If the comentropy of the dimension is higher, that is, means that the fluctuation of the dimension is got over Greatly, i.e. the influence of Indexes Abnormality fluctuation is maximum;Vice versa.
Based on this, one of the effect of analysis model of the present embodiment can be the comentropy for calculating dimension to be detected, optional , existing comentropy calculation method can be incorporated in the data analysis scene of the present embodiment, by way of example, the comentropy base Determine in such as under type: the chance event in the comentropy is divided into two classes: the corresponding variation of the enumerated value of the dimension to be detected Whether rate is greater than the overall variation rate;The probability that the chance event occurs is determined based on such as under type: the corresponding variation The enumerated value number that rate is greater than the overall variation rate accounts for the ratio of the enumerated value total number of dimension to be detected.Pass through above-mentioned side Chance event in comentropy is divided into two classes by formula, and one kind is to the contributive event of overall variation rate of index, and one kind is pair The overall variation rate of index is without the event of contribution, if contributes whether correspond to change rate by the enumerated value of dimension to be detected big It is determined in overall variation rate, therefore the comentropy of dimension to be detected can be quickly determined.
In an optional implementation, the present embodiment additionally provides a kind of calculation of comentropy, by way of example, In the present embodiment, the comentropy is calculated in the following way:
Wherein, gm(D) comentropy of dimension m is indicated;N is equal to 2, that is, characterizes two class events, i.e., dimension pair to be detected The whether influential event of change rate;
piThe probability that the chance event occurs is indicated in comentropy formula, is characterized in the present embodiment using aforementioned proportion;
D indicates data set in comentropy formula, is characterized in the present embodiment using the change rate.
Further, as an example with aforementioned table 1, the process for calculating comentropy to analysis model illustrates.By It is found that occupation has 11 enumerated values shown in table 1.Wherein, user's ring of each enumerated value is than as shown in table 1, this 11 enumerated values In, the number greater than average ring than 0.18% has 6, and the number no more than average ring than 0.18% has 5, therefore:
The calculating process of the comentropy of occupation may is that 6/11*log (6/11)+5/11*log (5/11).
According to above-mentioned calculation, the comentropy of other dimensions to be detected can also be accordingly calculated.
In practical application, the difference of the enumerated value of different dimensions to be detected may be larger, for example, the enumerated value of gender it is less, And city enumerated value may it is more, comentropy be suitable for the to be detected dimension more to enumerated value calculating, in order to avoid piece It is influenced caused by act value difference is different, in other examples, analysis model can also calculate the information gain-ratio of dimension to be detected, optional , the information gain-ratio can the ratio based on the comentropy and the enumerated value total number of the dimension to be detected and it is true It is fixed, it is influenced so as to weaken enumerated value difference bring.As an example, comentropy and the dimension to be detected can be calculated For the ratio of enumerated value total number as the information gain-ratio, calculating process is as follows:
Intlm(D)=gm(D)/N
Wherein, IntlmIt (D) is the information gain-ratio for indicating dimension m, N is the enumerated value total number of dimension m.
It in other examples, can also be in the ratio based on comentropy and the enumerated value total number of the dimension to be detected On the basis of, information gain-ratio is calculated by increasing the modes such as other modification parameters, the present embodiment is not construed as limiting this.
The analysis model of the present embodiment can also calculate sixteen principle parameters on the other hand;Wherein, former based on sixteen Then, in the abnormal variation of index, the wherein small portion in each dimension to be detected of most contribution (about 80%) Divide (about 20%).Therefore, bigger to the influence of index changed extremely if the data movement of some dimension is more concentrated.This reality It applies in example, sixteen principle parameters can account for the ratio of the enumerated value total number of the dimension to be detected and true based on target number Q It is fixed;Wherein, the target number Q is indicated: by the enumerated value of the dimension to be detected according to corresponding change absolute value from up to Low sequence, the corresponding change absolute value of preceding Q enumerated value and value be more than the index change absolute value setting ratio, institute Setting ratio is stated to determine based on 80%.Understand from principle, the concentration of data movement in the sixteen principle parameter characterization dimensions Degree, the influence degree which changes index, the intensity of data movement in the dimension, intensity is higher, to finger It is bigger to mark the influence changed.
As an example, sixteen principle parameters can be calculated in the following way, it is assumed that be directed to number of users ring ratio, whole number Changing absolute value according to the index is 100,000;And setting ratio can be determined based on 80%, in practical application, also can according to need The flexible configuration ratio, for example, with 80% similar in ratio.For using 80% in the present embodiment, the 80% of 100,000 is 80,000.
It will be to be checked according to the change absolute value of all enumerated values of the dimension to be detected for first dimension to be detected All enumerated values sequence for surveying dimension, reads the change absolute value of each enumerated value, and seeks and be worth from high to low, when super with value Cross this 80,000, then can obtain target number Q, the ratio based on target number Q Yu the enumerated value total number of the dimension can obtain Obtain sixteen principle parameters.
Above-mentioned calculating process can use following algorithmic notation in practical applications:
Wherein, count indicates to count, and count (all) indicates the enumerated value total number of the dimension to be detected;
βjIndicate the change absolute value of the enumerated value j of dimension to be detected, sum (βj) indicate that the index of dimension to be detected changes Absolute value, wherein 1≤j≤N, N are the enumerated value total number of dimension to be detected, wherein according to all pieces of the dimension to be detected The change absolute value of act value sorts all enumerated values of dimension to be detected;
Indicate βjAccount for sum (βj) ratio whether be greater than 0.8, this 0.8 It can according to need and be adjusted flexibly.
As an example, by taking the sixteen principle parameters for calculating city dimension as an example, wherein the enumerated value of city dimension has 365 It is a, according to the variation absolute value of the corresponding number of users of each enumerated value, wherein changing absolute value highest first 5 has been more than 8 Ten thousand, therefore the influence value of city dimension is 5/365.
For calculating professional dimension, wherein the enumerated value of professional dimension has 11, according to the corresponding use of each enumerated value The variation absolute value of amount amount, wherein the variation absolute value of highest 1 enumerated value of quantity alreadys exceed 80,000, therefore occupation dimension The influence value of degree is 1/11.
It is appreciated that the influence value changed based on the dimension that sixteen rules algorithms calculate to index, is characterized in the dimension The intensity of data movement, intensity is higher, and the influence changed to index is bigger.
By the above-mentioned means, analysis model can be calculated in comentropy, information gain-ratio or sixteen principle parameters One or more, and then determine the dimension to be detected to the influence degree of the change rate of the index, wherein the influence degree It is positively correlated with the comentropy or information gain-ratio, negatively correlated with sixteen principle parameters, when specific implementation can be according to need The method of determination of flexible configuration influence degree is wanted, for example, if only calculating one of parameter, it can be according to the parameter to be checked Dimension sequence is surveyed, the correlativity based on above-mentioned influence degree Yu comentropy, information gain-ratio or sixteen principle parameters can be true The influence degree of fixed each dimension to be detected.It in other examples, can be in conjunction with influence degree and each if calculating has multiple parameters The correlativity of a parameter and integrate determination, optionally, in the case where comprehensive determine, above three parameter can also be right respectively There should be weighted value,
As an example, for ease of calculation, can by comentropy, information gain-ratio or sixteen principle parameter normalizations, In, sixteen principle parameters can be normalized that (backward refers to, by dimension according to sixteen principle parameters by inverse based on backward Sequence arrangement), be multiplied summation again later:
Score=normalize (f1)×normalize(f2)×normalize(f3)
Finally, can be ranked up according to the Score of each dimension, the maximum dimension of Score can be determined as to index The influence of variation is maximum.Calculated result based on analysis model, the Score that can obtain each dimension to be detected optionally can To export the Score of each dimension to be detected, Indexes Abnormality is become so as to find each dimension to be detected for user Dynamic influence degree.
In practical application, based on calculated each dimension to be detected influence degree, since dimension to be detected may be used also To be further partitioned into more various dimensions, above-mentioned detection method can also execute again, so as to which further segmentation is tieed up again Degree, to find more careful influencing factor.That is, the detection method of the present embodiment can be such as tree-shaped calculating knot Structure detects progressively since first layer, and dimension to be detected can be from slightly dividing to sub-layers layer, for example, when detecting occupation Maximum is influenced on the change rate of index, can use the present embodiment method, using each enumerated value of occupation as newly to be detected Dimension further analyzes, to the influence degree of the change rate of index in professional each enumerated value.
The present embodiment method can run on the Hive platform based on distributed computing, it can be achieved that real-time calculation processing, goes back Acceleration processing can be carried out by technologies such as cachings, this method quickly detects data.
Corresponding with the embodiment of detection method of aforementioned Indexes Abnormality fluctuation, this specification additionally provides Indexes Abnormality wave The embodiment of dynamic detection device and its applied equipment.
The embodiment of the detection device of this specification Indexes Abnormality fluctuation can be applied to be calculated in equipment in server etc..Dress Setting embodiment can also be realized by software realization by way of hardware or software and hardware combining.It is implemented in software to be Example, as the device on a logical meaning, being will be non-volatile by the processor of the detection of Indexes Abnormality fluctuation where it Corresponding computer program instructions are read into memory what operation was formed in memory.For hardware view, as shown in Fig. 2, A kind of hardware structure diagram of equipment where the detection device fluctuated for this specification Indexes Abnormality, in addition to processor shown in Fig. 2 210, except memory 230, network interface 220 and nonvolatile memory 240,231 place equipment of device in embodiment is led to Often according to the actual functional capability of the equipment, it can also include other hardware, this is repeated no more.
As shown in figure 3, Fig. 3 is a kind of this specification detection of Indexes Abnormality fluctuation shown according to an exemplary embodiment The block diagram of device, described device include:
Configuration module 31, is used for: providing configuration interface, obtains Data Detection configuration information, institute by the configuration interface State the change rate that Data Detection configuration information includes: dimension and index to be detected;
Computing module 32, is used for: load data to be tested, using the change rate of the dimension to be detected and the index as The analysis model is called in the input of analysis model, wherein the analysis model is used for: being calculated using the data to be tested Following one or more parameters of the dimension to be detected: comentropy, information gain-ratio or sixteen principle parameters, and based on calculating Obtained parameter determines the dimension to be detected to the influence degree of the change rate of the index;
Output module 33, is used for: exporting the testing result.
Optionally, the change rate includes: year-on-year or ring ratio.
Optionally, the comentropy is determined based on such as under type: the chance event in the comentropy is divided into two classes: described The enumerated value of dimension to be detected corresponds to the change rate whether change rate is greater than the index;The probability base that the chance event occurs Determine in such as under type: the enumerated value number that the corresponding change rate is greater than the change rate of the index accounts for piece of dimension to be detected The ratio of act value total number.
Optionally, the comentropy is calculated in the following way:
Wherein, gm(D) comentropy of dimension m is indicated, n is equal to 2, piIndicate the ratio, D indicates the overall variation rate.
Optionally, ratio of the information gain-ratio based on the comentropy Yu the enumerated value total number of the dimension to be detected It is worth and determines.
Optionally, the sixteen principles parameter accounts for the enumerated value total number of the dimension to be detected based on target number Q Ratio and determine;Wherein, the target number Q is indicated: the enumerated value of the dimension to be detected is absolute according to corresponding variation Value from high to low sequence, preceding Q enumerated value corresponding change absolute value and value set more than the change absolute value of the index Certainty ratio, the setting ratio are determined based on 80%.
Optionally, the influence degree is positively correlated with the comentropy or information gain-ratio, joins with sixteen principle Number is negatively correlated.
Optionally, the analysis model is run in Hive platform.
Correspondingly, this specification also provides a kind of detection device of Indexes Abnormality fluctuation, it include processor;For storing The memory of processor-executable instruction;Wherein, the processor is configured to:
Configuration interface is provided, Data Detection configuration information is obtained by the configuration interface, the Data Detection matches confidence Breath includes: the change rate of dimension and index to be detected;
Data to be tested are loaded, using the change rate of the dimension to be detected and the index as the input of analysis model, It calls the analysis model, wherein the analysis model is used for: calculating the dimension to be detected using the data to be tested Following one or more parameters: comentropy, information gain-ratio or sixteen principle parameters, and institute is determined based on the parameter being calculated Dimension to be detected is stated to the influence degree of the change rate of the index;
Export the testing result.
The function of modules and the realization process of effect are specifically detailed in the detection device of These parameters unusual fluctuations The realization process that step is corresponded in the detection method of Indexes Abnormality fluctuation is stated, details are not described herein.
For device embodiment, since it corresponds essentially to embodiment of the method, so related place is referring to method reality Apply the part explanation of example.The apparatus embodiments described above are merely exemplary, wherein described be used as separation unit The module of explanation may or may not be physically separated, and the component shown as module can be or can also be with It is not physical module, it can it is in one place, or may be distributed on multiple network modules.It can be according to actual The purpose for needing to select some or all of the modules therein to realize this specification scheme.Those of ordinary skill in the art are not In the case where making the creative labor, it can understand and implement.
It is above-mentioned that this specification specific embodiment is described.Other embodiments are in the scope of the appended claims It is interior.In some cases, the movement recorded in detail in the claims or step can be come according to the sequence being different from embodiment It executes and desired result still may be implemented.In addition, process depicted in the drawing not necessarily require show it is specific suitable Sequence or consecutive order are just able to achieve desired result.In some embodiments, multitasking and parallel processing be also can With or may be advantageous.
Those skilled in the art will readily occur to this specification after considering specification and practicing the invention applied here Other embodiments.This specification is intended to cover any variations, uses, or adaptations of this specification, these modifications, Purposes or adaptive change follow the general principle of this specification and do not apply in the art including this specification Common knowledge or conventional techniques.The description and examples are only to be considered as illustrative, the true scope of this specification and Spirit is indicated by the following claims.
It should be understood that this specification is not limited to the precise structure that has been described above and shown in the drawings, And various modifications and changes may be made without departing from the scope thereof.The range of this specification is only limited by the attached claims System.
The foregoing is merely the preferred embodiments of this specification, all in this explanation not to limit this specification Within the spirit and principle of book, any modification, equivalent substitution, improvement and etc. done should be included in the model of this specification protection Within enclosing.

Claims (14)

1. a kind of detection method of Indexes Abnormality fluctuation, comprising:
Configuration interface is provided, Data Detection configuration information, the Data Detection configuration information packet are obtained by the configuration interface It includes: the change rate of dimension and index to be detected;
Data to be tested are loaded, using the change rate of the dimension to be detected and the index as the input of analysis model, are called The analysis model, wherein the analysis model is used for: the as follows of the dimension to be detected is calculated using the data to be tested One or more parameters: comentropy, information gain-ratio or sixteen principle parameters, and based on the parameter that is calculated determine it is described to Influence degree of the detection dimensions to the change rate of the index;
Export the testing result.
2. according to the method described in claim 1, the change rate includes: year-on-year or ring ratio.
3. according to the method described in claim 1, the comentropy is determined based on such as under type: the Random event in the comentropy Part is divided into two classes: the enumerated value of the dimension to be detected corresponds to the change rate whether change rate is greater than the index;Wherein, described The probability that chance event occurs is determined based on such as under type: the corresponding change rate is greater than the enumerated value of the change rate of the index Number accounts for the ratio of the enumerated value total number of dimension to be detected.
4. according to the method described in claim 3, the comentropy is calculated in the following way:
Wherein, gm(D) comentropy of dimension m is indicated, n is equal to 2, piIndicate the ratio, D indicates the overall variation rate.
5. according to the method described in claim 1, the information gain-ratio is based on the comentropy and the dimension to be detected The ratio of enumerated value total number and determine.
6. according to the method described in claim 1, the sixteen principles parameter accounts for the dimension to be detected based on target number Q The ratio of enumerated value total number and determine;Wherein, the target number Q is indicated: by the enumerated value of the dimension to be detected according to Corresponding change absolute value from high to low sequence, the corresponding change absolute value of preceding Q enumerated value and value be more than the index The setting ratio of change absolute value, the setting ratio are determined based on 80%.
7. according to the method described in claim 1, the influence degree is positively correlated with the comentropy or information gain-ratio, with The sixteen principles parameter is negatively correlated.
8. according to the method described in claim 1, the analysis model is run in Hive platform.
9. a kind of detection device of Indexes Abnormality fluctuation, described device include:
Configuration module is used for: being provided Data Detection and is configured interface, obtains Data Detection request, the data by the interface Detection request includes: the change rate of dimension and index to be detected;
Computing module is used for: load data to be tested, using the change rate of the dimension to be detected and the index as analysis mould The analysis model is called in the input of type, wherein the analysis model is used for: using the data to be tested calculate it is described to Following one or more parameters of detection dimensions: comentropy, information gain-ratio or sixteen principle parameters, and based on being calculated Parameter determines the dimension to be detected to the influence degree of the change rate of the index;
Output module is used for: exporting the testing result.
10. device according to claim 9, the comentropy is determined based on such as under type: random in the comentropy Event is divided into two classes: the enumerated value of the dimension to be detected corresponds to the change rate whether change rate is greater than the index;It is described with The probability that machine event occurs is determined based on such as under type: the corresponding change rate is greater than the enumerated value of the change rate of the index Number accounts for the ratio of the enumerated value total number of dimension to be detected.
11. device according to claim 9, the information gain-ratio is based on the comentropy and the dimension to be detected The ratio of enumerated value total number and determine.
12. device according to claim 9, the sixteen principles parameter is based on target number Q and accounts for the dimension to be detected Enumerated value total number ratio and determine;Wherein, the target number Q is indicated: the enumerated value of the dimension to be detected is pressed According to corresponding change absolute value from high to low sequence, the corresponding change absolute value of preceding Q enumerated value and value be more than the index Change absolute value setting ratio, the setting ratio determines based on 80%.
13. device according to claim 9, the influence degree is positively correlated with the comentropy or information gain-ratio, It is negatively correlated with the sixteen principles parameter.
14. a kind of detection device of Indexes Abnormality fluctuation, comprising:
Processor;
Memory for storage processor executable instruction;
Wherein, the processor is configured to:
Configuration interface is provided, Data Detection configuration information, the Data Detection configuration information packet are obtained by the configuration interface It includes: the change rate of dimension and index to be detected;
Data to be tested are loaded, using the change rate of the dimension to be detected and the index as the input of analysis model, are called The analysis model, wherein the analysis model is used for: the as follows of the dimension to be detected is calculated using the data to be tested One or more parameters: comentropy, information gain-ratio or sixteen principle parameters, and based on the parameter that is calculated determine it is described to Influence degree of the detection dimensions to the change rate of the index;
Export the testing result.
CN201810662139.0A 2018-06-25 2018-06-25 Detection method, device and the equipment of Indexes Abnormality fluctuation Pending CN108959493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810662139.0A CN108959493A (en) 2018-06-25 2018-06-25 Detection method, device and the equipment of Indexes Abnormality fluctuation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810662139.0A CN108959493A (en) 2018-06-25 2018-06-25 Detection method, device and the equipment of Indexes Abnormality fluctuation

Publications (1)

Publication Number Publication Date
CN108959493A true CN108959493A (en) 2018-12-07

Family

ID=64486561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810662139.0A Pending CN108959493A (en) 2018-06-25 2018-06-25 Detection method, device and the equipment of Indexes Abnormality fluctuation

Country Status (1)

Country Link
CN (1) CN108959493A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458473A (en) * 2019-08-20 2019-11-15 国网福建省电力有限公司 A kind of dynamic decision analysis method and terminal for electric power billboard
CN110459276A (en) * 2019-08-15 2019-11-15 北京嘉和海森健康科技有限公司 A kind of data processing method and relevant device
CN110991241A (en) * 2019-10-31 2020-04-10 支付宝(杭州)信息技术有限公司 Abnormality recognition method, apparatus, and computer-readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009145251A1 (en) * 2008-05-30 2009-12-03 株式会社日立ハイテクノロジーズ Method for assisting judgment of abnormality of reaction process data and automatic analyzer
CN105447323A (en) * 2015-12-11 2016-03-30 百度在线网络技术(北京)有限公司 Data abnormal fluctuations detecting method and apparatus
CN106612216A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Method and apparatus of detecting website access exception
CN107682354A (en) * 2017-10-25 2018-02-09 东软集团股份有限公司 A kind of network virus detection method, apparatus and equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009145251A1 (en) * 2008-05-30 2009-12-03 株式会社日立ハイテクノロジーズ Method for assisting judgment of abnormality of reaction process data and automatic analyzer
CN106612216A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Method and apparatus of detecting website access exception
CN105447323A (en) * 2015-12-11 2016-03-30 百度在线网络技术(北京)有限公司 Data abnormal fluctuations detecting method and apparatus
CN107682354A (en) * 2017-10-25 2018-02-09 东软集团股份有限公司 A kind of network virus detection method, apparatus and equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110459276A (en) * 2019-08-15 2019-11-15 北京嘉和海森健康科技有限公司 A kind of data processing method and relevant device
CN110458473A (en) * 2019-08-20 2019-11-15 国网福建省电力有限公司 A kind of dynamic decision analysis method and terminal for electric power billboard
CN110458473B (en) * 2019-08-20 2022-07-05 国网福建省电力有限公司 Dynamic decision analysis method and terminal for electric billboard
CN110991241A (en) * 2019-10-31 2020-04-10 支付宝(杭州)信息技术有限公司 Abnormality recognition method, apparatus, and computer-readable medium
CN110991241B (en) * 2019-10-31 2022-06-03 支付宝(杭州)信息技术有限公司 Abnormality recognition method, apparatus, and computer-readable medium

Similar Documents

Publication Publication Date Title
US11645581B2 (en) Meaningfully explaining black-box machine learning models
US10367888B2 (en) Cloud process for rapid data investigation and data integrity analysis
US8577775B1 (en) Systems and methods for managing investments
WO2019061976A1 (en) Fund product recommendation method and apparatus, terminal device, and storage medium
CN107993143A (en) A kind of Credit Risk Assessment method and system
US10140661B2 (en) Systems and methods for managing investments
Toshniwal Clustering techniques for streaming data-a survey
US20150142520A1 (en) Crowd-based sentiment indices
Amin et al. Implementation of decision tree using C4. 5 algorithm in decision making of loan application by debtor (Case study: Bank pasar of Yogyakarta Special Region)
CN108959493A (en) Detection method, device and the equipment of Indexes Abnormality fluctuation
US8694427B2 (en) Time-efficient and deterministic adaptive score calibration techniques for maintaining a predefined score distribution
CN107622326B (en) User classification and available resource prediction method, device and equipment
US10613525B1 (en) Automated health assessment and outage prediction system
CN115641019A (en) Index anomaly analysis method and device, computer equipment and storage medium
Maredza Internal determinants of bank profitability in South Africa: does bank efficiency matter?
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN108428138B (en) Customer survival rate analysis device and method based on customer clustering
US9928516B2 (en) System and method for automated analysis of data to populate natural language description of data relationships
Lukić Analysis of productivity of distribution trade of selective countries of the European Union, Russia and Serbia based on the OCRA method
US20200265354A1 (en) Decision Making Entity Analytics Methods and Systems
WO2022183019A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
US20220067460A1 (en) Variance Characterization Based on Feature Contribution
CN112948469A (en) Data mining method and device, computer equipment and storage medium
WO2019227415A1 (en) Scorecard model adjustment method, device, server and storage medium
CN111506826A (en) User recommendation method, device, equipment and storage medium based on intimacy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207

RJ01 Rejection of invention patent application after publication