Summary of the invention
To overcome the problems in the related art, this specification provides a method, an apparatus and a device for detecting abnormal fluctuations of an indicator.
According to a first aspect of the embodiments of this specification, a method for detecting abnormal fluctuations of an indicator is provided. The method includes:
providing a configuration interface and obtaining data detection configuration information through the configuration interface, where the data detection configuration information includes a dimension to be detected and a change rate of an indicator;
loading data to be detected and calling an analysis model with the dimension to be detected and the change rate of the indicator as inputs, where the analysis model is used to calculate, from the data to be detected, one or more of the following parameters of the dimension to be detected: the information entropy, the information gain ratio or the 80/20 principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator; and
outputting the detection result.
Optionally, the change rate includes a year-on-year rate or a period-over-period rate.
Optionally, the information entropy is determined as follows: the random events in the information entropy are divided into two classes according to whether the change rate corresponding to an enumerated value of the dimension to be detected is greater than the change rate of the indicator; and the probability of a random event is determined as the ratio of the number of enumerated values whose change rate is greater than the change rate of the indicator to the total number of enumerated values of the dimension to be detected.
Optionally, the information entropy is calculated as follows:
$$g_m(D) = -\sum_{i=1}^{n} p_i \log p_i$$
where $g_m(D)$ denotes the information entropy of dimension $m$, $n$ equals 2, $p_i$ denotes the above ratio, and $D$ denotes the overall change rate.
Optionally, the information gain ratio is determined based on the ratio of the information entropy to the total number of enumerated values of the dimension to be detected.
Optionally, the 80/20 principle parameter is determined based on the ratio of a target number Q to the total number of enumerated values of the dimension to be detected, where Q is defined as follows: the enumerated values of the dimension to be detected are sorted in descending order of their absolute changes, and Q is the smallest number such that the sum of the absolute changes of the first Q enumerated values exceeds a set proportion of the absolute change of the indicator; the set proportion is determined based on 80%.
Optionally, the degree of influence is positively correlated with the information entropy or the information gain ratio, and negatively correlated with the 80/20 principle parameter.
Optionally, the analysis model runs on the Hive platform.
According to a second aspect of the embodiments of this specification, an apparatus for detecting abnormal fluctuations of an indicator is provided, including:
a configuration module, configured to provide a configuration interface and obtain data detection configuration information through the configuration interface, where the data detection configuration information includes a dimension to be detected and a change rate of an indicator;
a computing module, configured to load data to be detected and call an analysis model with the dimension to be detected and the change rate of the indicator as inputs, where the analysis model is used to calculate, from the data to be detected, one or more of the following parameters of the dimension to be detected: the information entropy, the information gain ratio or the 80/20 principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator; and
an output module, configured to output the detection result.
Optionally, the change rate includes a year-on-year rate or a period-over-period rate.
Optionally, the information entropy is determined as follows: the random events in the information entropy are divided into two classes according to whether the change rate corresponding to an enumerated value of the dimension to be detected is greater than the change rate of the indicator; and the probability of a random event is determined as the ratio of the number of enumerated values whose change rate is greater than the change rate of the indicator to the total number of enumerated values of the dimension to be detected.
Optionally, the information entropy is calculated as follows:
$$g_m(D) = -\sum_{i=1}^{n} p_i \log p_i$$
where $g_m(D)$ denotes the information entropy of dimension $m$, $n$ equals 2, $p_i$ denotes the above ratio, and $D$ denotes the overall change rate.
Optionally, the information gain ratio is determined based on the ratio of the information entropy to the total number of enumerated values of the dimension to be detected.
Optionally, the 80/20 principle parameter is determined based on the ratio of a target number Q to the total number of enumerated values of the dimension to be detected, where Q is defined as follows: the enumerated values of the dimension to be detected are sorted in descending order of their absolute changes, and Q is the smallest number such that the sum of the absolute changes of the first Q enumerated values exceeds a set proportion of the absolute change of the indicator; the set proportion is determined based on 80%.
Optionally, the degree of influence is positively correlated with the information entropy or the information gain ratio, and negatively correlated with the 80/20 principle parameter.
Optionally, the analysis model runs on the Hive platform.
According to a third aspect of the embodiments of this specification, a device for detecting abnormal fluctuations of an indicator is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
where the processor is configured to:
provide a configuration interface and obtain data detection configuration information through the configuration interface, where the data detection configuration information includes a dimension to be detected and a change rate of an indicator;
load data to be detected and call an analysis model with the dimension to be detected and the change rate of the indicator as inputs, where the analysis model is used to calculate, from the data to be detected, one or more of the following parameters of the dimension to be detected: the information entropy, the information gain ratio or the 80/20 principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator; and
output the detection result.
The technical solutions provided by the embodiments of this specification may have the following beneficial effects:
The embodiments of this specification provide a data detection configuration interface through which a user can input the dimensions to be detected and the change rate of the indicator. For each dimension to be detected, the analysis model can be called to calculate one or more of its information entropy, information gain ratio or 80/20 principle parameter, and the calculated parameters reveal the degree to which each dimension influences the abnormal change of the indicator.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit this specification.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "the" and "said" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
With the growth of big data, determining whether an indicator is fluctuating abnormally and analyzing the causes of abnormal fluctuations have become daily tasks for data practitioners. In many business scenarios, data analysts monitor many indicators of business data. For example, the indicators may include exposure, ad click-through rate, channel conversion rate, daily new users, active users, number of user sessions or user value indicators. Data analysts monitor the fluctuations of these indicators, which are characterized by the change rate of the indicator, such as the year-on-year rate or the period-over-period rate. The year-on-year rate compares the current statistical period with the same period in history, and the period-over-period rate compares the current statistical period with the previous statistical period.
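The two change rates can be sketched as follows. This is a minimal illustration, assuming monthly statistical periods keyed by a hypothetical (year, month) tuple; the text itself does not fix a period granularity or a storage layout.

```python
from typing import Dict, Tuple

# Hypothetical key: (year, month). Monthly periods are an assumption for illustration.
Period = Tuple[int, int]


def change_rate(current: float, baseline: float) -> float:
    """Relative change of `current` against `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return (current - baseline) / baseline


def year_on_year(values: Dict[Period, float], period: Period) -> float:
    """Compare a period with the same period one year earlier."""
    year, month = period
    return change_rate(values[period], values[(year - 1, month)])


def period_over_period(values: Dict[Period, float], period: Period) -> float:
    """Compare a period with the immediately preceding period (January wraps to December)."""
    year, month = period
    previous = (year, month - 1) if month > 1 else (year - 1, 12)
    return change_rate(values[period], values[previous])


users = {(2023, 6): 1000.0, (2024, 5): 1150.0, (2024, 6): 1200.0}
print(f"YoY: {year_on_year(users, (2024, 6)):.2%}")        # YoY: 20.00%
print(f"PoP: {period_over_period(users, (2024, 6)):.2%}")  # PoP: 4.35%
```

Both rates are ratio-type parameters computed from a numerator and a denominator, which is why a large value can come from either side changing.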
Abnormal fluctuations of an indicator may have many causes. For example, if the indicator is the number of users and the change rate is the year-on-year rate, the dimensions that affect the number of users may include the user's occupation, city, age or gender, and different dimensions may influence the fluctuation of the number of users to different degrees. If the causes of an abnormal fluctuation can be identified promptly, the abnormal data can be analyzed and dealt with promptly, enabling the service platform to serve users more stably.
Change rates such as the year-on-year and period-over-period rates are ratio-type parameters, computed by dividing a numerator by a denominator. A high change rate may be caused by a larger numerator, a smaller denominator, a larger numerator together with a smaller denominator, or an unchanged numerator with a smaller denominator, among other possibilities. Consequently, when ratio-type parameters such as the year-on-year or period-over-period rate are used to discover abnormal changes in an indicator, analyzing the causes of those changes is complex. How to use technology and algorithms to automatically determine which dimensions influence the fluctuation of an indicator's change rate, and to provide accurate analysis results, is a technical problem that urgently needs to be solved.
On this basis, the embodiments provide a scheme capable of detecting abnormal fluctuations of an indicator. The embodiments provide a data detection configuration interface through which a user can input the dimensions to be detected and the change rate of the indicator. For each dimension to be detected, one or more of its information entropy, information gain ratio or 80/20 principle parameter is calculated, and the calculated parameters reveal the degree to which each dimension influences the abnormal change of the indicator.
As shown in FIG. 1A, which is a flowchart of a method for detecting abnormal fluctuations of an indicator according to an exemplary embodiment of this specification, the method includes the following steps:
In step 102, a configuration interface is provided, and data detection configuration information is obtained through the configuration interface; the data detection configuration information includes a dimension to be detected and a change rate of an indicator.
In step 104, data to be detected is loaded, and an analysis model is called with the dimension to be detected and the change rate of the indicator as inputs, where the analysis model is used to calculate, from the data to be detected, one or more of the following parameters of the dimension to be detected: the information entropy, the information gain ratio or the 80/20 principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator.
In step 106, the detection result is output.
The method is described with reference to the application scenario diagram shown in FIG. 1B. In this embodiment, the data to be detected may be various kinds of report data, for example Key Performance Indicator (KPI) data or other business data. In practice, the data to be detected may be obtained from a database; it may be business data transmitted in real time by a business system, or it may be offline data.
An indicator is a unit or method for measuring the degree of development of something, such as population, GDP, income, number of users, profit margin, retention rate or coverage rate. An indicator may be obtained by aggregation such as summing or averaging, and may be aggregated under certain preconditions such as time, place and scope, that is, under a statistical criterion and scope. Indicators can be divided into absolute indicators and relative indicators: absolute indicators reflect scale, such as population, GDP, income and number of users, while relative indicators mainly reflect quality, such as profit margin, retention rate and coverage rate.
A dimension characterizes a feature of a thing or phenomenon; for example, gender, region and time are all dimensions. Time is a common and special dimension: comparing values before and after a point in time shows whether something is improving or deteriorating. For example, the number of users grew 10% period-over-period compared with last month and 20% year-on-year compared with the same period last year; such comparisons over time are called vertical comparisons. The other kind, called horizontal comparison, compares for example the population or GDP of different countries, the income or number of users of different provinces, or different companies and departments. Dimensions can be divided by data type into qualitative and quantitative dimensions: a dimension whose data is of character (text) type is qualitative, such as region and gender; a dimension whose data is numeric is quantitative, such as income, age and consumption.
All values of a dimension are called its enumerated values. For example, the dimension gender takes the values male and female, so its enumerated values are male and female, and it has two enumerated values.
In this embodiment, the data to be detected can be divided along multiple dimensions. To facilitate user analysis, this embodiment provides a configuration interface. In some examples, the configuration interface may be a visual interface with functions for interacting with the user, through which the data detection configuration information input by the user is obtained; the data detection configuration information may include the dimensions to be detected and the change rate of the indicator. In practice, the data to be detected can be divided into multiple dimensions in advance, and the user selects the dimensions to be detected from them; likewise, multiple aggregated indicators and their change rates can be computed in advance for the data to be detected, and the user selects the change rate of the indicator to be detected from them.
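The selection step can be sketched as validating a user's choice against the pre-divided dimensions and pre-computed change rates. This is a hypothetical sketch: `DetectionConfig`, `build_config` and the rate names `year_on_year` / `period_over_period` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DetectionConfig:
    """Data detection configuration: dimensions to be detected plus an indicator change rate."""
    dimensions: Tuple[str, ...]
    indicator: str
    change_rate: str  # hypothetical names, e.g. "year_on_year" or "period_over_period"


def build_config(
    selected_dimensions: Sequence[str],
    indicator: str,
    change_rate: str,
    available_dimensions: Sequence[str],
    available_rates: Dict[str, List[str]],
) -> DetectionConfig:
    """Accept only dimensions and change rates that were prepared in advance."""
    if not selected_dimensions:
        raise ValueError("select at least one dimension to be detected")
    unknown = [d for d in selected_dimensions if d not in available_dimensions]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    if change_rate not in available_rates.get(indicator, []):
        raise ValueError(f"no pre-computed {change_rate!r} for indicator {indicator!r}")
    return DetectionConfig(tuple(selected_dimensions), indicator, change_rate)


cfg = build_config(
    ["occupation", "city"], "user_count", "period_over_period",
    available_dimensions=["occupation", "city", "gender", "age_bracket"],
    available_rates={"user_count": ["year_on_year", "period_over_period"]},
)
print(cfg)
```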
An analysis model is preset in this embodiment. Based on the data to be detected, the analysis model calculates one or more of the following parameters of the dimension to be detected: the information entropy, the information gain ratio or the 80/20 principle parameter.
As an example, consider a set of business data whose indicator is the number of users, with the year-on-year and period-over-period rates as the indicator change rates, and whose dimensions to be detected are occupation, city, gender and age bracket. For convenience, this embodiment uses these 4 dimensions as an example; in practice, other dimensions can be configured as needed, and this embodiment does not limit this. The occupation dimension has 11 enumerated values, such as university student, teaching and administrative staff and blue-collar worker, and the change rates corresponding to each enumerated value are shown in Table 1:
Table 1: Change rates of the number of users by occupation
Occupation | Users, year-on-year | Users, period-over-period
Unidentifiable | 1.21% | 0.13%
University student | 3.29% | 0.35%
Famous enterprise employee | 21.87% | 6.1%
Teaching and administrative staff | 1.44% | 0.62%
Self-employed worker | 2.03% | 0.27%
Blue-collar worker | 0.69% | 0.07%
Civil servant | 0.81% | 0.05%
Medical worker | 1.88% | 0.19%
White-collar worker | 0.13% | -0.02%
Listed company employee | -1.04% | -0.35%
Listed company | 0.43% | 0.03%
In this data set, the period-over-period rate is used as the change rate. The overall period-over-period rate of the number of users is 0.18%, i.e., the average change rate of the number of users computed over the entire data set. The goal of the analysis is to determine the influence of each of the 4 dimensions (occupation, city, gender and age bracket) on the period-over-period rate of users; that is, to detect the main causes of the abnormal change in the period-over-period rate, how large the influence of each dimension is, or which dimension has the greatest influence.
In this embodiment, information entropy is a concept from information theory used to measure the amount of information. It is defined over the probabilities of occurrence of discrete random events, and a higher entropy indicates more information. The more ordered a system is, the lower its information entropy; conversely, the more chaotic a system is, the higher its information entropy. Normally the data distribution of each dimension is consistent and the indicator shows no abnormal fluctuation. When the indicator fluctuates abnormally, this may be because the data of one or more dimensions has become abnormal and broken the consistency of the data distribution. The information entropy of a dimension can therefore serve as an effective signal of whether that dimension influences the abnormal fluctuation of the indicator: the higher the information entropy of a dimension, the larger its fluctuation and the greater its influence on the abnormal fluctuation of the indicator, and vice versa.
On this basis, one function of the analysis model of this embodiment may be to calculate the information entropy of the dimension to be detected. Optionally, an existing information entropy calculation can be adapted to the data analysis scenario of this embodiment. For example, the information entropy is determined as follows: the random events in the information entropy are divided into two classes according to whether the change rate corresponding to an enumerated value of the dimension to be detected is greater than the overall change rate; and the probability of a random event is determined as the ratio of the number of enumerated values whose change rate is greater than the overall change rate to the total number of enumerated values of the dimension to be detected. In this way, the random events are split into two classes: events that contribute to the overall change rate of the indicator and events that do not, where contribution is decided by whether the change rate of the enumerated value exceeds the overall change rate. The information entropy of the dimension to be detected can therefore be determined quickly.
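The two-class scheme above can be sketched as a small function. It is a minimal illustration, assuming change rates are given per enumerated value in the same unit as the overall rate, and assuming log base 2 (the text does not state the base).

```python
import math
from typing import Mapping


def dimension_entropy(rates: Mapping[str, float], overall_rate: float) -> float:
    """Binary information entropy of one dimension to be detected.

    Each enumerated value falls into one of two classes (random events):
    its change rate is greater than the overall change rate, or it is not.
    p_i is the share of enumerated values in class i.
    """
    total = len(rates)
    if total == 0:
        raise ValueError("dimension has no enumerated values")
    above = sum(1 for r in rates.values() if r > overall_rate)
    entropy = 0.0
    for count in (above, total - above):
        if count:  # the 0 * log(0) term is taken as 0
            p = count / total
            entropy -= p * math.log2(p)  # base 2 is an assumption
    return entropy


# Hypothetical gender dimension (rates in percent): one value above 0.2, one below.
print(dimension_entropy({"male": 0.5, "female": -0.1}, overall_rate=0.2))  # 1.0
```

An even split gives the maximum entropy of 1.0, and a dimension whose values all fall on one side gives 0.0.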
In an optional implementation, this embodiment further provides a way of calculating the information entropy. For example, in this embodiment the information entropy is calculated as follows:
$$g_m(D) = -\sum_{i=1}^{n} p_i \log p_i$$
where $g_m(D)$ denotes the information entropy of dimension $m$; $n$ equals 2, representing the two classes of events, namely whether or not the dimension to be detected influences the change rate;
$p_i$ denotes, in the entropy formula, the probability of a random event, which this embodiment represents by the ratio described above; and
$D$ denotes, in the entropy formula, the data set, which this embodiment represents by the change rate.
Taking Table 1 as an example, the process by which the analysis model calculates the information entropy is illustrated below. As Table 1 shows, occupation has 11 enumerated values, each with the period-over-period user rate listed in the table. Of these 11 enumerated values, 5 are greater than the average period-over-period rate of 0.18% and 6 are not, so the information entropy of occupation is calculated as:
$$g_{\text{occupation}}(D) = -\left(\frac{5}{11}\log\frac{5}{11} + \frac{6}{11}\log\frac{6}{11}\right)$$
The information entropy of the other dimensions to be detected can be calculated in the same way.
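The worked example can be checked directly against the period-over-period column of Table 1. This sketch assumes log base 2. Counting from the table gives 5 values above 0.18% and 6 not above; because the binary entropy is symmetric in its two classes, the value is the same as for a 6/5 split.

```python
import math

# Period-over-period change of the number of users by occupation (Table 1), in percent.
occupation_pop = {
    "Unidentifiable": 0.13,
    "University student": 0.35,
    "Famous enterprise employee": 6.1,
    "Teaching and administrative staff": 0.62,
    "Self-employed worker": 0.27,
    "Blue-collar worker": 0.07,
    "Civil servant": 0.05,
    "Medical worker": 0.19,
    "White-collar worker": -0.02,
    "Listed company employee": -0.35,
    "Listed company": 0.03,
}
OVERALL_POP = 0.18  # overall period-over-period rate of the number of users, in percent

above = sum(1 for r in occupation_pop.values() if r > OVERALL_POP)
not_above = len(occupation_pop) - above
probs = [above / len(occupation_pop), not_above / len(occupation_pop)]
entropy = -sum(p * math.log2(p) for p in probs if p > 0)  # log base 2 assumed

print(above, not_above)   # 5 6
print(round(entropy, 4))  # 0.994
```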
In practice, the numbers of enumerated values of different dimensions to be detected may differ greatly; for example, gender has few enumerated values while city may have many. Information entropy is better suited to dimensions with many enumerated values, so to avoid the effect of differences in the number of enumerated values, in other examples the analysis model can also calculate the information gain ratio of the dimension to be detected. Optionally, the information gain ratio can be determined based on the ratio of the information entropy to the total number of enumerated values of the dimension to be detected, which weakens the effect of differences in the number of enumerated values. As an example, the ratio of the information entropy to the total number of enumerated values of the dimension to be detected can be used as the information gain ratio, calculated as follows:
$$\mathrm{Intl}_m(D) = \frac{g_m(D)}{N}$$
where $\mathrm{Intl}_m(D)$ denotes the information gain ratio of dimension $m$, and $N$ is the total number of enumerated values of dimension $m$.
In other examples, the information gain ratio may also be calculated by adding other correction parameters on top of the ratio of the information entropy to the total number of enumerated values of the dimension to be detected; this embodiment does not limit this.
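A minimal sketch of the information gain ratio as defined in this text, i.e. the dimension's information entropy divided by its number of enumerated values $N$. Note that this is the text's own simplified definition, not the C4.5 gain ratio; log base 2 and the sample dimensions are assumptions for illustration.

```python
import math
from typing import Mapping


def binary_entropy(rates: Mapping[str, float], overall_rate: float) -> float:
    """Two-class entropy: change rate above the overall rate or not (log base 2 assumed)."""
    above = sum(1 for r in rates.values() if r > overall_rate)
    h = 0.0
    for count in (above, len(rates) - above):
        if count:
            p = count / len(rates)
            h -= p * math.log2(p)
    return h


def information_gain_ratio(rates: Mapping[str, float], overall_rate: float) -> float:
    """Intl_m(D) = g_m(D) / N, with N the number of enumerated values."""
    if not rates:
        raise ValueError("dimension has no enumerated values")
    return binary_entropy(rates, overall_rate) / len(rates)


# Hypothetical dimensions with the same entropy (1.0) but different N.
gender = {"male": 0.5, "female": -0.1}
city = {"A": 0.5, "B": 0.4, "C": -0.1, "D": 0.0}
print(information_gain_ratio(gender, 0.2), information_gain_ratio(city, 0.2))  # 0.5 0.25
```

Both dimensions split evenly and so have entropy 1.0; dividing by $N$ separates them.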
On the other hand, the analysis model of this embodiment can also calculate the 80/20 principle parameter. According to the 80/20 principle, most (about 80%) of an abnormal change in an indicator comes from a small portion (about 20%) of the enumerated values of each dimension to be detected. Therefore, the more concentrated the data movement of a dimension, the greater its influence on the abnormal change of the indicator. In this embodiment, the 80/20 principle parameter can be determined based on the ratio of a target number Q to the total number of enumerated values of the dimension to be detected, where Q is defined as follows: the enumerated values of the dimension to be detected are sorted in descending order of their absolute changes, and the sum of the absolute changes of the first Q enumerated values exceeds a set proportion of the absolute change of the indicator; the set proportion is determined based on 80%. In principle, the 80/20 principle parameter characterizes how concentrated the data movement within a dimension is, and this concentration reflects the dimension's influence on the change of the indicator: the more concentrated the movement, the greater the influence.
As an example, the 80/20 principle parameter can be calculated as follows. Suppose that, for the period-over-period number of users, the absolute change of the indicator over the entire data set is 100,000, and the set proportion is determined based on 80%; in practice, the proportion can also be configured flexibly as needed, for example to a value close to 80%. Using 80% in this embodiment, 80% of 100,000 is 80,000. For the first dimension to be detected, all of its enumerated values are sorted by absolute change, the absolute change of each enumerated value is read from highest to lowest and accumulated, and when the running sum exceeds 80,000 the target number Q is obtained. The 80/20 principle parameter is then the ratio of Q to the total number of enumerated values of the dimension.
In practice, the above calculation can be expressed as follows:
$$\text{80/20 principle parameter} = \frac{Q}{\mathrm{count}(\mathrm{all})}, \qquad Q = \min\left\{ q : \frac{\sum_{j=1}^{q} \beta_j}{\mathrm{sum}(\beta_j)} > 0.8 \right\}$$
where count denotes counting, and count(all) denotes the total number of enumerated values of the dimension to be detected;
$\beta_j$ denotes the absolute change of enumerated value $j$ of the dimension to be detected, and $\mathrm{sum}(\beta_j)$ denotes the absolute change of the indicator for the dimension to be detected, where $1 \le j \le N$, $N$ is the total number of enumerated values of the dimension to be detected, and all enumerated values of the dimension are sorted by their absolute changes;
the condition tests whether the cumulative share of $\beta_j$ in $\mathrm{sum}(\beta_j)$ is greater than 0.8, and the threshold 0.8 can be adjusted flexibly as needed.
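The procedure for $Q$ can be sketched as a cumulative-sum scan. This is a minimal illustration: `pareto_parameter` is a hypothetical name, the absolute change of the indicator is passed in explicitly (as in the 100,000 example), and the sample data is invented to reproduce the 1/11 occupation case.

```python
from typing import Mapping


def pareto_parameter(changes: Mapping[str, float], indicator_change: float,
                     share: float = 0.8) -> float:
    """80/20 principle parameter: Q / count(all).

    Q is the smallest number of enumerated values (sorted by absolute change,
    largest first) whose summed absolute change exceeds `share` of the
    indicator's absolute change.
    """
    if not changes:
        raise ValueError("dimension has no enumerated values")
    threshold = share * abs(indicator_change)
    ordered = sorted((abs(v) for v in changes.values()), reverse=True)
    running = 0.0
    for q, beta in enumerate(ordered, start=1):
        running += beta
        if running > threshold:
            return q / len(ordered)
    # Threshold never exceeded (e.g. offsetting changes): count every value.
    return 1.0


# Hypothetical occupation data: one value accounts for 85,000 of a 100,000 change.
occupation_changes = {"famous_enterprise": 85_000.0}
occupation_changes.update({f"other_{i}": 1_500.0 for i in range(10)})
print(pareto_parameter(occupation_changes, indicator_change=100_000.0))  # 1/11, about 0.0909
```

A smaller result means the change is concentrated in fewer enumerated values, which the text treats as a stronger influence on the indicator.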
As an example, consider calculating the 80/20 principle parameter of the city dimension, which has 365 enumerated values. When the enumerated values are sorted by the absolute change in the number of users, the sum for the top 5 already exceeds 80,000, so the influence value of the city dimension is 5/365.
Similarly, the occupation dimension has 11 enumerated values. When they are sorted by the absolute change in the number of users, the absolute change of the single highest-ranked enumerated value already exceeds 80,000, so the influence value of the occupation dimension is 1/11.
It can be understood that the influence value of a dimension on the indicator change, calculated with the 80/20 principle algorithm, characterizes how concentrated the data movement in that dimension is: the more concentrated the movement, the greater the influence on the indicator change.
In this way, the analysis model can calculate one or more of the information entropy, the information gain ratio and the 80/20 principle parameter, and then determine the degree of influence of the dimension to be detected on the change rate of the indicator. The degree of influence is positively correlated with the information entropy or the information gain ratio and negatively correlated with the 80/20 principle parameter. In a specific implementation, the way the degree of influence is determined can be configured flexibly as needed. For example, if only one parameter is calculated, the dimensions to be detected can be sorted by that parameter, and the degree of influence of each dimension can be determined from the above correlation between the degree of influence and the information entropy, the information gain ratio or the 80/20 principle parameter. In other examples, if multiple parameters are calculated, the degree of influence can be determined comprehensively from its correlation with each parameter; optionally, in that case, each of the three parameters can also be assigned a weight.
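As an example, for ease of calculation, the information entropy, the information gain ratio and the 80/20 principle parameter can be normalized, with the 80/20 principle parameter normalized in reverse order (that is, the dimensions are arranged in reverse order of their 80/20 principle parameters), and the normalized values are then multiplied together:
$$\text{Score} = \mathrm{normalize}(f_1) \times \mathrm{normalize}(f_2) \times \mathrm{normalize}(f_3)$$
where $f_1$, $f_2$ and $f_3$ denote the information entropy, the information gain ratio and the 80/20 principle parameter, respectively.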
Finally, the dimensions can be sorted by Score, and the dimension with the largest Score can be determined to have the greatest influence on the indicator change. The results of the analysis model give the Score of each dimension to be detected; optionally, these Scores can be output so that the user can see how much each dimension to be detected influences the abnormal change of the indicator.
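The text leaves the normalization method open, so the following is only one possible sketch. It assumes the information entropy and information gain ratio are normalized by dividing by their maximum, and the 80/20 principle parameter is normalized by reverse rank (the largest parameter gets 1/n, the smallest gets 1), so that more concentrated dimensions score higher. All parameter values in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def scale_by_max(values: Dict[str, float]) -> Dict[str, float]:
    """Normalize to [0, 1] by dividing by the maximum (one possible choice)."""
    peak = max(values.values())
    return {k: (v / peak if peak > 0 else 0.0) for k, v in values.items()}


def reverse_rank(values: Dict[str, float]) -> Dict[str, float]:
    """Reverse-order normalization: the largest 80/20 parameter gets 1/n,
    the smallest (most concentrated) gets 1."""
    ordered = sorted(values, key=values.get, reverse=True)
    n = len(ordered)
    return {k: (i + 1) / n for i, k in enumerate(ordered)}


def rank_dimensions(entropy: Dict[str, float], gain_ratio: Dict[str, float],
                    pareto: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score = normalize(f1) * normalize(f2) * normalize(f3), highest first."""
    n1, n2, n3 = scale_by_max(entropy), scale_by_max(gain_ratio), reverse_rank(pareto)
    scores = {d: n1[d] * n2[d] * n3[d] for d in entropy}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical parameter values for three dimensions.
ranking = rank_dimensions(
    entropy={"occupation": 0.99, "city": 0.95, "gender": 0.5},
    gain_ratio={"occupation": 0.09, "city": 0.0026, "gender": 0.25},
    pareto={"occupation": 0.09, "city": 0.2, "gender": 0.5},
)
print(ranking[0][0])  # occupation
```

Per-parameter weights, which the text mentions as an option, could be applied as exponents or multipliers on each normalized term.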
In practice, after the degree of influence of each dimension to be detected has been calculated, a dimension to be detected can be further split into finer dimensions and the above detection method executed again, so that finer influencing factors can be found. That is, the detection method of this embodiment can work like a tree-shaped computation, detecting layer by layer starting from the first layer, with the dimensions to be detected subdivided from coarse to fine. For example, when occupation is detected to have the greatest influence on the change rate of the indicator, the method of this embodiment can be applied again with each enumerated value of occupation treated as a new dimension to be detected, to further analyze how each enumerated value of occupation influences the change rate of the indicator.
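The layer-by-layer detection can be sketched as a recursion: pick the most influential dimension, then repeat the analysis inside each of its enumerated values with the remaining dimensions. This is one possible reading of the tree-shaped computation, not the described implementation; the record layout, `drill_down`, and the toy `distinct_count` score (a stand-in for the influence degree) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]                         # one row: dimension name -> enumerated value
ScoreFn = Callable[[List[Record], str], float]  # hypothetical influence-degree function


def drill_down(records: List[Record], dimensions: Sequence[str],
               score: ScoreFn, max_depth: int = 2) -> Dict[str, object]:
    """Pick the most influential dimension, then repeat inside each of its values."""
    if not records or not dimensions or max_depth == 0:
        return {}
    best = max(dimensions, key=lambda d: score(records, d))
    remaining = [d for d in dimensions if d != best]
    children: Dict[str, object] = {}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        children[value] = drill_down(subset, remaining, score, max_depth - 1)
    return {"dimension": best, "children": children}


# Toy stand-in score: number of distinct enumerated values (not the text's metric).
def distinct_count(rows: List[Record], dim: str) -> float:
    return float(len({r[dim] for r in rows}))


rows = [
    {"occupation": "student", "city": "A"},
    {"occupation": "student", "city": "B"},
    {"occupation": "clerk", "city": "C"},
]
tree = drill_down(rows, ["occupation", "city"], distinct_count)
print(tree["dimension"])  # city
```

In a real setting, `score` would compute the information entropy, gain ratio, 80/20 parameter or combined Score on the subset of data.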
The method of this embodiment can run on the Hive platform, which is based on distributed computing, enabling real-time computation; processing can also be accelerated with techniques such as caching, so that the method can analyze data quickly.
Corresponding to the foregoing embodiments of the method for detecting abnormal fluctuations of an indicator, this specification further provides embodiments of an apparatus for detecting abnormal fluctuations of an indicator and of the device to which it is applied.
The embodiments of the apparatus for detecting abnormal fluctuations of an indicator in this specification can be applied to a computing device such as a server. The apparatus embodiments can be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the apparatus, as a logical entity, is formed by the processor of the device on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. In terms of hardware, FIG. 2 is a hardware structure diagram of the device on which the apparatus for detecting abnormal fluctuations of an indicator of this specification resides. In addition to the processor 210, memory 230, network interface 220 and non-volatile memory 240 shown in FIG. 2, the device on which the apparatus 231 of the embodiment resides may also include other hardware according to its actual functions, which is not described further here.
As shown in Fig. 3, Fig. 3 is a block diagram of a device for detecting abnormal fluctuation of an indicator according to an exemplary embodiment of this specification. The device includes:
Configuration module 31, configured to provide a configuration interface and obtain data detection configuration information through the configuration interface, the data detection configuration information including a dimension to be detected and a change rate of an indicator;
Computing module 32, configured to load data to be tested and call an analysis model with the dimension to be detected and the change rate of the indicator as inputs of the analysis model, wherein the analysis model is used to calculate, from the data to be tested, one or more of the following parameters of the dimension to be detected: information entropy, information gain ratio, or Pareto (80/20) principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator;
Output module 33, configured to output the detection result.
Optionally, the change rate includes a year-on-year change rate or a period-on-period (sequential) change rate.
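For illustration, both change rates can be computed as a relative change against a baseline period. This is a minimal sketch assuming the common definition (current - baseline) / baseline; the function names are hypothetical.

```python
def change_rate(current: float, baseline: float) -> float:
    """Relative change of an indicator value against a baseline period (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return (current - baseline) / baseline


def year_on_year(current: float, same_period_last_year: float) -> float:
    # Year-on-year: compare with the same period of the previous year.
    return change_rate(current, same_period_last_year)


def period_on_period(current: float, previous_period: float) -> float:
    # Period-on-period ("ring ratio"): compare with the immediately preceding period.
    return change_rate(current, previous_period)


if __name__ == "__main__":
    print(year_on_year(120.0, 100.0))
    print(period_on_period(90.0, 100.0))
```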
Optionally, the information entropy is determined as follows: the random events in the information entropy are divided into two classes according to whether the change rate corresponding to an enumeration value of the dimension to be detected is greater than the change rate of the indicator. The probability of the random event is determined as the ratio of the number of enumeration values whose corresponding change rate is greater than the change rate of the indicator to the total number of enumeration values of the dimension to be detected.
Optionally, the information entropy is calculated as follows:
$$g_m(D) = -\sum_{i=1}^{n} p_i \log p_i$$
where g_m(D) denotes the information entropy of dimension m, n equals 2, p_i denotes the ratio described above, and D denotes the overall change rate.
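A minimal Python sketch of this two-class entropy, assuming a base-2 logarithm (the text does not fix the base) and the convention 0 · log 0 = 0. Here p_1 is the share of enumeration values whose change rate exceeds the overall change rate D, and p_2 = 1 - p_1; `dimension_entropy` and its arguments are hypothetical names.

```python
import math
from typing import Mapping


def dimension_entropy(value_rates: Mapping[str, float], overall_rate: float) -> float:
    """Two-class information entropy g_m(D) of one dimension (base-2 log assumed).

    value_rates: change rate of each enumeration value of the dimension.
    overall_rate: overall change rate D of the indicator.
    """
    total = len(value_rates)
    if total == 0:
        raise ValueError("dimension has no enumeration values")
    # p_1: share of enumeration values changing faster than the indicator overall.
    p_above = sum(1 for rate in value_rates.values() if rate > overall_rate) / total
    entropy = 0.0
    for p in (p_above, 1.0 - p_above):
        if p > 0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    rates = {"A": 0.30, "B": 0.10, "C": -0.20, "D": 0.25}
    print(dimension_entropy(rates, overall_rate=0.15))  # two of four above D
```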
Optionally, the information gain ratio is determined based on the ratio of the information entropy to the total number of enumeration values of the dimension to be detected.
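Read literally, this definition divides the entropy by the number of enumeration values, which differs from the gain ratio used in C4.5 decision-tree learning (information gain divided by split information). A minimal sketch under the literal reading, again assuming a base-2 logarithm; the function names are hypothetical.

```python
import math
from typing import Mapping


def _binary_entropy(p: float) -> float:
    # Entropy of a two-class event with probabilities p and 1 - p (base-2 log assumed).
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain_ratio(value_rates: Mapping[str, float], overall_rate: float) -> float:
    """Entropy of the dimension divided by its number of enumeration values,
    following the literal definition in the text (not the C4.5 gain ratio)."""
    total = len(value_rates)
    if total == 0:
        raise ValueError("dimension has no enumeration values")
    p_above = sum(1 for rate in value_rates.values() if rate > overall_rate) / total
    return _binary_entropy(p_above) / total


if __name__ == "__main__":
    rates = {"A": 0.30, "B": 0.10, "C": -0.20, "D": 0.25}
    print(information_gain_ratio(rates, overall_rate=0.15))
```

Dividing by the number of enumeration values penalizes dimensions with many values, which would otherwise tend to look influential simply because they are split more finely.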
Optionally, the Pareto principle parameter is determined based on the ratio of a target number Q to the total number of enumeration values of the dimension to be detected, where Q is defined as follows: the enumeration values of the dimension to be detected are sorted by their corresponding absolute change in descending order, and the sum of the absolute changes of the first Q enumeration values exceeds a set proportion of the absolute change of the indicator, the set proportion being determined based on 80%.
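A minimal sketch of this parameter, assuming Q is the smallest number of top-ranked enumeration values whose summed absolute change exceeds the set proportion (80% by default here), and that each enumeration value's change is its absolute difference in the indicator; `pareto_parameter` and its arguments are hypothetical. A small value means that a few enumeration values account for most of the change.

```python
from typing import Mapping


def pareto_parameter(value_changes: Mapping[str, float], overall_change: float,
                     share: float = 0.8) -> float:
    """Q / N for one dimension.

    value_changes: change of the indicator within each enumeration value.
    overall_change: change of the indicator as a whole.
    Q is taken as the smallest count of top-ranked values (by absolute change,
    descending) whose absolute changes sum to more than share * |overall_change|.
    """
    total = len(value_changes)
    if total == 0:
        raise ValueError("dimension has no enumeration values")
    threshold = share * abs(overall_change)
    ranked = sorted((abs(c) for c in value_changes.values()), reverse=True)
    cumulative = 0.0
    for q, change in enumerate(ranked, start=1):
        cumulative += change
        if cumulative > threshold:
            return q / total
    # Assumption: if the threshold is never exceeded, treat all N values as needed.
    return 1.0


if __name__ == "__main__":
    changes = {"A": 60.0, "B": 25.0, "C": 8.0, "D": 4.0, "E": 3.0}
    print(pareto_parameter(changes, overall_change=100.0))  # top 2 of 5 exceed 80
```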
Optionally, the degree of influence is positively correlated with the information entropy or the information gain ratio, and negatively correlated with the Pareto principle parameter.
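Because these correlations run in opposite directions, a sketch that lists dimensions from most to least influential sorts by entropy (or gain ratio) in descending order but by the Pareto parameter in ascending order. The function names and example values are hypothetical, and how the parameters would be combined into a single Score is not specified here, so each is ranked separately.

```python
from typing import Dict, List


def rank_by_entropy(entropies: Dict[str, float]) -> List[str]:
    # Higher entropy (or gain ratio) -> greater influence, so sort descending.
    return sorted(entropies, key=lambda dim: entropies[dim], reverse=True)


def rank_by_pareto(pareto: Dict[str, float]) -> List[str]:
    # Lower Pareto parameter -> greater influence, so sort ascending.
    return sorted(pareto, key=lambda dim: pareto[dim])


if __name__ == "__main__":
    print(rank_by_entropy({"occupation": 0.97, "region": 0.50, "age": 0.81}))
    print(rank_by_pareto({"occupation": 0.2, "region": 0.6, "age": 0.4}))
```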
Optionally, the analysis model runs on the Hive platform.
Correspondingly, this specification further provides a device for detecting abnormal fluctuation of an indicator, including a processor and a memory for storing processor-executable instructions, wherein the processor is configured to:
provide a configuration interface and obtain data detection configuration information through the configuration interface, the data detection configuration information including a dimension to be detected and a change rate of an indicator;
load data to be tested and call an analysis model with the dimension to be detected and the change rate of the indicator as inputs of the analysis model, wherein the analysis model is used to calculate, from the data to be tested, one or more of the following parameters of the dimension to be detected: information entropy, information gain ratio, or Pareto principle parameter, and to determine, based on the calculated parameters, the degree of influence of the dimension to be detected on the change rate of the indicator;
output the detection result.
For details of how the functions and effects of the modules in the above device for detecting abnormal fluctuation of an indicator are implemented, refer to the implementation of the corresponding steps in the above method for detecting abnormal fluctuation of an indicator; details are not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of this specification. Those of ordinary skill in the art can understand and implement them without creative effort.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this specification. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of this specification being indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
The foregoing describes merely preferred embodiments of this specification and is not intended to limit this specification. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of this specification shall fall within the scope of protection of this specification.