CN109857593A - A kind of data center's log missing data restoration methods - Google Patents
- Publication number
- CN109857593A (application number CN201910056129.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- discretization
- attribute
- tensor
- data attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a method for recovering missing data in data center logs. The method first uses correlation analysis to mine the correlations between the different data attributes in a data center log, selects an optimal subset of data attributes, and optimizes the discretization of the data with a two-stage discretization step-size optimization algorithm. It then uses the selected data attribute subset as the dimensions of a tensor and constructs a sparse tensor. Finally, it completes the sparse tensor with a tensor completion method based on tensor decomposition, obtaining a dense tensor. Combining the dense tensor with the original incomplete log data yields a complete log data set.
Description
Technical field
The invention belongs to the field of data center log analysis, and in particular relates to a method for recovering missing data in data center logs.
Background art
Large-scale data centers are the information infrastructure of the Internet and related industries. They provide the computing, storage, network, and other hardware and software resources on which Internet services run. Modern data centers make wide use of virtualization, containerization, and server consolidation. As a result, multiple computing frameworks and multiple heterogeneous workloads often coexist in one data center. While running, a data center generates massive log data containing runtime information about its servers and workloads.
Data center log analysis is one of the important means of optimizing data center performance. Through log analysis, data center managers can obtain important information such as workload characteristics and resource usage patterns, which in turn guides task scheduling, resource management, and programming model optimization. However, as data centers keep growing, their logs face an increasingly serious missing-data problem: part of the data in a log is empty or invalid and cannot be used directly as input for log analysis. The problem has two main causes:
(1) In the log collection stage, bugs in the monitoring system may cause data to go missing. In addition, monitoring processes are usually given a low priority and are deprived of resources when cluster load is high, which also leads to missing data. (2) In the log processing stage, some data is anonymized or normalized for reasons such as confidentiality. This processing can itself cause missing data, and bugs in it cause further, unexpected missing data. At present, log analysis work handles missing data mainly in two ways: simply removing the items with missing data, or recovering the missing data with statistical imputation based on the mean or on regression. The existing methods have the following problems:
(1) They cannot cope with a high proportion of missing data. As data centers grow, the proportion of missing data in their logs tends to rise. When a large proportion of the data is missing, simple removal greatly reduces the overall information content of the log, while statistical imputation based on the mean or on regression has low recovery accuracy. Neither approach can cope with a high proportion of missing data, which in turn reduces the accuracy of log analysis.
(2) They cannot cope with the complex correlations between the different data attributes in a data center log. A data center log usually has from ten to several dozen data attributes, with various linear or nonlinear correlations between them. Analyzing these correlations can improve the accuracy of data recovery. Existing methods do not take the correlations between data attributes into account when recovering missing log data, so their recovery accuracy is low. They also require the input data attributes of the recovery algorithm to be specified manually, and non-experts find it hard to choose correctly without first performing a correlation analysis of the log data.
Summary of the invention
In view of the above problems, the invention proposes a tensor-based method for recovering missing data in data center logs. The method first uses correlation analysis to mine the correlations between the different data attributes in the log, selects an optimal subset of data attributes, and optimizes the discretization of the data with a two-stage discretization step-size optimization algorithm. It then uses the selected data attribute subset as the dimensions of a tensor and constructs a sparse tensor. Finally, it completes the sparse tensor with a tensor completion method based on tensor decomposition, obtaining a dense tensor. Combining the dense tensor with the original incomplete log data yields a complete log data set.
In the present invention, the sparse tensor is completed using CANDECOMP/PARAFAC (CP) decomposition. CP decomposition is a widely used tensor completion method. It decomposes the sparse tensor into a sum of several rank-one tensors, which captures the patterns of variation in the tensor data, and then uses those patterns to complete the sparse tensor. Because of the characteristics of data center log data, the sparse tensor constructed from it is low-rank and is therefore well suited to completion by CP decomposition.
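For reference, the standard textbook form of the CP model that this passage relies on can be written as follows. The factor-vector symbol f_r^{(n)} is notation chosen here for illustration and does not appear in the source; R is the number of rank-one tensors and q is the tensor order, as in the method's parameters.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} f^{(1)}_r \circ f^{(2)}_r \circ \cdots \circ f^{(q)}_r,
\qquad
x_{d_1 d_2 \cdots d_q} \approx \sum_{r=1}^{R} \prod_{n=1}^{q} f^{(n)}_{d_n r}
```

Here the circle denotes the vector outer product. Once the factors have been fitted to the observed entries, a missing entry is estimated by evaluating the right-hand side at its indices.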
The data center log missing-data recovery method of the present invention consists of five main steps: initialization, data attribute selection, data attribute discretization optimization, tensor construction and completion, and log missing-data recovery. The method has the following basic parameters: the discretization bin count lower bound N_L, the discretization bin count upper bound N_H, the attribute selection discretization step S_1, the discretization optimization step S_2, the number R of rank-one tensors in the CP decomposition, the gradient descent learning rate, the gradient descent objective function weights λ_1 and λ_2, and the gradient descent objective function convergence threshold θ. Typical values are: N_L between 50 and 150, N_H between 400 and 500, S_1 between 100 and 200, S_2 between 25 and 50, R between 5 and 30, a learning rate of 0.00001, λ_1 and λ_2 between 0 and 1, and θ of 0.01.
The method is carried out in the following steps:
(1) Initialization. Suppose the log has n data attributes and m records. The set of data attributes is denoted A, A = {a_1, a_2, …, a_n}. The set of data records in the log is denoted E, E = {e_1, e_2, …, e_m}. The data in the log is denoted V, where v_ij is the value of the i-th data attribute in the j-th data record. The data attribute that has missing data is denoted a_T.
(2) Data attribute selection.
2.1) Manually choose all data attributes that may be correlated with the target missing data attribute as the candidate data attribute set A', A' = {a_1, a_2, …, a_n'}.
2.2) Construct the discretization rule set Rule for the attribute selection stage. Each rule is r_i = {r_i1, r_i2, …, r_in'}, where r_ij is the number of discretization bins that the i-th rule assigns to candidate attribute a_j. That is, all combinations of data attributes and bin counts are traversed within the search space determined by the bin count lower bound N_L, the bin count upper bound N_H, and the attribute selection discretization step S_1.
2.3) Discretize the data with each discretization rule r_i ∈ Rule. The log data discretized with rule r_i is denoted V_i. Then perform data attribute selection for each rule in turn:
2.3.1) Using formula (1) and formula (2), compute the AMI between every candidate data attribute a_i ∈ A' and the target data attribute a_T, denoted AMI(a_i; a_T). Then initialize the priorities of the candidate data attributes, P = {p_1, p_2, …, p_n'}, where p_i = AMI(a_i; a_T).
2.3.2) Select the data attribute with the highest priority (denoted a_k), add it to the selected data attribute set, and remove it from the candidate data attribute set A'. Update the priority of each remaining candidate data attribute a_l ∈ A' to p_l × (1 - AMI(a_l, a_k)).
2.3.3) Repeat step 2.3.2) until the number of selected data attributes equals the target selection count.
2.3.4) Denote the selection result result_i and add it to the selection result set Result.
2.4) Count all selection results in the set Result, and take the data attribute selection that occurs most frequently as the final selection result A_S, A_S = {a_1, a_2, …, a_q}.
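The greedy loop of steps 2.3.1) to 2.3.3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_attributes`, the dictionary layout, and all numeric scores are hypothetical, and the AMI values are taken as precomputed inputs because formulas (1) and (2) are not reproduced in this text.

```python
def select_attributes(relevance, redundancy, k):
    """Greedy selection in the spirit of steps 2.3.1) to 2.3.3).

    relevance  -- dict: attribute -> AMI(attribute; target), assumed precomputed
    redundancy -- dict: frozenset({a, b}) -> AMI(a, b) between two candidates
    k          -- target number of attributes to select
    """
    priority = dict(relevance)              # p_i = AMI(a_i; a_T)
    chosen = []
    while priority and len(chosen) < k:
        best = max(priority, key=priority.get)
        chosen.append(best)
        del priority[best]                  # remove a_k from the candidates
        for other in priority:
            # p_l <- p_l * (1 - AMI(a_l, a_k)): damp attributes redundant with a_k
            priority[other] *= 1.0 - redundancy[frozenset((other, best))]
    return chosen


# Hypothetical scores, for illustration only.
relevance = {"plan_mem": 0.11, "duration": 0.09, "end_time": 0.14, "plan_cpu": 0.10}
redundancy = {
    frozenset(("end_time", "plan_mem")): 0.2,
    frozenset(("end_time", "duration")): 0.2,
    frozenset(("end_time", "plan_cpu")): 0.2,
    frozenset(("plan_mem", "plan_cpu")): 0.9,
    frozenset(("plan_mem", "duration")): 0.1,
    frozenset(("plan_cpu", "duration")): 0.1,
}
selected = select_attributes(relevance, redundancy, 3)
```

In this toy input, plan_cpu starts with a higher priority than duration but is strongly redundant with plan_mem, so its priority collapses once plan_mem is chosen. That is the effect the multiplicative update is designed to have.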
(3) Discretization granularity optimization.
3.1) Construct the discretization rule set Rule' for the discretization granularity optimization stage. Each rule is r'_i = {r'_i1, r'_i2, …, r'_iq}, where r'_ij is the number of discretization bins that the i-th rule assigns to selected attribute a_j. That is, all combinations of data attributes and bin counts are traversed within the search space determined by the bin count lower bound N_L, the bin count upper bound N_H, and the discretization optimization step S_2.
3.2) Based on the selected data attribute subset A_S, discretize the data with each discretization rule r'_i ∈ Rule'. The log data discretized with rule r'_i is denoted V'_i. Compute the weighted coefficient of variation (WCV) of the discretized log data. The WCV is computed as follows. First, group the records in the log data by their values on the data attributes in A_S, giving G = {g_1, g_2, …, g_p}; within each group g_i, all records have the same value on every data attribute a_k ∈ A_S. Using formula (3), compute the coefficient of variation of the values of the target data attribute a_T in each group, denoted c_i. Using formula (4), compute WCV_i for each group, and then use formula (5) to compute the WCV of the whole log. Here σ(X) denotes the standard deviation of X, μ(X) denotes the mean of X, and size(X) denotes the number of data entries in X.
3.3) Choose the discretization result with the smallest WCV as the final discretization result. The discretized log data is denoted V^F.
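The WCV computation of step 3.2) can be sketched as below. Formulas (3) to (5) are not reproduced in this text, so the sketch assumes the most direct reading: c_i = σ/μ within each group, and the overall WCV is the average of the c_i weighted by group size. The function name `weighted_cv` and the choice to skip records whose target value is missing are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def weighted_cv(records, key_attrs, target):
    """Size-weighted coefficient of variation of `target` over groups.

    records   -- list of dicts holding discretized attribute values
    key_attrs -- names of the selected attributes (A_S) used for grouping
    target    -- name of the target attribute (a_T); None marks a missing value
    """
    groups = defaultdict(list)
    for rec in records:
        if rec[target] is None:             # assumption: missing targets are skipped
            continue
        groups[tuple(rec[a] for a in key_attrs)].append(rec[target])

    total = sum(len(values) for values in groups.values())
    wcv = 0.0
    for values in groups.values():
        mu = mean(values)
        cv = pstdev(values) / mu if mu else 0.0   # c_i = sigma / mu
        wcv += cv * len(values) / total           # assumed weight: size(g_i) / total
    return wcv


records = [
    {"a": 1, "t": 2.0},
    {"a": 1, "t": 2.0},
    {"a": 2, "t": 1.0},
    {"a": 2, "t": 3.0},
]
score = weighted_cv(records, ["a"], "t")
```

A low WCV means that records falling into the same cell have similar target values, which is why step 3.3) keeps the discretization with the smallest WCV.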
(4) Tensor construction and tensor completion.
4.1) Construct a tensor from the discretized log data V^F and the target data attribute a_T. If the number of distinct discrete values on each data attribute a_i ∈ A_S is S_i, construct an order-q tensor χ of size S_1 × S_2 × … × S_q.
4.1.1) Sort the discrete values of each data attribute a_i ∈ A_S in ascending order, and build the mapping M from each value v to its rank index d.
4.1.2) Fill the tensor with the values of the target data attribute a_T. Suppose data record e_i has the values {v^F_i1, v^F_i2, …, v^F_iq} on the selected data attributes A_S = {a_1, a_2, …, a_q} and a known value on the target data attribute a_T. The mapping M gives the rank indices {d_i1, d_i2, …, d_iq} corresponding to {v^F_i1, v^F_i2, …, v^F_iq}, and the tensor entry at index (d_i1, d_i2, …, d_iq) is set to the record's value of a_T. When u records share the same tensor index, the mean of their target attribute values is used as the tensor entry.
4.2) Complete the tensor using CP decomposition, solving the decomposition by gradient descent.
4.2.1) Initialize q factor matrices with random numbers on the interval [0, 1]. Factor matrix F_i, of size S_i × R, corresponds to data attribute a_i, where S_i is the number of discrete values of a_i and R is a hyperparameter of the algorithm. Initialize the weight matrix W according to formula (6).
4.2.2) Update the factor matrices according to formula (7), where ε = χ - [[F_1, F_2, …, F_q]], χ is the constructed sparse tensor, "[[ ]]" is the Khatri-Rao operator, (χ)_(N) denotes the mode-N matricization of the tensor χ, and the learning rate, λ_1, and λ_2 are hyperparameters of the algorithm.
4.2.3) Compute the objective function value according to formula (8).
4.2.4) Repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold θ.
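The loop of steps 4.2.1) to 4.2.4) can be sketched in plain Python as follows. Formulas (6) to (8) are not reproduced in this text, so this is a generic masked CP fit and not the patented update rule: the objective is assumed to be half the squared error over observed entries plus a single L2 penalty `lam` (standing in for the source's λ_1 and λ_2), and the weight matrix W is assumed to act as a 0/1 mask, realized here by iterating over observed entries only. The names `cp_value` and `cp_complete` and all default values are hypothetical.

```python
import random


def cp_value(factors, idx):
    """Entry of the CP model: sum over r of the product of factor entries."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for mode, i in enumerate(idx):
            prod *= factors[mode][i][r]
        total += prod
    return total


def cp_complete(observed, shape, rank, lr=0.02, lam=0.001, theta=1e-6,
                max_iter=500, seed=0):
    """Fit CP factors to the observed entries by full-batch gradient descent.

    observed -- dict: index tuple -> value (the sparse tensor)
    shape    -- tuple of dimension sizes (S_1, ..., S_q)
    Returns (factors, history of objective values).
    """
    rng = random.Random(seed)
    # Step 4.2.1): random initialization on [0, 1)
    factors = [[[rng.random() for _ in range(rank)] for _ in range(size)]
               for size in shape]
    history = []
    for _ in range(max_iter):
        # Gradient of the L2 penalty, then add the data term entry by entry
        grads = [[[lam * x for x in row] for row in F] for F in factors]
        loss = 0.5 * lam * sum(x * x for F in factors for row in F for x in row)
        for idx, value in observed.items():
            err = value - cp_value(factors, idx)
            loss += 0.5 * err * err
            for r in range(rank):
                for mode, i in enumerate(idx):
                    others = 1.0
                    for m2, i2 in enumerate(idx):
                        if m2 != mode:
                            others *= factors[m2][i2][r]
                    grads[mode][i][r] -= err * others
        # Step 4.2.4): stop once the objective changes by less than theta
        if history and abs(history[-1] - loss) < theta:
            history.append(loss)
            break
        history.append(loss)
        # Step 4.2.2): gradient step on every factor matrix
        for F, G in zip(factors, grads):
            for row, grow in zip(F, G):
                for r in range(rank):
                    row[r] -= lr * grow[r]
    return factors, history


# Toy rank-one tensor with one entry held out as "missing".
a, b, c = [0.2, 0.5, 0.9], [0.3, 0.6, 0.8], [0.4, 0.7]
observed = {(i, j, k): a[i] * b[j] * c[k]
            for i in range(3) for j in range(3) for k in range(2)
            if (i, j, k) != (2, 2, 1)}
factors, history = cp_complete(observed, (3, 3, 2), rank=2)
estimate = cp_value(factors, (2, 2, 1))   # model estimate for the held-out entry
```

Storing the sparse tensor as a dictionary of observed entries keeps the cost of each iteration proportional to the number of known values rather than to the full tensor size, which matters when most cells are empty.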
(5) Log data recovery. For each record e_i with missing data, let its values on the data attributes A_S = {a_1, a_2, …, a_q} be {v^F_i1, v^F_i2, …, v^F_iq}. The mapping M gives the corresponding rank indices {d_i1, d_i2, …, d_iq}, and the entry of the completed tensor at index (d_i1, d_i2, …, d_iq) is used to restore the missing data.
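Steps 4.1.1), 4.1.2), and (5) amount to an index mapping, a fill with averaging, and a lookup. A minimal sketch follows, assuming records are dictionaries, missing targets are marked with None, and rank indices are 0-based (the source's serial numbers appear to be 1-based). The function names and the `completed` callable are hypothetical.

```python
def build_index_maps(records, attrs):
    """Mapping M: for each attribute, distinct value -> ascending rank index."""
    return {a: {v: d for d, v in enumerate(sorted({r[a] for r in records}))}
            for a in attrs}


def build_sparse_tensor(records, attrs, target, maps):
    """Fill tensor cells with target values, averaging records that collide."""
    sums, counts = {}, {}
    for r in records:
        if r[target] is None:               # missing target: cell stays empty
            continue
        idx = tuple(maps[a][r[a]] for a in attrs)
        sums[idx] = sums.get(idx, 0.0) + r[target]
        counts[idx] = counts.get(idx, 0) + 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def restore(records, attrs, target, maps, completed):
    """Replace missing targets with values read from the completed tensor.

    completed -- callable: index tuple -> value of the completed tensor
    """
    out = []
    for r in records:
        if r[target] is None:
            idx = tuple(maps[a][r[a]] for a in attrs)
            r = dict(r, **{target: completed(idx)})
        out.append(r)
    return out


# Hypothetical discretized records, for illustration only.
records = [
    {"end_time": 10, "duration": 1, "real_mem_avg": 0.2},
    {"end_time": 10, "duration": 1, "real_mem_avg": 0.4},
    {"end_time": 20, "duration": 2, "real_mem_avg": 0.5},
    {"end_time": 20, "duration": 1, "real_mem_avg": None},
]
attrs = ["end_time", "duration"]
maps = build_index_maps(records, attrs)
tensor = build_sparse_tensor(records, attrs, "real_mem_avg", maps)
# A constant stands in for the completed tensor in this toy example.
restored = restore(records, attrs, "real_mem_avg", maps, lambda idx: 0.42)
```

The first two records collide on the same cell and are averaged, while the fourth record leaves its cell empty and is later filled from the completed tensor.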
Brief description of the drawings
Fig. 1 is the deployment diagram of the method for the present invention.
Fig. 2 is overview flow chart of the invention.
Fig. 3 is the flow chart that daily record data attribute is chosen.
Fig. 4 is the flow chart of daily record data discretization optimization.
Fig. 5 is the flow chart of tensor building and completion.
Specific embodiment
The present invention is described below with reference to the accompanying drawings and a specific embodiment.
Fig. 1 is the deployment diagram of the method. The platform consists of multiple computer servers connected to one another over a network. The platform nodes fall into two classes: a storage node and compute nodes. The method comprises two core software modules: a log storage module and a log processing module. The log storage module is responsible for storing the log data and is deployed on the storage node. The log processing module is responsible for processing the log data and is deployed on the compute nodes.
The specific implementation of the method is described below with reference to the overall flow in Fig. 2. In this embodiment the basic parameters are set as follows: discretization bin count lower bound N_L = 100, discretization bin count upper bound N_H = 500, attribute selection discretization step S_1 = 100, discretization optimization step S_2 = 25, number of rank-one tensors in the CP decomposition R = 25, the gradient descent learning rate, gradient descent objective function weights λ_1 = 0.5 and λ_2 = 0.5, and gradient descent objective function convergence threshold θ = 0.01.
The implementation is divided into the following steps:
(1) Initialization. Suppose the data center log has 49 data attributes and 10364956 records. The set of data attributes is denoted A, A = {a_1, a_2, …, a_49}. The set of data records in the log is denoted E, E = {e_1, e_2, …, e_10364956}. The log data is denoted V. The data attribute with missing data is real_mem_avg (average memory usage), denoted a_T.
(2) Data attribute selection. The flow of this step is shown in Fig. 3.
2.1) Manually choose all data attributes that may be correlated with the target missing data attribute as the candidate data attribute set A', A' = {plan_cpu, plan_mem, instance_num, duration, real_cpu_avg, end_time} (requested CPU resources, requested memory resources, number of instances, duration, average actual CPU usage, end time).
2.2) Construct the discretization rule set Rule for the attribute selection stage, Rule = {r_1, r_2, …, r_15625}. Each rule is r_i = {r_i1, r_i2, …, r_i6}, where r_ij is the number of discretization bins that the i-th rule assigns to candidate attribute a_j. That is, all combinations of data attributes and bin counts are traversed within the search space determined by the bin count lower bound 100, the bin count upper bound 500, and the attribute selection discretization step 100.
2.3) Discretize the data with each discretization rule r_i ∈ Rule. The log data discretized with rule r_i is denoted V_i. Then perform data attribute selection for each rule in turn:
2.3.1) Following step 2.3.1) of the summary of the invention, compute the AMI between every candidate data attribute a_i ∈ A' and the target data attribute a_T, denoted AMI(a_i; a_T). Then initialize the priorities of the candidate data attributes, P = {0.02, 0.11, 0.018, 0.09, 0.009, 0.14}.
2.3.2) Select the data attribute with the highest priority, end_time, add it to the selected data attribute set, and remove it from the candidate data attribute set A'. Following step 2.3.2) of the summary of the invention, update the priorities of the remaining candidate data attributes a_l ∈ A' to {0.018, 0.09, 0.015, 0.07, 0.0087}.
2.3.3) Repeat step 2.3.2) until the number of selected data attributes equals the target selection count.
2.3.4) Denote the selection result result_i and add it to the selection result set Result.
2.4) Count all selection results in the set Result, and take the data attribute selection that occurs most frequently as the final selection result A_S, A_S = {end_time, plan_mem, duration}.
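The rule set of step 2.2) is a Cartesian product of per-attribute bin counts. The sketch below enumerates it with the embodiment's values; the function name `discretization_rules` is hypothetical, and treating both bounds as inclusive is an assumption, chosen because it reproduces the 15625 rules stated for this stage (5 bin counts for each of 6 attributes).

```python
from itertools import product


def discretization_rules(attrs, n_low, n_high, step):
    """Yield every assignment of a bin count to each attribute.

    Bin counts run from n_low to n_high inclusive (assumed) in increments of step.
    """
    bin_counts = range(n_low, n_high + 1, step)
    for combo in product(bin_counts, repeat=len(attrs)):
        yield dict(zip(attrs, combo))


candidates = ["plan_cpu", "plan_mem", "instance_num",
              "duration", "real_cpu_avg", "end_time"]
rules = list(discretization_rules(candidates, 100, 500, 100))
```

Because the number of rules grows exponentially with the number of attributes, the first stage uses a coarse step on all candidates and the finer step is reserved for the smaller selected subset.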
(3) Discretization granularity optimization. The flow of this step is shown in Fig. 4.
3.1) Construct the discretization rule set Rule' for the discretization granularity optimization stage, Rule' = {r'_1, r'_2, …, r'_4096}. Each rule is r'_i = {r'_i1, r'_i2, r'_i3}, where r'_ij is the number of discretization bins that the i-th rule assigns to the j-th attribute in {end_time, plan_mem, duration}. That is, all combinations of data attributes and bin counts are traversed within the search space determined by the bin count lower bound 100, the bin count upper bound 500, and the discretization optimization step 25.
3.2) Based on the selected data attribute subset A_S, discretize the data with each discretization rule r'_i ∈ Rule'. The log data discretized with rule r'_i is denoted V'_i. Following step 3.2) of the summary of the invention, compute the weighted coefficient of variation of the discretized log data, WCV = 0.35647.
3.3) Choose the discretization result with the smallest WCV as the final discretization result. The discretized log data is denoted V^F.
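The text does not state how values are assigned to bins once a bin count has been chosen. As one plausible reading, the sketch below uses equal-width binning. The function name `discretize`, the equal-width scheme, and the handling of missing values are all assumptions made for illustration.

```python
def discretize(values, n_bins):
    """Equal-width binning (assumed scheme): map each value to a bin index.

    None entries (missing values) are passed through unchanged.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif width == 0:
            out.append(0)                   # all values identical: single bin
        else:
            # The maximum value would land in bin n_bins, so clamp it.
            out.append(min(int((v - lo) / width), n_bins - 1))
    return out


bins = discretize([0.0, 0.5, 1.0, None], 4)
```

Under this reading, a larger bin count gives a finer grid and a sparser tensor, which is the trade-off that the WCV-based search in this step explores.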
(4) Tensor construction and tensor completion. The flow of this step is shown in Fig. 5.
4.1) Construct a tensor from the discretized log data V^F and the target data attribute a_T. The numbers of distinct discrete values on the data attributes a_i ∈ A_S are 276, 87, and 61 respectively, so a 3-dimensional tensor χ of size 276 × 87 × 61 is constructed.
4.1.1) Sort the discrete values of each data attribute a_i ∈ A_S in ascending order, and build the mapping M from each value v to its rank index d.
4.1.2) Fill the tensor with the values of the target data attribute a_T. Suppose data record e_i has the values {35519, 0.016, 34} on the selected data attributes A_S = {end_time, plan_mem, duration} and the value 0.023814 on the target data attribute a_T. The mapping M gives the rank indices {1, 13, 24} corresponding to {35519, 0.016, 34}, so the tensor entry χ_{1,13,24} = 0.023814.
4.2) Complete the tensor using CP decomposition, solving the decomposition by gradient descent.
4.2.1) Initialize three factor matrices F_1, F_2, and F_3 (of sizes 276 × 25, 87 × 25, and 61 × 25, given R = 25) with random numbers on the interval [0, 1]. Initialize the weight matrix W following step 4.2.1) of the summary of the invention.
4.2.2) Update the three factor matrices following step 4.2.2) of the summary of the invention.
4.2.3) Compute the objective function value following step 4.2.3) of the summary of the invention, E = 7983.348.
4.2.4) Repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold 0.01.
(5) Log data recovery. Suppose a record e_1 with missing data has the values {45682, 0.008, 89} on the data attributes A_S = {end_time, plan_mem, duration}. The mapping M gives the rank indices {34, 5, 41} corresponding to {45682, 0.008, 89}, and the entry χ_{34,5,41} of the completed tensor is used to restore the missing data.
The inventors carried out performance tests of the data center log missing-data recovery method proposed by the present invention. The test results show that the method can accurately recover missing data in data center logs.
The performance tests use the Alibaba data center log as the test data set. The method is compared against five methods in total: the two missing-data recovery methods used in existing log analysis work (mean recovery and linear regression recovery), and three advanced data recovery methods widely used in other fields (KNN recovery, multilayer perceptron recovery, and support vector machine recovery). The comparison is intended to show the advantage of the proposed method in the accuracy of recovering missing data in data center logs. The tests ran on one computer with the following hardware: an AMD Ryzen 7 1700X CPU at 3.80 GHz, 32 GB of DDR4 RAM, and a 512 GB NVMe SSD.
The performance tests use two metrics to evaluate data recovery error: mean relative error (MRE) and root-mean-square error (RMSE), computed as shown in formula (9) and formula (10):
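Formulas (9) and (10) are not reproduced in this text. The sketch below uses the standard definitions of the two metrics, which is an assumption about what the formulas state; the function names are hypothetical, and the relative error assumes that no true value is zero.

```python
from math import sqrt


def mean_relative_error(true, pred):
    """MRE: mean of |pred - true| / |true| (true values must be nonzero)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(true, pred)) / len(true)


def root_mean_square_error(true, pred):
    """RMSE: square root of the mean squared difference."""
    return sqrt(sum((p - t) ** 2 for t, p in zip(true, pred)) / len(true))


true_values = [1.0, 2.0, 4.0]
recovered = [1.5, 2.0, 3.0]
mre = mean_relative_error(true_values, recovered)      # (0.5 + 0 + 0.25) / 3
rmse = root_mean_square_error(true_values, recovered)  # sqrt((0.25 + 0 + 1) / 3)
```

MRE is scale-free and so weights small and large true values alike, while RMSE is dominated by the largest absolute errors; reporting both gives a fuller picture of recovery quality.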
The performance tests are divided into four groups by missing-data ratio and missing-data pattern: a 30% missing rate following the missing pattern of the Alibaba log (TM30), an 85% missing rate following the missing pattern of the Alibaba log (TM85), a 30% missing rate with completely random missingness (RM30), and an 85% missing rate with completely random missingness (RM85). The results of the performance tests are shown in Table 1 and Table 2.
Table 1. Performance test results (MRE)
Table 2. Performance test results (RMSE)
The data in Table 1 and Table 2 show that, across the four groups of experiments and relative to the five comparison methods, the method of the present invention reduces MRE by 47.7% on average and RMSE by 56.6% on average, with a maximum MRE reduction of 85.9% and a maximum RMSE reduction of 92%. The two machine learning recovery methods with the lower average errors, multilayer perceptron recovery and support vector machine recovery, show errors that rise markedly as the missing-data ratio rises, whereas the error of the method of the present invention stays stable. At the 30% and 85% missing rates, its maximum MRE improvement is 32.7% and 50% respectively. The performance test results show that, relative to the five comparison methods, the method of the present invention has lower and more stable recovery error and achieves higher accuracy at different missing-data rates.
Finally, it should be noted that the above example only illustrates the present invention and does not limit the technology it describes. All technical solutions and improvements that do not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.
Claims (5)
1. A data center log missing-data recovery method, characterized in that it comprises the following steps:
(1) Initialization: suppose the log has n data attributes and m records; the set of data attributes is denoted A, A = {a_1, a_2, …, a_n}; the set of data records in the log is denoted E, E = {e_1, e_2, …, e_m}; the data in the log is denoted V, where v_ij is the value of the i-th data attribute in the j-th data record; the data attribute with missing data is denoted a_T;
(2) Data attribute selection:
2.1) choose all data attributes that are correlated with the target missing data attribute as the candidate data attribute set A', A' = {a_1, a_2, …, a_n'};
2.2) construct the discretization rule set Rule for the attribute selection stage, in which each rule is r_i = {r_i1, r_i2, …, r_in'} and r_ij is the number of discretization bins that the i-th rule assigns to candidate attribute a_j; that is, traverse all combinations of data attributes and bin counts within the search space determined by the bin count lower bound N_L, the bin count upper bound N_H, and the attribute selection discretization step S_1;
2.3) discretize the data with each discretization rule r_i ∈ Rule, the log data discretized with rule r_i being denoted V_i, and then perform data attribute selection for each rule in turn;
2.4) count all selection results in the selection result set Result, and take the data attribute selection that occurs most frequently as the final selection result A_S, A_S = {a_1, a_2, …, a_q};
(3) Discretization granularity optimization:
3.1) construct the discretization rule set Rule' for the discretization granularity optimization stage, in which each rule is r'_i = {r'_i1, r'_i2, …, r'_iq} and r'_ij is the number of discretization bins that the i-th rule assigns to selected attribute a_j; that is, traverse all combinations of data attributes and bin counts within the search space determined by the bin count lower bound N_L, the bin count upper bound N_H, and the discretization optimization step S_2;
3.2) based on the selected data attribute subset A_S, discretize the data with each discretization rule r'_i ∈ Rule', the log data discretized with rule r'_i being denoted V'_i, and compute the weighted coefficient of variation (WCV) of the discretized log data;
3.3) choose the discretization result with the smallest WCV as the final discretization result, the discretized log data being denoted V^F;
(4) Tensor construction and tensor completion:
4.1) construct a tensor from the discretized log data V^F and the target data attribute a_T; if the number of distinct discrete values on each data attribute a_i ∈ A_S is S_i, construct an order-q tensor χ of size S_1 × S_2 × … × S_q;
4.2) complete the tensor using CP decomposition, solving the decomposition by gradient descent;
(5) Log data recovery:
for each record e_i with missing data, whose values on the data attributes A_S = {a_1, a_2, …, a_q} are {v^F_i1, v^F_i2, …, v^F_iq}, obtain the corresponding rank indices {d_i1, d_i2, …, d_iq} through the mapping M, and use the entry of the completed tensor at index (d_i1, d_i2, …, d_iq) to restore the missing data.
2. The data center log missing-data recovery method according to claim 1, characterized in that step 2.3) comprises:
2.3.1) using formula (1) and formula (2), compute the AMI between every candidate data attribute a_i ∈ A' and the target data attribute a_T, denoted AMI(a_i; a_T); then initialize the priorities of the candidate data attributes, P = {p_1, p_2, …, p_n'}, where p_i = AMI(a_i; a_T);
2.3.2) select the data attribute with the highest priority (denoted a_k), add it to the selected data attribute set, and remove it from the candidate data attribute set A'; update the priority of each remaining candidate data attribute a_l ∈ A' to p_l × (1 - AMI(a_l, a_k));
2.3.3) repeat step 2.3.2) until the number of selected data attributes equals the target selection count;
2.3.4) denote the selection result result_i and add it to the selection result set Result.
3. The data center log missing-data recovery method according to claim 1, characterized in that step 4.1) comprises:
4.1.1) sort the discrete values of each data attribute a_i ∈ A_S in ascending order, and build the mapping M from each value v to its rank index d;
4.1.2) fill the tensor with the values of the target data attribute a_T: if data record e_i has the values {v^F_i1, v^F_i2, …, v^F_iq} on the selected data attributes A_S = {a_1, a_2, …, a_q} and a known value on the target data attribute a_T, obtain the corresponding rank indices {d_i1, d_i2, …, d_iq} through the mapping M and set the tensor entry at index (d_i1, d_i2, …, d_iq) to the record's value of a_T; when u records share the same tensor index, use the mean of their target attribute values as the tensor entry.
4. The data center log missing-data recovery method according to claim 1, characterized in that step 4.2) comprises:
4.2.1) initialize q factor matrices with random numbers on the interval [0, 1], where factor matrix F_i, of size S_i × R, corresponds to data attribute a_i, S_i is the number of discrete values of a_i, and R is a hyperparameter of the algorithm; initialize the weight matrix W according to formula (6);
4.2.2) update the factor matrices according to formula (7), where ε = χ - [[F_1, F_2, …, F_q]], χ is the constructed sparse tensor, "[[ ]]" is the Khatri-Rao operator, (χ)_(N) denotes the mode-N matricization of the tensor χ, and the learning rate, λ_1, and λ_2 are hyperparameters of the algorithm;
4.2.3) compute the objective function value according to formula (8);
4.2.4) repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold θ.
5. The data center log missing-data recovery method according to claim 1, characterized in that the WCV in step 3.2) is computed as follows: first group the records in the log data by their values on the data attributes in A_S, giving G = {g_1, g_2, …, g_p}, where within each group g_i all records have the same value on every data attribute a_k ∈ A_S; using formula (3), compute the coefficient of variation of the values of the target data attribute a_T in each group, denoted c_i; using formula (4), compute WCV_i for each group, and then use formula (5) to compute the WCV of the whole log;
where σ(X) denotes the standard deviation of X, μ(X) denotes the mean of X, and size(X) denotes the number of data entries in X.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910056129.7A CN109857593B (en) | 2019-01-21 | 2019-01-21 | Data center log missing data recovery method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109857593A (en) | 2019-06-07 |
CN109857593B (en) | 2020-08-28 |
Family
ID=66895519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910056129.7A Active CN109857593B (en) | 2019-01-21 | 2019-01-21 | Data center log missing data recovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109857593B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183644A (en) * | 2020-09-29 | 2021-01-05 | 中国平安人寿保险股份有限公司 | Index stability monitoring method and device, computer equipment and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156720A (en) * | 2011-03-28 | 2011-08-17 | 中国人民解放军国防科学技术大学 | Method, device and system for restoring data |
CN102289524A (en) * | 2011-09-26 | 2011-12-21 | 深圳市万兴软件有限公司 | Data recovery method and system |
US20130117237A1 (en) * | 2011-11-07 | 2013-05-09 | Sap Ag | Distributed Database Log Recovery |
CN103631676A (en) * | 2013-11-06 | 2014-03-12 | 华为技术有限公司 | Snapshot data generating method and device for read-only snapshot |
CN103838642A (en) * | 2012-11-26 | 2014-06-04 | 腾讯科技(深圳)有限公司 | Data recovery method, device and system |
CN103942252A (en) * | 2014-03-17 | 2014-07-23 | 华为技术有限公司 | Method and system for recovering data |
CN105955845A (en) * | 2016-04-26 | 2016-09-21 | 浪潮电子信息产业股份有限公司 | Data recovery method and device |
CN107220142A (en) * | 2016-03-22 | 2017-09-29 | 阿里巴巴集团控股有限公司 | Perform the method and device of data recovery operation |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183644A (en) * | 2020-09-29 | 2021-01-05 | 中国平安人寿保险股份有限公司 | Index stability monitoring method and device, computer equipment and medium |
CN112183644B (en) * | 2020-09-29 | 2024-05-03 | 中国平安人寿保险股份有限公司 | Index stability monitoring method and device, computer equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN109857593B (en) | 2020-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110199273B (en) | System and method for loading, aggregating and bulk computing in one scan in a multidimensional database environment | |
US20200142872A1 (en) | System and method for use of a dynamic flow in a multidimensional database environment | |
US10223437B2 (en) | Adaptive data repartitioning and adaptive data replication | |
US11200223B2 (en) | System and method for dependency analysis in a multidimensional database environment | |
CN108255712A (en) | The test system and test method of data system | |
US20070282470A1 (en) | Method and system for capturing and reusing intellectual capital in IT management | |
CN106547882A (en) | A kind of real-time processing method and system of big data of marketing in intelligent grid | |
CN104205039A (en) | Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines | |
US8688819B2 (en) | Query optimization in a parallel computer system with multiple networks | |
US7284011B1 (en) | System and methods for processing a multidimensional database | |
CN104468274A (en) | Cluster monitor and management method and system | |
CN112559237B (en) | Operation and maintenance system troubleshooting method and device, server and storage medium | |
CN109308309B (en) | Data service quality assessment method and terminal | |
CN112579586A (en) | Data processing method, device, equipment and storage medium | |
US9042263B1 (en) | Systems and methods for comparative load analysis in storage networks | |
CN105556474A (en) | Managing memory and storage space for a data operation | |
US20030139900A1 (en) | Methods and apparatus for statistical analysis | |
CN111708895B (en) | Knowledge graph system construction method and device | |
US20060200484A1 (en) | Unified reporting | |
KR101973328B1 (en) | Correlation analysis and visualization method of Hadoop based machine tool environmental data | |
Creţu-Ciocârlie et al. | Hunting for problems with Artemis | |
CN109857593A (en) | A kind of data center's log missing data restoration methods | |
CN113506098A (en) | Power plant metadata management system and method based on multi-source data | |
CN109359205A (en) | A kind of remote sensing image cutting method and equipment based on geographical grid | |
CN113504996A (en) | Load balance detection method, device, equipment and storage medium |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |