CN109857593A - Method for recovering missing data in data center logs - Google Patents

Method for recovering missing data in data center logs

Info

Publication number
CN109857593A
CN109857593A (application CN201910056129.7A)
Authority
CN
China
Prior art keywords
data
discretization
attribute
tensor
data attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910056129.7A
Other languages
Chinese (zh)
Other versions
CN109857593B (en)
Inventor
梁毅
毕临风
苏醒
苏超
陈金栋
丁治明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201910056129.7A
Publication of CN109857593A
Application granted
Publication of CN109857593B
Legal status: Active
Anticipated expiration

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for recovering missing data in data center logs. Correlation analysis is first used to mine the correlations among the different data attributes of the log and to select an optimal set of data attributes, and the data are discretized with a two-stage discretization step-size optimization algorithm; the selected optimal data attribute set is then used as the modes of a tensor to construct a sparse tensor; finally, a tensor completion method based on tensor decomposition is applied to the sparse tensor to obtain a dense tensor. Combining the dense tensor with the original incomplete log data yields a complete log data set.

Description

Method for recovering missing data in data center logs
Technical field
The invention belongs to the field of data center log analysis, and relates in particular to a method for recovering missing data in data center logs.
Background technique
Large-scale data centers are the basic IT infrastructure of the internet and related industries, providing the software and hardware resources, such as computation, storage, and networking, on which internet services run. Modern data centers commonly use virtualization, containerization, and server consolidation technologies. In this context, multiple computing frameworks and a variety of heterogeneous workloads typically coexist within a data center. During operation, a data center generates massive amounts of log data containing runtime information about its servers and workloads.
Data center log analysis is one of the important means of data center performance optimization. Through log analysis, data center administrators can obtain important information such as load characteristics and resource usage patterns, which in turn guides task scheduling, resource management, and programming-model optimization. However, as data centers continue to grow in scale, their logs face an increasingly serious missing-data problem: part of the log data is empty or invalid and cannot be used directly as input to log analysis. There are two main causes: (1) in the collection stage of the log data, bugs in the monitoring system may cause data loss; in addition, monitoring processes are usually assigned a low priority and may be deprived of resources when the cluster load is high, which also leads to missing data; (2) in the processing stage of the log data, some data are anonymized or normalized for reasons such as confidentiality, which can directly cause data loss, and bugs in this process can cause additional, unexpected loss. At present, the log analysis field handles missing data mainly by simply removing incomplete records or by recovering missing values with statistical imputation methods based on means or regression. Existing methods have the following problems:
(1) They cannot cope with large-scale data loss. As data centers grow, the proportion of missing entries in log data tends to rise. When a large fraction of the data is missing, simple removal greatly reduces the total amount of information in the log, while statistical imputation based on means or regression has low recovery accuracy. Neither approach can cope with large-scale data loss, which in turn degrades the accuracy of log analysis.
(2) They cannot cope with the complex correlations among the different data attributes of a data center log. A data center log usually has ten to dozens of data attributes, among which there are different linear or nonlinear correlations; analyzing these correlations can improve the accuracy of data recovery. Existing methods do not consider the correlations among attributes when recovering missing log data, which results in low recovery accuracy. Moreover, the input data attributes of the recovery algorithm must be specified manually, which non-expert users find difficult to do correctly without performing a correlation analysis of the log data.
Summary of the invention
In view of the above problems, the invention proposes a tensor-based method for recovering missing data in data center logs. First, correlation analysis is used to mine the correlations among the different data attributes of the log and to select an optimal set of data attributes, and the data are discretized with a two-stage discretization step-size optimization algorithm. The selected optimal data attribute set is then used as the modes of a tensor to construct a sparse tensor. Finally, a tensor completion method based on tensor decomposition is applied to the sparse tensor to obtain a dense tensor. Combining the dense tensor with the original incomplete log data yields a complete log data set.
In the present invention, the CANDECOMP/PARAFAC (CP) decomposition is used to complete the sparse tensor. CP decomposition is a widely used tensor completion method: it decomposes a sparse tensor into several rank-one tensors, thereby mining the underlying patterns of the tensor data and using them to fill in the sparse tensor. Owing to the characteristics of data center log data, the constructed sparse tensor has low rank, so CP decomposition is well suited to its completion.
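For clarity, the rank-R CP model referred to here can be written in standard notation (a general textbook formulation rather than an equation reproduced from the patent):

```latex
\mathcal{X} \;\approx\; [[F_1, F_2, \dots, F_q]]
\;=\; \sum_{r=1}^{R} f^{(1)}_{r} \circ f^{(2)}_{r} \circ \cdots \circ f^{(q)}_{r},
```

where F_i is the factor matrix of the i-th tensor mode (of size S_i × R), f^(i)_r is its r-th column, and ∘ denotes the vector outer product.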
The data center log missing-data recovery method of the present invention is broadly divided into five steps: initialization, data attribute selection, data attribute discretization optimization, tensor construction and completion, and log missing-data completion. The method has the following basic parameters: the discretization bin-count lower bound N_L, the discretization bin-count upper bound N_H, the attribute-selection discretization step size S_1, the discretization-optimization step size S_2, the number R of rank-one tensors in the CP decomposition, the gradient-descent learning rate, the gradient-descent objective-function weights λ_1 and λ_2, and the gradient-descent objective-function convergence threshold θ. Typical values are: N_L between 50 and 150, N_H between 400 and 500, S_1 between 100 and 200, S_2 between 25 and 50, R between 5 and 30, a learning rate of 0.00001, λ_1 and λ_2 between 0 and 1, and θ = 0.01.
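For reference, these parameters can be gathered into a single configuration; the sketch below simply records the values used in the embodiment described later, and the key names are introduced here for illustration.

```python
# Hyperparameters of the recovery method (values from the embodiment; ranges from the text).
PARAMS = {
    "N_L": 100,       # discretization bin-count lower bound (typical range 50-150)
    "N_H": 500,       # discretization bin-count upper bound (typical range 400-500)
    "S_1": 100,       # attribute-selection discretization step size (typical range 100-200)
    "S_2": 25,        # discretization-optimization step size (typical range 25-50)
    "R": 25,          # number of rank-one tensors in the CP decomposition (typical range 5-30)
    "lr": 1e-5,       # gradient-descent learning rate (generally 0.00001)
    "lambda_1": 0.5,  # gradient-descent objective-function weight (typical range 0-1)
    "lambda_2": 0.5,  # gradient-descent objective-function weight (typical range 0-1)
    "theta": 0.01,    # gradient-descent objective-function convergence threshold
}
```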
The above method is realized according to the following steps:
(1) Initialization. Suppose the log has n data attributes and m records. The set of data attributes is denoted A = {a_1, a_2, …, a_n}, the set of data records is denoted E = {e_1, e_2, …, e_m}, and the log data are denoted V = {v_ij}, where v_ij is the value of the i-th data attribute in the j-th record. The data attribute containing missing data is denoted a_T.
(2) Data attribute selection.
2.1) Manually select all data attributes that may be correlated with the target missing-data attribute as the candidate data attribute set A' = {a_1, a_2, …, a_n'}.
2.2) Construct the discretization rule set Rule for the attribute-selection stage. Each rule is r_i = {r_i1, r_i2, …, r_in'}, where r_ij is the number of discretization bins that the i-th rule assigns to candidate attribute a_j. In other words, traverse all combinations of data attributes and bin counts within the search space determined by the bin-count lower bound N_L, the bin-count upper bound N_H, and the attribute-selection discretization step size S_1.
2.3) For each discretization rule r_i ∈ Rule, discretize the data; the log data discretized with rule r_i are denoted V_i. Then select data attributes one by one.
2.3.1) Using formula (1) and formula (2), compute the AMI between each candidate data attribute a_i ∈ A' and the target data attribute a_T, denoted AMI(a_i; a_T). Then initialize the priorities of the candidate attributes as P = {p_1, p_2, …, p_n'}, where p_i = AMI(a_i; a_T).
2.3.2) Select the data attribute with the highest priority (denoted a_k), add it to the selected attribute set, and remove it from the candidate set A'. Update the priority of each remaining candidate attribute a_l ∈ A' to p_l × (1 − AMI(a_l, a_k)).
2.3.3) Repeat step 2.3.2) until the number of selected attributes equals the target selection count.
2.3.4) Denote the selection result as result_i and add it to the selection result set Result.
2.4) Tally all selection results in the set Result and take the most frequently occurring attribute set as the final attribute selection result A_S = {a_1, a_2, …, a_q}.
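As an illustration of step (2), the following is a minimal sketch of the greedy, AMI-weighted selection of steps 2.3.1)-2.3.4). The function and variable names are introduced here for illustration only; sklearn's adjusted_mutual_info_score is assumed as a stand-in for the AMI of formulas (1) and (2), which are not reproduced in the text, and the input columns are assumed to be already discretized under one rule r_i.

```python
# Sketch of the greedy, AMI-weighted attribute selection of steps 2.3.1)-2.3.4).
# Assumptions: "AMI" is read as adjusted mutual information, and sklearn's
# adjusted_mutual_info_score stands in for formulas (1)-(2), which are not
# reproduced in the text; df holds the log already discretized under one rule r_i.
import pandas as pd
from sklearn.metrics import adjusted_mutual_info_score


def select_attributes(df: pd.DataFrame, target: str, candidates: list[str],
                      n_select: int) -> list[str]:
    """Greedily pick n_select attributes, penalizing redundancy with earlier picks."""
    # Step 2.3.1: initial priority = AMI between each candidate and the target attribute.
    priority = {a: adjusted_mutual_info_score(df[a], df[target]) for a in candidates}
    selected: list[str] = []
    remaining = set(candidates)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=priority.get)           # step 2.3.2: highest priority
        selected.append(best)
        remaining.remove(best)
        for a in remaining:                               # down-weight attributes redundant with `best`
            priority[a] *= 1.0 - adjusted_mutual_info_score(df[a], df[best])
    return selected                                       # one "result_i"; step 2.4 votes across rules
```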
(3) Discretization granularity optimization.
3.1) Construct the discretization rule set Rule' for the granularity-optimization stage. Each rule is r'_i = {r'_i1, r'_i2, …, r'_iq}, where r'_ij is the number of discretization bins that the i-th rule assigns to selected attribute a_j. In other words, traverse all combinations of data attributes and bin counts within the search space determined by the bin-count lower bound N_L, the bin-count upper bound N_H, and the discretization-optimization step size S_2.
3.2) Based on the selected attribute subset A_S, discretize the data with each rule r'_i ∈ Rule'; the log data discretized with rule r'_i are denoted V'_i. Compute the weighted coefficient of variation (WCV) of the discretized log data as follows. First, group the records of the log by their values on the attributes in A_S; the groups are denoted G = {g_1, g_2, …, g_p}, where all records in a group have equal values on every attribute a_k ∈ A_S. Using formula (3), compute the coefficient of variation of the target attribute a_T within each group, denoted c_i. Using formula (4), compute WCV_i for each group, and then compute the WCV of the entire log using formula (5).
Here σ(X) denotes the standard deviation of X, μ(X) denotes the mean of X, and size(X) denotes the number of entries in X.
3.3) Choose the discretization result with the smallest WCV as the final discretization result; the discretized log data are denoted V_F.
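A minimal sketch of the WCV computation of step 3.2) follows. Formulas (3)-(5) are not reproduced in the text, so a standard size-weighted coefficient of variation is assumed here; the function name and signature are illustrative.

```python
# Sketch of the weighted coefficient of variation (WCV) of step 3.2).
# Assumption: formulas (3)-(5) are not reproduced in the text, so a size-weighted
# coefficient of variation is used: CV(g) = std/mean of the target attribute within
# group g, and WCV = sum over groups of (|g| / N) * CV(g).
import pandas as pd


def weighted_cv(df: pd.DataFrame, selected: list[str], target: str) -> float:
    total = len(df)
    wcv = 0.0
    for _, group in df.groupby(selected):        # records sharing the same discretized values
        x = group[target].to_numpy(dtype=float)
        mean = x.mean()
        if len(x) < 2 or mean == 0:
            continue                              # skip degenerate groups
        cv = x.std(ddof=0) / mean                 # coefficient of variation within the group
        wcv += (len(x) / total) * cv              # weight by group size
    return wcv
```

Step 3.3) then amounts to picking the discretization rule whose resulting WCV is smallest.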
(4) Tensor construction and tensor completion.
4.1) Construct a tensor from the discretized log data V_F and the target data attribute a_T. If the number of distinct discrete values of each attribute a_i ∈ A_S is S_i, construct a q-dimensional tensor χ.
4.1.1) Sort the discrete values of each attribute a_i ∈ A_S in ascending order and build a mapping M from value v to rank index d.
4.1.2) Fill the tensor with the values of the target data attribute a_T. If the values of data record e_i on the selected attributes A_S = {a_1, a_2, …, a_q} are {v^F_i1, v^F_i2, …, v^F_iq}, and its value on the target attribute a_T is v^F_iT, then the mapping M converts {v^F_i1, v^F_i2, …, v^F_iq} into the corresponding rank indices {d_i1, d_i2, …, d_iq}, and the tensor entry χ_{d_i1, d_i2, …, d_iq} is set to v^F_iT. When u records share the same tensor index, the mean of their target-attribute values is used as the tensor entry.
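The following is a minimal sketch of the sparse-tensor construction of step 4.1), under the assumption that the input DataFrame already contains the discretized selected attributes and the raw target attribute; the names build_sparse_tensor, maps, and mask are introduced here for illustration.

```python
# Sketch of the sparse-tensor construction of step 4.1).
# The DataFrame is assumed to hold the discretized selected attributes and the raw
# target attribute; the names build_sparse_tensor, maps and mask are illustrative.
import numpy as np
import pandas as pd


def build_sparse_tensor(df: pd.DataFrame, selected: list[str], target: str):
    # Step 4.1.1: map each attribute's discrete values, in ascending order, to rank indices.
    maps = {a: {v: d for d, v in enumerate(sorted(df[a].unique()))} for a in selected}
    shape = tuple(len(maps[a]) for a in selected)

    sums = np.zeros(shape)
    counts = np.zeros(shape)
    for _, row in df.iterrows():
        idx = tuple(maps[a][row[a]] for a in selected)     # tensor subscript of this record
        sums[idx] += row[target]
        counts[idx] += 1

    # Step 4.1.2: cell value = mean target value of the records mapped to that cell.
    tensor = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    mask = (counts > 0).astype(float)                      # 1 = observed entry, 0 = missing
    return tensor, mask, maps
```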
4.2) Complete the tensor using the CP decomposition; the decomposition is solved by gradient descent.
4.2.1) Initialize q factor matrices with random numbers in the interval [0, 1]. The factor matrix F_i corresponds to data attribute a_i, where S_i is the number of discrete values of a_i and R is a hyperparameter of the algorithm. Initialize the weight matrix W according to formula (6).
4.2.2) Update the factor matrices according to formula (7), where ε = χ − [[F_1, F_2, …, F_q]], χ is the constructed sparse tensor, [[F_1, F_2, …, F_q]] denotes the tensor reconstructed from the factor matrices via Khatri-Rao products, (χ)_(N) denotes the mode-N matricization of the tensor χ, and λ_1 and λ_2 are hyperparameters of the algorithm.
4.2.3) Compute the objective function value according to formula (8).
4.2.4) Repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold θ.
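A minimal sketch of the gradient-descent CP completion of step 4.2) is given below for the three-mode case used in the embodiment. Formulas (6)-(8) are not reproduced in the text, so a generic masked least-squares objective with an L2 penalty (lam1) on the factor matrices is assumed, the weight matrix W is replaced by the observation mask, and the role of λ_2 is omitted; all names are illustrative.

```python
# Sketch of the gradient-descent CP completion of step 4.2) for a 3-way tensor, as in
# the embodiment (q = 3). Formulas (6)-(8) are not reproduced in the text, so a generic
# masked least-squares objective with an L2 penalty (lam1) on the factors is assumed,
# the weight matrix W is replaced by the observation mask, and lam2 is omitted.
import numpy as np


def cp_complete(tensor, mask, rank=25, lr=1e-5, lam1=0.5, theta=0.01, max_iter=10_000):
    rng = np.random.default_rng(0)
    A, B, C = [rng.uniform(0.0, 1.0, (n, rank)) for n in tensor.shape]   # step 4.2.1

    def reconstruct():
        return np.einsum('ir,jr,kr->ijk', A, B, C)                        # [[A, B, C]]

    prev_obj = np.inf
    for _ in range(max_iter):
        err = mask * (tensor - reconstruct())                             # residual on observed cells
        obj = 0.5 * np.sum(err ** 2) + 0.5 * lam1 * sum(np.sum(F ** 2) for F in (A, B, C))
        if abs(prev_obj - obj) < theta:                                   # step 4.2.4: convergence
            break
        prev_obj = obj
        gA = -np.einsum('ijk,jr,kr->ir', err, B, C) + lam1 * A            # gradients of the objective
        gB = -np.einsum('ijk,ir,kr->jr', err, A, C) + lam1 * B
        gC = -np.einsum('ijk,ir,jr->kr', err, A, B) + lam1 * C
        A, B, C = A - lr * gA, B - lr * gB, C - lr * gC                   # step 4.2.2: update
    return reconstruct()                                                   # dense (completed) tensor
```

With the settings of the embodiment below, this would be called roughly as cp_complete(tensor, mask, rank=25, lr=1e-5, lam1=0.5, theta=0.01).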
(5) Log data recovery. For each record e_i with missing data, let its values on the attributes A_S = {a_1, a_2, …, a_q} be {v^F_i1, v^F_i2, …, v^F_iq}. The mapping M converts them into the rank indices {d_i1, d_i2, …, d_iq}, and the completed tensor entry χ_{d_i1, d_i2, …, d_iq} is used to recover the missing value.
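Continuing the sketches above, the recovery lookup of step (5) reduces to indexing the completed tensor with the indices produced by the mapping M; the helper below is illustrative and reuses the maps dictionary and completed tensor from the earlier sketches.

```python
# Sketch of the recovery lookup of step (5): map a record's discretized attribute values
# to tensor indices via M and read the completed entry (names reuse the sketches above).
def recover_value(record, selected, maps, completed_tensor):
    idx = tuple(maps[a][record[a]] for a in selected)    # rank indices {d_i1, ..., d_iq}
    return completed_tensor[idx]                          # imputed value of the target attribute
```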
Description of the drawings
Fig. 1 is the deployment diagram of the method of the present invention.
Fig. 2 is the overall flowchart of the invention.
Fig. 3 is the flowchart of log data attribute selection.
Fig. 4 is the flowchart of log data discretization optimization.
Fig. 5 is the flowchart of tensor construction and completion.
Specific embodiment
The present invention is described below with reference to the accompanying drawings and a specific embodiment.
Fig. 1 is the deployment diagram of the method of the present invention. The system consists of multiple computer servers connected by a network. The platform nodes are divided into two classes: storage nodes and compute nodes. The method comprises two classes of core software modules: a log storage module, which is responsible for storing log data and is deployed on the storage nodes, and a log processing module, which is responsible for processing log data and is deployed on the compute nodes.
The specific implementation of the method is described below following the overall flow of Fig. 2. In this embodiment the basic parameters are set as follows: discretization bin-count lower bound N_L = 100, discretization bin-count upper bound N_H = 500, attribute-selection discretization step size S_1 = 100, discretization-optimization step size S_2 = 25, number of rank-one tensors in the CP decomposition R = 25, the gradient-descent learning rate, gradient-descent objective-function weights λ_1 = 0.5 and λ_2 = 0.5, and gradient-descent objective-function convergence threshold θ = 0.01.
The implementation is divided into the following steps:
(1) it initializes.It enables and shares 49 data attributes in data center's log, 10364956 records.Then data attribute Set can be expressed as A, A={ a1,a2,…a49}.Data record set in log can be expressed as E, E={ e1,e2,… e10364956}.Daily record data can be expressed as V,Data attribute with missing data is Real_mem_avg (average memory usage amount), is denoted as aT
(2) data attribute is chosen, and the flow chart of steps is as shown in Figure 3.
2.1) all data attributes that there may be correlativity with target missing data attribute are manually chosen as candidate number According to attribute set A ', A '=plan_cpu, plan_mem, instance_num, duration,
, real_cpu_avg, end_time } and (application cpu resource, application memory source, example quantity, duration, reality Border cpu resource usage amount average value, end time).
2.2) discretization rule set Rule, the Rule={ r in data decimation stage are constructed1,r2,…r15625, wherein each Rule is ri={ ri1,ri2,…ri6, rijIndicate the i-th rule to candidate attribute ajDiscretization case number.I.e. by discrete Change branch mailbox number lower bound 100, the discretization branch mailbox number upper bound 500, attribute is chosen in the search space that discretization step-length 100 determines, time Go through the combination of all data attributes Yu discretization branch mailbox number.
2.3) each discretization rule r is usedi∈ Rule uses discretization rule r to Data DiscretizationiAfter discretization Daily record data can be expressed as Vi,Then data attribute selection is carried out one by one.
2.3.1) according to the method in summary of the invention 2.3.1), all candidate data attribute a are calculatedi∈ A ' and target data Attribute aTAMI, be denoted as AMI (ai;aT).Then the priority P of initialization candidate data attribute, P=0.02,0.11, 0.018,0.09,0.009,0.14}。
2.3.2 the data attribute end_time of a highest priority) is selected to be added to selection data attribute set, and will It is from the middle removal of candidate data attribute set A '.According to the method in summary of the invention 2.3.2) by remaining candidate data attribute al∈ The priority update of A ' is { 0.018,0.09,0.015,0.07,0.0087 }.
2.3.3 step 2.3.2) is repeated) it is equal to Object selection quantity until choosing the quantity of data attribute.
2.3.4 result) will be chosen and be denoted as resultiAnd it is added to and chooses in results set Result.
2.4) all selection results chosen in results set Result are counted, by the highest data of the frequency of occurrences Attribute chooses set as final data attribute and chooses result AS,AS={ end_time, plan_mem, duration }.
(3) Discretization granularity optimization; the flowchart of this step is shown in Fig. 4.
3.1) Construct the discretization rule set Rule' = {r'_1, r'_2, …, r'_4096} for the granularity-optimization stage, where each rule is r'_i = {r'_i1, r'_i2, r'_i3} and r'_ij is the number of discretization bins that the i-th rule assigns to the j-th attribute in {end_time, plan_mem, duration}. In other words, traverse all combinations of data attributes and bin counts within the search space determined by the bin-count lower bound 100, the bin-count upper bound 500, and the discretization-optimization step size 25.
3.2) Based on the selected attribute subset A_S, discretize the data with each rule r'_i ∈ Rule'; the log data discretized with rule r'_i are denoted V'_i. Following step 3.2) of the summary, compute the weighted coefficient of variation of the discretized log data, obtaining WCV = 0.35647.
3.3) Choose the discretization result with the smallest WCV as the final discretization result; the discretized log data are denoted V_F.
(4) Tensor construction and tensor completion; the flowchart of this step is shown in Fig. 5.
4.1) Construct a tensor from the discretized log data V_F and the target data attribute a_T. The numbers of distinct discrete values of the attributes a_i ∈ A_S are 276, 87, and 61, respectively, so a 3-dimensional tensor χ is constructed.
4.1.1) Sort the discrete values of each attribute a_i ∈ A_S in ascending order and build a mapping M from value v to rank index d.
4.1.2) Fill the tensor with the values of the target data attribute a_T. For example, data record e_i has values {35519, 0.016, 34} on the selected attributes A_S = {end_time, plan_mem, duration} and value 0.023814 on the target attribute a_T; the mapping M converts {35519, 0.016, 34} into the rank indices {1, 13, 24}, so the tensor entry χ_{1,13,24} = 0.023814.
4.2) Complete the tensor using the CP decomposition; the decomposition is solved by gradient descent.
4.2.1) Initialize three factor matrices (of sizes 276 × 25, 87 × 25, and 61 × 25, corresponding to the three selected attributes) with random numbers in the interval [0, 1], and initialize the weight matrix W following step 4.2.1) of the summary.
4.2.2) Update the three factor matrices following step 4.2.2) of the summary.
4.2.3) Compute the objective function value following step 4.2.3) of the summary, obtaining E = 7983.348.
4.2.4) Repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold 0.01.
(5) Log data recovery. Record e_1 with missing data has values {45682, 0.008, 89} on the attributes A_S = {end_time, plan_mem, duration}; the mapping M converts {45682, 0.008, 89} into the rank indices {34, 5, 41}, so the completed tensor entry χ_{34,5,41} is used to recover the missing value.
Based on the data center log missing-data recovery method proposed by the present invention, the inventors carried out performance tests. The test results show that the method can accurately recover missing data in data center logs.
The performance tests use an Alibaba data center log as the test data set and compare the proposed method with five baselines: the missing-data recovery methods used in existing log analysis work, namely mean imputation and linear regression imputation, and advanced data recovery methods widely used in other fields, namely KNN recovery, multilayer perceptron recovery, and support vector machine recovery, in order to demonstrate the accuracy advantage of the proposed method in recovering missing data from data center logs. The performance tests run on one computer whose hardware configuration includes an AMD Ryzen 7 1700X @ 3.80 GHz CPU, 32 GB DDR4 RAM, and a 512 GB NVMe SSD.
The performance tests use two metrics to evaluate data recovery error: mean relative error (MRE) and root-mean-square error (RMSE), whose calculation formulas are given in formulas (9) and (10).
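Formulas (9) and (10) are not reproduced in the text; the sketch below assumes the standard definitions of the two metrics, and the function names are illustrative.

```python
# Standard definitions assumed for the two error metrics; formulas (9)-(10) are not
# reproduced in the text, and the function names are illustrative.
import numpy as np


def mre(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean relative error."""
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```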
The performance tests are divided into four groups according to the missing ratio and missing pattern of the log data: a 30% missing rate following the Alibaba log missing pattern (TM30), an 85% missing rate following the Alibaba log missing pattern (TM85), a 30% completely random missing rate (RM30), and an 85% completely random missing rate (RM85). The test results are shown in Table 1 and Table 2.
Table 1. Performance test results (MRE)
Table 2. Performance test results (RMSE)
From the data in Tables 1 and 2 it can be concluded that, across the four groups of experiments and relative to the five baseline methods, the method of the present invention reduces MRE by 47.7% and RMSE by 56.6% on average, with maximum reductions of 85.9% in MRE and 92% in RMSE. The errors of the two machine-learning recovery methods with the lowest mean error, multilayer perceptron recovery and support vector machine recovery, increase significantly as the missing-data ratio rises, whereas the error of the method of the present invention remains stable; under the 30% and 85% missing rates its maximum MRE improvement is 32.7% and 50%, respectively. The performance test results show that, compared with the five baseline methods, the missing-data recovery error of the method of the present invention is lower and more stable, and higher accuracy is obtained under different missing-data rates.
Finally, it should be noted that the above example is intended only to illustrate, not to limit, the technology described in the invention, and all technical solutions and improvements that do not depart from the spirit and scope of the invention shall be covered by the claims of the invention.

Claims (5)

1. A data center log missing-data recovery method, characterized by comprising the following steps:
(1) Initialization: suppose the log has n data attributes and m records; the set of data attributes is denoted A = {a_1, a_2, …, a_n}, the set of data records is denoted E = {e_1, e_2, …, e_m}, and the log data are denoted V = {v_ij}, where v_ij is the value of the i-th data attribute in the j-th record; the data attribute containing missing data is denoted a_T;
(2) Data attribute selection:
2.1) select all data attributes that are correlated with the target missing-data attribute as the candidate data attribute set A' = {a_1, a_2, …, a_n'};
2.2) construct the discretization rule set Rule for the attribute-selection stage, where each rule is r_i = {r_i1, r_i2, …, r_in'} and r_ij is the number of discretization bins that the i-th rule assigns to candidate attribute a_j; that is, traverse all combinations of data attributes and bin counts within the search space determined by the bin-count lower bound N_L, the bin-count upper bound N_H, and the attribute-selection discretization step size S_1;
2.3) for each discretization rule r_i ∈ Rule, discretize the data, the log data discretized with rule r_i being denoted V_i; then select data attributes one by one;
2.4) tally all selection results in the selection result set Result and take the most frequently occurring attribute set as the final attribute selection result A_S = {a_1, a_2, …, a_q};
(3) Discretization granularity optimization:
3.1) construct the discretization rule set Rule' for the granularity-optimization stage, where each rule is r'_i = {r'_i1, r'_i2, …, r'_iq} and r'_ij is the number of discretization bins that the i-th rule assigns to selected attribute a_j; that is, traverse all combinations of data attributes and bin counts within the search space determined by the bin-count lower bound N_L, the bin-count upper bound N_H, and the discretization-optimization step size S_2;
3.2) based on the selected attribute subset A_S, discretize the data with each rule r'_i ∈ Rule', the log data discretized with rule r'_i being denoted V'_i; compute the weighted coefficient of variation (WCV) of the discretized log data;
3.3) choose the discretization result with the smallest WCV as the final discretization result, the discretized log data being denoted V_F;
(4) Tensor construction and tensor completion:
4.1) construct a tensor from the discretized log data V_F and the target data attribute a_T; if the number of distinct discrete values of each attribute a_i ∈ A_S is S_i, construct a q-dimensional tensor χ;
4.2) complete the tensor using the CP decomposition, the decomposition being solved by gradient descent;
(5) Log data recovery:
for each record e_i with missing data, let its values on the attributes A_S = {a_1, a_2, …, a_q} be {v^F_i1, v^F_i2, …, v^F_iq}; the mapping M converts them into the rank indices {d_i1, d_i2, …, d_iq}, and the completed tensor entry χ_{d_i1, d_i2, …, d_iq} is used to recover the missing value.
2. The data center log missing-data recovery method according to claim 1, characterized in that step 2.3) comprises:
2.3.1) using formula (1) and formula (2), compute the AMI between each candidate data attribute a_i ∈ A' and the target data attribute a_T, denoted AMI(a_i; a_T); then initialize the priorities of the candidate attributes as P = {p_1, p_2, …, p_n'}, where p_i = AMI(a_i; a_T);
2.3.2) select the data attribute with the highest priority (denoted a_k), add it to the selected attribute set, and remove it from the candidate set A'; update the priority of each remaining candidate attribute a_l ∈ A' to p_l × (1 − AMI(a_l, a_k));
2.3.3) repeat step 2.3.2) until the number of selected attributes equals the target selection count;
2.3.4) denote the selection result as result_i and add it to the selection result set Result.
3. The data center log missing-data recovery method according to claim 1, characterized in that step 4.1) comprises:
4.1.1) sort the discrete values of each attribute a_i ∈ A_S in ascending order and build a mapping M from value v to rank index d;
4.1.2) fill the tensor with the values of the target data attribute a_T: if the values of data record e_i on the selected attributes A_S = {a_1, a_2, …, a_q} are {v^F_i1, v^F_i2, …, v^F_iq}, and its value on the target attribute a_T is v^F_iT, the mapping M converts {v^F_i1, v^F_i2, …, v^F_iq} into the corresponding rank indices {d_i1, d_i2, …, d_iq}, and the tensor entry χ_{d_i1, d_i2, …, d_iq} is set to v^F_iT; when u records share the same tensor index, the mean of their target-attribute values is used as the tensor entry.
4. The data center log missing-data recovery method according to claim 1, characterized in that step 4.2) comprises:
4.2.1) initialize q factor matrices with random numbers in the interval [0, 1], the factor matrix F_i corresponding to data attribute a_i, where S_i is the number of discrete values of a_i and R is a hyperparameter of the algorithm; initialize the weight matrix W according to formula (6);
4.2.2) update the factor matrices according to formula (7), where ε = χ − [[F_1, F_2, …, F_q]], χ is the constructed sparse tensor, [[F_1, F_2, …, F_q]] denotes the tensor reconstructed from the factor matrices via Khatri-Rao products, (χ)_(N) denotes the mode-N matricization of the tensor χ, and λ_1 and λ_2 are hyperparameters of the algorithm;
4.2.3) compute the objective function value according to formula (8);
4.2.4) repeat steps 4.2.2) and 4.2.3) until the change in the objective function value between two successive iterations is less than the threshold θ.
5. The data center log missing-data recovery method according to claim 1, characterized in that the WCV in step 3.2) is calculated as follows: first, group the records of the log by their values on the attributes in the selected subset A_S, the groups being denoted G = {g_1, g_2, …, g_p}, where all records in a group have equal values v'_jk on every attribute a_k ∈ A_S; using formula (3), compute the coefficient of variation of the target attribute a_T within each group, denoted c_i; using formula (4), compute WCV_i for each group; then compute the WCV of the entire log using formula (5);
here σ(X) denotes the standard deviation of X, μ(X) denotes the mean of X, and size(X) denotes the number of entries in X.
CN201910056129.7A 2019-01-21 2019-01-21 Data center log missing data recovery method Active CN109857593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910056129.7A CN109857593B (en) 2019-01-21 2019-01-21 Data center log missing data recovery method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910056129.7A CN109857593B (en) 2019-01-21 2019-01-21 Data center log missing data recovery method

Publications (2)

Publication Number Publication Date
CN109857593A true CN109857593A (en) 2019-06-07
CN109857593B CN109857593B (en) 2020-08-28

Family

ID=66895519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910056129.7A Active CN109857593B (en) 2019-01-21 2019-01-21 Data center log missing data recovery method

Country Status (1)

Country Link
CN (1) CN109857593B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156720A (en) * 2011-03-28 2011-08-17 中国人民解放军国防科学技术大学 Method, device and system for restoring data
CN102289524A (en) * 2011-09-26 2011-12-21 深圳市万兴软件有限公司 Data recovery method and system
US20130117237A1 (en) * 2011-11-07 2013-05-09 Sap Ag Distributed Database Log Recovery
CN103838642A (en) * 2012-11-26 2014-06-04 腾讯科技(深圳)有限公司 Data recovery method, device and system
CN103631676A (en) * 2013-11-06 2014-03-12 华为技术有限公司 Snapshot data generating method and device for read-only snapshot
CN103942252A (en) * 2014-03-17 2014-07-23 华为技术有限公司 Method and system for recovering data
CN107220142A (en) * 2016-03-22 2017-09-29 阿里巴巴集团控股有限公司 Perform the method and device of data recovery operation
CN105955845A (en) * 2016-04-26 2016-09-21 浪潮电子信息产业股份有限公司 Data recovery method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183644A (en) * 2020-09-29 2021-01-05 中国平安人寿保险股份有限公司 Index stability monitoring method and device, computer equipment and medium
CN112183644B (en) * 2020-09-29 2024-05-03 中国平安人寿保险股份有限公司 Index stability monitoring method and device, computer equipment and medium

Also Published As

Publication number Publication date
CN109857593B (en) 2020-08-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant