CN104216916B - Data restoration method and device - Google Patents

Publication number
CN104216916B
Authority
CN
China
Prior art keywords
attribute
missing
relational matrix
difference relationship
filled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310219030.7A
Other languages
Chinese (zh)
Other versions
CN104216916A (en
Inventor
金成美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201310219030.7A priority Critical patent/CN104216916B/en
Publication of CN104216916A publication Critical patent/CN104216916A/en
Application granted granted Critical
Publication of CN104216916B publication Critical patent/CN104216916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity

Abstract

An embodiment of the present invention provides a data restoration method and device. The method includes: building a decision table from the data to be restored; building, from the decision table, a relational matrix that represents the difference relationship between any two objects; searching the relational matrix to obtain, one by one, the difference relationship elements corresponding to each object to be filled; judging one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition; for every object that does not satisfy the conflict avoidance condition, recording as a fill value its attribute value for the same conditional attribute as the missing attribute value of the corresponding object to be filled; and filling the missing attribute value from the recorded fill values. The device of the embodiment fills missing attributes using the difference relationships between objects, which greatly reduces computational complexity and effectively avoids data conflicts caused by filling.

Description

Data restoration method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a data restoration method and device.
Background art
As science and technology advance, the rapid growth of data volume has become a problem that cannot be ignored, and extracting useful information and knowledge from mountains of data is no easy task. Data mining technology has therefore developed rapidly in step with demand and technical progress. In practice, however, missing data is a widespread and challenging problem for all data analysis techniques, data mining included.
Missing data occurs frequently in practice and is often unavoidable. Data may be missing because the information is temporarily unavailable or because it was omitted during operation. Many methods for handling missing data exist; most use simple imputation, which fills each missing value with a single estimate (such as the mean, or an estimate obtained by another method). Simple imputation is fast, but it underestimates the associations between variables, can distort the sample distribution, and cannot reflect the uncertainty of the missing values.
Multiple imputation (MI) differs from simple imputation in that each missing value is filled with a set of plausible values. The imputation is repeated several times to reflect the uncertainty of the missing values, producing several complete data sets. Each filled data set is then analyzed separately with statistical methods for complete data, and the results are combined to produce the final statistical inference. However, multiple imputation is a complex procedure and is not suitable for lightweight data analysis.
A data restoration technique is therefore needed that both takes the associations between variables into account and is suitable for lightweight data analysis.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data restoration method and device that solve the high error rate or excessive processing complexity of existing data restoration techniques.
An embodiment of the present invention provides a data restoration method, including:
Building a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set;
Building, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
Searching the relational matrix and obtaining, one by one, the difference relationship elements corresponding to each object to be filled;
Judging one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical;
For every object that does not satisfy the conflict avoidance condition, recording as a fill value its attribute value for the same conditional attribute as the missing attribute value of the corresponding object to be filled;
Filling the missing attribute value from the recorded fill values.
An embodiment of the present invention also provides a data restoration device, including:
A decision table construction unit, configured to build a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set;
A relational matrix construction unit, configured to build, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
A difference relationship element acquiring unit, configured to search the relational matrix and obtain, one by one, the difference relationship elements corresponding to each object to be filled;
A conflict anticipation unit, configured to judge one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical;
A fill value recording unit, configured to record as a fill value, for every object that does not satisfy the conflict avoidance condition, its attribute value for the same conditional attribute as the missing attribute value of the corresponding object to be filled;
A filling unit, configured to fill the missing attribute value from the recorded fill values.
Compared with the prior art, the beneficial effects of the invention are as follows: the device of the embodiments fills missing attributes using the difference relationships between objects and introduces judgment rules for core attributes and object conflicts, which greatly reduces computational complexity and effectively avoids data conflicts caused by filling.
Description of the drawings
Fig. 1 is a flow chart of a data restoration method according to an embodiment of the present invention;
Fig. 2 is a flow chart of another data restoration method according to an embodiment of the present invention;
Fig. 3 is a structure diagram of a data restoration device according to an embodiment of the present invention;
Fig. 4 is a structure diagram of another data restoration device according to an embodiment of the present invention.
Detailed description of the embodiments
The foregoing and other technical content, features and effects of the present invention are presented clearly in the following detailed description of preferred embodiments with reference to the drawings. The description of the specific embodiments gives a deeper and more concrete understanding of the technical means the invention adopts to achieve its intended purpose and of their effects. The accompanying drawings are provided for reference and illustration only and are not intended to limit the invention.
Some of the symbols used in the embodiments are explained as follows: "∧" means "and"; "∨" means "or"; "∀" means "for any"; "∃" means "there exists"; "$|C_D|$" denotes the cardinality of the set $C_D$, i.e., the number of elements in $C_D$.
Referring to Fig. 1, a flow chart of a data restoration method according to an embodiment of the present invention, the method includes the following steps:
S101: Build a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set.
A decision table is a tabular tool well suited to describing situations with many judgment conditions that combine with one another and lead to multiple decision options. In data mining, the statistics required by a specific industry are abstracted into conditional attributes and decision attributes. For example, a mobile operator's user statistics table can serve as a decision table: each user is an object, the conditional attributes can be basic user information such as gender and occupation, and the decision attribute is the plan the user subscribes to. The value of a decision attribute is normally determined by the values of one or more conditional attributes; for example, the decision attribute "plan name" is determined by the conditional attribute "monthly data usage".
S102: Build, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects.
Each object in the decision table has its own conditional attributes and decision attribute. Comparing the conditional attributes and decision attributes of two objects yields the difference relationship between them; in other words, a difference relationship element is obtained by comparing and operating on the conditional and decision attributes of two objects.
A difference relationship element can include a decision mark indicating whether the decision attributes of the two objects are the same, a discernible attribute set containing the conditional attributes on which the two objects differ, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
Let the decision table be $S = (U, A, V, f)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set of objects and $A$ is the attribute set of the objects, composed of two disjoint subsets, the conditional attribute set $C = \{a_i \mid i = 1, \ldots, m\}$ and the decision attribute set $D = \{d\}$, i.e., $A = C \cup D$; $a_i(x_j)$ is the value of sample $x_j$ on attribute $a_i$. The relational matrix can be modeled as $VM(i, j) = (D, C_D, F)$, where the decision mark $D$ is expressed as:
$$D = \begin{cases} 1, & d(x_i) \neq d(x_j) \\ 0, & d(x_i) = d(x_j) \end{cases} \quad \text{(formula 1)}$$
where $d(x_i)$ and $d(x_j)$ are the decision attribute values of object $x_i$ and object $x_j$, respectively;
The discernible attribute set $C_D$ between the objects is expressed as:
$$C_D = \{a_k \mid a_k \in C \land a_k(x_i) \neq a_k(x_j) \land a_k(x_i) \neq * \land a_k(x_j) \neq *\} \quad \text{(formula 2)}$$
That is, each element of $C_D$ is a conditional attribute $a_k$ on which object $x_i$ and object $x_j$ take different values, neither of which is missing;
The conditional attribute mark vector $F$ is expressed as:
$$F = (f(a_1), f(a_2), \ldots, f(a_m)) \quad \text{(formula 3)}$$
where
$$f(a_k) = \begin{cases} 1/|C_D|, & a_k \in C_D \\ 0, & a_k(x_i) = a_k(x_j) \neq * \\ *, & a_k(x_i) = * \lor a_k(x_j) = * \end{cases} \quad \text{(formula 4)}$$
Clearly, the relational matrix VM is symmetric and vector-based, and each of its elements records fairly completely the relationship between the corresponding sample objects:
When $D = 1 \land C_D \neq \varnothing$, the two sample objects have different decisions and are distinguishable;
When $D = 1 \land C_D = \varnothing \land \forall k\, f(a_k) = 0$, the two sample objects conflict;
When $D = 1 \land C_D = \varnothing \land \exists k\, f(a_k) = *$, the two sample objects have different decisions and are indistinguishable;
When $D = 0 \land C_D \neq \varnothing$, the two sample objects have the same decision and are distinguishable;
When $D = 0 \land C_D = \varnothing$, the two sample objects have the same decision and are indistinguishable.
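As a rough illustration of how one difference relationship element might be computed, the Python sketch below builds the triple (D, C_D, F) for a pair of objects. The function name `relation_element`, the dictionary layout and the `"*"` marker are assumptions made for this example. The weight 1/|C_D| for attributes in C_D is inferred from element values such as (1/2, *, 1/2) that appear in the worked example of this description, not from an explicit statement.

```python
from fractions import Fraction

MISSING = "*"  # assumed marker for a missing attribute value


def relation_element(xi, xj, cond_attrs, decision="d"):
    """Return (D, C_D, F) for two objects stored as attribute dictionaries."""
    # Decision mark: 1 when the decision values differ, otherwise 0.
    D = 1 if xi[decision] != xj[decision] else 0
    # Discernible attribute set: conditional attributes known in both
    # objects whose values differ.
    C_D = {a for a in cond_attrs
           if xi[a] != MISSING and xj[a] != MISSING and xi[a] != xj[a]}
    # Mark vector: "*" if either value is missing, 1/|C_D| if the values
    # differ, 0 if they are equal.
    F = []
    for a in cond_attrs:
        if xi[a] == MISSING or xj[a] == MISSING:
            F.append(MISSING)
        elif a in C_D:
            F.append(Fraction(1, len(C_D)))
        else:
            F.append(0)
    return D, C_D, tuple(F)


# Objects x1 and x7 as they appear in Table 1 of this description.
x1 = {"a": 1, "b": 1, "c": 1, "d": 1}
x7 = {"a": 1, "b": 2, "c": 1, "d": 2}
print(relation_element(x1, x7, ["a", "b", "c"]))
```

Because the function treats both arguments identically, swapping them gives the same triple, which matches the statement that the matrix is symmetric.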
S103: Search the relational matrix and obtain, one by one, the difference relationship elements corresponding to each object to be filled.
An object to be filled is an object whose conditional attributes contain a missing value that is to be filled.
S104: Judge one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies the conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical.
Take the aforementioned decision table $S = (U, A, V, f)$ as an example, where $A = C \cup D$, $x_i \in U$, and $x_i$ has missing values. After $x_i$ is filled, it can only come into conflict with an $x_j$ for which $D = 1 \land C_D = \varnothing$, and cannot come into conflict with an $x_j$ for which $D = 1 \land C_D \neq \varnothing$. $D = 1 \land C_D = \varnothing$ is therefore taken as the conflict avoidance condition.
Proof: a. For an $x_j$ with $D = 1 \land C_D = \varnothing$, i.e., an object that has a different decision from $x_i$ and is indistinguishable from it:
$x_i$ and $x_j$ may produce a decision conflict after filling, that is, they may come to agree on every conditional attribute while $d(x_i) \neq d(x_j)$. Clearly, the only objects that can satisfy this are the indistinguishable objects with a different decision; that is, after $x_i$ is filled, it can only produce a decision conflict with an $x_j$ for which $D = 1 \land C_D = \varnothing$.
b. For an $x_j$ with $D = 1 \land C_D \neq \varnothing$, i.e., an object that is distinguishable from $x_i$:
$C_D \neq \varnothing$ shows that there is at least one attribute $a_k$ with $a_k(x_i) \neq a_k(x_j)$ and neither value missing, whereas a decision conflict requires all conditional attribute values to be equal while the decisions differ. So after filling, $x_i$ cannot produce a decision conflict with such an $x_j$.
S105: For every object that does not satisfy the conflict avoidance condition, record as a fill value its attribute value for the same conditional attribute as the missing attribute value of the corresponding object to be filled.
S106: Fill the missing attribute value from the recorded fill values.
The method of this embodiment fills missing attributes using the difference relationships between objects, which reduces computational complexity and effectively avoids data conflicts caused by filling.
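Steps S103 to S106 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `may_conflict` and `fill_candidates`, the dictionary layout and the small table are hypothetical. Values seen in possibly conflicting objects are subtracted from the candidates, as the worked example later in this description does.

```python
MISSING = "*"  # assumed marker for a missing attribute value


def may_conflict(xi, xj, cond_attrs, decision="d"):
    """Conflict avoidance condition: the decisions differ and no conditional
    attribute known in both objects has different values."""
    if xi[decision] == xj[decision]:
        return False
    return all(xi[a] == xj[a] for a in cond_attrs
               if xi[a] != MISSING and xj[a] != MISSING)


def fill_candidates(target, others, attr, cond_attrs):
    """Collect usable values of `attr` for `target` from the other objects."""
    candidates, excluded = set(), set()
    for xj in others:
        value = xj[attr]
        if value == MISSING:
            continue  # a missing value cannot serve as a fill value
        if may_conflict(target, xj, cond_attrs):
            excluded.add(value)    # filling with this value could conflict
        else:
            candidates.add(value)
    return candidates - excluded


# Hypothetical table: attribute b of the target is missing.
target = {"a": 1, "b": MISSING, "d": 1}
others = [
    {"a": 1, "b": 1, "d": 1},  # same decision: supplies candidate 1
    {"a": 1, "b": 2, "d": 2},  # may conflict: value 2 is excluded
    {"a": 2, "b": 3, "d": 2},  # differs on a: supplies candidate 3
]
# Any remaining candidate may be used; min() is just a deterministic choice.
target["b"] = min(fill_candidates(target, others, "b", ["a", "b"]))
```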
Referring to Fig. 2, a flow chart of another data restoration method according to an embodiment of the present invention, the method includes the following steps:
S201: Build a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set.
S202: Search the decision table, obtain the missing attributes of each object, and form the missing attribute set of each object.
For example, if attribute a of object $x_1$ is missing its value, the missing attribute set of $x_1$ is $MAS(x_1) = \{a\}$; likewise, if attributes a and b of object $x_2$ are missing their values, $MAS(x_2) = \{a, b\}$.
S203: Order the corresponding objects to be filled by the number of elements in each missing attribute set, forming the missing object set.
The number of elements in a set is also called its cardinality. For example, $MAS(x_1) = \{a\}$ has 1 element, $MAS(x_2) = \{a, b\}$ has 2 elements, and $MAS(x_3) = \{a, b, c\}$ has 3 elements; these objects then form the missing object set $MOS = \{x_1, x_2, x_3\}$.
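Steps S202 and S203 might look like the following Python sketch. The function names and the table are hypothetical, and the sort direction is an assumption: the text does not state it, and ascending order is chosen here only because the example lists x1, x2, x3 in order of increasing cardinality.

```python
MISSING = "*"  # assumed marker for a missing attribute value


def missing_attribute_set(obj, cond_attrs):
    """MAS(x): the conditional attributes whose value is missing in obj."""
    return {a for a in cond_attrs if obj[a] == MISSING}


def missing_object_set(table, cond_attrs):
    """MOS: names of incomplete objects, fewest missing attributes first."""
    mas = {name: missing_attribute_set(obj, cond_attrs)
           for name, obj in table.items()}
    incomplete = [name for name in table if mas[name]]
    # sorted() is stable, so ties keep their order in the table.
    return sorted(incomplete, key=lambda name: len(mas[name])), mas


# Hypothetical table with three conditional attributes a, b, c.
table = {
    "x3": {"a": MISSING, "b": MISSING, "c": MISSING, "d": 1},
    "x1": {"a": MISSING, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 2, "c": 1, "d": 2},
    "x2": {"a": MISSING, "b": MISSING, "c": 2, "d": 2},
}
mos, mas = missing_object_set(table, ["a", "b", "c"])
```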
S204: Build, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects.
S205: Search the relational matrix and judge, from the difference relationship elements, whether any two objects that contain no missing attributes conflict. If they conflict, go to step S206; if not, go to step S207.
The rule for judging a conflict can be: whether the difference relationship between the two objects corresponding to a difference relationship element simultaneously satisfies that the decision attribute values differ, there are no missing attributes, and all conditional attributes are identical. The value of a decision attribute is usually determined by one or more conditional attributes, so if all conditional attributes of two objects are identical but their decision attributes differ, the data of one of the two objects may be wrong, and such conflicting data easily causes errors when missing values are filled.
For example, suppose objects $x_1$ and $x_2$ each contain only conditional attribute a, conditional attribute b and decision attribute d. If a is 1 and b is 2 for both objects, but d is 1 for $x_1$ and 2 for $x_2$, then $x_1$ and $x_2$ conflict.
S206: Remove from the relational matrix the difference relationship elements related to the conflicting objects.
For example, if the decision table contains five objects $x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $x_1$ conflicts with $x_2$, the difference relationship elements corresponding to $x_1$ with $x_2$, $x_1$ with $x_3$, $x_1$ with $x_4$, $x_1$ with $x_5$, $x_2$ with $x_3$, $x_2$ with $x_4$, and $x_2$ with $x_5$ are removed from the relational matrix.
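The conflict check of S205 and the removal of S206 can be sketched as below. The names `is_conflict` and `remove_conflicting` and the five-object table are hypothetical, and object pairs stand in for the corresponding relational matrix elements to keep the example short.

```python
from itertools import combinations

MISSING = "*"  # assumed marker for a missing attribute value


def is_conflict(xi, xj, cond_attrs, decision="d"):
    """True when both objects are complete, agree on every conditional
    attribute, and still have different decision values."""
    complete = all(obj[a] != MISSING for obj in (xi, xj) for a in cond_attrs)
    same_conditions = all(xi[a] == xj[a] for a in cond_attrs)
    return complete and same_conditions and xi[decision] != xj[decision]


def remove_conflicting(table, cond_attrs):
    """Return the object pairs kept after dropping every pair that involves
    an object taking part in a conflict, plus the conflicting objects."""
    pairs = list(combinations(table, 2))
    conflicting = set()
    for i, j in pairs:
        if is_conflict(table[i], table[j], cond_attrs):
            conflicting.update((i, j))
    kept = [p for p in pairs if not conflicting.intersection(p)]
    return kept, conflicting


# Hypothetical table in which x1 and x2 conflict (a = 1, b = 2, different d).
table = {
    "x1": {"a": 1, "b": 2, "d": 1},
    "x2": {"a": 1, "b": 2, "d": 2},
    "x3": {"a": 2, "b": 2, "d": 1},
    "x4": {"a": 2, "b": 1, "d": 1},
    "x5": {"a": 1, "b": 1, "d": 2},
}
kept, conflicting = remove_conflicting(table, ["a", "b"])
```

With five objects there are ten distinct pairs; seven of them involve x1 or x2, which is the same count as the seven elements listed in the example above.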
S207: Search the relational matrix and judge whether the difference relationship between the two objects corresponding to each difference relationship element simultaneously satisfies that the decision attribute values differ, there are no missing attributes, and exactly one conditional attribute has different values.
S208: For the two objects corresponding to a difference relationship element that satisfies the condition of step S207, record the conditional attribute on which their values differ as a core attribute.
When missing data is filled, cost considerations make it necessary to recognize that important attributes and redundant attributes contribute differently to the decision, so data restoration can fill attributes preferentially according to their significance. A core attribute here is a conditional attribute that may play a key role in determining the value of the decision attribute.
In other words, for any $VM(i, j) = (D, C_D, F)$, if $D = 1 \land |C_D| = 1 \land \forall k\, f(a_k) \neq *$, the single attribute in $C_D$ for that element is a core attribute of the original information table.
Proof: $\forall k\, f(a_k) \neq *$ shows that the corresponding objects $x_i$ and $x_j$ are complete, so $x_i, x_j \in S'$, where $S'$ is the complete sub-information table of the decision table.
$|C_D| = 1$ means that the discernible attribute set of $x_i$ and $x_j$ has only one element; together with $D = 1$, i.e., $d(x_i) \neq d(x_j)$, this shows that only one attribute in the entire attribute set can distinguish $x_i$ from $x_j$. That attribute is therefore a core attribute of the complete sub-information table.
In addition, given an incomplete information table $S = (U, A, V, f)$ and its complete sub-information table $S'$, where $A = C \cup D$ and $a_i \in C$, if $a_i$ is a core attribute of $S'$, then $a_i$ is also a core attribute of $S$.
Proof: If $a_i$ is a core attribute of $S'$, then after $a_i$ is removed there exist $x, y \in U'$ such that $a_j(x) = a_j(y)$ for every remaining $a_j \in C$ and $d(x) \neq d(y)$, where $d \in D$, so a decision conflict arises in $S'$. Since $U' \subseteq U$, removing $a_i$ produces the same decision conflict in $S$. This proves that $a_i$ is also a core attribute of $S$.
S209: Search the decision table and obtain the conditional attributes that contain missing attribute values.
S210: Judge one by one whether each conditional attribute containing missing values is a core attribute. If not, the missing attribute value belongs to a redundant attribute; go to step S211. If so, go to step S212.
S211: Fill the missing values of the redundant attribute arbitrarily.
S212: Record the objects whose missing attribute values belong to core attributes as objects to be filled.
S213: Following the order of the objects to be filled in the missing object set, search the relational matrix and obtain, one by one, the difference relationship elements corresponding to each object to be filled. A difference relationship element includes a decision mark indicating whether the decision attributes of the two objects are the same, a discernible attribute set containing the conditional attributes on which the two objects differ, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
S214: Judge one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies the conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical.
S215: For every object that does not satisfy the conflict avoidance condition, record as a fill value its attribute value for the same conditional attribute as the missing attribute value of the corresponding object to be filled.
S216: Fill the missing attribute value from the recorded fill values.
For an inconsistent, incomplete decision information system, the basic idea for obtaining decision rules is optimal selection. Optimal selection removes human subjectivity: missing data is not filled by subjective judgment; instead, the values most likely to occur for the decision are found. The selection is not made from isolated conditions; the missing value is chosen systematically from the known conditional attribute values, that is, chosen with the whole table in view. Filling incomplete data differs from reduction: the focus is not on eliminating redundant attributes but on filling the missing data reasonably. Only reasonable filling can ensure and improve the accuracy and reliability of rule extraction after knowledge reduction.
To examine the validity of the algorithm, an inconsistent incomplete decision table is selected (see Table 1) and filled using the method of the present invention. In Table 1, a, b and c are conditional attributes, d is the decision attribute, and $x_1$ to $x_7$ are objects. "*" denotes a null value (a missing attribute value).
U a b c d
x1 1 1 1 1
x2 1 * 1 1
x3 2 1 1 1
x4 1 2 * 1
x5 1 * 1 2
x6 2 1 2 2
x7 1 2 1 2
Table 1
According to Table 1, objects $x_2$, $x_4$ and $x_5$ contain missing attribute values: $MAS(x_2) = \{b\}$, $MAS(x_4) = \{c\}$ and $MAS(x_5) = \{b\}$. Since each of these missing attribute sets has one element, the missing object set is $MOS = \{x_2, x_4, x_5\}$.
From the decision table in Table 1, the relational matrix shown in Table 2 is built using formulas 1 to 4 above:
Table 2
Each difference relationship element $VM(i, j) = (D, C_D, F)$ in Table 2 represents the difference relationship between two objects: $x_1$ is compared in turn with $x_1$ to $x_7$ to obtain the elements of the first row, $x_2$ with $x_2$ to $x_7$ to obtain the elements of the second row, and so on, until $x_7$ is compared with $x_7$ to obtain the element of the seventh row. Here $D$ is the decision mark, $C_D$ is the discernible attribute set, and $F$ is the conditional attribute mark vector.
The relational matrix is searched. First, it is judged whether any element satisfies $D = 1 \land C_D = \varnothing \land \forall k\, f(a_k) = 0$, i.e., whether any objects without missing values conflict. Table 1 shows that no complete objects conflict.
Then it is judged whether any element satisfies $D = 1 \land |C_D| = 1 \land \forall k\, f(a_k) \neq *$, i.e., whether any conditional attribute is a core attribute. The search finds that VM(1, 7) = (1, {b}, (0, 1, 0)) and VM(3, 6) = (1, {c}, (0, 0, 1)) satisfy the condition, so the core attribute set is updated to H = {b, c};
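The core attribute search on Table 1 can be reproduced with a short Python sketch. The function name `core_attributes` and the tuple layout are assumptions made for this illustration; the rule itself follows step S207 (different decisions, no missing values, exactly one differing conditional attribute).

```python
from itertools import combinations

MISSING = "*"  # assumed marker for a missing attribute value
COND = ["a", "b", "c"]

# Table 1 of this description; each row is (a, b, c, d).
TABLE_1 = {
    "x1": (1, 1, 1, 1),
    "x2": (1, MISSING, 1, 1),
    "x3": (2, 1, 1, 1),
    "x4": (1, 2, MISSING, 1),
    "x5": (1, MISSING, 1, 2),
    "x6": (2, 1, 2, 2),
    "x7": (1, 2, 1, 2),
}


def core_attributes(table, cond_attrs):
    """Collect attributes that alone distinguish two complete objects
    with different decision values."""
    core = set()
    for (_, ri), (_, rj) in combinations(table.items(), 2):
        ci, cj = ri[:-1], rj[:-1]      # conditional attribute values
        if MISSING in ci or MISSING in cj:
            continue                    # both objects must be complete
        if ri[-1] == rj[-1]:
            continue                    # decision values must differ
        differing = [a for a, u, v in zip(cond_attrs, ci, cj) if u != v]
        if len(differing) == 1:
            core.add(differing[0])
    return core


H = core_attributes(TABLE_1, COND)
```

The two pairs that qualify are (x1, x7), which differ only on b, and (x3, x6), which differ only on c.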
The objects containing missing values are $x_2$, $x_4$ and $x_5$, and the missing attribute value of each is filled in turn. Taking $x_2$ as an example:
Because $MAS(x_2) = \{b\}$ and $b \in H$, the relational matrix is searched for the difference relationship elements related to object $x_2$, i.e., the elements in the second row of the relational matrix in Table 2.
It is judged whether any of these elements satisfies the conflict avoidance condition. The element corresponding to $x_2$ and $x_7$ does, which means that after b is filled, $x_2$ and $x_7$ may conflict, so this element must be excluded from the filling calculation, as shown in Table 3:
Table 3
The $b(x_j)$ value of each excluded item is recorded in the set impossiblegetV: here $b(x_7) = 2$ is recorded. Values recorded in impossiblegetV are not used to fill the missing item, so impossiblegetV = {2}.
For the difference relationship elements that were not excluded, the corresponding $b(x_j)$ values are recorded in the set possiblegetV; the values recorded in possiblegetV are used to fill the missing item. These elements include (0, {a}, (1, *, 0)) and (1, {a, c}, (1/2, *, 1/2)), and the corresponding $b(x_j)$ values are $b(x_1) = 1$, $b(x_2) = *$, $b(x_3) = 1$, $b(x_4) = 2$, $b(x_5) = *$ and $b(x_6) = 1$. Removing the null value * and the value 2 in impossiblegetV finally gives possiblegetV = {1}.
Since possiblegetV contains only one element, it is used directly to fill the missing item $b(x_2)$ (if possiblegetV contained several elements, any of its values could be used). After filling, the updated $MAS(x_2)$ is empty, showing that the filled object $x_2$ is complete, so the relational matrix is updated, giving the relational matrix shown in Table 4:
Table 4
Objects $x_4$ and $x_5$ are processed in the same way. After filling is complete, the missing object set MOS is empty and the algorithm ends. The resulting complete relational matrix is shown in Table 5:
Table 5
The resulting complete decision table is shown in Table 6:
U a b c d
x1 1 1 1 1
x2 1 1 1 1
x3 2 1 1 1
x4 1 2 2 1
x5 1 2 1 2
x6 2 1 2 2
x7 1 2 1 2
Table 6
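The whole filling pass over Table 1 can be sketched in Python as follows. This is an illustrative reading of the procedure, not the patented implementation: the function names are hypothetical, the core attribute check is skipped because both b and c are core attributes in this example, and `min()` stands in for the "any value" choice when several candidates remain.

```python
MISSING = "*"  # assumed marker for a missing attribute value
COND = ["a", "b", "c"]

# Table 1 of this description; each row is (a, b, c, d).
rows = {
    "x1": (1, 1, 1, 1), "x2": (1, MISSING, 1, 1), "x3": (2, 1, 1, 1),
    "x4": (1, 2, MISSING, 1), "x5": (1, MISSING, 1, 2),
    "x6": (2, 1, 2, 2), "x7": (1, 2, 1, 2),
}
table = {name: dict(zip(COND + ["d"], row)) for name, row in rows.items()}


def may_conflict(xi, xj):
    """Decisions differ and no attribute known in both objects differs."""
    if xi["d"] == xj["d"]:
        return False
    return all(xi[a] == xj[a] for a in COND
               if xi[a] != MISSING and xj[a] != MISSING)


def fill(table):
    """Fill objects in place, fewest missing attributes first."""
    order = sorted((n for n in table if MISSING in table[n].values()),
                   key=lambda n: list(table[n].values()).count(MISSING))
    for name in order:
        target = table[name]
        for attr in [a for a in COND if target[a] == MISSING]:
            possible, impossible = set(), set()
            for other, xj in table.items():
                if other == name or xj[attr] == MISSING:
                    continue
                if may_conflict(target, xj):
                    impossible.add(xj[attr])
                else:
                    possible.add(xj[attr])
            usable = possible - impossible
            if usable:
                target[attr] = min(usable)  # any usable value is allowed
    return table


fill(table)
```

Objects filled earlier are visible to later ones, so the completed x2 takes part in the comparisons for x4 and x5, mirroring the update of the relational matrix after each fill.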
The original incomplete decision table has three missing attribute values and six possible completions. Checking the six completions shows that the original incomplete decision table has two optimal completions, 1, 1, 2 and 1, 2, 2 (the detailed calculation is omitted here). Clearly, the first completion produces a decision conflict, whereas the complete solution obtained with the present method is exactly the second optimal completion, which effectively avoids conflicts caused by filling.
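The difference between the two completions can be checked with a few lines of Python. The sketch assumes that each triple lists the filled values in the order b(x2), c(x4), b(x5), which matches Table 6 for the triple 1, 2, 2; the helper names are hypothetical.

```python
from itertools import combinations

# Complete rows of Table 1 as (a, b, c, d); x2, x4 and x5 are built below.
BASE = {"x1": (1, 1, 1, 1), "x3": (2, 1, 1, 1),
        "x6": (2, 1, 2, 2), "x7": (1, 2, 1, 2)}


def completed_table(b_x2, c_x4, b_x5):
    """Table 1 with the three missing values replaced by the given ones."""
    table = dict(BASE)
    table["x2"] = (1, b_x2, 1, 1)
    table["x4"] = (1, 2, c_x4, 1)
    table["x5"] = (1, b_x5, 1, 2)
    return dict(sorted(table.items()))


def conflicts(table):
    """Pairs with identical conditional values but different decisions."""
    return [(i, j) for (i, ri), (j, rj) in combinations(table.items(), 2)
            if ri[:-1] == rj[:-1] and ri[-1] != rj[-1]]


first = conflicts(completed_table(1, 1, 2))   # completion 1, 1, 2
second = conflicts(completed_table(1, 2, 2))  # completion 1, 2, 2 (Table 6)
```

Under this reading, the first completion makes x4 identical to x5 and x7 on a, b and c while their decisions differ, and the second completion leaves no conflicting pair.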
Fig. 3 is referred to, is a kind of structure chart of data recovery device of the embodiment of the present invention, is built including decision table Unit 301, difference relationship elements acquiring unit 303, conflict anticipation unit 304, fills up value note at relational matrix construction unit 302 Record unit 305 and shim 306.Relational matrix construction unit 302 is connected with decision table construction unit 301, difference relationship Element acquiring unit 303 is connected with relational matrix construction unit 302, and conflict anticipation unit 304 obtains single with difference relationship elements Member 303 is connected, and fills up value recording unit 305 and is connected with conflict anticipation unit 304, shim 306 is with filling up value recording unit 305 are connected.
Decision table construction unit 301 is used to build decision table according to data to be restored, the decision table include object set, Conditional attribute collection and decision kind set.
Relational matrix construction unit 302 is used for the decision table built according to the decision table construction unit 301, and structure represents The relational matrix of difference relationship between any two object.The relational matrix includes difference between two objects of multiple expressions The difference relationship elements of relationship.Each difference relationship elements can include the decision for representing whether two object decision attributes are consistent Mark, represent two objects between conditional attribute difference discernible attributes set and play mark for the conditional attribute value to object The conditional attribute mark vector of knowledge effect.
Difference relationship elements acquiring unit 303 is used to search for the relational matrix that the relational matrix construction unit 302 is built, Difference relationship elements corresponding with each object to be filled up in relational matrix are obtained one by one.The object to be filled up refers to corresponding Conditional attribute contains missing values, and prepares the object filled up to the conditional attribute value of missing.
The conflict anticipation unit 304 is configured to judge one by one, according to the difference relationship elements obtained by the difference relationship element acquiring unit 303 (each obtained difference relationship element corresponds to at least one object to be filled), whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition. The conflict avoidance condition is that the decision attributes differ and that all condition attribute values other than those of the missing attributes are identical.
The fill-in value recording unit 305 is configured to, for each difference relationship element that the conflict anticipation unit 304 judges not to satisfy the conflict avoidance condition, take from the two objects corresponding to that element the attribute value that belongs to the same condition attribute as the missing attribute value, and record it as a fill-in value.
The filling unit 306 is configured to fill the missing attribute value according to the fill-in value recorded by the fill-in value recording unit 305.
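The cooperation of units 304 to 306 can be sketched roughly as below. This is a minimal illustration under stated assumptions rather than the method of the embodiment: the function names are hypothetical, missing values are assumed to be encoded as `None`, and two choices go beyond what the text states, namely discarding any candidate value that also appears in an object meeting the conflict avoidance condition, and picking the most frequent remaining candidate.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing condition attribute value


def meets_avoidance_condition(target, other, d_target, d_other):
    """Decisions differ and all attributes known in `target` are identical."""
    if d_target == d_other:
        return False
    return all(x == y for x, y in zip(target, other) if x is not MISSING)


def candidate_values(rows, decisions, t, attr):
    """Record fill-in values for rows[t][attr] from objects that do NOT
    meet the conflict avoidance condition."""
    avoided, candidates = set(), []
    for j, other in enumerate(rows):
        if j == t or other[attr] is MISSING:
            continue
        if meets_avoidance_condition(rows[t], other, decisions[t], decisions[j]):
            avoided.add(other[attr])  # copying this value would create a conflict
        else:
            candidates.append(other[attr])
    # Assumption beyond the text: drop candidates that equal an avoided value.
    return [v for v in candidates if v not in avoided]


def fill(rows, decisions, t, attr):
    """Fill rows[t][attr] with the most frequent candidate (assumed policy)."""
    values = candidate_values(rows, decisions, t, attr)
    if not values:
        return rows[t]  # nothing safe to fill with; leave the object unchanged
    row = list(rows[t])
    row[attr] = Counter(values).most_common(1)[0][0]
    return tuple(row)
```

For example, with `rows = [(1, None, 0), (1, 2, 0), (1, 3, 1), (0, 3, 0)]` and `decisions = [0, 1, 0, 1]`, object 1 agrees with object 0 on every known attribute but has a different decision, so its value 2 is avoided and object 0 is filled with 3.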
The device of this embodiment fills missing attributes by using the difference relationships between objects, which reduces computational complexity and effectively avoids data conflicts caused by filling.
Referring to Fig. 4, which is a structural diagram of another data restoration device according to an embodiment of the present invention. Compared with the embodiment of Fig. 3, this embodiment further includes a core condition judging unit 307, a core attribute recording unit 308, a missing condition acquiring unit 309, a core attribute judging unit 310, a to-be-filled object recording unit 311, a redundant attribute filling unit 312, a conflict judging unit 313, a conflict element removal unit 314, a lost attribute set forming unit 315 and a missing object set forming unit 316. The conflict judging unit 313 is connected to the relational matrix construction unit 302, and the conflict element removal unit 314 is connected to the conflict judging unit 313. The lost attribute set forming unit 315 is connected to the decision table construction unit 301, and the missing object set forming unit 316 is connected to the lost attribute set forming unit 315 and to the difference relationship element acquiring unit 303. The core condition judging unit 307 is connected to the conflict element removal unit 314, and the core attribute recording unit 308 is connected to the core condition judging unit 307. The missing condition acquiring unit 309 is connected to the conflict element removal unit 314, and the core attribute judging unit 310 is connected to the missing condition acquiring unit 309 and to the core attribute recording unit 308. The to-be-filled object recording unit 311 is connected to the core attribute judging unit 310, and the redundant attribute filling unit 312 is connected to the core attribute judging unit 310.
The conflict judging unit 313 is configured to, after the relational matrix construction unit 302 has built the relational matrix, search the relational matrix and judge, according to the difference relationship elements, whether any two objects that contain no missing attributes conflict with each other. The conflict judging unit 313 may make this judgment by checking whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and all condition attributes are identical.
The conflict element removal unit 314 is configured to remove from the relational matrix the difference relationship elements related to the conflicting objects identified by the conflict judging unit 313.
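A minimal sketch of this conflict check and removal is given below for illustration. It assumes that missing values are encoded as `None` and that the relational matrix is a dictionary keyed by object index pairs; the function names `conflicting_objects` and `drop_conflicts` are hypothetical.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing condition attribute value


def conflicting_objects(rows, decisions):
    """Indices of complete objects that share all condition attribute
    values with another complete object but have a different decision."""
    complete = [i for i, row in enumerate(rows) if MISSING not in row]
    bad = set()
    for i, j in combinations(complete, 2):
        if rows[i] == rows[j] and decisions[i] != decisions[j]:
            bad.update((i, j))
    return bad


def drop_conflicts(matrix, bad):
    """Remove every matrix element that involves a conflicting object."""
    return {pair: elem for pair, elem in matrix.items() if not set(pair) & bad}
```

Removing these elements first means that objects which already contradict each other in the original data are not used later as evidence when fill-in values are chosen.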
The core condition judging unit 307 is configured to, after the conflict element removal unit 314 has removed the difference relationship elements related to conflicting objects, search the relational matrix and judge whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and there is one and only one condition attribute whose values differ.
The core attribute recording unit 308 is configured to record as a core attribute the condition attribute whose values differ between two objects that the core condition judging unit 307 judges to satisfy the condition.
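The core attribute rule above can be illustrated with the following sketch. It works directly on the decision table rather than on a stored relational matrix, which is a simplification; the function name `core_attributes` and the use of `None` for missing values are assumptions.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing condition attribute value


def core_attributes(rows, decisions):
    """Collect indices of condition attributes that alone distinguish
    two complete objects with different decision attribute values."""
    core = set()
    for i, j in combinations(range(len(rows)), 2):
        a, b = rows[i], rows[j]
        if decisions[i] == decisions[j]:
            continue  # decision attribute values must differ
        if MISSING in a or MISSING in b:
            continue  # neither object may contain a missing attribute
        diff = [k for k, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diff) == 1:  # one and only one differing condition attribute
            core.add(diff[0])
    return core
```

An attribute found this way cannot be dropped without making the two objects indistinguishable, which is why a missing value in such an attribute has to be filled carefully rather than arbitrarily.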
The missing condition acquiring unit 309 is configured to search the decision table and obtain the condition attributes that contain missing attribute values, before the difference relationship element acquiring unit 303 obtains one by one the difference relationship elements in the relational matrix that correspond to each object to be filled.
The core attribute judging unit 310 is configured to judge one by one, according to the core attributes recorded by the core attribute recording unit 308, whether each condition attribute containing a missing attribute value is a core attribute.
The to-be-filled object recording unit 311 is configured to record the object corresponding to a missing attribute value as an object to be filled when the core attribute judging unit 310 judges that the condition attribute containing the missing attribute value is a core attribute.
The redundant attribute filling unit 312 is configured to fill arbitrarily the missing attribute values of condition attributes that the core attribute judging unit 310 judges not to be core attributes.
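The division of work between units 309 to 312 might be sketched as follows, purely for illustration. The names `split_missing` and `fill_arbitrary` are hypothetical, missing values are assumed to be `None`, and the constant `default` stands in for the "arbitrary" value, which the text does not specify further.

```python
MISSING = None  # assumed marker for a missing condition attribute value


def split_missing(rows, core):
    """Separate missing cells into those in core attributes (to be filled
    via the relational matrix) and those in non-core (redundant) attributes."""
    to_fill, arbitrary = [], []
    for i, row in enumerate(rows):
        for k, value in enumerate(row):
            if value is MISSING:
                (to_fill if k in core else arbitrary).append((i, k))
    return to_fill, arbitrary


def fill_arbitrary(rows, cells, default=0):
    """Fill the given (object, attribute) cells with an arbitrary value."""
    filled = [list(row) for row in rows]
    for i, k in cells:
        filled[i][k] = default  # any value is acceptable for a redundant attribute
    return [tuple(row) for row in filled]
```

Only the cells returned in `to_fill` then need the more expensive conflict-aware treatment, which is one way the core attribute test can reduce the amount of computation.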
The lost attribute set forming unit 315 is configured to, after the decision table construction unit 301 has built the decision table, search the decision table, obtain the missing attributes of each object in the decision table, and form a lost attribute set for each object.
The missing object set forming unit 316 is configured to arrange the corresponding objects according to the number of elements in each lost attribute set formed by the lost attribute set forming unit 315, thereby forming a missing object set. The difference relationship element acquiring unit 303 searches the relational matrix in the order in which the objects are arranged in the missing object set formed by the missing object set forming unit 316, and obtains one by one the difference relationship elements in the relational matrix that correspond to each object to be filled.
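The ordering step can be sketched as below. This is a hedged illustration: the function names are hypothetical, missing values are assumed to be `None`, and ascending order (objects with the fewest missing attributes first) is an assumption, since the passage above only states that the objects are arranged according to the number of elements in their lost attribute sets.

```python
MISSING = None  # assumed marker for a missing condition attribute value


def lost_attribute_sets(rows):
    """Map each object index to the set of its missing attribute indices."""
    return {
        i: {k for k, value in enumerate(row) if value is MISSING}
        for i, row in enumerate(rows)
    }


def missing_object_order(rows):
    """Order the objects that have missing attributes by how many they lack
    (assumed ascending), breaking ties by object index."""
    sets = lost_attribute_sets(rows)
    return sorted(
        (i for i, lost in sets.items() if lost),
        key=lambda i: (len(sets[i]), i),
    )
```

For `rows = [(None, None, 1), (1, 2, 3), (1, None, 0)]`, the lost attribute sets are `{0, 1}`, the empty set and `{1}`, so the processing order under this assumption is object 2 followed by object 0.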
The device of this embodiment fills missing attributes by using the difference relationships between objects, and introduces judgment rules for core attributes and for object conflicts, which greatly reduces computational complexity and effectively avoids data conflicts caused by filling.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the present invention may be implemented by hardware, or by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server or a network device) to perform the methods described in the implementation scenarios of the embodiments of the present invention.
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of the technical solution, still falls within the scope of the technical solution of the present invention.

Claims (14)

1. A data restoration method, characterized by comprising:
building a decision table from the data to be restored, the decision table including an object set, a condition attribute set and a decision attribute set;
building, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including a plurality of difference relationship elements each representing the difference relationship between two objects;
searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled;
judging one by one, according to the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and that all condition attribute values other than those of the missing attributes are identical;
for all objects that do not satisfy the conflict avoidance condition, recording as a fill-in value the attribute value that belongs to the same condition attribute as the missing attribute value of the corresponding object to be filled;
filling the missing attribute value according to the recorded fill-in value.
2. The data restoration method according to claim 1, characterized in that, after the step of building the relational matrix, the method further comprises:
searching the relational matrix and judging whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and there is one and only one condition attribute whose values differ;
if so, recording the condition attribute whose values differ as a core attribute;
and in that, before the step of searching the relational matrix and obtaining one by one the difference relationship elements in the relational matrix that correspond to each object to be filled, the method further comprises:
searching the decision table and obtaining the condition attributes that contain missing attribute values;
judging one by one whether each condition attribute containing a missing attribute value is a core attribute;
if so, recording the object corresponding to the missing attribute value as an object to be filled.
3. The data restoration method according to claim 2, characterized in that, after the step of judging one by one whether each condition attribute containing a missing attribute value is a core attribute, the method comprises:
if not, filling the missing attribute value arbitrarily.
4. The data restoration method according to claim 1, characterized in that, after the step of building the relational matrix, the method further comprises:
searching the relational matrix and judging, according to the difference relationship elements, whether any two objects that contain no missing attributes conflict with each other;
if they conflict, removing from the relational matrix the difference relationship elements related to the conflicting objects.
5. The data restoration method according to claim 4, characterized in that the step of judging whether any two objects that contain no missing attributes conflict with each other comprises:
judging whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and all condition attributes are identical;
if so, the two objects conflict.
6. The data restoration method according to claim 1, characterized in that,
after the step of building the decision table, the method further comprises:
searching the decision table, obtaining the missing attributes of each object in the decision table, and forming a lost attribute set for each object;
arranging the corresponding objects according to the number of elements in each lost attribute set to form a missing object set;
and the step of searching the relational matrix and obtaining one by one the difference relationship elements in the relational matrix that correspond to each object to be filled comprises:
searching the relational matrix in the order in which the objects are arranged in the missing object set, and obtaining one by one the difference relationship elements in the relational matrix that correspond to each object to be filled.
7. The data restoration method according to any one of claims 1 to 6, characterized in that the difference relationship element includes a decision flag indicating whether the decision attributes of the two objects are consistent, a difference attribute set indicating the condition attributes that differ between the two objects, and a condition attribute mark vector used to mark the condition attribute values of the objects.
8. A data restoration device, characterized by comprising:
a decision table construction unit, configured to build a decision table from the data to be restored, the decision table including an object set, a condition attribute set and a decision attribute set;
a relational matrix construction unit, configured to build, from the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including a plurality of difference relationship elements each representing the difference relationship between two objects;
a difference relationship element acquiring unit, configured to search the relational matrix and obtain, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled;
a conflict anticipation unit, configured to judge one by one, according to the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and that all condition attribute values other than those of the missing attributes are identical;
a fill-in value recording unit, configured to, for all objects that do not satisfy the conflict avoidance condition, record as a fill-in value the attribute value that belongs to the same condition attribute as the missing attribute value of the corresponding object to be filled;
a filling unit, configured to fill the missing attribute value according to the recorded fill-in value.
9. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a core condition judging unit, configured to search the relational matrix and judge whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and there is one and only one condition attribute whose values differ;
a core attribute recording unit, configured to record as a core attribute the condition attribute whose values differ between two objects that satisfy the condition;
a missing condition acquiring unit, configured to search the decision table and obtain the condition attributes that contain missing attribute values, before the difference relationship element acquiring unit obtains one by one the difference relationship elements in the relational matrix that correspond to each object to be filled;
a core attribute judging unit, configured to judge one by one, according to the core attributes recorded by the core attribute recording unit, whether each condition attribute containing a missing attribute value is a core attribute;
a to-be-filled object recording unit, configured to record the object corresponding to a missing attribute value as an object to be filled when the condition attribute containing the missing attribute value is a core attribute.
10. The data restoration device according to claim 9, characterized in that the data restoration device further comprises:
a redundant attribute filling unit, configured to fill arbitrarily the missing attribute values of condition attributes that are not core attributes.
11. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a conflict judging unit, configured to, after the relational matrix construction unit has built the relational matrix, search the relational matrix and judge, according to the difference relationship elements, whether any two objects that contain no missing attributes conflict with each other;
a conflict element removal unit, configured to remove from the relational matrix the difference relationship elements related to the conflicting objects.
12. The data restoration device according to claim 11, characterized in that the conflict judging unit judges whether any two objects that contain no missing attributes conflict with each other by checking whether the difference relationship between the two objects corresponding to each difference relationship element in the relational matrix simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and all condition attributes are identical.
13. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a lost attribute set forming unit, configured to, after the decision table construction unit has built the decision table, search the decision table, obtain the missing attributes of each object in the decision table, and form a lost attribute set for each object;
a missing object set forming unit, configured to arrange the corresponding objects according to the number of elements in each lost attribute set to form a missing object set;
wherein the difference relationship element acquiring unit searches the relational matrix in the order in which the objects are arranged in the missing object set, and obtains one by one the difference relationship elements in the relational matrix that correspond to each object to be filled.
14. The data restoration device according to any one of claims 8 to 13, characterized in that the difference relationship element includes a decision flag indicating whether the decision attributes of the two objects are consistent, a difference attribute set indicating the condition attributes that differ between the two objects, and a condition attribute mark vector used to mark the condition attribute values of the objects.
CN201310219030.7A 2013-06-04 2013-06-04 Data restoration method and device Active CN104216916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310219030.7A CN104216916B (en) 2013-06-04 2013-06-04 Data restoration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310219030.7A CN104216916B (en) 2013-06-04 2013-06-04 Data restoration method and device

Publications (2)

Publication Number Publication Date
CN104216916A CN104216916A (en) 2014-12-17
CN104216916B true CN104216916B (en) 2018-07-03

Family

ID=52098414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310219030.7A Active CN104216916B (en) 2013-06-04 2013-06-04 Data restoration method and device

Country Status (1)

Country Link
CN (1) CN104216916B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038460A (en) * 2017-04-10 2017-08-11 南京航空航天大学 A kind of ship monitor shortage of data value complementing method based on improvement KNN
CN109033454A (en) * 2018-08-27 2018-12-18 广东电网有限责任公司 Data filling method, apparatus, equipment and storage medium based on attributes similarity

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1602042A1 (en) * 2004-02-25 2005-12-07 Microsoft Corporation Database data recovery system and method
CN101030211A (en) * 2006-01-31 2007-09-05 韦瑞吉(新加坡)私人有限公司 Methods and systems for derivation of missing data objects from test data
CN102025531A (en) * 2010-08-16 2011-04-20 北京亿阳信通软件研究院有限公司 Filling method and device thereof for performance data
CN102800078A (en) * 2012-07-20 2012-11-28 西安电子科技大学 Non-local mean image detail restoration method

Also Published As

Publication number Publication date
CN104216916A (en) 2014-12-17

Similar Documents

Publication Publication Date Title
CN109299811B (en) Complex network-based fraud group recognition and risk propagation prediction method
CN105306475B (en) A kind of network inbreak detection method based on Classification of Association Rules
CN104778173B (en) Target user determination method, device and equipment
CN103744928B (en) A kind of network video classification method based on history access record
CN103795613B (en) Method for predicting friend relationships in online social network
CN101996102B (en) Method and system for mining data association rule
CN106469376B (en) Risk control method and equipment
Snodgrass et al. A hierarchical mdmc approach to 2d video game map generation
CN102822822B (en) Image management apparatus, image management method, program, record medium, integrated circuit
CN109299258A (en) A kind of public sentiment event detecting method, device and equipment
CN106909643A (en) The social media big data motif discovery method of knowledge based collection of illustrative plates
CN104598569A (en) Association rule-based MBD (Model Based Definition) data set completeness checking method
CN112615888B (en) Threat assessment method and device for network attack behavior
CN104850567A (en) Method and device for identifying association between network users
CN107315822A (en) A kind of method for digging of Knowledge Relation
CN106598999A (en) Method and device for calculating text theme membership degree
CN103366009B (en) A kind of book recommendation method based on self-adaption cluster
CN105678590A (en) topN recommendation method for social network based on cloud model
CN112085125A (en) Missing value filling method based on linear self-learning network, storage medium and system
Snodgrass et al. Procedural level generation using multi-layer level representations with mdmcs
CN104216916B (en) Data restoration method and device
CN108874663A (en) Black box fault filling method and system and medium apparatus
Goldstein et al. Group-based Yule model for bipartite author-paper networks
CN103780263B (en) Device and method of data compression and recording medium
CN107358534A (en) The unbiased data collecting system and acquisition method of social networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant