CN104216916B - Data restoration method and device - Google Patents
- Publication number: CN104216916B (application CN201310219030.7A)
- Authority: CN (China)
- Prior art keywords: attribute, missing, relational matrix, difference relationship, filled
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
Abstract
The embodiments of the present invention provide a data restoration method and device. The method includes: building a decision table from the data to be restored; building, from the decision table, a relational matrix that represents the difference relationship between any two objects; searching the relational matrix and obtaining, one by one, the difference relationship elements that correspond to each object to be filled; judging one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition; for all objects that do not satisfy the conflict avoidance condition, recording as fill values the attribute values that belong to the same conditional attribute as the missing attribute value of the corresponding object to be filled; and filling the missing attribute value according to the recorded fill values. The device of the embodiments fills missing attributes using the difference relationships between objects, which greatly reduces computational complexity and effectively avoids data conflicts caused by filling.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data restoration method and device.
Background
With the development of science and technology, the rapid growth of data volume has become a problem that cannot be ignored. Faced with mountains of data, obtaining useful information and knowledge is no easy task. Data mining technology has therefore developed rapidly, driven by demand and by scientific and technological progress. In practical applications, however, missing data is a widespread and challenging problem for all data analysis techniques, including data mining.
Missing data occurs frequently in practice and is often unavoidable. Data may be missing because the information is temporarily unavailable or because it was omitted during operation. Many feasible methods currently exist for handling missing data; most use simple imputation, in which an estimated value (such as the mean, or an estimate obtained by some other method) is filled in for the missing data. The advantage of simple imputation is speed, but it underestimates the associations between variables, can distort the sample distribution, and cannot reflect the uncertainty of the missing values.
Multiple imputation (MI) differs from simple imputation in that each missing value is filled with a set of plausible values and the imputation is repeated several times, so as to reflect the uncertainty of the missing values and produce several complete data sets. Each filled data set is then analyzed separately with statistical methods for complete data, and the results are combined to produce the final statistical inference. However, multiple imputation is a relatively complex procedure and is not suitable for lightweight data analysis.
Therefore, a data restoration technique is needed that both takes the associations between variables into account and is suitable for lightweight data analysis.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data restoration method and device that solve the problem of existing data restoration techniques having a high error rate or an overly complex processing procedure.
An embodiment of the present invention provides a data restoration method, including:
building a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set;
building, from the decision table, a relational matrix that represents the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled;
judging one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical;
for all objects that do not satisfy the conflict avoidance condition, recording as fill values the attribute values that belong to the same conditional attribute as the missing attribute value of the corresponding object to be filled;
filling the missing attribute value according to the recorded fill values.
An embodiment of the present invention also provides a data restoration device, including:
a decision table construction unit, for building a decision table from the data to be restored, the decision table including an object set, a conditional attribute set and a decision attribute set;
a relational matrix construction unit, for building, from the decision table, a relational matrix that represents the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
a difference relationship element acquiring unit, for searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled;
a conflict pre-judgment unit, for judging one by one, from the obtained difference relationship elements, whether the difference relationship between the object to be filled and each other object satisfies a conflict avoidance condition, the conflict avoidance condition being that the decision attributes differ and all conditional attribute values other than the missing attribute are identical;
a fill value recording unit, for recording as fill values, for all objects that do not satisfy the conflict avoidance condition, the attribute values that belong to the same conditional attribute as the missing attribute value of the corresponding object to be filled;
a filling unit, for filling the missing attribute value according to the recorded fill values.
Compared with the prior art, the beneficial effects of the present invention are as follows: the device of the embodiments fills missing attributes using the difference relationships between objects and introduces judgment rules for core attributes and object conflicts, which greatly reduces computational complexity and effectively avoids data conflicts caused by filling.
Description of the drawings
Fig. 1 is a flow chart of a data restoration method according to an embodiment of the present invention;
Fig. 2 is a flow chart of another data restoration method according to an embodiment of the present invention;
Fig. 3 is a structure diagram of a data restoration device according to an embodiment of the present invention;
Fig. 4 is a structure diagram of another data restoration device according to an embodiment of the present invention.
Detailed description
The foregoing and other technical contents, features and effects of the present invention are presented clearly in the following detailed description of preferred embodiments with reference to the drawings. The description of the specific embodiments allows a deeper and more concrete understanding of the technical means that the present invention adopts to achieve its intended purpose and of their effects; the accompanying drawings are provided for reference and illustration only and are not intended to limit the present invention.
Some of the symbols used in the embodiments of the present invention are explained as follows: "∧" means "and"; "∨" means "or"; "∀" means "for any"; "∃" means "there exists"; "|C_D|" denotes the cardinality of the set C_D, that is, the number of elements in C_D.
Referring to Fig. 1, a flow chart of a data restoration method according to an embodiment of the present invention, the method includes the following steps:
S101: build a decision table from the data to be restored; the decision table includes an object set, a conditional attribute set and a decision attribute set.
A decision table is a tabular tool suited to describing situations in which there are many judgment conditions, the conditions combine with one another, and several decision schemes exist. In data mining, the statistics required by a specific industry are abstracted into conditional attributes and decision attributes. For example, a mobile operator's user statistics table can serve as a decision table: each user is an object, the conditional attributes can be basic user information such as gender and occupation, and the decision attribute is the plan the user has subscribed to. In general, the value of a decision attribute is determined by the values of one or more conditional attributes; for example, the decision attribute "plan name" is determined by the conditional attribute "monthly traffic".
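One possible in-memory representation of such a decision table is sketched below. The class name `DecisionTable`, the `"*"` marker for a missing value, and the two user rows are all hypothetical and chosen only to mirror the mobile-user example; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass

MISSING = "*"  # assumed marker for a missing attribute value


@dataclass
class DecisionTable:
    """A decision table reduced to what the later steps need (illustrative only)."""
    objects: dict      # object name -> {attribute name: value}
    conditions: tuple  # names of the conditional attributes
    decision: str      # name of the decision attribute

    def value(self, obj, attr):
        """Value of object `obj` on attribute `attr`."""
        return self.objects[obj][attr]


# Invented rows for the mobile-user example: users are objects, gender and
# occupation are conditional attributes, the subscribed plan is the decision.
users = DecisionTable(
    objects={
        "u1": {"gender": "F", "occupation": "teacher", "plan": "basic"},
        "u2": {"gender": "M", "occupation": MISSING, "plan": "premium"},
    },
    conditions=("gender", "occupation"),
    decision="plan",
)

print(users.value("u2", "occupation"))  # *
```

Here `u2` is an object to be filled, because its conditional attribute `occupation` has no value.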
S102: build, from the decision table, a relational matrix that represents the difference relationship between any two objects; the relational matrix includes multiple difference relationship elements, each representing the difference relationship between two objects.
Each object in the decision table has its own conditional attributes and decision attribute. Comparing the conditional attributes and decision attributes of two objects yields the difference relationship between them; in other words, a difference relationship element is obtained by comparing and operating on the conditional attributes and decision attributes of two objects.
A difference relationship element can include a decision mark indicating whether the decision attributes of the two objects are consistent, a discernible attribute set representing the conditional attributes on which the two objects differ, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
Let the decision table be S = (U, A, V, f), where U = {x_1, x_2, ..., x_n} is a non-empty finite set of objects and A is the set of attributes of the objects, composed of two disjoint subsets, the conditional attribute set C = {a_i | i = 1, ..., m} and the decision attribute set D = {d}, that is, A = C ∪ D; a_i(x_j) is the value of sample x_j on attribute a_i. The relational matrix can be modeled as VM(i, j) = (D, C_D, F), where the decision mark D is expressed as:
D = 0 if d(x_i) = d(x_j); D = 1 if d(x_i) ≠ d(x_j) (formula 1)
where d(x_i) and d(x_j) are the decision attribute values of object x_i and object x_j respectively;
The discernible attribute set C_D between objects is expressed as:
C_D = {a_k | a_k ∈ C ∧ a_k(x_i) ≠ a_k(x_j) ∧ a_k(x_i) ≠ * ∧ a_k(x_j) ≠ *} (formula 2)
That is, each element of C_D is a conditional attribute a_k on which object x_i and object x_j take different values, neither of which is missing;
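The conditional attribute mark vector F is expressed as:
F = (f(a_1), f(a_2), ..., f(a_m)) (formula 3)
where
f(a_k) = 0 if a_k(x_i) = a_k(x_j) and neither value is *; f(a_k) = 1 if a_k(x_i) ≠ a_k(x_j) and neither value is *; f(a_k) = * if a_k(x_i) = * ∨ a_k(x_j) = * (formula 4)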
Clearly, the relational matrix VM is a symmetric matrix whose elements are vectors, and each element records fairly completely the relationship between the corresponding pair of sample objects:
When D = 1 ∧ C_D ≠ ∅, the two sample objects have different decisions and are discernible;
When D = 1 ∧ C_D = ∅ and F contains no *, the two sample objects conflict;
When D = 1 ∧ C_D = ∅ and F contains *, the two sample objects have different decisions and are indiscernible;
When D = 0 ∧ C_D ≠ ∅, the two sample objects have the same decision and are discernible;
When D = 0 ∧ C_D = ∅, the two sample objects have the same decision and are indiscernible.
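The construction of one element VM(i, j) = (D, C_D, F) can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function names `vm_element` and `classify` are hypothetical, the 0 / 1 / * encoding of F is inferred from the worked example later in the description (for instance VM(1, 7) = (1, {b}, (0, 1, 0))), and the five relationship labels follow one reading of the cases listed above. The sample rows are taken from Table 1 of that worked example.

```python
MISSING = "*"  # marker for a missing value, as in Table 1


def vm_element(xi, xj, conditions, decision="d"):
    """Return VM(i, j) = (D, C_D, F) for two objects given as dicts."""
    d_mark = 0 if xi[decision] == xj[decision] else 1
    c_d, f_vec = set(), []
    for a in conditions:
        if MISSING in (xi[a], xj[a]):
            f_vec.append(MISSING)  # value unknown on at least one side
        elif xi[a] == xj[a]:
            f_vec.append(0)        # same value
        else:
            f_vec.append(1)        # different value
            c_d.add(a)
    return d_mark, c_d, tuple(f_vec)


def classify(element):
    """Name the relationship encoded by one element (assumed reading of the cases)."""
    d_mark, c_d, f_vec = element
    decision = "different decision" if d_mark else "same decision"
    if c_d:
        return decision + ", discernible"
    if d_mark and MISSING not in f_vec:
        return "conflict"
    return decision + ", indiscernible"


# Rows taken from Table 1 of the worked example.
C = ("a", "b", "c")
x1 = {"a": 1, "b": 1, "c": 1, "d": 1}
x2 = {"a": 1, "b": MISSING, "c": 1, "d": 1}
x3 = {"a": 2, "b": 1, "c": 1, "d": 1}
x7 = {"a": 1, "b": 2, "c": 1, "d": 2}

print(vm_element(x1, x7, C))            # (1, {'b'}, (0, 1, 0))
print(classify(vm_element(x2, x7, C)))  # different decision, indiscernible
```

Because the comparison is symmetric in `xi` and `xj`, only the upper triangle of the matrix needs to be computed.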
S103: search the relational matrix and obtain, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled.
An object to be filled is an object whose conditional attributes contain missing values and whose missing conditional attribute values are about to be filled.
S104: according to the obtained difference relationship elements, judge one by one whether the difference relationship between the object to be filled and each other object satisfies the conflict avoidance condition; the conflict avoidance condition is that the decision attributes differ and all conditional attribute values other than the missing attribute are identical.
Take the aforementioned decision table S = (U, A, V, f) as an example, where A = C ∪ D, x_i ∈ U, and x_i has missing values. After x_i is filled, it can contradict only the objects x_j whose element VM(i, j) satisfies D = 1 ∧ C_D = ∅, and cannot contradict the objects x_j whose element satisfies C_D ≠ ∅. D = 1 ∧ C_D = ∅ is therefore taken as the conflict avoidance condition.
Proof: A. For the objects x_j corresponding to D = 1 ∧ C_D = ∅, that is, the different-decision indiscernible objects of x_i:
A decision conflict between x_i and x_j can arise after filling only if all their conditional attribute values are identical while their decisions differ. Clearly, the only objects that can meet this requirement are the different-decision indiscernible objects; that is, after x_i is filled, it can produce a decision conflict only with the x_j corresponding to D = 1 ∧ C_D = ∅.
B. For the objects x_j corresponding to C_D ≠ ∅, that is, the objects discernible from x_i:
C_D ≠ ∅ shows that x_i and x_j necessarily have at least one attribute a_k with a_k(x_i) ≠ a_k(x_j) and neither value equal to *, whereas a decision conflict requires ∀a_k ∈ C, a_k(x_i) = a_k(x_j) ∧ d(x_i) ≠ d(x_j). So after filling, x_i cannot produce a decision conflict with the x_j corresponding to C_D ≠ ∅.
S105: for all objects that do not satisfy the conflict avoidance condition, record as fill values the attribute values that belong to the same conditional attribute as the missing attribute value of the corresponding object to be filled.
S106: fill the missing attribute value according to the recorded fill values.
The method of the embodiments fills missing attributes using the difference relationships between objects, which reduces computational complexity and effectively avoids data conflicts caused by filling.
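Steps S104 and S105 can be sketched for a single object to be filled as below. This is an illustrative reading rather than the patent's code: the function name `fill_candidates` is hypothetical, the conflict avoidance condition is implemented as "decisions differ and no known conditional attribute distinguishes the two objects", and the data are object x2 and the other rows of Table 1 from the worked example.

```python
MISSING = "*"


def fill_candidates(target, others, attr, conditions, decision="d"):
    """Return (usable values, excluded values) for the missing `attr` of `target`."""
    possible, impossible = set(), set()
    for other in others:
        value = other[attr]
        if value == MISSING:
            continue  # this object offers no value for the attribute
        differs_in_decision = target[decision] != other[decision]
        discernible = any(
            target[a] != other[a]
            for a in conditions
            if MISSING not in (target[a], other[a])
        )
        if differs_in_decision and not discernible:
            impossible.add(value)  # conflict avoidance condition holds
        else:
            possible.add(value)
    return possible - impossible, impossible


C = ("a", "b", "c")
x2 = {"a": 1, "b": MISSING, "c": 1, "d": 1}
others = [
    {"a": 1, "b": 1, "c": 1, "d": 1},              # x1
    {"a": 2, "b": 1, "c": 1, "d": 1},              # x3
    {"a": 1, "b": 2, "c": MISSING, "d": 1},        # x4
    {"a": 1, "b": MISSING, "c": 1, "d": 2},        # x5
    {"a": 2, "b": 1, "c": 2, "d": 2},              # x6
    {"a": 1, "b": 2, "c": 1, "d": 2},              # x7
]

usable, excluded = fill_candidates(x2, others, "b", C)
print(usable, excluded)  # {1} {2}
```

The value 2 is excluded because copying it from x7 would make x2 identical to x7 on every conditional attribute while their decisions differ.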
Referring to Fig. 2, a flow chart of another data restoration method according to an embodiment of the present invention, the method includes the following steps:
S201: build a decision table from the data to be restored; the decision table includes an object set, a conditional attribute set and a decision attribute set.
S202: search the decision table, obtain the missing attributes of each object in the decision table, and form the missing attribute set of each object.
For example, if the value of attribute a of object x1 is missing, the missing attribute set of x1 is MAS(x1) = {a}; if the values of attributes a and b of object x2 are missing, the missing attribute set of x2 is MAS(x2) = {a, b}.
S203: according to the number of elements in each missing attribute set, arrange the corresponding objects to be filled to form the missing object set.
The number of elements in a set is also called its cardinality. For example, the missing attribute set of object x1 is MAS(x1) = {a}, which has 1 element; the missing attribute set of object x2 is MAS(x2) = {a, b}, which has 2 elements; the missing attribute set of object x3 is MAS(x3) = {a, b, c}, which has 3 elements. These objects then form the missing object set MOS = {x1, x2, x3}.
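Steps S202 and S203 can be sketched as follows. The function names and the small table are hypothetical; the table is built only so that MAS(x1), MAS(x2) and MAS(x3) match the example above. Sorting in ascending order of cardinality is an assumption consistent with the order {x1, x2, x3} shown in the example; the text does not state the direction explicitly.

```python
MISSING = "*"


def missing_attribute_set(row, conditions):
    """MAS(x): the conditional attributes whose value is missing in `row`."""
    return {a for a in conditions if row[a] == MISSING}


def missing_object_set(table, conditions):
    """MOS as a list ordered by the cardinality of each object's MAS."""
    mas = {name: missing_attribute_set(row, conditions) for name, row in table.items()}
    incomplete = [name for name in table if mas[name]]
    return sorted(incomplete, key=lambda name: len(mas[name])), mas


# Invented table whose missing cells reproduce the MAS example in the text.
table = {
    "x3": {"a": MISSING, "b": MISSING, "c": MISSING, "d": 1},
    "x1": {"a": MISSING, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 2, "b": 1, "c": 2, "d": 2},
    "x2": {"a": MISSING, "b": MISSING, "c": 2, "d": 2},
}

mos, mas = missing_object_set(table, ("a", "b", "c"))
print(mos)  # ['x1', 'x2', 'x3']
```

Processing objects with fewer missing values first means that each filled object adds complete information before harder cases are handled.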
S204: build, from the decision table, a relational matrix that represents the difference relationship between any two objects; the relational matrix includes multiple difference relationship elements, each representing the difference relationship between two objects.
S205: search the relational matrix and judge, from the difference relationship elements, whether any two objects that contain no missing attributes conflict. If they conflict, proceed to step S206; if not, proceed to step S207.
The conflict judgment rule can be: whether the difference relationship between the two objects corresponding to a difference relationship element simultaneously satisfies that the decision attribute values differ, that there are no missing attributes, and that all conditional attributes are identical. Because the value of the decision attribute is usually determined by one or more conditional attributes, if all conditional attributes of two objects are identical but their decision attributes differ, the data of at least one of the two objects may be wrong, and such conflicting data easily causes errors when fill values for missing values are calculated.
For example, suppose objects x1 and x2 each contain only conditional attribute a, conditional attribute b and decision attribute d. If a is 1 and b is 2 for both objects, but d is 1 for x1 and 2 for x2, then x1 and x2 conflict.
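The conflict rule can be written as a short predicate. The function name `is_conflict` and the `"*"` missing marker are assumptions made for this sketch; x1 and x2 are the two objects from the example above, and x3 is an invented incomplete object added to show that the rule applies only to complete pairs.

```python
MISSING = "*"


def is_conflict(xi, xj, conditions, decision="d"):
    """True when two complete objects agree on every condition but not on the decision."""
    complete = all(MISSING not in (xi[a], xj[a]) for a in conditions)
    same_conditions = all(xi[a] == xj[a] for a in conditions)
    return complete and same_conditions and xi[decision] != xj[decision]


x1 = {"a": 1, "b": 2, "d": 1}
x2 = {"a": 1, "b": 2, "d": 2}
x3 = {"a": 1, "b": MISSING, "d": 2}  # invented object with a missing value

print(is_conflict(x1, x2, ("a", "b")))  # True
print(is_conflict(x1, x3, ("a", "b")))  # False
```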
S206: remove from the relational matrix the difference relationship elements related to the conflicting objects.
For example, suppose the decision table contains five objects x1, x2, x3, x4 and x5. If x1 and x2 conflict, the difference relationship elements for x1 and x2, x1 and x3, x1 and x4, x1 and x5, x2 and x3, x2 and x4, and x2 and x5 are removed from the relational matrix.
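The removal in S206 amounts to dropping every pair that involves a conflicting object. In the sketch below the matrix is represented only by its off-diagonal index pairs, which is a simplification, and `remove_conflicting` is a hypothetical helper name.

```python
from itertools import combinations


def remove_conflicting(pairs, conflicting_objects):
    """Keep only the pairs that involve no conflicting object."""
    return [p for p in pairs if not (set(p) & conflicting_objects)]


objects = ["x1", "x2", "x3", "x4", "x5"]
all_pairs = list(combinations(objects, 2))           # upper triangle, 10 pairs
kept = remove_conflicting(all_pairs, {"x1", "x2"})   # x1 and x2 conflict
removed = [p for p in all_pairs if p not in kept]

print(len(removed))  # 7
print(kept)          # [('x3', 'x4'), ('x3', 'x5'), ('x4', 'x5')]
```

The seven removed pairs are exactly the ones enumerated in the example above.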
S207: search the relational matrix and judge whether the difference relationship between the two objects corresponding to each difference relationship element simultaneously satisfies that the decision attribute values differ, that there are no missing attributes, and that exactly one conditional attribute has different values.
S208: record as a core attribute the conditional attribute on which the two objects corresponding to a difference relationship element that satisfies the condition of step S207 have different values.
When missing data is filled with cost in mind, it should be noted that important attributes and redundant attributes contribute differently to the decision, so during data restoration attributes can be filled preferentially according to their significance. A core attribute here is a conditional attribute that may play a key role in determining the value of the decision attribute.
In other words, in any VM(i, j) = (D, C_D, F), if D = 1 ∧ |C_D| = 1 and F contains no *, then the attribute a_k in C_D is a core attribute of the original information table.
Proof: the absence of * in F shows that the corresponding objects x_i and x_j are complete, x_i, x_j ∈ S', where S' is the complete information table of the decision table. |C_D| = 1 means that the discernible attribute set C_D of objects x_i and x_j has only one element; moreover D = 1, that is, d(x_i) ≠ d(x_j), which shows that only one attribute in the whole attribute set can distinguish x_i from x_j. So this attribute is a core attribute of the complete sub-information table.
In addition, given an incomplete information table S = (U, A, V, f) and its complete sub-information table S', where A = C ∪ D and a_i ∈ C, if a_i is a core attribute in S', then a_i is certainly also a core attribute in S.
Proof: if a_i is a core attribute of S', then after a_i is removed there exist x, y ∈ U' such that ∀a_j ∈ C − {a_i}, a_j(x) = a_j(y) ∧ d(x) ≠ d(y), where d ∈ D, so a decision conflict arises in S'. Since U' ⊆ U, removing a_i produces the same decision conflict in S. This proves that a_i is also a core attribute in S.
S209: search the decision table and obtain the conditional attributes that contain missing attribute values.
S210: judge one by one whether each conditional attribute containing missing attribute values is a core attribute. If not, the attribute is a redundant attribute, and the method proceeds to step S211; if so, the method proceeds to step S212.
S211: fill the missing values of redundant attributes arbitrarily.
S212: record the objects corresponding to missing values of core attributes as objects to be filled.
S213: following the order of the objects to be filled in the missing object set, search the relational matrix and obtain, one by one, the difference relationship elements in the relational matrix that correspond to each object to be filled. A difference relationship element includes a decision mark indicating whether the decision attributes of the two objects are consistent, a discernible attribute set representing the conditional attributes on which the two objects differ, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
S214: according to the obtained difference relationship elements, judge one by one whether the difference relationship between the object to be filled and each other object satisfies the conflict avoidance condition; the conflict avoidance condition is that the decision attributes differ and all conditional attribute values other than the missing attribute are identical.
S215: for all objects that do not satisfy the conflict avoidance condition, record as fill values the attribute values that belong to the same conditional attribute as the missing attribute value of the corresponding object to be filled.
S216: fill the missing attribute value according to the recorded fill values.
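The branch in steps S209 to S212 can be sketched as a split of the missing cells into those on core attributes, which go through the relational matrix search, and those on redundant attributes, which may be filled arbitrarily. The helper name `split_missing_cells` is hypothetical; the rows are the incomplete objects of Table 1 from the worked example, and the second call uses an invented core set purely to show the redundant branch.

```python
MISSING = "*"


def split_missing_cells(table, conditions, core):
    """Return (cells to fill via the matrix search, cells on redundant attributes)."""
    to_fill, redundant = [], []
    for name, row in table.items():
        for a in conditions:
            if row[a] == MISSING:
                (to_fill if a in core else redundant).append((name, a))
    return to_fill, redundant


# Incomplete objects of Table 1.
incomplete = {
    "x2": {"a": 1, "b": MISSING, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 2, "c": MISSING, "d": 1},
    "x5": {"a": 1, "b": MISSING, "c": 1, "d": 2},
}
C = ("a", "b", "c")

# With the core attribute set H = {b, c} of the worked example, nothing is redundant.
print(split_missing_cells(incomplete, C, {"b", "c"}))
# ([('x2', 'b'), ('x4', 'c'), ('x5', 'b')], [])

# With an invented core set {b}, the missing c of x4 would be treated as redundant.
print(split_missing_cells(incomplete, C, {"b"}))
# ([('x2', 'b'), ('x5', 'b')], [('x4', 'c')])
```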
For an inconsistent, incomplete decision information system, the basic idea for obtaining decision rules is optimal selection. The optimal selection method removes human subjectivity: the missing data are not filled according to a person's subjective judgment; instead, the values most likely to occur for the decision are found. Values are not selected from isolated conditions; rather, the missing data values are selected systematically from the known conditional attribute values, that is, as a whole. Filling incomplete data differs from reduction: the focus is not on eliminating redundant attributes but on filling the lost data reasonably. Only reasonable filling can ensure and improve the accuracy and reliability of rule extraction after knowledge reduction.
To examine the validity of the algorithm, an inconsistent, incomplete decision table (see Table 1) is selected and filled using the method of the present invention. In Table 1, a, b and c are conditional attributes, d is the decision attribute, and x1 to x7 are objects. "*" denotes a null value (a missing attribute value).
U | a | b | c | d |
x1 | 1 | 1 | 1 | 1 |
x2 | 1 | * | 1 | 1 |
x3 | 2 | 1 | 1 | 1 |
x4 | 1 | 2 | * | 1 |
x5 | 1 | * | 1 | 2 |
x6 | 2 | 1 | 2 | 2 |
x7 | 1 | 2 | 1 | 2 |
Table 1
According to Table 1, objects x2, x4 and x5 contain missing attribute values. The missing attribute set of object x2 is MAS(x2) = {b}, that of object x4 is MAS(x4) = {c}, and that of object x5 is MAS(x5) = {b}. Because the missing attribute sets of x2, x4 and x5 all have 1 element, the missing object set is MOS = {x2, x4, x5}.
According to the decision table in Table 1, the relational matrix shown in Table 2 is built using the aforementioned formulas 1 to 4:
Table 2
Each difference relationship element VM(i, j) = (D, C_D, F) in Table 2 represents the difference relationship between two objects: x1 is compared with x1 to x7 in turn to obtain the elements of the first row, x2 is compared with x2 to x7 in turn to obtain the elements of the second row, and so on, until x7 is compared with x7 to obtain the element of the seventh row. D is the decision mark, C_D is the discernible attribute set, and F is the conditional attribute mark vector.
The relational matrix is searched. First, it is judged whether any element satisfies D = 1 ∧ C_D = ∅ with no * in F, that is, whether any objects without missing values conflict; Table 1 shows that no complete objects conflict.
Then it is judged whether any element satisfies D = 1 ∧ |C_D| = 1 with no * in F, that is, whether any conditional attribute is a core attribute. The search gives VM(1, 7) = (1, {b}, (0, 1, 0)) and VM(3, 6) = (1, {c}, (0, 0, 1)), which satisfy the condition, so the core attribute set is updated to H = {b, c}.
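The core attribute search on Table 1 can be reproduced with a short sketch. The function name `core_attributes` is hypothetical, and the rule implemented is the one stated for steps S207 and S208: both objects complete, decisions different, exactly one conditional attribute different.

```python
from itertools import combinations

MISSING = "*"
CONDITIONS = ("a", "b", "c")

TABLE_1 = {
    "x1": {"a": 1, "b": 1, "c": 1, "d": 1},
    "x2": {"a": 1, "b": MISSING, "c": 1, "d": 1},
    "x3": {"a": 2, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 2, "c": MISSING, "d": 1},
    "x5": {"a": 1, "b": MISSING, "c": 1, "d": 2},
    "x6": {"a": 2, "b": 1, "c": 2, "d": 2},
    "x7": {"a": 1, "b": 2, "c": 1, "d": 2},
}


def core_attributes(table, conditions, decision="d"):
    """Collect the attributes that alone separate two complete objects with different decisions."""
    core = set()
    for xi, xj in combinations(table.values(), 2):
        if xi[decision] == xj[decision]:
            continue  # same decision: not informative for the core
        if any(MISSING in (xi[a], xj[a]) for a in conditions):
            continue  # at least one object is incomplete on some attribute
        differing = [a for a in conditions if xi[a] != xj[a]]
        if len(differing) == 1:
            core.add(differing[0])
    return core


print(sorted(core_attributes(TABLE_1, CONDITIONS)))  # ['b', 'c']
```

The two contributing pairs are (x1, x7), which differ only on b, and (x3, x6), which differ only on c.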
The objects containing missing values are x2, x4 and x5, and the missing attribute value of each is filled in turn. Take x2 as an example:
Because MAS(x2) = {b} and b ∈ H, the relational matrix is searched; the difference relationship elements related to object x2 are those in the second row of the relational matrix in Table 2.
It is judged whether any difference relationship element fails D = 0 ∨ C_D ≠ ∅, that is, satisfies the conflict avoidance condition D = 1 ∧ C_D = ∅. The element VM(2, 7) does, which shows that after b is filled, x2 and x7 may conflict, so x7 must be excluded from the fill calculation, as shown in Table 3:
Table 3
The value b(x_j) of each excluded item is recorded in the set impossiblegetV: the value b(x7) = 2 corresponding to VM(2, 7) is recorded in impossiblegetV. Values recorded in impossiblegetV are not used to fill the missing item, which gives impossiblegetV = {2}.
For the difference relationship elements that satisfy D = 0 ∨ C_D ≠ ∅, the corresponding values b(x_j) are recorded in the set possiblegetV; the values recorded in possiblegetV are used to fill the missing item. The elements that satisfy the condition include VM(2, 3) = (0, {a}, (1, *, 0)) and VM(2, 6) = (1, {a, c}, (1/2, *, 1/2)), and the corresponding values b(x_j) are b(x1) = 1, b(x2) = *, b(x3) = 1, b(x4) = 2, b(x5) = *, b(x6) = 1. Removing the null value * and the value 2 in impossiblegetV finally gives possiblegetV = {1}.
Because possiblegetV contains only one element, it is used directly to fill the missing item b(x2) (if possiblegetV contains several elements, any value in possiblegetV may be used). After filling, the updated MAS(x2) is empty, which shows that the filled object x2 is complete, so the relational matrix is updated, giving the relational matrix shown in Table 4:
Table 4
Objects x4 and x5 are processed in the same way. When filling is complete, the missing object set MOS is empty and the algorithm terminates. The resulting complete relational matrix is shown in Table 5:
Table 5
The resulting complete decision table is shown in Table 6:
U | a | b | c | d |
x1 | 1 | 1 | 1 | 1 |
x2 | 1 | 1 | 1 | 1 |
x3 | 2 | 1 | 1 | 1 |
x4 | 1 | 2 | 2 | 1 |
x5 | 1 | 2 | 1 | 2 |
x6 | 2 | 1 | 2 | 2 |
x7 | 1 | 2 | 1 | 2 |
Table 6
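The whole worked example can be replayed with the sketch below, which starts from Table 1 and arrives at the values in Table 6. It is a simplified reading of the method, under these assumptions: pairs are compared directly instead of through a stored relational matrix, the conflict removal and redundant attribute branches are omitted because Table 1 has no conflicting complete objects and both missing attributes are core attributes, and when several usable values remain the smallest is taken (the text allows any of them). The names `risky` and `restore` are hypothetical.

```python
MISSING = "*"
CONDITIONS = ("a", "b", "c")

TABLE_1 = {
    "x1": {"a": 1, "b": 1, "c": 1, "d": 1},
    "x2": {"a": 1, "b": MISSING, "c": 1, "d": 1},
    "x3": {"a": 2, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 2, "c": MISSING, "d": 1},
    "x5": {"a": 1, "b": MISSING, "c": 1, "d": 2},
    "x6": {"a": 2, "b": 1, "c": 2, "d": 2},
    "x7": {"a": 1, "b": 2, "c": 1, "d": 2},
}


def risky(target, other):
    """Conflict avoidance condition: decisions differ and no known attribute separates the pair."""
    if target["d"] == other["d"]:
        return False
    return all(
        target[a] == other[a]
        for a in CONDITIONS
        if MISSING not in (target[a], other[a])
    )


def restore(table):
    """Fill missing conditional values, fewest-missing objects first; input is not modified."""
    table = {name: dict(row) for name, row in table.items()}
    queue = sorted(
        (n for n, r in table.items() if MISSING in r.values()),
        key=lambda n: sum(v == MISSING for v in table[n].values()),
    )
    for name in queue:
        target = table[name]
        for attr in [a for a in CONDITIONS if target[a] == MISSING]:
            possible, impossible = set(), set()
            for other_name, other in table.items():
                if other_name == name or other[attr] == MISSING:
                    continue
                (impossible if risky(target, other) else possible).add(other[attr])
            candidates = sorted(possible - impossible)
            if candidates:
                target[attr] = candidates[0]  # assumed tie-break: smallest value
    return table


restored = restore(TABLE_1)
print(restored["x2"]["b"], restored["x4"]["c"], restored["x5"]["b"])  # 1 2 2
```

Because x2 is filled before x4 and x5, the later objects are compared against the already completed row, which mirrors the matrix update described after Table 3.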
The original incomplete decision table has three missing attribute values and six completion choices. Checking the six choices shows that the original incomplete decision table has two optimal completions, namely 1, 1, 2 and 1, 2, 2 (the detailed calculation is omitted here). Clearly, the first completion produces a decision conflict, whereas the completion obtained by the present method is exactly the second optimal completion, which effectively avoids conflicts caused by filling.
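The conflict claim can be checked by brute force. The sketch below is an independent check, not part of the patented method: it assumes each missing cell can take either of the values 1 and 2 seen in Table 1, tries every joint assignment of b(x2), c(x4) and b(x5), and reports which ones leave the table free of decision conflicts. The helper names `has_conflict` and `complete` are hypothetical.

```python
from itertools import combinations, product


def has_conflict(rows):
    """True if two rows share all conditional values (a, b, c) but differ in decision d."""
    return any(x[:3] == y[:3] and x[3] != y[3] for x, y in combinations(rows, 2))


def complete(b2, c4, b5):
    """Table 1 as (a, b, c, d) rows with the three missing cells filled in."""
    return [
        (1, 1, 1, 1),   # x1
        (1, b2, 1, 1),  # x2
        (2, 1, 1, 1),   # x3
        (1, 2, c4, 1),  # x4
        (1, b5, 1, 2),  # x5
        (2, 1, 2, 2),   # x6
        (1, 2, 1, 2),   # x7
    ]


conflict_free = [
    choice for choice in product((1, 2), repeat=3)
    if not has_conflict(complete(*choice))
]

print(has_conflict(complete(1, 1, 2)))  # True
print(conflict_free)                    # [(1, 2, 2)]
```

Under these assumptions the completion 1, 1, 2 conflicts because the filled x4 becomes identical to x7 on a, b and c while their decisions differ, and 1, 2, 2 is the only assignment without any conflict.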
Referring to Fig. 3, a structure diagram of a data restoration device according to an embodiment of the present invention, the device includes a decision table construction unit 301, a relational matrix construction unit 302, a difference relationship element acquiring unit 303, a conflict pre-judgment unit 304, a fill value recording unit 305 and a filling unit 306. The relational matrix construction unit 302 is connected to the decision table construction unit 301, the difference relationship element acquiring unit 303 is connected to the relational matrix construction unit 302, the conflict pre-judgment unit 304 is connected to the difference relationship element acquiring unit 303, the fill value recording unit 305 is connected to the conflict pre-judgment unit 304, and the filling unit 306 is connected to the fill value recording unit 305.
Decision table construction unit 301 is used to build decision table according to data to be restored, the decision table include object set,
Conditional attribute collection and decision kind set.
Relational matrix construction unit 302 is used for the decision table built according to the decision table construction unit 301, and structure represents
The relational matrix of difference relationship between any two object.The relational matrix includes difference between two objects of multiple expressions
The difference relationship elements of relationship.Each difference relationship elements can include the decision for representing whether two object decision attributes are consistent
Mark, represent two objects between conditional attribute difference discernible attributes set and play mark for the conditional attribute value to object
The conditional attribute mark vector of knowledge effect.
Difference relationship elements acquiring unit 303 is used to search for the relational matrix that the relational matrix construction unit 302 is built,
Difference relationship elements corresponding with each object to be filled up in relational matrix are obtained one by one.The object to be filled up refers to corresponding
Conditional attribute contains missing values, and prepares the object filled up to the conditional attribute value of missing.
Conflict anticipation unit 304 is used for the difference relationship elements obtained according to the difference relationship elements acquiring unit 303
(the difference relationship elements each obtained at least correspond to an object to be filled up), judge one by one object to be filled up with it is other each
Whether the difference relationship between object meets conflict avoidance condition, and the conflict avoidance condition includes decision attribute difference and removes
Lack other all conditions property value all sames except attribute.
The fill value recording unit 305 is configured to take the difference relationship elements that the conflict anticipation unit 304 has judged not to meet the conflict avoidance condition and, for the two objects corresponding to each such element, to record as a fill value the attribute value that belongs to the same conditional attribute as the missing attribute value.
The filling unit 306 is configured to fill the missing attribute values according to the fill values recorded by the fill value recording unit 305.
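The recording and filling steps can be sketched together as follows. This is only an illustration: the function names and the `MISSING` marker are hypothetical, and the rule for choosing a single value when several fill values have been recorded (here, the most frequent one) is an assumption, since this passage does not state how the choice is made.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing conditional attribute value


def meets_conflict_avoidance_condition(target, other, condition_attrs, decision_attr):
    """True if decisions differ and all of target's known condition values match other's."""
    if target[decision_attr] == other[decision_attr]:
        return False
    return all(
        target[c] == other[c] for c in condition_attrs if target[c] is not MISSING
    )


def fill_missing(table, condition_attrs, decision_attr):
    """Return a filled copy of the table; the input table is left unchanged."""
    filled = [dict(row) for row in table]
    for idx, target in enumerate(filled):
        for attr in condition_attrs:
            if target[attr] is not MISSING:
                continue
            # Record values from objects that do NOT meet the conflict avoidance condition.
            candidates = [
                other[attr]
                for k, other in enumerate(table)
                if k != idx
                and other[attr] is not MISSING
                and not meets_conflict_avoidance_condition(
                    target, other, condition_attrs, decision_attr
                )
            ]
            if candidates:
                # Assumed selection rule: take the most frequent recorded value.
                target[attr] = Counter(candidates).most_common(1)[0][0]
    return filled


table = [
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": MISSING, "d": "no"},
    {"a1": 0, "a2": 1, "d": "no"},
    {"a1": 1, "a2": 1, "d": "no"},
]
filled = fill_missing(table, ["a1", "a2"], "d")
```

In this toy table the first object is excluded as a source for the second object's `a2`, because copying its value would give two objects with identical conditional attributes and different decisions.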
The device of this embodiment of the present invention fills missing attributes by using the difference relationships between objects, which reduces computational complexity and effectively avoids data conflicts caused by filling.
Refer to Fig. 4, which is a structural diagram of another data restoration device according to an embodiment of the present invention. Compared with the embodiment of Fig. 3, this embodiment further includes a core condition judging unit 307, a core attribute recording unit 308, a missing condition acquiring unit 309, a core attribute judging unit 310, a to-be-filled object recording unit 311, a redundant attribute filling unit 312, a conflict judging unit 313, a conflict element removal unit 314, a missing attribute set forming unit 315, and a missing object set forming unit 316. The conflict judging unit 313 is connected to the relational matrix construction unit 302. The conflict element removal unit 314 is connected to the conflict judging unit 313. The missing attribute set forming unit 315 is connected to the decision table construction unit 301. The missing object set forming unit 316 is connected to both the missing attribute set forming unit 315 and the difference relationship element acquiring unit 303. The core condition judging unit 307 is connected to the conflict element removal unit 314. The core attribute recording unit 308 is connected to the core condition judging unit 307. The missing condition acquiring unit 309 is connected to the conflict element removal unit 314. The core attribute judging unit 310 is connected to both the missing condition acquiring unit 309 and the core attribute recording unit 308. The to-be-filled object recording unit 311 is connected to the core attribute judging unit 310. The redundant attribute filling unit 312 is connected to the core attribute judging unit 310.
The conflict judging unit 313 is configured to search the relational matrix after the relational matrix construction unit 302 has built it, and to judge, according to the difference relationship elements, whether any two objects that contain no missing attribute conflict with each other. For the two objects corresponding to each difference relationship element in the relational matrix, the conflict judging unit 313 may check whether their difference relationship simultaneously satisfies the following: the decision attribute values differ, neither object has a missing attribute, and all conditional attributes are the same. If so, the two objects are judged to conflict.
The conflict element removal unit 314 is configured to remove from the relational matrix the difference relationship elements related to the conflicting objects identified by the conflict judging unit 313.
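The conflict check and the removal of related elements might look like the sketch below. The function names are hypothetical, the relational matrix is assumed to be a dictionary keyed by object-index pairs, and the conflict test reads the decision table directly rather than the stored elements, purely to keep the example short.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing conditional attribute value


def find_conflicting_objects(table, condition_attrs, decision_attr):
    """Indices of complete objects that agree on all conditions but differ in decision."""
    conflicting = set()
    for i, j in combinations(range(len(table)), 2):
        a, b = table[i], table[j]
        complete = all(
            a[c] is not MISSING and b[c] is not MISSING for c in condition_attrs
        )
        if (
            complete
            and a[decision_attr] != b[decision_attr]
            and all(a[c] == b[c] for c in condition_attrs)
        ):
            conflicting.update((i, j))
    return conflicting


def remove_conflict_elements(matrix, conflicting):
    """Drop every element whose pair involves a conflicting object."""
    return {pair: e for pair, e in matrix.items() if not set(pair) & conflicting}


table = [
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "no"},   # conflicts with object 0
    {"a1": 0, "a2": 1, "d": "no"},
    {"a1": 0, "a2": 0, "d": "yes"},
]
conflicting = find_conflicting_objects(table, ["a1", "a2"], "d")
# Placeholder elements; only the keys matter for this illustration.
matrix = {pair: None for pair in combinations(range(len(table)), 2)}
cleaned = remove_conflict_elements(matrix, conflicting)
```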
The core condition judging unit 307 is configured to search the relational matrix after the conflict element removal unit 314 has removed the difference relationship elements related to conflicting objects, and to judge, for the two objects corresponding to each difference relationship element in the relational matrix, whether their difference relationship simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and exactly one conditional attribute has different values.
The core attribute recording unit 308 is configured to record as a core attribute the conditional attribute whose values differ between two objects that the core condition judging unit 307 has judged to satisfy the condition.
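The core attribute rule can be illustrated with a short sketch. The function name and the `MISSING` marker are hypothetical, and the check again reads the decision table directly for brevity. A conditional attribute is recorded as a core attribute when it is the only attribute that distinguishes two complete objects with different decisions.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing conditional attribute value


def find_core_attributes(table, condition_attrs, decision_attr):
    """Attributes that are the sole difference between two complete, differently decided objects."""
    core = set()
    for i, j in combinations(range(len(table)), 2):
        a, b = table[i], table[j]
        if a[decision_attr] == b[decision_attr]:
            continue
        if any(a[c] is MISSING or b[c] is MISSING for c in condition_attrs):
            continue
        differing = [c for c in condition_attrs if a[c] != b[c]]
        if len(differing) == 1:
            core.add(differing[0])
    return core


table = [
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": 1, "d": "no"},    # differs from object 0 only in a2
    {"a1": 0, "a2": 1, "d": "no"},
    {"a1": 1, "a2": MISSING, "d": "yes"},
]
core = find_core_attributes(table, ["a1", "a2"], "d")
```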
The missing condition acquiring unit 309 is configured to search the decision table and obtain the conditional attributes that contain missing attribute values, before the difference relationship element acquiring unit 303 obtains, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled.
The core attribute judging unit 310 is configured to judge, one by one, according to the core attributes recorded by the core attribute recording unit 308, whether each conditional attribute containing a missing attribute value is a core attribute.
The to-be-filled object recording unit 311 is configured to record the object corresponding to a missing attribute value as an object to be filled when the core attribute judging unit 310 judges that the conditional attribute containing the missing attribute value is a core attribute.
The redundant attribute filling unit 312 is configured to fill arbitrarily the missing attribute values that the core attribute judging unit 310 has judged not to belong to a core attribute.
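The split between missing values that need conflict-aware filling and those that may be filled arbitrarily can be sketched as follows. The function name, the `MISSING` marker, and the representation of each missing value as an `(object index, attribute)` pair are assumptions made for illustration.

```python
MISSING = None  # assumed marker for a missing conditional attribute value


def split_missing_by_core(table, condition_attrs, core_attrs):
    """Separate missing values on core attributes from those on non-core attributes."""
    to_fill, arbitrary = [], []
    for idx, row in enumerate(table):
        for attr in condition_attrs:
            if row[attr] is MISSING:
                # Core attributes need the conflict-aware procedure;
                # the remaining (redundant) attributes may take any value.
                (to_fill if attr in core_attrs else arbitrary).append((idx, attr))
    return to_fill, arbitrary


table = [
    {"a1": MISSING, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": MISSING, "d": "no"},
]
to_fill, arbitrary = split_missing_by_core(table, ["a1", "a2"], core_attrs={"a2"})
```

Only the entries in `to_fill` would go through the conflict avoidance check, which is where the reduction in work for non-core attributes comes from.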
The missing attribute set forming unit 315 is configured to search the decision table after the decision table construction unit 301 has built it, obtain the missing attributes of each object in the decision table, and form a missing attribute set for each object.
The missing object set forming unit 316 is configured to arrange the corresponding objects according to the number of elements in each missing attribute set formed by the missing attribute set forming unit 315, thereby forming a missing object set. The difference relationship element acquiring unit 303 searches the relational matrix in the order of the objects in the missing object set formed by the missing object set forming unit 316, and obtains, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled.
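The ordering step can be sketched as below. The function name and the `MISSING` marker are hypothetical. The passage says only that objects are arranged by the number of elements in their missing attribute sets; the ascending direction used here (objects with the fewest missing attributes first) is an assumption.

```python
MISSING = None  # assumed marker for a missing conditional attribute value


def build_missing_object_order(table, condition_attrs):
    """Return each object's missing attribute set and the processing order of incomplete objects."""
    missing_sets = {
        i: {c for c in condition_attrs if row[c] is MISSING}
        for i, row in enumerate(table)
    }
    # Assumed direction: fewest missing attributes first; ties keep table order.
    order = sorted(
        (i for i, s in missing_sets.items() if s),
        key=lambda i: len(missing_sets[i]),
    )
    return missing_sets, order


table = [
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": MISSING, "a2": MISSING, "d": "no"},
    {"a1": 0, "a2": MISSING, "d": "no"},
]
missing_sets, order = build_missing_object_order(table, ["a1", "a2"])
```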
The device of this embodiment of the present invention fills missing attributes by using the difference relationships between objects and introduces judgment rules for core attributes and object conflicts. This greatly reduces computational complexity and effectively avoids data conflicts caused by filling.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the present invention may be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the present invention.
Claims (14)
1. A data restoration method, characterized by comprising:
building a decision table according to the data to be restored, the decision table including an object set, a conditional attribute set, and a decision attribute set;
building, according to the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled;
judging, one by one, according to the obtained difference relationship elements, whether the difference relationship between an object to be filled and each of the other objects meets a conflict avoidance condition, the conflict avoidance condition including that the decision attributes differ and that all conditional attribute values other than the missing attribute are the same;
among all objects that do not meet the conflict avoidance condition, recording as a fill value the attribute value that belongs to the same conditional attribute as the missing attribute value of the corresponding object to be filled;
filling the missing attribute value according to the recorded fill value.
2. The data restoration method according to claim 1, characterized in that, after the step of building the relational matrix, the method further comprises:
searching the relational matrix, and judging, for the two objects corresponding to each difference relationship element in the relational matrix, whether their difference relationship simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and exactly one conditional attribute has different values;
if so, recording the conditional attribute whose values differ as a core attribute;
and in that, before the step of searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled, the method further comprises:
searching the decision table and obtaining the conditional attributes that contain missing attribute values;
judging, one by one, whether each conditional attribute containing a missing attribute value is a core attribute;
if so, recording the object corresponding to the missing attribute value as an object to be filled.
3. The data restoration method according to claim 2, characterized in that, after the step of judging, one by one, whether each conditional attribute containing a missing attribute value is a core attribute, the method comprises:
if not, filling the missing attribute value arbitrarily.
4. The data restoration method according to claim 1, characterized in that, after the step of building the relational matrix, the method further comprises:
searching the relational matrix, and judging, according to the difference relationship elements, whether any two objects that contain no missing attribute conflict with each other;
if they conflict, removing from the relational matrix the difference relationship elements related to the conflicting objects.
5. The data restoration method according to claim 4, characterized in that the step of judging whether any two objects that contain no missing attribute conflict with each other comprises:
judging, for the two objects corresponding to each difference relationship element in the relational matrix, whether their difference relationship simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and all conditional attributes are the same;
if so, the two objects conflict.
6. The data restoration method according to claim 1, characterized in that,
after the step of building the decision table, the method further comprises:
searching the decision table, obtaining the missing attributes of each object in the decision table, and forming a missing attribute set for each object;
arranging the corresponding objects according to the number of elements in each missing attribute set, thereby forming a missing object set;
and the step of searching the relational matrix and obtaining, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled comprises:
searching the relational matrix in the order of the objects in the missing object set, and obtaining, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled.
7. The data restoration method according to any one of claims 1 to 6, characterized in that each difference relationship element includes a decision flag indicating whether the decision attributes of two objects are consistent, a difference attribute set representing the conditional attributes that differ between the two objects, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
8. A data restoration device, characterized by comprising:
a decision table construction unit, configured to build a decision table according to the data to be restored, the decision table including an object set, a conditional attribute set, and a decision attribute set;
a relational matrix construction unit, configured to build, according to the decision table, a relational matrix representing the difference relationship between any two objects, the relational matrix including multiple difference relationship elements, each representing the difference relationship between two objects;
a difference relationship element acquiring unit, configured to search the relational matrix and obtain, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled;
a conflict anticipation unit, configured to judge, one by one, according to the obtained difference relationship elements, whether the difference relationship between an object to be filled and each of the other objects meets a conflict avoidance condition, the conflict avoidance condition including that the decision attributes differ and that all conditional attribute values other than the missing attribute are the same;
a fill value recording unit, configured to record as a fill value, among all objects that do not meet the conflict avoidance condition, the attribute value that belongs to the same conditional attribute as the missing attribute value of the corresponding object to be filled;
a filling unit, configured to fill the missing attribute value according to the recorded fill value.
9. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a core condition judging unit, configured to search the relational matrix and judge, for the two objects corresponding to each difference relationship element in the relational matrix, whether their difference relationship simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and exactly one conditional attribute has different values;
a core attribute recording unit, configured to record as a core attribute the conditional attribute whose values differ between two objects that satisfy the condition;
a missing condition acquiring unit, configured to search the decision table and obtain the conditional attributes that contain missing attribute values, before the difference relationship element acquiring unit obtains, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled;
a core attribute judging unit, configured to judge, one by one, according to the core attributes recorded by the core attribute recording unit, whether each conditional attribute containing a missing attribute value is a core attribute;
a to-be-filled object recording unit, configured to record the object corresponding to a missing attribute value as an object to be filled when the conditional attribute containing the missing attribute value is a core attribute.
10. The data restoration device according to claim 9, characterized in that the data restoration device further comprises:
a redundant attribute filling unit, configured to fill arbitrarily the missing attribute values that do not belong to a core attribute.
11. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a conflict judging unit, configured to search the relational matrix after the relational matrix construction unit has built it, and to judge, according to the difference relationship elements, whether any two objects that contain no missing attribute conflict with each other;
a conflict element removal unit, configured to remove from the relational matrix the difference relationship elements related to the conflicting objects.
12. The data restoration device according to claim 11, characterized in that the conflict judging unit judges whether any two objects that contain no missing attribute conflict with each other according to whether, for the two objects corresponding to each difference relationship element in the relational matrix, their difference relationship simultaneously satisfies the following: the decision attribute values differ, there is no missing attribute, and all conditional attributes are the same.
13. The data restoration device according to claim 8, characterized in that the data restoration device further comprises:
a missing attribute set forming unit, configured to search the decision table after the decision table construction unit has built it, obtain the missing attributes of each object in the decision table, and form a missing attribute set for each object;
a missing object set forming unit, configured to arrange the corresponding objects according to the number of elements in each missing attribute set, thereby forming a missing object set;
wherein the difference relationship element acquiring unit searches the relational matrix in the order of the objects in the missing object set, and obtains, one by one, the difference relationship elements in the relational matrix corresponding to each object to be filled.
14. The data restoration device according to any one of claims 8 to 13, characterized in that each difference relationship element includes a decision flag indicating whether the decision attributes of two objects are consistent, a difference attribute set representing the conditional attributes that differ between the two objects, and a conditional attribute mark vector that marks the conditional attribute values of the objects.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310219030.7A CN104216916B (en) | 2013-06-04 | 2013-06-04 | Data restoration method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310219030.7A CN104216916B (en) | 2013-06-04 | 2013-06-04 | Data restoration method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104216916A CN104216916A (en) | 2014-12-17 |
CN104216916B true CN104216916B (en) | 2018-07-03 |
Family
ID=52098414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310219030.7A Active CN104216916B (en) | 2013-06-04 | 2013-06-04 | Data restoration method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104216916B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038460A (en) * | 2017-04-10 | 2017-08-11 | 南京航空航天大学 | A kind of ship monitor shortage of data value complementing method based on improvement KNN |
CN109033454A (en) * | 2018-08-27 | 2018-12-18 | 广东电网有限责任公司 | Data filling method, apparatus, equipment and storage medium based on attributes similarity |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1602042A1 (en) * | 2004-02-25 | 2005-12-07 | Microsoft Corporation | Database data recovery system and method |
CN101030211A (en) * | 2006-01-31 | 2007-09-05 | 韦瑞吉(新加坡)私人有限公司 | Methods and systems for derivation of missing data objects from test data |
CN102025531A (en) * | 2010-08-16 | 2011-04-20 | 北京亿阳信通软件研究院有限公司 | Filling method and device thereof for performance data |
CN102800078A (en) * | 2012-07-20 | 2012-11-28 | 西安电子科技大学 | Non-local mean image detail restoration method |
Also Published As
Publication number | Publication date |
---|---|
CN104216916A (en) | 2014-12-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||