CN105138588A - Database overlap mode abstract generating method based on multi-label propagation - Google Patents
Database overlap mode abstract generating method based on multi-label propagation
- Publication number
- CN105138588A (application CN201510464314.1A)
- Authority
- CN
- China
- Prior art keywords
- relation table
- similarity
- label
- database
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for generating overlapping database schema summaries based on multi-label propagation. The method comprises: mapping the database schema information to a multi-label graph model; clustering the schema information with a multi-label propagation algorithm to generate overlapping groups; clustering the overlapping groups with a hierarchical clustering algorithm to produce result classes of appropriate size; and finally, on the basis of information entropy and a random walk model, selecting a topic table for each result class, thereby generating the final overlapping schema summary of the database. The scheme gives the user a more accurate and more meaningful overlapping schema summary, helping the user understand the database quickly.
Description
Technical field
The invention belongs to the field of database technology and specifically relates to a novel technique for generating overlapping schema summaries of relational databases.
Background technology
With the spread of computers and the rapid development of information technology, large volumes of data have brought database technology into wide use, and database applications have begun to reach ordinary users. Modern databases, however, are often very large and complex: to write a suitable Structured Query Language (SQL) statement when querying, a user must have some understanding of the database schema. The schema of a large database is usually complex as well, and the relevant documentation is commonly missing, which makes the schema even harder for users to understand.
Schema summarization addresses this problem by giving the user a concise summary of the database schema, improving the usability of the database. Existing schema summarization solutions all generate only non-overlapping summaries, that is, each relation table may belong to exactly one topic class in the summary. In practice, a relation table often carries several meanings and belongs to several topic classes. Considering only the non-overlapping case can leave the summary incomplete or even mislead the user.
Because non-overlapping schema summaries often cannot fully meet users' needs, overlapping schema summarization can generate more reasonable schema summaries, effectively reducing the time and effort a user spends understanding the schema, and it has broad prospects in engineering applications.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by providing a method for automatically generating overlapping database schema summaries based on multi-label propagation.
The method introduces the concept of an overlapping schema summary; designs a new multi-label graph model of a database schema; clusters the schema first with a multi-label propagation algorithm and then with a hierarchical clustering algorithm; and finally selects a topic table for each result class obtained by clustering, returning to the user a schema summary whose classes may overlap. The steps of the method are as follows:
Step 1. Map the database schema to a weighted multi-label graph.
Step 1.1. Map the database schema to a multi-label graph.
Definition 1: A relational database schema can be mapped to a multi-label graph, represented by the triple G = (V, E, L_m), where:
① V is the set of relation table nodes in the database, and v ∈ V is one relation table node;
② E is the set of foreign key relationships in the database, and e ∈ E is one foreign key relationship;
③ L_m is a label mapping function that maps a node to one or more labels. A label is written (c, b), where c is a result class identifier and b is the membership degree, which expresses how strongly relation table v belongs to result class c.
Step 1.2. Compute the similarity between the two relation tables joined by each edge of the multi-label graph, and use it as the edge weight.
Step 1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names of the relation tables, as the name similarity of the relation tables.
Step 1.2.2. Use the Jaccard coefficient to measure the similarity of the values in the attribute columns of the relation tables, find the best-matching attribute pairs with a greedy algorithm, and take the mean value similarity of the best-matching attribute pairs as the value similarity of the relation tables.
Step 1.2.3. Compute the mapping relationship similarity of the relation tables by analyzing the cardinality ratio between them.
Definition 2: The mapping relationship similarity between relation table R and relation table S, written Sim_m(R, S), is defined as follows:
where:
① τ ranges over all tuples of the relation table;
② fan(τ_i) is the fan-out of tuple τ_i on join edge e. Fan-out is defined on the join edges between tuples and is the number of distinct tuples that a given tuple can join with;
③ q_i is the number of tuples in relation table R that satisfy fan(τ_i) > 0.
Step 1.2.4. Based on the three similarity features from steps 1.2.1 to 1.2.3, compute the relation table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph.
Step 2. Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups.
Step 2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels each node may carry. If the user specifies k as the final number of result classes in the schema summary, θ is tried at each value from k-1 to k+3, and the θ that maximizes the intra-cluster similarity of the overlapping groups obtained by multi-label propagation is selected. The intra-cluster similarity is defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into the overlapping groups C = {C_1, C_2, ..., C_m}. The intra-cluster similarity of the propagation result C is then:
where:
① Sim(v_i, v_j) is the similarity between relation tables v_i and v_j;
② |C_i| is the number of relation tables in C_i.
Step 2.2. Give each node in the multi-label graph one unique label, whose class identifier is the name of the node's relation table and whose membership degree is 1.
Step 2.3. In each iteration, add the labels of all neighbors of a node to that node's labels according to their membership degrees and the edge weights, then normalize so that the node's membership degrees sum to 1.
Definition 4: The normalization function b_x(c, v_i) gives, at the x-th iteration, the membership degree b of community identifier c in the label of node v_i:
where:
① N(v_i) is the set of all neighbor nodes of node v_i;
② the weight term is the weight of edge (v_i, v_j).
Step 2.4. Delete every label whose membership degree is below 1/θ.
Step 2.5. Stop iterating when the number of nodes labeled by the least-used class identifier no longer changes. Suppose m class identifiers remain after the iteration ends; the nodes carrying identifier c_m are assigned to group C_m, so that the multi-label graph is divided into m groups C = {C_1, C_2, ..., C_m} that may overlap.
Step 2.6. Repeat steps 2.2 to 2.5 for each value of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation.
Step 3. Apply hierarchical clustering to the overlapping groups to generate the result classes.
Step 3.1. Compute the similarity between overlapping groups.
Definition 5: Let C_i and C_j be two overlapping groups obtained by multi-label propagation clustering. The similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relation tables v_i and v_j; if no edge connects two tables, the similarity between them is 0.
Step 3.2. Treat each overlapping group as a separate class. In each iteration, merge the two most similar classes, and stop once they have been merged into the k result classes specified by the user.
Step 4. Select a topic table for each result class and return the final schema summary to the user.
Step 4.1. Compute the importance of each relation table.
Step 4.1.1. Compute the information content of each relation table.
Definition 6: Attribute A of relation table R is written R.A, and the information entropy of this attribute is defined as:
where h is the number of distinct values of attribute A. The values of attribute A can be written as the set R.A = {a_1, ..., a_h} of h distinct values, and p_i is the probability that a_i occurs.
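As an illustration of Definition 6, the sketch below computes the information entropy of one attribute column. It assumes the standard Shannon form, H(R.A) = -Σ p_i log p_i summed over the h distinct values, with p_i estimated as the relative frequency of a_i in the column. The base-2 logarithm and the function name `attribute_entropy` are assumptions made for this example; the text does not state them.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy of one attribute column (base-2 log is an assumption)."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)  # one entry per distinct value a_1..a_h
    # p_i is estimated as the relative frequency of a_i in the column.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A column with two equally frequent values carries one bit of entropy;
# a constant column carries none.
balanced = attribute_entropy(["a", "a", "b", "b"])
constant = attribute_entropy(["x"] * 5)
```

A key-like column with all-distinct values reaches the maximum entropy for its length, so under this measure such columns contribute the most to a table's information content.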
Definition 7: The information content of relation table R is defined as:
where |R| is the number of tuples in R.
Step 4.1.2. Compute the transition probabilities between relation tables.
Definition 8: For relation tables R and S, the probability of moving from R to S is defined as follows:
where:
① R.A-S.B denotes the foreign key reference between attribute A of relation table R and attribute B of relation table S;
② for any attribute A' of R, q_A' is the total number of foreign key joins on R.A'.
Step 4.3. Apply a random walk model, taking the information content of each relation table as the initial value of the walk and the transition probabilities between relation tables as the transition probabilities of the walk. The distribution of information content when the model reaches its stationary state gives the importance of each relation table.
Step 4.4. Select the most important relation table in each result class as that class's topic table, and return the final schema summary to the user.
Advantages and beneficial effects of the invention:
The invention introduces a method for mapping a database schema to a multi-label graph, in which the class information of each relation table is stored as labels and the final clustering of the schema summary is determined by membership degree. It analyzes graph-based multi-label propagation in depth and, on that basis, proposes a model for automatically generating schema summaries by multi-label propagation. Compared with conventional models, this model inherits the advantages of multi-label propagation, automatically generates schema summaries with overlapping parts, achieves higher clustering precision, and helps users search the database quickly.
Description of the drawings
Fig. 1 is the overall flow chart of the method;
Fig. 2 is the schema diagram of the original relational database;
Fig. 3 is the multi-label graph corresponding to the example relational database;
Fig. 4 is the partition into overlapping groups after multi-label propagation clustering;
Fig. 5 is the partition into result classes after hierarchical clustering;
Fig. 6 is the resulting schema summary, where (a) and (b) are the database clustering diagrams corresponding to the schema summary and (c) is the schema summary diagram;
Table 1 lists the importance results computed for the relation tables of the example database.
Detailed description of the embodiments
The processing flow of the method is shown in Fig. 1.
The method is described below with reference to an embodiment; Fig. 2 shows the relational database schema of the embodiment. The schema summary generated by the overlapping schema summarization method is shown in Fig. 6, where Fig. 6(c) is the overlapping schema summary diagram, which helps the user untangle complex schema relationships. The user can also inspect any part of the summary diagram in detail; the expanded views are shown in Fig. 6(a) and (b). The concrete steps of the method are described below using the embodiment of Fig. 2:
Step 1. Map the database schema to a weighted multi-label graph.
Step 1.1. Map the database schema to a multi-label graph.
First traverse the schema information of the relational database and formally define it as a multi-label graph, represented by the triple G = (V, E, L_m), where V is the set of relation table nodes in the database and v ∈ V is one relation table node; E is the set of foreign key relationships in the database and e ∈ E is one foreign key relationship; and L_m is a label mapping function that maps a node to one or more labels. A label is written (c, b), where c is a result class identifier and b is the membership degree, which expresses how strongly relation table v belongs to result class c. Fig. 3 shows the multi-label graph corresponding to the example relational database of Fig. 2. Initially, each relation table in the multi-label graph is given a single unique label whose identifier is the table name and whose membership degree is 1.
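The triple G = (V, E, L_m) and its initial labeling can be sketched as a small data structure. This is only an illustrative layout: the class name `MultiLabelGraph`, its method names, the edge weights and the Order-Product foreign key are hypothetical and are not taken from Fig. 2 or Fig. 3.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLabelGraph:
    """G = (V, E, L_m): table nodes, weighted foreign-key edges, node labels."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # frozenset({u, v}) -> weight
    labels: dict = field(default_factory=dict)  # node -> {class id c: degree b}

    def add_table(self, name):
        self.nodes.add(name)
        # Initial state: one unique label, identifier = table name, degree 1.
        self.labels[name] = {name: 1.0}

    def add_foreign_key(self, u, v, weight):
        self.edges[frozenset((u, v))] = weight

    def neighbors(self, v):
        return {w for edge in self.edges if v in edge for w in edge if w != v}


graph = MultiLabelGraph()
for table in ("Product", "ProductCategory", "Order"):
    graph.add_table(table)
graph.add_foreign_key("Product", "ProductCategory", 0.9)  # illustrative weight
graph.add_foreign_key("Order", "Product", 0.4)            # hypothetical edge
```

Storing each node's labels as a dictionary from class identifier to membership degree keeps the later propagation step simple, because adding a neighbor's label is a single dictionary update.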
Step 1.2. Compute the weights of the multi-label graph. The concrete steps are as follows:
Step 1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names of the relation tables, as the name similarity of the relation tables.
Each relation table is first treated as a text made up of its table name and attribute names after word segmentation. For the ProductCategory relation table in Fig. 2, the table name and attribute names split into the morphemes Product, Category, ID and Type; in the text representing ProductCategory, Category occurs three times, and ID and Type each occur once. The whole relational database is treated as a corpus made up of these segmented morphemes. By computing the weight of each morpheme of a relation table within the corpus, the name information of the relation table is mapped to a vector. The vector space model is then used to compute the angle between two such vectors, which gives the name similarity of the two relation tables.
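The name similarity computation can be sketched as follows. The sketch assumes CamelCase identifiers, uses raw morpheme counts as vector weights (the text does not say which corpus weighting is used), and takes the cosine of the angle between the two vectors as the similarity. The function names and the reduced attribute list for Product are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(identifier):
    """Split a CamelCase identifier into morphemes: CategoryID -> Category, ID."""
    return re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", identifier)


def table_text(table_name, attribute_names):
    """Morpheme counts of the text formed by a table name and its attributes."""
    tokens = []
    for identifier in [table_name, *attribute_names]:
        tokens.extend(tokenize(identifier))
    return Counter(tokens)


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


category = table_text("ProductCategory", ["CategoryID", "CategoryType"])
product = table_text("Product", ["ProductID", "CategoryID"])  # partial attribute list
name_similarity = cosine(category, product)
```

For ProductCategory the counts come out as Category three times and Product, ID and Type once each, which matches the morpheme counts given in the text.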
Step 1.2.2. Use the Jaccard coefficient to measure the similarity of the values in the attribute columns of the relation tables, find the best-matching attribute pairs with a greedy algorithm, and take the mean value similarity of the best-matching attribute pairs as the value similarity of the relation tables.
The pseudocode of the algorithm that finds the best-matching attribute pairs is as follows:
Algorithm 1: GreedyMatching, search for the best-matching attribute pairs
Input: relation table R, relation table S, and the set P of precomputed attribute pair similarities between R and S
Output: set Z of best-matching attribute pairs
①. Z := ∅
②. U := attribute set of R
③. V := attribute set of S
④. WHILE (1) DO
⑤. IF U = ∅ OR V = ∅ THEN
BREAK;
⑥. traverse U and V
⑦. using P, find the attribute pair (u, v) with the largest similarity
⑧. insert (u, v) into Z
⑨. delete u from U and v from V
⑩. END WHILE
⑪. RETURN Z
⑫. end of algorithm
Take the Product and ProductCategory relation tables in Fig. 2 as an example. First compute the attribute similarities between the two tables: J(Product.ProductID, ProductCategory.CategoryID) = 0.1, J(Product.ProducName, ProductCategory.CategoryType) = 0.05, and J(Product.CategoryID, ProductCategory.CategoryID) = 0.8; the similarity between all other attribute pairs is 0. Algorithm 1 finds the best-matching attribute pairs (Product.CategoryID, ProductCategory.CategoryID) and (Product.ProducName, ProductCategory.CategoryType). The value similarity between Product and ProductCategory is therefore Sim_v(Product, ProductCategory) = (0.8 + 0.05) / 2 = 0.425.
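The worked example can be reproduced with a short sketch of Algorithm 1. The pair similarities are the ones given in the text; the function names, the dictionary layout of P and the assumption that Product has exactly the three attributes named above are choices made for this example.

```python
def jaccard(col_a, col_b):
    """Jaccard coefficient of the value sets of two attribute columns."""
    a, b = set(col_a), set(col_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def greedy_matching(attrs_r, attrs_s, pair_sim):
    """Algorithm 1: repeatedly take the most similar remaining attribute pair."""
    u, v = set(attrs_r), set(attrs_s)
    matched = []
    while u and v:
        best = max(((a, b) for a in u for b in v),
                   key=lambda pair: pair_sim.get(pair, 0.0))
        matched.append(best)
        u.remove(best[0])
        v.remove(best[1])
    return matched


def value_similarity(matched, pair_sim):
    """Mean similarity of the best-matching attribute pairs."""
    if not matched:
        return 0.0
    return sum(pair_sim.get(pair, 0.0) for pair in matched) / len(matched)


# Pair similarities from the text: keys are (Product attr, ProductCategory attr).
P = {
    ("ProductID", "CategoryID"): 0.1,
    ("ProducName", "CategoryType"): 0.05,
    ("CategoryID", "CategoryID"): 0.8,
}
Z = greedy_matching(["ProductID", "ProducName", "CategoryID"],
                    ["CategoryID", "CategoryType"], P)
sim_v = value_similarity(Z, P)  # (0.8 + 0.05) / 2, about 0.425
```

The greedy choice does not guarantee a globally optimal assignment, but it needs no more than min(|U|, |V|) rounds, which matches the loop structure of Algorithm 1.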
Step 1.2.3. For the join edges between tuples, the tuple fan-out expresses the number of distinct tuples that a given tuple can join with. The mapping relationship similarity of the relation tables is expressed by a linear function proportional to the tuple fan-out.
Step 1.2.4. Finally, based on the three relation table similarity features above, a multiple linear regression model combines the features to compute the relation table similarity, which serves as the weight of the multi-label graph. The name similarity, value similarity and mapping relationship similarity between relation tables are first normalized so that the data fall within the range 0 to 1. The multiple linear regression model is then applied. The invention holds that the influence of the name factor, the value factor and the mapping relationship factor on relation table similarity decreases in that order, so the parameters α, β, γ and δ of the algorithm are set to 6.4, 4.8, 2.0 and 0.2 respectively, such that Sim(R, S) ∈ [0, 1].
Step 2. Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups.
Step 2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels each node may carry. If the user specifies k as the final number of result classes in the schema summary, θ is tried at each value from k-1 to k+3.
For the example database of Fig. 2, when the specified number of result classes k is 2, θ is tried at 1, 2, 3, 4 and 5 in turn for multi-label propagation.
Step 2.2. Give each node in the multi-label graph one unique label, whose class identifier is the name of the node's relation table and whose membership degree is 1.
Step 2.3. In each iteration, add the labels of a node's neighbors to the node's labels according to their membership degrees, then normalize so that the node's membership degrees sum to 1.
Step 2.4. So that several labels are kept without every node ending up with every label, the algorithm computes the membership degree of each label and deletes the labels below a given threshold. The threshold here is 1/θ.
Step 2.5. After several rounds of iteration, stop when the number of nodes labeled by the least-used class identifier no longer changes. Relation tables that carry a label with the same identifier are then placed in the same overlapping group.
Step 2.6. Repeat steps 2.2 to 2.5 for each value of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation.
For the example database of Fig. 2 with k = 2, Fig. 4 shows the overlapping groups produced by multi-label propagation when θ = 3: the overlap between group 1 and group 2 consists of the relation tables ZipCode and Order, and the overlap between group 1 and group 3 is the relation table Supply.
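One iteration of steps 2.3 and 2.4 can be sketched as below. Because the normalization formula of Definition 4 is not reproduced in this text, the sketch makes its own assumptions and labels them in the code: a node's current labels take part in the sum, updates are synchronous, the strongest label is kept when every label falls below 1/θ, and surviving labels are renormalized. The function names and the three-node example graph are hypothetical.

```python
def propagate_once(labels, weights, theta):
    """One synchronous round of multi-label propagation.

    labels:  {node: {class id: membership degree}}
    weights: {frozenset({u, v}): edge weight}
    """
    new_labels = {}
    for v, own in labels.items():
        acc = dict(own)  # assumption: the node's own labels join the sum
        for edge, w in weights.items():
            if v in edge and len(edge) == 2:
                (u,) = edge - {v}
                for c, b in labels[u].items():
                    acc[c] = acc.get(c, 0.0) + b * w
        total = sum(acc.values())
        acc = {c: b / total for c, b in acc.items()}  # degrees sum to 1
        kept = {c: b for c, b in acc.items() if b >= 1.0 / theta}
        if not kept:  # assumption: keep the strongest label if none survive
            best = max(acc, key=acc.get)
            kept = {best: acc[best]}
        total = sum(kept.values())
        new_labels[v] = {c: b / total for c, b in kept.items()}
    return new_labels


def overlapping_groups(labels):
    """Nodes sharing a class identifier form one group; groups may overlap."""
    groups = {}
    for v, node_labels in labels.items():
        for c in node_labels:
            groups.setdefault(c, set()).add(v)
    return groups


# Hypothetical path graph A - B - C with unit weights and theta = 2.
labels = {n: {n: 1.0} for n in ("A", "B", "C")}
weights = {frozenset(("A", "B")): 1.0, frozenset(("B", "C")): 1.0}
after = propagate_once(labels, weights, theta=2)
groups = overlapping_groups(after)
```

Since every kept label has a membership degree of at least 1/θ and the degrees sum to 1, no node can carry more than θ labels, which is how θ bounds the amount of overlap. The candidate values of θ from step 2.1 can be enumerated as `range(max(1, k - 1), k + 4)`.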
Step 3. Apply hierarchical clustering to the overlapping groups to generate the result classes.
Step 3.1. Compute the similarity between overlapping groups.
Step 3.2. Treat each overlapping group as a separate class. In each iteration, merge the two most similar classes, and stop once they have been merged into the k result classes specified by the user.
The pseudocode of the hierarchical clustering algorithm is as follows:
Algorithm 2: HierarchicalClustering, hierarchical clustering algorithm
Input: partition into overlapping groups C = {C_1, C_2, ..., C_m}; number of result classes k
Output: partition into result classes C = {C_1, C_2, ..., C_k}
①. FOR i = |C| TO k
②. find the two most similar classes C_p, C_q ∈ C
③. merge classes C_p and C_q
④. delete C_q from C
⑤. FOR EACH C_j
⑥. compute the similarity between classes C_j and C_p
⑦. END FOR
⑧. END FOR
⑨. RETURN C
⑩. end of algorithm
Algorithm 2 describes the execution flow of hierarchical clustering. The algorithm first treats each overlapping group as a separate result class. In each iteration it finds the two most similar classes and merges them, as shown in lines ② to ④ of the algorithm. The iteration continues until k result classes remain.
For the overlapping groups of the Fig. 4 example, when the specified number of result classes k is 2, hierarchical clustering stops once two result classes are obtained. As shown in Fig. 5, the schema graph is then divided into 2 result classes, and relation table Supply is the overlap between the two classes.
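The merging loop of Algorithm 2 can be sketched as follows. The group similarity formula of Definition 5 is not reproduced in this text, so the sketch assumes the average pairwise table similarity between two groups, with 0 for table pairs that have no edge; that choice, the function names and the example data are all assumptions.

```python
from itertools import combinations


def group_similarity(ci, cj, sim):
    """Assumed measure: mean pairwise table similarity, 0 when no edge exists."""
    pairs = [(a, b) for a in ci for b in cj if a != b]
    if not pairs:
        return 0.0
    return sum(sim.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def hierarchical_clustering(groups, k, sim):
    """Merge the two most similar classes until k result classes remain."""
    classes = [set(g) for g in groups]
    while len(classes) > max(k, 1):
        p, q = max(
            combinations(range(len(classes)), 2),
            key=lambda pq: group_similarity(classes[pq[0]], classes[pq[1]], sim),
        )
        classes[p] |= classes[q]  # merge C_q into C_p
        del classes[q]            # p < q, so index p stays valid
    return classes


# Hypothetical overlapping groups and table similarities.
sim = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("C", "D")): 0.1,
}
result = hierarchical_clustering([{"A", "B"}, {"B", "C"}, {"D"}], k=2, sim=sim)
```

Because groups are merged by set union, a table shared by two groups that end up in different result classes stays in both, so the result classes can still overlap, as Supply does in Fig. 5.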
Step 4. Select a topic table for each result class and return the final schema summary to the user.
Step 4.1. The importance of each relation table is measured from the primary and foreign key information, attribute information and tuple information of the relation table. Table 1 lists part of the importance results computed for the relation tables in the example database.
Table 1. Importance results for relation tables in the example database
Rank | Relation table | Importance degree |
1 | Company | 189.35 |
2 | Order | 183.28 |
3 | Customer | 116.54 |
4 | Product | 101.07 |
The pseudocode for computing relation table importance is as follows:
Algorithm 3: TableImportance, computation of relation table importance
Input: multi-label graph G
Output: relation table importance vector I
①. FOR EACH node R IN G
②. IC[R] := information content of relation table R
③. I_0[R] := IC[R]
④. END FOR
⑤. FOR EACH edge e IN G
⑥. Π := transition matrix between relation tables
⑦. END FOR
⑧. done = FALSE; /* convergence flag of the stochastic process */
⑨. WHILE (!done) DO
⑩. I := I_0 * Π
⑪. IF (dist(I, I_0) ≤ ε) /* vector distance under the infinity norm; ε is a very small value */
⑫. done = TRUE;
⑬. I_0 := I
⑭. END WHILE
⑮. RETURN I
⑯. end of algorithm
This algorithm describes how relation table importance is computed. First, the information content of each relation table is computed from the primary and foreign key information, attribute information and tuple information of the table, and is used as the initial value of the random walk. The transition probabilities between relation tables are then computed from the foreign key references between them. Information is repeatedly sent and received along the edges of the graph according to the transition probabilities until the stochastic process converges to a stationary distribution. Finally, the information value of each relation table in the stationary distribution is defined as the importance of that table.
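The iteration in Algorithm 3 amounts to repeatedly multiplying the information vector by the transition matrix until the infinity-norm change drops below ε. The sketch below assumes the information content values and a row-stochastic transition table are already available; the two-table example, the function name and the default tolerance are hypothetical.

```python
def table_importance(ic, transition, eps=1e-9, max_iter=10_000):
    """Random walk of Algorithm 3.

    ic:         {table: information content}, the initial vector I_0
    transition: {R: {S: P[R -> S]}}, each row summing to 1
    """
    tables = list(ic)
    current = dict(ic)
    for _ in range(max_iter):
        nxt = {t: 0.0 for t in tables}
        for r in tables:  # I := I_0 * Pi (row vector times matrix)
            for s, p in transition[r].items():
                nxt[s] += current[r] * p
        # Infinity norm: the largest absolute change over all tables.
        if max(abs(nxt[t] - current[t]) for t in tables) <= eps:
            return nxt
        current = nxt
    return current


# Hypothetical two-table schema; each row of the transition table sums to 1.
ic = {"A": 2.0, "B": 1.0}
transition = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.25, "B": 0.75},
}
importance = table_importance(ic, transition)
topic_table = max(importance, key=importance.get)
```

Because each row of the transition table sums to 1, the total information content is preserved by every step, so the stationary vector redistributes the initial total among the tables rather than rescaling it.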
Step 4.2. Select the most important relation table in each result class as that class's topic table, and return the final schema summary to the user.
For the result classes of the Fig. 5 example, the most important relation table in each result class is chosen as its topic table: the most important table in class 1 is Company, and in class 2 it is Product. Fig. 6 is the automatically generated overlapping schema summary, where Fig. 6(a) and (b) show the clustering result mapped back onto the relational database.
Claims (1)
1. A method for generating overlapping database schema summaries based on multi-label propagation, characterized in that the method comprises:
Step 1. Map the database schema to a weighted multi-label graph;
Step 1.1. Map the database schema to a multi-label graph,
Definition 1: A relational database schema can be mapped to a multi-label graph, represented by the triple G = (V, E, L_m), where:
① V is the set of relation table nodes in the database, and v ∈ V is one relation table node;
② E is the set of foreign key relationships in the database, and e ∈ E is one foreign key relationship;
③ L_m is a label mapping function that maps a node to one or more labels, a label being written (c, b), where c is a result class identifier and b is the membership degree, which expresses how strongly relation table v belongs to result class c;
Step 1.2. Compute the similarity between the two relation tables joined by each edge of the multi-label graph, and use it as the edge weight;
Step 1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names of the relation tables, as the name similarity of the relation tables;
Step 1.2.2. Use the Jaccard coefficient to measure the similarity of the values in the attribute columns of the relation tables, find the best-matching attribute pairs with a greedy algorithm, and take the mean value similarity of the best-matching attribute pairs as the value similarity of the relation tables;
Step 1.2.3. Compute the mapping relationship similarity of the relation tables by analyzing the cardinality ratio between them,
Definition 2: The mapping relationship similarity between relation table R and relation table S, written Sim_m(R, S), is defined as follows:
where:
① τ ranges over all tuples of the relation table;
② fan(τ_i) is the fan-out of tuple τ_i on join edge e, fan-out being defined on the join edges between tuples as the number of distinct tuples that a given tuple can join with;
③ q_i is the number of tuples in relation table R that satisfy fan(τ_i) > 0;
Step 1.2.4. Based on the three similarity features from steps 1.2.1 to 1.2.3, compute the relation table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph;
Step 2. Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups;
Step 2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels each node may carry; if the user specifies k as the final number of result classes in the schema summary, θ is tried at each value from k-1 to k+3, and the θ that maximizes the intra-cluster similarity of the overlapping groups obtained by multi-label propagation is selected, the intra-cluster similarity being defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into the overlapping groups C = {C_1, C_2, ..., C_m}; the intra-cluster similarity of the propagation result C is then:
where:
① Sim(v_i, v_j) is the similarity between relation tables v_i and v_j;
② |C_i| is the number of relation tables in C_i;
Step 2.2. Give each node in the multi-label graph one unique label, whose class identifier is the name of the node's relation table and whose membership degree is 1;
Step 2.3. In each iteration, add the labels of all neighbors of a node to that node's labels according to their membership degrees and the edge weights, then normalize so that the node's membership degrees sum to 1,
Definition 4: The normalization function b_x(c, v_i) gives, at the x-th iteration, the membership degree b of community identifier c in the label of node v_i:
where:
① N(v_i) is the set of all neighbor nodes of node v_i;
② the weight term is the weight of edge (v_i, v_j);
Step 2.4. Delete every label whose membership degree is below 1/θ;
Step 2.5. Stop iterating when the number of nodes labeled by the least-used class identifier no longer changes; suppose m class identifiers remain after the iteration ends, the nodes carrying identifier c_m are assigned to group C_m, so that the multi-label graph is divided into m groups C = {C_1, C_2, ..., C_m} that may overlap;
Step 2.6. Repeat steps 2.2 to 2.5 for each value of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation;
Step 3. Apply hierarchical clustering to the overlapping groups to generate the result classes;
Step 3.1. Compute the similarity between overlapping groups,
Definition 5: Let C_i and C_j be two overlapping groups obtained by multi-label propagation clustering; the similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relation tables v_i and v_j; if no edge connects two tables, the similarity between them is 0;
Step 3.2. Treat each overlapping group as a separate class; in each iteration, merge the two most similar classes, and stop once they have been merged into the k result classes specified by the user;
4th, for each result class chooses subject heading list, final pattern summary is returned to user;
4.1st, the importance degree of calculated relationship table;
The quantity of information of 4.1.1, calculated relationship table,
Definition 6: the attribute A in relation table R is denoted as R.A, and the information entropy on this attribute is defined as:
Wherein, h represents the number of all not identical values on attribute A; If the value on attribute A can be expressed as the set R.A={a of h different value
1..., a
h, use p
irepresent a
ithe probability occurred;
Definition 7: the quantity of information of relation table R is defined as:
Wherein, | R| represents the tuple number in R;
4.1.2. Calculate the transition probabilities between relation tables,
Definition 8: for relation tables R and S, the probability of transferring from R to S is defined as follows:
where:
1. R.A-S.B denotes the foreign-key reference between attribute A of relation table R and attribute B of relation table S;
2. for any attribute A' in R, q_{A'} denotes the total number of foreign-key joins on R.A';
4.3. Adopt a random walk model, using the information content of each relation table as the initial value of the random walk and the transition probabilities between relation tables as the transition probabilities of the random walk; the distribution of information content when the model reaches its steady state gives the importance of each relation table;
4.4. Select the relation table with the highest importance in each result class as the topic table of that class, and return the final schema summary to the user.
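Steps 4.3 and 4.4 can be sketched as a simple power iteration. This is a hedged illustration rather than the patented procedure: the function names, the convergence tolerance, and the iteration cap are hypothetical, and the transition probabilities are taken as given input and assumed to sum to 1 for each table (self-transitions included), so that the total information content is preserved.

```python
from typing import Dict, List, Set


def random_walk_importance(
    info: Dict[str, float],
    transition: Dict[str, Dict[str, float]],
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """Step 4.3: redistribute information content until it stops changing.

    `info` holds the initial information content of each table and
    `transition[r][s]` the probability of moving from table r to table s.
    """
    tables = list(info)
    current = dict(info)
    for _ in range(max_iter):
        nxt = {t: 0.0 for t in tables}
        for src in tables:
            for dst, p in transition[src].items():
                nxt[dst] += current[src] * p
        if max(abs(nxt[t] - current[t]) for t in tables) < tol:
            return nxt
        current = nxt
    return current  # not converged within max_iter; return the last iterate


def topic_tables(classes: List[Set[str]], importance: Dict[str, float]) -> List[str]:
    """Step 4.4: pick the most important table of each result class."""
    # sorted() makes tie-breaking deterministic (alphabetical).
    return [max(sorted(c), key=importance.__getitem__) for c in classes]
```

For example, with two tables where `A` moves to `B` with probability 0.5 and `B` moves to `A` with probability 0.25, twice as much information content settles on `B` as on `A`, so `B` would be chosen as the topic table of a class containing both.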
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510464314.1A CN105138588B (en) | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510464314.1A CN105138588B (en) | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105138588A (en) | 2015-12-09 |
CN105138588B (en) | 2018-09-28 |
Family ID: 54723937
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510464314.1A Expired - Fee Related CN105138588B (en) | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105138588B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991614A (en) * | 2017-03-02 | 2017-07-28 | 南京信息工程大学 | The parallel overlapping community discovery method propagated under Spark based on label |
CN107402932A (en) * | 2016-05-20 | 2017-11-28 | 腾讯科技(深圳)有限公司 | Extension processing method, the text of user tag recommend method and apparatus |
CN107992608A (en) * | 2017-12-15 | 2018-05-04 | 南开大学 | A kind of SPARQL query statement automatic generation methods based on keyword context |
CN107992590A (en) * | 2017-12-11 | 2018-05-04 | 成都逸重力网络科技有限公司 | A kind of big data system for being conducive to information comparison |
CN108052587A (en) * | 2017-12-11 | 2018-05-18 | 成都逸重力网络科技有限公司 | Big data analysis method based on decision tree |
CN108804582A (en) * | 2018-05-24 | 2018-11-13 | 天津大学 | Method based on the chart database optimization of complex relationship between big data |
CN110309419A (en) * | 2018-05-14 | 2019-10-08 | 桂林远望智能通信科技有限公司 | A kind of overlapping anatomic framework method for digging and device propagated based on balance multi-tag |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118249A (en) * | 2010-12-22 | 2011-07-06 | 厦门柏事特信息科技有限公司 | Photographing and evidence-taking method based on digital digest and digital signature |
CN102254022A (en) * | 2011-07-27 | 2011-11-23 | 河海大学 | Method for sharing metadata of information resources of various data types |
CN104036051A (en) * | 2014-07-04 | 2014-09-10 | 南开大学 | Database mode abstract generation method based on label propagation |
US20150019217A1 (en) * | 2005-08-05 | 2015-01-15 | Voicebox Technologies Corporation | Systems and methods for responding to natural language speech utterance |
Non-Patent Citations (1)
Title |
---|
Xiaojie Yuan et al.: "Summarizing Relational Database Schema Based on Label Propagation", APWeb 2014, LNCS 8709 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402932A (en) * | 2016-05-20 | 2017-11-28 | 腾讯科技(深圳)有限公司 | Extension processing method, the text of user tag recommend method and apparatus |
CN106991614A (en) * | 2017-03-02 | 2017-07-28 | 南京信息工程大学 | The parallel overlapping community discovery method propagated under Spark based on label |
CN107992590A (en) * | 2017-12-11 | 2018-05-04 | 成都逸重力网络科技有限公司 | A kind of big data system for being conducive to information comparison |
CN108052587A (en) * | 2017-12-11 | 2018-05-18 | 成都逸重力网络科技有限公司 | Big data analysis method based on decision tree |
CN108052587B (en) * | 2017-12-11 | 2021-11-05 | 成都逸重力网络科技有限公司 | Big data analysis method based on decision tree |
CN107992590B (en) * | 2017-12-11 | 2021-11-05 | 成都逸重力网络科技有限公司 | Big data system beneficial to information comparison |
CN107992608A (en) * | 2017-12-15 | 2018-05-04 | 南开大学 | A kind of SPARQL query statement automatic generation methods based on keyword context |
CN107992608B (en) * | 2017-12-15 | 2021-07-02 | 南开大学 | SPARQL query statement automatic generation method based on keyword context |
CN110309419A (en) * | 2018-05-14 | 2019-10-08 | 桂林远望智能通信科技有限公司 | A kind of overlapping anatomic framework method for digging and device propagated based on balance multi-tag |
CN108804582A (en) * | 2018-05-24 | 2018-11-13 | 天津大学 | Method based on the chart database optimization of complex relationship between big data |
Also Published As
Publication number | Publication date |
---|---|
CN105138588B (en) | 2018-09-28 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180928; Termination date: 20210731 |