CN105138588B - A method for generating overlapping database schema summaries based on multi-label propagation - Google Patents
- Publication number: CN105138588B
- Application number: CN201510464314.1A
- Authority: CN (China)
- Prior art keywords
- relation table
- similarity
- tag
- database
- label
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
Abstract
A method for generating overlapping database schema summaries based on multi-label propagation. The method comprises: mapping the database schema information to a multi-label graph model; clustering the schema information with a multi-label propagation algorithm to generate groups that may overlap; clustering these overlapping groups with a hierarchical clustering algorithm to produce result clusters of a suitable size; and finally selecting a topic table for each result cluster, based on information entropy and a random walk model, to generate the final overlapping schema summary. The proposed scheme can provide users with a more accurate and meaningful overlapping summary of the database schema, helping them understand the database quickly.
Description
Technical field
The invention belongs to the field of database technology, and specifically relates to a novel technique for generating overlapping schema summaries of relational databases.
Background art
With the spread of computers and the rapid development of information technology, the sheer volume of data has led to the widespread adoption of database technology, and database applications have begun to reach ordinary users. However, modern databases are often very large and complex, and to write suitable Structured Query Language (SQL) statements a user must have some understanding of the database schema. The schema of a large database is usually complex as well, and the relevant documentation is frequently missing, which makes it even harder for users to understand the schema.
Schema summarization can effectively address this problem: it gives the user a concise overview of the database schema and improves the usability of the database. Existing schema summarization solutions focus only on non-overlapping summaries, in which each relational table may belong to only one topic cluster of the summary. In practice, however, a relational table often carries several meanings and belongs to several topic clusters. Considering only the non-overlapping case can make the summary incomplete or even mislead the user.
Non-overlapping schema summaries therefore often fail to meet user needs fully. By comparison, overlapping schema summarization can generate more reasonable schema summary information, effectively reducing the time and effort users spend understanding the database schema, and it has broad prospects in engineering applications.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by proposing a method, based on multi-label propagation, for automatically generating overlapping database schema summaries.
The method provided by the invention introduces the concept of an overlapping schema summary; designs a new multi-label graph model of the database schema; clusters the schema using a multi-label propagation algorithm followed by a hierarchical clustering algorithm; and finally selects a topic table for each resulting cluster, returning to the user a schema summary whose clusters may overlap. The steps of the method are as follows:
Step 1. Map the database schema to a weighted multi-label graph.
1.1. Map the database schema to a multi-label graph.
Definition 1: A relational database schema can be mapped to a multi-label graph, represented by a triple G = (V, E, L_M), where:
① V is the set of relational table nodes in the database, and v ∈ V is one such node;
② E is the set of foreign key relationships in the database, and e ∈ E is one such relationship;
③ L_M is a label mapping function that maps a node to one or more labels. A label is written (c, b), where c is a result cluster identifier and b is the membership degree, i.e. the strength with which relational table v belongs to result cluster c.
1.2. Compute the similarity between the two relational tables joined by each edge of the multi-label graph, and use it as the edge weight.
1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names, which serves as the name similarity of the tables.
1.2.2. Use the Jaccard coefficient to analyse the similarity of the values in the tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of those pairs as the value similarity of the tables.
1.2.3. Compute the mapping relationship similarity of the tables by analysing the cardinality ratio between them.
Definition 2: The mapping relationship similarity between relational tables R and S, denoted Sim_m(R, S), is defined as follows:
where:
① τ denotes the tuples of a relational table;
② fan(τ_i) is the fan-out of tuple τ_i along the connecting edge e. Fan-out is defined on the number of connecting edges between tuples and gives the number of distinct tuples that a given tuple can join with;
③ q_i is the number of tuples in relational table R that satisfy fan(τ_i) > 0.
1.2.4. Based on the three similarity features of steps 1.2.1 to 1.2.3, compute the table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph.
Step 2. Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups.
2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels a node may carry. If the user specifies k as the number of result clusters in the final schema summary, θ is tried at each value from k-1 to k+3, and the θ finally chosen is the one that maximises the internal cluster similarity of the overlapping groups produced by multi-label propagation. The internal cluster similarity is defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into overlapping groups C = {C_1, C_2, ..., C_m}. The internal cluster similarity of the propagation result C is then:
where:
① Sim(v_i, v_j) is the similarity between relational tables v_i and v_j;
② |C_i| is the number of relational tables in C_i.
2.2. Give every node in the label graph one unique label, whose cluster identifier is the node's table name and whose membership degree is 1.
2.3. In each iteration, add the labels of all of a node's neighbours to the node's own labels, weighted by membership degree and edge weight, and then normalise so that the node's membership degrees sum to 1.
Definition 4: The normalisation function b_x(c, v_i) gives, in the x-th iteration, the membership degree b of community identifier c in the labels of relational table v_i:
where:
① N(v_i) is the set of all tables adjacent to relational table v_i;
② the weight of edge (v_i, v_j), which scales each neighbour's contribution.
2.4. Delete every label whose membership degree is below 1/θ.
2.5. Stop iterating when the minimum number of nodes marked by any cluster identifier no longer changes. Suppose m cluster identifiers remain after the iteration ends; the nodes carrying identifier c_m are assigned to group C_m, so the multi-label graph is divided into m groups C = {C_1, C_2, ..., C_m} that may overlap.
2.6. Repeat steps 2.2 to 2.5 for each value of θ, and take the set of overlapping groups with the largest internal cluster similarity as the result of multi-label propagation.
Step 3. Apply hierarchical clustering to the overlapping groups to generate the result clusters.
3.1. Compute the similarity between overlapping groups.
Definition 5: Let C_i and C_j be two overlapping groups obtained by multi-label propagation clustering. The similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relational tables v_i and v_j; if there is no edge between two tables, their similarity is 0.
3.2. Treat each overlapping group as a separate cluster and, in each iteration, merge the two most similar clusters; stop once the clusters have been merged into the k result clusters specified by the user.
Step 4. Select a topic table for each result cluster and return the final schema summary to the user.
4.1. Compute the importance of each relational table.
4.1.1. Compute the information content of each relational table.
Definition 6: Let R.A denote attribute A of relational table R. The information entropy of this attribute is defined as:
H(R.A) = -Σ_{i=1..h} p_i · log p_i
where h is the number of distinct values of attribute A, the values of A form the set R.A = {a_1, ..., a_h} of h distinct values, and p_i is the probability that a_i occurs.
Definition 7: The information content of relational table R is defined as:
where |R| is the number of tuples in R.
4.1.2. Compute the transition probabilities between relational tables.
Definition 8: For relational tables R and S, the probability of moving from R to S is defined as follows:
where:
① R.A-S.B denotes the foreign key reference between attribute A of relational table R and attribute B of relational table S;
② for any attribute A′ of R, q_A′ is the total number of foreign key connections on R.A′.
4.3. Apply the random walk model, using the information content of each relational table as the initial value of the random walk and the transition probabilities between tables as the transition probabilities of the walk. The distribution of information content when the model reaches its steady state gives the importance of each relational table.
4.4. Select the most important relational table in each result cluster as that cluster's topic table, and return the final schema summary to the user.
Advantages and beneficial effects of the invention:
The invention proposes a method for mapping a database schema to a multi-label graph, in which the cluster information of each relational table is stored as labels and the final clustering of the schema summary is determined by membership degree. Building on an in-depth analysis of graph-based multi-label propagation algorithms, it proposes a model for automatically generating schema summaries based on multi-label propagation. Compared with traditional models, this model inherits the advantages of multi-label propagation: it can automatically generate schema summaries whose clusters overlap, and it achieves higher clustering accuracy, which helps users search the database quickly.
Description of the drawings
Fig. 1 is the overall flow chart of the method;
Fig. 2 is the schema diagram of the original relational database;
Fig. 3 is the multi-label graph corresponding to the example relational database;
Fig. 4 shows the overlapping groups produced by multi-label propagation clustering;
Fig. 5 shows the result clusters produced by hierarchical clustering;
Fig. 6 shows the schema summary result, where (a) and (b) are the database cluster diagrams corresponding to the schema summary and (c) is the schema summary diagram;
Table 1 lists the importance values computed for relational tables of the example database.
Detailed description of embodiments
The processing flow of the method is shown in Fig. 1.
The method is described below with reference to an embodiment; Fig. 2 shows the schema diagram of the embodiment's relational database. The schema summary produced by the overlapping schema summary generation method is shown in Fig. 6. Fig. 6(c) is the overlapping schema summary diagram, which helps the user sort out the complex relationships in the schema; the user can also expand any part of the summary diagram to view it in detail, as shown in Fig. 6(a) and (b). The specific steps of the method are described below using the embodiment of Fig. 2:
Step 1: Map the database schema to a weighted multi-label graph.
1.1. Map the database schema to a multi-label graph.
First traverse the schema information of the relational database and formally define it as a multi-label graph, represented by the triple G = (V, E, L_M), where V is the set of relational table nodes in the database and v ∈ V is one such node; E is the set of foreign key relationships in the database and e ∈ E is one such relationship; and L_M is a label mapping function that maps a node to one or more labels. A label is written (c, b), where c is a result cluster identifier and b is the membership degree, i.e. the strength with which relational table v belongs to result cluster c. Fig. 3 shows the multi-label graph corresponding to the example relational database of Fig. 2. Initially, each relational table in the multi-label graph is given a single unique label whose identifier is the table's name and whose membership degree is 1.
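The triple G = (V, E, L_M) and its initial labelling can be sketched as a small data structure. This is only an illustrative sketch: the class name `MultiLabelGraph`, its methods, and the foreign key between Order and Product are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLabelGraph:
    """G = (V, E, L_M): tables, weighted foreign-key edges, per-table labels."""
    tables: set = field(default_factory=set)    # V
    edges: dict = field(default_factory=dict)   # E: frozenset({u, v}) -> weight
    labels: dict = field(default_factory=dict)  # L_M: table -> {cluster id c: membership b}

    def add_table(self, name):
        self.tables.add(name)
        # Initially each table carries one unique label: its own name, membership 1.
        self.labels[name] = {name: 1.0}

    def add_foreign_key(self, u, v, weight=1.0):
        self.edges[frozenset((u, v))] = weight

    def neighbors(self, name):
        return {t for e in self.edges if name in e for t in e if t != name}


g = MultiLabelGraph()
for table in ("Product", "ProductCategory", "Order"):
    g.add_table(table)
g.add_foreign_key("Product", "ProductCategory")
g.add_foreign_key("Order", "Product")  # hypothetical foreign key, for illustration only
```

Storing the labels as a dictionary per table keeps the later propagation step simple, since a table may end up holding several (c, b) pairs at once.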
1.2. Compute the weights of the multi-label graph, as follows:
1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names, which serves as the name similarity of the tables.
First, treat each relational table as a text made up of its table name and attribute names after word segmentation. Taking the ProductCategory table in Fig. 2 as an example, its table name and attribute names can be split into the following tokens: Product, Category, ID and Type, where Category occurs three times in the text representing ProductCategory, while ID and Type each occur once. Treat the whole relational database as a text collection made up of these segmented tokens, compute the weight of each table's tokens within the collection, and map the table's name information to a vector. Then use the vector space model to compute the angle between two such vectors (typically measured by its cosine), which gives the name similarity of the two tables.
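The name similarity step can be illustrated with a short sketch. It assumes CamelCase tokenisation, a smoothed TF-IDF weighting and cosine similarity; the patent does not specify its exact weighting formula, and the column list given for Product here is illustrative.

```python
import math
import re
from collections import Counter


def tokenize(names):
    """Split CamelCase identifiers into tokens, e.g. 'CategoryID' -> ['Category', 'ID']."""
    tokens = []
    for name in names:
        tokens += re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", name)
    return tokens


def tfidf_vectors(docs):
    """docs: {table: token list}. Returns {table: {token: tf * idf}}."""
    n = len(docs)
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    vectors = {}
    for table, toks in docs.items():
        tf = Counter(toks)
        # Smoothed idf (an assumption; the source does not give its weighting).
        vectors[table] = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Table name followed by attribute names; the Product column list is illustrative.
schema = {
    "ProductCategory": ["ProductCategory", "CategoryID", "CategoryType"],
    "Product": ["Product", "ProductID", "CategoryID"],
}
docs = {table: tokenize(names) for table, names in schema.items()}
vectors = tfidf_vectors(docs)
name_sim = cosine(vectors["ProductCategory"], vectors["Product"])
```

With this tokenisation, Category appears three times in the ProductCategory text and ID and Type once each, which matches the counts given in the example above.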
1.2.2. Use the Jaccard coefficient to analyse the similarity of the values in the tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of those pairs as the value similarity of the tables.
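For reference, the Jaccard coefficient of two attribute columns is the size of the intersection of their value sets divided by the size of their union. A minimal sketch follows; the function name `jaccard` and the sample values are assumptions for illustration.

```python
def jaccard(column_a, column_b):
    """Jaccard coefficient of the distinct values in two attribute columns."""
    a, b = set(column_a), set(column_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical columns sharing the values 3 and 4 out of five distinct values.
example = jaccard([1, 2, 3, 4], [3, 4, 5])
```

Columns are reduced to sets first, so repeated values in a column do not affect the coefficient.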
The algorithm that finds the best-matching attribute pairs is specified as follows:
Algorithm 1: GreedyMatching, the search for best-matching attribute pairs
Input: relational table R, relational table S, and the precomputed set P of attribute-pair similarities between R and S
Output: the set Z of best-matching attribute pairs
Take the Product and ProductCategory tables in Fig. 2 as an example. First compute the attribute similarities between the two tables: J(Product.ProductID, ProductCategory.CategoryID) = 0.1, J(Product.ProducName, ProductCategory.CategoryType) = 0.05 and J(Product.CategoryID, ProductCategory.CategoryID) = 0.8; the similarity between all other attribute pairs is 0. Algorithm 1 finds the best-matching attribute pairs, namely J(Product.CategoryID, ProductCategory.CategoryID) and J(Product.ProducName, ProductCategory.CategoryType). The value similarity between the Product and ProductCategory tables is therefore Sim_v(Product, ProductCategory) = (0.8 + 0.05)/2 = 0.425.
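The worked example above can be reproduced with a short greedy matcher. This is a sketch under two assumptions that the text implies but does not state outright: each attribute takes part in at most one matched pair, and pairs with zero similarity are never matched. The function names are hypothetical.

```python
def greedy_matching(pair_sims):
    """pair_sims: {(attr_of_R, attr_of_S): similarity}. Returns the matched pairs Z."""
    matched, used_r, used_s = [], set(), set()
    # Visit candidate pairs from most to least similar.
    for (a, b), sim in sorted(pair_sims.items(), key=lambda kv: kv[1], reverse=True):
        if sim <= 0:
            break
        if a not in used_r and b not in used_s:
            matched.append(((a, b), sim))
            used_r.add(a)
            used_s.add(b)
    return matched


def value_similarity(pair_sims):
    """Average similarity of the greedily matched attribute pairs."""
    matched = greedy_matching(pair_sims)
    return sum(sim for _, sim in matched) / len(matched) if matched else 0.0


# Attribute-pair similarities from the Product / ProductCategory example.
P = {
    ("Product.ProductID", "ProductCategory.CategoryID"): 0.1,
    ("Product.ProducName", "ProductCategory.CategoryType"): 0.05,
    ("Product.CategoryID", "ProductCategory.CategoryID"): 0.8,
}
Z = greedy_matching(P)
sim_v = value_similarity(P)
```

The pair with similarity 0.8 is taken first, which blocks the 0.1 pair because ProductCategory.CategoryID is already used; the 0.05 pair is then accepted, giving the average of 0.425 stated in the example.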
1.2.3. Based on the number of connecting edges between tuples, the fan-out of a tuple gives the number of distinct tuples that the tuple can join with. The mapping relationship similarity of two relational tables is expressed by defining a linear function that is directly proportional to tuple fan-out.
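The inputs to this measure, tuple fan-out and the count of tuples with non-zero fan-out, can be sketched as follows. The formula of the similarity itself is not reproduced in this text, so the sketch stops at those inputs; the function name and the sample key values are assumptions.

```python
from collections import Counter


def fan_outs(r_keys, s_keys):
    """Fan-out of each tuple of R along one join edge.

    r_keys: the join-column value of each tuple in R.
    s_keys: the join-column value of each tuple in S.
    Returns, for every R tuple, how many S tuples it joins with.
    """
    s_count = Counter(s_keys)
    return [s_count.get(key, 0) for key in r_keys]


# Hypothetical data: three categories, five products referencing them.
category_ids = [1, 2, 3]
product_category_ids = [1, 1, 2, 2, 2]
fans = fan_outs(category_ids, product_category_ids)
q = sum(1 for f in fans if f > 0)  # number of tuples with fan-out > 0
```

Here category 3 has no products, so its fan-out is 0 and it is excluded from q.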
1.2.4. Finally, based on the three table similarity features above, use a multiple linear regression model that weighs all the features together to compute the table similarity, which serves as the weight of the multi-label graph. First normalise the name similarity, value similarity and mapping relationship similarity between tables so that the data fall in the range 0 to 1. Then apply the multiple linear regression model. The invention considers that the influence of the name factor, the value factor and the mapping relationship factor on table similarity decreases in that order, so the parameters α, β, γ and δ of the algorithm are set to 6.4, 4.8, 2.0 and 0.2 respectively, such that Sim(R, S) ∈ [0, 1].
Step 2: Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups.
2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels a node may carry. If the user specifies k as the number of result clusters in the final schema summary, θ is tried at each value from k-1 to k+3.
Taking the example database of Fig. 2, when the number of result clusters k is 2, multi-label propagation is run with θ set to 1, 2, 3, 4 and 5 in turn.
2.2. Give every node in the label graph one unique label, whose cluster identifier is the node's table name and whose membership degree is 1.
2.3. In each iteration, add the labels of a node's neighbours to the node's own labels according to membership degree, and normalise so that the node's membership degrees sum to 1.
2.4. To retain multiple labels without every node ending up holding every label, the algorithm computes the membership degree of each label and deletes the labels that fall below a given threshold, here 1/θ.
2.5. After several rounds of iteration, stop when the minimum number of nodes marked by any cluster identifier no longer changes; the relational tables that carry labels with the same identifier are then placed in one overlapping group.
2.6. Repeat steps 2.2 to 2.5 for different values of θ, and take the set of overlapping groups with the largest internal cluster similarity as the result of multi-label propagation.
Taking the example database of Fig. 2, when the number of result clusters k is 2, multi-label propagation is run with θ = 1, 2, 3, 4 and 5, giving 5 sets of results. Computation shows that the internal cluster similarity of the result is largest when θ is 3. Fig. 4 shows the overlapping groups produced by multi-label propagation with θ = 3, where groups 1 and 2 overlap in relational tables ZipCode and Order, and groups 1 and 3 overlap in relational table Supply.
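Steps 2.1 to 2.5 can be sketched as below. Several details are assumptions, because the normalisation formula (Definition 4) is not reproduced in this text: neighbour labels are summed weighted by edge weight and then normalised; if every label falls below 1/θ the strongest one is kept; and the stopping test compares the number of nodes per identifier between rounds. All function names are hypothetical.

```python
def theta_candidates(k):
    """Values of theta to try for k result clusters: k-1 .. k+3 (at least 1)."""
    return list(range(max(1, k - 1), k + 4))


def propagate_once(labels, weights, theta):
    """One synchronous round. labels: {node: {id: b}}; weights: {node: {nbr: w}}."""
    new_labels = {}
    for node, nbrs in weights.items():
        acc = {}
        for nbr, w in nbrs.items():
            for c, b in labels[nbr].items():
                acc[c] = acc.get(c, 0.0) + w * b
        total = sum(acc.values())
        if total == 0:  # isolated node: keep its current labels
            new_labels[node] = dict(labels[node])
            continue
        acc = {c: b / total for c, b in acc.items()}
        kept = {c: b for c, b in acc.items() if b >= 1.0 / theta}
        if not kept:  # assumption: keep the strongest label so no node is unlabeled
            best = max(acc, key=acc.get)
            kept = {best: acc[best]}
        total = sum(kept.values())
        new_labels[node] = {c: b / total for c, b in kept.items()}
    return new_labels


def communities(labels):
    """Group nodes by the cluster identifiers they carry; groups may overlap."""
    groups = {}
    for node, node_labels in labels.items():
        for c in node_labels:
            groups.setdefault(c, set()).add(node)
    return groups


def multi_label_propagation(weights, theta, max_iter=100):
    labels = {node: {node: 1.0} for node in weights}  # step 2.2
    sizes = None
    for _ in range(max_iter):
        labels = propagate_once(labels, weights, theta)  # steps 2.3 and 2.4
        new_sizes = {c: len(nodes) for c, nodes in communities(labels).items()}
        if new_sizes == sizes:  # simplified form of the step 2.5 stopping test
            break
        sizes = new_sizes
    return communities(labels)


# Hypothetical three-table star graph with unit edge weights.
weights = {"A": {"B": 1.0, "C": 1.0}, "B": {"A": 1.0}, "C": {"A": 1.0}}
first_round = propagate_once({n: {n: 1.0} for n in weights}, weights, theta=2)
groups = multi_label_propagation(weights, theta=2)
```

Synchronous updates of this kind can oscillate on some graphs (the star graph above is one such case), which is why the sketch also caps the number of rounds with `max_iter`.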
Step 3: Apply hierarchical clustering to the overlapping groups to generate the result clusters.
3.1. Compute the similarity between overlapping groups.
3.2. Treat each overlapping group as a separate cluster and, in each iteration, merge the two most similar clusters; stop once the clusters have been merged into the k result clusters specified by the user.
The hierarchical clustering algorithm is specified as follows:
Algorithm 2: HierarchicalClustering
Input: the overlapping groups C = {C_1, C_2, ..., C_m} and the number of result clusters k
Output: the result clusters C = {C_1, C_2, ..., C_k}
Algorithm 2 describes the execution flow of hierarchical clustering. The algorithm first treats each overlapping group as a separate result cluster. In each iteration it finds the two most similar clusters and merges them, as shown in steps ② to ④ of the algorithm. The iteration continues until k result clusters remain.
Taking the overlapping groups of Fig. 4 as an example, when the number of result clusters k is 2, hierarchical clustering stops once two result clusters have been obtained. As shown in Fig. 5, the schema graph is then divided into 2 result clusters, and relational table Supply is the overlap of the two clusters.
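The merging loop of Algorithm 2 can be sketched as follows. The group similarity of Definition 5 is not reproduced in this text, so the sketch assumes the average pairwise table similarity between two groups, with 0 for pairs that share no edge. The group memberships and similarity values below are hypothetical and are chosen only so that the outcome resembles the example, with Supply as the overlap.

```python
from itertools import combinations


def group_similarity(ci, cj, sim):
    """Average pairwise table similarity between two groups (assumed form)."""
    total = sum(sim.get(frozenset((u, v)), 0.0) for u in ci for v in cj if u != v)
    return total / (len(ci) * len(cj))


def hierarchical_clustering(groups, k, sim):
    """Merge the two most similar clusters until only k clusters remain."""
    clusters = [set(g) for g in groups]
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: group_similarity(clusters[p[0]], clusters[p[1]], sim),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Hypothetical overlapping groups and table similarities.
groups = [
    {"Customer", "Order", "ZipCode", "Supply"},
    {"Order", "ZipCode", "Company"},
    {"Supply", "Product", "ProductCategory"},
]
sim = {
    frozenset(("Customer", "Order")): 0.9,
    frozenset(("Customer", "ZipCode")): 0.7,
    frozenset(("Order", "Company")): 0.6,
    frozenset(("ZipCode", "Company")): 0.5,
    frozenset(("Order", "Supply")): 0.3,
    frozenset(("Supply", "Product")): 0.8,
    frozenset(("Product", "ProductCategory")): 0.9,
}
result = hierarchical_clustering(groups, k=2, sim=sim)
```

Because merged clusters are set unions, a table that sat in the overlap of two merged groups appears once in the merged cluster, while a table shared with an unmerged group remains in both result clusters.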
Step 4: Select a topic table for each result cluster and return the final schema summary to the user.
4.1. Measure the importance of each relational table using the primary and foreign key information, attribute information and tuple information of the table. Table 1 lists the computed importance of some of the relational tables in the example database.
Table 1. Importance values computed for relational tables of the example database
Ranking | Relational table | Importance |
---|---|---|
1 | Company | 189.35 |
2 | Order | 183.28 |
3 | Customer | 116.54 |
4 | Product | 101.07 |
The algorithm for computing table importance is specified as follows:
Algorithm 3: TableImportance, computing the importance of relational tables
Input: label graph G
Output: table importance vector I
The algorithm describes how table importance is computed. First, the information content of each relational table is computed from its primary and foreign key information, attribute information and tuple information, and this information content is used as the initial value of the random walk. Next, the transition probabilities between tables are computed from the foreign key references between them, and information is repeatedly sent and received along the edges of the graph according to these probabilities until the random process converges to a stationary distribution. Finally, the information value of each relational table in the stationary distribution is defined as the importance of that table.
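The two ingredients of this computation, attribute entropy and the random walk to a stationary distribution, can be sketched as below. The formulas for information content and transition probability are not reproduced in this text, so the sketch takes both as inputs; the base-2 logarithm, the function names and the two-table example values are assumptions.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy of an attribute column (log base 2 assumed here)."""
    n = len(column)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def table_importance(info, transition, tol=1e-9, max_iter=10_000):
    """Random walk by power iteration.

    info:       {table: information content}, the initial values of the walk.
    transition: {R: {S: P[R -> S]}}, each row summing to 1.
    Returns the information held by each table once the walk is stationary.
    """
    current = dict(info)
    for _ in range(max_iter):
        nxt = {table: 0.0 for table in current}
        for r, row in transition.items():
            for s, p in row.items():
                nxt[s] += current[r] * p  # R sends a share p of its information to S
        if max(abs(nxt[t] - current[t]) for t in current) < tol:
            return nxt
        current = nxt
    return current


# Hypothetical two-table example.
info = {"A": 2.0, "B": 1.0}
transition = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.25, "B": 0.75}}
importance = table_importance(info, transition)
```

In this example the total information (3.0) is conserved by every step, and the walk settles with B holding twice as much as A, since B retains more of its information in each round than A does.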
4.2. Select the most important relational table in each result cluster as that cluster's topic table, and return the final schema summary to the user.
Taking the result clusters of Fig. 5 as an example, the most important relational table in each result cluster is chosen as its topic table: the most important table in cluster 1 is Company, and the most important table in cluster 2 is Product. Fig. 6 is the automatically generated overlapping schema summary diagram, where Fig. 6(a) and (b) show the clustering result mapped back onto the relational database.
Claims (1)
1. A method for generating overlapping database schema summaries based on multi-label propagation, characterised in that the method comprises:
Step 1. Map the database schema to a weighted multi-label graph;
1.1. Map the database schema to a multi-label graph,
Definition 1: A relational database schema can be mapped to a multi-label graph, represented by a triple G = (V, E, L_M), where:
① V is the set of relational table nodes in the database, and v ∈ V is one such node;
② E is the set of foreign key relationships in the database, and e ∈ E is one such relationship;
③ L_M is a label mapping function that maps a node to one or more labels, where a label is written (c, b), c is a result cluster identifier, and b is the membership degree, i.e. the strength with which relational table v belongs to result cluster c;
1.2. Compute the similarity between the two relational tables joined by each edge of the multi-label graph, and use it as the edge weight;
1.2.1. Use the vector space model to compute the text similarity of the table names and attribute names, which serves as the name similarity of the tables;
1.2.2. Use the Jaccard coefficient to analyse the similarity of the values in the tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of those pairs as the value similarity of the tables;
1.2.3. Compute the mapping relationship similarity of the tables by analysing the cardinality ratio between them,
Definition 2: The mapping relationship similarity between relational tables R and S, denoted Sim_m(R, S), is defined as follows:
where:
① τ denotes the tuples of a relational table;
② fan(τ_i) is the fan-out of tuple τ_i along the connecting edge e; fan-out is defined on the number of connecting edges between tuples and gives the number of distinct tuples that a given tuple can join with;
③ q_i is the number of tuples in relational table R that satisfy fan(τ_i) > 0;
1.2.4. Based on the three similarity features of steps 1.2.1 to 1.2.3, compute the table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph;
Step 2. Cluster the multi-label graph with the multi-label propagation algorithm to generate overlapping groups;
2.1. Determine the parameter θ of the multi-label propagation algorithm, where θ is the maximum number of labels a node may carry; if the user specifies k as the number of result clusters in the final schema summary, θ is tried at each value from k-1 to k+3, and the θ finally chosen is the one that maximises the internal cluster similarity of the overlapping groups produced by multi-label propagation, the internal cluster similarity being defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into overlapping groups C = {C_1, C_2, ..., C_m}; the internal cluster similarity of the propagation result C is then:
where:
① Sim(v_i, v_j) is the similarity between relational tables v_i and v_j;
② |C_i| is the number of relational tables in C_i;
2.2. Give every node in the label graph one unique label, whose cluster identifier is the node's table name and whose membership degree is 1;
2.3. In each iteration, add the labels of all of a node's neighbours to the node's own labels, weighted by membership degree and edge weight, and then normalise so that the node's membership degrees sum to 1,
Definition 4: The normalisation function b_x(c, v_i) gives, in the x-th iteration, the membership degree b of community identifier c in the labels of relational table v_i:
where:
① N(v_i) is the set of all tables adjacent to relational table v_i;
② the weight of edge (v_i, v_j), which scales each neighbour's contribution;
2.4. Delete every label whose membership degree is below 1/θ;
2.5. Stop iterating when the minimum number of nodes marked by any cluster identifier no longer changes; suppose m cluster identifiers remain after the iteration ends; the nodes carrying identifier c_m are assigned to group C_m, so the multi-label graph is divided into m groups C = {C_1, C_2, ..., C_m} that may overlap;
2.6. Repeat steps 2.2 to 2.5 for each value of θ, and take the set of overlapping groups with the largest internal cluster similarity as the result of multi-label propagation;
Step 3. Apply hierarchical clustering to the overlapping groups to generate the result clusters;
3.1. Compute the similarity between overlapping groups,
Definition 5: Let C_i and C_j be two overlapping groups obtained by multi-label propagation clustering; the similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relational tables v_i and v_j; if there is no edge between two tables, their similarity is 0;
3.2. Treat each overlapping group as a separate cluster and, in each iteration, merge the two most similar clusters; stop once the clusters have been merged into the k result clusters specified by the user;
Step 4. Select a topic table for each result cluster and return the final schema summary to the user;
4.1. Compute the importance of each relational table;
4.1.1. Compute the information content of each relational table,
Definition 6: Let R.A denote attribute A of relational table R; the information entropy of this attribute is defined as:
H(R.A) = -Σ_{i=1..h} p_i · log p_i
where h is the number of distinct values of attribute A, the values of A form the set R.A = {a_1, ..., a_h} of h distinct values, and p_i is the probability that a_i occurs;
Define 7:The information content of relation table R is defined as:
Wherein, | R | indicate the tuple number in R;
Transition probability between 4.1.2, calculated relationship table,
Define 8:By taking relation table R and relation table S as an example, the definition of probability that S is transferred to by R is as follows:
Wherein:
1. .R.A-S.B indicates the foreign key reference between the A attributes and the B attributes of relation table S of relation table R;
2. is for arbitrary the attribute A ', q in RA′Indicate that R.A ' goes up all external key linking numbers;
4.3rd, using random walk model, using the information content of relation table as the initial value of random walk, between relation table
Transition probability of the transition probability as random walk, information content distribution when model reaches stable state are the importance of relation table;
4.4. In each result class, select the relation table with the highest importance as the topic table of that class, and return the final schema summary to the user.
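The selection in step 4.4 amounts to taking a maximum per class, as in this short sketch. The function name `choose_topic_tables` and the alphabetical tie-break are assumptions added for illustration; the text does not say how ties are resolved.

```python
def choose_topic_tables(result_classes, importance):
    """Return, for each result class, its most important relation table.

    Tables are sorted first so that ties are broken alphabetically
    (an assumed rule, chosen only to make the result deterministic).
    """
    return [
        max(sorted(cls), key=lambda t: importance[t])
        for cls in result_classes
    ]
```

Since result classes may overlap, the same table could in principle be chosen for more than one class; this sketch does not try to prevent that.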
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510464314.1A CN105138588B (en) | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510464314.1A CN105138588B (en) | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105138588A (en) | 2015-12-09 |
CN105138588B (en) | 2018-09-28 |
Family
ID=54723937
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201510464314.1A (CN105138588B (en)) | Expired - Fee Related | 2015-07-31 | 2015-07-31 | A kind of database overlap scheme abstraction generating method propagated based on multi-tag |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105138588B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402932B (en) * | 2016-05-20 | 2021-04-13 | 腾讯科技(深圳)有限公司 | Expansion processing method of user tag, text recommendation method and text recommendation device |
CN106991614A (en) * | 2017-03-02 | 2017-07-28 | 南京信息工程大学 | The parallel overlapping community discovery method propagated under Spark based on label |
CN108052587B (en) * | 2017-12-11 | 2021-11-05 | 成都逸重力网络科技有限公司 | Big data analysis method based on decision tree |
CN107992590B (en) * | 2017-12-11 | 2021-11-05 | 成都逸重力网络科技有限公司 | Big data system beneficial to information comparison |
CN107992608B (en) * | 2017-12-15 | 2021-07-02 | 南开大学 | SPARQL query statement automatic generation method based on keyword context |
CN110309419A (en) * | 2018-05-14 | 2019-10-08 | 桂林远望智能通信科技有限公司 | A kind of overlapping anatomic framework method for digging and device propagated based on balance multi-tag |
CN108804582A (en) * | 2018-05-24 | 2018-11-13 | 天津大学 | Method based on the chart database optimization of complex relationship between big data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118249A (en) * | 2010-12-22 | 2011-07-06 | 厦门柏事特信息科技有限公司 | Photographing and evidence-taking method based on digital digest and digital signature |
CN102254022A (en) * | 2011-07-27 | 2011-11-23 | 河海大学 | Method for sharing metadata of information resources of various data types |
CN104036051A (en) * | 2014-07-04 | 2014-09-10 | 南开大学 | Database mode abstract generation method based on label propagation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7640160B2 (en) * | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
Non-Patent Citations (1)
Title |
---|
"Summarizing Relational Database Schema Based on Label Propagation"; Xiaojie Yuan et al.; 《APWeb 2014, LNCS 8709》; 20141231; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN105138588A (en) | 2015-12-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105138588B (en) | A kind of database overlap scheme abstraction generating method propagated based on multi-tag | |
CN106919689B (en) | Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge | |
Lin et al. | CK-LPA: Efficient community detection algorithm based on label propagation with community kernel | |
CN104036051B (en) | A kind of database schema abstraction generating method propagated based on label | |
CN112463980A (en) | Intelligent plan recommendation method based on knowledge graph | |
CN105843799B (en) | A kind of academic paper label recommendation method based on multi-source heterogeneous information graph model | |
CN105893585B (en) | A kind of bigraph (bipartite graph) model academic paper recommended method of combination tag data | |
CN104346698B (en) | Based on the analysis of the food and drink member big data of cloud computing and data mining and checking system | |
CN105159971B (en) | A kind of cloud platform data retrieval method | |
CN104700190A (en) | Method and device for matching item and professionals | |
Zhang et al. | A survey of key technologies for high utility patterns mining | |
Dev et al. | Recommendation system for big data applications based on set similarity of user preferences | |
CN109871470A (en) | A kind of grid equipment data label management system and implementation method | |
Uotila et al. | MultiCategory: multi-model query processing meets category theory and functional programming | |
Zhuge et al. | Automatic maintenance of category hierarchy | |
CN104317853B (en) | A kind of service cluster construction method based on Semantic Web | |
CN106651461A (en) | Film personalized recommendation method based on gray theory | |
CN106055690B (en) | A kind of quick-searching based on attributes match and acquisition data characteristics method | |
Dong et al. | ETTA-IM: A deep web query interface matching approach based on evidence theory and task assignment | |
CN103150371B (en) | Forward and reverse training goes to obscure text searching method | |
CN110706049A (en) | Data processing method and device | |
CN103164499A (en) | Order clustering method during product planning | |
Li et al. | Cost-based query optimization for XPath | |
Yin et al. | Heterogeneous information network model for equipment-standard system | |
CN104102654A (en) | Vocabulary clustering method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180928; Termination date: 20210731 |