CN105138588B - A database overlapping schema summary generation method based on multi-label propagation - Google Patents


Info

Publication number: CN105138588B
Authority: CN (China)
Prior art keywords: relation table; similarity; tag; database; label
Legal status: Expired - Fee Related (as listed by Google; not a legal conclusion)
Application number: CN201510464314.1A
Other languages: Chinese (zh)
Other versions: CN105138588A
Inventors: 袁晓洁 (Yuan Xiaojie), 于漫 (Yu Man), 王超 (Wang Chao), 靳宇东 (Jin Yudong), 温延龙 (Wen Yanlong)
Current assignee: Nankai University (as listed by Google; may be inaccurate)
Original assignee: Nankai University
Priority date: 2015-07-31
Filing date: 2015-07-31
Publication date: 2018-09-28
Application filed by Nankai University
Priority to CN201510464314.1A
Publication of CN105138588A
Application granted
Publication of CN105138588B
Current legal status: Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/26 Visual data mining; Browsing structured data

Abstract

A database overlapping schema summary generation method based on multi-label propagation. The method comprises: mapping database schema information to a multi-label graph model; clustering the schema information with a multi-label propagation algorithm to produce overlapping groups; clustering the overlapping groups with a hierarchical clustering algorithm to produce result classes of a suitable size; and finally selecting a subject table for each result class based on information entropy and a random walk model, yielding the final overlapping database schema summary. The overlapping schema summary generation scheme proposed by the invention can provide users with more accurate and meaningful overlapping database schema summaries, helping them understand database information quickly.

Description

A database overlapping schema summary generation method based on multi-label propagation
Technical field
The invention belongs to the field of database technology and specifically relates to a novel technique for generating overlapping schema summaries of relational databases.
Background
With the spread of computers and the rapid development of information technology, large volumes of data have brought database technology into wide use, and database applications have begun to reach ordinary users. Modern databases, however, are often very large and complex, and to write appropriate Structured Query Language (SQL) queries a user must have some understanding of the database's schema information. The schemas of large databases are usually complex as well, and documentation is commonly missing, which makes the schema even harder for users to understand.
Schema summarization can effectively solve this problem by giving users a concise summary of the database schema, which improves database usability. Existing schema summarization solutions focus only on non-overlapping summaries, i.e., each relation table may belong to only one topic class of the summary. In practice, however, a relation table often carries several meanings and belongs to several topic classes, so considering only the non-overlapping case can produce incomplete summaries and may even mislead users.
Because non-overlapping schema summaries often cannot fully meet user needs, overlapping schema summarization can generate more reasonable database schema summary information, effectively reducing the time and effort users spend understanding a database schema, and has broad prospects for engineering application.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by proposing a method for automatically generating overlapping database schema summaries based on multi-label propagation.
The method provided by the invention introduces the concept of an overlapping schema summary; designs a new multi-label database schema graph model; clusters the database schema with a multi-label propagation algorithm followed by a hierarchical clustering algorithm; and finally selects a subject table for each resulting class, returning an overlapping schema summary to the user. The steps of the method are as follows:
Step 1. Map the database schema to a weighted multi-label graph.
1.1 Map the database schema to a multi-label graph.
Definition 1: A relational database schema can be mapped to a multi-label graph represented by a triple G = (V, E, L_M), where:
① V is the set of relation-table nodes in the database, and v ∈ V denotes a relation-table node;
② E is the set of foreign-key relationships in the database, and e ∈ E denotes a foreign-key relationship;
③ L_M is a label mapping function that maps each node to one or more labels. A label is written (c, b), where c is a result-class identifier and b is the label's membership degree, i.e., the strength with which relation table v belongs to result class c.
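Definition 1 maps naturally onto a small in-memory structure. The Python sketch below is illustrative only: the class `MultiLabelGraph` and its methods are hypothetical names, foreign-key edges are assumed undirected and keyed by the table pair, and L_M(v) is stored as a dictionary from class identifier c to membership degree b.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLabelGraph:
    """Hypothetical container for G = (V, E, L_M) from Definition 1."""

    nodes: set = field(default_factory=set)     # V: relation-table names
    edges: dict = field(default_factory=dict)   # E: frozenset({R, S}) -> edge weight
    labels: dict = field(default_factory=dict)  # L_M: table -> {class id c: membership b}

    def add_table(self, name: str) -> None:
        self.nodes.add(name)
        # Initial labelling (step 2.2): one label named after the table, membership 1.
        self.labels[name] = {name: 1.0}

    def add_foreign_key(self, r: str, s: str, weight: float = 1.0) -> None:
        if r not in self.nodes or s not in self.nodes:
            raise KeyError("add both relation tables before linking them")
        self.edges[frozenset((r, s))] = weight

    def neighbors(self, v: str) -> dict:
        """Return {neighbouring table: edge weight} for table v."""
        out = {}
        for pair, w in self.edges.items():
            if v in pair and len(pair) == 2:
                (u,) = pair - {v}
                out[u] = w
        return out
```

The edge weight defaults to 1.0 here only as a placeholder until the similarity of step 1.2 is available.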
1.2 Compute the similarity between the two relation tables joined by each edge of the multi-label graph, and use it as the edge weight of the label graph.
1.2.1 Use the vector space model to compute the textual similarity of the relation tables' table names and attribute names; this is the name similarity of the relation tables.
1.2.2 Use the Jaccard coefficient to compute the value similarity of the relation tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of the best-matching attribute pairs as the value similarity of the relation tables.
1.2.3 By analyzing the cardinality between relation tables, compute their mapping-relationship similarity.
Definition 2: The mapping-relationship similarity between relation tables R and S, denoted Sim_m(R, S), is defined as follows:
Wherein:
① τ denotes the tuples of a relation table;
② fan(τ_i) is the fan-out degree of tuple τ_i on connecting edge e. The fan-out degree is defined over the links between tuples: it is the number of distinct tuples that a given tuple links to;
③ q_i is the number of tuples in relation table R that satisfy fan(τ_i) > 0;
1.2.4 Based on the three similarity features of steps 1.2.1 to 1.2.3, compute the relation-table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph.
Step 2. Cluster the multi-label graph with a multi-label propagation algorithm to produce overlapping groups.
2.1 Determine the parameter θ of the multi-label propagation algorithm, the maximum number of labels each node may carry. If the user specifies k as the number of result classes in the final schema summary, θ is tried over the values k-1 to k+3, and the θ that maximizes the intra-cluster similarity of the overlapping groups produced by multi-label propagation is selected. The intra-cluster similarity is defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into overlapping groups C = {C_1, C_2, ..., C_m}. The intra-cluster similarity of the propagation result C is:
Wherein:
① Sim(v_i, v_j) is the similarity between relation tables v_i and v_j;
② |C_i| is the number of relation tables in C_i;
2.2 Assign each node in the label graph a unique label whose class identifier is the node's relation-table name and whose membership degree is 1.
2.3 In each iteration, add the labels of all of a node's neighbors to the node's labels according to the membership degrees and edge weights, and normalize so that the node's membership degrees sum to 1.
Definition 4: The normalization function b_x(c, v_i) gives, at the x-th iteration, the mapping between community identifier c and its membership degree b in the labels of relation table v_i:
Wherein:
① N(v_i) is the set of all neighbor tables of relation table v_i;
② … denotes the weight of edge (v_i, v_j);
2.4 Delete labels whose membership degree is less than 1/θ.
2.5 Stop iterating when the number of nodes carrying the least-used class identifier no longer changes. Suppose m class identifiers remain after iteration; the nodes carrying identifier c_i are placed in group C_i, so the multi-label graph is partitioned into m possibly overlapping groups C = {C_1, C_2, ..., C_m}.
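Steps 2.2 to 2.5 can be sketched as a COPRA-style propagation loop. Because the formula of Definition 4 is not reproduced in this text, the sketch assumes the weighted form b_x(c, v_i) = Σ_{v_j ∈ N(v_i)} b_{x-1}(c, v_j)·w(v_i, v_j) / Σ_{v_j ∈ N(v_i)} w(v_i, v_j), where w is the edge weight from step 1.2.4. It also assumes, as COPRA does, that a node whose labels all fall below 1/θ keeps its strongest label. The function name `propagate_labels` is hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(adj: dict, theta: int, max_iter: int = 100) -> dict:
    """COPRA-style multi-label propagation (sketch).

    adj[v] = {u: w(v, u)} is a symmetric weighted adjacency map.
    Returns {class identifier: set of relation tables carrying it}.
    """
    if theta < 1:
        raise ValueError("theta must be at least 1")
    labels = {v: {v: 1.0} for v in adj}  # step 2.2: unique label per table
    prev_min = None
    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in adj.items():
            total_w = sum(nbrs.values())
            if total_w == 0:  # isolated table keeps its labels
                new_labels[v] = dict(labels[v])
                continue
            acc = defaultdict(float)
            for u, w in nbrs.items():  # step 2.3: weight the neighbours' memberships
                for c, b in labels[u].items():
                    acc[c] += b * w / total_w
            kept = {c: b for c, b in acc.items() if b >= 1.0 / theta}  # step 2.4
            if not kept:  # every label fell below 1/theta: keep the strongest one
                best = max(acc, key=acc.get)
                kept = {best: acc[best]}
            norm = sum(kept.values())  # renormalize so memberships sum to 1
            new_labels[v] = {c: b / norm for c, b in kept.items()}
        labels = new_labels
        # step 2.5: stop once the size of the least-used label no longer changes
        cur_min = min(Counter(c for labs in labels.values() for c in labs).values())
        if cur_min == prev_min:
            break
        prev_min = cur_min
    groups = defaultdict(set)
    for v, labs in labels.items():
        for c in labs:
            groups[c].add(v)
    return dict(groups)
```

Labels only travel along edges, so a group never spans two disconnected parts of the schema, and each node ends with at most θ labels because every surviving membership is at least 1/θ while the memberships sum to 1.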
2.6 Repeat steps 2.2 to 2.5 for each candidate value of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation.
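Step 2.6 then scores each candidate run. The exact intra-cluster similarity of Definition 3 is not reproduced here, so this sketch assumes one plausible reading: the mean pairwise Sim inside each group, averaged over groups, with singleton groups scoring 0 and unconnected pairs counting as 0 (as stated for Definition 5). `intra_cluster_similarity` and `choose_theta` are hypothetical names.

```python
from itertools import combinations


def intra_cluster_similarity(groups: list, sim: dict) -> float:
    """Assumed reading of Definition 3: mean pairwise Sim inside each group,
    averaged over groups; singleton groups score 0, missing edges count as 0."""
    if not groups:
        return 0.0
    scores = []
    for g in groups:
        pairs = list(combinations(sorted(g), 2))
        if not pairs:
            scores.append(0.0)
            continue
        scores.append(sum(sim.get(frozenset(p), 0.0) for p in pairs) / len(pairs))
    return sum(scores) / len(scores)


def choose_theta(results_by_theta: dict, k: int, sim: dict) -> int:
    """Among theta = k-1 .. k+3 (at least 1), pick the run with the highest score."""
    candidates = [t for t in range(max(1, k - 1), k + 4) if t in results_by_theta]
    if not candidates:
        raise ValueError("no propagation result for theta in [k-1, k+3]")
    return max(candidates, key=lambda t: intra_cluster_similarity(results_by_theta[t], sim))
```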
Step 3. Apply hierarchical clustering to the overlapping groups to produce result classes.
3.1 Compute the similarity between overlapping groups.
Definition 5: Let C_i and C_j be two overlapping groups produced by multi-label propagation clustering. The similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relation tables v_i and v_j; if two tables are not joined by an edge, their similarity is 0.
3.2 Treat each overlapping group as a separate class; in each iteration, merge the two classes with the largest similarity, and stop once the k result classes specified by the user remain.
Step 4. Select a subject table for each result class and return the final schema summary to the user.
4.1 Compute the importance of the relation tables.
4.1.1 Compute the information content of each relation table.
Definition 6: Denote attribute A of relation table R by R.A. The information entropy of this attribute is defined as:
$H(R.A) = -\sum_{i=1}^{h} p_i \log p_i$
where h is the number of distinct values of attribute A; if the values of A form a set of h distinct values R.A = {a_1, ..., a_h}, p_i denotes the probability that a_i occurs.
Definition 7: The information content of relation table R is defined as:
where |R| is the number of tuples in R.
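Definitions 6 and 7 can be computed directly from column values. The entropy follows Definition 6; because the formula of Definition 7 is not reproduced here, this sketch assumes the commonly used form IC(R) = log|R| + Σ_A H(R.A) with base-2 logarithms. Rows are represented as hypothetical dictionaries mapping attribute names to values, all with the same keys.

```python
import math
from collections import Counter


def entropy(column: list) -> float:
    """H(R.A) = -sum_i p_i log2 p_i over the distinct values of one attribute."""
    n = len(column)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def information_content(rows: list) -> float:
    """Assumed IC(R) = log2|R| + sum of attribute entropies (not confirmed by the text)."""
    if not rows:
        return 0.0
    attrs = rows[0].keys()
    return math.log2(len(rows)) + sum(entropy([r[a] for r in rows]) for a in attrs)
```

A column with one repeated value contributes 0 bits, while a column with |R| distinct values contributes log2|R| bits, so key-like columns raise a table's information content the most.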
4.1.2 Compute the transition probabilities between relation tables.
Definition 8: Taking relation tables R and S as an example, the probability of transferring from R to S is defined as:
Wherein:
① R.A-S.B denotes a foreign-key reference between attribute A of relation table R and attribute B of relation table S;
② for any attribute A′ of R, q_{A′} is the total number of foreign-key links on R.A′;
4.3 Apply a random walk model: use the information content of each relation table as the initial value of the random walk and the transition probabilities between relation tables as its transition probabilities; the distribution of information content when the model reaches a steady state gives the importance of each relation table.
4.4 Select the most important relation table in each result class as that class's subject table, and return the final schema summary to the user.
Advantages of the invention:
The invention proposes a new method for mapping a database schema to a multi-label graph, in which the class information of each relation table is stored as labels and the final clustering of the schema summary is determined by membership degrees. Building on an in-depth analysis of graph-based multi-label propagation, it proposes a model for automatically generating schema summaries based on multi-label propagation. Compared with traditional models, this model inherits the advantages of multi-label propagation: it automatically generates schema summaries with overlapping parts and achieves higher clustering precision, helping users search databases quickly.
Description of the drawings
Fig. 1 is the overall flowchart of the method;
Fig. 2 is the original relational database schema diagram;
Fig. 3 shows the multi-label graph corresponding to the example relational database;
Fig. 4 shows the overlapping groups produced by multi-label propagation clustering;
Fig. 5 shows the result classes produced by hierarchical clustering;
Fig. 6 shows the schema summary results, where (a) and (b) are the database clustering diagrams corresponding to the schema summary and (c) is the schema summary diagram. Table 1 gives the computed importance of relation tables in the example database.
Detailed description of the embodiments
The process flow of the method is shown in Fig. 1.
The embodiment is described below; Fig. 2 shows its relational database schema. The schema summary produced by the overlapping schema summary generation method is shown in Fig. 6, where Fig. 6(c) is the overlapping schema summary diagram, which helps users make sense of complex schema relationships. Users can also inspect any part of the summary diagram in detail; the expanded views are shown in Fig. 6(a) and (b). The specific steps of the method are described below with reference to the embodiment of Fig. 2:
Step 1: Map the database schema to a weighted multi-label graph.
1.1 Map the database schema to a multi-label graph.
First, traverse the schema information of the relational database and formally define it as a multi-label graph represented by the triple G = (V, E, L_M), where V is the set of relation-table nodes in the database and v ∈ V denotes a relation-table node; E is the set of foreign-key relationships and e ∈ E denotes a foreign-key relationship; and L_M is a label mapping function that maps each node to one or more labels, each label (c, b) consisting of a result-class identifier c and a membership degree b that expresses how strongly relation table v belongs to result class c. Fig. 3 shows the multi-label graph corresponding to the example relational database of Fig. 2. Initially, each relation table in the multi-label graph carries a single unique label whose identifier is the table name and whose membership degree is 1.
1.2 Compute the weights of the multi-label graph as follows:
1.2.1 Use the vector space model to compute the textual similarity of the relation tables' table names and attribute names; this is the name similarity of the relation tables.
First, treat each relation table as a text made up of its word-segmented table name and attribute names. Taking the ProductCategory table in Fig. 2 as an example, its table name and attribute names segment into the morphemes Product, Category, ID and Type, where Category occurs three times in the text representing ProductCategory and Product, ID and Type each occur once. Next, treat the whole relational database as a text collection made up of the segmented morphemes, and map each relation table's name information to a space vector by computing the weight of each of its morphemes in the text collection. Finally, compute the angle between two such vectors with the vector space model; this gives the name similarity of the two relation tables.
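A minimal sketch of this name-similarity step, assuming CamelCase splitting stands in for word segmentation and TF-IDF with a smoothed IDF stands in for the unspecified morpheme weighting; similarity is the cosine of the angle between vectors. Product's attributes are taken from the example in step 1.2.2; the Customer table and its attributes are hypothetical.

```python
import math
import re
from collections import Counter


def morphemes(table: str, attributes: list) -> list:
    """Split CamelCase names into words, e.g. CategoryID -> Category, ID."""
    words = []
    for name in [table, *attributes]:
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return words


def tfidf_vectors(docs: dict) -> dict:
    """Weight each morpheme by term frequency x smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    return {
        t: {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in Counter(words).items()}
        for t, words in docs.items()
    }


def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse vectors (0 if either is empty)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = {
    "Product": morphemes("Product", ["ProductID", "ProducName", "CategoryID"]),
    "ProductCategory": morphemes("ProductCategory", ["CategoryID", "CategoryType"]),
    "Customer": morphemes("Customer", ["CustomerID", "Name"]),  # hypothetical table
}
vectors = tfidf_vectors(docs)
```

Here `morphemes("ProductCategory", ["CategoryID", "CategoryType"])` yields Category three times and Product, ID and Type once each, matching the counts in the text.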
1.2.2 Use the Jaccard coefficient to compute the value similarity of the relation tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of the best-matching attribute pairs as the value similarity of the relation tables.
The pseudocode for finding the best-matching attribute pairs is as follows:
Algorithm 1: GreedyMatching, finding the best-matching attribute pairs
Input: relation tables R and S, and the set P of precomputed similarities of attribute pairs of R and S
Output: the set Z of best-matching attribute pairs
Taking the Product and ProductCategory tables in Fig. 2 as an example, first compute the attribute similarities between the two tables: J(Product.ProductID, ProductCategory.CategoryID) = 0.1, J(Product.ProducName, ProductCategory.CategoryType) = 0.05 and J(Product.CategoryID, ProductCategory.CategoryID) = 0.8; all other attribute pairs have similarity 0. Algorithm 1 selects the best-matching attribute pairs (Product.CategoryID, ProductCategory.CategoryID) and (Product.ProducName, ProductCategory.CategoryType). Therefore the value similarity between Product and ProductCategory is Sim_v(Product, ProductCategory) = (0.8 + 0.05)/2 = 0.425.
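The worked example above can be reproduced with a short greedy matcher. Algorithm 1's pseudocode body is not reproduced in this text, so this Python sketch assumes the usual greedy rule: take attribute pairs in descending similarity and accept a pair only if its similarity is positive and neither attribute has been matched yet. Function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient of the value sets of two attribute columns."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_matching(P: dict) -> list:
    """P maps (R attribute, S attribute) -> similarity; each attribute is used at most once."""
    used_r, used_s, Z = set(), set(), []
    for (r_attr, s_attr), sim in sorted(P.items(), key=lambda kv: kv[1], reverse=True):
        if sim > 0 and r_attr not in used_r and s_attr not in used_s:
            Z.append((r_attr, s_attr, sim))
            used_r.add(r_attr)
            used_s.add(s_attr)
    return Z


def value_similarity(P: dict) -> float:
    """Average similarity of the best-matching attribute pairs (0 if none)."""
    Z = greedy_matching(P)
    return sum(sim for _, _, sim in Z) / len(Z) if Z else 0.0


# Non-zero similarities from the example (R = Product, S = ProductCategory).
P = {
    ("ProductID", "CategoryID"): 0.1,
    ("ProducName", "CategoryType"): 0.05,
    ("CategoryID", "CategoryID"): 0.8,
}
```

The pair (ProductID, CategoryID) is skipped because ProductCategory.CategoryID is already matched at 0.8, which leaves (0.8 + 0.05)/2 = 0.425 as in the text.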
1.2.3 The fan-out degree of a tuple is defined over the links between tuples and gives the number of distinct tuples that a given tuple links to; the mapping-relationship similarity of relation tables is expressed by a linear function proportional to tuple fan-out.
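The fan-out quantities used by Definition 2 can be counted from tuple-level foreign-key links. Since the formula that turns them into Sim_m(R, S) is not reproduced here, this sketch stops at fan(τ_i), q and the mean fan-out over tuples with fan(τ_i) > 0; that mean is only a hypothetical summary statistic, not the patent's similarity. Links are assumed to be (R-tuple id, S-tuple id) pairs along one edge e, and the function names are hypothetical.

```python
from collections import defaultdict


def fan_outs(links: list) -> dict:
    """fan(tau): number of distinct S-tuples each R-tuple links to along edge e."""
    targets = defaultdict(set)
    for r_tuple, s_tuple in links:
        targets[r_tuple].add(s_tuple)
    return {t: len(s) for t, s in targets.items()}


def fan_out_summary(r_tuples: list, links: list) -> tuple:
    """Return (fan-out per R-tuple, q = #tuples with fan > 0, mean fan-out over those q)."""
    fan = fan_outs(links)
    per_tuple = {t: fan.get(t, 0) for t in r_tuples}
    q = sum(1 for f in per_tuple.values() if f > 0)
    mean_fan = sum(per_tuple.values()) / q if q else 0.0
    return per_tuple, q, mean_fan
```

Duplicate links between the same two tuples are counted once, because fan-out is defined over distinct target tuples.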
1.2.4 Finally, based on the three relation-table similarity features above, a multiple linear regression model that jointly considers each feature computes the relation-table similarity, which is used as the weight of the multi-label graph. First, the name similarity, value similarity and mapping-relationship similarity between relation tables are normalized so that the data fall in the range 0 to 1. Then the multiple linear regression model is applied. The invention considers that the influence of the name factor, the value factor and the mapping-relationship factor on relation-table similarity decreases in that order, so the parameters α, β, γ and δ of the algorithm are set to 6.4, 4.8, 2.0 and 0.2 respectively, such that Sim(R, S) ∈ [0, 1].
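The normalization and weighting of step 1.2.4 might look like the sketch below. The text gives α, β, γ, δ = 6.4, 4.8, 2.0, 0.2 for three features but not the formula, so this sketch assumes δ is an intercept and divides by α + β + γ + δ so that features in [0, 1] yield Sim(R, S) in [0, 1]; both choices are assumptions, and the function names are hypothetical.

```python
def min_max(scores: dict) -> dict:
    """Rescale one similarity feature to [0, 1] across all table pairs."""
    lo, hi = min(scores.values()), max(scores.values())
    return {pair: (s - lo) / (hi - lo) if hi > lo else 0.0 for pair, s in scores.items()}


# Parameter values from the embodiment; how they combine is assumed below.
ALPHA, BETA, GAMMA, DELTA = 6.4, 4.8, 2.0, 0.2


def combined_similarity(name_sim: float, value_sim: float, map_sim: float) -> float:
    """Assumed combination: weighted sum plus intercept, divided by the weight total."""
    raw = ALPHA * name_sim + BETA * value_sim + GAMMA * map_sim + DELTA
    return raw / (ALPHA + BETA + GAMMA + DELTA)
```

Dividing by the weight total keeps the ordering name > value > mapping stated in the text while guaranteeing the result stays within [0, 1].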
Step 2: Cluster the multi-label graph with a multi-label propagation algorithm to produce overlapping groups.
2.1 Determine the parameter θ of the multi-label propagation algorithm, the maximum number of labels each node may carry. If the user specifies k as the number of result classes in the final schema summary, θ is tried over the values k-1 to k+3.
For the example database of Fig. 2, with the specified number of result classes k = 2, multi-label propagation is run with θ = 1, 2, 3, 4 and 5 in turn.
2.2 Assign each node in the label graph a unique label whose class identifier is the node's relation-table name and whose membership degree is 1.
2.3 In each iteration, add the labels of a node's neighbors to the node's labels according to their membership degrees, and normalize so that the node's membership degrees sum to 1.
2.4 So that nodes can retain multiple labels without every node ending up with all labels, the algorithm computes the membership degree of each label and deletes those below a given threshold, here 1/θ.
2.5 After several rounds of iteration, stop when the number of nodes carrying the least-used class identifier no longer changes; the relation tables carrying the same identifier label then form one overlapping group.
2.6 Repeat steps 2.2 to 2.5 for different values of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation.
For the example database of Fig. 2, with k = 2, multi-label propagation is run with θ = 1, 2, 3, 4 and 5, giving five sets of results; computation shows that the intra-cluster similarity of the result is largest when θ = 3. Fig. 4 shows the overlapping groups produced by multi-label propagation with θ = 3: groups 1 and 2 overlap in relation tables ZipCode and Order, and groups 1 and 3 overlap in relation table Supply.
Step 3: Apply hierarchical clustering to the overlapping groups to produce result classes.
3.1 Compute the similarity between overlapping groups.
3.2 Treat each overlapping group as a separate class; in each iteration, merge the two classes with the largest similarity, and stop once the k result classes specified by the user remain.
The pseudocode of the hierarchical clustering algorithm is as follows:
Algorithm 2: HierarchicalClustering
Input: the overlapping group partition C = {C_1, C_2, ..., C_m} and the number of result classes k
Output: the result-class partition C = {C_1, C_2, ..., C_k}
Algorithm 2 describes the hierarchical clustering procedure. It first treats each overlapping group as a separate result class; in each iteration it finds and merges the two most similar classes (steps ② to ④ of the algorithm), and it keeps iterating until k result classes remain.
For the example overlapping groups of Fig. 4, with k = 2, hierarchical clustering stops after obtaining two result classes. As shown in Fig. 5, the schema is then divided into 2 result classes, with relation table Supply as the overlap between the two classes.
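Algorithm 2 can be sketched as agglomerative clustering over the overlapping groups. Definition 5's formula is not reproduced in this text, so the sketch assumes an average-linkage reading: the mean Sim over pairs of distinct tables drawn from the two groups, with unconnected pairs counting as 0 as the definition states. Function names are hypothetical.

```python
def group_similarity(ci: set, cj: set, sim: dict) -> float:
    """Mean Sim over pairs of distinct tables (a in ci, b in cj); missing edges count as 0."""
    pairs = [(a, b) for a in ci for b in cj if a != b]
    if not pairs:
        return 0.0
    return sum(sim.get(frozenset((a, b)), 0.0) for a, b in pairs) / len(pairs)


def hierarchical_clustering(groups: list, k: int, sim: dict) -> list:
    """Repeatedly merge the two most similar classes until only k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    classes = [set(g) for g in groups]
    while len(classes) > k:
        i, j = max(
            ((a, b) for a in range(len(classes)) for b in range(a + 1, len(classes))),
            key=lambda ij: group_similarity(classes[ij[0]], classes[ij[1]], sim),
        )
        merged = classes[i] | classes[j]
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)] + [merged]
    return classes
```

Merging takes the union of the two groups, so a table shared by them appears once in the merged class while it can still belong to other classes; this is how an overlap such as Supply can survive into the final result classes.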
Step 4: Select a subject table for each result class and return the final schema summary to the user.
4.1 The importance of each relation table is measured from the primary/foreign-key information, attribute information and tuple information it contains. Table 1 lists the importance results computed for some of the relation tables in the example database.
Table 1 Importance results for relation tables in the example database

Rank | Relation table | Importance
1 | Company | 189.35
2 | Order | 183.28
3 | Customer | 116.54
4 | Product | 101.07
The pseudocode for computing relation-table importance is as follows:
Algorithm 3: TableImportance, computing relation-table importance
Input: label graph G
Output: relation-table importance vector I
The algorithm computes relation-table importance as follows. First, the information content of each relation table is computed from its primary/foreign-key information, attribute information and tuple information, and used as the initial value of the random walk. Next, the transition probabilities between relation tables are computed from the foreign-key references between them, and information is repeatedly sent and received along the edges of the graph according to these transition probabilities until the random process converges to a stationary distribution. Finally, the amount of information held by each relation table in the stationary distribution is taken as that table's importance.
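The random-walk part of Algorithm 3 amounts to repeatedly redistributing each table's information along the transition probabilities. Since Definition 8's formula is not reproduced here, this Python sketch assumes the transition rows (including any probability of staying at the same table) are already computed and each sums to 1; `stationary_importance` is a hypothetical name.

```python
def stationary_importance(initial: dict, trans: dict, tol: float = 1e-12,
                          max_iter: int = 10_000) -> dict:
    """Push information along transition probabilities until it stops changing.

    initial[R]  : information content of table R (start value of the walk)
    trans[R][S] : probability of moving from R to S; every row must sum to 1
    """
    tables = set(initial)
    if set(trans) != tables:
        raise ValueError("need exactly one transition row per table")
    for r, row in trans.items():
        if set(row) - tables or abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"bad transition row for {r}")
    x = dict(initial)
    for _ in range(max_iter):
        nxt = {t: 0.0 for t in x}
        for r, mass in x.items():
            for s, p in trans[r].items():
                nxt[s] += mass * p  # table r sends a share p of its information to s
        if sum(abs(nxt[t] - x[t]) for t in x) < tol:
            return nxt
        x = nxt
    return x
```

Because every row sums to 1, the total information is conserved. If the walk is periodic (for example, a bipartite schema with no self-transitions) it may oscillate instead of converging, which is why the sketch returns the last iterate after `max_iter` steps.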
4.2 Select the most important relation table in each result class as that class's subject table, and return the final schema summary to the user.
For the example result classes of Fig. 5, the most important relation table of each result class is chosen as its subject table: the most important table in class 1 is Company, and the most important table in class 2 is Product. Fig. 6 is the automatically generated overlapping schema summary diagram, and Fig. 6(a) and (b) show the clustering result mapped back onto the relational database.
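Step 4.2 itself is a per-class maximum over importance. The sketch below uses the importance values from Table 1; the class memberships are hypothetical, because the contents of Fig. 5 are not reproduced in this text, and `subject_tables` is a hypothetical name.

```python
def subject_tables(classes: dict, importance: dict) -> dict:
    """Return {class id: most important table in that class} (step 4.2)."""
    return {cid: max(tables, key=importance.__getitem__) for cid, tables in classes.items()}


# Importance values from Table 1; class memberships below are hypothetical.
importance = {"Company": 189.35, "Order": 183.28, "Customer": 116.54, "Product": 101.07}
classes = {1: {"Company", "Order", "Customer"}, 2: {"Product"}}
print(subject_tables(classes, importance))  # {1: 'Company', 2: 'Product'}
```

An overlapping table is considered in every class that contains it, so the same table could become the subject table of more than one class.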

Claims (1)

1. A database overlapping schema summary generation method based on multi-label propagation, characterized in that the method comprises:
Step 1. Map the database schema to a weighted multi-label graph;
1.1 Map the database schema to a multi-label graph.
Definition 1: A relational database schema can be mapped to a multi-label graph represented by a triple G = (V, E, L_M), where:
① V is the set of relation-table nodes in the database, and v ∈ V denotes a relation-table node;
② E is the set of foreign-key relationships in the database, and e ∈ E denotes a foreign-key relationship;
③ L_M is a label mapping function that maps each node to one or more labels; a label is written (c, b), where c is a result-class identifier and b is the label's membership degree, i.e., the strength with which relation table v belongs to result class c;
1.2 Compute the similarity between the two relation tables joined by each edge of the multi-label graph, and use it as the edge weight of the label graph;
1.2.1 Use the vector space model to compute the textual similarity of the relation tables' table names and attribute names, as the name similarity of the relation tables;
1.2.2 Use the Jaccard coefficient to compute the value similarity of the relation tables' attribute columns, find the best-matching attribute pairs with a greedy algorithm, and take the average value similarity of the best-matching attribute pairs as the value similarity of the relation tables;
1.2.3 By analyzing the cardinality between relation tables, compute their mapping-relationship similarity.
Definition 2: The mapping-relationship similarity between relation tables R and S, denoted Sim_m(R, S), is defined as follows:
Wherein:
① τ denotes the tuples of a relation table;
② fan(τ_i) is the fan-out degree of tuple τ_i on connecting edge e; the fan-out degree is defined over the links between tuples and is the number of distinct tuples that a given tuple links to;
③ q_i is the number of tuples in relation table R that satisfy fan(τ_i) > 0;
1.2.4 Based on the three similarity features of steps 1.2.1 to 1.2.3, compute the relation-table similarity with a multiple linear regression model, and use this similarity as the weight of the multi-label graph;
Step 2. Cluster the multi-label graph with a multi-label propagation algorithm to produce overlapping groups;
2.1 Determine the parameter θ of the multi-label propagation algorithm, the maximum number of labels each node may carry; if the user specifies k as the number of result classes in the final schema summary, θ is tried over the values k-1 to k+3, and the θ that maximizes the intra-cluster similarity of the overlapping groups produced by multi-label propagation is finally selected; the intra-cluster similarity is defined as follows:
Definition 3: Suppose multi-label propagation clusters the multi-label graph into overlapping groups C = {C_1, C_2, ..., C_m}; the intra-cluster similarity of the propagation result C is:
Wherein:
① Sim(v_i, v_j) is the similarity between relation tables v_i and v_j;
② |C_i| is the number of relation tables in C_i;
2.2 Assign each node in the label graph a unique label whose class identifier is the node's relation-table name and whose membership degree is 1;
2.3 In each iteration, add the labels of all of a node's neighbors to the node's labels according to the membership degrees and edge weights, and normalize so that the node's membership degrees sum to 1.
Definition 4: The normalization function b_x(c, v_i) gives, at the x-th iteration, the mapping between community identifier c and its membership degree b in the labels of relation table v_i:
Wherein:
① N(v_i) is the set of all neighbor tables of relation table v_i;
② … denotes the weight of edge (v_i, v_j);
2.4 Delete labels whose membership degree is less than 1/θ;
2.5 Stop iterating when the number of nodes carrying the least-used class identifier no longer changes; suppose m class identifiers remain after iteration ends; the nodes carrying identifier c_i are placed in group C_i, so that the multi-label graph is partitioned into m possibly overlapping groups C = {C_1, C_2, ..., C_m};
2.6 Repeat steps 2.2 to 2.5 for each value of θ, and select the set of overlapping groups with the largest intra-cluster similarity as the result of multi-label propagation;
Step 3. Apply hierarchical clustering to the overlapping groups to produce result classes;
3.1 Compute the similarity between overlapping groups.
Definition 5: Let C_i and C_j be two overlapping groups produced by multi-label propagation clustering; the similarity between C_i and C_j can be defined as:
where Sim(v_i, v_j) is the similarity between relation tables v_i and v_j; if two tables are not joined by an edge, their similarity is 0;
3.2 Treat each overlapping group as a separate class; in each iteration, merge the two classes with the largest similarity, and stop once the k result classes specified by the user remain;
4th, it is that each result class chooses subject heading list, final pattern abstract is returned into user;
4.1st, the importance of calculated relationship table;
The information content of 4.1.1, calculated relationship table,
Define 6:Attribute A in relation table R is denoted as R.A, the comentropy on the attribute is defined as:
Wherein, h indicates all numbers for differing value on attribute A;If the value on attribute A can be expressed as h different value Set R.A={ a1,...,ah, use piTo indicate aiThe probability of appearance;
Definition 7: The information content of relation table R is defined as:
where |R| is the number of tuples in R;
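The formula of Definition 7 is missing from the extracted text. Given that the definition refers to |R|, one common form in relational schema summarization, assumed here, is IC(R) = log|R| + Σ_A H(R.A), summing the entropy of every attribute of R. The sketch implements that assumed form; `entropy`, `information_content`, and the list-of-dicts table layout are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Base-2 (assumed) Shannon entropy of one column from relative value frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values()) if n else 0.0


def information_content(rows):
    """Assumed form of Definition 7: IC(R) = log2|R| + sum over attributes A of H(R.A).

    rows: list of dicts, one per tuple, all sharing the same attribute names.
    """
    if not rows:
        return 0.0
    attributes = rows[0].keys()
    return math.log2(len(rows)) + sum(entropy([r[a] for r in rows]) for a in attributes)
```

For a four-tuple table with a unique key (2 bits) and a two-valued column (1 bit), this gives 2 + 2 + 1 = 5.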
4.1.2. Compute the transition probabilities between relation tables.
Definition 8: Taking relation tables R and S as an example, the probability of transferring from R to S is defined as follows:
where:
① R.A-S.B denotes a foreign-key reference between attribute A of relation table R and attribute B of relation table S;
② for any attribute A′ of R, q_{A′} denotes the total number of foreign-key links on R.A′;
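Definition 8's formula is also missing. The sketch assumes the form P(R→S) = Σ_{R.A-S.B} H(R.A) / (log|R| + Σ_{A′} q_{A′} H(R.A′)), which uses exactly the quantities R.A-S.B and q_{A′} defined in the list, and assigns any leftover probability to a self-loop on R so that each row sums to 1. Treating every foreign-key reference as traversable in both directions is another assumption; `transition_matrix` and its arguments are hypothetical.

```python
import math
from collections import defaultdict


def transition_matrix(sizes, entropies, fk_edges):
    """Assumed form of Definition 8.

    sizes: table -> |R|; entropies: table -> {attribute: H(R.A)}
    fk_edges: list of (R, A, S, B) foreign-key references R.A-S.B,
    each usable in both directions (assumption).
    Returns table -> {table: probability}; leftover mass is a self-loop.
    """
    q = defaultdict(int)  # q[(R, A')]: number of FK links on R.A'
    for r, a, s, b in fk_edges:
        q[(r, a)] += 1
        q[(s, b)] += 1
    num = defaultdict(float)  # num[(R, S)]: sum of H(R.A) over links R.A-S.B
    for r, a, s, b in fk_edges:
        num[(r, s)] += entropies[r][a]
        num[(s, r)] += entropies[s][b]
    P = {}
    for r in sizes:
        denom = math.log2(sizes[r]) + sum(q[(r, a)] * h for a, h in entropies[r].items())
        row = {}
        if denom > 0:
            row = {s: num[(r, s)] / denom for s in sizes if s != r and num[(r, s)] > 0}
        row[r] = 1.0 - sum(row.values())  # self-loop keeps each row stochastic
        P[r] = row
    return P
```

Under this form the self-loop on R equals log|R| divided by the denominator, so it is never negative.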
4.3. Using a random-walk model, take the information content of each relation table as the initial value of the random walk and the transition probabilities between relation tables as its transition probabilities; the distribution of information content when the model reaches a stable state gives the importance of each relation table;
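Step 4.3 can be approximated by power iteration. This sketch starts from each table's information content and repeatedly redistributes it along the transition probabilities until the values stop changing; because every row sums to 1, the total information content is preserved. `random_walk_importance`, the tolerance, and the iteration cap are assumptions for illustration.

```python
def random_walk_importance(P, initial, tol=1e-10, max_iter=1000):
    """Step 4.3 sketch via power iteration.

    P: table -> {table: transition probability}, each row summing to 1
    initial: table -> information content, used as the starting values
    Returns the (approximately) stable values, read as table importance.
    """
    v = dict(initial)
    for _ in range(max_iter):
        nxt = {t: 0.0 for t in v}
        # Each table passes its current value to its successors in proportion
        # to the transition probabilities.
        for r, row in P.items():
            for s, p in row.items():
                nxt[s] += v[r] * p
        if max(abs(nxt[t] - v[t]) for t in v) < tol:
            return nxt
        v = nxt
    return v  # not converged within max_iter; return the last iterate
```

Self-loops keep the walk aperiodic, which helps the iteration settle instead of oscillating.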
4.4. In each result class, select the relation table with the highest importance as that class's topic table, and return the final schema summary to the user.
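Selecting the topic tables of step 4.4 reduces to an arg-max per class. In this sketch, `topic_tables` is a hypothetical name, and ties are broken by table name, an assumption the text does not specify.

```python
def topic_tables(classes, importance):
    """Step 4.4 sketch: pick the most important table in each result class.

    classes: list of sets of table names (the k result classes)
    importance: table -> importance score from the random walk
    Ties go to the alphabetically first table, so the output is deterministic.
    """
    return [max(sorted(c), key=lambda t: importance[t]) for c in classes]
```

Since classes may overlap, the same table can in principle be chosen for more than one class.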

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510464314.1A CN105138588B (en) 2015-07-31 2015-07-31 A kind of database overlap scheme abstraction generating method propagated based on multi-tag

Publications (2)

Publication Number Publication Date
CN105138588A CN105138588A (en) 2015-12-09
CN105138588B true CN105138588B (en) 2018-09-28

Family

ID=54723937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510464314.1A Expired - Fee Related CN105138588B (en) 2015-07-31 2015-07-31 A kind of database overlap scheme abstraction generating method propagated based on multi-tag

Country Status (1)

Country Link
CN (1) CN105138588B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402932B (en) * 2016-05-20 2021-04-13 腾讯科技(深圳)有限公司 Expansion processing method of user tag, text recommendation method and text recommendation device
CN106991614A (en) * 2017-03-02 2017-07-28 南京信息工程大学 The parallel overlapping community discovery method propagated under Spark based on label
CN108052587B (en) * 2017-12-11 2021-11-05 成都逸重力网络科技有限公司 Big data analysis method based on decision tree
CN107992590B (en) * 2017-12-11 2021-11-05 成都逸重力网络科技有限公司 Big data system beneficial to information comparison
CN107992608B (en) * 2017-12-15 2021-07-02 南开大学 SPARQL query statement automatic generation method based on keyword context
CN110309419A (en) * 2018-05-14 2019-10-08 桂林远望智能通信科技有限公司 A kind of overlapping anatomic framework method for digging and device propagated based on balance multi-tag
CN108804582A (en) * 2018-05-24 2018-11-13 天津大学 Method based on the chart database optimization of complex relationship between big data

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102118249A (en) * 2010-12-22 2011-07-06 厦门柏事特信息科技有限公司 Photographing and evidence-taking method based on digital digest and digital signature
CN102254022A (en) * 2011-07-27 2011-11-23 河海大学 Method for sharing metadata of information resources of various data types
CN104036051A (en) * 2014-07-04 2014-09-10 南开大学 Database mode abstract generation method based on label propagation

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7640160B2 (en) * 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance

Non-Patent Citations (1)

Title
"Summarizing Relational Database Schema Based on Label Propagation";Xiaojie Yuan et.al.;《APWeb 2014, LNCS 8709》;20141231;全文 *

Also Published As

Publication number Publication date
CN105138588A (en) 2015-12-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180928

Termination date: 20210731
