CN105912646A - Keyword retrieval method based on diversity and proportion characteristics - Google Patents

Info

Publication number
CN105912646A
Authority
CN
China
Prior art keywords
node
value
size
weights
diversity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610218405.1A
Other languages
Chinese (zh)
Other versions
CN105912646B (en)
Inventor
才智
兰许
曹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201610218405.1A
Publication of CN105912646A
Application granted
Publication of CN105912646B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a keyword retrieval method based on diversity and proportion characteristics. Given a keyword and a natural number l entered by a user, the method uses the link relations between the keyword and the object information to return to the user the l most comprehensive pieces of keyword-based object information. The method comprises the steps of: (1) designing a static offline ranking score based on the link analysis algorithm PageRank and generating initial values for all nodes; (2) inputting a keyword to generate a candidate OS; and (3) inputting the natural number l and, from the OS obtained, using the k-LASP algorithm to generate the final DS-rooted tree containing l nodes. Experimental results show that the method is effective.

Description

A keyword retrieval method based on diversity and proportionality
Technical field
The invention belongs to the field of data mining and relates to a keyword retrieval method based on diversity and proportionality.
Background
With the development of the Internet, search engines, as a new kind of network retrieval technology, have brought great convenience to users. However, with the rapid growth of networks in recent years, the amount of information on the Internet has increased sharply, and big data has emerged as a pervasive new field. Faced with this volume of information, a search engine may fail to recommend keyword-based results that are both diverse and ordered by importance. A promising way to address this problem is a ranking system that, given a user's keyword, returns l important pieces of information (where l is a natural number) arranged according to diversity and proportionality.
This technique introduces object summaries (abbreviated OS). An OS is the set of keyword-related tuples generated from a database that contains the keyword. An OS can be viewed as a tree whose root is the keyword and whose descendants are the nodes adjacent to the keyword. Generating an OS requires two things: a relation that holds information about the queried data subject (abbreviated DS), written R_DS, which becomes the root of the tree; and the relations linked to R_DS, which produce the descendants of R_DS. For each R_DS a DS schema graph, G_DS, can be formed. The technique repeatedly prunes the generated OS until only the important information remains.
A complete OS may contain thousands of tuples. Enumerating all of them is time-consuming, and it is very hard for the user to pick out the useful ones, so only the l most useful tuples are selected. For an input natural number l, the k-LASP algorithm (see step 3.3) extracts l important tuples from the whole OS (the size-l OS). If only the static values computed by PageRank or ValueRank were used to return information, several similar tuples might be returned repeatedly. To make the l tuples as diverse as possible and give the user a fuller picture, two ways of balancing importance are introduced: diversity (Dsize-l) and proportionality (Psize-l). The method reduces time consumption, improves the efficiency of returning results, meets the user's need for diverse search results, and to some extent optimizes keyword-based search.
Summary of the invention
The present invention provides a keyword retrieval method based on diversity and proportionality. Given a keyword and a natural number l entered by the user, the method uses the link relations between the keyword and the tuples to return to the user the l most comprehensive keyword-based tuples.
The steps of the keyword retrieval method based on diversity and proportionality are as follows:
Step 1: Inspired by the link analysis technique PageRank, design a static offline ranking score and generate the initial value of every node.
Step 1.1: Collect and organize the data set and build the data relations. Define a directed graph G(V, E), where V = {v_1, ..., v_n} is the set of nodes (vertices), each node representing a piece of information, and E is the set of edges (arcs), E = {<v_i, v_j> | v_i, v_j ∈ V}. <v_i, v_j> denotes an edge (arc) from v_i to v_j, meaning that the information in v_i links to v_j.
Step 1.2: Let r be a vector holding the evaluation score of each page, so that every node v_i has a corresponding score r_i. The vector r is computed iteratively with the following formula:
$r = d \cdot A \cdot r + (1 - d) \cdot \frac{e}{|V|}$    (1)
where d is a damping factor in (0, 1) that helps produce more accurate results and is usually set to 0.85. A is an n×n matrix, n being the number of vertices, with A_ij = 1/O(v_j) if there is an edge (arc) from v_j to v_i (O(v_j) denotes the out-degree of v_j) and A_ij = 0 otherwise. For example, with three nodes A is a 3×3 matrix; if v_0 has edges to v_1 and v_2, and v_1 has an edge to v_2, then A_10 = A_20 = 1/2 and A_21 = 1, and all other entries are 0. e = [1, ..., 1]^T, and |V| is the number of vertices.
In summary, the evaluation score of each node in the data set is computed iteratively. This value is called the global importance (abbreviated gi); gi(v_i) denotes the initial value of node v_i.
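The iteration in formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `global_importance`, the edge-list input, and the stopping tolerance are assumptions introduced here. The graph is the three-node example from step 1.2.

```python
def global_importance(edges, n, d=0.85, iters=100, tol=1e-12):
    """Iterate r = d*A*r + (1-d)*e/|V| for nodes numbered 0..n-1.

    edges is a list of (src, dst) pairs; names and tolerance are
    illustrative assumptions, not taken from the patent.
    """
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        for src, dst in edges:
            # A[dst][src] = 1 / O(src): src splits its score over its out-links.
            nxt[dst] += d * r[src] / out_degree[src]
        converged = max(abs(a - b) for a, b in zip(nxt, r)) < tol
        r = nxt
        if converged:
            break
    return r

# Three-node example: v0 -> v1, v0 -> v2, v1 -> v2.
edges = [(0, 1), (0, 2), (1, 2)]
gi = global_importance(edges, n=3)
print(gi)  # approximately [0.05, 0.07125, 0.1318]
```

Because v_2 receives links from both other nodes, it ends up with the largest gi value, which matches the intuition behind link analysis.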
Step 2: Input a keyword to generate a candidate OS.
Step 2.1: When a keyword (the DS) is entered, the system generates a tree whose root node is the DS vertex (R_DS) and whose descendants are the relations that can be linked to R_DS; this tree is the OS. To distinguish the importance of each tuple node v_i while the OS is generated, each node is given a local importance (abbreviated li), which is determined by two parts: the tuple's global importance (gi) in the database and the tuple's affinity (abbreviated Af) to R_DS within the OS.
Step 2.2: When the OS is generated, the relations in G_DS that have higher affinity to R_DS are added to the OS. The affinity Af(R_i) of R_i to R_DS is computed iteratively with the following formula:
$Af(R_i) = \sum_j w_j m_j \cdot Af(R_{Parent})$    (2)
where j ranges over the set of metrics (m_1, m_2, ..., m_n) and the corresponding set of weights (w_1, w_2, ..., w_n). Four metrics are considered here. Metric m_1 is the distance from R_i to R_DS: the smaller the distance between the two relations, the higher the affinity. Metric m_2 is the relative cardinality of the relation, i.e. the average number of tuples of R_i connected to each tuple of R_Parent. Metric m_3 is the reverse relative cardinality, i.e. the average number of tuples of R_Parent connected to one tuple of R_i. Metric m_4 is the schema connectivity of R_i, i.e. the number of links R_i has in the relation graph. Af(R_Parent) is the affinity of R_i's parent node to R_DS; its initial value is 1, i.e. the affinity of R_DS to itself is 1. Each metric score lies in [0, 1] and the weights sum to 1 (each of the four metrics above has weight 0.25). During OS generation, the affinity of every relation node must exceed a threshold θ.
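Formula (2) and the threshold test can be illustrated as follows. This is a sketch under stated assumptions: the function name `affinity` and the metric scores are hypothetical values chosen for illustration, and θ = 0.7 is the value used later in the DBLP example.

```python
def affinity(metrics, weights, parent_affinity):
    """Af(R_i) = (sum_j w_j * m_j) * Af(R_Parent), as in formula (2)."""
    if len(metrics) != len(weights):
        raise ValueError("metrics and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= m <= 1.0 for m in metrics):
        raise ValueError("metric scores must lie in [0, 1]")
    return sum(w * m for w, m in zip(weights, metrics)) * parent_affinity

THETA = 0.7                      # threshold from the DBLP example
WEIGHTS = [0.25, 0.25, 0.25, 0.25]

af_root = 1.0                    # affinity of R_DS to itself
# Hypothetical metric scores (m1..m4) for a child and a grandchild relation.
af_child = affinity([0.9, 0.8, 1.0, 0.7], WEIGHTS, af_root)
af_grandchild = affinity([0.8, 0.6, 1.0, 0.4], WEIGHTS, af_child)

keep_child = af_child > THETA            # 0.85 > 0.7, relation is kept
keep_grandchild = af_grandchild > THETA  # 0.595 < 0.7, relation is pruned
```

Since every factor is at most 1, affinity can only shrink with each step away from R_DS, so relations far from the root tend to fall below θ and drop out of the OS.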
Step 2.3: The importance Im(S) of a candidate size-l OS S is computed as:
$Im(S) = \sum_{n_i \in S} Im(OS, R_i)$    (3)
where Im(OS, R_i) is the li value of node R_i in the OS, computed with the following formula:
$Im(OS, R_i) = Im(R_i) \cdot Af(R_i)$    (4)
where Im(R_i) is the gi value of R_i and Af(R_i) is the affinity of R_i to R_DS.
In summary, the Im values are computed from the input keyword and the candidate OS is generated.
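Formulas (3) and (4) amount to a product per node and a sum over the summary. The sketch below assumes each node is given as a (gi, Af) pair; the numbers are hypothetical and only illustrate the arithmetic.

```python
def local_importance(gi, af):
    """li of a node: Im(OS, R_i) = Im(R_i) * Af(R_i), formula (4)."""
    return gi * af

def summary_importance(nodes):
    """Im(S): sum of the local importance of every node in S, formula (3)."""
    return sum(local_importance(gi, af) for gi, af in nodes)

# Hypothetical (gi, Af) pairs for a three-node candidate summary.
candidate = [(2.0, 1.0), (1.5, 0.8), (0.5, 0.7)]
total = summary_importance(candidate)  # 2.0 + 1.2 + 0.35
```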
Step 3: Input the natural number l and, from the OS obtained, use the k-LASP algorithm (see step 3.3) to generate the final tree rooted at DS that contains l nodes. Three factors are considered in this step: the diversity decay value (dv), the proportionality increase value (pq), and the static value (li). dv and pq are each combined with li to give a final score (dw and pw respectively).
Step 3.1: Diversity (Dsize-l)
To avoid repeating similar information that has high importance, l diverse pieces of information should be selected for output. The diversity decay value is therefore computed as follows:
$dv(v_i) = 1 - \frac{z(g(v_i)) - 1}{l - 1}$    (5)
where g(v_i) denotes the group of tuple nodes similar to v_i; z(g(v_i)) is the number of times g(v_i) appears in the size-l OS; and z(g(v_i)) - 1 is therefore the number of other nodes in the size-l OS that are similar to v_i. The range of dv(v_i) is [0, 1]. Define dv[z] as the diversity decay value of a node that appears z times in the size-l OS. For example, let l = 10 and let "Marry" appear twice, i.e. z = 2; then dv[2] = 1 - (2 - 1)/(10 - 1) = 8/9 ≈ 0.89.
The diversity weight of a node in the Dsize-l OS, which combines its static value with its diversity decay value, is then computed as:
$dw(v_i) = li(v_i) \cdot dv(v_i)$    (6)
In summary, given an OS and l, a Dsize-l OS must satisfy the following conditions:
1) The number of tuples in the Dsize-l OS is l (l ≤ |OS|);
2) All l nodes are connected to the root node;
3) Each node v_i has a corresponding diversity weight dw(v_i);
4) The total score of a Dsize-l OS is
$Im(DS_l) = \sum_{v_i \in DS_l} dw(v_i)$    (7)
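Formulas (5) and (6) can be written as two small functions. This is an illustrative sketch; the function names and the input checks are assumptions added here (formula (5) is undefined for l = 1, so the sketch rejects that case).

```python
def diversity_decay(z, l):
    """dv = 1 - (z - 1) / (l - 1), formula (5).

    z is how many times the node's group appears in the size-l OS.
    """
    if l < 2:
        raise ValueError("formula (5) needs l >= 2")
    if not 1 <= z <= l:
        raise ValueError("z must satisfy 1 <= z <= l")
    return 1.0 - (z - 1) / (l - 1)

def diversity_weight(li, z, l):
    """dw = li * dv, formula (6)."""
    return li * diversity_decay(z, l)

# The "Marry" example from the text: l = 10, second appearance.
dv_marry = diversity_decay(z=2, l=10)  # 8/9
```

The first appearance of a group keeps its full static value (dv = 1), and each further appearance loses another 1/(l - 1) of it, so a group that filled the whole summary would contribute nothing with its last member.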
Step 3.2: Proportionality (Psize-l)
Within an OS a tuple node may appear many times. Such nodes may have weak static values, yet their frequency indicates an important connection with the DS in the size-l OS to be obtained. The proportionality increase value is therefore computed as follows:
$pq(v_i) = \frac{fr(g(v_i))}{\alpha \cdot z(g(v_i)) + 1}$    (8)
where fr(g(v_i)) is the number of times g(v_i) appears in the OS; z(g(v_i)) is the number of times g(v_i) appears in the size-l OS; and α is a constant that adjusts the proportion, usually α = 2.
The proportionality weight of a node in the Psize-l OS, which combines its static value with its proportionality increase value, is then computed as:
$pw(v_i) = li(v_i) \cdot pq(v_i)$    (9)
In summary, given an OS and l, a Psize-l OS must satisfy the following conditions:
1) The number of tuples in the Psize-l OS is l (l ≤ |OS|);
2) All l nodes are connected to the root node;
3) Each node v_i has a corresponding proportionality weight pw(v_i);
4) The total score of a Psize-l OS is
$Im(PS_l) = \sum_{v_i \in PS_l} pw(v_i)$    (10)
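Formulas (8) and (9) can be sketched the same way. The function names are assumptions, and the frequency and static value in the example are hypothetical, chosen only to show how the weight falls as more members of a group are selected.

```python
def proportionality_increase(fr, z, alpha=2.0):
    """pq = fr / (alpha * z + 1), formula (8).

    fr: occurrences of the node's group in the full OS.
    z:  occurrences of the group already in the size-l OS.
    """
    if fr < 0 or z < 0:
        raise ValueError("fr and z must be non-negative")
    return fr / (alpha * z + 1.0)

def proportionality_weight(li, fr, z, alpha=2.0):
    """pw = li * pq, formula (9)."""
    return li * proportionality_increase(fr, z, alpha)

# Hypothetical group: static value 0.6, appearing 30 times in the OS.
weights_by_z = [proportionality_weight(0.6, fr=30, z=z) for z in range(4)]
# z = 0, 1, 2, 3 gives 18.0, 6.0, 3.6 and about 2.57
```

A frequent group starts with a large boost, but the divisor grows by α with every selected member, so its share of the summary stays roughly proportional to its frequency rather than dominating it.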
Step 3.3: Use the k-LASP algorithm to generate the final tree rooted at DS that contains l nodes.
k-LASP (k-Largest Averaged Score Path) is the path of k nodes with the largest average score, i.e. the average of the weights of k nodes along one path. In this step dw and pw are referred to collectively as the weight w. Each node v_i in the OS has a weight w(v_i), and the average weight of v_i together with its ancestor nodes (n of them, n = max(k - 1, actual length)) is defined as AP(v_i). While the OS is generated a hash table, denoted HFr, is needed. Each HFr entry has three parts: the index i of v_i, used as the graph-node number; fr(v_i), the number of times v_i appears in the OS; and z(v_i), the number of times v_i appears in the size-l OS. To manage the OS nodes and their AP values, a queue W stores this information, with nodes ordered by decreasing AP value.
The k-LASP algorithm generates the size-l OS as follows:
1) Generate the OS, including building HFr, computing AP(v_i) and generating W.
2) If |size-l| < l, go to 3); otherwise go to 11).
3) Let p_i be the path from the node with the largest AP value in the current W to the root node. Add the first l - |size-l| nodes of p_i to the size-l OS.
4) If |size-l| < l, go to 5); otherwise go to 10).
5) Remove the selected l - |size-l| nodes of p_i from the OS and from W.
6) For each descendant node v_j (n of them, n = max(k - 1, actual length)) of every node in p_i, update the value of AP(v_j) in the OS and in W.
7) For each node g(v) in p_i: if g(v) is in HFr, go to 8); otherwise go to 10).
8) HFr(g(v)).z++
9) For each node n in the OS with g(n) = g(v), and for each node n_i in each subtree rooted at n, update the value of AP(n_i) in the OS and in W according to HFr(g(v)).z.
10) Go to 2).
11) Return the size-l OS.
The size-l OS returned at this point contains the l tuples to be retrieved.
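The greedy core of the procedure above (steps 2 to 6) can be sketched in Python. This is a simplified illustration, not the patented algorithm: it keeps node weights fixed and omits the HFr-based re-weighting of steps 7 to 9, it assumes the root is always part of the summary, and it assumes a node is averaged with at most k - 1 of its nearest not-yet-selected ancestors. The function name `k_lasp`, the dictionary inputs, and the example weights are all hypothetical.

```python
def k_lasp(parent, weight, l, k=2):
    """Greedy size-l selection on a tree (simplified k-LASP sketch).

    parent: dict node -> parent node (the root maps to None).
    weight: dict node -> weight w (dw or pw in the patent's terms).
    Returns the selected nodes in selection order, root first.
    """
    root = next(v for v, p in parent.items() if p is None)
    selected = [root]
    chosen = {root}

    def unselected_path(v):
        # Nodes from just below the nearest selected ancestor down to v.
        path = []
        while v not in chosen:
            path.append(v)
            v = parent[v]
        return path[::-1]

    def ap(v):
        # Average weight of v and up to k-1 nearest unselected ancestors.
        nodes = unselected_path(v)[-k:]
        return sum(weight[u] for u in nodes) / len(nodes)

    while len(selected) < l and len(chosen) < len(parent):
        best = max((v for v in parent if v not in chosen), key=ap)
        # Take the path top-down so the summary stays connected to the root.
        for u in unselected_path(best)[: l - len(selected)]:
            selected.append(u)
            chosen.add(u)
    return selected

# Hypothetical five-node OS: r is the root, c hangs under a, d under b.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "b"}
weight = {"r": 1.0, "a": 0.2, "b": 0.5, "c": 0.9, "d": 0.1}
summary = k_lasp(parent, weight, l=3, k=2)
print(summary)  # ['r', 'a', 'c']
```

With k = 2 the weak node a is still selected, because its strong child c lifts the average of the path (a, c) to 0.55, above the 0.5 of b taken alone. Choosing by single-node weight would have picked b and left c unreachable within l = 3.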
Experimental results show that the method is effective.
Brief description of the drawings
Fig. 1: System flow chart.
Fig. 2: System results.
Fig. 3: DBLP database schema.
Fig. 4: G_DS generated from the author tuple information in DBLP.
Fig. 5: Example run of the k-LASP (k = 2) algorithm.
Detailed description of the invention
The present invention is explained below with reference to the accompanying drawings.
The data set used by the invention is the DBLP database. DBLP is an integrated, author-centred database of English-language computer science literature that lists the research results of authors in the computer field chronologically, including papers published in international journals and conferences. Its database schema is shown in Fig. 3.
Step 1: Inspired by the link analysis technique PageRank, design a static offline ranking score and generate the initial value of every node.
The initial value of each node in the data set is computed with formula (1).
Step 2: Input a keyword to generate a candidate OS.
Using the DBLP database, the query keyword entered is "Michalis Faloutsos", and the static value of each node weighted by affinity, i.e. Im, is computed with formula (4). The G_DS generated from the author tuple information in DBLP is shown in Fig. 4; the numbers in brackets are the affinities to R_DS (nodes are selected with threshold θ = 0.7). Part of the candidate OS generated from G_DS according to the Im values is shown in the following table:
Table 1: Part of the OS generated for the tuple matching the query "Michalis Faloutsos"
Step 3: Input the natural number l and, from the OS obtained, use the k-LASP algorithm to generate the final tree rooted at DS that contains l nodes.
Step 3.1: Diversity (Dsize-l)
Taking authors as an example, let l = 10. The following table gives the static value li from the author tuple information and the results obtained with formulas (5) and (6), sorted by decreasing dw[1]:
Table 2: Diversity weights based on author tuples
According to the weights in this table, when C. Faloutsos and M. Mitzenmacher each appear once, their weights are 1.8 and 1.4 respectively, but the weight of C. Faloutsos at its third appearance equals the weight of M. Mitzenmacher at its first appearance: both are 1.4. This ensures the diversity of the output tuples and avoids repeating similar information.
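The two figures quoted from Table 2 can be checked directly against formulas (5) and (6). The static values 1.8 and 1.4 and l = 10 come from the text; the helper name `dw` is an assumption used only for this check.

```python
L_SIZE = 10  # l in the text

def dw(li, z, l=L_SIZE):
    """Diversity weight of the z-th appearance: li * (1 - (z - 1)/(l - 1))."""
    return li * (1.0 - (z - 1) / (l - 1))

# C. Faloutsos (li = 1.8) at his first, second and third appearance.
faloutsos = [dw(1.8, z) for z in (1, 2, 3)]   # about 1.8, 1.6, 1.4
# M. Mitzenmacher (li = 1.4) at his first appearance.
mitzenmacher_first = dw(1.4, 1)               # 1.4
```

Each repeat of C. Faloutsos costs 1.8/9 = 0.2, so by the third appearance his weight has dropped to the level of an author who has not yet been shown at all.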
Step 3.2: Proportionality (Psize-l)
Taking the same authors as an example, let l = 10. The following table gives the static value li and the frequency fr from the author tuple information and the results obtained with formulas (8) and (9), sorted by decreasing pw[1]:
Table 3: Proportionality weights based on author tuples
According to the weights in this table, the original static values of S. Krishnamurthy and C. Faloutsos are 0.6 and 1.8, a difference of 1.2, but the frequency of S. Krishnamurthy is 25 higher than that of C. Faloutsos, so S. Krishnamurthy is considered more important than C. Faloutsos. When each of them has appeared three times in the Psize-l OS, their weights differ by only 0.1, with S. Krishnamurthy higher than C. Faloutsos. A tuple node may appear many times while having a weak static value, yet its frequency indicates an important connection with the DS in the size-l OS to be obtained; this result ensures that such a tuple node gets a more suitable position in the Psize-l OS.
Step 3.3: Use the k-LASP algorithm to generate the final tree rooted at DS that contains l nodes.
Taking k = 2 as an example, 2-LASP (2-Largest Averaged Score Path) is the path of two nodes with the largest average score. In this step dw and pw are referred to collectively as the weight w. Each node v_i in the OS has a weight w(v_i), and the average weight of v_i and its parent node is defined as AP(v_i). While the OS is generated a hash table, denoted HFr, is needed. Each HFr entry has three parts: the index i of v_i, used as the graph-node number; fr(v_i), the number of times v_i appears in the OS; and z(v_i), the number of times v_i appears in the size-l OS. To manage the OS nodes and their AP values, a queue W stores this information, with nodes ordered by decreasing AP value.
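One possible in-memory layout for HFr and W is sketched below. It is an assumption, not the patent's data structure: HFr is keyed by the group g(v) rather than by node index, W is a binary heap holding negated AP values so that the largest AP is popped first, and all names and example values are hypothetical.

```python
import heapq
from collections import Counter
from dataclasses import dataclass

@dataclass
class HFrEntry:
    fr: int = 0  # occurrences of the group in the OS
    z: int = 0   # occurrences of the group in the size-l OS so far

def build_hfr(group_of):
    """group_of maps node id -> group g(v); returns group -> HFrEntry."""
    counts = Counter(group_of.values())
    return {group: HFrEntry(fr=count) for group, count in counts.items()}

def build_w(ap):
    """Queue ordered by decreasing AP; heapq is a min-heap, so negate."""
    heap = [(-score, node) for node, score in ap.items()]
    heapq.heapify(heap)
    return heap

def pop_max(heap):
    """Remove and return (node, AP) for the largest AP value."""
    neg_score, node = heapq.heappop(heap)
    return node, -neg_score

# Hypothetical OS fragment: nodes 1 and 3 belong to the same group.
group_of = {1: "A", 2: "B", 3: "A"}
hfr = build_hfr(group_of)
w = build_w({1: 0.4, 2: 0.9, 3: 0.6})
top = pop_max(w)        # (2, 0.9)
hfr[group_of[top[0]]].z += 1
```

A heap makes "take the node with the largest AP" cheap, but step 9 of the algorithm changes the AP of many nodes at once; a real implementation would need either lazy invalidation of stale heap entries or a structure that supports priority updates.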
Taking Fig. 5 as an example, Fig. 5(A) shows the initial OS, and the input is l = 5.
Following the 2-LASP algorithm, first select the node with the highest weight in W, n_11, and take the path p_1 from n_11 to the root node. Path p_1 has 3 nodes, and these 3 nodes are first added to the size-l OS. By step 4 of the algorithm, 0 < 5, so go to step 5 and remove these three nodes from the OS and from W. The AP(n_i) values of the child nodes of these three nodes are then updated: [equation not reproduced in the source text]
By step 7 of the algorithm, HFr contains node v_i, so go to step 8 and perform HFr(g(v)).z++. The AP(n_i) values are then updated by formula [equations not reproduced in the source text]; the other nodes are handled similarly.
This completes the first update, and the result is shown in Fig. 5(B). The next update is then carried out with the same algorithm, and the final result is shown in Fig. 5(C). This gives the final size-5 OS.
Again taking the query keyword "Michalis Faloutsos" as an example, the resulting size-15 OS is shown in Fig. 2(A), the Dsize-15 OS in Fig. 2(B), and the Psize-15 OS in Fig. 2(C). Experimental investigation found that the Dsize-15 OS and Psize-15 OS better meet user needs.

Claims (1)

1. A keyword retrieval method based on diversity and proportionality, characterized in that the method is carried out in the following steps:
Step 1: Inspired by the link analysis technique PageRank, design a static offline ranking score and generate the initial value of every node;
Step 1.1: Collect and organize the data set and build the data relations; define a directed graph G(V, E), where V = {v_1, ..., v_n} is the set of nodes, each node representing a piece of information, and E is the set of edges, E = {<v_i, v_j> | v_i, v_j ∈ V}; <v_i, v_j> denotes an edge from v_i to v_j, meaning that the information in v_i links to v_j;
Step 1.2: Let r be a vector holding the evaluation score of each page, so that every node v_i has a corresponding score r_i; the vector r is computed iteratively with the following formula:
$r = d \cdot A \cdot r + (1 - d) \cdot \frac{e}{|V|}$    (1)
where d is a damping factor in (0, 1) that helps produce more accurate results and is usually set to 0.85; A is an n×n matrix, n being the number of vertices, with A_ij = 1/O(v_j) if there is an edge (arc) from v_j to v_i (O(v_j) denotes the out-degree of v_j) and A_ij = 0 otherwise; for example, with three nodes A is a 3×3 matrix, and if v_0 has edges to v_1 and v_2, and v_1 has an edge to v_2, then A_10 = A_20 = 1/2 and A_21 = 1, and all other entries are 0; e = [1, ..., 1]^T; |V| is the number of vertices;
In summary, the evaluation score of each node in the data set is computed iteratively; this value is called the global importance, abbreviated gi, and gi(v_i) denotes the initial value of node v_i;
Step 2: Input a keyword to generate a candidate OS;
Step 2.1: When a keyword (the DS) is entered, the system generates a tree whose root node is the DS vertex (R_DS) and whose descendants are the relations that can be linked to R_DS; this tree is the OS; to distinguish the importance of each tuple node v_i while the OS is generated, each node is given a local importance (abbreviated li), which is determined by two parts: the tuple's global importance (gi) in the database and the tuple's affinity (abbreviated Af) to R_DS within the OS;
Step 2.2: When the OS is generated, the relations in G_DS that have higher affinity to R_DS are added to the OS; the affinity Af(R_i) of R_i to R_DS is computed iteratively with the following formula:
$Af(R_i) = \sum_j w_j m_j \cdot Af(R_{Parent})$    (2)
where j ranges over the set of metrics (m_1, m_2, ..., m_n) and the corresponding set of weights (w_1, w_2, ..., w_n); four metrics are considered here: metric m_1 is the distance from R_i to R_DS, and the smaller the distance between the two relations, the higher the affinity; metric m_2 is the relative cardinality of the relation, i.e. the average number of tuples of R_i connected to each tuple of R_Parent; metric m_3 is the reverse relative cardinality, i.e. the average number of tuples of R_Parent connected to one tuple of R_i; metric m_4 is the schema connectivity of R_i, i.e. the number of links R_i has in the relation graph; Af(R_Parent) is the affinity of R_i's parent node to R_DS, with initial value 1, i.e. the affinity of R_DS to itself is 1; each metric score lies in [0, 1] and the weights sum to 1 (each of the four metrics above has weight 0.25); during OS generation, the affinity of every relation node must exceed a threshold θ;
Step 2.3: The importance Im(S) of a candidate size-l OS S is computed as:
$Im(S) = \sum_{n_i \in S} Im(OS, R_i)$    (3)
where Im(OS, R_i) is the li value of node R_i in the OS, computed with the following formula:
$Im(OS, R_i) = Im(R_i) \cdot Af(R_i)$    (4)
where Im(R_i) is the gi value of R_i and Af(R_i) is the affinity of R_i to R_DS;
In summary, the Im values are computed from the input keyword and the candidate OS is generated;
Step 3: Input the natural number l and, from the OS obtained, use the k-LASP algorithm (see step 3.3) to generate the final tree rooted at DS that contains l nodes; three factors are considered in this step: the diversity decay value (dv), the proportionality increase value (pq), and the static value (li); dv and pq are each combined with li to give a final score (dw and pw respectively);
Step 3.1: Diversity (Dsize-l)
To avoid repeating similar information that has high importance, l diverse pieces of information should be selected for output, so the diversity decay value is computed as follows:
$dv(v_i) = 1 - \frac{z(g(v_i)) - 1}{l - 1}$    (5)
where g(v_i) denotes the group of tuple nodes similar to v_i; z(g(v_i)) is the number of times g(v_i) appears in the size-l OS; z(g(v_i)) - 1 is therefore the number of other nodes in the size-l OS that are similar to v_i; the range of dv(v_i) is [0, 1]; define dv[z] as the diversity decay value of a node that appears z times in the size-l OS; for example, let l = 10 and let "Marry" appear twice, i.e. z = 2; then dv[2] = 1 - (2 - 1)/(10 - 1) = 8/9 ≈ 0.89;
The diversity weight of a node in the Dsize-l OS, which combines its static value with its diversity decay value, is then computed as:
$dw(v_i) = li(v_i) \cdot dv(v_i)$    (6)
In summary, given an OS and l, a Dsize-l OS must satisfy the following conditions:
1) The number of tuples in the Dsize-l OS is l (l ≤ |OS|);
2) All l nodes are connected to the root node;
3) Each node v_i has a corresponding diversity weight dw(v_i);
4) The total score of a Dsize-l OS is
$Im(DS_l) = \sum_{v_i \in DS_l} dw(v_i)$    (7)
Step 3.2: Proportionality (Psize-l)
Within an OS a tuple node may appear many times; such nodes may have weak static values, yet their frequency indicates an important connection with the DS in the size-l OS to be obtained; the proportionality increase value is therefore computed as follows:
$pq(v_i) = \frac{fr(g(v_i))}{\alpha \cdot z(g(v_i)) + 1}$    (8)
where fr(g(v_i)) is the number of times g(v_i) appears in the OS; z(g(v_i)) is the number of times g(v_i) appears in the size-l OS; α is a constant that adjusts the proportion, usually α = 2;
The proportionality weight of a node in the Psize-l OS, which combines its static value with its proportionality increase value, is then computed as:
$pw(v_i) = li(v_i) \cdot pq(v_i)$    (9)
In summary, given an OS and l, a Psize-l OS must satisfy the following conditions:
1) The number of tuples in the Psize-l OS is l (l ≤ |OS|);
2) All l nodes are connected to the root node;
3) Each node v_i has a corresponding proportionality weight pw(v_i);
4) The total score of a Psize-l OS is
$Im(PS_l) = \sum_{v_i \in PS_l} pw(v_i)$    (10)
Step 3.3: Use the k-LASP algorithm to generate the final tree rooted at DS that contains l nodes;
k-LASP (k-Largest Averaged Score Path) is the path of k nodes with the largest average score, i.e. the average of the weights of k nodes along one path; in this step dw and pw are referred to collectively as the weight w; each node v_i in the OS has a weight w(v_i), and the average weight of v_i together with its ancestor nodes (n of them, n = max(k - 1, actual length)) is defined as AP(v_i); while the OS is generated a hash table, denoted HFr, is needed; each HFr entry has three parts: the index i of v_i, used as the graph-node number; fr(v_i), the number of times v_i appears in the OS; and z(v_i), the number of times v_i appears in the size-l OS; to manage the OS nodes and their AP values, a queue W stores this information, with nodes ordered by decreasing AP value;
The k-LASP algorithm generates the size-l OS as follows:
1) Generate the OS, including building HFr, computing AP(v_i) and generating W;
2) If |size-l| < l, go to 3); otherwise go to 11);
3) Let p_i be the path from the node with the largest AP value in the current W to the root node; add the first l - |size-l| nodes of p_i to the size-l OS;
4) If |size-l| < l, go to 5); otherwise go to 10);
5) Remove the selected l - |size-l| nodes of p_i from the OS and from W;
6) For each descendant node v_j of every node in p_i, update the value of AP(v_j) in the OS and in W; the number of descendant nodes is n, n = max(k - 1, actual length);
7) For each node g(v) in p_i: if g(v) is in HFr, go to 8); otherwise go to 10);
8) HFr(g(v)).z++;
9) For each node n in the OS with g(n) = g(v), and for each node n_i in each subtree rooted at n, update the value of AP(n_i) in the OS and in W according to HFr(g(v)).z;
10) Go to 2);
11) Return the size-l OS;
The size-l OS returned at this point contains the l tuples to be retrieved;
Experimental results show that the method is effective.
CN201610218405.1A 2016-04-09 2016-04-09 A kind of keyword retrieval method based on diversity and proportionality Expired - Fee Related CN105912646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610218405.1A CN105912646B (en) 2016-04-09 2016-04-09 A kind of keyword retrieval method based on diversity and proportionality

Publications (2)

Publication Number Publication Date
CN105912646A true CN105912646A (en) 2016-08-31
CN105912646B CN105912646B (en) 2019-03-26

Family

ID=56745639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610218405.1A Expired - Fee Related CN105912646B (en) 2016-04-09 2016-04-09 A kind of keyword retrieval method based on diversity and proportionality

Country Status (1)

Country Link
CN (1) CN105912646B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101140588A (en) * 2007-10-10 2008-03-12 华为技术有限公司 Method and apparatus for ordering incidence relation search result
CN102214216A (en) * 2011-06-07 2011-10-12 复旦大学 Aggregation summarization method for keyword search result of hierarchical relation data
CN102222115A (en) * 2011-07-12 2011-10-19 厦门大学 Method for analyzing edge connectivity of research hotspot based on keyword concurrent

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Li: "Research on methods for mining network user interests based on cluster analysis", China Doctoral Dissertations Full-text Database *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649846A (en) * 2016-12-30 2017-05-10 北京工业大学 Geographic space interest point retrieval method based on diversity
CN106649846B (en) * 2016-12-30 2019-12-20 北京工业大学 Geographic space interest point retrieval method based on diversity
CN106951517A (en) * 2017-03-19 2017-07-14 北京工业大学 The diversity querying method of document in narrow scope
CN106951517B (en) * 2017-03-19 2020-06-19 北京工业大学 Method for inquiring diversity of documents in narrow range

Also Published As

Publication number Publication date
CN105912646B (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN101630314B (en) Semantic query expansion method based on domain knowledge
Wojnar et al. Structural and semantic aspects of similarity of document type definitions and XML schemas
CN103678412B (en) A kind of method and device of file retrieval
CN107247745A (en) A kind of information retrieval method and system based on pseudo-linear filter model
Bergamaschi et al. QUEST: A keyword search system for relational data based on semantic and machine learning techniques
CN101807211B (en) XML-based retrieval method oriented to constraint on integrated paths of large amount of small-size XML documents
CN102955833A (en) Correspondence address identifying and standardizing method
US20130218898A1 (en) Mechanisms for metadata search in enterprise applications
CN106294418B (en) Search method and searching system
Lin et al. A method of extracting the semi-structured data implication rules
CN101650729A (en) Dynamic construction method for Web service component library and service search method thereof
CN102760140A (en) Incident body-based method for expanding searches
CN103020283B (en) A kind of semantic retrieving method of the dynamic restructuring based on background knowledge
CN105912646A (en) Keyword retrieval method based on diversity and proportion characteristics
CN102043802B (en) Method for searching XML (Extensive Makeup Language) key words based on structural abstract
CN115982390A (en) Industrial chain construction and iterative expansion development method
CN105912649A (en) Database fuzzy retrieval method and system
WO2007075157A1 (en) Techniques for knowledge discovery by constructing knowledge correlations using concepts or terms
Ning et al. Efficient processing of top-k twig queries over probabilistic XML data
Rifaieh et al. A matching algorithm for electronic data interchange
Chien et al. A lexical decision tree scheme for supporting schema matching
Weninger et al. Building enriched web page representations using link paths
CN109446440B (en) Deep network query interface integration method, system, computing device and storage medium
Barzilay Graph-based Algorithms in NLP
Sinha et al. Design and development of a Bangla semantic lexicon and semantic similarity measure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190326
