WO2013143141A1 - Tag refinement strategies for social tagging systems - Google Patents

Tag refinement strategies for social tagging systems

Info

Publication number
WO2013143141A1
Authority
WO
WIPO (PCT)
Prior art keywords
tags
tag
subset
relativity
similarity score
Application number
PCT/CN2012/073403
Other languages
English (en)
Inventor
Bin Cui
Junjie Yao
Original Assignee
Peking University
Application filed by Peking University
Priority to US13/980,573 (US9411875B2)
Priority to PCT/CN2012/073403 (WO2013143141A1)
Publication of WO2013143141A1
Priority to US15/197,458 (US20160306805A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • social tagging may be a collaborative method in which online users may provide descriptive words to mark the contents that are either uploaded or viewed by them.
  • e.g., hashtags that are used by Twitter users to annotate their tweets.
  • a method for refining homonyms and synonyms in a plurality of tags may include receiving, by a tag refinement system, a plurality of tagging actions each of which associates one or more of the plurality of tags with a digital object.
  • the method may further include extracting, by the tag refinement system, a first subset of tags from the plurality of tags, wherein the first subset of tags has a higher collective relativity-similarity score compared to a second subset of tags selected from the plurality of tags, and the first subset of tags, different from the second subset of tags, has the same predetermined tag count as the second subset of tags.
  • the method includes receiving, by a tag refinement system, a plurality of tagging actions each of which associates one or more of the plurality of tags with a digital object.
  • the method may also include generating, by the tag refinement system, a tag graph having a plurality of nodes linked by a plurality of edges, wherein each of the plurality of nodes is associated with one of the plurality of tags, and each of the plurality of edges is associated with a corresponding co-occurrence relationship existing in the plurality of tagging actions.
  • the method may further include extracting, by the tag refinement system, a first subset of tags from the plurality of tags by recursively processing the tag graph to select nodes based on their respective relativity-similarity scores, wherein the first subset of tags has a higher collective relativity-similarity score compared to a second subset of tags that are selected from the plurality of tags.
  • a system for refining homonyms and synonyms in a plurality of tags includes a tag list for storing a plurality of tagging actions each of which associates one or more of the plurality of tags with a digital object.
  • the system may further include a tag refinement system coupled with the tag list for extracting a first subset of tags from the plurality of tags, wherein the first subset of tags has a higher collective relativity-similarity score compared to a second subset of tags selected from the plurality of tags, and the first subset of tags, different from the second subset of tags, has the same predetermined tag count as the second subset of tags.
  • FIG. 1 shows a block diagram of an operational environment, in which illustrative embodiments of a tag refinement system are presented;
  • FIG. 2 shows illustrative embodiments of tag summarization and tag graph;
  • Fig. 3A-3B show pseudocode for illustrative embodiments of an
  • FIG. 4 shows a flow diagram of an illustrative embodiment of a process for implementing a tag refinement strategy;
  • FIG. 5 shows an illustrative embodiment of an example computer program product; and
  • FIG. 6 shows a block diagram of an illustrative embodiment of an example computer system, all arranged in accordance with at least some embodiments of the present disclosure.
  • a "tag" may refer to a label that is associated with a specific digital object.
  • a digital object may be a digitized piece of information (e.g., without limitation, text, file, web page, image, video, sound, tweets) that is identified by a universal resource locator (URL).
  • URL universal resource locator
  • a tag may be a word or a short sentence to describe, annotate, and provide context to the specific digital object.
  • a "tagging action", or "tagging", may refer to an action to annotate/associate one or more tags with the specific digital object.
  • a tagging action may associate a first tag, "news", and a second tag, "media provider", with this digital object. Afterward, by looking at the tags "news" and/or "media provider", any user may be able to quickly grasp the context of the digital object that is referenced by the above URL.
  • Fig. 1 shows a block diagram of an operational environment, in which illustrative embodiments of a tag refinement system are presented.
  • a client system 110 may be operating standalone or communicating with a target system 120 via a network 130.
  • the client 110 or the target system 120 may be a computer system or a client program executing on a computer system.
  • Exemplary client 110 or target system 120 may include, without limitation, conventional personal computer (PC), workstation, laptop, tablet PC, handheld computing/communication device, cell phone, smart phone, or a similar device.
  • PC personal computer
  • the network 130 may be, for example, a local area network (LAN), wide area network (WAN), metropolitan area network (MAN), global area network such as the Internet, a mobile network (e.g., GSM, CDMA, 3G), or any combination of such interconnects.
  • LAN local area network
  • WAN wide area network
  • MAN metropolitan area network
  • GSM Global System for Mobile communications
  • 3G third generation of mobile telecommunications technology
  • the client 110 may request one or more digital objects that are either stored in the client 110 or located on the target system 120.
  • the client 110 may transmit a user request 111 to the target system 120.
  • the user request 111 may also originate from a user-invoked or an event-triggered software program running on the client 110.
  • the user request 111 may be routed by the network 130 to the target system 120.
  • the target system 120 may contain a web server application, such as Apache® HTTP Server or Microsoft® Internet Information Server, to process the HTTP user request 111.
  • the target system 120 may contain customized software programs to handle the user request 111.
  • the client 110 may utilize a tag refinement system 140 for tagging and managing the digital objects it requests.
  • when a user on the client 110 wants to tag a specific digital object, the client may initiate a tagging action that contains the one or more tags the user chooses and the URL of the specific digital object to be tagged.
  • the tagging action may be transmitted (112) to the tag refinement system 140, while the specific digital object may be stored in the client 110 or located at the remote target system 120.
  • the tag refinement system 140 may optionally validate the existence of the specific digital object for the client 110 via a confirmation request 140.
  • the tag refinement system 140 may process the tagging action and store the relevant information in a tag list 150. Afterward, the original client 110 or another client 110 may access the tag list 150 to evaluate, update, and/or retrieve the tags that are associated with different digital objects.
  • a first tagging action may be received to tag a digital object identified by a URL (http://aaa) with two tags (C# and CSharp). That is, the tagging action may be provided by a user on the client 110 to associate the two tags with the digital object. Afterward, the tags may be used to identify and provide context to the digital object identified by the URL.
  • the tag refinement system 140 may receive multiple tagging actions from one or more clients 110 for tagging the same digital object.
  • a second tagging action may use three tags (C#, Programming, and Reference) for annotating the digital object that is identified by the same URL (http://aaa).
  • a third tagging action may use tags "Programming" and "CSharp" for tagging the same digital object.
  • the tag list 150 may also store other tags for a different digital object (e.g., one that is identified by the URL http://bbb).
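The example tag list above can be modeled in a few lines of Python. This is a minimal sketch, assuming a tagging action is stored as a (user, URL, tags) record and that a tag's count is the number of actions that used it for a given object; the `TaggingAction` class, the `tag_counts` helper, and the user names are hypothetical illustrations, not part of the disclosure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggingAction:
    """Hypothetical record for one tagging action (T, u, o)."""
    user: str
    url: str               # identifies the digital object o
    tags: frozenset[str]   # the tags assigned in this action


def tag_counts(actions, url):
    """Count how many tagging actions used each tag for the object at `url`."""
    counts = Counter()
    for action in actions:
        if action.url == url:
            counts.update(action.tags)
    return counts


# The three tagging actions on http://aaa described above (user names are made up).
tag_list = [
    TaggingAction("u1", "http://aaa", frozenset({"C#", "CSharp"})),
    TaggingAction("u2", "http://aaa", frozenset({"C#", "Programming", "Reference"})),
    TaggingAction("u3", "http://aaa", frozenset({"Programming", "CSharp"})),
]
print(tag_counts(tag_list, "http://aaa"))
```

Requires Python 3.9+ for the `frozenset[str]` annotation. Here "C#", "CSharp", and "Programming" each get a count of 2 and "Reference" a count of 1.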
  • the tag refinement system 140 may refine the multiple tags in the tag list 150 that are associated with a common digital object, so that a subset of the tags (or "tag subset") may be extracted from the multiple tags.
  • the subset of the tags may not only represent the multiple tags for identifying the content of the digital object, but also minimize the noise and redundancy that may exist in the multiple tags.
  • the tag refinement system 140 may extract two tags (C# and Programming) from the multiple tags in the tag list 150 to represent the digital object. In this case, the extracted two tags may be deemed a tag subset that substantially covers most, if not all, of the tags that reference the digital object. "CSharp", which is a synonym of C#, is not included in the extracted tag subset.
  • the extracted tag subset may select only one of a group of homonyms, such as ".net" and "dotnet".
  • the tag refinement system 140 may refine the homonyms and synonyms that may be present in the tag list 150. The details of the tag refinement system 140 are described further below.
  • the tag refinement system 140 may include one or more processors 160, memory 170, and other system components.
  • processor(s) 160 may include central processing units (CPUs) for controlling the overall operation of the tag refinement system 140.
  • the processor(s) 160 accomplish this by executing software or firmware stored in memory 170.
  • the memory 170 is or includes the main memory of the tag refinement system 140.
  • the memory 170 may contain, among other things, a set of machine instructions which, when executed by the processor 160, cause the processor 160 to perform embodiments of the present disclosure.
  • Fig. 2 shows illustrative embodiments of tag summarization and tag graph.
  • a tag summarization window 210, which may be maintained and displayed by a tag refinement system, may be configured to show a summarized view of a set of tags associated with a particular digital object 211.
  • the digital object 211 may be an online book that is accessible by a URL address.
  • the tag refinement system may store the tags in a tag list (not shown in Fig. 2), and generate a summarized view of the tags for the digital object 211 in the tag summarization window 210.
  • the exemplary tag summarization window 210 shows that multiple users have supplied various tags for tagging the digital object 211.
  • a first user may annotate the digital object 211 with a tag "C#".
  • a second user may annotate the digital object 211 with three tags ".net", "C#", and "Tutorial" at the same time.
  • all tags that have been introduced by users for tagging the digital object 211 may be listed and sorted based on the corresponding tagging counts.
  • tag "C#" has a tag count of 680, meaning the tag "C#" has been used 680 times in various tagging actions for tagging the digital object 211.
  • the tagging actions may be deemed a form of social collaboration having a census nature.
  • tags e.g., "C#”, “Programming” and “Reference”
  • tags that are redundant. For example, ".net” and “dotnet” are homonyms with different forms.
  • tags "Articles", "Reference", and "Howto" may also have very generic meanings that do not provide substantial information. In some extreme cases, there may be tags that are noise, as they may give either a misleading or wrong meaning to the content of the digital object 211. As is typical of this kind of collaborative annotation process, different users may have different perspectives, so tags used to describe the same concept may be vastly different. Therefore, a summarized subset of tags may be valuable in providing a more concise representation of the digital object 211.
  • the goal of extracting a subset of meaningful and representative tags from all the tags that reference the same digital object may be characterized as a tag extraction problem.
  • the annotation of one or more tags to a digital object may be defined as a tagging action denoted by a triple $(T, u, o)$, where a user $u \in U$ assigns multiple tags $\{t_1, t_2, \ldots, t_n\} \subseteq T$ to a digital object $o \in O$.
  • each tag in the tag set T may have a tag count showing how many times it has been used in the tagging actions for annotating o.
  • a first tag may have co-occurrence relations with a second tag when the first tag and the second tag are both used for annotating the same digital object in one or more tagging actions.
  • for any two tags in a tagging action (T, u, o), there exists one co-occurrence relationship whenever that tagging action uses these two tags to annotate the same digital object o.
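Counting co-occurrence relationships as defined above can be sketched as follows, assuming each tagging action on the object is given as a set of tag strings; `cooccurrence_counts` is a hypothetical helper name, and the three sample actions reuse the http://aaa example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count co-occurrence relationships: one per tagging action that uses both tags."""
    pairs = Counter()
    for tags in tag_sets:
        # sorted() gives each unordered pair a canonical (a, b) key with a < b
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


actions_on_aaa = [{"C#", "CSharp"}, {"C#", "Programming", "Reference"}, {"Programming", "CSharp"}]
cooc = cooccurrence_counts(actions_on_aaa)
print(cooc[("C#", "CSharp")])  # 1
```

Each unordered pair is stored once under a sorted key, so lookups should use the same ordering (for example `("C#", "CSharp")`, not `("CSharp", "C#")`).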
  • two scoring functions may be used to measure the tag space built up by massive numbers of tagging actions.
  • the popularity of tags may be defined by a relativity scoring function $w(t, o): T \times O \to \mathbb{R}^+$, in which the greater the tag count for a specific tag t, the greater the relativity score w(t,o) of the tag t with respect to a digital object o.
  • the diversity of tags for the targeted digital object may be shown by a similarity scoring function $s(t_1, t_2): T \times T \to \mathbb{R}^+$, in which the higher the correlation between two tags, the greater the similarity score s(t1, t2) for the two tags t1 and t2.
  • a tag graph 220 may be used to illustrate the tags and the relationships among these tags.
  • the oval-shaped nodes may represent tags, and the edges connecting the nodes may represent the co-occurrence relationships among the tags.
  • the node 221 may represent tag "C#", with a weight value (relativity score) that is calculated by the above relativity scoring function w(t,o).
  • the edge 222 may represent the co-occurrence relationship between tags "C#" and "Tutorial", with a weight value (similarity score) that is calculated based on the above similarity scoring function s(t1, t2).
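A tag graph of this kind might be assembled as below. This sketch assumes the relativity score w(t) is simply the tag count, and, as a labeled assumption, takes s(t1, t2) to be the cosine similarity of binary "used in this tagging action" vectors. That choice works out to cooc / sqrt(count_t1 * count_t2) and always stays in [0, 1]. `build_tag_graph` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations


def build_tag_graph(tag_sets):
    """Return (nodes, edges) for a tag graph.

    nodes: tag -> relativity score w(t), here simply the tag count.
    edges: (t1, t2) -> similarity score s(t1, t2), here assumed to be the cosine
    similarity of binary per-action usage vectors: cooc / sqrt(count_t1 * count_t2).
    """
    nodes = Counter()
    cooc = Counter()
    for tags in tag_sets:
        tags = sorted(set(tags))
        nodes.update(tags)
        for a, b in combinations(tags, 2):
            cooc[(a, b)] += 1
    edges = {
        (a, b): n / math.sqrt(nodes[a] * nodes[b]) for (a, b), n in cooc.items()
    }
    return dict(nodes), edges


nodes, edges = build_tag_graph(
    [{"C#", "CSharp"}, {"C#", "Programming", "Reference"}, {"Programming", "CSharp"}]
)
print(nodes["C#"], round(edges[("C#", "Reference")], 3))  # 2 0.707
```

Only tag pairs that actually co-occur get an edge, which matches the description of edges as co-occurrence relationships.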
  • the tag extraction problem may be characterized as finding a subset of tags $S \subseteq T$ that may represent the tag list T and the underlying tagged digital object.
  • An ideal subset of tags (or "tag subset") should be a good
  • the ideal tag subset should have high relativity scores and high similarity scores within the tag set T.
  • the high relativity scores may mean the tag subset has high coverage and usage by the users. Tags most frequently used by most of the users may be good candidates to act as indicators of the general topics in the tag set T.
  • the high similarity scores may mean that, in the tag set T, the tag subset has better overall popularity compared to the rest of the tags in the tag set T.
  • the similarity scores ensure that the chosen tags in the tag subset differ from one another and at the same time cover more facets of the digital object. Based on the above scores, the tag subset may be valuable in helping the users to quickly grasp the characteristics of the digital object.
  • a multi-objective function for extracting the tag subset S from the tag set T and providing a solution to the tag extraction problem may be designed as follows:
  • the previously defined relativity scoring function may be abbreviated from w(t, o) to w(t) in the above function.
  • the s(t, t') in the above function measures the similarity between tag t and tag t', where tag t belongs to the selected tag subset S, and tag t' is a tag that does not belong to S and has the maximal similarity score with tag t.
  • the output of the above multi-objective function may be deemed a relativity-similarity score, in which the relativity score and the similarity score for tag t are both taken into consideration.
  • the sum of the relativity-similarity scores of all the tags in the subset S may be deemed a collective relativity-similarity score for the subset S.
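The formula itself did not survive extraction. Reading the definitions above together with the later statement that a tag's relativity-similarity score is its relativity score plus that score multiplied by its highest similarity score, one plausible reconstruction (an assumption, not the patent's verbatim notation) is:

```latex
f_1(S) = \sum_{t \in S} \Bigl( w(t) + w(t)\, s(t, t') \Bigr),
\qquad t' = \operatorname*{arg\,max}_{u \in T \setminus S} s(t, u)
```

Each summand is the relativity-similarity score of a selected tag t, and f_1(S) is then the collective relativity-similarity score of S. Since s takes values in [0, 1], each summand lies between w(t) and 2w(t), which is consistent with the function being of the first degree in w(t).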
  • the above multi-objective function is monotonically linear in the coverage metric w(t).
  • the value of the multi-objective function is related to the coverage metric w(t).
  • the multi-objective function may be regarded as a first-degree function of w(t), so it is monotonically linear in the coverage metric w(t).
  • this multi-objective function may take the similarity scores between neighboring tags into consideration, namely the s(t,t') scores. As the similarity scores measure the overall graph connections, this multi-objective function may capture the graph connection information.
  • the above multi-objective function combines the enumeration of the possible candidate tag subsets and finds the best (if not possible, then the optimal) solution in an optimization function similar to the following one, in which $S^*$ is the optimal solution for the tag extraction problem:
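The optimization function is missing from the extracted text. Given that the refined subset has the predetermined tag count k, it presumably has the following form (a reconstruction, assuming f_1 denotes the multi-objective function described above):

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq T,\ |S| = k} f_1(S)
```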
  • the number of possible tag subsets S having k elements may grow exponentially as k increases.
  • searching through all possible combinations of tags to find the best possible tag subset may be computationally prohibitive, and the amount of calculation increases further, and rapidly, as the number n of tags in the tag set T increases.
  • finding the best solution for the above tag extraction function may be an NP-hard enumeration problem.
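To make the size of the search space concrete, the number of candidate subsets of size k drawn from n tags is the binomial coefficient C(n, k). The snippet below uses the standard-library `math.comb` (Python 3.8+); the (n, k) pairs are arbitrary illustrations, not figures from the disclosure.

```python
import math

# C(n, k) = number of distinct tag subsets S with |S| = k drawn from n tags.
for n, k in [(20, 5), (100, 10), (1000, 10)]:
    print(f"n={n:>4}, k={k:>2}: {math.comb(n, k):,} candidate subsets")
```

For n = 20 and k = 5 there are already 15,504 subsets, and for n = 100 and k = 10 the count exceeds 10^13 (for a fixed k it grows roughly like n^k / k!), which is why exhaustive enumeration is impractical.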
  • an approximation algorithm that selects the top k representative tags for the tag subset may be used, along with additional search-pruning techniques to reduce the search space. The details of finding the approximate solution are described further below.
  • extracting the tag subset from the tag set T may be illustrated as finding a partition in the tag graph 220.
  • in the tag graph 220 after extraction, the dot-filled nodes may be deemed members of the tag subset, and the unfilled nodes may be excluded from the tag subset.
  • e.g., a tag subset containing four tags: C#, Programming, Tutorial, and Articles.
  • the synonyms and homonyms such as "CSharp" or "dotnet" are not selected for the tag subset.
  • Fig. 3A-3B show pseudocode for illustrative embodiments of an
  • a tag refinement system may first generate a tag graph (similar to the tag graph 220 of Fig. 2) for the tag set, with nodes representing the tags and edges representing the co-occurrence relationships existing in the tag set.
  • the tag refinement system may further populate the nodes with relativity scores calculated using the above relativity scoring function w(·,·), and populate the edges with similarity scores generated using the above similarity scoring function s(·,·).
  • the tag subset may have a refined size of k, which is a predetermined tag count that has a fixed value and is substantially smaller than the total number of tags in the tag set.
  • the tag refinement system may sort the nodes of the tag graph in a sorting order (e.g., descending) based on the nodes' corresponding relativity scores.
  • the variable state may be used to store the tag graph with the nodes selected so far for the tag subset; the variable BestScore stores a temporary best score; and the variable BestState may store the tag graph that has all tags of the tag subset identified.
  • the above variables may be set to initial values: state is set to the tag graph with no tag selected for the tag subset; BestScore is set to the minimum score (e.g., 0); and BestState is set to be empty.
  • the core search function SearchDepthFirst() may be invoked by the tag refinement system to find the optimal tag subset.
  • SearchDepthFirst() may employ a depth-first recursive search method with pruning in order to find the optimal tag subset that maximizes the objectives described above.
  • SearchDepthFirst() may extract those tags from BestState (which contains the tag graph with the k optimal tags identified) into the tag subset $S^*$.
  • the tag subset $S^*$ may be deemed the refined subset of tags and may be output from the process in Fig. 3A.
  • Fig. 3B provides additional details of the above search function SearchDepthFirst().
  • the process in Fig. 3B estimates the multiple conditions in the connected search range, while the actual partition operation is embedded in the selected search range and comparison.
  • the input state may be the tag graph state that has no tags selected.
  • the BestScore and BestState variables may be correlated to each other, and may be deemed the outputs.
  • the variable SelCount is a counter that stores the number of tags that are already selected for the tag subset.
  • SearchRange may be a tag set that contains all the current unselected tags.
  • the function GetScore() implements the above-mentioned multi-objective function f1() calculated based on the given state variable.
  • the function GetBound() may be an estimating function that gives the maximal possible value of the objective function f1() for a specific given state. For example, given one specific state, the function GetBound() may calculate an upper bound of the multi-objective function f1(), which may be deemed a temporary value holding the highest value potentially reachable from that state. Based on this temporary value, a pruning process may be adopted. That is, if the potentially highest possible value of this state is less than the score obtained from another state that is currently considered to be the "best" state, then this particular state is pruned.
  • the number of already selected tags is stored in the variable SelCount. If SelCount equals the predetermined tag count k for the refined tag subset, a new tag subset has been found that may potentially be the tag subset representing the original set of tags. Next, this new tag subset may be evaluated using the multi-objective function to determine whether, under this given state, it has a higher value than the current BestScore. If it has a higher value, meaning the new tag subset is better than the current best set of tags in BestState, the tag graph in the variable state may be assigned to BestState, and the higher value may be stored as the new BestScore. Afterward, the process in Fig. 3B returns to the process in Fig. 3A. Otherwise (i.e., if SelCount is still less than k), the currently unchosen tags may be added to the tag set SearchRange. For each tag i in the SearchRange:
  • the maximal possible value of the multi-objective function f1() may be calculated under the currently chosen tags and tag i. If this value is lower than the BestScore, there is no need to choose tag i anymore. In this way, tag i is "pruned", and the depth-first search will not be continued from tag i. The depth-first search may continue for the tags whose maximal possible f1() value is higher than the current BestScore.
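The search in Figs. 3A-3B can be sketched as a branch-and-bound depth-first search. This is a minimal sketch under stated assumptions, not the patent's pseudocode. `f1` follows a reconstructed reading of the multi-objective function: relativity plus relativity times the highest similarity to an unselected tag. `get_bound` plays the role of GetBound(). The names `search_depth_first`, `ceiling`, and the toy scores are hypothetical. The bound is valid because a selected tag's best unselected neighbor can never improve as more tags are selected, assuming w >= 0 and s in [0, 1].

```python
from itertools import combinations


def f1(selected, w, sim):
    """Reconstructed multi-objective function (an assumption).

    Each selected tag t contributes w(t) + w(t) * s(t, t'), where t' is the
    unselected tag most similar to t (similarity 0 if no tag is unselected).
    """
    unselected = [u for u in w if u not in selected]
    total = 0.0
    for t in selected:
        best_s = max((sim(t, u) for u in unselected), default=0.0)
        total += w[t] + w[t] * best_s
    return total


def search_depth_first(w, sim, k):
    """Hypothetical branch-and-bound sketch of Figs. 3A-3B; returns (subset, score)."""
    # Fig. 3A: sort tags by relativity score in descending order.
    order = sorted(w, key=w.get, reverse=True)
    # Optimistic contribution of a tag that is not yet selected.
    ceiling = {
        t: w[t] * (1 + max((sim(t, u) for u in order if u != t), default=0.0))
        for t in order
    }
    best = {"score": float("-inf"), "state": None}  # BestScore / BestState

    def get_bound(selected, start):
        # A selected tag's current contribution bounds its final one; add the
        # largest ceilings among the tags that can still be chosen.
        need = k - len(selected)
        rest = sorted((ceiling[t] for t in order[start:]), reverse=True)[:need]
        return f1(selected, w, sim) + sum(rest)

    def dfs(selected, start):
        if len(selected) == k:  # SelCount == k: evaluate the complete subset
            score = f1(selected, w, sim)  # GetScore()
            if score > best["score"]:
                best["score"], best["state"] = score, list(selected)
            return
        for i in range(start, len(order)):  # SearchRange: tags not yet considered
            if len(order) - i < k - len(selected):
                break  # too few tags left to reach k
            selected.append(order[i])
            if get_bound(selected, i + 1) > best["score"]:
                dfs(selected, i + 1)
            # otherwise the branch through order[i] is pruned
            selected.pop()

    dfs([], 0)
    return best["state"], best["score"]


# Toy example: the 680 echoes Fig. 2; every other value is made up.
w = {"C#": 680, "Programming": 400, "CSharp": 300, "Tutorial": 250, "dotnet": 120}
pair_sim = {
    frozenset({"C#", "CSharp"}): 0.9,
    frozenset({"C#", "dotnet"}): 0.6,
    frozenset({"C#", "Programming"}): 0.5,
    frozenset({"Programming", "Tutorial"}): 0.4,
    frozenset({"C#", "Tutorial"}): 0.3,
}


def sim(a, b):
    return pair_sim.get(frozenset({a, b}), 0.0)


subset, score = search_depth_first(w, sim, k=3)
# Cross-check against exhaustive enumeration on this tiny example.
brute = max(f1(list(c), w, sim) for c in combinations(w, 3))
print(subset, score, brute)
```

Enumerating tags in index order (each recursion only looks at later tags) avoids visiting the same subset in several permutations. The exhaustive cross-check is only feasible because the example is tiny.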
  • s(t,t') generates a similarity score that measures the similarity between tag t and tag t', and has a value between 0 and 1. If the similarity score equals 1, tag t is the same as tag t'. If the similarity score equals 0, tag t is so different from tag t' that the two tags share no similarity. The higher the value of the function s(), the more similarity tag t and tag t' may share.
  • the cosine function, defined below, may be used as one kind of similarity function s(t,t'):
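The cosine definition itself is missing from the extracted text. In its standard form, with each tag t represented by a non-negative vector v_t (for example, its usage across tagging actions or its co-occurrence counts with other tags; the choice of vector is an assumption here), it reads:

```latex
s(t, t') = \cos(\mathbf{v}_t, \mathbf{v}_{t'})
         = \frac{\mathbf{v}_t \cdot \mathbf{v}_{t'}}{\lVert \mathbf{v}_t \rVert \, \lVert \mathbf{v}_{t'} \rVert}
```

With non-negative vectors this value lies in [0, 1], matching the range stated above, and it equals 1 when the two tags have proportional usage vectors.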
  • Fig. 4 shows a flow diagram of an illustrative embodiment of a process for implementing a tag refinement strategy.
  • the process 401 may include one or more operations, functions, or actions as illustrated by blocks 410, 420, 430, 440, 450, 460, and/or 470, which may be performed by hardware, software and/or firmware.
  • the various blocks are not intended to limit the described embodiments.
  • the functions performed in the processes and methods may be implemented in differing order.
  • machine-executable instructions for the process 401 may be stored in memory, executed by a processor, and/or implemented in a computer system of Fig. 1.
  • a tag refinement system may receive a plurality of tagging actions from a client via a network.
  • the tagging actions may contain a plurality of tags, and each of the plurality of tagging actions associates one or more of the plurality of tags with a digital object.
  • the tag refinement system may generate a tag graph having a plurality of nodes linked by a plurality of edges. Each of the plurality of nodes may be associated with one of the plurality of tags, and each of the plurality of edges is associated with a corresponding co-occurrence relationship existing in the plurality of tagging actions.
  • the tag refinement system may assign each of the plurality of nodes in the tag graph with a relativity score.
  • the relativity score for a specific node may be determined based on the number of occurrences, in the plurality of tagging actions, of the tag corresponding to the specific node.
  • the tag refinement system may assign each of the plurality of edges with a similarity score.
  • the similarity score for a specific edge may be determined based on a number of co-occurrence relationships between the two tags corresponding to the two nodes that are linked by the specific edge.
  • the tag refinement system may select a first node from the plurality of nodes, and calculate a first relativity-similarity score for the first node's corresponding tag.
  • the first node is selected from a list of nodes that are sorted in descending order (from largest to smallest) based on the nodes' relativity scores stored in the tag graph. In other words, the first node may have the highest relativity score among the plurality of tags.
  • the tag refinement system may calculate the first relativity-similarity score based on the first node's relativity score and the similarity scores of all the edges that are connected with the first node in the tag graph.
  • the relativity-similarity score may be generated by adding the relativity score and a relativity value, which is calculated by multiplying the relativity score by the highest similarity score among the edges that are connected with the first node, as illustrated by the multi-objective function shown above.
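As a worked example, take tag "C#" from Fig. 2, with its relativity score assumed to be its raw tag count of 680, and assume (hypothetically) that the highest similarity among its edges is 0.5:

```latex
w(\text{C\#}) + w(\text{C\#}) \cdot s_{\max} = 680 + 680 \times 0.5 = 1020
```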
  • the tag refinement system may evaluate the first relativity-similarity score against a temporary score (e.g., the BestScore as shown in Fig. 3B). If the first relativity-similarity score is higher than the temporary score, then the first node's corresponding tag may be selected as one of the subset of tags, and the first relativity-similarity score may be set as the new temporary score. Otherwise, the tag refinement system may select another node from the sorted list of nodes and perform operations similar to those in block 430.
  • the first relativity-similarity score may be compared with relativity-similarity scores of those of the plurality of tags that do not belong to the subset of tags, and if the first relativity-similarity score is higher, then the tag associated with the first node may be selected as one of the subset of tags.
  • the tag refinement system may select a second node by recursively traversing the tag graph via the edges starting from the first node.
  • a depth-first-search may be conducted by starting from the first node and recursively traversing the second-level nodes that are connected via edges to the first node.
  • nodes on third or additional levels may be similarly traversed.
  • the second node may be the one that is connected with the first node, and has the highest relativity score.
  • the second node may be the one that has the highest number of co-occurrence relationships with the first node.
  • the second node may be selected from the plurality of nodes based on the sorting order in the plurality of edges.
  • the tag refinement system may calculate a second relativity-similarity score for the second node, similar to the calculation of the first relativity-similarity score performed at block 430.
  • the tag refinement system may compare the second relativity-similarity score to a temporary score or to the relativity-similarity scores of those tags that do not belong to the subset of tags. If the second relativity-similarity score is higher, the second node's associated tag may be selected as one of the subset of tags. If the subset of tags already contains the predetermined tag count of tags, then a tag in the subset whose relativity-similarity score is lower than the second relativity-similarity score may be replaced by the second node's associated tag.
  • the tag refinement system may prune the sub-branches of the tag graph, which are connected with the second node, from further recursive traversing. That is, no sub-level nodes that are connected via edges to the second node may be further traversed.
  • Such an approach may greatly simplify the tag refinement and extraction process, because the pruned sub-branches are never traversed.
  • the tag refinement system extracts the subset of tags having a predetermined tag count from the plurality of tags by recursively processing the tag graph.
  • the extracted subset of tags may have a higher collective relativity-similarity score than any other set of tags having the same predetermined tag count.
  • the tag refinement system may extract the subset of tags by selecting, from the plurality of tags, a first subset of tags having the predetermined tag count.
  • the tag refinement system may then calculate a first collective relativity-similarity score for the first subset of tags by summing the relativity-similarity scores of the tags in the first subset of tags.
  • if the first collective relativity-similarity score is higher than the corresponding collective relativity-similarity score of a different subset of tags (one selected from the plurality of tags, having the same predetermined tag count, but not identical to the first subset of tags), the first subset of tags may be deemed the subset of tags.
  • the tag refinement system may select a second subset of tags that differs from the first subset of tags but has the same predetermined tag count. A second collective relativity-similarity score may then be calculated for the second subset of tags. If the second collective relativity-similarity score is higher than the first collective relativity-similarity score, the second subset of tags may be deemed the subset of tags in lieu of the first subset of tags.
  • Fig. 5 is a block diagram of an illustrative embodiment of a computer program product 500 for implementing a method for tag refinement strategies.
  • Computer program product 500 may include a signal bearing medium 502.
  • Signal bearing medium 502 may include one or more sets of executable instructions 504 that, when executed by, for example, a processor, may provide the functionality described above.
  • the computer system may undertake one or more of the operations shown in at least Fig. 4 in response to the instructions 504.
  • signal bearing medium 502 may encompass a non-transitory computer readable medium 506, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Versatile Disk (DVD), a digital tape, memory, etc.
  • signal bearing medium 502 may encompass a recordable medium 508, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc.
  • signal bearing medium 502 may encompass a communications medium 510, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • computer program product 500 may be wirelessly conveyed to the computer system 110 by signal bearing medium 502, where signal bearing medium 502 is conveyed by communications medium 510 (e.g., a wireless communications medium conforming with the IEEE 802.11 standard).
  • Computer program product 500 may be recorded on non-transitory computer readable medium 506 or another similar recordable medium 508.
  • Fig. 6 shows a block diagram of an illustrative embodiment of an example computer system 600.
  • the computer system 600 may include one or more processors 610 and a system memory 620.
  • a memory bus 630 may be used for communicating between the processor 610 and the system memory 620.
  • processor 610 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof.
  • Processor 610 can include one or more levels of caching, such as a level one cache 611 and a level two cache 612, a processor core 613, and registers 614.
  • the processor core 613 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
  • a memory controller 615 can also be used with the processor 610, or in some implementations the memory controller 615 can be an internal part of the processor 610.
  • the system memory 620 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
  • the system memory 620 may include an operating system 621, one or more applications 622, and program data 624.
  • the application 622 may include a tag refinement 623 that is arranged to perform the functions and/or operations as described herein including at least the functional blocks and/or operations described with respect to the process 401 of Fig. 4.
  • the program data 624 may include tag list 625 to be accessed by the tag refinement 623.
  • the tag refinement 623 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
  • application 622 may be arranged to operate with the program data 624 on the operating system 621 such that implementations of various tag refinement techniques may be provided as described herein.
  • the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
  • examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Versatile Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and
  • a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network
  • any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components.
  • any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable”, to each other to achieve the desired functionality.
  • specific examples of operably couplable components include, but are not limited to, physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
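
The depth-first traversal and pruning steps described for process 401 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tag graph is a hypothetical adjacency map from each tag to its co-occurring tags and co-occurrence counts; each tag's relativity-similarity score is assumed to be precomputed (the block 430 calculation is not reproduced); neighbours are visited in descending co-occurrence order, which is one of the orderings the text allows; and a node's sub-branches are pruned whenever the node fails to enter the subset, which is an assumed pruning condition. The names `refine_by_dfs` and `TagGraph` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: tag -> {co-occurring tag: co-occurrence count}.
TagGraph = Dict[str, Dict[str, int]]


def refine_by_dfs(graph: TagGraph, scores: Dict[str, float], start: str, k: int) -> List[str]:
    """Greedy DFS that keeps at most k tags with the highest scores seen so far.

    `scores` holds an assumed, precomputed relativity-similarity score per tag.
    """
    selected: Dict[str, float] = {}  # current subset of tags and their scores
    visited: Set[str] = set()

    def try_select(tag: str) -> bool:
        """Admit `tag` if the subset has room or if it beats the weakest member."""
        score = scores[tag]
        if len(selected) < k:
            selected[tag] = score
            return True
        weakest = min(selected, key=selected.get)
        if score > selected[weakest]:
            del selected[weakest]  # replace the lower-scoring member
            selected[tag] = score
            return True
        return False

    def visit(tag: str) -> None:
        visited.add(tag)
        # Strongest co-occurrence first; ties broken alphabetically for determinism.
        neighbours = sorted(graph.get(tag, {}), key=lambda n: (-graph[tag][n], n))
        for nbr in neighbours:
            if nbr in visited:
                continue
            if try_select(nbr):
                visit(nbr)  # recurse into the admitted node's sub-branches
            else:
                visited.add(nbr)  # assumed pruning: never descend below a rejected node

    if k <= 0:
        return []
    try_select(start)
    visit(start)
    # Highest-scoring tags first.
    return sorted(selected, key=lambda t: (-selected[t], t))


graph = {
    "python": {"programming": 5, "snake": 2},
    "programming": {"python": 5, "code": 4},
    "code": {"programming": 4},
    "snake": {"python": 2, "reptile": 3},
    "reptile": {"snake": 3},
}
scores = {"python": 0.9, "programming": 0.8, "code": 0.7, "snake": 0.3, "reptile": 0.95}
print(refine_by_dfs(graph, scores, "python", 3))  # ['python', 'programming', 'code']
```

In this toy graph "reptile" has the highest score but is never reached with a tag count of 3, because its parent "snake" is rejected and its sub-branch pruned. The example shows the trade-off: pruning shrinks the part of the graph that must be traversed, at the cost of possibly missing high-scoring tags inside a pruned branch.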

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Generally, the invention relates to techniques concerning a tag refinement strategy. An example method for refining homonyms and synonyms in a plurality of tags may include receiving, by a tag refinement system, a plurality of tagging actions, each of which associates one or more tags of the plurality of tags with a digital object. The method may also include extracting, by the tag refinement system, a first subset of tags from the plurality of tags, the first subset of tags having a higher collective relativity-similarity score than a second subset of tags selected from the plurality of tags, and the first subset of tags, which differs from the second subset of tags, having the same predetermined tag count as the second subset of tags.
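
The extraction step described in the abstract can be illustrated with an exhaustive search. This Python sketch assumes, as hypothetical input, that each tag's relativity-similarity score is already known and that a subset's collective score is the sum of its members' scores. It enumerates every subset having the predetermined tag count and keeps the first one whose collective score no other candidate exceeds. The function names `collective_score` and `extract_subset` are illustrative only.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple


def collective_score(subset: Iterable[str], scores: Dict[str, float]) -> float:
    """Collective relativity-similarity score: sum of the members' (assumed) scores."""
    return sum(scores[tag] for tag in subset)


def extract_subset(scores: Dict[str, float], tag_count: int) -> Tuple[str, ...]:
    """Compare every subset of size `tag_count` and return the highest-scoring one."""
    if not 0 < tag_count <= len(scores):
        raise ValueError("tag_count must be between 1 and the number of tags")
    best: Tuple[str, ...] = ()
    best_score = float("-inf")
    # Each candidate plays the role of a "second subset" compared against the current best.
    for candidate in combinations(sorted(scores), tag_count):
        score = collective_score(candidate, scores)
        if score > best_score:
            best, best_score = candidate, score
    return best


scores = {"apple": 0.9, "fruit": 0.7, "iphone": 0.4, "red": 0.2}
print(extract_subset(scores, 2))  # ('apple', 'fruit')
```

The number of candidates, C(n, k), grows combinatorially with the number of tags n. Also, when the collective score is a plain sum as assumed here, the winning subset is simply the k individually highest-scoring tags; an exhaustive comparison only matters if a tag's score depends on which other tags are in the subset.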
PCT/CN2012/073403 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems WO2013143141A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/980,573 US9411875B2 (en) 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems
PCT/CN2012/073403 WO2013143141A1 (fr) 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems
US15/197,458 US20160306805A1 (en) 2012-03-31 2016-06-29 Tag refinement strategies for social tagging systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/073403 WO2013143141A1 (fr) 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/980,573 A-371-Of-International US9411875B2 (en) 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems
US15/197,458 Continuation US20160306805A1 (en) 2012-03-31 2016-06-29 Tag refinement strategies for social tagging systems

Publications (1)

Publication Number Publication Date
WO2013143141A1 (fr) 2013-10-03

Family

ID=49258118

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/073403 WO2013143141A1 (fr) 2012-03-31 2012-03-31 Tag refinement strategies for social tagging systems

Country Status (2)

Country Link
US (2) US9411875B2 (fr)
WO (1) WO2013143141A1 (fr)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8106856B2 (en) 2006-09-06 2012-01-31 Apple Inc. Portable electronic device for photo management
US8698762B2 (en) 2010-01-06 2014-04-15 Apple Inc. Device, method, and graphical user interface for navigating and displaying content in context
US20150066587A1 (en) 2013-08-30 2015-03-05 Tealium Inc. Content site visitor processing system
US11695845B2 (en) 2013-08-30 2023-07-04 Tealium Inc. System and method for separating content site visitor profiles
US9537964B2 (en) 2015-03-11 2017-01-03 Tealium Inc. System and method for separating content site visitor profiles
US8805946B1 (en) 2013-08-30 2014-08-12 Tealium Inc. System and method for combining content site visitor profiles
US9081789B2 (en) 2013-10-28 2015-07-14 Tealium Inc. System for prefetching digital tags
US8990298B1 (en) 2013-11-05 2015-03-24 Tealium Inc. Universal visitor identification system
US9916075B2 (en) 2015-06-05 2018-03-13 Apple Inc. Formatting content for a reduced-size user interface
US20170011015A1 (en) * 2015-07-08 2017-01-12 Ebay Inc. Content extraction system
US10296634B2 (en) * 2015-08-18 2019-05-21 Facebook, Inc. Systems and methods for identifying and grouping related content labels
US10459994B2 (en) 2016-05-31 2019-10-29 International Business Machines Corporation Dynamically tagging webpages based on critical words
US20170357672A1 (en) * 2016-06-12 2017-12-14 Apple Inc. Relating digital assets using notable moments
AU2017100670C4 (en) 2016-06-12 2019-11-21 Apple Inc. User interfaces for retrieving contextually relevant media content
DK201670609A1 (en) 2016-06-12 2018-01-02 Apple Inc User interfaces for retrieving contextually relevant media content
US20180219810A1 (en) * 2016-08-29 2018-08-02 Mezzemail Llc Transmitting tagged electronic messages
US10459960B2 (en) 2016-11-08 2019-10-29 International Business Machines Corporation Clustering a set of natural language queries based on significant events
US10423614B2 (en) 2016-11-08 2019-09-24 International Business Machines Corporation Determining the significance of an event in the context of a natural language query
US11243996B2 (en) 2018-05-07 2022-02-08 Apple Inc. Digital asset search user interface
DK180171B1 (en) 2018-05-07 2020-07-14 Apple Inc USER INTERFACES FOR SHARING CONTEXTUALLY RELEVANT MEDIA CONTENT
US10846343B2 (en) 2018-09-11 2020-11-24 Apple Inc. Techniques for disambiguating clustered location identifiers
US10803135B2 (en) 2018-09-11 2020-10-13 Apple Inc. Techniques for disambiguating clustered occurrence identifiers
US11243916B2 (en) * 2019-02-27 2022-02-08 Atlassian Pty Ltd. Autonomous redundancy mitigation in knowledge-sharing features of a collaborative work tool
US10802806B1 (en) * 2019-03-29 2020-10-13 Advanced Micro Devices, Inc. Generating vectorized control flow using reconverging control flow graphs
US11146656B2 (en) 2019-12-20 2021-10-12 Tealium Inc. Feature activation control and data prefetching with network-connected mobile devices

Citations (5)

Publication number Priority date Publication date Assignee Title
US20080040126A1 (en) * 2006-08-08 2008-02-14 Microsoft Corporation Social Categorization in Electronic Mail
US20080282198A1 (en) * 2007-05-07 2008-11-13 Brooks David A Method and sytem for providing collaborative tag sets to assist in the use and navigation of a folksonomy
US20090292686A1 (en) * 2008-05-22 2009-11-26 International Business Machines Corporation Disambiguating tags in folksonomy tagging systems
CN102129470A (zh) * 2011-03-28 2011-07-20 中国科学技术大学 Tag clustering method and system
US20110282878A1 (en) * 2010-05-17 2011-11-17 International Business Machines Corporation Generating a taxonomy for documents from tag data

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8024327B2 (en) * 2007-06-26 2011-09-20 Endeca Technologies, Inc. System and method for measuring the quality of document sets
EP2107475A1 (fr) * 2008-03-31 2009-10-07 British Telecommunications Public Limited Company Electronic annotation of resources
US8799294B2 (en) * 2008-05-15 2014-08-05 International Business Machines Corporation Method for enhancing search and browsing in collaborative tagging systems through learned tag hierarchies
US8175847B2 (en) * 2009-03-31 2012-05-08 Microsoft Corporation Tag ranking
EP2341123A1 (fr) * 2009-12-18 2011-07-06 The Procter & Gamble Company Spray-drying process

Also Published As

Publication number Publication date
US20140089330A1 (en) 2014-03-27
US9411875B2 (en) 2016-08-09
US20160306805A1 (en) 2016-10-20

Similar Documents

Publication Publication Date Title
US9411875B2 (en) Tag refinement strategies for social tagging systems
JP7032397B2 (ja) Method and system for identifying similarities between multiple data representations
Zhang et al. Topic analysis and forecasting for science, technology and innovation: Methodology with a case study focusing on big data research
US10546005B2 (en) Perspective data analysis and management
US10885089B2 (en) Methods and systems for identifying a level of similarity between a filtering criterion and a data item within a set of streamed documents
CN102193973B (zh) Presenting answers
US10229200B2 (en) Linking data elements based on similarity data values and semantic annotations
US20200050643A1 (en) Ingestion planning for complex tables
US20200257761A1 (en) Ontology-based document analysis and annotation generation
US20150113388A1 (en) Method and apparatus for performing topic-relevance highlighting of electronic text
US10042913B2 (en) Perspective data analysis and management
Gorrell et al. Using @Twitter conventions to improve #LOD-based named entity disambiguation
AU2021380919A9 (en) Methods and systems for reuse of data item fingerprints in generation of semantic maps
Goarany et al. Mining social tags to predict mashup patterns
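JP2014146218A (ja) Information providing device
JP5980520B2 (ja) Method and apparatus for efficiently processing queries
WO2021055868A1 (fr) Associating user-provided content items with nodes of interest
CN109800429B (zh) Topic mining method and apparatus, storage medium, and computer device
JP6145064B2 (ja) Document set analysis device, document set analysis method, and document set analysis program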
Tang et al. Social media mining and search
CN107391613B (zh) Method and apparatus for automatic multi-document disambiguation of industrial safety topics
US20160124946A1 (en) Managing a set of data
Dang et al. An offline–online visual framework for clustering memes in social media
CN103514192B (zh) Data processing method and data processing device
Fernandes et al. Lightweight context-based web-service composition model for mobile devices

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 13980573

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12873123

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12873123

Country of ref document: EP

Kind code of ref document: A1