US20100332564A1 - Efficient Method for Clustering Nodes - Google Patents
- Publication number
- US20100332564A1 (application US12/876,610)
- Authority
- US
- United States
- Prior art keywords
- abridged
- nodes
- cluster
- node
- primary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
Definitions
- Embodiments of the present invention relate to methods and computer storage media for clustering nodes.
- An input file is received that is comprised of primary nodes, secondary nodes and metrics that relate to the association between the primary nodes and the secondary nodes.
- The input file is abridged to reduce the number of nodes it contains. Abridged primary nodes are merged with secondary nodes to form clusters until the cluster size reaches a predefined size.
- FIG. 1 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments of the present invention.
- FIG. 2 is a block diagram of an exemplary clustering system, in accordance with an embodiment of the present invention.
- FIG. 3A is a block diagram of an exemplary input file, in accordance with an embodiment of the present invention.
- FIG. 3B is a block diagram of an exemplary sorted input file, in accordance with an embodiment of the present invention.
- FIG. 3C is a block diagram of an exemplary abridged input file, in accordance with an embodiment of the present invention.
- FIG. 4A is a block diagram of an exemplary set of nodes, in accordance with an embodiment of the present invention.
- FIG. 4B is a block diagram of an exemplary set of nodes, in accordance with an embodiment of the present invention.
- FIG. 5 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention.
- FIG. 6 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention.
- FIG. 7 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention.
- In one embodiment, the present invention provides computer storage media having computer-executable instructions embodied thereon that, when executed, perform a method for clustering nodes.
- The method includes receiving an input file comprising a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics. Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes.
- The method also includes abridging the input file to generate an abridged file comprising a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics.
- The method further includes merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
- In another embodiment, the present invention provides a method for clustering nodes in a computing environment having a processor and memory.
- The method includes receiving an input file comprising a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics.
- Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes.
- The method also includes abridging, utilizing the processor and memory, the input file to generate an abridged file comprising a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics.
- The method further includes merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
- The method further includes replacing the first abridged secondary node in the first cluster with a second abridged secondary node of the plurality of abridged secondary nodes.
- In a further embodiment, the present invention provides a system for clustering nodes in a computing environment having a processor and memory.
- The system includes a receiving component functional to receive an input file comprising a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics.
- Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes.
- The system further includes an abridging component functional to abridge the input file to generate an abridged file comprising a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics.
- The system further includes a merging component functional to merge a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
- Referring to FIG. 1, an exemplary operating environment suitable for implementing embodiments of the present invention is shown and designated generally as computing device 100.
- Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of modules/components illustrated.
- Embodiments may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device.
- Generally, program modules, including routines, programs, objects, modules, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types.
- Embodiments may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, specialty computing devices, etc.
- Embodiments may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
- Computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation modules 116, input/output (I/O) ports 118, I/O modules 120, and an illustrative power supply 122.
- Bus 110 represents what may be one or more busses (such as an address bus, data bus, or combination thereof).
- FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 1 and reference to “computer” or “computing device.”
- Computing device 100 typically includes a variety of computer-readable media.
- Computer-readable media may comprise Random Access Memory (RAM); Read Only Memory (ROM); Electrically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD), or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices; carrier waves; or any other medium that can be used to encode desired information and be accessed by computing device 100.
- Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory.
- The memory may be removable, non-removable, or a combination thereof.
- Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc.
- Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O modules 120.
- Presentation module(s) 116 present data indications to a user or other device.
- Exemplary presentation modules include a display device, speaker, printing module, vibrating module, and the like.
- I/O ports 118 allow computing device 100 to be logically coupled to other devices, including I/O modules 120, some of which may be built in.
- Illustrative modules include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.
- Referring to FIG. 2, clustering system 200 is but one example of a suitable clustering system and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should clustering system 200 be interpreted as having any dependency or requirement relating to any one or combination of modules/components illustrated.
- Clustering system 200 includes a computing device 202, a network 204, a computing device 206, a receiving component 208, a cleaning component 210, a sorting component 212, an abridging component 214, an initial merging component 216, an initial unique node evaluating component 218, an initial unique node merging component 220, a subsequent node merging component 222, a cluster size evaluating component 224, a cluster replacing component 226, a cluster merging component 228, and a list generating component 230.
- Computing devices 202 and 206 are each a computing device such as computing device 100, previously described in relation to FIG. 1.
- Computing device 202 provides an input file to computing device 206 .
- In an exemplary embodiment, computing device 202 is a server containing a database from which the input file is created.
- Both computing devices 202 and 206 are connected to network 204.
- Network 204 includes, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Such networking environments are commonplace in offices, residential networks, intranets, and the Internet. Accordingly, the network 204 is not further described herein.
- Computing device 206 contains a plurality of components. While clustering system 200 visually depicts the components within a single computing device 206, it is understood and appreciated by those skilled in the art that the various components illustrated as part of computing device 206 can be independent from computing device 206, with the components directly or indirectly connected to a network such as network 204. It will also be appreciated and understood by those skilled in the art that the components depicted as part of clustering system 200 are merely illustrations of an exemplary clustering system 200; it has been contemplated that the components need not be separate or distinct from one another, and the various components may instead be merged into one or more resulting components.
- Computing device 206 includes receiving component 208 .
- Receiving component 208 receives an input file.
- The input file is a file, document, data structure, and/or information feed that provides nodes or nodal information to be clustered by clustering system 200.
- FIG. 3A visually depicts an exemplary input file, generally indicated as input file 301, in accordance with an embodiment of the present invention.
- Input file 301 includes three classes or types of information/data: primary node data, secondary node data, and metric data.
- A node is a data point that represents a specific instance or occurrence of data. For example, in a marketing analysis each node can represent a keyword.
- Examples of nodes serving as keywords are a first node representing the keyword “run”, a second node representing the keyword “running”, and a third node representing the keyword phrase “running shoes”.
- Each of the three exemplary nodes represents a keyword or phrase.
- In another embodiment, a node represents a person, identity, or group in a social network. For example, each node may represent an individual user of a social network or a group within a social network.
- Nodes may also include graph nodes.
- A graph may be formed by a plurality of graph nodes that have connections between one another. Each connection between graph nodes may be either directed or undirected.
- A directed connection is one in which a first node is connected to a second node, but the second node is not necessarily connected to the first node.
- An undirected connection is one in which, if a first node is connected to a second node, the second node is also connected to the first node.
- Examples of graphs comprised of graph nodes include a network; as previously discussed, a network may include a social network or an advertising network. Stated in other words, nodes of a social network may represent participants within the social network, and the connections represent the relationships among the participants/nodes of the social network.
- The metric data of input file 301 is a value indicating the relational strength, closeness, or correlation between a primary node and a secondary node.
- In an exemplary embodiment, each node represents a keyword, and the metric indicates the number of advertisers that utilize or desire both the primary node and the secondary node. Using an example from above, if the primary node is “run”, an associated secondary node is “running”, and the metric is “24”, then twenty-four advertisers that bid on and use the keyword “run” also bid on and use the keyword “running” in their online advertising campaigns.
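- The keyword metric described above can be illustrated with a short Python sketch. It assumes, hypothetically, that bid data is available as a mapping from each advertiser to the set of keywords that advertiser bids on; the function name `co_bid_metrics` and the sample data are illustrative and not taken from the patent. Because every pair is counted in both directions, the resulting metrics correspond to undirected connections.

```python
from collections import Counter
from itertools import permutations


def co_bid_metrics(bids):
    """Count, for every ordered (primary, secondary) keyword pair,
    how many advertisers bid on both keywords."""
    metrics = Counter()
    for keywords in bids.values():
        # Each ordered pair of distinct keywords from one advertiser adds 1.
        for primary, secondary in permutations(sorted(keywords), 2):
            metrics[(primary, secondary)] += 1
    return metrics


# Hypothetical bid data: advertiser -> keywords bid on.
bids = {
    "adv1": {"run", "running"},
    "adv2": {"run", "running", "running shoes"},
    "adv3": {"running shoes"},
}
metrics = co_bid_metrics(bids)
print(metrics[("run", "running")])  # 2
```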
- In one embodiment, every secondary node is also a primary node.
- Secondary nodes are also primary nodes when the connections between the nodes are undirected.
- In general, a secondary node is often, but not always, a primary node.
- A secondary node that is not also a primary node results from a directed connection among the nodes.
- In a keyword embodiment, an originator keyword is akin to a primary node, a receptor keyword is akin to a secondary node, and the metric is akin to a tie-strength metric.
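- The distinction between directed and undirected connections can be checked directly on the datasets. The Python sketch below assumes each dataset is a `(primary, secondary, metric)` tuple (a hypothetical representation) and returns the secondary nodes that never appear as primary nodes; for undirected input the result is empty.

```python
def secondary_only_nodes(rows):
    """Return secondary nodes that never appear as a primary node."""
    primaries = {primary for primary, _, _ in rows}
    return {secondary for _, secondary, _ in rows if secondary not in primaries}


undirected = [("run", "running", 2), ("running", "run", 2)]
directed = [("run", "running", 2), ("run", "running shoes", 1)]
print(secondary_only_nodes(undirected))  # set()
print(sorted(secondary_only_nodes(directed)))  # ['running', 'running shoes']
```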
- In an exemplary embodiment, input file 301 is generated as a raw input file 310, where the order of the data and the data contained are derived from the underlying data source.
- Raw input file 310 includes a dataset 314, which includes a primary node entry, a secondary node entry, and a metric.
- The primary node entry for dataset 314 is “N15”.
- The secondary node entry for dataset 314 is “A”.
- The metric for dataset 314 is “1”.
- The primary node, secondary node, and metric of dataset 314 are associated with one another because they are included in a common dataset.
- An additional dataset of raw input file 310 is dataset 312.
- Dataset 312 includes a primary node of “N12”, a secondary node of “E”, and a metric of “13”.
- In this exemplary embodiment, raw input file 310 has not been sorted, cleaned, or abridged, and its datasets are not in any particular order.
- It is understood that input file 301 is a visual representation of an exemplary embodiment of an input file, and that an input file may be of any form that conveys data so that the data may be clustered by a data clustering system.
- The input file may be a file, document, data/information feed, or data structure that provides information that can be clustered.
- Computing device 206 includes cleaning component 210.
- Cleaning component 210 cleans the input file.
- Cleaning an input file may include removing unreliable data from the file.
- An example of unreliable data in a keyword setting is the set of keywords bid on by a particular online advertiser that bids on a large collection of keywords.
- The datasets representing keywords used by such a mass advertiser would be cleaned from the input file to provide more reliable data to the clustering system.
- Another example of cleaning a keyword-oriented input file is removing datasets that identify only a single advertiser.
- Cleaning of the input file may also include removing datasets that are incomplete or that contain data inconsistent or aberrant with respect to the other datasets of the input file.
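- A minimal Python sketch of dataset-level cleaning follows, assuming each dataset is a `(primary, secondary, metric)` tuple whose metric counts advertisers. The `min_metric` threshold, which drops pairs supported by a single advertiser, is a hypothetical parameter. Removing a mass advertiser's keywords would have to happen upstream, before the metrics are computed, because the input file as described does not record advertiser identities.

```python
def clean(rows, min_metric=2):
    """Drop incomplete, aberrant, or weakly supported datasets."""
    cleaned = []
    for row in rows:
        # Incomplete dataset: wrong number of fields or an empty node entry.
        if len(row) != 3 or not row[0] or not row[1]:
            continue
        primary, secondary, metric = row
        # Aberrant metric: not a non-negative integer.
        if not isinstance(metric, int) or metric < 0:
            continue
        # Weak support, e.g. a pair bid on by only one advertiser.
        if metric < min_metric:
            continue
        cleaned.append((primary, secondary, metric))
    return cleaned


rows = [
    ("kw1", "kw2", 13),
    ("kw1", "kw3", 1),    # single advertiser
    ("kw2", "", 5),       # missing secondary node
    ("kw3", "kw4"),       # missing metric
    ("kw4", "kw5", "x"),  # non-numeric metric
]
print(clean(rows))  # [('kw1', 'kw2', 13)]
```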
- Computing device 206 also includes sorting component 212 .
- Sorting component 212 sorts the data and/or datasets of the input file and/or variations of the input file. Sorting includes arranging the datasets in ascending or descending order by numerical value, alphabetic value, or commonality of data elements of any of the data included in the input file. For example, the input file can be sorted from “A” to “Z” based on the originator keyword/primary node entry. When the primary nodes are sorted, the rest of each dataset moves with its primary node, so that each primary node remains associated with its secondary node and with the metric describing the relationship between the primary and secondary nodes.
- Referring to FIG. 3B, sorted input file 302 is a sorted version of a raw input file, such as raw input file 310 illustrated in FIG. 3A.
- Sorted input file 302 is first sorted in ascending order based on the primary node and further sorted in descending order based on the metric.
- For example, dataset 322 includes a primary node of “N12”, a secondary node of “E”, and a metric of “13”; sorting has relocated this dataset from its original position in FIG. 3A, where it is represented as dataset 312.
- Dataset 324 of FIG. 3B has likewise been repositioned in sorted input file 302 from its original position, illustrated as dataset 314 of FIG. 3A.
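- The two-key sort of sorted input file 302 (ascending by primary node, then descending by metric) can be expressed as a single Python `sorted` call, assuming each dataset is a `(primary, secondary, metric)` tuple with a numeric metric. Because whole tuples are sorted, each primary node stays associated with its secondary node and metric. The sample values are hypothetical.

```python
def sort_datasets(rows):
    # Primary node ascending; negating the metric gives descending order
    # within each primary node.
    return sorted(rows, key=lambda row: (row[0], -row[2]))


rows = [("N2", "B", 3), ("N1", "A", 1), ("N1", "E", 13), ("N2", "C", 7)]
print(sort_datasets(rows))
# [('N1', 'E', 13), ('N1', 'A', 1), ('N2', 'C', 7), ('N2', 'B', 3)]
```

- Plain string comparison places “N10” before “N2”; a natural-sort key would be needed if the numeric order of node labels matters.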
- Computing device 206 includes abridging component 214.
- Abridging component 214 abridges an input file to reduce the number of datasets included in the input file.
- The resulting file may be referred to as an abridged input file.
- Abridging an input file reduces the number of data elements present in the input file (where “input file” represents any sorted, cleaned, or otherwise manipulated input file).
- In an exemplary embodiment, the input file is abridged to reduce the number of secondary nodes that are associated with each unique primary node.
- FIG. 3C depicts an exemplary abridged input file, generally indicated as abridged file 303.
- In this exemplary embodiment, abridged file 303 has been abridged to include at most three secondary nodes with each primary node. For example, primary nodes “N1x”, where “x” represents a number for clarification purposes only, have been reduced from five associated secondary nodes to three.
- Abridged file 303 does not contain the datasets that include primary nodes “N14” or “N15”, which were included in the input file before it was abridged.
- Dataset 332, which corresponds to dataset 322 of FIG. 3B and dataset 312 of FIG. 3A, remains in the abridged file because it is one of three datasets that contain a common primary node and have a metric satisfying a condition. In this example, the condition retains the datasets with the three highest metric values, while all datasets with a metric below the third highest are eliminated from the abridged file.
- Abridging a file reduces the file size, which in turn reduces the resources required for clustering.
- In one exemplary embodiment, the input file is abridged so that no more than ten secondary nodes are associated with each primary node. In another exemplary embodiment, the input file is abridged so that no more than five secondary nodes are associated with a primary node.
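- Given a file sorted as in FIG. 3B, abridging to at most k secondary nodes per primary node reduces to keeping the first k datasets of each primary-node group. The Python sketch below assumes that sort order and uses `itertools.groupby` and `itertools.islice`; `max_secondary` is a hypothetical parameter name (3 in the FIG. 3C example, 5 or 10 in the other embodiments).

```python
from itertools import groupby, islice
from operator import itemgetter


def abridge(sorted_rows, max_secondary=3):
    """Keep at most max_secondary datasets per primary node.

    Assumes rows are sorted by primary node ascending and metric descending,
    so the first datasets in each group carry the highest metrics.
    """
    abridged = []
    for _, group in groupby(sorted_rows, key=itemgetter(0)):
        abridged.extend(islice(group, max_secondary))
    return abridged


rows = [("N1", s, m) for s, m in [("E", 13), ("C", 9), ("D", 7), ("B", 4), ("A", 1)]]
rows.append(("N2", "F", 2))
print(abridge(rows))
# [('N1', 'E', 13), ('N1', 'C', 9), ('N1', 'D', 7), ('N2', 'F', 2)]
```

- Ties at the cutoff are broken here by sort order; keeping every dataset tied with the third-highest metric would be an equally plausible reading of the condition.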
- Computing device 206 includes initial merging component 216.
- Initial merging component 216 merges each unique primary node with a secondary node based on a condition.
- In one embodiment, the condition that determines which secondary node is paired with a primary node selects the secondary node associated with the greatest metric value.
- For example, suppose the primary node identified as “Node A” is included in three datasets: the first includes the secondary node “Node B” and a metric of “1”, the second includes the secondary node “Node C” and a metric of “9”, and the third includes the secondary node “Node D” and a metric of “5”. “Node A” would then merge with “Node C”, because the dataset with primary node “Node A”, secondary node “Node C”, and metric “9” has the highest metric of all datasets whose primary node is “Node A”.
- In one embodiment, the initial merging of nodes from an abridged input file is performed in a whole pass.
- A whole pass processes an input file sequentially, without focusing on a particular primary node, secondary node, or metric value.
- Whole-pass merging provides an efficient way of merging nodes in a document because each dataset is evaluated only once.
- An alternative approach to the initial merge is to process all datasets that include a particular element before moving on to the next set of datasets that contain a different element.
- An example of this type of initial merging is evaluating only those datasets that include a particular primary node to determine which nodes should initially be merged, before evaluating a different primary node value.
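- The whole-pass variant of the initial merge, using the highest-metric condition, can be sketched in Python as a single sequential scan that keeps, for each primary node, the best secondary node seen so far. Datasets are assumed to be `(primary, secondary, metric)` tuples and clusters are represented as frozensets of node labels; both are illustrative choices. The example reproduces the “Node A” case above.

```python
def initial_merge(rows):
    """Whole pass: visit each dataset once, keeping the highest-metric
    secondary node for each primary node."""
    best = {}  # primary node -> (secondary node, metric)
    for primary, secondary, metric in rows:
        if primary not in best or metric > best[primary][1]:
            best[primary] = (secondary, metric)
    # Merge each primary node with its chosen secondary node into a cluster.
    return {primary: frozenset({primary, secondary})
            for primary, (secondary, _) in best.items()}


rows = [("Node A", "Node B", 1), ("Node A", "Node C", 9), ("Node A", "Node D", 5)]
print(initial_merge(rows)["Node A"] == frozenset({"Node A", "Node C"}))  # True
```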
- Computing device 206 includes an initial unique node evaluating component 218.
- Initial unique node evaluating component 218 evaluates data elements to determine whether an evaluated data element is the first instance of that data element to be evaluated from the input file. For example, initial unique node evaluating component 218 evaluates the primary node of each dataset to determine whether that primary node value or entry has occurred previously in the input file. This allows the first occurrence of a node entry to be identified, and once a particular node entry is determined to be the first occurrence, a resulting action may be performed.
- Because each initial unique originator keyword is expected to be merged with its associated receptor keyword, it must be determined whether the current originator keyword is the first occurrence of that originator keyword, meaning it has not yet been merged, or is not an initial instance, in which case a different action should be taken because that originator keyword has already been merged.
- Computing device 206 includes initial unique node merging component 220.
- Initial unique node merging component 220 merges a node that is the initial occurrence of a unique primary node with an associated secondary node to form a first cluster. For example, once initial unique node evaluating component 218 determines that a node is an initial occurrence of a unique primary node, initial unique node merging component 220 merges that node with the secondary node of the dataset containing that first occurrence.
- Computing device 206 includes subsequent node merging component 222.
- Subsequent node merging component 222 merges nodes whose primary node has not been identified as an initial unique node. For example, if initial unique node evaluating component 218 determines that a particular node is not an initial instance, subsequent node merging component 222 merges the nodes of that dataset to form a cluster other than the initial or first cluster created by initial unique node merging component 220.
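- The interplay of the initial unique node evaluating, initial unique node merging, and subsequent node merging components might look like the following Python sketch. A set records which primary nodes have already been seen, so each dataset is classified as a first occurrence or a subsequent one. The function name and the `(nodes, metric)` cluster representation are hypothetical.

```python
def initial_and_subsequent_clusters(rows):
    """Split datasets into first clusters (first occurrence of a primary node)
    and subsequent clusters (primary node already seen)."""
    seen = set()
    first_clusters, subsequent_clusters = [], []
    for primary, secondary, metric in rows:
        cluster = (frozenset({primary, secondary}), metric)
        if primary not in seen:
            # Initial unique primary node: form its first cluster.
            seen.add(primary)
            first_clusters.append(cluster)
        else:
            # Primary node already merged: form a subsequent cluster.
            subsequent_clusters.append(cluster)
    return first_clusters, subsequent_clusters


rows = [("N1", "E", 13), ("N1", "C", 9), ("N2", "B", 5)]
first, subsequent = initial_and_subsequent_clusters(rows)
print(len(first), len(subsequent))  # 2 1
```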
- Computing device 206 includes a cluster size evaluating component 224.
- Cluster size evaluating component 224 evaluates the size of a cluster to determine whether that cluster satisfies a condition. For example, if three hundred nodes have been clustered to form a cluster, cluster size evaluating component 224 evaluates the number of nodes in the cluster to determine whether it satisfies a condition that limits the number of nodes in a cluster.
- Computing device 206 includes a cluster replacing component 226.
- Cluster replacing component 226 replaces one cluster with another. For example, cluster replacing component 226 replaces a first/previous cluster with a second/subsequent cluster if the second/subsequent cluster satisfies a condition. In one example, the condition for replacing the first/previous cluster is that the second/subsequent cluster has a higher metric than the first/previous cluster. Additional conditions include the previous cluster's metric being less than or greater than that of the second/subsequent cluster, or other comparative conditions that determine whether a cluster should be replaced.
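- The replacement rule can be sketched as a comparison between two `(nodes, metric)` pairs. In this hypothetical Python version the comparison is selectable: `"greater"` implements the exemplary rule (replace when the subsequent cluster has the higher metric), while `"less"` stands in for the alternative comparative conditions mentioned above.

```python
import operator

# Hypothetical names for the comparative conditions mentioned in the text.
CONDITIONS = {"greater": operator.gt, "less": operator.lt}


def replace_if(previous, subsequent, condition="greater"):
    """Return the cluster to keep; clusters are (nodes, metric) pairs."""
    compare = CONDITIONS[condition]
    if compare(subsequent[1], previous[1]):
        return subsequent  # the subsequent cluster becomes the previous cluster
    return previous


previous = (frozenset({"N1", "E"}), 13)
subsequent = (frozenset({"N1", "C"}), 9)
print(replace_if(previous, subsequent)[1])          # 13
print(replace_if(previous, subsequent, "less")[1])  # 9
```

- Applying `replace_if` with the `"greater"` condition to every subsequent cluster of a primary node leaves the cluster with the highest metric.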
- Computing device 206 includes cluster merging component 228.
- Cluster merging component 228 merges two or more clusters to form a single cluster.
- FIG. 4A is a visual representation of nodes in an embodiment of the present invention.
- The un-clustered group of nodes is generally referred to as nodes 400.
- Nodes 400 include node 410, which represents primary node “N11”; node 412, which represents secondary node “F”; node 414, which represents primary node “N21”; node 416, which represents secondary node “B”; and node 418, which represents primary node “N31”.
- Initial unique node merging component 220 merges node 410 and node 412 to form an initial cluster, because initial unique node evaluating component 218 determines that node 410 is an initial unique node.
- Similarly, nodes 414 and 416 are clustered as an initial cluster, as are nodes 418 and 416.
- FIG. 4B is a visual representation of clusters in an embodiment of the present invention.
- The cluster resulting from nodes 410 and 412 is cluster 420.
- The cluster resulting from nodes 414 and 416 is cluster 422.
- The cluster resulting from nodes 418 and 416 is cluster 424.
- Cluster merging component 228 merges clusters 422 and 424 into a cluster 430.
- Cluster 430 includes nodes 414, 416, and 418.
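- Merging clusters that share nodes, as in FIG. 4B, can be sketched in Python by absorbing every existing cluster that overlaps each incoming one, which keeps the merged clusters pairwise disjoint. The function name is hypothetical, and node labels stand in for reference numerals 410 to 418.

```python
def merge_overlapping(clusters):
    """Merge clusters that share at least one node until no two overlap."""
    merged = []  # invariant: clusters in this list are pairwise disjoint
    for cluster in clusters:
        combined = set(cluster)
        # Absorb every already-merged cluster that shares a node with this one.
        overlapping = [m for m in merged if m & combined]
        for m in overlapping:
            combined |= m
            merged.remove(m)
        merged.append(combined)
    return merged


# Clusters 420, 422 and 424 from FIG. 4B, written with node labels.
result = merge_overlapping([{"N11", "F"}, {"N21", "B"}, {"N31", "B"}])
print(sorted(sorted(c) for c in result))  # [['B', 'N21', 'N31'], ['F', 'N11']]
```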
- Computing device 206 includes list generating component 230.
- List generating component 230 generates a list that includes the resulting clusters and the associated nodes of the clusters.
- The generated list may be in the form of a data file, a document, a data/information feed, or another form that communicates the resulting clusters and node combinations.
- In an exemplary embodiment, the list includes each primary node together with the secondary nodes and primary nodes that are associated with it by one or more common clusters.
- The list generated in the exemplary embodiment includes the metric(s) associated with the cluster(s) and nodes.
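- One possible list format is sketched below in Python: one tab-separated line per cluster with its nodes and an aggregate metric. The format, the function name, and summing the metrics of datasets whose two nodes both fall inside the cluster are all assumptions; the text only states that the list includes the clusters, their nodes, and the associated metric(s).

```python
def generate_list(clusters, metrics):
    """Render one line per cluster: cluster id, sorted nodes, summed metric."""
    lines = []
    for cluster_id, cluster in enumerate(clusters, start=1):
        # Aggregate metric: sum over datasets whose two nodes lie in the cluster.
        total = sum(m for (p, s), m in metrics.items() if p in cluster and s in cluster)
        lines.append(f"{cluster_id}\t{','.join(sorted(cluster))}\t{total}")
    return "\n".join(lines)


clusters = [{"N11", "F"}, {"N21", "B", "N31"}]
metrics = {("N11", "F"): 7, ("N21", "B"): 5, ("N31", "B"): 4}  # hypothetical values
print(generate_list(clusters, metrics))
```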
- Referring to FIG. 5, a flow diagram is shown that illustrates an exemplary method for clustering nodes, designated generally as clustering method 500, in accordance with an embodiment of the present invention.
- An input file is received at block 502.
- The input file includes primary nodes, as indicated at block 504, secondary nodes, as indicated at block 506, and metrics, as indicated at block 508.
- The input file is abridged to produce an abridged input file, as indicated at block 510.
- The primary nodes of the abridged file are evaluated at block 512.
- Each primary node is evaluated to determine whether it is a unique initial instance of that primary node, as indicated at block 514. If the primary node is determined to be a unique initial instance, it is merged with the secondary node of its dataset, as indicated at block 516.
- If the primary node is determined not to be a unique initial instance, a determination is made, as indicated at block 518, whether the subsequent cluster that would result from merging the evaluated primary node and its associated secondary node should replace the previous cluster.
- The previous cluster comprises either the unique initial primary node and its associated secondary node, or a subsequent (non-initial) primary node and its associated secondary node that had previously replaced a cluster to become the previous cluster.
- The determination to replace a previous or initial cluster is made based on a condition.
- In an exemplary embodiment, a previous cluster is replaced with a subsequent cluster when the metric associated with the subsequent cluster is greater than that of the previous cluster.
- When it is determined that a previous cluster should be replaced by a subsequent cluster, the previous cluster is replaced by the subsequent cluster, and the subsequent cluster then becomes the previous cluster, as indicated at block 520. Therefore, in an exemplary embodiment, the previous cluster always has the highest associated metric following block 520.
- The process of evaluating primary nodes at block 512 is repeated until there are no more primary nodes to evaluate, as indicated at block 522. When there are no additional primary nodes to evaluate, the previous clusters are merged with subsequent clusters, as indicated at block 524.
- In one embodiment, previous clusters and subsequent clusters are merged according to structured conditions or rules. For example, clusters that include common nodes are combined based on each cluster's metric value or on the metric value of the resulting cluster.
- FIG. 6 a flow diagram is shown that illustrates an exemplary method for clustering keywords and designated generally as clustering method 600 , in accordance with an embodiment of the present invention.
- An input file is received at block 602 .
- the input file is comprised of originator keywords indicated at block 604 , receptor keywords indicated at block 606 , and tie-strength metrics indicated at block 608 .
- the input file is abridged at block 610 .
- the initial originator keyword of the abridged file is merged with its associated receptor keyword to form a first cluster indicated at block 612 .
- This first cluster serves as a seed for the clustering method.
- an associated receptor keyword is a receptor keyword that is included in the same dataset as the evaluated originator keyword. More generically, an associated node is a node that is contained in a common dataset.
- a subsequent originator keyword is evaluated, indicated at block 614 to determine if the subsequent originator keyword is an initial unique originator keyword. If it is determined, at block 616 , that the subsequent originator keyword is an initial unique keyword, meaning that that particular originator keyword has not previously been evaluated, then that originator keyword is merged with its associated receptor keyword to create a first cluster, as indicated at block 618 . However, if it is determined at block 616 that the subsequent originator keyword id not an initial unique keyword, then the subsequent originator keyword is merged with its associated receptor keyword to form a second cluster as indicated at block 620 .
- clusters are merged as indicated at block 628 .
- the merging of clusters is performed to combine clusters that have a common originator keyword, a common receptor keyword, or any combination of common originator keywords and receptor keywords.
- the merging of clusters also merges clusters that were themselves produced by earlier merges, and merging continues until all clusters that satisfy a condition, such as a maximum node/keyword number, have been merged.
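A minimal sketch of the repeated merge at block 628 follows. It assumes clusters are Python sets of keywords, that two clusters qualify for merging when they share at least one keyword, and that the "maximum node/keyword number" condition caps the size of the merged result. `merge_clusters`, `_find_mergeable_pair`, and `max_size` are hypothetical names, and the patent does not prescribe a merge order.

```python
from typing import List, Optional, Set, Tuple


def _find_mergeable_pair(
    clusters: List[Set[str]], max_size: int
) -> Optional[Tuple[int, int]]:
    """Return indices (i, j), i < j, of two clusters that share a keyword and
    whose union would not exceed max_size keywords, or None if none exist."""
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            shares_keyword = bool(clusters[i] & clusters[j])
            if shares_keyword and len(clusters[i] | clusters[j]) <= max_size:
                return i, j
    return None


def merge_clusters(clusters: List[Set[str]], max_size: int) -> List[Set[str]]:
    """Keep merging clusters that share a keyword, including clusters produced
    by earlier merges, until no pair can be merged within max_size."""
    result = [set(c) for c in clusters]  # copy so the caller's sets are untouched
    while True:
        pair = _find_mergeable_pair(result, max_size)
        if pair is None:
            return result
        i, j = pair
        absorbed = result.pop(j)  # j > i, so popping j leaves index i valid
        result[i] |= absorbed


clusters = [{"run", "running"}, {"running", "jog"}, {"shoes", "running shoes"}, {"jog", "walk"}]
print(merge_clusters(clusters, max_size=10))
print(merge_clusters(clusters, max_size=3))
```

This sketch favors clarity over speed: it rescans all pairs after every merge. A larger implementation might index which cluster each keyword belongs to so that candidate pairs are found without the rescan.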
- a list is generated as indicated at block 630 .
- the list may be in any form that communicates the relationship or association of the originator keywords, receptor keywords, tie-strength metrics, and resulting clusters.
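Because the list may take any form, one simple possibility is a tab-separated listing of cluster membership. The sketch below is an assumption rather than the patent's format: `cluster_list` is a hypothetical name, and it omits tie-strength metrics, which could be added as extra columns if a per-cluster metric is defined.

```python
from typing import List, Set


def cluster_list(clusters: List[Set[str]]) -> str:
    """Render clusters as tab-separated "cluster_id<TAB>keyword" lines,
    numbering clusters from 1 and sorting keywords within each cluster."""
    lines = []
    for cluster_id, cluster in enumerate(clusters, start=1):
        for keyword in sorted(cluster):
            lines.append(f"{cluster_id}\t{keyword}")
    return "\n".join(lines)


print(cluster_list([{"running", "run"}, {"shoes"}]))
```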
- Referring to FIG. 7, a flow diagram is shown that illustrates an exemplary method for clustering keywords, designated generally as clustering method 700, in accordance with an embodiment of the present invention.
- An input file is received, as indicated at block 702 .
- the input file is comprised of originator keywords, as indicated at block 704 , receptor keywords, as indicated at block 706 , and tie-strength metrics, as indicated at block 708 .
- the input file is cleaned, as indicated at block 710 , and sorted as indicated at block 712 .
- the input file is then abridged to create an abridged file, as indicated at block 714. It is understood and appreciated by those skilled in the art that, while the exemplary clustering method 700 visually represents the cleaning, sorting, and abridging of the input file in a particular order, various alternative arrangements have been contemplated and are optional embodiments of the present invention.
- the initial originator keyword is merged with its associated receptor keyword to form a first cluster as indicated at block 716 .
- a subsequent originator keyword is evaluated as indicated at block 718 .
- the subsequent originator keyword is evaluated to determine if the subsequent originator keyword is an initial unique originator keyword as indicated at block 720 . If it is determined that the subsequent originator keyword is an initial unique originator keyword, the subsequent originator keyword is merged with its associated receptor keyword to form a first cluster as indicated at block 722 . However, if it is determined to the contrary, the subsequent originator keyword is merged with its associated receptor keyword to form a second cluster, as indicated at block 724 .
- the tie-strength of the second cluster and a related first cluster are evaluated, as indicated at block 726, to determine if the first cluster should be replaced, as indicated at block 728. If it is determined that the second cluster should replace the first cluster, the first cluster is replaced by the second cluster, as indicated at block 730. However, if it is determined that the first cluster should not be replaced, a determination is made as to whether additional originator keywords could be evaluated, as indicated at block 732. If additional originator keywords are present in the input file, then those keywords are evaluated in turn, as indicated at block 718. If additional originator keywords do not exist, the clusters are merged to form larger clusters.
- the merger of clusters with other clusters is dependent on a condition that a cluster containing more than 300 originator keywords will not be merged with another cluster. Once all of the clusters have been merged to satisfy the conditions of merging, a list is generated, as indicated at block 736 .
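The 300-originator-keyword rule can be expressed as a guard that a merge loop consults before combining two clusters. This Python sketch assumes that clusters are sets mixing originator and receptor keywords, that the set of originator keywords is known, and, following the common-keyword merging described for method 600, that clusters must share a keyword to merge. `may_merge`, `originator_count`, and `ORIGINATOR_LIMIT` are illustrative names.

```python
from typing import Set

# Limit taken from the description of method 700.
ORIGINATOR_LIMIT = 300


def originator_count(cluster: Set[str], originators: Set[str]) -> int:
    """Number of keywords in the cluster that are originator keywords."""
    return len(cluster & originators)


def may_merge(
    a: Set[str],
    b: Set[str],
    originators: Set[str],
    limit: int = ORIGINATOR_LIMIT,
) -> bool:
    """Two clusters may merge only if they share a keyword and neither one
    already contains more than `limit` originator keywords."""
    if not a & b:
        return False
    return (
        originator_count(a, originators) <= limit
        and originator_count(b, originators) <= limit
    )


originators = {f"o{i}" for i in range(400)}
oversized = {f"o{i}" for i in range(301)}  # 301 originator keywords
print(may_merge({"o1", "r1"}, {"o2", "r1"}, originators))  # True
print(may_merge(oversized, {"o0", "r1"}, originators))     # False
```

Note that the guard checks each input cluster rather than the merged result, which is how the sentence above reads; a cap on the merged size would be a different, equally plausible policy.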
Abstract
Methods and computer storage media for clustering nodes are provided. An input file is received that is comprised of primary nodes, secondary nodes and metrics that relate to the association between the primary nodes and the secondary nodes. Upon receiving the input file, the input file may be abridged to reduce the number of nodes contained in the input file. The unique initial primary nodes may then be clustered with their associated secondary nodes. The clusters containing the unique initial primary nodes may then be replaced if a subsequent related cluster satisfies a pre-defined condition. In some embodiments, multiple clusters may then be merged until the cluster size reaches a pre-defined size. In some embodiments, the input file may be cleaned and sorted prior to being abridged.
Description
- This is a continuation application that claims the benefit of U.S. patent application Ser. No. 12/036,720, filed Feb. 25, 2008, entitled “EFFICIENT METHOD FOR CLUSTERING NODES,” the entirety of which is incorporated by reference herein.
- Traditionally, when a large number of nodes exist it is difficult to analyze the nodes in order to determine if a relationship among the nodes exists. The relationships between nodes are often indicative of an underlying clustering of the nodes. Clustering of the nodes has traditionally been an inefficient process that requires the distance between all pairs of nodes to be considered before a relationship between the various nodes can be realized.
- Embodiments of the present invention relate to methods and computer storage media for clustering nodes. An input file is received that is comprised of primary nodes, secondary nodes and metrics that relate to the association between the primary nodes and the secondary nodes. Upon receiving the input file, the input file is abridged to reduce the number of nodes contained in the input file. Abridged primary nodes are merged with secondary nodes to form clusters until the cluster size reaches a pre-defined size.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Embodiments are described in detail below with reference to the attached drawing figures, wherein:
- FIG. 1 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments of the present invention;
- FIG. 2 is a block diagram of an exemplary clustering system, in accordance with an embodiment of the present invention;
- FIG. 3A is a block diagram of an exemplary input file, in accordance with an embodiment of the present invention;
- FIG. 3B is a block diagram of an exemplary sorted input file, in accordance with an embodiment of the present invention;
- FIG. 3C is a block diagram of an exemplary abridged input file, in accordance with an embodiment of the present invention;
- FIG. 4A is a block diagram of an exemplary set of nodes, in accordance with an embodiment of the present invention;
- FIG. 4B is a block diagram of an exemplary set of nodes, in accordance with an embodiment of the present invention;
- FIG. 5 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention;
- FIG. 6 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention; and
- FIG. 7 is a flow diagram of an exemplary method for clustering nodes, in accordance with an embodiment of the present invention.
- The subject matter of embodiments of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies.
- An input file is received that is comprised of primary nodes, secondary nodes and metrics that relate to the association between the primary nodes and the secondary nodes. Upon receiving the input file, the input file is abridged to reduce the number of nodes contained in the input file. Abridged primary nodes are merged with secondary nodes to form clusters until the cluster size reaches a pre-defined size.
- Accordingly, in one aspect, the present invention provides computer storage media having computer-executable instructions embodied thereon that, when executed, perform a method for clustering nodes. The method includes receiving an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics. Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes. The method also includes abridging the input file to generate an abridged file that is comprised of: a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics. The method also is comprised of merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
- In another aspect, the present invention provides a method for clustering nodes in a computing environment having a processor and memory. The method includes receiving an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics. Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes. The method also includes abridging, utilizing the processor and memory, the input file to generate an abridged file that is comprised of: a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics. The method also is comprised of merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster. The method is further comprised of replacing the first abridged secondary node in the first cluster with a second abridged secondary node of the plurality of abridged secondary nodes.
- In another aspect, the present invention provides a system for clustering nodes in a computing environment having a processor and memory. The system is comprised of a receiving component functional to receive an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics. Each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes. The system is further comprised of an abridging component functional to abridge the input file to generate an abridged file that is comprised of: a plurality of abridged primary nodes, a plurality of abridged secondary nodes, and a plurality of abridged metrics. The system is further comprised of a merging component functional to merge a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
- Having briefly described an overview of embodiments of the present invention, an exemplary operating environment suitable for implementing embodiments hereof is described below.
- Referring to the drawings in general, and initially to FIG. 1 in particular, an exemplary operating environment suitable for implementing embodiments of the present invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of modules/components illustrated.
- Embodiments may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules including routines, programs, objects, modules, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. Embodiments may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, specialty computing devices, etc. Embodiments may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
- With continued reference to FIG. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation modules 116, input/output (I/O) ports 118, I/O modules 120, and an illustrative power supply 122. Bus 110 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in reality, delineating various modules is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation module such as a display device to be an I/O module. Also, processors have memory. The inventors hereof recognize that such is the nature of the art, and reiterate that the diagram of FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 1 and reference to “computer” or “computing device.”
Computing device 100 typically includes a variety of computer-readable media. By way of example, and not limitation, computer-readable media may comprise Random Access Memory (RAM); Read Only Memory (ROM); Electronically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; carrier waves; or any other medium that can be used to encode desired information and be accessed by computing device 100.
Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O modules 120. Presentation module(s) 116 present data indications to a user or other device. Exemplary presentation modules include a display device, speaker, printing module, vibrating module, and the like. I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O modules 120, some of which may be built in. Illustrative modules include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.
- With reference to FIG. 2, a block diagram is shown that illustrates an exemplary system suitable for implementing embodiments of the present invention, designated generally as clustering system 200, in accordance with an embodiment of the present invention. Clustering system 200 is but one example of a suitable clustering system and is not intended to suggest any limitation as to the scope or functionality of the invention. Neither should clustering system 200 be interpreted as having any dependency or requirement relating to any one or combination of modules/components illustrated.
Clustering system 200 includes a computing device 202, a network 204, a computing device 206, a receiving component 208, a cleaning component 210, a sorting component 212, an abridging component 214, an initial merging component 216, an initial unique node evaluating component 218, an initial unique node merging component 220, a subsequent node merging component 222, a cluster size evaluating component 224, a cluster replacing component 226, a cluster merging component 228, and a list generating component 230.
- In an exemplary embodiment of the present invention, computing devices 202 and 206 are each a computing device such as computing device 100 previously described in relation to FIG. 1. Computing device 202 provides an input file to computing device 206. Additionally, in an embodiment, computing device 202 is a server that contains a database from which the input file is created. Both computing devices 202 and 206 are connected by way of network 204. Network 204 includes, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Such networking environments are commonplace in offices, residential networks, intranets, and the Internet. Accordingly, network 204 is not further described herein.
Computing device 206 contains a plurality of components. While clustering system 200 visually depicts the components within a single computing device 206, it is understood and appreciated by those skilled in the art that the various components illustrated as part of computing device 206 can be independent from computing device 206, wherein the components are directly or indirectly connected to a network such as network 204. It will also be appreciated and understood by those skilled in the art that the components depicted as part of clustering system 200 are merely illustrations of an exemplary clustering system 200, and it has been contemplated that the components need not be separate or distinct from one another; instead, the various components may be merged into multiple resulting components.
Computing device 206 includes receiving component 208. Receiving component 208 receives an input file. The input file is a file, document, data structure, and/or information feed that provides nodes or nodal information that is clustered by clustering system 200. An exemplary embodiment of an input file, in accordance with an embodiment of the present invention, is visually depicted in FIG. 3A and generally indicated as input file 301. Input file 301 includes three classes or types of information/data: primary node data, secondary node data, and metric data. A node is a data point that represents a specific instance or occurrence of data. For example, a node can represent a keyword in a marketing analysis, wherein each node represents a keyword. Further examples of nodes serving as keywords would be a first node representing the keyword "run", a second node representing the keyword "running", and a third node representing the keyword phrase "running shoes". Each of the three exemplary nodes represents a keyword or phrase. Additionally, in another exemplary embodiment, a node represents a person, identity, or group in a social network; for example, each node may represent an individual user of a social network or a group within a social network. In yet a further embodiment, a node includes graph nodes. A graph may be formed by a plurality of graph nodes, wherein the graph nodes have connections between one another. The connection between two graph nodes may be either a directed connection or an undirected connection. A directed connection is one in which a first node is connected to a second node, but the second node is not necessarily connected to the first node. An undirected connection, on the other hand, represents a connection between a first and a second node wherein, if the first node is connected to the second node, the second node is therefore connected to the first node.
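To make the graph-node description concrete, the following Python sketch turns weighted connections into (primary, secondary, metric) rows of the kind an input file holds. It assumes that a directed connection yields one row and an undirected connection yields a row in each direction, so under undirected connections every secondary node also appears as a primary node. `Dataset`, `datasets_from_connections`, `secondary_only_nodes`, and the sample metric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Dataset:
    """One row of an input file: a primary node, a secondary node, and the
    metric describing the relationship between them."""
    primary: str
    secondary: str
    metric: int


def datasets_from_connections(
    connections: Iterable[Tuple[str, str, int]], directed: bool
) -> List[Dataset]:
    """Turn weighted graph connections into input-file rows.

    A directed connection a -> b yields only the row (a, b); an undirected
    connection also yields (b, a), so both endpoints appear as primary nodes.
    """
    rows: List[Dataset] = []
    for a, b, weight in connections:
        rows.append(Dataset(a, b, weight))
        if not directed:
            rows.append(Dataset(b, a, weight))
    return rows


def secondary_only_nodes(rows: List[Dataset]) -> Set[str]:
    """Secondary nodes that never appear as a primary node."""
    primaries = {row.primary for row in rows}
    return {row.secondary for row in rows} - primaries


connections = [("run", "running", 24), ("running", "running shoes", 11)]
print(secondary_only_nodes(datasets_from_connections(connections, directed=True)))   # {'running shoes'}
print(secondary_only_nodes(datasets_from_connections(connections, directed=False)))  # set()
```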
Examples of graphs comprised of graph nodes include networks; as previously discussed, a network may include a social network or an advertising network. Stated in other words, the nodes of a social network may represent participants within the social network, and the connections represent the relationships among the participants/nodes of the social network.
- The metric data of input file 301 is a value indicating the relational strength, closeness, or correlation between a primary node and a secondary node. In an exemplary embodiment, a node represents a keyword and the metric indicates the number of advertisers that utilize or desire both the primary node and the secondary node. Therefore, using an example from above, if the primary node is "run", an associated secondary node is "running", and the metric is "24", this indicates that twenty-four advertisers that bid on and use the keyword "run" also bid on and use the keyword "running" in their online advertising campaigns. In an exemplary embodiment, every secondary node is also a primary node; secondary nodes are also primary nodes when the connections between the nodes are undirected. In an alternative exemplary embodiment, a secondary node is often, but not always, a primary node; a secondary node that is not also a primary node results from a directed connection among the nodes. When discussing primary nodes and secondary nodes in the context of keywords, an originator keyword is akin to a primary node, a receptor keyword is akin to a secondary node, and a metric is akin to a tie-strength metric.
- In an exemplary embodiment, input file 301 is generated as a raw input file 310, where the order of the data and the data contained are derived from the underlying data source. Raw input file 301 includes a dataset 314, which includes a primary node entry, a secondary node entry, and a metric. The primary node entry for dataset 314 is "N15", the secondary node entry for dataset 314 is "A", and the metric for dataset 314 is "1". The primary node, secondary node, and metric of dataset 314 are associated with one another because of their inclusion in a common dataset. An additional dataset of raw input file 310 is dataset 312. Dataset 312 includes a primary node of "N12", a secondary node of "E", and a metric of "13". Raw input file 310, in this exemplary embodiment, has not been sorted, cleaned, or abridged; the datasets are not in a particular order within raw input file 310.
- It will be understood and appreciated by those skilled in the art that input file 301 is a visual representation of an exemplary embodiment of an input file and that an input file may be of any form that conveys data so that the data may be clustered by a data clustering system. As previously mentioned, the input file may be a file, document, data/information feed, or data structure that provides information that can be clustered.
- Returning to FIG. 2, in an exemplary embodiment, computing device 206 includes cleaning component 210. In an exemplary embodiment, cleaning component 210 cleans the input file. Cleaning an input file may include removing unreliable data from the file. An example of unreliable data in a keyword setting would be the keywords bid on by a particular online advertiser that bids on a large collection of keywords. In this example, the datasets representing the keywords used by such a mass advertiser would be cleaned from the input file to provide more reliable data to the clustering system. Another example of cleaning a keyword-oriented input file is removing the datasets that identify a single advertiser. In this example, if a particular keyword would only be used by an individual advertiser, then that data is cleaned to prevent that advertiser from being identified based on the resulting clusters or input file. Cleaning the input file may also include removing datasets that are incomplete or that contain data that is inconsistent or aberrant relative to the other datasets of the input file.
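The keyword example ties the metric and the cleaning rules together: the tie-strength of a keyword pair counts the advertisers that bid on both keywords, and cleaning discards mass advertisers and data that would identify a single advertiser. The Python sketch below assumes the raw data is available as a hypothetical mapping from advertiser ID to the set of keywords that advertiser bids on. The thresholds `max_keywords_per_advertiser` and `min_advertisers` are illustrative parameters, since the description gives no values, and dropping pairs backed by a single advertiser is only one reading of the anonymity rule.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def tie_strength_rows(
    bids: Dict[str, Set[str]],
    max_keywords_per_advertiser: int,
    min_advertisers: int = 2,
) -> List[Tuple[str, str, int]]:
    """Build cleaned (originator, receptor, tie_strength) rows.

    tie_strength counts the advertisers that bid on both keywords. Two
    cleaning rules are applied: advertisers bidding on more than
    max_keywords_per_advertiser keywords are ignored, and pairs backed by
    fewer than min_advertisers advertisers are dropped so that no row
    reflects a single advertiser.
    """
    counts: Counter = Counter()
    for keywords in bids.values():
        if len(keywords) > max_keywords_per_advertiser:
            continue  # skip a "mass" advertiser
        # Each ordered pair of distinct keywords gets one vote per advertiser.
        for originator, receptor in permutations(sorted(keywords), 2):
            counts[(originator, receptor)] += 1
    return [
        (originator, receptor, n)
        for (originator, receptor), n in counts.items()
        if n >= min_advertisers
    ]


bids = {
    "adv1": {"run", "running"},
    "adv2": {"run", "running", "running shoes"},
    "adv3": {"run", "running shoes"},
    "mass": {"run", "running", "running shoes", "jog", "walk"},
}
for row in sorted(tie_strength_rows(bids, max_keywords_per_advertiser=3)):
    print(row)
```

Because co-bidding is symmetric, every pair appears in both directions, which matches the undirected case in which every secondary node is also a primary node.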
Computing device 206 also includes sorting component 212. In an exemplary embodiment, sorting component 212 sorts the data and/or datasets of the input file and/or variations of the input file. Sorting includes arranging the datasets in ascending or descending order by numerical value, alphabetic value, or commonality of data elements of any of the data included in the input file. For example, the input file can be sorted from "A" to "Z" based on the originator keyword/primary node entry. When the primary nodes are sorted in this way, the rest of each dataset moves with its primary node, so that each primary node stays associated with its secondary node and with the metric describing the relationship between the primary and secondary nodes. Referring to FIG. 3B, an exemplary embodiment of a sorted input file is illustrated and generally indicated as sorted input file 302. In an exemplary embodiment, sorted input file 302 is a sorted version of a raw input file, such as raw input file 301 illustrated in FIG. 3A. Sorted input file 302 is first sorted in ascending order by primary node and then in descending order by metric. For example, dataset 322, which includes a primary node of "N12", a secondary node of "E", and a metric of "13", has been relocated by sorting from its original position, shown in FIG. 3A as dataset 312. Dataset 324 of FIG. 3B has likewise been repositioned in sorted input file 302 from its original position, illustrated by dataset 314 of FIG. 3A.
- Returning to FIG. 2, in an exemplary embodiment, computing device 206 includes abridging component 214. In an exemplary embodiment, abridging component 214 abridges an input file to reduce the number of datasets included in the input file. When an input file is abridged, the resulting file may be referred to as an abridged input file. Abridging an input file reduces the amount or number of data elements present in the input file (where "input file" represents any sorted, cleaned, or otherwise manipulated input file). In an exemplary embodiment, an input file is abridged to reduce the number of secondary nodes that are associated with each unique primary node. For example, if there are five distinct secondary nodes associated with a single primary node, the input file is abridged to reduce the number of secondary nodes associated with that primary node. This is exemplified in FIG. 3C, an exemplary embodiment of an abridged input file generally indicated as abridged file 303. Abridged file 303, in this exemplary embodiment, has been abridged to include at most three secondary nodes for each primary node. For example, primary node "N1x", where "x" represents a number for clarification purposes only, has been reduced from having five associated secondary nodes to having only three. Abridged file 303 does not contain the datasets that include primary nodes "N14" or "N15", which were included in the input file before it was abridged. Dataset 332, which corresponds to dataset 322 of FIG. 3B and dataset 312 of FIG. 3A, remains in the abridged file because it is one of three datasets that contain a common primary node and have a metric that satisfies a condition. In this example, the metric condition is that the datasets with the three highest metric values are maintained, while all datasets with a metric below the third highest are eliminated from the abridged file.
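Sorting and abridging combine naturally: once rows are grouped by primary node with the highest metrics first, abridging amounts to keeping the first few rows of each group. A minimal Python sketch, assuming rows are hypothetical `(primary, secondary, metric)` tuples and that `max_secondaries=3` mirrors the three-per-primary limit of abridged file 303; `sort_rows` and `abridge` are illustrative names. Rows with equal metrics keep their input order because Python's `sorted` is stable.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, str, int]  # (primary node, secondary node, metric)


def sort_rows(rows: List[Row]) -> List[Row]:
    """Sort ascending by primary node, then descending by metric."""
    return sorted(rows, key=lambda row: (row[0], -row[2]))


def abridge(rows: List[Row], max_secondaries: int = 3) -> List[Row]:
    """Keep only the max_secondaries highest-metric rows for each primary node."""
    abridged: List[Row] = []
    for _primary, group in groupby(sort_rows(rows), key=lambda row: row[0]):
        abridged.extend(list(group)[:max_secondaries])
    return abridged


rows = [("B", "x", 5), ("A", "p", 1), ("A", "q", 9), ("A", "r", 5), ("A", "s", 7), ("B", "y", 8)]
print(abridge(rows))
```

Primary node "A" starts with four secondary nodes and keeps the three with the highest metrics; "B" already has fewer than three and is kept whole.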
Abridging a file reduces the file size, which in turn reduces the resources required to cluster. In an exemplary embodiment, the input file is abridged so that no more than ten secondary nodes are associated with each primary node. In another exemplary embodiment, the input file is abridged so that no more than five secondary nodes are associated with a primary node.
- Returning to FIG. 2, computing device 206 includes initial merging component 216. In an exemplary embodiment, initial merging component 216 merges each unique primary node with a secondary node based on a condition. In an exemplary embodiment, the condition that determines which secondary node is paired with a primary node selects the secondary node associated with the greatest metric value. For example, if the primary node identified as "Node A" is included in three datasets, where the first dataset includes the secondary node "Node B" and a metric of "1", the second dataset includes the secondary node "Node C" and a metric of "9", and the third dataset includes the secondary node "Node D" and a metric of "5", then "Node A" would merge with "Node C", because the dataset with primary node "Node A", secondary node "Node C", and metric "9" is the dataset with the highest metric that has a primary node of "Node A".
- The initial merging of nodes from an abridged input file, in an exemplary embodiment, is processed in a whole pass. A whole pass processes an input file sequentially without focusing on a particular primary node, secondary node, or metric value. Whole-pass merging provides an efficient way of merging nodes in a document, since each dataset is evaluated only once. An alternative approach to the initial merge is to process all datasets that include a particular element before moving on to the next set of datasets that contain a different particular element. An example of this type of initial merging would be evaluating only those datasets that include a particular primary node to determine which nodes should initially be merged before evaluating a different primary node value.
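The "Node A" example and the whole-pass approach can be sketched together in Python. This assumes datasets are `(primary, secondary, metric)` tuples and that the pairing condition is "highest metric wins", with ties going to the dataset seen first; `initial_merge` is a hypothetical name. It does not require sorted input, because it keeps a running best for every primary node while reading each dataset exactly once.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (primary node, secondary node, metric)


def initial_merge(rows: List[Row]) -> Dict[str, str]:
    """Whole-pass initial merge.

    Each dataset is visited exactly once; for every primary node we remember
    the secondary node from its highest-metric dataset seen so far, and the
    result maps each primary node to the secondary node it merges with.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for primary, secondary, metric in rows:
        if primary not in best or metric > best[primary][1]:
            best[primary] = (secondary, metric)
    return {primary: secondary for primary, (secondary, _metric) in best.items()}


rows = [("Node A", "Node B", 1), ("Node A", "Node C", 9), ("Node A", "Node D", 5)]
print(initial_merge(rows))  # {'Node A': 'Node C'}
```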
- In an exemplary embodiment, computing device 260 includes an initial unique
node evaluating component 218. The initial uniquenode evaluating component 218, in an embodiment of the invention, evaluates data elements to determine if the evaluated data element is the first instance of that data element to be evaluated from the input file. For example, initial uniquenode evaluating component 218 evaluates the primary node of each dataset to determine if that primary node value or entry has occurred previously in the input file. This allows for the first occurrence of a node entry to be determined, and once it is determined that a particular node entry is the first occurrence a resulting action may be performed. For example, if each initial unique originator keyword is expected to be merged with the associated receptor keyword, it must be determined if the current originator keyword is the first occurrence of that originator keyword, which therefore means that that originator keyword has not yet been merged, or if the originator keyword is not an initial instance and therefore a different action should be taken because that originator keyword has already been merged. - In an exemplary embodiment,
computing device 206 includes initial uniquenode merging component 220. In an exemplary embodiment of the present invention, initial uniquenode merging component 220 merges a node that is an initial occurrence of a unique primary node and merges that node with an associated secondary node to form a first cluster. For example, an initial occurrence of a unique primary node is determined by initial uniquenode evaluating component 218, once a node has been determined as an initial occurrence of a unique node the initial uniquenode merging component 220 merges that identified node with the secondary node of the dataset containing the first occurrence of the initial unique primary node. - In an exemplary embodiment,
computing device 206 includes subsequentnode merging component 222. Subsequentnode merging component 222 merges nodes wherein the primary node has not been identified as an initial unique node. For example, if the initial unique evaluating component determines that a particular node is not an initial instance, the subsequentnode merging component 222 merges the nodes of that dataset to form a cluster that is not an initial or first cluster created by initial uniquenode merging component 220. - In yet another exemplary embodiment of the present invention,
computing device 206 includes a cluster size evaluating component 224. Cluster size evaluating component 224 evaluates the size of a cluster to determine whether that cluster satisfies a condition. For example, if three hundred nodes have been clustered to form a cluster, cluster size evaluating component 224 evaluates the number of nodes in the cluster. It does this to determine whether the cluster satisfies a condition that limits the number of nodes per cluster.

In an exemplary embodiment of the present invention,
computing device 206 includes a cluster replacing component 226. Cluster replacing component 226 replaces one cluster with another cluster. For example, cluster replacing component 226 replaces a first/previous cluster with a second/subsequent cluster if it is determined that the second/subsequent cluster satisfies a condition. In one example, the replacement condition is that the second/subsequent cluster has a higher metric than the first/previous cluster. Other conditions compare the metric values of the two clusters in other ways, for example whether one metric value is less than or more than the other. Still other comparative conditions may be used to determine whether a cluster should be replaced.

In an exemplary embodiment,
computing device 206 includes cluster merging component 228. Cluster merging component 228 merges two or more clusters to form a single cluster. For example, FIG. 4A is a visual representation of nodes of an embodiment of the present invention. The un-clustered group of nodes is generally referred to as nodes 400. Nodes 400 include:
- node 410, which represents primary node “N11”;
- node 412, which represents secondary node “F”;
- node 414, which represents primary node “N21”;
- node 416, which represents secondary node “B”;
- node 418, which represents primary node “N31”.

In an exemplary embodiment of the present invention, initial unique node merging component 220 merges node 410 and node 412 to form an initial cluster. It does so because initial unique node evaluating component 218 has determined that node 410 is an initial unique node. The same is true for the other primary nodes, node 414 and node 418, each of which is merged with its associated secondary node. FIG. 4B is a visual representation of the resulting clusters of an embodiment of the present invention. The cluster of nodes 410 and 412 is cluster 420, and the remaining merged node pairs form cluster 422 and cluster 424. Additionally, cluster merging component 228 merges clusters from among clusters 420, 422, and 424 to form cluster 430. Cluster 430 includes the nodes of the clusters that were merged.

In an exemplary embodiment,
computing device 206 includes list generating component 230. List generating component 230 generates a list that includes the resulting clusters and the nodes associated with them. The list may take the form of a data file, a document, a data/information feed, or any other form that communicates the resulting clusters and node combinations. In an exemplary embodiment, the list includes each primary node together with the secondary nodes and primary nodes associated with it through one or more common clusters. The list generated in the exemplary embodiment also includes the metric(s) associated with the cluster(s) and nodes.

With reference to
FIG. 5, a flow diagram is shown that illustrates an exemplary method for clustering nodes, designated generally as clustering method 500, in accordance with an embodiment of the present invention. An input file is received at block 502. The input file includes primary nodes, as indicated at block 504; secondary nodes, as indicated at block 506; and metrics, as indicated at block 508. The input file is abridged to produce an abridged input file, as indicated at block 510. The primary nodes of the abridged file are evaluated at block 512. Each evaluation determines whether the primary node is a unique initial instance of that primary node, as indicated at block 514. If the primary node is a unique initial instance, it is merged with the secondary node of its dataset, as indicated at block 516.

However, if the primary node is determined not to be a unique initial instance, a further determination is made. That determination is whether the subsequent cluster that would result from merging the evaluated primary node with its associated secondary node should replace the previous cluster, as indicated at
block 518. The previous cluster comprises one of two things:
- the unique initial primary node and its associated secondary node; or
- a subsequent (non-initial) primary node and its associated secondary node that previously replaced a cluster and thereby became the previous cluster.

Whether to replace a previous or initial cluster is decided by a condition. In an exemplary embodiment, a previous cluster is replaced with a subsequent cluster when the metric associated with the subsequent cluster is greater than that of the previous cluster. When it is determined that a previous cluster should be replaced, the subsequent cluster replaces it and then becomes the previous cluster, as indicated at block 520. Therefore, in an exemplary embodiment, the previous cluster always has the highest associated metric following block 520.

The process of evaluating the primary node at
block 512 is repeated until there are no more primary nodes to evaluate, as indicated at block 522. When no additional primary nodes remain, previous clusters are merged with subsequent clusters, as indicated at block 524. In an exemplary embodiment, previous clusters and subsequent clusters are merged according to structured conditions or rules. For example, clusters that include common nodes are combined based on each cluster's metric value or on the metric value of the resulting cluster.

Referring to
FIG. 6, a flow diagram is shown that illustrates an exemplary method for clustering keywords, designated generally as clustering method 600, in accordance with an embodiment of the present invention. An input file is received at block 602. The input file comprises originator keywords, indicated at block 604; receptor keywords, indicated at block 606; and tie-strength metrics, indicated at block 608. The input file is abridged at block 610. The initial originator keyword of the abridged file is merged with its associated receptor keyword to form a first cluster, indicated at block 612. This first cluster serves as a seed for the clustering method. An associated receptor keyword is a receptor keyword included in the same dataset as the evaluated originator keyword. More generally, associated nodes are nodes contained in a common dataset.

A subsequent originator keyword is evaluated, as indicated at
block 614, to determine whether it is an initial unique originator keyword. If it is determined at block 616 that the subsequent originator keyword is an initial unique keyword, meaning that it has not previously been evaluated, it is merged with its associated receptor keyword to create a first cluster, as indicated at block 618. However, if it is determined at block 616 that the subsequent originator keyword is not an initial unique keyword, it is merged with its associated receptor keyword to form a second cluster, as indicated at block 620.

A determination is then made whether the resulting second cluster should replace an existing first cluster that shares the same originator keyword, as indicated at
block 622. If it is determined that the second cluster should replace the first cluster, the second cluster replaces it, as indicated at block 624. A second cluster that replaces a first cluster is then treated as a first cluster for later determinations, such as the one made at block 622. Whether a first cluster should be replaced by a second cluster is decided by established or pre-defined conditions and/or rules. In an exemplary embodiment, a second cluster replaces a first cluster when the second cluster has a greater associated tie-strength metric.

However, if it is determined that the second cluster should not replace the first cluster, a determination is made whether additional originator keywords remain in the abridged file to evaluate, as indicated at
block 626. When additional originator keywords remain in the abridged file, a subsequent originator keyword is evaluated, as indicated at block 614. When there are no additional originator keywords to evaluate, clusters are merged, as indicated at block 628. In an exemplary embodiment, merging combines clusters that have a common originator keyword, a common receptor keyword, or any combination of common originator and receptor keywords. The merging at block 628 is repeated: clusters produced by earlier merges continue to be merged until every cluster that satisfies a condition, such as a maximum number of nodes/keywords, has been merged. Finally, a list is generated, as indicated at block 630. As previously discussed, the list may be in any form that communicates the relationship or association among the originator keywords, receptor keywords, tie-strength metrics, and resulting clusters.

Referring to
FIG. 7, a flow diagram is shown that illustrates an exemplary method for clustering keywords, designated generally as clustering method 700, in accordance with an embodiment of the present invention. An input file is received, as indicated at block 702. The input file comprises originator keywords, as indicated at block 704; receptor keywords, as indicated at block 706; and tie-strength metrics, as indicated at block 708. The input file is cleaned, as indicated at block 710, and sorted, as indicated at block 712. The input file is then abridged to create an abridged file, as indicated at block 714. Exemplary clustering method 700 depicts the cleaning, sorting, and abridging of the input file in a particular order. Those skilled in the art will understand and appreciate, however, that various alternative arrangements have been contemplated and are optional embodiments of the present invention.

The initial originator keyword is merged with its associated receptor keyword to form a first cluster, as indicated at
block 716. A subsequent originator keyword is evaluated, as indicated at block 718, to determine whether it is an initial unique originator keyword, as indicated at block 720. If it is, the subsequent originator keyword is merged with its associated receptor keyword to form a first cluster, as indicated at block 722. Otherwise, the subsequent originator keyword is merged with its associated receptor keyword to form a second cluster, as indicated at block 724.

The tie strengths of the second cluster and a related first cluster are evaluated, as indicated at
block 726, to determine whether the first cluster should be replaced, as indicated at block 728. If it is determined that the second cluster should replace the first cluster, the first cluster is replaced by the second cluster, as indicated at block 730. However, if it is determined that the first cluster should not be replaced, a determination is made whether additional originator keywords remain to be evaluated, as indicated at block 732. If additional originator keywords are present in the input file, those keywords are evaluated in turn, as indicated at block 718. If no additional originator keywords exist, the clusters are merged to form larger clusters. In an exemplary embodiment, merging is subject to the condition that a cluster containing more than 300 originator keywords will not be merged with another cluster. Once all clusters have been merged in accordance with the merging conditions, a list is generated, as indicated at block 736.
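The keyword-clustering flow of FIG. 7, picking up after cleaning, sorting, and abridging, can be sketched in Python as below. This is an illustrative reading, not the patented implementation, and it rests on several assumptions:
- each dataset is an (originator, receptor, tie strength) tuple;
- "replacing" a first cluster means keeping, for each originator keyword, the receptor with the greater tie strength;
- clusters are merged when they share any keyword, subject to the 300-originator condition described above.

The function names `seed_clusters`, `merge_clusters`, and `generate_list` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: one dataset of the abridged file is
# (originator keyword, receptor keyword, tie-strength metric).
Row = Tuple[str, str, float]


def seed_clusters(rows: List[Row]) -> Dict[str, Tuple[str, float]]:
    """Blocks 716-730: keep one (receptor, tie strength) cluster per originator."""
    best: Dict[str, Tuple[str, float]] = {}
    for originator, receptor, strength in rows:
        if originator not in best:
            # Initial unique originator keyword: it forms a first cluster.
            best[originator] = (receptor, strength)
        elif strength > best[originator][1]:
            # Non-initial occurrence whose second cluster has a greater tie
            # strength: the second cluster replaces the first cluster.
            best[originator] = (receptor, strength)
    return best


def merge_clusters(best: Dict[str, Tuple[str, float]],
                   max_originators: int = 300) -> List[Tuple[Set[str], Set[str]]]:
    """Merge clusters that share any keyword until no further merge applies.

    Each cluster is (originator keywords, all keywords). Following the text,
    a cluster that already holds more than max_originators originator
    keywords is not merged with another cluster.
    """
    clusters = [({o}, {o, r}) for o, (r, _) in best.items()]
    changed = True
    while changed:
        changed = False
        # Simple pairwise rescan, chosen for clarity rather than speed.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                orig_i, keys_i = clusters[i]
                orig_j, keys_j = clusters[j]
                too_big = max(len(orig_i), len(orig_j)) > max_originators
                if not too_big and keys_i & keys_j:
                    clusters[i] = (orig_i | orig_j, keys_i | keys_j)
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return clusters


def generate_list(rows: List[Row], max_originators: int = 300) -> List[List[str]]:
    """Block 736: report each resulting cluster as a sorted list of keywords."""
    clusters = merge_clusters(seed_clusters(rows), max_originators)
    return sorted(sorted(keys) for _, keys in clusters)
```

As an example, take the rows ("N1", "F", 0.9), ("N1", "B", 0.4), ("N2", "B", 0.7), ("N2", "F", 0.8), ("N3", "B", 0.6). For these rows, `generate_list` returns `[["B", "N3"], ["F", "N1", "N2"]]`:
- N2's stronger tie to F replaces its initial tie to B.
- The clusters seeded by N1 and N2 then merge on the shared keyword F.

Because the size check applies to each cluster before a merge, as the text states, a single merge can still produce a cluster somewhat larger than the limit.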
Claims (20)
1. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, perform a method for clustering nodes, the method comprising:
receiving an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics, wherein each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes;
abridging the input file to generate an abridged file that is comprised of:
(1) a plurality of abridged primary nodes,
(2) a plurality of abridged secondary nodes, and
(3) a plurality of abridged metrics; and
merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
2. The computer storage media of claim 1, wherein the method further comprises replacing the first abridged secondary node in the first cluster with a second abridged secondary node of the plurality of abridged secondary nodes.
3. The computer storage media of claim 2, wherein the first abridged secondary node is replaced with the second abridged secondary node based on a replacement condition.
4. The computer storage media of claim 1, wherein the method further comprises merging the first cluster with a second cluster to create a resulting cluster, wherein the abridged primary node of the first cluster is comparable to an abridged primary node of the second cluster.
5. The computer storage media of claim 4, wherein the first cluster is merged with the second cluster when the first cluster is comprised of a defined number of nodes.
6. The computer storage media of claim 4, wherein the first cluster is merged with the second cluster when the first cluster contains less than a defined number of nodes.
7. The computer storage media of claim 4, wherein the plurality of abridged metrics represent a strength of a relationship between one of the plurality of abridged primary nodes and one of the plurality of abridged secondary nodes.
8. The computer storage media of claim 4, wherein the method further comprises:
determining the first abridged primary node is not an initial occurrence of the first abridged primary node; and
identifying the second cluster having a comparable abridged primary node to the abridged primary node of the first cluster.
9. The computer storage media of claim 1, wherein the method further comprises removing unreliable data from the input file.
10. The computer storage media of claim 1, wherein the method further comprises sorting the input file by the plurality of metrics.
11. The computer storage media of claim 1, wherein abridging the input file is comprised of reducing a number of the plurality of primary nodes to a pre-defined maximum number.
12. The computer storage media of claim 1, wherein abridging the input file is comprised of reducing a number of the plurality of secondary nodes to a pre-defined maximum number.
13. The computer storage media of claim 1, wherein abridging the input file is comprised of reducing a number of the plurality of metrics to a pre-defined maximum number.
14. The computer storage media of claim 1, wherein the input file is derived from either a social networking database or an advertising database.
15. A method for clustering nodes in a computing environment having a processor and memory, the method comprising:
receiving an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics, wherein each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes;
abridging, utilizing the processor and memory, the input file to generate an abridged file that is comprised of:
(1) a plurality of abridged primary nodes,
(2) a plurality of abridged secondary nodes, and
(3) a plurality of abridged metrics; and
merging a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster; and
replacing the first abridged secondary node in the first cluster with a second abridged secondary node of the plurality of abridged secondary nodes.
16. The method of claim 15, wherein the first abridged secondary node is replaced with the second abridged secondary node based on a replacement condition associated with at least one of the plurality of abridged metrics.
17. The method of claim 15, wherein the method further comprises merging the first cluster with a second cluster to create a resulting cluster, wherein the abridged primary node of the first cluster is comparable to an abridged primary node of the second cluster.
18. The method of claim 15, wherein abridging the input file is comprised of reducing, to a predefined number, at least one of:
1) the plurality of primary nodes,
2) the plurality of secondary nodes, or
3) the plurality of metrics.
19. The method of claim 15, wherein the input file is derived from either a social networking source or an advertising source.
20. A system for clustering nodes in a computing environment having a processor and memory, the system comprising:
a receiving component functional to receive an input file comprised of a plurality of primary nodes, a plurality of secondary nodes, and a plurality of metrics, wherein each of the plurality of metrics describes a relationship between one of the plurality of primary nodes and one of the plurality of secondary nodes;
an abridging component functional to abridge the input file to generate an abridged file that is comprised of:
(1) a plurality of abridged primary nodes,
(2) a plurality of abridged secondary nodes, and
(3) a plurality of abridged metrics; and
a merging component functional to merge a first abridged primary node of the plurality of abridged primary nodes with a first abridged secondary node of the plurality of abridged secondary nodes to form a first cluster.
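Claims 9 through 13 recite three preprocessing steps: removing unreliable data, sorting by the metrics, and abridging. Abridging reduces the number of primary nodes, secondary nodes, or metrics to a pre-defined maximum number. The Python sketch below illustrates the primary-node and metric (dataset) limits, under assumptions on points the claims leave open:
- "unreliable" here means a row with a blank node or a non-finite metric;
- sorting is in descending order;
- the datasets that survive abridging are the first ones after sorting.

The function names `clean`, `sort_by_metric`, and `abridge` are hypothetical. A secondary-node limit (claim 12) would follow the same pattern as `max_primary`.

```python
import math
from typing import List, Optional, Set, Tuple

# Assumption: each dataset is (primary node, secondary node, metric).
Row = Tuple[str, str, float]


def clean(rows: List[Row]) -> List[Row]:
    """Claim 9: remove unreliable data.

    Assumption: a row is unreliable if a node is blank or the metric is
    not a finite number.
    """
    return [(p, s, m) for p, s, m in rows
            if p and s and isinstance(m, (int, float)) and math.isfinite(m)]


def sort_by_metric(rows: List[Row]) -> List[Row]:
    """Claim 10: sort by metric (assumed descending, strongest ties first)."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def abridge(rows: List[Row], max_primary: Optional[int] = None,
            max_rows: Optional[int] = None) -> List[Row]:
    """Claims 11 and 13: cap distinct primary nodes and datasets (metrics).

    Assumption: the first rows encountered (the strongest, after sorting)
    are the ones kept.
    """
    kept_primary: Set[str] = set()
    abridged: List[Row] = []
    for p, s, m in rows:
        if max_rows is not None and len(abridged) >= max_rows:
            break  # metric (row) budget exhausted
        if p not in kept_primary:
            if max_primary is not None and len(kept_primary) >= max_primary:
                continue  # primary-node budget exhausted; skip new primaries
            kept_primary.add(p)
        abridged.append((p, s, m))
    return abridged
```

Chaining the steps as `abridge(sort_by_metric(clean(rows)), max_primary=2)` keeps the two primary nodes whose strongest ties rank highest, along with every remaining dataset that belongs to them.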
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
US12/876,610 | US20100332564A1 | 2008-02-25 | 2010-09-07 | Efficient Method for Clustering Nodes |
Applications Claiming Priority (2)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
US12/036,720 | US7818322B2 | 2008-02-25 | 2008-02-25 | Efficient method for clustering nodes |
US12/876,610 | US20100332564A1 | 2008-02-25 | 2010-09-07 | Efficient Method for Clustering Nodes |
Related Parent Applications (1)
Application Number | Relationship | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US12/036,720 | Continuation | US7818322B2 | 2008-02-25 | 2008-02-25 | Efficient method for clustering nodes |
Publications (1)
Publication Number | Publication Date |
---|---|
US20100332564A1 | 2010-12-30 |
Family ID: 40999326
Family Applications (3)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US12/036,720 | Expired - Fee Related | US7818322B2 | 2008-02-25 | 2008-02-25 | Efficient method for clustering nodes |
US12/853,435 | Expired - Fee Related | US8150853B2 | 2008-02-25 | 2010-08-10 | Efficient method for clustering nodes |
US12/876,610 | Abandoned | US20100332564A1 | 2008-02-25 | 2010-09-07 | Efficient Method for Clustering Nodes |
Also Published As
Publication number | Publication date |
---|---|
US20090216780A1 (en) | 2009-08-27 |
US8150853B2 (en) | 2012-04-03 |
US20100325110A1 (en) | 2010-12-23 |
US7818322B2 (en) | 2010-10-19 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: TANTRUM, JEREMY; REEL/FRAME: 024948/0123. Effective date: 20080222 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MICROSOFT CORPORATION; REEL/FRAME: 034766/0509. Effective date: 20141014 |