US8209338B2 - Interest-group discovery system - Google Patents

Info

Publication number
US8209338B2
Authority
US
United States
Prior art keywords
keyword
list
node
nodes
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US12/651,447
Other versions
US20100174724A1 (en)
Inventor
David Robert Wallace
Marilynn Klamkin
Kali Donovan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PARLU Inc
Original Assignee
PARLU Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PARLU Inc filed Critical PARLU Inc
Priority to US12/651,447 priority Critical patent/US8209338B2/en
Publication of US20100174724A1 publication Critical patent/US20100174724A1/en
Application granted granted Critical
Publication of US8209338B2 publication Critical patent/US8209338B2/en
Assigned to PARLU, INC. reassignment PARLU, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KLAMKIN, MARILYNN, WALLACE, DAVID ROBERT, DONOVAN, KALI
Expired - Fee Related legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/282Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes

Definitions

  • the present invention relates generally to social networking. More specifically, the invention relates to methods for creating a database of the interests of users, so that users may search for others with the same or similar interests.
  • these types of sites are typically set up such that people must establish a connection with one another before they may view each other's information, so that communication and sharing information is generally limited to being between people who already know each other.
  • these sites allow people to find out about the interests of people they already know, they do not allow people to find others based solely upon their common interests.
  • a number of dating sites allow people to indicate interests, and provide various ways of matching up strangers.
  • these sites expressly include demographic data (sex, age, race, marital status, education, employment, etc.) and physical attributes as part of their matching algorithms.
  • the scope and type of matching is limited, for example allowing a user only to designate one or more of a small number of very broad categories; for example, Yahoo Personals lists such categories as arts, family, travel, cooking, outdoor activities, playing sports, etc.
  • these sites are typically not specific enough to be well suited to finding people of common specific interests, even ignoring the potential presumptions of romantic interest by their users.
  • the present invention allows a form of social networking in which people interested in certain subjects may easily and quickly find others who have the same or similar interests.
  • each person desiring to specify one or more areas of interest to them may have those interests entered in a database.
  • a person may then search the database to find others who have indicated in the database that they have similar interests to the searcher. This allows people who otherwise do not know each other but share one or more of the same interests to find and communicate with one another.
  • each area of interest is represented by a list of “tags” or keywords, called a taglist, which describes the area of interest to the user.
  • Multiple interests are represented by multiple taglists, one taglist for each area of interest.
  • the taglists for multiple users are stored in a taglist database.
  • users may enter a search taglist indicating an area of interest in which they wish to find other users with similar interests.
  • the system will compare the search taglist selected by the user with the other taglists stored in the database and return a list of the closest matching users, i.e., those with the closest matching taglists.
  • some or all of the users who are the closest matches to a given search taglist can create a common interest group.
  • Those users in the interest group who are online at the same time can communicate, via IM, voice, video connection or other communication means; such communications among persons who are online are known in the art.
  • Those users in the interest group who are not online at the time can be invited to join discussions at other times via an email, voicemail, forum or chat room posting or other message.
  • Interest areas are not limited to hobbies, but may also include such things as support groups, shopping co-ops, student or parent groups, groups of traveling companions, among many examples.
  • a search taglist may be used to help locate persons in the database having a desired expertise by focusing the matching search on an appropriately narrow area.
  • a document, file or project in many cases may be represented by a single taglist, although it may require more than one taglist to adequately cover its contents.
  • taglists enable a user to easily and greatly refine an interest area or a search by specifying interest areas that range from very broad to very narrow. If a matching search for a given taglist does not return a suitable set of people, then the taglist defining the interest area can be adjusted and the matching search can be repeated until a group of people that is satisfactory to the searching user is found. At that point the matching group may be turned into a formal interest group if desired.
  • a method of adding a new keyword list to a hierarchical database containing a plurality of stored keyword lists, the database containing nodes corresponding to the keywords of the stored lists, by inserting the new list at the point of its closest match to an existing list, comprises comparing the first keyword of the new list to the nodes at a first level of the database to determine whether the first keyword is close enough to an existing node at the first level for the new list to be placed within the existing node; if the first keyword is close enough to fall within an existing node at the first level, repeating the comparison for the next keyword to the nodes at the level which is directly under the existing node; repeating the comparison for each subsequent keyword at each subsequent level until a keyword is not close enough to be placed within an existing node; and when a keyword is not close enough to fall within an existing node at the level at which it is compared, creating a new node corresponding to the keyword directly under the last node for which a keyword was close enough to be placed within the node.
  • a method of adding a new weighted keyword list to a hierarchical database containing a plurality of stored weighted keyword lists, the database containing nodes corresponding to the weighted keywords of the stored lists, by inserting the new list at the point of its closest match to an existing list, comprises representing each weighted keyword list as a vector; comparing the first element of the new keyword vector to the nodes at a first level of the database to determine whether the first element is close enough to an existing node at the first level for the new list to be placed within the existing node; if the first element is close enough to fall within an existing node at the first level, repeating the comparison for the next element to the nodes at the level which is directly under the existing node; repeating the comparison for each subsequent element at each subsequent level until an element of the new keyword vector is not close enough to be placed within an existing node; and, when an element of the new keyword vector is not close enough to fall within an existing node at the level at which it is compared, creating a new node
  • a method of searching a hierarchical database of keyword lists for lists matching a search keyword list comprises comparing the first keyword of the search list to the nodes at a first level of the database to determine whether any existing nodes match the first keyword; if one or more nodes match the first keyword, repeating the comparison for the next keyword to the nodes at the level which is directly under the nodes matching the first keyword; repeating the comparison for each subsequent keyword at each subsequent level until the nodes closest to the search list are located; and returning a list of the located nodes.
  • a method of searching a hierarchical database of weighted keyword lists for lists matching a search weighted-keyword list, the database containing nodes corresponding to the weighted keywords of the stored lists comprises comparing the first weighted keyword of the search list to the nodes at a first level of the database to determine whether any existing nodes match the first weighted keyword; if one or more nodes match the first weighted keyword, repeating the comparison for the next weighted keyword to the nodes at the level which is directly under the nodes which match the first keyword; repeating the comparison for each subsequent weighted keyword at each subsequent level until the nodes closest to the search list are located; and returning a list of the located nodes.
  • FIG. 1 illustrates a data tree according to the present invention.
  • FIG. 2 is a flowchart of the method of inserting a taglist into the data tree in one embodiment of the invention.
  • FIG. 3 is a diagram showing how the distance between vectors is related to the scope of a search of the data tree in one embodiment of the invention.
  • people may indicate one or more areas of interest to them and have those interests entered into a database so that they or others may locate people having interests similar to their own.
  • a user of a system or method according to the present invention specifies an area in which they are interested by means of a taglist, which is a list of keywords or attributes that describes the area of interest. For example, a person interested in collecting Barbie dolls might designate this interest by the taglist (collecting, dolls, Barbie), while one interested in the use of catalysts in chemical reactions might use the taglist (chemistry, reactions, catalysts) to designate such area.
  • Interest areas can be broad or narrow. In general, a more narrow interest area will have many attributes, while a broad interest area will have relatively fewer attributes; for example, one interested in collecting dolls in general may omit the attribute “Barbie” from the taglist and use the list (collecting, dolls).
  • the taglists generated by users who wish to indicate their interests to others are stored in a taglist database.
  • the taglist database consists of nodes, where each node has internal fields to hold an associated taglist vector as well as fields used to link nodes together to form the tree-like database structure.
  • a computer processor or system with access to the database will search the database by comparing the new taglist to those already in the database, and will store the new taglist near others having identical or similar keywords by determining the “closest” fit as explained herein.
  • New taglists may be entered directly into the computer or system where the database is located, or may be added through client websites which accept the taglists and forward them to a database server.
  • search taglist may be the same as a taglist submitted by a user for entry in the database, or may be different, and need not be submitted by a person who has entered a taglist into the database.
  • search taglists may be entered directly into the database system, or may be entered into a client website and then forwarded to the database server, and the results returned to the client website. For example, a user seeking others with similar interests in books may be able to enter a search taglist through a bookseller's website.
  • When a search taglist is entered, the system will, much as when determining where to locate a new taglist, search the database by comparing the search taglist with the other taglists stored in the taglist database and return a list of other users who have the “closest” matching taglists. Where appropriate, the system can provide a distance measure between any two taglists, allowing a consistent comparison of the “closeness” of the interest areas represented by each taglist to the search taglist.
  • the specificity, or lack thereof, of the search taglist will of course impact both the number and specificity of results.
  • a general keyword such as “health” is likely to generate a very large number of people who have input that tag as part of their own taglists and is thus unlikely to be useful or acceptable to the user.
  • the user may add additional tags such as “sports injuries” to narrow the results.
  • a further tag such as “runners' injuries” will narrow the search still further and may provide a list of people who more closely match the user's area of interest.
  • the user may add still further tags, such as perhaps “New York City,” and may or may not see further refinement of the results that is useful.
  • a user may keep adding new tags as they please in order to find the most useful match for their purposes.
  • a user accessing the bookseller's website may start with a search for the taglist “mystery novels.”
  • a list of people in the database who have indicated such an interest will be returned. If the list is too long, or the user wishes to be more specific, more search tags may be entered to tailor the search to the user's specific interest, for example adding further tags such as “British” and “19th Century.”
  • a user is more likely to find people who closely match the user's area of interest, in this case 19th century British mystery novels. Once such people are identified, they may be notified of their common interest and may begin to communicate with one another immediately through the internet, mobile devices, etc.
  • users may be permitted to select their own taglists with no limitation on the words entered; however, as discussed below it may be more efficient to provide some kind of guide to users either describing the existing contents of the database or guiding the user to a list of words already in the database, so that they may select taglists that will more uniformly either fit into the database or be used to search the database.
  • the search may return all users with an identical interest, and/or may be structured to return users within a determined closeness. In some embodiments, the search will return all matching users. This allows people who otherwise do not know each other but share one or more of the same interests to find and communicate with one another.
  • One possible method of conducting a search and finding one or more matches to a search taglist from a user is to compare the search taglist to all taglists in the database.
  • using such a brute force method to compare the search taglist with each and every taglist in the database is very inefficient.
  • the taglist database is organized around neighborhoods so that the search looks at only a smaller and more relevant part of the database. This is accomplished by using a tree structure, similar to how data is stored in a binary tree, but with more than two branches (or subtrees) allowed at each level or node. While binary tree databases only work for totally ordered binary sets of items, the present invention uses a database organization that is more like a general hierarchy.
  • FIG. 1 illustrates this difference; a binary tree is composed of a sequence of nodes each having only two branches, while the data tree of FIG. 1 may contain more than two branches at each node where appropriate.
  • a node for collecting, and more specifically nodes for collecting dolls and still more specifically Barbie dolls, as mentioned above.
  • branches for collecting other dolls such as G.I. Joe dolls, and for collecting other items such as stamps, coins, books, etc.
  • the topmost (root) level would represent the world, the next level down countries, then states or provinces, then counties, then cities, towns, blocks etc., until a specific address is reached.
  • Searching the taglist database for matches to a particular search taglist, or for the best location to enter a new taglist, involves selecting a node or neighborhood to search at each level of the tree. If the database is ordered as described, with the most general nodes at the top, and the taglists are similarly arranged with the more general keywords first, then a search will first compare the first tag or keyword to the highest level nodes to see if there is a match. If a match is found, the search will move to the next keyword, and to the branches or nodes underneath the matching one, and again seek a match. Thus, in general, at each level the system will try to select a more local neighborhood to zero in on the closest matches.
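The level-by-level descent described above can be sketched as a simple keyword tree. This is only an illustration under simplifying assumptions: nodes match a keyword exactly rather than by a closeness test, and the names `Node`, `descend` and `insert` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One keyword in the tree; children are keyed by their keyword."""
    keyword: str
    children: dict = field(default_factory=dict)


def descend(root, taglist):
    """Walk down from the root, one keyword per level.

    Returns the deepest node reached and how many keywords matched.
    Assumption: a keyword matches only an identical child keyword.
    """
    node, matched = root, 0
    for tag in taglist:
        child = node.children.get(tag)
        if child is None:
            break
        node, matched = child, matched + 1
    return node, matched


def insert(root, taglist):
    """Insert a taglist under its closest existing match."""
    node, matched = descend(root, taglist)
    for tag in taglist[matched:]:
        node.children[tag] = Node(tag)
        node = node.children[tag]
    return node


root = Node("root")
insert(root, ["collecting", "dolls", "Barbie"])
insert(root, ["collecting", "stamps"])
node, matched = descend(root, ["collecting", "dolls", "G.I. Joe"])
print(node.keyword, matched)
```

In this sketch a search for (collecting, dolls, G.I. Joe) stops at the "dolls" node, which is the most local neighborhood the tree contains for that taglist.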
  • items that are deemed close enough to the search taglist to be returned as results of the search can be compared with the search taglist and the items ranked in order of closeness. If desired, only the closest items may be retained on a list of users and their interests that match the search taglist. For example, as discussed in more detail below, a distance constraint may be predetermined by the user that must be met for an item to be retained on the list, so that, for example, any taglists closer than the specified distance are put on the return list while those farther away than that distance are discarded from the list.
  • the search will need to be redone with a larger distance constraint if results, i.e., the identity of other users, are desired.
  • a list-size constraint may be implemented that will limit the list returned to a predetermined number of the closest items.
  • the system will rank the taglists in the database by how closely they match the search taglist and return a list of that predetermined size containing the users who most closely match the search taglist.
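The two constraints just described, a maximum distance and a maximum list size, can be sketched as follows. The sketch assumes taglists are already represented as equal-length numeric vectors and uses Euclidean distance; the function names, the `stored` mapping, and the sample users are hypothetical.

```python
import math


def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def closest_matches(search, stored, max_distance=None, max_results=None):
    """Rank stored taglist vectors by closeness to the search vector.

    stored maps a user name to that user's taglist vector.
    max_distance is the distance constraint; max_results is the
    list-size constraint. Either may be omitted.
    """
    ranked = sorted((distance(search, vec), user) for user, vec in stored.items())
    if max_distance is not None:
        ranked = [(d, user) for d, user in ranked if d <= max_distance]
    if max_results is not None:
        ranked = ranked[:max_results]
    return ranked


stored = {
    "ann": [1, 1, 1, 0],
    "bob": [1, 1, 0, 0],
    "cat": [0, 0, 0, 1],
}
search = [1, 1, 1, 0]
print(closest_matches(search, stored, max_distance=1.5))
print(closest_matches(search, stored, max_results=1))
```

If the distance constraint returns an empty list, the caller can simply repeat the call with a larger `max_distance`.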
  • the system may provide an interface which allows a user to either select pre-defined keywords for a taglist, for example from a pull-down menu or a displayed list that contains the keywords already in the database, or, if the user cannot find a keyword that represents the user's interest, to enter a new one.
  • an entered or search taglist may be made more uniform and efficient by having the system reorder attributes or add related attributes in an attempt to have the taglist more closely conform to the database structure.
  • attributes may be reordered so that storage in the database is independent of the order in which a user has entered attributes; for example, the taglists (Barbie, dolls, collecting) or (collecting, Barbie, dolls) should be either located in the same place (for new taglists), or the database searched in the same way (for search taglists), as the taglist (collecting, dolls, Barbie.)
  • attributes may be added to make a taglist conform to the structure of the database without changing the interest represented, i.e., if a taglist contains “camera,” then matching will generally be easier and more uniform if the more general category “photography” is added by the system based upon the existing structure of the database before either storing or matching is done.
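A minimal sketch of this reordering and enrichment, assuming the system keeps a table that maps each known tag to its more general parent. The `PARENT` table, its contents, and the function names are illustrative assumptions, and the table is assumed to contain no cycles.

```python
# Hypothetical table mapping a tag to its more general parent category.
PARENT = {
    "Barbie": "dolls",
    "dolls": "collecting",
    "camera": "photography",
}


def depth(tag):
    """Number of ancestors of a tag; general tags have depth 0."""
    d = 0
    while tag in PARENT:
        tag = PARENT[tag]
        d += 1
    return d


def normalize(taglist):
    """Add missing general categories and order tags general-first."""
    tags = set(taglist)
    for tag in taglist:
        while tag in PARENT:
            tag = PARENT[tag]
            tags.add(tag)
    # Ties between tags of equal depth are broken alphabetically.
    return tuple(sorted(tags, key=lambda t: (depth(t), t)))


print(normalize(["Barbie", "dolls", "collecting"]))
print(normalize(["collecting", "Barbie", "dolls"]))
print(normalize(["camera"]))
```

With this normalization, every ordering of the same three tags yields (collecting, dolls, Barbie), and a taglist containing only "camera" gains the more general "photography".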
  • Such other data attributes may include information about users, such as their names, addresses, contact information, etc., or identification of documents, media files (images, audio or video files), projects or other objects which may also have their associated “interest” or subject areas represented by taglists. This may be easily accomplished by adding fields to the relational database which correspond to tags or to nodes in the database, which may then be searched as a virtual tree. Such techniques are known to those of skill in the art. In such cases it is possible to apply a query that returns any database items that contain a taglist, whether they are user records or other files such as those described above.
  • a user may indicate an interest area via a taglist.
  • the area of a document's or other file's “interest area,” i.e., its subject may also be characterized by a taglist.
  • tags, or keywords will be found within the document itself, although this is not strictly required.
  • the keywords may, for example, be assigned by a user characterizing the document.
  • the interest area of a document or other file may be specified only by a taglist containing a set of keywords used once each, similar to the description of a user's interest area as above.
  • a document or other file may use a keyword multiple times, in which case it is possible to count the number of times that each keyword is used, thus providing a count value for each keyword which indicates the relative importance of each keyword.
  • the representation of an interest type of a document or other file may be a keyword vector in which each keyword has a vector position and a count indicating the relative importance of the keyword. For certain types of files this may be easy to accomplish by the use of automated programs such as those for text or voice recognition, which can count the number of times each word is used.
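As a rough illustration of such automated counting, the sketch below builds a keyword count vector from plain text. It assumes single-word, lower-case keywords and a simple word pattern; the function name and the sample text are hypothetical.

```python
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Return one count per keyword, in the order the keywords are given."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return [words[k] for k in keywords]


text = (
    "Catalysts speed reactions. Some reactions need no catalysts; "
    "chemistry studies reactions."
)
vector = keyword_counts(text, ["chemistry", "reactions", "catalysts"])
print(vector)
```

Here each position in the resulting vector corresponds to one keyword, and the count at that position indicates the keyword's relative importance in the text.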
  • a taglist represents a vector, and each entry in the database may be represented by the vector that arrives at that entry, with the nodes of the database representing the components of the vectors.
  • Nodes may also be considered shortened vectors, i.e., each node represents the vector of tags that leads to that node.
  • the tags are represented to the user as strings, but internally they are given a numeric index corresponding to a position in a vector.
  • the vector positions must be consistent internally even when tags may be introduced in different orders by different users, or in different versions of a system.
  • There is thus a tag database that provides translations between tag positions and tag text labels, as well as translation functions to make different databases consistent.
  • the length of the vectors will correspond to the number of keywords used in the whole database. When new keywords are introduced, the vector length will be extended (for all vectors) to provide slots for each new keyword. Each keyword gets a unique index into any vector. Managing the changing sizes of vectors is known to those of skill in the art.
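One way to picture the tag database and the growing vectors is the sketch below. The class `TagIndex` and its methods are hypothetical names; the sketch assumes 0/1 vectors and pads older, shorter vectors with zeros when new keywords have been added.

```python
class TagIndex:
    """Translates between tag text labels and vector positions."""

    def __init__(self):
        self._position = {}  # tag label -> vector position
        self._labels = []    # vector position -> tag label

    def index_of(self, tag):
        """Return the tag's position, assigning a new slot if needed."""
        if tag not in self._position:
            self._position[tag] = len(self._labels)
            self._labels.append(tag)
        return self._position[tag]

    def label_of(self, position):
        return self._labels[position]

    def to_vector(self, taglist):
        """Build a 0/1 vector as long as the current vocabulary."""
        positions = [self.index_of(tag) for tag in taglist]
        vector = [0] * len(self._labels)
        for p in positions:
            vector[p] = 1
        return vector

    def pad(self, vector):
        """Extend an older vector with zeros for newly added keywords."""
        return vector + [0] * (len(self._labels) - len(vector))


index = TagIndex()
v1 = index.to_vector(["collecting", "dolls", "Barbie"])
v2 = index.to_vector(["collecting", "stamps"])
print(index.pad(v1), v2)
```

Because positions are assigned once and never reused, a tag keeps the same index no matter in which order different users introduce it.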
  • the “norm” of a vector x is defined as the square root of x·x, represented as ‖x‖.
  • the distance between two vectors, x and y, would thus be represented as ‖x − y‖.
  • the count of each keyword in a vector is limited to 0 or 1, and the vector corresponds to a set of keywords as above.
  • the closeness value defined above may still be calculated, or some alternative distance or closeness measures may be used.
  • one alternative measure could be the cardinality of the symmetric set difference, i.e., card(A ∪ B − A ∩ B), or the cardinality of the intersection, card(A ∩ B).
  • the former measures how the sets are different while the latter measures how they are the same. Note that the intersection measure is reversed in that closer vectors or sets have a higher value.
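The measures above can be written out in a few lines. This is a sketch with hypothetical function names; it also shows that, for vectors whose counts are limited to 0 or 1, the squared Euclidean distance equals the cardinality of the symmetric set difference.

```python
import math


def norm(x):
    """Square root of the dot product of x with itself."""
    return math.sqrt(sum(a * a for a in x))


def distance(x, y):
    """Norm of the difference of two equal-length vectors."""
    return norm([a - b for a, b in zip(x, y)])


def symmetric_difference_size(a, b):
    """Number of keywords in exactly one of the two sets."""
    return len(set(a) ^ set(b))


def intersection_size(a, b):
    """Number of keywords shared by both sets (higher means closer)."""
    return len(set(a) & set(b))


A = {"collecting", "dolls", "Barbie"}
B = {"collecting", "dolls"}
x = [1, 1, 1]  # A as a 0/1 vector over (collecting, dolls, Barbie)
y = [1, 1, 0]  # B over the same positions

print(distance(x, y), symmetric_difference_size(A, B), intersection_size(A, B))
```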
  • the taglist database consists of a tree of keyword (or taglist) vectors, where each node is a component of a vector and there are potentially many children of a node, each another component of a vector. Each node in the tree has tree navigation values, such as parent and children, as well as the component value.
  • a user-generated taglist is thus a vector which is placed in the tree if the same vector is not already present, at least initially as a leaf node as described below.
  • a search taglist is a vector for which the closest matches are sought.
  • a closeness level value, C(level), may be specified for each level of the taglist database tree, which indicates how closely nodes may be located relative to one another; if two vectors representing nodes at a given level would be closer than the closeness level value, then they should be represented by the same node.
  • LAF: level adjustment function
  • When a user wishes to indicate an interest area for inclusion in the database, the user provides a new taglist that is to be entered into the database. In this case, it should be determined where the new taglist is to be added to the database. In order to determine where to add the new taglist, it is determined at each level of the database, i.e., for each tag or component of the vector, whether the tag at that level is close enough to an existing node to fall within that existing node, and for any remaining tags in the taglist to be children nodes, or whether the tag is farther away from any existing nodes than the closeness level value C(level) at that level of the database and should result in the creation of a new node. (The situation in which the taglist is identical to a taglist already in the database is discussed below, but is very similar to this.)
  • FIG. 2 describes the insertion of a taglist, or more precisely a node S which corresponds to the vector represented by the taglist, into the database.
  • At step 201, the database system sets a node level value N to be the root level of the database.
  • the distance between two vectors x and y is represented as ‖x − y‖. Since nodes may also be represented by vectors as above, the distance measure between two nodes is the distance between the corresponding taglist vectors for those nodes.
  • At step 203, the system compares C(level) at this level with the minimum value of M selected. If the minimum value of M is greater than C(level), then S is not considered to be close to any of the children C of root level N. In this case, the system proceeds to step 204 where it adds node S as a child of N and the process ends.
  • At step 205, the system selects the child node C at this level that was used to calculate the minimum M, i.e., that is within C(level) of S; the selected child node C is now designated as node K.
  • Next, at step 206, there is a test whether S is a subset of K. If S is not a subset of K, the system proceeds to step 208. If S is a subset of K, at step 207 the system reverses the tag vector values of the nodes S and K, making the smaller tag-vector set come first in the tree structure. The system then proceeds to step 208.
  • At step 208, the system checks whether node K is a leaf node, i.e., a node having no children; if node K is a leaf node, the process proceeds to step 210 where it inserts node S just below node K and then ends. At this point, node K is no longer a leaf node, as it now has a child node S.
  • If node K is not a leaf node, the system continues the recursive process by setting node K to the new node N and returning to step 202.
  • the process is now repeated at the next level, now looking at the children of node K, and comparing the next item in the taglist vector as the new S which is either to be included in one of the children nodes of node K or inserted as a new child node itself.
  • the process repeats until a new node is inserted in the database, either because the next keyword in the taglist is not close enough to any of the existing nodes at that level or because every keyword in the taglist has been compared to a level in the database, and the appropriate location has been found to enter the new taglist.
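The insertion steps of FIG. 2 can be approximated by the sketch below. Several points are assumptions rather than statements from the text: taglists are modeled as sets of tags with distance equal to the square root of the symmetric difference size, step 202 (not described above) is taken to compute the distance M from S to each child of N, an empty root simply receives S as its first child, and a taglist identical to an existing node is not inserted again. The names `Node`, `insert` and `closeness` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    tags: frozenset
    children: list = field(default_factory=list)


def distance(a, b):
    """Euclidean distance between two 0/1 tag vectors held as sets."""
    return math.sqrt(len(a ^ b))


def insert(root, taglist, closeness):
    """Insert a taglist; closeness(level) plays the role of C(level)."""
    s = Node(frozenset(taglist))
    n, level = root, 0                           # step 201
    while True:
        if not n.children:                       # assumption: empty root
            n.children.append(s)
            return s
        # Step 202 (assumed): find the child with the minimum distance M.
        k = min(n.children, key=lambda c: distance(s.tags, c.tags))
        m = distance(s.tags, k.tags)
        if m > closeness(level):                 # step 203
            n.children.append(s)                 # step 204
            return s
        if s.tags == k.tags:                     # assumption: already present
            return k
        if s.tags < k.tags:                      # steps 206 and 207
            s.tags, k.tags = k.tags, s.tags      # smaller set goes higher
        if not k.children:                       # step 208
            k.children.append(s)                 # step 210
            return s
        n, level = k, level + 1                  # repeat one level down


root = Node(frozenset())
threshold = lambda level: 1.5                    # hypothetical constant C(level)
insert(root, ["collecting", "dolls"], threshold)
insert(root, ["collecting", "dolls", "Barbie"], threshold)
insert(root, ["chemistry", "reactions"], threshold)
insert(root, ["collecting"], threshold)
```

After these four insertions the tree holds the chain (collecting), (collecting, dolls), (collecting, dolls, Barbie) under one branch of the root and (chemistry, reactions) under another; the subset swap is what moves the shorter taglist above the longer ones.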
  • the new node may contain information about the user who submitted the taglist. If the new node is simply user information, since a user may submit multiple taglists to represent multiple interests, it will generally be more efficient to enter a user's information only once rather than creating multiple nodes each containing the repetitive information for the same user. Instead of entering the actual user information in the database, it may be more efficient to include in the new node a pointer to a memory location where the user's information may be found rather than a full copy of the information.
  • a user database is described below.
  • the user may submit a search taglist representing the interest for which matches are sought, which results in a vector S.
  • the user may also specify a closeness value d, or, in some embodiments, the system may use a predefined default value for d.
  • the database system uses the following procedure to perform the search:
  • the system does a left-to-right recursive walk down the database tree, at each level selecting those nodes that are within the specified closeness distance d to S.
  • the walk continues to search down a subtree if the node N that is at the root of the subtree is within d+e of S, where e is the closeness level value C(level) at this level of the tree.
  • Each node reached during the walk is entered into a list.
  • the matching values to be returned are those in the list at the end of the search down the selected branches of the tree.
  • Adding the closeness value e at each level to the specified closeness level d of the search when choosing which subtrees to search down ensures that the system collects all of the relevant matches within available subtrees. If during the insertion process each child node C of a subtree has been placed within the closeness distance e of the root node N of that subtree at that level, then that subtree should be visited for matching if S is within distance d+e of the root node N.
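A sketch of this walk follows, under the same modeling assumptions as a set-based tree: each node holds a set of tags, distance is the square root of the symmetric difference size, and `closeness(level)` is a hypothetical stand-in for C(level). In this sketch only nodes within d of S are collected, while subtrees are entered whenever their root is within d+e.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    tags: frozenset
    children: list = field(default_factory=list)


def distance(a, b):
    """Euclidean distance between two 0/1 tag vectors held as sets."""
    return math.sqrt(len(a ^ b))


def search(node, s, d, closeness, level=0, found=None):
    """Left-to-right recursive walk collecting nodes within d of s."""
    if found is None:
        found = []
    for child in node.children:
        dist = distance(s, child.tags)
        if dist <= d:
            found.append(child)
        # Descend if the subtree root is within d + e of the search vector.
        if dist <= d + closeness(level):
            search(child, s, d, closeness, level + 1, found)
    return found


barbie = Node(frozenset({"collecting", "dolls", "Barbie"}))
dolls = Node(frozenset({"collecting", "dolls"}), [barbie])
collecting = Node(frozenset({"collecting"}), [dolls])
chemistry = Node(frozenset({"chemistry", "reactions"}))
root = Node(frozenset(), [collecting, chemistry])

s = frozenset({"collecting", "dolls", "Barbie"})
hits = search(root, s, d=1.0, closeness=lambda level: 1.5)
print([sorted(h.tags) for h in hits])
```

Note that the (collecting) node is farther than d from S, yet its subtree is still visited because it lies within d+e, which is how the closer nodes beneath it are found.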
  • A simplified diagram of this principle is shown in FIG. 3.
  • the desired keyword, i.e., vector, S is a distance D from root node N, and not within the specified closeness distance d under the measure described above, and thus it appears that the subtree of node N should not be searched.
  • each child node C of root node N is within a closeness distance e of root node N. If the distance e to a particular child node C is in the “direction of” S, i.e., it reduces the distance between the vectors represented by node C and root node N, then the distance between that child node C and S may be less than D as illustrated in FIG.
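The reason the margin e is sufficient for the children of N follows from the triangle inequality, which the norm-based distance satisfies. As a short worked derivation, using D for the distance from S to N and assuming every child C lies within e of N:

```latex
\begin{align*}
D = \lVert S - N \rVert &\le \lVert S - C \rVert + \lVert C - N \rVert
  && \text{(triangle inequality)} \\
\lVert S - C \rVert &\ge D - \lVert C - N \rVert \ge D - e
  && \text{(since } \lVert C - N \rVert \le e \text{)} \\
D > d + e \;&\Longrightarrow\; \lVert S - C \rVert > d
  && \text{(no child of } N \text{ can match)}
\end{align*}
```

Conversely, when D is at most d+e, a child lying in the direction of S may fall within d of S, so the subtree of N has to be examined.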
  • each leaf node that identifies a user represents a taglist or vector of the indicated interest area, and in searching the goal is to match the search taglist S and identify users (or documents, projects, etc.) with interests matching S.
  • the identity of the users (or documents, projects, etc.) may be located in a separate user database, and each leaf node may contain a pointer to the user to whom the represented taglist belongs.
  • any matching taglist or vector found in a taglist database may be used to identify the associated user by using a pointer to the user database. (Again, in a relational database, these functions may be combined.)
  • any convenient order of tags or keywords may be selected, although in some embodiments the system may either encourage users to select tags in a particular order or from a predefined list, while in other embodiments the system may attempt to place the tags in an appropriate order for a search to be properly conducted.
  • a separate function may be provided for determining which users are associated with a particular taglist vector.
  • this is implemented as a binary tree of taglist vectors where the tags are ordered alphabetically. With the tags ordered alphabetically, the taglist vectors are then ordered lexicographically. In this case each taglist is treated like a letter, since the taglists can be given an alphabetic ordering.
  • This enables a binary tree where the nodes are taglist vectors. Such a binary tree allows the fast lookup of data associated with a taglist vector.
  • the data associated with a taglist vector in this case is a list of users that have selected that taglist vector as representing an interest area.
  • the taglist vector (afghan, antique) comes before (afghan, spread) and (antique, doll) for the purpose of placement in this binary tree.
  • Each taglist vector in this tree will be associated with a list of users who have registered that taglist by virtue of a leaf node at the end of the vector containing user information or a pointer to such. This provides a separation of function so the taglist database only needs to deal with taglists, not users.
  • This tree for tracking user tag vectors may be referred to as a user database.
  • a standard binary-tree search is used to find the tag vector t in the user database tree or to add the tag vector t if it is not already present.
  • the user name u is then added to the user name list at that tag vector node, again possibly by means of a pointer to a memory location containing the user information.
  • the taglist vector node associated with t is located in the user database, and then the user name u (or pointer) is removed from the user list at that node. If that action leaves the user list at that node empty, then the system may optionally also remove the taglist vector t from the user database, and may even remove the corresponding taglist vector from the taglist database.
  • the taglist database may also contain interest taglist vectors associated with documents, projects and other files, as well as those taglists entered by individuals.
  • tools are available to scan such files and locate the keywords that may be used to create a taglist vector. Even if such tools are not available or desirable for some reason, taglist vectors may be created and assigned manually for documents, projects or files, as well as individuals.
  • the taglist database may be used to find those individuals who appear to be interested in the new item. To do this, the new taglist vector may be used to find close taglist vectors in the taglist database, and in turn the individuals associated with the close taglist vectors. These individuals may then be notified of the new item, whether it is a new document, file, project or user.
  • that item's taglist vector may again be used to find those who would be interested based on the closeness of the taglist vector of the item and the taglist vectors of the users. Those who are likely to be interested in the change to the item may then be notified.
  • the taglist database may be used to identify individuals with a particular interest, and in some instances the search for such persons may be for the purpose of starting an interest group devoted to that interest.
  • a person who organizes and/or runs an interest group is commonly termed a moderator.
  • a moderator may or may not be the person who has defined the subject area for the group, and a single group may have more than one moderator. Moderators often have the ability to remove people from their group and/or to act as a gatekeeper for those seeking entry into the group.
  • An interest group may have various rules or permissions settings. Typically such permissions are under the control of the moderator or moderators for the interest group, and may differ for moderators and other members. There may be settings to control the sending or receiving of messages, for example, so that only moderators can send messages to the entire group while other members may only send messages to other individual members. There may be rules determining whether someone desiring to join a group can do so automatically, possibly after registering, or whether joining requires the approval of a moderator.
  • Rules or permissions may be set to obtain a desired effect; for example, if anyone can join a group automatically, and if moderators can send messages to the entire group, then anyone joining the interest group would be automatically signing up to receive any and all messages sent by the moderator or moderators of the interest group.
  • the moderators for a particular website or organization may also be given a special search visibility, such that a user may ask that moderators be the most highly ranked results, or even the only results, to a search.
  • a visibility may allow a visitor to the website to search the taglist database for the moderator with the best interest fit, rather than requiring the user to search through all of the users having the combined interests of each interest group. For instance, a user might be able to more quickly identify a group oriented around an interest of the user by identifying the moderator than by searching for other users with the same interest.
  • a search visibility might allow visitors to a real estate website to locate the best agent or agents that best match their needs.
  • visitors to a website with multiple blogs may enter a taglist to more quickly find those blogs that best match their interests by searching for the bloggers whose taglists most closely match the entered taglist.
  • a central server website (known as the Hello-Central website in a commercial embodiment) may be provided which allows interest groups to register with the central website, and allows users to log in and create taglist vectors to find those interest groups as they would find other users with similar interests.
  • the taglist vectors created at the interest group client websites may be migrated to the central server website database, but with only the client website identification provided instead of the identification of the actual users from the client website.
  • a user on the central website does an interest group search that matches taglist vectors from a client website, the user can be transferred to the client website where the interest group users may be available for contact or discussion.
  • traffic on the central website is transferred to the client websites if there are matching interest groups on the client websites.
  • users on the central website may be permitted to create an interest group there without transferring to a client website.
  • the present invention can be implemented in numerous ways, including as a process, an apparatus, or a system.
  • the methods described herein may be implemented by program instructions for instructing a processor to perform such methods, and such instructions recorded on a computer readable storage medium such as a hard disk drive, floppy disk, optical disc such as a compact disc (CD) or digital versatile disc (DVD), flash memory, etc., or a computer network wherein the program instructions are sent over optical or electronic communication links.
  • the order of the steps of the methods described herein may be altered and still be within the scope of the invention.
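The pruned walk described in the list above (collect nodes within d of S, and descend into a subtree when its root is within d+e of S) can be sketched as follows. This is a minimal illustration and not the patented implementation: the Euclidean distance, the `Node` class, the `search` function and the `closeness` list of per-level values are all assumptions introduced for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    """Hypothetical tree node holding a taglist vector and its child nodes."""
    def __init__(self, vector, children=()):
        self.vector = vector
        self.children = list(children)

def search(node, s, d, closeness, level=0, found=None):
    """Collect nodes within d of s; descend into a subtree only when its
    root is within d + e of s, where e is the closeness value for this level."""
    if found is None:
        found = []
    gap = dist(node.vector, s)
    if gap <= d:
        found.append(node)
    # e is taken as 0 below the last configured level (an assumption).
    e = closeness[level] if level < len(closeness) else 0.0
    if gap <= d + e:
        for child in node.children:  # left-to-right walk
            search(child, s, d, closeness, level + 1, found)
    return found

# Root N is farther than d from S, but one child lies "in the direction of" S.
c1 = Node((1.0, 0.0))
c2 = Node((-1.0, 0.0))
n = Node((0.0, 0.0), [c1, c2])
s = (2.0, 0.0)
hits = search(n, s, d=1.5, closeness=[1.0, 0.5])
```

Here N is at distance 2 from S, which exceeds d = 1.5, yet the subtree is still visited because 2 is within d + e = 2.5, and the child at distance 1 is collected. With e = 0 the subtree would be skipped and that match would be missed.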

Abstract

A method of social networking is disclosed in which users may indicate areas of interest to them and/or search for other users with the same or similar interests. Users may specify one or more areas of interest to them and enter those interests in a database, by means of a list of “tags” or keywords, called a taglist. The values of the tags may be weighted. A user wishing to find other users with similar interests may input a search taglist which is compared with the other taglists stored in the database, and a list of the users with the closest matching taglists is returned. The method may also be used to characterize documents, projects, media files or other data objects, so that such items may be searched in similar fashion.

Description

The present application claims the benefit of U.S. Provisional Patent Application No. 61/204,556, filed on Jan. 8, 2009.
FIELD OF THE INVENTION
The present invention relates generally to social networking. More specifically, the invention relates to methods for creating a database of the interests of users, so that users may search for others with the same or similar interests.
BACKGROUND OF THE INVENTION
In recent years social networking has become very popular as a way of allowing people to connect and communicate with each other, and share both information about themselves and digital content through the internet and over great distances. Sites such as Facebook and MySpace allow users to post information about their interests and activities for others to read, and to read the information that others have posted. Thus, users are able to provide information to a large number of people by posting only once, rather than having to communicate directly with each other person with whom it is desired to share information.
However, these types of sites are typically set up such that people must establish a connection with one another before they may view each other's information, so that communication and sharing information is generally limited to being between people who already know each other. Thus, while these sites allow people to find out about the interests of people they already know, they do not allow people to find others based solely upon their common interests.
A number of dating sites allow people to indicate interests, and provide various ways of matching up strangers. However, these sites expressly include demographic data (sex, age, race, marital status, education, employment, etc.) and physical attributes as part of their matching algorithms. In addition to this, it is believed that the scope and type of matching is limited, for example allowing a user only to designate one or more of a small number of very broad categories; for example, Yahoo Personals lists such categories as arts, family, travel, cooking, outdoor activities, playing sports, etc. As a result, these sites are typically not specific enough to be well suited to finding people of common specific interests, even ignoring the potential presumptions of romantic interest by their users.
While the internet may make it easier in some ways to find people of similar interests, for example by searching for a club, organization or user forum related to a specified subject, there does not appear to be any easily available means of searching for individuals who share an interest with a user.
SUMMARY OF THE INVENTION
The present invention allows a form of social networking in which people interested in certain subjects may easily and quickly find others who have the same or similar interests. According to the present invention, each person desiring to specify one or more areas of interest to them may have those interests entered in a database. A person may then search the database to find others who have indicated in the database that they have similar interests to the searcher. This allows people who otherwise do not know each other but share one or more of the same interests to find and communicate with one another.
Users may indicate an area of interest primarily by specifying a list of “tags” or keywords, called a taglist, which describes the area of interest to the user. Multiple interests are represented by multiple taglists, one taglist for each area of interest. The taglists for multiple users are stored in a taglist database.
In addition to entering one or more taglists of their own interests, users may enter a search taglist indicating an area of interest in which they wish to find other users with similar interests. The system will compare the search taglist selected by the user with the other taglists stored in the database and return a list of the closest matching users, i.e., those with the closest matching taglists.
If desired, some or all of the users who are the closest matches to a given search taglist can create a common interest group. Those users in the interest group who are online at the same time can communicate, via IM, voice, video connection or other communication means; such communications among persons who are online are known in the art. Those users in the interest group who are not online at the time can be invited to join discussions at other times via an email, voicemail, forum or chat room posting or other message. Interest areas are not limited to hobbies, but may also include such things as support groups, shopping co-ops, student or parent groups, groups of traveling companions, among many examples.
In other environments, such as an academic, professional, research or enterprise environment, users may enter their areas of expertise or interest into the database, and a search taglist may be used to help locate persons in the database having a desired expertise by focusing the matching search on an appropriately narrow area. In other embodiments, it may be desirable to locate projects, documents or media files (pictures, voice or video files, etc.) based on the interest areas to which they are related. A document, file or project in many cases may be represented by a single taglist, although it may require more than one taglist to adequately cover its contents.
The use of taglists enables a user to easily and greatly refine an interest area or a search by specifying interest areas that range from very broad to very narrow. If a matching search for a given taglist does not return a suitable set of people, then the taglist defining the interest area can be adjusted and the matching search can be repeated until a group of people that is satisfactory to the searching user is found. At that point the matching group may be turned into a formal interest group if desired.
In one embodiment of the present invention, a method of adding a new keyword list to a hierarchical database containing a plurality of stored keyword lists, the database containing nodes corresponding to the keywords of the stored lists, by inserting the new list at the point of its closest match to an existing list, comprises comparing the first keyword of the new list to the nodes at a first level of the database to determine whether the first keyword is close enough to an existing node at the first level for the new list to be placed within the existing node; if the first keyword is close enough to fall within an existing node at the first level, repeating the comparison for the next keyword to the nodes at the level which is directly under the existing node; repeating the comparison for each subsequent keyword at each subsequent level until a keyword is not close enough to be placed within an existing node; and when a keyword is not close enough to fall within an existing node at the level at which it is compared, creating a new node corresponding to the keyword directly under the last node for which a keyword was close enough to be placed within the node.
In another embodiment, a method of adding a new weighted keyword list to a hierarchical database containing a plurality of stored weighted keyword lists, the database containing nodes corresponding to the weighted keywords of the stored lists, by inserting the new list at the point of its closest match to an existing list, comprises representing each weighted keyword list as a vector; comparing the first element of the new keyword vector to the nodes at a first level of the database to determine whether the first element is close enough to an existing node at the first level for the new list to be placed within the existing node; if the first element is close enough to fall within an existing node at the first level, repeating the comparison for the next element to the nodes at the level which is directly under the existing node; repeating the comparison for each subsequent element at each subsequent level until an element of the new keyword vector is not close enough to be placed within an existing node; and, when an element of the new keyword vector is not close enough to fall within an existing node at the level at which it is compared, creating a new node corresponding to the element directly under the last node for which an element was close enough to be placed within the node.
In still another embodiment, a method of searching a hierarchical database of keyword lists for lists matching a search keyword list, the database containing nodes corresponding to the keywords of the stored lists, comprises comparing the first keyword of the search list to the nodes at a first level of the database to determine whether any existing nodes match the first keyword; if one or more nodes match the first keyword, repeating the comparison for the next keyword to the nodes at the level which is directly under the nodes matching the first keyword; repeating the comparison for each subsequent keyword at each subsequent level until the nodes closest to the search list are located; and returning a list of the located nodes.
In yet another embodiment, a method of searching a hierarchical database of weighted keyword lists for lists matching a search weighted-keyword list, the database containing nodes corresponding to the weighted keywords of the stored lists, comprises comparing the first weighted keyword of the search list to the nodes at a first level of the database to determine whether any existing nodes match the first weighted keyword; if one or more nodes match the first weighted keyword, repeating the comparison for the next weighted keyword to the nodes at the level which is directly under the nodes which match the first keyword; repeating the comparison for each subsequent weighted keyword at each subsequent level until the nodes closest to the search list are located; and returning a list of the located nodes.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a data tree according to the present invention.
FIG. 2 is a flowchart of the method of inserting a taglist into the data tree in one embodiment of the invention.
FIG. 3 is a diagram showing how the distance between vectors is related to the scope of a search of the data tree in one embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
In the present invention, people may indicate one or more areas of interest to them and have those interests entered into a database so that they or others may locate people having interests similar to their own. A user of a system or method according to the present invention specifies an area in which they are interested by means of a taglist, which is a list of keywords or attributes that describes the area of interest. For example, a person interested in collecting Barbie dolls might designate this interest by the taglist (collecting, dolls, Barbie), while one interested in the use of catalysts in chemical reactions might use the taglist (chemistry, reactions, catalysts) to designate such area. Interest areas can be broad or narrow. In general, a more narrow interest area will have many attributes, while a broad interest area will have relatively fewer attributes; for example, one interested in collecting dolls in general may omit the attribute “Barbie” from the taglist and use the list (collecting, dolls).
The taglists generated by users who wish to indicate their interests to others are stored in a taglist database. The taglist database consists of nodes, where each node has internal fields to hold an associated taglist vector as well as fields used to link nodes together to form the tree-like database structure. For each new taglist submitted indicating a new user and/or new interest area, a computer processor or system with access to the database will search the database by comparing the new taglist to those already in the database, and will store the new taglist near others having identical or similar keywords by determining the “closest” fit as explained herein. New taglists may be entered directly into the computer or system where the database is located, or may be added through client websites which accept the taglists and forward them to a database server.
To find other users with similar interests, a user may enter a search taglist describing the area of interest to them. A search taglist may be the same as a taglist submitted by a user for entry in the database, or may be different, and need not be submitted by a person who has entered a taglist into the database. As with entering new taglists, search taglists may be entered directly into the database system, or may be entered into a client website and then forwarded to the database server, and the results returned to the client website. For example, a user seeking others with similar interests in books may be able to enter a search taglist through a bookseller's website.
When a search taglist is entered, similarly to determining where to locate a new taglist, the system will search the database by comparing the search taglist with the other taglists stored in the taglist database and return a list of other users who have the “closest” matching taglists. Where appropriate, the system can provide a distance measure between any two taglists, allowing a consistent comparison of the “closeness” of the interest areas represented by each taglist to the search taglist.
The specificity, or lack thereof, of the search taglist will of course impact both the number and specificity of results. For example, a general keyword such as “health” is likely to generate a very large number of people who have input that tag as part of their own taglists and is thus unlikely to be useful or acceptable to the user. The user may add additional tags such as “sports injuries” to narrow the results. A further tag such as “runners' injuries” will narrow the search still further and may provide a list of people who more closely match the user's area of interest. The user may add still further tags, such as “New York City,” and may or may not see further refinement of the results that is useful. A user may keep adding new tags as they please in order to find the most useful match for their purposes.
Similarly, in the case of the bookseller, a user accessing the bookseller's website may start with a search for the taglist “mystery novels.” A list of people in the database who have indicated such an interest will be returned. If the list is too long, or the user wishes to be more specific, more search tags may be entered to tailor the search to the user's specific interest, for example adding further tags such as “British” and “19th Century.” By such refinements, a user is more likely to find people who closely match the user's area of interest, in this case 19th century British mystery novels. Once such people are identified, they may be notified of their common interest and may begin to communicate with one another immediately through the internet, mobile devices, etc.
For both new taglists and search taglists, in some embodiments users may be permitted to select their own taglists with no limitation on the words entered; however, as discussed below it may be more efficient to provide some kind of guide to users either describing the existing contents of the database or guiding the user to a list of words already in the database, so that they may select taglists that will more uniformly either fit into the database or be used to search the database.
The search may return all users with an identical interest, and/or may be structured to return users within a determined closeness. In some embodiments, the search will return all matching users. This allows people who otherwise do not know each other but share one or more of the same interests to find and communicate with one another.
One possible method of conducting a search and finding one or more matches to a search taglist from a user is to compare the search taglist to all taglists in the database. However, as with any search of a large database, using such a brute force method to compare the search taglist with each and every taglist in the database is very inefficient.
Instead, it is more efficient if the taglist database is organized around neighborhoods so that the search looks at only a smaller and more relevant part of the database. This is accomplished by using a tree structure, similar to how data is stored in a binary tree, but with more than two branches (or subtrees) allowed at each level or node. While binary tree databases only work for totally ordered binary sets of items, the present invention uses a database organization that is more like a general hierarchy.
FIG. 1 illustrates this difference; a binary tree is composed of a sequence of nodes each having only two branches, while the data tree of FIG. 1 may contain more than two branches at each node where appropriate. Thus, in FIG. 1, there is a node for collecting, and more specifically nodes for collecting dolls and still more specifically Barbie dolls, as mentioned above. There are also branches for collecting other dolls such as G.I. Joe dolls, and for collecting other items such as stamps, coins, books, etc.
At the same highest level (below the root) as collecting are other general categories as shown, for example photography and sports, and those other categories may also have their own sub-categories or branches, some of which are illustrated in FIG. 1. In a corresponding example based on geography, the topmost (root) level would represent the world, the next level down countries, then states or provinces, then counties, then cities, towns, blocks etc., until a specific address is reached.
Searching the taglist database for matches to a particular search taglist, or for the best location to enter a new taglist, involves selecting a node or neighborhood to search at each level of the tree. If the database is ordered as described, with the most general nodes at the top, and the taglists are similarly arranged with the more general keywords first, then a search will first compare the first tag or keyword to the highest level nodes to see if there is a match. If a match is found, the search will move to the next keyword, and to the branches or nodes underneath the matching one, and again seek a match. Thus, in general, at each level the system will try to select a more local neighborhood to zero in on the closest matches.
The process continues until the keywords of the search taglist have all been considered, or there is no match, at which point the system will take some action as described below, for example either returning the closest match or matches in the case of a search, or entering a user's interest into the database in the case of a new taglist.
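The level-by-level walk just described can be sketched as follows. This is a simplified illustration under stated assumptions: keywords are matched by exact equality rather than by the "close enough" measure, and the `Node` class, the `descend` and `insert` functions and the `users` field are hypothetical names introduced here.

```python
class Node:
    """One node of the taglist tree; any number of branches is allowed."""
    def __init__(self, keyword=None):
        self.keyword = keyword
        self.children = {}  # keyword -> Node
        self.users = []     # hypothetical payload: users registered at this node

def descend(root, taglist):
    """Walk down one level per keyword and stop at the first keyword that
    has no matching node. Returns the deepest matching node and the number
    of keywords matched."""
    node, matched = root, 0
    for kw in taglist:
        child = node.children.get(kw)
        if child is None:
            break
        node, matched = child, matched + 1
    return node, matched

def insert(root, taglist, user):
    """Place a new taglist under its closest existing match, creating new
    nodes for the keywords that found no match."""
    node, matched = descend(root, taglist)
    for kw in taglist[matched:]:
        node.children[kw] = Node(kw)
        node = node.children[kw]
    node.users.append(user)
    return node

root = Node()
insert(root, ("collecting", "dolls", "Barbie"), "alice")
insert(root, ("collecting", "dolls", "G.I. Joe"), "bob")
insert(root, ("collecting", "stamps"), "carol")

# A search taglist with one keyword more than anything stored.
node, matched = descend(root, ("collecting", "dolls", "Barbie", "vintage"))
```

The search stops at the Barbie node after matching three keywords, so that node and its neighborhood are the natural candidates for the closest matches.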
In the case of a search, items that are deemed close enough to the search taglist to be returned as results of the search can be compared with the search taglist and the items ranked in order of closeness. If desired, only the closest items may be retained on a list of users and their interests that match the search taglist. For example, as discussed in more detail below, a distance constraint may be predetermined by the user that must be met for an item to be retained on the list, so that, for example, any taglists closer than the specified distance are put on the return list while those farther away than that distance are discarded from the list.
However, if the distance constraint is too small it is possible that no items will be considered close enough matches to a search taglist and thus nothing will be returned, i.e., no matching users will be found. In that case the search will need to be redone with a larger distance constraint if results, i.e., the identity of other users, are desired.
Alternatively, rather than specifying a closeness constraint, which in some instances may result in a list of users that is considered too big or too small for the searcher's purpose, a list-size constraint may be implemented that will limit the list returned to a predetermined number of the closest items. In this case, the system will rank the taglists in the database by how closely they match the search taglist and return a list of that predetermined size containing the users who most closely match the search taglist.
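The two constraints can be contrasted in a short sketch. It assumes taglists are already expressed as equal-length numeric vectors and uses Euclidean distance as a stand-in measure; the function names and the sample entries are illustrative only.

```python
import heapq
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def within_distance(search, entries, d):
    """Distance constraint: keep entries no farther than d, closest first."""
    scored = [(distance(search, vec), user) for user, vec in entries]
    return sorted((s, u) for s, u in scored if s <= d)

def closest_k(search, entries, k):
    """List-size constraint: return the k closest entries, closest first."""
    scored = ((distance(search, vec), user) for user, vec in entries)
    return heapq.nsmallest(k, scored)

entries = [("alice", (1, 1, 1)), ("bob", (1, 1, 0)), ("carol", (0, 0, 1))]
search = (1, 1, 1)

near = within_distance(search, entries, 1.0)  # alice and bob qualify
top1 = closest_k(search, entries, 1)          # alice only
none = within_distance((5, 5, 5), entries, 1.0)
```

The last call shows the failure mode noted above: with too small a distance constraint the list comes back empty, whereas the list-size variant always returns up to k entries.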
Note that returning all of the matching items of a query, or ranking all items in order of closeness, and then processing each returned item may require too much time (resulting in delays that users do not like) and/or processing power (requiring more and/or faster system hardware). To make the process more efficient it is possible to have the distance constraint built into the database or system, rather than specified by the user as above. In this case, the comparison to the distance constraint will be an integral part of the query, so that only if an item is within the distance constraint will the item be placed on the return list. The query will then return the matching list without any further processing or elimination of items being needed.
To keep time and hardware to a minimum, it will generally be more efficient to organize the tree representing the taglist database such that the more general nodes, i.e., categories, are at the top and the more numerous specific nodes at the bottom, as shown in FIG. 1(b). As above, since users may not be familiar with the structure of the database, in some embodiments the system may provide an interface which allows a user to either select pre-defined keywords for a taglist, for example from a pull-down menu or a displayed list that contains the keywords already in the database, or, if the user cannot find a keyword that represents the user's interest, to enter a new one.
Similarly, since a user may not enter taglists in an order from more general to more specific, an entered or search taglist may be made more uniform and efficient by having the system reorder attributes or add related attributes in an attempt to have the taglist more closely conform to the database structure. For example, attributes may be reordered so that storage in the database is independent of the order in which a user has entered attributes; for example, the taglists (Barbie, dolls, collecting) or (collecting, Barbie, dolls) should be either located in the same place (for new taglists), or the database searched in the same way (for search taglists), as the taglist (collecting, dolls, Barbie.) Similarly, attributes may be added to make a taglist conform to the structure of the database without changing the interest represented, i.e., if a taglist contains “camera,” then matching will generally be easier and more uniform if the more general category “photography” is added by the system based upon the existing structure of the database before either storing or matching is done.
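One way such reordering and attribute addition might work is sketched below. The `PARENT` map, which records the more general category of each known keyword, is a hypothetical stand-in for whatever knowledge the system draws from the existing database structure; unknown keywords are simply treated as top-level.

```python
# Hypothetical map from a keyword to its more general category.
PARENT = {"dolls": "collecting", "Barbie": "dolls", "camera": "photography"}

def depth(tag):
    """Number of more general categories above a tag (0 for top-level tags)."""
    d = 0
    while tag in PARENT:
        tag = PARENT[tag]
        d += 1
    return d

def normalize(taglist):
    """Add any missing general categories, then order tags from general to
    specific so that equivalent taglists get one canonical form."""
    tags = set(taglist)
    for tag in taglist:
        while tag in PARENT:
            tag = PARENT[tag]
            tags.add(tag)
    # Ties at the same depth are broken alphabetically for determinism.
    return tuple(sorted(tags, key=lambda t: (depth(t), t)))

a = normalize(("Barbie", "dolls", "collecting"))
b = normalize(("collecting", "Barbie", "dolls"))
c = normalize(("camera",))
```

Both orderings of the doll taglist normalize to (collecting, dolls, Barbie), and the lone tag "camera" gains the more general "photography" ahead of it.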
In some cases it may be convenient to combine other data attributes with the taglists in a general purpose relational database, and not solely in the tree-structure neighborhood format. Such other data attributes may include information about users, such as their names, addresses, contact information, etc., or identification of documents, media files (images, audio or video files), projects or other objects which may also have their associated “interest” or subject areas represented by taglists. This may be easily accomplished by adding fields to the relational database which correspond to tags or to nodes in the database, which may then be searched as a virtual tree. Such techniques are known to those of skill in the art. In such cases it is possible to apply a query that returns any database items that contain a taglist, whether they are user records or other files such as those described above.
Certain aspects of the invention will now be explained in more detail. Two basic operations should be provided for a neighborhood-oriented taglist database to serve the functions described herein: the insertion of new taglists into the database, and searching the database for the closest matches to a specified taglist. In addition, there should be some way to measure the closeness between two taglists at each level in the hierarchical database.
As above, a user may indicate an interest area via a taglist. Similarly, if it is desired to add a document or other file, the “interest area” of the document or other file, i.e., its subject, may also be characterized by a taglist. Typically the tags, or keywords, will be found within the document itself, although this is not strictly required. The keywords may, for example, be assigned by a user characterizing the document.
In some instances, the interest area of a document or other file may be specified only by a taglist containing a set of keywords used once each, similar to the description of a user's interest area as above. However, a document or other file may use a keyword multiple times, in which case it is possible to count the number of times that each keyword is used, thus providing a count value for each keyword which indicates the relative importance of each keyword. In such a case the representation of an interest type of a document or other file may be a keyword vector in which each keyword has a vector position and a count indicating the relative importance of the keyword. For certain types of files this may be easy to accomplish by the use of automated programs such as those for text or voice recognition, which can count the number of times each word is used.
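As a minimal sketch of the counting described above, assuming plain text input and a known set of keywords (the function name `keyword_counts` and the tokenizing rule are illustrative choices, not part of the described system):

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Count how often each known keyword occurs in the text (case-insensitive)."""
    # Simple tokenizer, assumed for illustration: runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w in keywords)
    # Report every keyword, including those that never occur (count 0).
    return {k: counts[k] for k in sorted(keywords)}
```

For the text "Dolls and more dolls: a Barbie guide" and the keywords dolls, barbie and camera, the result gives dolls a count of 2, barbie a count of 1 and camera a count of 0.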
A taglist represents a vector, and each entry in the database may be represented by the vector that arrives at that entry, with the nodes of the database representing the components of the vectors. (Nodes may also be considered shortened vectors, i.e., each node represents the vector of tags that leads to that node.) The tags (or keywords) are represented to the user as strings, but internally they are given a numeric index corresponding to a position in a vector. The vector positions must be consistent internally even when tags may be introduced in different orders by different users, or in different versions of a system. There is thus a tag database that provides translations between tag positions and tag text labels, as well as translation functions to make different databases consistent. The length of the vectors will correspond to the number of keywords used in the whole database. When new keywords are introduced, the vector length will be extended (for all vectors) to provide slots for each new keyword. Each keyword gets a unique index into any vector. Managing the changing sizes of vectors is known to those of skill in the art.
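One way to picture the tag database that translates between tag text and vector positions is sketched below. The class name `TagIndex` and its methods are hypothetical; the sketch only assumes that positions are assigned in order of first appearance and never change, so that older, shorter vectors can be padded with zeros when new keywords are introduced.

```python
class TagIndex:
    """Maps tag text to a stable vector position and back."""

    def __init__(self):
        self._position = {}   # tag text -> vector position
        self._label = []      # vector position -> tag text

    def position(self, tag):
        """Return the position of a tag, assigning the next free slot if it is new."""
        if tag not in self._position:
            self._position[tag] = len(self._label)
            self._label.append(tag)
        return self._position[tag]

    def label(self, position):
        """Return the tag text stored at a vector position."""
        return self._label[position]

    def to_vector(self, counts):
        """Build a dense vector from a {tag: count} mapping."""
        for tag in counts:
            self.position(tag)            # may extend the vocabulary
        vec = [0] * len(self._label)
        for tag, n in counts.items():
            vec[self._position[tag]] = n
        return vec

    def pad(self, vec):
        """Extend an older, shorter vector with zeros for newer keywords."""
        return vec + [0] * (len(self._label) - len(vec))
```

Because a tag keeps its slot once assigned, vectors built before and after a new keyword appears remain comparable after padding.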
If there are n unique keywords identified in the database, then the taglist vectors would have the following form, with x and y being vectors of the words identified, and each $x_i$ or $y_i$ being an integer count of the number of instances of use, for example in a document or file, of the keyword corresponding to that position in the vector:
$x = (x_1, \ldots, x_n), \quad y = (y_1, \ldots, y_n)$  (Eq. 1)
In order to assist in measuring the closeness or similarity of documents, some numerical measures are defined. First, the “inner product” of two interest vectors is defined as follows:
$x \circ y = \sum_{i=1}^{n} x_i \cdot y_i$  (Eq. 2)
Next, the “norm” of a vector x is defined as the square root of x∘x, represented as ∥x∥. The distance between two vectors, x and y, would thus be represented as ∥x−y∥.
For example, if we have two vectors a=(1,0) and b=(0,1), their closeness value is
$\|a - b\| = \|(1, -1)\| = \sqrt{1 + 1} = \sqrt{2}$  (Eq. 3)
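The inner product, norm and distance defined above translate directly into code. The following sketch assumes vectors of equal length given as Python sequences; the function names are illustrative.

```python
import math

def inner(x, y):
    """Inner product of two equal-length vectors (Eq. 2)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Norm of a vector: the square root of its inner product with itself."""
    return math.sqrt(inner(x, x))

def distance(x, y):
    """Distance between two vectors: the norm of their difference."""
    return norm([xi - yi for xi, yi in zip(x, y)])
```

Calling `distance((1, 0), (0, 1))` reproduces the square root of 2 from the example above.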
In the case of a user's interest area, the count of each keyword in a vector is limited to 0 or 1, and the vector corresponds to a set of keywords as above. In that case the closeness value defined above may still be calculated, or some alternative distance or closeness measures may be used. Given two vectors A and B that are viewed as sets, one alternative measure could be the cardinality of the symmetric set difference, i.e.:
card(A∪B−A∩B)
or the cardinality of the intersection:
card(A∩B)
The former measures how the sets are different while the latter measures how they are the same. Note that the intersection measure is reversed in that closer vectors or sets have a higher value. These two measures can be combined, for example by taking the cardinality of the symmetric difference minus the cardinality of the intersection:
card(A∪B−A∩B)−card(A∩B)
or the cardinality of the symmetric difference divided by the cardinality of the intersection:
card(A∪B−A∩B)/card(A∩B)
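The set-based measures above can be sketched with Python sets. The function names are illustrative, and one detail is an assumption: the text does not say what the ratio should be when the intersection is empty, so this sketch returns infinity in that case.

```python
def sym_diff_size(a, b):
    """card(A∪B − A∩B): how many keywords the two sets do not share."""
    return len(set(a) ^ set(b))

def intersection_size(a, b):
    """card(A∩B): how many keywords the two sets share (higher means closer)."""
    return len(set(a) & set(b))

def combined_difference(a, b):
    """Symmetric difference minus intersection (lower means closer)."""
    return sym_diff_size(a, b) - intersection_size(a, b)

def combined_ratio(a, b):
    """Symmetric difference divided by intersection (lower means closer)."""
    common = intersection_size(a, b)
    if common == 0:
        return float("inf")   # assumption: nothing shared is treated as infinitely far
    return sym_diff_size(a, b) / common
```

For the sets {collecting, dolls, barbie} and {collecting, dolls, antique}, the symmetric difference has 2 elements and the intersection has 2, so the combined difference is 0 and the combined ratio is 1.0.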
Again, the taglist database consists of a tree of keyword (or taglist) vectors, where each node is a component of a vector and there are potentially many children of a node, each another component of a vector. Each node in the tree has tree navigation values, such as parent and children, as well as the component value. A user-generated taglist is thus a vector which is placed in the tree if the same vector is not already present, at least initially as a leaf node as described below. A search taglist is a vector for which the closest matches are sought.
A closeness level value, C(level), may be specified for each level of the taglist database tree, which indicates how closely nodes may be located relative to one another; if two vectors representing nodes at a given level would be closer than the closeness level value, then they should be represented by the same node. These values may be determined by an initial level that is set at the root node and may be further determined by a level adjustment function (“LAF”) defined such that if L is the closeness level for a particular node N, the closeness level L′ of each child of node N is L′=LAF(L). The LAF may be reprogrammed for various purposes.
In some embodiments, the initial specification of LAF is set such that closeness level at a child node is ½ of the closeness level of the parent node to that node, i.e., L′=LAF(L)=L/2. Note, however, that this particular specification is only applied within the LAF itself so that the closeness levels of each node will only be calculated by invoking the LAF, thus allowing for modification as desired by simply modifying the LAF value.
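A minimal sketch of how closeness levels might be derived from the root level through the LAF alone follows. The names `laf` and `closeness_level` and the `adjust` parameter are hypothetical; the halving rule is the one given above, and passing a different function illustrates how the LAF could be replaced without touching the rest of the code.

```python
def laf(level):
    """Level adjustment function: a child's level is half of its parent's."""
    return level / 2

def closeness_level(root_level, depth, adjust=laf):
    """Closeness level C(level) at a given depth, computed only via the LAF."""
    level = root_level
    for _ in range(depth):
        level = adjust(level)
    return level
```

With a root level of 8.0, the levels at depths 0 through 3 are 8.0, 4.0, 2.0 and 1.0.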
Entering New User Interest Areas by Tag List Insertion
When a user wishes to indicate an interest area for inclusion in the database, the user provides a new taglist that is to be entered into the database. In this case, it should be determined where the new taglist is to be added to the database. In order to determine where to add the new taglist, it is determined at each level of the database, i.e., for each tag or component of the vector, whether the tag at that level is close enough to an existing node to fall within that existing node, and for any remaining tags in the taglist to be children nodes, or whether the tag is farther away from any existing nodes than the closeness level value C(level) at that level of the database and should result in the creation of a new node. (The situation in which the taglist is identical to a taglist already in the database is discussed below, but is very similar to this.)
FIG. 2 describes the insertion of a taglist, or more precisely a node S which corresponds to the vector represented by the taglist, into the database.
In step 201 the database system sets a node level value N to be the root level of the database.
As above, the distance between two vectors x and y is represented as ∥x−y∥. Since nodes may also be represented by vectors as above, the distance measure between two nodes is the distance between the corresponding taglist vectors for those nodes. In step 202 the system computes the distance value M between the taglist vector, i.e., a potential node S as a child of root level N, and each existing child C of root level N, calculating the distance M=∥C−S∥ for each C. The minimum calculated value of M is selected, as this represents the node C that is the most likely fit for the taglist at this level.
In step 203, the system compares C(level) at this level with the minimum value of M selected. If the minimum value of M is greater than C(level), then S is not considered to be close to any of the children C of root level N. In this case, the system proceeds to step 204 where it adds node S as a child of N and the process ends.
If the minimum value of M is less than or equal to C(level), at step 205 the system selects the child node C at this level that was used to calculate the minimum M, i.e., that is within C(level) of S; the selected child node C is now designated as node K.
Next there is a test whether S is a subset of K at step 206. If S is not a subset of K, the system proceeds to step 208. If S is a subset of K, at step 207 the system reverses the tag vector values of the nodes S and K, making the smaller tag-vector set come first in the tree structure. The system then proceeds to step 208.
At step 208 the system checks whether node K is a leaf node, i.e., a node having no children; if node K is a leaf node, the process proceeds to step 209 where it inserts node S just below node K and then ends. At this point, node K is no longer a leaf node, as it now has a child node S.
On the other hand, if node K is not a leaf node, i.e., it already has children of its own, the system proceeds to step 210 where it continues the recursive process by setting node K to the new node N and returning to step 202. The process is now repeated at the next level, now looking at the children of node K, and comparing the next item in the taglist vector as the new S which is either to be included in one of the children nodes of node K or inserted as a new child node itself.
The process repeats until a new node is inserted in the database, either because the next keyword in the taglist is not close enough to any of the existing nodes at that level or because every keyword in the taglist has been compared to a level in the database, and the appropriate location has been found to enter the new taglist.
Alternatively (or in addition), the new node may contain information about the user who submitted the taglist. If the new node is simply user information, since a user may submit multiple taglists to represent multiple interests, it will generally be more efficient to enter a user's information only once rather than creating multiple nodes each containing the repetitive information for the same user. Instead of entering the actual user information in the database, it may be more efficient to include in the new node a pointer to a memory location where the user's information may be found rather than a full copy of the information. One embodiment of a user database is described below.
Note that if a tag vector uses a set representation, as described above, where the closer vectors have a higher closeness number rather than a smaller one, then in the above steps the inequalities will be reversed as well as the use of maximum instead of minimum values.
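The insertion procedure of FIG. 2 might be sketched as below. This is one possible reading, not a definitive implementation: the `Node` class, the `is_subset` test (S uses no keyword that K lacks) and the halving default for the level adjustment are assumptions, each node is taken to hold a full taglist vector, and the case of an identical taglist already being present is not treated specially.

```python
import math

class Node:
    def __init__(self, vec=None):
        self.vec = vec          # taglist vector represented by this node
        self.children = []

def distance(x, y):
    """Norm of the difference of two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def is_subset(s, k):
    """True if every keyword used in s is also used in k (assumed subset test)."""
    return all(b != 0 for a, b in zip(s, k) if a != 0)

def insert(root, vec, root_level, laf=lambda level: level / 2):
    """Insert vec below root; return the node that was created."""
    n, level = root, root_level                       # step 201
    while True:
        # Step 202: find the child of N closest to S.
        k = min(n.children, key=lambda c: distance(c.vec, vec), default=None)
        # Steps 203-204: nothing is close enough, so S becomes a new child of N.
        if k is None or distance(k.vec, vec) > level:
            node = Node(vec)
            n.children.append(node)
            return node
        # Steps 206-207: if S is a subset of K, swap so the smaller set comes first.
        if is_subset(vec, k.vec):
            k.vec, vec = vec, k.vec
        # Step 208: a leaf K receives S as its child and the process ends.
        if not k.children:
            node = Node(vec)
            k.children.append(node)
            return node
        # Otherwise K becomes the new N and the level is adjusted for its children.
        n, level = k, laf(level)
```

Note that after a swap the returned node holds the vector that previously belonged to K, since the smaller taglist has taken K's place higher in the tree.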
Locating Users with Similar Interests by Tag List Matching
When a user wishes to search for other users with matching interests, the user may submit a search taglist representing the interest for which matches are sought, which results in a vector S. The user may also specify a closeness value d, or, in some embodiments, the system may use a predefined default value for d. The database system uses the following procedure to perform the search:
The system does a left-to-right recursive walk down the database tree, at each level selecting those nodes that are within the specified closeness distance d to S. The walk continues to search down a subtree if the node N that is at the root of the subtree is within d+e of S, where e is the closeness level value C(level) at this level of the tree. Each node reached during the walk is entered into a list. The matching values to be returned are those in the list at the end of the search down the selected branches of the tree.
Adding the closeness value e at each level to the specified closeness level d of the search when choosing which subtrees to search down ensures that the system collects all of the relevant matches within available subtrees. If during the insertion process each child node C of a subtree has been placed within the closeness distance e of the root node N of that subtree at that level, then that subtree should be visited for matching if S is within distance d+e of the root node N. The sufficiency of this choice is based on the triangle inequality of metric spaces which states specifically that if:
$\|C - S\| \le d$
and
$\|N - C\| \le e$
then
$\|N - S\| \le d + e$
This means that to ensure the search of a child node C below root node N by comparison with S, it is sufficient if we search below N when the distance from S to N is less than or equal to d+e.
A simplified diagram of this principle is shown in FIG. 3. There is a particular node N at the root of a subtree in the taglist database. The desired keyword, i.e., vector, S is a distance D from root node N, and not within the specified closeness distance d under the measure described above, and thus it appears that the subtree of node N should not be searched. However, each child node C of root node N is within a closeness distance e of root node N. If the distance e to a particular child node C is in the “direction of” S, i.e., it reduces the distance between the vectors represented by node C and root node N, then the distance between that child node C and S may be less than D as illustrated in FIG. 3. Thus, to ensure that such a child node C is searched, it is simplest to search all root nodes N within the distance d+e of S, rather than to try to find only those child nodes C of root node N that are within distance d of S.
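The search walk with the d+e rule can be sketched as follows. The `Node` class and `search` function are illustrative, the level adjustment is assumed to halve at each level, and the sketch interprets the result list as containing only the nodes within d of S, sorted by closeness for convenience.

```python
import math

class Node:
    def __init__(self, vec=None, children=()):
        self.vec = vec                  # taglist vector represented by this node
        self.children = list(children)

def distance(x, y):
    """Norm of the difference of two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def search(root, s, d, root_level, laf=lambda level: level / 2):
    """Return the nodes within distance d of s, closest first."""
    matches = []

    def walk(node, level):
        for child in node.children:            # left-to-right walk
            dist = distance(child.vec, s)
            if dist <= d:
                matches.append((dist, child))  # the node itself is a match
            if dist <= d + level:              # d + e: the subtree may hold matches
                walk(child, laf(level))

    walk(root, root_level)
    matches.sort(key=lambda m: m[0])
    return [node for _, node in matches]
```

In the situation of FIG. 3, a node N at distance 1.5 from S is not a match for d = 0.5, but with e = 1.0 its subtree is still searched, which allows a child C at distance 0.5 from S to be found.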
As discussed above, each leaf node that identifies a user represents a taglist or vector of the indicated interest area, and in searching the goal is to match the search taglist S and identify users (or documents, projects, etc.) with interests matching S. As also discussed above, in some embodiments the identity of the users (or documents, projects, etc.) may be located in a separate user database, and each leaf node may contain a pointer to the user to whom the represented taglist belongs. Thus, any matching taglist or vector found in a taglist database may be used to identify the associated user by using a pointer to the user database. (Again, in a relational database, these functions may be combined.)
Again as discussed above, any convenient order of tags or keywords may be selected, although in some embodiments the system may either encourage users to select tags in a particular order or from a predefined list, while in other embodiments the system may attempt to place the tags in an appropriate order for a search to be properly conducted.
In some cases it may be desirable to delete a specific taglist. This is done by finding the exact taglist desired by performing a minimum value search as described above, i.e., by looking for the closest node to each keyword in the taglist, and then deleting the resulting match by using a standard tree structure delete. It may also be desirable to update intermediate non-leaf taglists above the deleted node. Deletion of an item from a database having a tree structure is well known in the art.
The User-Taglist Relation
A separate function may be provided for determining which users are associated with a particular taglist vector. In one embodiment this is implemented as a binary tree of taglist vectors where the tags are ordered alphabetically. With the tags ordered alphabetically, the taglist vectors are then ordered lexicographically. In this case each taglist is treated like a letter, since the taglists can be given an alphabetic ordering. This enables a binary tree where the nodes are taglist vectors. Such a binary tree allows the fast lookup of data associated with a taglist vector. The data associated with a taglist vector in this case is a list of users that have selected that taglist vector as representing an interest area.
The taglist vector (afghan, antique) comes before (afghan, spread) and (antique, doll) for the purpose of placement in this binary tree. Each taglist vector in this tree will be associated with a list of users who have registered that taglist by virtue of a leaf node at the end of the vector containing user information or a pointer to such. This provides a separation of function so the taglist database only needs to deal with taglists, not users. This tree for tracking user tag vectors may be referred to as a user database.
To add a user-taglist vector (u, t) pair to the user database, a standard binary-tree search is used to find the tag vector t in the user database tree or to add the tag vector t if it is not already present. The user name u is then added to the user name list at that tag vector node, again possibly by means of a pointer to a memory location containing the user information.
To remove a user-taglist vector (u, t) from the user database, as with removing a taglist from the taglist database above, the taglist vector node associated with t is located in the user database, and then the user name u (or pointer) is removed from the user list at that node. If that action leaves the user list at that node empty, then the system may optionally also remove the taglist vector t from the user database, and may even remove the corresponding taglist vector from the taglist database.
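The user database operations above can be sketched as below. Note the substitution: instead of an explicit binary tree, this sketch keeps the taglists in a lexicographically sorted list and uses binary search via the standard `bisect` module, which gives the same ordered lookup. The class name `UserDatabase` and its methods are hypothetical.

```python
import bisect

class UserDatabase:
    """Sorted-list stand-in for the binary tree of taglist vectors."""

    def __init__(self):
        self._keys = []    # taglist tuples in lexicographic order
        self._users = []   # self._users[i] is the set of users for self._keys[i]

    @staticmethod
    def _key(taglist):
        """Order tags alphabetically so equal taglists compare equal."""
        return tuple(sorted(taglist))

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def add(self, user, taglist):
        """Add a (u, t) pair, creating the taglist entry if it is not present."""
        i, found = self._find(self._key(taglist))
        if not found:
            self._keys.insert(i, self._key(taglist))
            self._users.insert(i, set())
        self._users[i].add(user)

    def remove(self, user, taglist):
        """Remove a (u, t) pair and drop the taglist if no users remain."""
        i, found = self._find(self._key(taglist))
        if not found:
            return
        self._users[i].discard(user)
        if not self._users[i]:
            del self._keys[i]
            del self._users[i]

    def users_for(self, taglist):
        """Return the sorted list of users registered for a taglist."""
        i, found = self._find(self._key(taglist))
        return sorted(self._users[i]) if found else []

    def taglists(self):
        """Return all stored taglists in lexicographic order."""
        return list(self._keys)
```

Adding the three taglists from the example above in any order leaves them stored as (afghan, antique), (afghan, spread), (antique, doll).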
Change Notifications
As mentioned above, in some environments such as enterprises, the taglist database may also contain interest taglist vectors associated with documents, projects and other files, as well as those taglists entered by individuals. In the case of documents and certain other types of files, tools are available to scan such files and locate the keywords that may be used to create a taglist vector. Even if such tools are not available or desirable for some reason, taglist vectors may be created and assigned manually for documents, projects or files, as well as individuals.
If a new document, file or project is created with an associated tag vector, or a user enters a new taglist, the taglist database may be used to find those individuals who appear to be interested in the new item. To do this, the new taglist vector may be used to find close taglist vectors in the taglist database, and in turn the individuals associated with the close taglist vectors. These individuals may then be notified of the new item, whether it is a new document, file, project or user.
Similarly, if a change is made to an existing item, that item's taglist vector may again be used to find those who would be interested based on the closeness of the taglist vector of the item and the taglist vectors of the users. Those who are likely to be interested in the change to the item may then be notified.
Moderators
As explained herein, the taglist database may be used to identify individuals with a particular interest, and in some instances the search for such persons may be for the purpose of starting an interest group devoted to that interest. A person who organizes and/or runs an interest group is commonly termed a moderator. A moderator may or may not be the person who has defined the subject area for the group, and a single group may have more than one moderator. Moderators often have the ability to remove people from their group and/or to act as a gatekeeper for those seeking entry into the group.
An interest group may have various rules or permissions settings. Typically such permissions are under the control of the moderator or moderators for the interest group, and may differ for moderators and other members. There may be settings to control the sending or receiving of messages, for example, so that only moderators can send messages to the entire group while other members may only send messages to other individual members. There may be rules determining whether someone desiring to join a group can do so automatically, possibly after registering, or whether joining requires the approval of a moderator. Rules or permissions may be set to obtain a desired effect; for example, if anyone can join a group automatically, and if moderators can send messages to the entire group, then anyone joining the interest group would be automatically signing up to receive any and all messages sent by the moderator or moderators of the interest group.
The moderators for a particular website or organization may also be given a special search visibility, such that a user may ask that moderators be the most highly ranked results, or even the only results, to a search. Such a visibility may allow a visitor to the website to search the taglist database for the moderator with the best interest fit, rather than requiring the user to search through all of the users having the combined interests of each interest group. For instance, a user might be able to more quickly identify a group oriented around an interest of the user by identifying the moderator than by searching for other users with the same interest. In another application, such a search visibility might allow visitors to a real estate website to locate the agent or agents that best match their needs. Similarly, visitors to a website with multiple blogs may enter a taglist to more quickly find those blogs that best match their interests by searching for the bloggers whose taglists most closely match the entered taglist.
A Central Database
If desired, a central server website (known as the Hello-Central website in a commercial embodiment) may be provided which allows interest groups to register with the central website, and allows users to log in and create taglist vectors to find those interest groups as they would find other users with similar interests. The taglist vectors created at the interest group client websites may be migrated to the central server website database, but with only the client website identification provided instead of the identification of the actual users from the client website.
If a user on the central website does an interest group search that matches taglist vectors from a client website, the user can be transferred to the client website where the interest group users may be available for contact or discussion.
Thus, in this approach traffic on the central website is transferred to the client websites if there are matching interest groups on the client websites. However, if no matches are located from a client website, then users on the central website may be permitted to create an interest group there without transferring to a client website.
The invention has been explained above with reference to several embodiments. Other embodiments will be apparent to those skilled in the art in light of this disclosure. The present invention may readily be implemented using configurations other than those described in the embodiments above, or in conjunction with systems other than the embodiments described above. For example, while the inclusion of documents and other types of files in the database tree structure has been described herein, it would be possible to use the present invention as a way of characterizing, storing and retrieving documents in a database that does not include user's designation of interest areas.
It should also be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, or a system. The methods described herein may be implemented by program instructions for instructing a processor to perform such methods, and such instructions recorded on a computer readable storage medium such as a hard disk drive, floppy disk, optical disc such as a compact disc (CD) or digital versatile disc (DVD), flash memory, etc., or a computer network wherein the program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of the methods described herein may be altered and still be within the scope of the invention.
These and other variations upon the embodiments are intended to be covered by the present invention, which is limited only by the appended claims.

Claims (14)

1. A method allowing users of an information system to identify other users having similar interests comprising:
storing keyword lists representing user interests in a hierarchical database containing a plurality of stored keyword lists, the database comprising nodes corresponding to the keywords of the stored keyword lists;
adding a new keyword list to the hierarchical database by inserting the new keyword list at a point of the new keyword list's closest match to an existing keyword list by:
comparing a first keyword of the new list to the nodes at a first level of the database to determine whether the first keyword is close enough to an existing node at the first level for the new list to be placed within the existing node;
if the first keyword is close enough to fall within an existing node at the first level, repeating the comparison for the next keyword to the nodes at a level under the existing node;
repeating the comparison for each subsequent keyword at each subsequent level until a keyword is not close enough to be placed within an existing node; and
when a keyword is not close enough to fall within an existing node at the level at which it is compared, creating a new node corresponding to the keyword under the last node for which a keyword was close enough to be placed within the node.
2. The method of claim 1, further comprising creating subsequent nodes below the newly created node, each subsequent node corresponding to any remaining keywords in the new keyword list and being under the node corresponding to the preceding keyword in the list.
3. The method of claim 1, wherein creating a new node further comprises providing identification of a user with whom the new keyword list is associated.
4. The method of claim 1 further wherein:
the keyword lists are weighted keyword lists each represented as a vector;
wherein said comparing comprises comparing the first element of the new keyword vector to the nodes at a first level of the database to determine whether the first element is close enough to an existing node at the first level for the new list to be placed within the existing node;
if the first element is close enough to fall within an existing node at the first level, repeating the comparison for the next element to the nodes at a level under the existing node;
repeating the comparison for each subsequent element at each subsequent level until an element of the new keyword vector is not close enough to be placed within an existing node; and
when an element of the new keyword vector is not close enough to fall within an existing node at the level at which it is compared, creating a new node corresponding to the element under the last node for which an element was close enough to be placed within the node.
5. The method of claim 4 wherein determining whether an element is close enough to an existing node for the new list to be placed within the existing node is determined by comparing the distance between the vector represented by the new list and the vector represented by the existing node.
6. The method of claim 5 wherein:
if the distance between the vector represented by the new list and the vector represented by the existing node is less than a specified closeness value the keyword is considered close enough to be placed within the existing node.
7. The method of claim 6 wherein the closeness value is a predetermined value.
8. The method of claim 6 wherein:
the closeness value for any child node is a predetermined function of the closeness value of the node's parent such that the closeness value is different for each level in the database; and
the closeness value of one level is predetermined and the values of the other levels are calculated using the predetermined function.
9. The method of claim 6 further comprising:
allowing users to enter one or more keyword lists of their own interests,
wherein users may enter a search keyword list indicating an area of interest in which the users wish to find other users with similarly close interests; and
comparing the search keyword list selected by the user with the other keyword lists stored in the database and returning a list of the closest matching users (e.g., users with the closest matching keyword lists) while at each level the system will select the most nearby neighborhoods to zero in on the closest matches.
10. The method of claim 4 wherein the weighted keywords are represented as elements of vectors and determining whether a weighted keyword matches an existing node is determined by comparing the distance between the vector represented by the search list and the vector represented by the existing nodes in the database.
11. The method of claim 10 wherein if the distance between the vector represented by the search list through a particular keyword and the vector represented by the existing node is less than a specified closeness value the existing node is considered to match the search list.
12. The method of claim 1, further comprising:
searching the hierarchical database for keyword lists matching a search keyword list by:
comparing the first keyword of the search list to the nodes at a first level of the database to determine whether any existing nodes match the first keyword;
if one or more nodes match the first keyword, repeating the comparison for the next keyword to the nodes at the level which is directly under the nodes matching the first keyword;
repeating the comparison for each subsequent keyword at each subsequent level until the nodes closest to the search list are located; and
returning a list of the located nodes.
13. The method of claim 12 wherein returning the list of located nodes further comprises returning the closest matching nodes in the order of their closeness to the search keyword list.
14. The method of claim 12 wherein returning the list of located nodes further comprises returning a predetermined number of the closest matching nodes.
US12/651,447 2009-01-08 2010-01-01 Interest-group discovery system Expired - Fee Related US8209338B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/651,447 US8209338B2 (en) 2009-01-08 2010-01-01 Interest-group discovery system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20455609P 2009-01-08 2009-01-08
US12/651,447 US8209338B2 (en) 2009-01-08 2010-01-01 Interest-group discovery system

Publications (2)

Publication Number Publication Date
US20100174724A1 US20100174724A1 (en) 2010-07-08
US8209338B2 true US8209338B2 (en) 2012-06-26

Family

ID=42312364

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/651,447 Expired - Fee Related US8209338B2 (en) 2009-01-08 2010-01-01 Interest-group discovery system

Country Status (1)

Country Link
US (1) US8209338B2 (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787430A (en) * 1994-06-30 1998-07-28 International Business Machines Corporation Variable length data sequence backtracking a trie structure
US20080071769A1 (en) * 2006-08-23 2008-03-20 Govindarajan Jagannathan Efficient Search Result Update Mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787430A (en) * 1994-06-30 1998-07-28 International Business Machines Corporation Variable length data sequence backtracking a trie structure
US20080071769A1 (en) * 2006-08-23 2008-03-20 Govindarajan Jagannathan Efficient Search Result Update Mechanism
US7979453B2 (en) * 2006-08-23 2011-07-12 Innovative Solutions, Inc. Efficient search result update mechanism

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9098311B2 (en) 2010-07-01 2015-08-04 Sap Se User interface element for data rating and validation
US8443003B2 (en) * 2011-08-10 2013-05-14 Business Objects Software Limited Content-based information aggregation
US9288650B2 (en) 2012-11-13 2016-03-15 Institute For Information Industry Method, device and recording media for searching target clients
US20170212948A1 (en) * 2016-01-21 2017-07-27 Fujitsu Limited Collecting and organizing online resources
US10902024B2 (en) * 2016-01-21 2021-01-26 Fujitsu Limited Collecting and organizing online resources

Also Published As

Publication number Publication date
US20100174724A1 (en) 2010-07-08

Similar Documents

Publication Publication Date Title
US8209338B2 (en) Interest-group discovery system
US11868409B2 (en) Social network searching with breadcrumbs
US10430425B2 (en) Generating suggested queries based on social graph information
US10331748B2 (en) Dynamically generating recommendations based on social graph information
US10728203B2 (en) Method and system for classifying a question
US8244848B1 (en) Integrated social network environment
US8572129B1 (en) Automatically generating nodes and edges in an integrated social graph
US9094472B2 (en) Web-based services for querying and matching likes and dislikes of individuals
US7860852B2 (en) Systems and apparatuses for seamless integration of user, contextual, and socially aware search utilizing layered approach
TWI401573B (en) Access to trusted user-generated content using social networks
US20100306249A1 (en) Social network systems and methods
US20100153832A1 (en) Collections of Linked Databases
US20100082653A1 (en) Event media search
US20080228947A1 (en) Collections of linked databases
US11080287B2 (en) Methods, systems and techniques for ranking blended content retrieved from multiple disparate content sources
CN104903886A (en) Structured search queries based on social-graph information
US11232522B2 (en) Methods, systems and techniques for blending online content from multiple disparate content sources including a personal content source or a semi-personal content source
CN107153687B (en) Indexing method for social network text data
US11216735B2 (en) Method and system for providing synthetic answers to a personal question
Chew et al. Understanding the everyday use of images on the web
CN109299368B (en) Method and system for intelligent and personalized recommendation of environmental information resources AI
Martoglia AMBIT: semantic engine foundations for knowledge management in context-dependent applications
US20170193113A1 (en) Indexing Auxiliary Domains
CN115525837A (en) Method and device for associating legal community with legal library
CN116186097A (en) Method, device, equipment and storage medium for searching data asset

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: PARLU, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WALLACE, DAVID ROBERT;KLAMKIN, MARILYNN;DONOVAN, KALI;SIGNING DATES FROM 20120222 TO 20120714;REEL/FRAME:028751/0928

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment
FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362