WO2017050991A1 - Aggregating profile information - Google Patents

Aggregating profile information

Info

Publication number
WO2017050991A1
WO2017050991A1 PCT/EP2016/072737 EP2016072737W WO2017050991A1 WO 2017050991 A1 WO2017050991 A1 WO 2017050991A1 EP 2016072737 W EP2016072737 W EP 2016072737W WO 2017050991 A1 WO2017050991 A1 WO 2017050991A1
Authority
WO
WIPO (PCT)
Prior art keywords
profiles
nodes
profile
features
matching
Prior art date
Application number
PCT/EP2016/072737
Other languages
French (fr)
Inventor
Razvan DINU
Tom SAVAGE
Alexandru George CAZACU
George IONITA
Mihai BOGDAN
Traian REBEDEA
Original Assignee
3Desk Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3Desk Ltd
Publication of WO2017050991A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/01 Automatic library building

Definitions

  • People searches are conducted billions of times per day by individuals and organizations in the public and private sector, for various reasons, including sales (e.g. identifying and qualifying influences in the buying process), marketing (targeting communications), recruitment (sourcing candidates), finance (e.g. credit checking) and monitoring political, social and environmental issues. "People data" also powers many products, such as recommendation engines in shopping and search sites.
  • the present disclosure provides techniques for gathering, normalizing, decorating and aggregating a large number of profiles (in embodiments tens of millions) in a fast and efficient manner. It uses a graph-based architecture and graph-based pattern matching in order to accurately match together the profiles of a given person gathered from multiple different sources on the web, then make the aggregated profiles available through any of a number of potential channels such as a web-based search engine, API, or plugin to another application.
  • a method of aggregating profile information comprising: from multiple websites, automatically gathering profiles of multiple people profiled on those websites via the Internet, including, for at least some of the people, gathering multiple profiles of the same person from different ones of the websites; identifying multiple features in each of the profiles, including determining a value of each of the features; normalizing the values of the features of each profile into a common format; forming a profile graph by representing each of the profiles as a corresponding node in the profile graph and, based on the normalization into said common format, connecting each of a plurality of pairs of the nodes with one or more edges, each edge representing a match between the values of one of the features found in both profiles represented by the pair of nodes; matching together different ones of said profiles into groups based on the edges between the corresponding nodes in the profile graph, each group estimated to be the profiles of a respective same one of the people; for each of the groups, aggregating at least some of the profiles of the group into an aggregate profile of the respective person
  • the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of other nodes in common within a predetermined number of hops in said profile graph.
  • the nodes in the pair may be connected by more than one edge, each edge representing a match between the values of a different respective one of a plurality of said features found in both profiles represented by the pair of nodes.
  • the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of edges between the nodes of the pair.
  • the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a frequency of occurrence within the profile graph of a value of one of the features represented by an edge between the corresponding nodes.
  • the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a measure of similarity between non-exactly matching values of a feature found in both the corresponding profiles.
  • said identifying may comprise an entity extraction phase which identifies which of the features occur in different ones of the profiles and represents each of those features as an entity in an entity graph, and which further identifies relationships between the features and represents the relationships in the entity graph; wherein the entity graph may be an input to the step of forming the profile graph.
  • the features may include at least one of: name, academic institution, skills, occupation, employer, company, interests, and/or place of residence.
  • the identification of the features is performed at least in part using natural language processing.
  • the method may further comprise a validation phase in which, for at least some of the groups, one or more of the profiles are eliminated from the group in
  • the further comparison may comprise determining whether the group contains profiles from the same website, and if so, the validation phase may eliminate one of the profiles from the same website.
  • the aggregated profiles may be made available through a searchable user interface.
  • when a search query is entered through said user interface searching for a value not yet represented in the profile graph, the search query may automatically trigger a gathering, via the Internet, of one or more further profiles from one or more websites based on the search query; and the method may further comprise updating the profile graph to include the one or more further profiles, and based thereon generating a new aggregate profile for a new person and/or an update to one or more of the existing aggregate profiles
  • the method may be performed by a first provider, and the making available of the aggregated profiles may comprise: making the aggregate profiles available to the public through a website run by the first provider.
  • the method may be performed by a first provider, and the making available of the aggregated profiles may comprise: providing the aggregate profiles to a plugin of a web browser or other internet-enabled application provided by a second provider, such that the second provider can make the aggregate profiles available to users of said application.
  • the method may be performed by a first operator, and the making available of the aggregated profiles may comprise: providing the aggregate profiles to an API of a computer system run by a second provider, so the second provider can make the aggregate profiles available to users of said computer system.
  • a server configured to perform the operations of any method disclosed herein.
  • a computer program product comprising code embodied on a computer-readable storage medium, and configured so as when run on one or more processors to perform operations of any method disclosed herein.
  • Figure 1 is a schematic block diagram of a computer network
  • Figure 2 is a flow chart showing a method of aggregating profile information
  • Figure 3 is a schematic illustration of a user interface
  • Figure 4 is a schematic representation of a graph-based matching process.
  • FIG. 1 gives an overview of a system arranged in accordance with embodiments of the present disclosure.
  • the system comprises: a server of an aggregator service 102, the servers of multiple websites 103, a plurality of user terminals 104, and optionally the server of a third party 105 providing another service other than the aggregator or websites.
  • Each of these components 102, 103, 104, 105 is coupled to the Internet 101 via any of a variety of wired and/or wireless technologies. It is by means of this arrangement that the various interactions described below occur.
  • a server herein refers to a logical entity which may comprise one or more server units at one or more geographical sites.
  • Each of the user terminals 104 may take any suitable form, such as smartphone, tablet, laptop, desktop computer or set-top box.
  • Each of the websites 103 is a social media site or the like, whereby multiple different users can post profile information about themselves, and/or by which users can post profile information about others, via the Internet 101 using various ones of the user terminals 104. Users can also view the profiles individually from the individual websites through their user terminals 104.
  • a given user often has multiple different profiles on each of multiple websites 103, and each profile may consist of a different selection of information about the user.
  • a professional networking site may contain different information for the same user than a social networking site.
  • someone would have to visit all the sites individually.
  • the aggregator 102 is arranged to "crawl" multiple different websites 103 via the Internet 101 in order to automatically gather together some or all of the different profiles of each of multiple individuals, and to aggregate the different profiles into an aggregate (combined) profile for each of these people.
  • the aggregator 102 then makes these aggregate profiles available to the user terminals 104 of other users via the Internet 101 (not necessarily the same user terminals 104 through which the profile information was originally submitted to the websites 103, though there may well be some overlap).
  • the aggregate profiles could be made available in a searchable fashion through a special proprietary searching website run by the aggregator 102.
  • the aggregate profiles could be made available through a third party system, product or service 105, by means of an API or plugin application configured to interface between the aggregator 102 and the third party's system, product or service 105.
  • the third party 105 can thus in turn make the aggregate profiles available in a searchable fashion to its own users.
  • the method begins at step 210 with a gathering phase.
  • the aggregator 102 gathers profile information for multiple people from multiple different sources on the web. It does this by a process called "scraping".
  • the aggregator 102 comprises a separate scraper module for each of the different websites 103 it is arranged to recognize, each scraper being configured to interact with and parse the content of a different respective one of the websites 103.
  • the scraper submits an HTTP request to the respective website it is designated to scrape, including an identification of a target person in the request (e.g. name, username, or email address).
  • the website 103 returns the relevant content for that person.
  • the scraper parses the returned content to recognize various features that may be present, and extract values of those features (e.g. if the feature in question is occupation, the value may be "programmer"; and if the feature is place of residence, the value may be "San Jose"; etc.).
  • the scraper is able to do this because it is pre-configured to know the predetermined format of the particular website 103 it is designed for, i.e. which fields or positions each of the different features appears in within the content returned from the website 103 in question. That is, the scraper uses pre-configured rules to know exactly where to look for a given type of information in a given page (e.g. the name is the "div" with the id "ftp-name").
  • the scraper may use natural language processing (NLP) to perform this task.
  • NLP may be used to recognize whether or not a given page is indeed a profile page of a person, and/or to identify features (such as name, occupation, etc.) on pages where such features do not necessarily appear in fixed, predetermined fields.
  • the NLP can be used to extract features or fields from free-form text.
  • a third option which the scraper may be configured to use is to use a dynamic approach based on one or more "extractors".
  • An extractor as referred to herein is a hybrid between a pre-configured rule and a full NLP-based approach, in that it begins with a predetermined rule about where to look within a webpage for a relevant field, but then uses NLP to determine the meaning of that field. For example, the rule may be to examine the H1 field of the page's HTML (this being the highest level of heading), or to examine the caption beneath the largest image on the page.
  • the H1 field is likely to contain relevant profile information, and therefore the scraper's extractor should look there.
  • different pages may use the H1 field for different purposes, and so from page to page it may include different types of profile information or sometimes no relevant profile information.
  • extractors represent general rules or heuristics for extracting information out of a page (e.g. take the text from the first H1 element of a page, take the images and the closest text to them, extract links that contain specific attributes, extract links with rich content next to them etc.).
  • a dynamic scraper additionally uses a set of extractors and learns which ones extract good information, i.e. it applies them and then validates the information.
  • external services can be used to validate this.
  • other services can be referenced in order to validate whether the extractor extracted a name (e.g. by looking up the name in an index) or if it extracted a location (using Google Maps API).
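The extractor idea described above (a fixed rule for where to look, followed by a validation step) can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class and function names are hypothetical, and the `KNOWN_NAMES` set is an assumed stand-in for an external validation service such as a name index.

```python
from html.parser import HTMLParser


class FirstH1Extractor(HTMLParser):
    """Collects the text of the first <h1> element in a page."""

    def __init__(self):
        super().__init__()
        self._inside = False  # currently inside the first <h1>
        self._done = False    # the first <h1> has already been closed
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    @property
    def text(self):
        # Collapse runs of whitespace left over from the markup.
        return " ".join("".join(self._parts).split())


def extract_first_h1(html):
    parser = FirstH1Extractor()
    parser.feed(html)
    return parser.text


# Hypothetical stand-in for an external validation service (e.g. a name index).
KNOWN_NAMES = {"dave example", "steve forinstance"}


def looks_like_name(value):
    return value.lower() in KNOWN_NAMES


def run_extractor(html):
    """Apply the extractor, then keep the value only if validation accepts it."""
    candidate = extract_first_h1(html)
    return candidate if candidate and looks_like_name(candidate) else None
```

A dynamic scraper could run several such extractors over a page and keep statistics on which ones yield values that pass validation.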
  • the scraper repeats the process for multiple different people whose profiles appear on the respective website 103 (in embodiments millions or even billions of people).
  • Each of the different scrapers also performs a similar process for the multiple people's profiles appearing on the respective website 103 it is responsible for scraping.
  • the aggregator 102 is able to build up a large database of profile information, including multiple profiles of any given user if that user has profiles on different sites 103.
  • the aggregator 102 stores the profile information extracted from the various different sites 103 in a common format of the aggregator, i.e. converts the data into a common schema, so that it all looks the same regardless of which website 103 it was derived from. This may be referred to herein as normalizing the data. That is, each of the websites 103 publishes its data in its own different respective format, with its own fields in certain places (or even no fixed fields at all). The aggregator 102 then effectively performs a mapping exercise, such that a certain field of the website is mapped to a certain field of the common schema (or a certain feature extracted using NLP is mapped to a certain field of the common schema).
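The mapping exercise described above might look like the following sketch. The site names, field names and the `FIELD_MAPS` table are all hypothetical, and lower-casing plus whitespace collapsing is only one assumed choice of value normalization.

```python
# Hypothetical per-site mappings: site-specific field name -> common schema field.
FIELD_MAPS = {
    "site_a": {"full_name": "name", "job": "occupation", "city": "residence"},
    "site_b": {"displayName": "name", "headline": "occupation", "location": "residence"},
}


def normalize(site, raw_profile):
    """Convert one scraped profile into the common schema."""
    mapping = FIELD_MAPS[site]
    normalized = {"source": site}
    for site_field, value in raw_profile.items():
        common_field = mapping.get(site_field)
        if common_field is None:
            continue  # this field has no counterpart in the common schema
        # Assumed value normalization: collapse whitespace and ignore case.
        normalized[common_field] = " ".join(str(value).split()).casefold()
    return normalized
```

After this step, values scraped from different sites can be compared directly, which is what the later graph construction relies on.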
  • the process then proceeds to the next phase 215, which may be referred to herein as the entity extraction phase.
  • in the entity extraction phase, the aggregator 102 identifies entities that may repeatedly occur, e.g. an entity could be a name of a given user, a company, or a university, or a job title, etc.
  • the purpose of this phase is so that when the aggregator encounters an entity again, e.g. a given company, it recognizes it as another instance of the same company.
  • the aggregator determines a relationship between the different entities, e.g. the relationship between user and company may be "works for", or the relationship between user and university may be "studied at".
  • every normalized profile that comes from the scrapers is used to create one or more entities and relationships between them.
  • the union of the nodes and relationships for all profiles forms what may be referred to herein as "the entity graph".
  • the nodes in this graph can be anything (e.g. a profile, a company, a website, a location, a role, a skill).
  • if the entity extraction phase 215 determines that the person identified by a profile P from a certain website works for company C, then a node is created for each of P and C and a relationship between them is created with the type "works_for". If a profile P links to a website W then two nodes are created and also a relationship between them with the type "links to", etc. Relationships are directed but can be traversed both ways.
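A small data structure for such an entity graph, with typed, directed relationships that can nonetheless be traversed in either direction, could be sketched as below. The class name and method names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class EntityGraph:
    """Nodes of any type joined by typed, directed relationships."""

    def __init__(self):
        self.nodes = {}                 # node id -> node type
        self._out = defaultdict(set)    # source -> {(relationship type, target)}
        self._in = defaultdict(set)     # target -> {(relationship type, source)}

    def add_node(self, node_id, node_type):
        # Re-adding an existing entity keeps the original node.
        self.nodes.setdefault(node_id, node_type)

    def relate(self, source, rel_type, target):
        # Index the relationship in both directions so it can be traversed both ways.
        self._out[source].add((rel_type, target))
        self._in[target].add((rel_type, source))

    def targets(self, source, rel_type):
        return {t for r, t in self._out.get(source, ()) if r == rel_type}

    def sources(self, target, rel_type):
        return {s for r, s in self._in.get(target, ()) if r == rel_type}
```

For example, after relating two profiles to the same company with "works_for", `sources(company, "works_for")` returns both profiles, which is the kind of lookup the clustering phase needs.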
  • the next phase 220 is referred to herein as the clustering phase.
  • the aggregator 102 creates a "profile graph" representing the gathered data on the various different people from the various different websites 103.
  • the clustering phase 220 constructs the profile graph using the entity graph as its input.
  • the profile graph 400 comprises two types of element: nodes 401 and edges 402.
  • Each node 401 represents a given profile from a given website 103 (so the different profiles from the different websites 103 each have their own respective node 401, including that the different profiles of the same person each have their own respective node 401).
  • the edges 402 represent connections between profiles. In embodiments, there could simply be either one or zero edges between any given pair of nodes 401: i.e. they are either connected or not (e.g. based on some overall test of whether they correspond to the same person). However, preferably, in embodiments multiple edges can be allocated between any given pair of nodes 401. E.g. in the example of Figure 4, the nodes 401a and 401b are connected by three distinct edges 402i, 402ii, 402iii.
  • each edge 402 represents a match for a given feature. I.e. if both of a pair of nodes 401 represent profiles for which a certain feature is present (note that not all nodes necessarily have the same feature set), and if the values of that feature match, then a respective edge 402 is created between the two nodes 401. For instance, if the feature in question is name, and if the two profiles both include a name and the two values are both "Dave Example" then an edge 402i is created between the respective nodes 401a, 401b, with this edge 402i representing the feature of name. But if the values are instead, say, "Dave Example" and "Steve Forinstance", then no edge is added.
  • Another edge 402ii is added between the same pair of nodes 401a, 401b, and so forth.
  • Other examples of features that could be used to create respective edges include: school, university, skills, hobbies, company, home town, current town of residence, country of residence, citizenship, address, email address, etc.
  • Some such items of information could also be broken down into separate features, e.g. the name could be broken down into given name and family name, or the address could be broken down into two or more of street, town and postcode, etc.
  • an edge 402 is created only for an exact match between the values of the feature in question. Alternatively however, it is not excluded that edges could be created based on an inexact match. E.g. metrics are known for measuring the similarity of two strings, and/or the aggregator 102 could be configured to recognize certain predetermined variants of a value (such as that Jim is another form of the name James).
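The construction of a profile graph with one edge per exactly matching feature can be sketched as follows. This is an illustrative sketch, assuming profiles have already been normalized into dictionaries keyed by feature name; the function name, the default feature list and the bucketing strategy are assumptions rather than details taken from the disclosure.

```python
from collections import defaultdict
from itertools import combinations


def build_profile_graph(profiles, features=("name", "employer", "residence")):
    """Return {(id_a, id_b): {features whose values match exactly}}.

    `profiles` maps a profile id to its normalized feature dictionary.
    Each feature in the returned set corresponds to one edge between the pair.
    """
    edges = defaultdict(set)
    for feature in features:
        # Bucket profile ids by value so that only profiles sharing a value are paired.
        buckets = defaultdict(list)
        for pid, profile in profiles.items():
            value = profile.get(feature)
            if value is not None:
                buckets[value].append(pid)
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                edges[(a, b)].add(feature)
    return dict(edges)
```

The number of edges between a pair is then simply `len(graph[(a, b)])`, which is one of the matching signals discussed later. Bucketing by value avoids comparing every pair of profiles, although very common values still produce large buckets.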
  • the aggregator 102 proceeds to the next phase of the process, which is the profile matching phase 230.
  • This phase looks for patterns in the profile graph 400 that indicate whether different profiles appear to belong to the same person (within some acceptable likelihood). As a simple example, one could guess that if two profiles have the same name and the same employer (so two edges 402 for two particular features), there is a 99% chance they are for the same person.
  • the profile matching phase 230 works based on any one or more of a variety of heuristics that may be evaluated based on the profile graph 400, and in embodiments based on a combination of such heuristics.
  • a hop is wherever nodes 401 are connected by at least one edge 402
  • this heuristic may be evaluated on a yes/no basis, such that it is true for a given pair of nodes 401 if the nodes are within a predetermined number of hops, e.g. they are adjacent neighbours (one hop) or within a path of two hops, but false otherwise.
  • such an heuristic may measure a number of common neighbours within a predetermined number of hops (i.e. how many such neighbours exist).
  • the heuristic may be true if two nodes 401 share above a threshold number of neighbours in common within the predetermined number of hops, and false otherwise; and/or the number of neighbours within the hop limit could be used as a measure of the likelihood of a match as a matter of degree
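The common-neighbour heuristic described above can be sketched with a bounded breadth-first search. The adjacency representation (a dictionary from node to its set of neighbours), the function names and the default threshold are assumptions made for illustration.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Nodes reachable from `start` in at most `max_hops` hops (excluding start)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return set(distance)


def common_neighbours(adjacency, a, b, max_hops=1):
    shared = within_hops(adjacency, a, max_hops) & within_hops(adjacency, b, max_hops)
    return shared - {a, b}


def neighbours_match(adjacency, a, b, max_hops=1, threshold=2):
    """Yes/no form: true if the pair shares more than `threshold` neighbours."""
    return len(common_neighbours(adjacency, a, b, max_hops)) > threshold
```

The size of the set returned by `common_neighbours` can also be used directly as a graded likelihood signal rather than a yes/no test.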
  • the above types of heuristic can be used alone. However, such heuristics only determine whether nodes are connected or not, or to what degree they are connected, e.g. whether they are connected within some degree of separation, or whether they share a certain number of neighbours.
  • the profile matching 230 may alternatively or additionally be based on one or more other heuristics that take into account the nature of the connections between nodes (i.e. one or more heuristics that make use of the fact that, in embodiments of the present disclosure, the edges are characterized as representing certain specific features).
  • An example of this is the number of edges between two nodes 401.
  • Such an heuristic may be evaluated on a yes/no basis, such that it is true if the nodes share more than a threshold number of edges, e.g. two or three, and false otherwise. And/or, the number of edges may be taken as a measure of the likelihood that two nodes represent profiles of the same person, as a matter of degree.
  • Another example is the rarity of the matching value, i.e. its frequency of occurrence within the profiles represented by the graph 400 (how many times does it occur statistically, e.g. as a proportion of the number of instances of edges representing the feature in question in the graph 400).
  • a rare value of a given feature can be a strong indication of a link.
  • finding two profiles for the name "Dave Smith" does not give much confidence of a match, but finding two profiles for the name "Ezekial Q Nithercott" has a very low probability and therefore is a much stronger indication that they are likely to be for the same person.
  • This heuristic could be evaluated on a yes/no basis, such that it is true if the frequency
  • the frequency could be used as a measure of the likelihood of a match as a matter of degree.
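The rarity heuristic can be sketched by counting how often each value of a feature occurs. Note the assumption here: frequency is measured as a proportion of profiles carrying the feature, which is only one possible statistic (the text also mentions a proportion of edges). The function names and the 1% default cutoff are hypothetical.

```python
from collections import Counter


def value_frequency(profiles, feature):
    """Proportion of profiles (among those with `feature`) carrying each value."""
    values = [p[feature] for p in profiles if feature in p]
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def is_rare(frequencies, value, max_frequency=0.01):
    """Yes/no form: the value occurs, but no more often than `max_frequency`."""
    return value in frequencies and frequencies[value] <= max_frequency


def rarity_score(frequencies, value):
    """Graded form: closer to 1.0 for rarer values, 0.0 for unseen values."""
    return 1.0 - frequencies[value] if value in frequencies else 0.0
```

A match on a value with a high rarity score can then be weighted more heavily than a match on a common value.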
  • Yet another possible type of heuristic is a similarity between the values of a given feature. For instance, one or more metrics may be used that measure the similarity between two strings. Again this could be evaluated on a yes/no basis, so that it is true if the similarity is above a threshold and false otherwise; and/or, the similarity may be used as a measure of the likelihood of a match as a matter of degree.
  • the similarity is not considered to be an heuristic which qualifies an edge 402 per se.
  • the test could be that on condition that two nodes 401 are connected by, say at least one edge 402, or at least two edges, the matching phase 230 then probes the profiles represented by those nodes further to look for features that are similar.
  • if edges 402 in the profile graph 400 are added in the clustering phase 220 to represent close but inexact matches, the measure of similarity may indeed be considered as a property of the edge 402.
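A similarity test between non-exactly matching values can be sketched with Python's standard `difflib.SequenceMatcher`. The disclosure does not name a specific metric, so this choice, the `VARIANTS` table and the 0.85 threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical table of predetermined variants of a value.
VARIANTS = {"jim": "james"}


def canonical(value):
    """Lower-case the value and replace known variants word by word."""
    words = value.casefold().split()
    return " ".join(VARIANTS.get(word, word) for word in words)


def similarity(a, b):
    """Graded form: a ratio in [0, 1], where 1.0 means identical after canonicalization."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()


def similar(a, b, threshold=0.85):
    """Yes/no form: true if the similarity is above the threshold."""
    return similarity(a, b) > threshold
```

With this sketch, "Jim Example" and "James Example" compare as identical, while a transposition typo still scores high enough to pass a moderate threshold.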
  • a plurality of any two, more or all of the above metrics, and/or others, are combined in the decision making process in order to decide whether profiles are to be matched.
  • the output of the matching phase 230 is a second graph.
  • each node represents a given profile from a given website 103.
  • each pair of nodes is connected by only one or zero edges: matched or not matched.
  • a validation phase 240 is then applied to remove some of the connections (edges) from this second graph.
  • one or more further heuristics are applied to eliminate edges that represent unlikely matches.
  • this phase 240 at least includes eliminating edges between nodes 401 that represent profiles from the same website 103, because it is unlikely that the same person has two different profiles on the same site 103.
  • Another example is to break up overly large bunches of nodes 401 that are still connected, on the basis that a given person is unlikely to have more than a certain number of profiles (e.g. while he or she may have a lot of profiles, numbers in the hundreds start to become unlikely).
  • the reason for the matching phase 230 and separate validation phase 240 is that it has been found to produce better results to find as many potential connections in the graph 400 as possible based on the heuristics used in the matching phase 230, then eliminate some in the validation phase 240; rather than be overly selective at the matching phase 230 and potentially miss some connections that might prove useful.
  • the result of the matching phase 230 and optional validation phase 240 will be a set of discrete sub-clusters, or groups, each representing a different respective person.
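The step from a matched graph to discrete groups can be sketched with a union-find structure, applying the same-website validation rule before merging. This is a simplified illustration: the function name and input shapes are assumptions, and only the same-site rule is shown (breaking up overly large groups is not).

```python
def group_profiles(matches, site_of):
    """Group profile ids into connected components of validated matches.

    `matches` is an iterable of (id_a, id_b) pairs from the matching phase;
    `site_of` maps every profile id to the website it came from.
    """
    parent = {pid: pid for pid in site_of}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        if site_of[a] == site_of[b]:
            continue  # validation: drop matches between profiles from the same site
        parent[find(a)] = find(b)

    groups = {}
    for pid in site_of:
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(groups.values(), key=sorted)
```

Note that two profiles from the same site can still end up in one group through a shared third profile, so a fuller validation step would also need to inspect each finished group.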
  • the aggregator 102 proceeds to the identity building phase 250. Here it aggregates the profiles of each sub-cluster into a respective aggregate profile for the respective person, which it publishes via the Internet 101.
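Aggregating the profiles of one group into a single record might be sketched as below. The merge policy (keep every distinct value per feature and list the contributing sites) is an assumption; the disclosure does not specify how conflicting values are combined.

```python
def aggregate(group):
    """Merge normalized profiles (dicts with a 'source' key) into one record."""
    merged = {"sources": sorted({profile["source"] for profile in group})}
    for profile in group:
        for feature, value in profile.items():
            if feature == "source":
                continue
            values = merged.setdefault(feature, [])
            if value not in values:
                values.append(value)  # keep each distinct value once
    return merged
```

Keeping the list of sources alongside the merged features supports a results view that shows which websites an aggregate profile was compiled from.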
  • the aggregator 102 may host its own website which users can access via their user devices 104 in order to search for people from amongst the aggregated profiles.
  • An example is illustrated in Figure 3, showing an example front-end user interface 300 of such a site.
  • the user interface 300 comprises a search bar 301 in which a user can enter a search query, such as the name of a person or the name of a company.
  • the user searches for the company name "SuperTechCo", to try to find people associated with this company. This brings up a list of results, each corresponding to a different person.
  • the list may show a profile picture 302 included in the aggregate profile for the respective person, and/or the values of one, some or all of the features 303 in the aggregate profile (e.g. name, company, interests, etc.).
  • the list may also include a set of icons or logos 304, one for each of the websites 103 from which the aggregate profile has been compiled.
  • the user may select (e.g. click or touch) one of the results in the list to summon up the complete aggregate profile 305.
  • the aggregator 102 may publish the aggregate profiles via other means, such as by making the aggregate profiles available to a plugin application or API of a third party 105 (being a different provider of a different product or service than the provider of the aggregation service 102, e.g. a different party, company, organization or legal entity).
  • the aggregator 102 may provide or endorse a plugin application which plugs in to a web browser or to another internet-enabled application such as an email client or instant messaging (IM) client, enabling users of that application to access the people search functionality through that application via the plugin.
  • the aggregator 102 may provide or endorse an API (application programming interface) which a third party 105 can integrate into their own computer system in order to access the people search through that system.
  • An example application of this would be to incorporate the API into the internal computer system of a recruitment company in order to allow recruiters to collect information on job applicants or potential applicants.
  • the API may allow the aggregate profile information to be accessed in an automated fashion based on a database of names or other search criteria stored in the third party's system 105, e.g. so a recruiter can automatically update information on a large database of potential applicants they may wish to contact about new job openings.

Abstract

A method of aggregating profile information, comprising: automatically gathering peoples' profiles from multiple websites; identifying multiple features in each profile; normalizing the values of the features into a common format; and forming a profile graph by representing each of the profiles as a corresponding node in the profile graph, and connecting each of a plurality of pairs of the nodes with one or more edges representing matches between the values of features found in both profiles. The method further comprises: matching together different ones of the profiles into groups based on the edges between the corresponding nodes in the profile graph, each group estimated to be the profiles of a respective same one of the people; for each of the groups, aggregating at least some of the profiles of the group into an aggregate profile of the respective person; and making the aggregate profiles available via the Internet.

Description

Aggregating Profile Information
Background
Over 30% of all online searches (i.e. 3 billion per day) are searches for people, yet there is no efficient way to comprehensively search someone's online footprint. To harness a person's complete profile one would have to manually go from site to site visiting that person's profile on each different site containing information about them. There are currently 21 billion profiles on the web, and within two years the number is expected to reach 50 billion. As the amount of data grows exponentially, the problem is expected to get worse.
People searches are conducted billions of times per day by individuals and organizations in the public and private sector, for various reasons, including sales (e.g. identifying and qualifying influences in the buying process), marketing (targeting communications), recruitment (sourcing candidates), finance (e.g. credit checking) and monitoring political, social and environmental issues. "People data" also powers many products, such as recommendation engines in shopping and search sites.
A number of companies have attempted to build a people search solution, yet no one to date has managed to create a significant, accurate data set. The result is an incomplete user experience, and insufficient data to power the most valuable use cases. Either decisions are made based on limited information, or time and cost are incurred through having to search supplementary data by hand.
Summary
The present disclosure provides techniques for gathering, normalizing, decorating and aggregating a large number of profiles (in embodiments tens of millions) in a fast and efficient manner. It uses a graph-based architecture and graph-based pattern matching in order to accurately match together the profiles of a given person gathered from multiple different sources on the web, and then makes the aggregated profiles available through any of a number of potential channels such as a web-based search engine, API, or plugin to another application.
According to one aspect of the present disclosure, there is provided a method of aggregating profile information, comprising: from multiple websites, automatically gathering profiles of multiple people profiled on those websites via the Internet, including, for at least some of the people, gathering multiple profiles of the same person from different ones of the websites; identifying multiple features in each of the profiles, including determining a value of each of the features; normalizing the values of the features of each profile into a common format; forming a profile graph by representing each of the profiles as a corresponding node in the profile graph and, based on the normalization into said common format, connecting each of a plurality of pairs of the nodes with one or more edges, each edge representing a match between the values of one of the features found in both profiles represented by the pair of nodes; matching together different ones of said profiles into groups based on the edges between the corresponding nodes in the profile graph, each group estimated to be the profiles of a respective same one of the people; for each of the groups, aggregating at least some of the profiles of the group into an aggregate profile of the respective person; and making the aggregate profiles available via the Internet.
In embodiments, the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of other nodes in common within a predetermined number of hops in said profile graph.
In embodiments, for at least some of said plurality of pairs of nodes, the nodes in the pair may be connected by more than one edge, each edge representing a match between the values of a different respective one of a plurality of said features found in both profiles represented by the pair of nodes.
In embodiments, the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of edges between the nodes of the pair.
In embodiments, the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a frequency of occurrence within the profile graph of a value of one of the features represented by an edge between the corresponding nodes.
In embodiments, the matching of the profiles may comprise: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a measure of similarity between non-exactly matching values of a feature found in both the corresponding profiles.
In embodiments, said identifying may comprise an entity extraction phase which identifies which of the features occur in different ones of the profiles and represents each of those features as an entity in an entity graph, and which further identifies relationships between the features and represents the relationships in the entity graph; wherein the entity graph may be an input to the step of forming the profile graph.
In embodiments, the features may include at least one of: name, academic institution, skills, occupation, employer, company, interests, and/or place of residence.
In embodiments, the identification of the features is performed at least in part using natural language processing.
In embodiments, the method may further comprise a validation phase in which, for at least some of the groups, one or more of the profiles are eliminated from the group in dependence on a further comparison between the profiles in that group; and said aggregation may aggregate only the profiles remaining after said elimination.
In embodiments, the further comparison may comprise determining whether the group contains profiles from the same website, and if so, the validation phase may eliminate one of the profiles from the same website.
In embodiments, the aggregated profiles may be made available through a searchable user interface.
In embodiments, when a search query is entered through said user interface searching for a value not yet represented in the profile graph, the search query may automatically trigger a gathering, via the Internet, of one or more further profiles from one or more websites based on the search query; and the method may further comprise updating the profile graph to include the one or more further profiles, and based thereon generating a new aggregate profile for a new person and/or an update to one or more of the existing aggregate profiles.
In embodiments, the method may be performed by a first provider, and the making available of the aggregated profiles may comprise: making the aggregate profiles available to the public through a website run by the first provider.
In embodiments, the method may be performed by a first provider, and the making available of the aggregated profiles may comprise: providing the aggregate profiles to a plugin of a web browser or other internet-enabled application provided by a second provider, such that the second provider can make the aggregate profiles available to users of said application.
In embodiments, the method may be performed by a first operator, and the making available of the aggregated profiles may comprise: providing the aggregate profiles to an API of a computer system run by a second provider, so the second provider can make the aggregate profiles available to users of said computer system.
According to another aspect disclosed herein, there is provided a server configured to perform the operations of any method disclosed herein.
According to another aspect of the present disclosure, there is provided a computer program product comprising code embodied on a computer-readable storage medium, and configured so as when run on one or more processors to perform operations of any method disclosed herein.
Brief Description of the Drawings
To assist understanding of the present disclosure and to show how it may be put into effect, reference is made by way of example to the accompanying drawings in which:
Figure 1 is a schematic block diagram of a computer network,
Figure 2 is a flow chart showing a method of aggregating profile information,
Figure 3 is a schematic illustration of a user interface, and
Figure 4 is a schematic representation of a graph-based matching process.
Detailed Description of Preferred Embodiments
Figure 1 gives an overview of a system arranged in accordance with embodiments of the present disclosure. The system comprises: a server of an aggregator service 102, the servers of multiple websites 103, a plurality of user terminals 104, and optionally the server of a third party 105 providing another service other than the aggregator or websites. Each of these components 102, 103, 104, 105 is coupled to the Internet 101 via any of a variety of wired and/or wireless technologies. It is by means of this arrangement that the various interactions described below occur. Note that a server herein refers to a logical entity which may comprise one or more server units at one or more geographical sites. Each of the user terminals 104 may take any suitable form, such as smartphone, tablet, laptop, desktop computer or set-top box.
Each of the websites 103 is a social media site or the like, whereby multiple different users can post profile information about themselves, and/or by which users can post profile information about others, via the Internet 101 using various ones of the user terminals 104. Users can also view the profiles individually from the individual websites through their user terminals 104. A given user often has multiple different profiles on each of multiple websites 103, and each profile may consist of a different selection of information about the user. E.g. a professional networking site may contain different information for the same user than a social networking site. Hence to obtain all the profile information on a given person, someone would have to visit all the sites individually. Nowadays that can be a lot of sites, making this a laborious task. Also, it may not be easy to find all the different sources of profile information.
To address this, the aggregator 102 is arranged to "crawl" multiple different websites 103 via the Internet 101 in order to automatically gather together some or all of the different profiles of each of multiple individuals, and to aggregate the different profiles into an aggregate (combined) profile for each of these people. The aggregator 102 then makes these aggregate profiles available to the user terminals 104 of other users via the Internet 101 (not necessarily the same user terminals 104 through which the profile information was originally submitted to the websites 103, though there may well be some overlap). The aggregate profiles could be made available in a searchable fashion through a special proprietary searching website run by the aggregator 102. Alternatively or additionally, the aggregate profiles could be made available through a third party system, product or service 105, by means of an API or plugin application configured to interface between the aggregator 102 and the third party's system, product or service 105. The third party 105 can thus in turn make the aggregate profiles available in a searchable fashion to its own users.
The process performed by the aggregator 102 is now described in more detail with reference to Figure 2.
The method begins at step 210 with a gathering phase. Here, the aggregator 102 gathers profile information for multiple people from multiple different sources on the web. It does this by a process called "scraping". In embodiments, the aggregator 102 comprises a separate scraper module for each of the different websites 103 it is arranged to recognize, each scraper being configured to interact with and parse the content of a different respective one of the websites 103. The scraper submits an HTTP request to the respective website it is designated to scrape, including an identification of a target person in the request (e.g. name, username, or email address). In response the website 103 returns the relevant content for that person. The scraper then parses the returned content to recognize various features that may be present, and extracts values of those features (e.g. if the feature in question is occupation, the value may be "programmer"; and if the feature is place of residence, the value may be "San Jose"; etc.). In embodiments, the scraper is able to do this because it is pre-configured to know the predetermined format of the particular website 103 it is designed for, i.e. which fields or positions each of the different features appears in within the content returned from the website 103 in question. That is, the scraper uses pre-configured rules to know exactly where to look for a given type of information in a given page (e.g. the name is the "div" with the id "ftp-name"). Alternatively or additionally, in embodiments, the scraper may use natural language processing (NLP) to perform this task. For example NLP may be used to recognize whether or not a given page is indeed a profile page of a person, and/or to identify features (such as name, occupation, etc.) on pages where such features do not necessarily appear in fixed, predetermined fields. Thus the NLP can be used to extract features or fields from free-form text.
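By way of illustration only, the pre-configured rule approach described above might be sketched as follows in Python. The element ids ("profile-name", "job"), the rule table and the function name scrape are assumptions made for this example and are not taken from any particular website; the HTTP request itself is omitted, and the sketch starts from content that has already been returned.

```python
from html.parser import HTMLParser


class IdTextExtractor(HTMLParser):
    """Collects the text inside elements whose id appears in a rule table."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules      # element id -> feature name (assumed rules)
        self.values = {}        # feature name -> raw extracted text
        self._feature = None    # feature currently being captured, if any
        self._tag = None        # tag name of the element being captured
        self._depth = 0         # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self._feature is None:
            feature = self.rules.get(dict(attrs).get("id"))
            if feature is not None:
                self._feature, self._tag, self._depth = feature, tag, 1
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._feature is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._feature = None

    def handle_data(self, data):
        if self._feature is not None:
            previous = self.values.get(self._feature, "")
            self.values[self._feature] = previous + data


def scrape(html, rules):
    """Apply site-specific rules to returned page content."""
    parser = IdTextExtractor(rules)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so values are comparable later on.
    return {f: " ".join(v.split()) for f, v in parser.values.items()}
```

In this sketch each website would have its own rule table, so that the same parsing code can serve several scraper modules.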
A third option which the scraper may be configured to use (again as an alternative or in addition to either or both of the pre-configured and/or NLP based approaches discussed above), is to use a dynamic approach based on one or more "extractors". An extractor as referred to herein is a hybrid between a pre-configured rule and a full NLP based approach, in that it begins with a predetermined rule about where to look within a webpage for a relevant field, but then uses NLP to determine the meaning of that field. For example, the rule may be to examine the H1 field of the page's HTML (this being the highest level of heading), or to examine the caption beneath the largest image on the page. For instance, it may be taken as a predetermined rule that the H1 field is likely to contain relevant profile information, and therefore the scraper's extractor should look there. However, different pages may use the H1 field for different purposes, and so from page-to-page it may include different types of profile information or sometimes no relevant profile information.
Therefore in addition to the pre-determined rule to examine the H1 field, the extractor applies NLP to that field in order to try to determine its meaning, i.e. what type of information it represents - e.g. does it comprise the name of the site, the name of the user whose profile the page represents, etc. Thus extractors represent general rules or heuristics for extracting information out of a page (e.g. take the text from the first H1 element of a page, take the images and the closest text to them, extract links that contain specific attributes, extract links with rich content next to them etc.). Further, a dynamic scraper additionally uses a set of extractors and learns which ones extract good information, i.e. applies them and then validates the information. In the validation phase (to be discussed in more detail shortly) external services can be used to validate this. E.g. for an extractor, other services can be referenced in order to validate whether the extractor extracted a name (e.g. by looking up the name in an index) or if it extracted a location (using Google Maps API). Once the information is validated, it is included in the profile. The learning can also be done by applying machine learning techniques to a set of positive and negative examples.
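A minimal sketch of one such extractor is given below, assuming Python. The rule is "take the text of the first H1 element"; the NLP and validation steps are replaced here by a deliberately simple lookup in a small, hypothetical index of given names (KNOWN_GIVEN_NAMES), which merely stands in for the name index or external service mentioned above.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for an external name index used for validation.
KNOWN_GIVEN_NAMES = {"dave", "steve"}


class FirstH1(HTMLParser):
    """Captures the text of the first H1 element on a page."""

    def __init__(self):
        super().__init__()
        self.text = None
        self._inside = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.text is None:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self.text = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)


def first_h1(html):
    parser = FirstH1()
    parser.feed(html)
    parser.close()
    return parser.text


def classify(text):
    """Placeholder for the NLP step: decide what the field represents."""
    if not text:
        return None
    first_word = text.split()[0].lower()
    return "name" if first_word in KNOWN_GIVEN_NAMES else None


def run_extractor(html):
    """Apply the rule, then keep the value only if it validates."""
    text = first_h1(html)
    feature = classify(text)
    return {feature: text} if feature else {}
```

A dynamic scraper in the sense described above would hold several such extractors and keep track of which ones yield values that pass validation.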
By whatever means the scraping is implemented, the scraper repeats the process for multiple different people whose profiles appear on the respective website 103 (in embodiments millions or even billions of people). Each of the different scrapers also performs a similar process for the multiple people's profiles appearing on the respective website 103 it is responsible for scraping. Thus the aggregator 102 is able to build up a large database of profile information, including multiple profiles of any given user if that user has profiles on different sites 103.
As part of parsing the profiles of the different websites 103, the aggregator 102 stores the profile information extracted from the various different sites 103 in a common format of the aggregator, i.e. converts the data into a common schema, so that it all looks the same regardless of which website 103 it was derived from. This may be referred to herein as normalizing the data. That is, each of the websites 103 publishes its data in its own different respective format, with its own fields in certain places (or even no fixed fields at all). The aggregator 102 then effectively performs a mapping exercise, such that a certain field of the website is mapped to a certain field of the common schema (or a certain feature extracted using NLP is mapped to a certain field of the common schema). Once normalized in this manner, the profiles extracted from different websites 103 are then ready to be understood in relation to one another, i.e. to be processed together as part of a common process.
Having gathered and normalized at least an initial set of data, the process then proceeds to the next phase 215, which may be referred to herein as the entity extraction phase. Here the aggregator 102 identifies entities that may repeatedly occur - e.g. an entity could be a name of a given user, a company, or a university, or a job title, etc. The purpose of this phase is so that when the aggregator encounters an entity again, e.g. a given company, it recognizes it as another instance of the same company. In the entity extraction phase, the aggregator also determines a relationship between the different entities, e.g. the relationship between user and company may be "works for", or the relationship between user and university may be "studied at". Thus, in the entity extraction phase 215 every normalized profile that comes from the scrapers is used to create one or more entities and relationships between them.
The union of the nodes and relationships for all profiles forms what may be referred to herein as "the entity graph". The nodes in this graph can be anything (e.g. a profile, a company, a website, a location, a role, a skill). If for example from a profile the entity extraction phase 215 determines that the person identified by a profile P from a certain website works for company C, then a node is created for both P and C and a relationship between them is created with the type "works_for". If a profile P links to a website W then two nodes are created and also a relationship between them with the type "links to", etc. Relationships are directed but can be traversed both ways.
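The normalization and entity extraction steps described above can be sketched together as follows, in Python. The site names, the per-site field mappings (FIELD_MAPS), the common schema fields and the relationship type "studied_at" are all assumptions chosen for illustration; only "works_for" is named in the text.

```python
from collections import namedtuple

# Hypothetical per-site field mappings onto an assumed common schema.
FIELD_MAPS = {
    "siteA": {"full_name": "name", "company": "employer"},
    "siteB": {"displayName": "name", "works_at": "employer",
              "school": "university"},
}

# Assumed rules: schema field -> (relationship type, entity kind).
ENTITY_RULES = {
    "employer": ("works_for", "company"),
    "university": ("studied_at", "university"),
}

# Directed relationship; it can still be looked up from either end.
Relationship = namedtuple("Relationship", ["source", "type", "target"])


def normalize(site, profile_id, raw):
    """Map one site's raw fields onto the common schema."""
    profile = {"id": profile_id, "site": site}
    for site_field, schema_field in FIELD_MAPS[site].items():
        value = raw.get(site_field)
        if value:
            profile[schema_field] = " ".join(value.split())
    return profile


def build_entity_graph(profiles):
    """Create nodes and typed relationships from normalized profiles."""
    nodes, relationships = set(), set()
    for profile in profiles:
        profile_node = ("profile", profile["id"])
        nodes.add(profile_node)
        for field, (rel_type, kind) in ENTITY_RULES.items():
            if field in profile:
                entity_node = (kind, profile[field])
                # A set ensures a recurring entity maps to a single node.
                nodes.add(entity_node)
                relationships.add(
                    Relationship(profile_node, rel_type, entity_node))
    return nodes, relationships
```

Because entity nodes are keyed by kind and normalized value, a company encountered in a second profile resolves to the node already created for it, which is the purpose of the entity extraction phase as described.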
The next phase 220 is referred to herein as the clustering phase. In this phase, the aggregator 102 creates a "profile graph" representing the gathered data on the various different people from the various different websites 103. The clustering phase 220 constructs the profile graph using the entity graph as its input.
A portion of such a profile graph is illustrated by way of example in Figure 4. The profile graph 400 comprises two types of element: nodes 401 and edges 402. Each node 401 represents a given profile from a given website 103 (so the different profiles from the different websites 103 each have their own respective node 401, including that the different profiles of the same person each have their own respective node 401). The edges 402 represent connections between profiles. In embodiments, there could simply be either one or zero edges between any given pair of nodes 401: i.e. they are either connected or not (e.g. based on some overall test of whether they correspond to the same person). However, preferably, in embodiments multiple edges can be allocated between any given pair of nodes 401. E.g. in the example of Figure 4, the nodes 401a and 401b are connected by three distinct edges 402i, 402ii, 402iii.
In this case, each edge 402 represents a match for a given feature. I.e. if both of a pair of nodes 401 represent profiles for which a certain feature is present (note that not all nodes necessarily have the same feature set), and if the values of that feature match, then a respective edge 402 is created between the two nodes 401. For instance, if the feature in question is name, and if the two profiles both include a name and the two values are both "Dave Example" then an edge 402i is created between the respective nodes 401a, 401b, with this edge 402i representing the feature of name. But if the values are instead, say, "Dave Example" and "Steve Forinstance", then no edge is added. And if the two values for another feature such as employer are both "SuperTechCo", then another edge 402ii is added between the same pair of nodes 401a, 401b, and so forth. Other examples of features that could be used to create respective edges include: school, university, skills, hobbies, company, home town, current town of residence, country of residence, citizenship, address, email address, etc. Some such items of information could also be broken down into separate features, e.g. the name could be broken down into given name and family name, or the address could be broken down into two or more of street, town and postcode, etc.
This process is applied across all the possible combinations of nodes 401 in the graph 400, to try to find as many different connections for as many different features as possible. In embodiments, an edge 402 is created only for an exact match between the values of the feature in question. Alternatively however, it is not excluded that edges could be created based on an inexact match. E.g. metrics are known for measuring the similarity of two strings, and/or the aggregator 102 could be configured to recognize certain predetermined variants of a value (such as that Jim is another form of the name James).
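As a rough sketch of the clustering phase under the exact-match embodiment, the following Python function compares every pair of normalized profiles and records one edge per matching feature. The feature list and the dictionary-based representation of the multi-edge graph are assumptions made for this example.

```python
from itertools import combinations

# Assumed subset of features on which edges may be created.
FEATURES = ("name", "employer", "university")


def build_profile_graph(profiles, features=FEATURES):
    """Return {frozenset({id_a, id_b}): set of features whose values match}.

    Each feature in the set stands for one edge between the pair of nodes.
    """
    edges = {}
    for a, b in combinations(profiles, 2):
        matched = {
            f for f in features
            if a.get(f) is not None and a.get(f) == b.get(f)
        }
        if matched:
            edges[frozenset((a["id"], b["id"]))] = matched
    return edges
```

Comparing all pairs grows quadratically with the number of profiles, so this form is only suitable for illustration; one possible alternative at scale would be to index profiles by (feature, value) and compare only within each bucket.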
Once the clustering phase 220 has created a suitably large profile graph 400 with a suitably large number of edges 402, the aggregator 102 proceeds to the next phase of the process, which is the profile matching phase 230. This phase looks for patterns in the profile graph 400 that indicate whether different profiles appear to belong to the same person (within some acceptable likelihood). As a simple example, one could guess that if two profiles have the same name and the same employer (so two edges 402 for two particular features), there is a 99% chance they are for the same person.
In embodiments, the profile matching phase 230 works based on any one or more of a variety of heuristics that may be evaluated based on the profile graph 400, and in embodiments based on a combination of such heuristics.
One example of such an heuristic is based on a number of other nodes in common within a predetermined number of hops in said graph (a hop is wherever nodes 401 are connected by at least one edge 402). For example, this heuristic may be evaluated on a yes/no basis, such that it is true for a given pair of nodes 401 if the nodes are within a predetermined number of hops, e.g. they are adjacent neighbours (one hop) or within a path of two hops, but false otherwise. And/or, as another example, such an heuristic may measure a number of common neighbours within a predetermined number of hops (i.e. how many such neighbours exist). E.g. it may measure the number of adjacent neighbours in common (one hop), or a number of common neighbours within a path of two hops. In this case the heuristic may be true if two nodes 401 share above a threshold number of neighbours in common within the predetermined number of hops, and false otherwise; and/or the number of neighbours within the hop limit could be used as a measure of the likelihood of a match as a matter of degree.
The above types of heuristic can be used alone. However, such heuristics only determine whether nodes are connected or not, or to what degree they are connected, e.g. whether they are connected within some degree of separation, or whether they share a certain number of neighbours. I.e. these heuristics are only based on whether or not nodes 401 are connected at all (by any edge 402). Preferably however, to improve the matching process, the profile matching 230 may alternatively or additionally be based on one or more other heuristics that take into account the nature of the connections between nodes (i.e. one or more heuristics that make use of the fact that, in embodiments of the present disclosure, the edges are characterized as representing certain specific features).
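The hop-based heuristic discussed above might be sketched as follows in Python, using a breadth-first search limited to a given number of hops. The adjacency-dictionary representation and the function names are assumptions for this example.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Nodes reachable from start in at most max_hops hops (excluding start)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance) - {start}


def common_neighbours(adjacency, a, b, max_hops=1):
    """Other nodes that both a and b reach within max_hops hops."""
    shared = within_hops(adjacency, a, max_hops) & within_hops(adjacency, b, max_hops)
    return shared - {a, b}
```

The size of the returned set can be compared against a threshold for a yes/no decision, or used directly as a measure of likelihood as a matter of degree.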
An example of this is the number of edges between two nodes 401. Such an heuristic may be evaluated on a yes/no basis, such that it is true if the nodes share more than a threshold number of edges, e.g. two or three, and false otherwise. And/or, the number of edges may be taken as a measure of the likelihood that two nodes represent profiles of the same person, as a matter of degree.
Another example is the rarity of the matching value, i.e. its frequency of occurrence within the profiles represented by the graph 400 (how many times does it occur statistically, e.g. as a proportion of the number of instances of edges representing the feature in question in the graph 400). A rare value of a given feature can be a strong indication of a link. For example, finding two profiles for the name "Dave Smith" does not give much confidence of a match, but finding two profiles for the name "Ezekial Q Nithercott" is a much less probable coincidence and is therefore a much stronger indication that they are likely to be for the same person. This heuristic could be evaluated on a yes/no basis, such that it is true if the frequency (proportion of occurrences of the feature) is below a certain predetermined threshold, but false otherwise. And/or, the frequency could be used as a measure of the likelihood of a match as a matter of degree.
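The edge-count and rarity heuristics can be illustrated with the short Python sketch below. As a simplifying assumption, frequency is computed here as a proportion of the profiles that carry the feature, rather than as a proportion of edges; the thresholds are arbitrary illustrative values.

```python
from collections import Counter


def value_frequencies(profiles, feature):
    """Proportion of profiles (among those with the feature) per value."""
    values = [p[feature] for p in profiles if p.get(feature)]
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}


def is_rare(value, frequencies, threshold=0.01):
    """Yes/no form of the rarity heuristic (threshold is illustrative)."""
    return frequencies.get(value, 0.0) < threshold


def enough_edges(matched_features, min_edges=2):
    """Yes/no form of the edge-count heuristic for one pair of nodes."""
    return len(matched_features) >= min_edges
```

Either quantity could equally be passed on unthresholded, so that a later step treats it as a matter of degree.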
Yet another possible type of heuristic is a similarity between the values of a given feature. For instance, one or more metrics may be used that measure the similarity between two strings. Again this could be evaluated on a yes/no basis, so that it is true if the similarity is above a threshold and false otherwise; and/or, the similarity may be used as a measure of the likelihood of a match as a matter of degree.
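One readily available string similarity metric is the ratio computed by SequenceMatcher in Python's standard difflib module; the sketch below uses it purely as an example, and neither the choice of metric nor the threshold of 0.85 is taken from the disclosure.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two values, ignoring letter case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_similar(a, b, threshold=0.85):
    """Yes/no form of the similarity heuristic (threshold is illustrative)."""
    return similarity(a, b) >= threshold
```

With this metric, values such as "SuperTechCo" and "SuperTech Co" score close to 1, whereas unrelated names score far lower.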
Note: in embodiments where the clustering phase 220 only adds edges 402 for exact matches, the similarity is not considered to be an heuristic which qualifies an edge 402 per se. E.g. the test could be that on condition that two nodes 401 are connected by, say, at least one edge 402, or at least two edges, the matching phase 230 then probes the profiles represented by those nodes further to look for features that are similar. Alternatively where edges 402 in the profile graph 400 are added in the clustering phase 220 to represent close but inexact matches, the measure of similarity may indeed be considered as a property of the edge 402.
Preferably a plurality of any two, more or all of the above metrics, and/or others, are combined in the decision making process in order to decide whether profiles are to be matched.
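The disclosure does not specify how the metrics are combined. Purely as one possible illustration, the Python sketch below forms a weighted sum of four signals, each scaled to the range 0 to 1; the weights, the scaling constants and the decision threshold are all hypothetical.

```python
def match_score(n_edges, n_common, rarest_frequency, best_similarity,
                weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine several heuristics into one score in [0, 1].

    n_edges          -- number of feature edges between the two nodes
    n_common         -- number of common neighbours within the hop limit
    rarest_frequency -- frequency of the rarest matching value (0..1)
    best_similarity  -- highest similarity among non-exact values (0..1)
    """
    signals = (
        min(n_edges / 3, 1.0),      # saturates at three edges (assumed)
        min(n_common / 5, 1.0),     # saturates at five neighbours (assumed)
        1.0 - rarest_frequency,     # rarer values score higher
        best_similarity,
    )
    return sum(w * s for w, s in zip(weights, signals))


def is_match(n_edges, n_common, rarest_frequency, best_similarity,
             threshold=0.6):
    """Yes/no decision on whether two profiles are matched."""
    score = match_score(n_edges, n_common, rarest_frequency, best_similarity)
    return score >= threshold
```

Other combinations, such as rule cascades or a trained classifier, would fit the description equally well.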
The output of the matching phase 230 is a second graph. Here, again each node represents a given profile from a given website 103. However, in this graph each pair of nodes is connected by only one or zero edges: matched or not matched.
Optionally, a validation phase 240 is then applied to remove some of the connections (edges) from this second graph. Here, one or more further heuristics are applied to eliminate edges that represent unlikely matches. Preferably, this phase 240 at least includes eliminating edges between nodes 401 that represent profiles from the same website 103, because it is unlikely that the same person has two different profiles on the same site 103. Another example is to break up overly large bunches of nodes 401 that are still connected, on the basis that a given person is unlikely to have more than a certain number of profiles (e.g. while he or she may have a lot of profiles, numbers in the hundreds start to become unlikely). For instance, if it can be identified that two subsets of nodes each contain many common neighbours between them, but only one edge connects the two subsets, then that edge may be eliminated. The reason for the matching phase 230 and separate validation phase 240 (as opposed to, say, just not including edges between profiles from the same site 103 in the first place) is that it has been found to produce better results to find as many potential connections in the graph 400 as possible based on the heuristics used in the matching phase 230, then eliminate some in the validation phase 240; rather than be overly selective at the matching phase 230 and potentially miss some connections that might prove useful. The result of the matching phase 230 and optional validation phase 240 will be a set of discrete sub-clusters, or groups, each representing a different respective person.
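A simplified sketch of the same-website validation rule, followed by the formation of groups, is shown below in Python. It assumes the second graph is held as a set of matched pairs and treats each connected component as one group; it removes only direct edges between same-site profiles, so profiles from one site could still end up in the same group via a third profile.

```python
def drop_same_site_edges(matches, site_of):
    """Keep only matched pairs whose two profiles come from different sites."""
    return {pair for pair in matches if len({site_of[n] for n in pair}) > 1}


def groups(nodes, matches):
    """Connected components of the matched graph, one set per person."""
    adjacency = {n: set() for n in nodes}
    for pair in matches:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        result.append(component)
    return result
```

Here matches is a set of two-element frozensets of profile identifiers and site_of maps each identifier to its website; both names are assumptions of this example.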
Finally, the aggregator 102 proceeds to the identity building phase 250. Here it aggregates the profiles of each sub-cluster into a respective aggregate profile for the respective person, which it publishes via the Internet 101.
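How the profiles of a group are merged is not detailed above. As one assumed approach, the Python sketch below keeps every distinct value seen for each feature and records the contributing websites under a "sources" key.

```python
def aggregate(group):
    """Merge the normalized profiles of one group into an aggregate profile.

    Each feature maps to the list of distinct values found across the group,
    in first-seen order; "sources" lists the websites that contributed.
    """
    merged = {"sources": []}
    for profile in group:
        for feature, value in profile.items():
            if feature == "id":
                continue  # per-profile identifier, not part of the aggregate
            key = "sources" if feature == "site" else feature
            values = merged.setdefault(key, [])
            if value not in values:
                values.append(value)
    return merged
```

The "sources" list in this sketch corresponds loosely to the per-website icons that a front end could display alongside each result.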
In one implementation, the aggregator 102 may host its own website which users can access via their user terminals 104 in order to search for people from amongst the aggregated profiles. An example is illustrated in Figure 3, showing an example front-end user interface 300 of such a site. The user interface 300 comprises a search bar 301 in which a user can enter a search query, such as the name of a person or the name of a company. E.g. in the example shown, the user searches for the company name "SuperTechCo", to try to find people associated with this company. This brings up a list of results, each corresponding to a different person. For each result, the list may show a profile picture 302 included in the aggregate profile for the respective person, and/or the values of one, some or all of the features 303 in the aggregate profile (e.g. name, company, interests, etc.). In embodiments, the list may also include a set of icons or logos 304, one for each of the websites 103 from which the aggregate profile has been compiled. In further embodiments, the user may select (e.g. click or touch) one of the results in the list to summon up the complete aggregate profile 305.
Alternatively or additionally, the aggregator 102 may publish the aggregate profiles via other means, such as by making the aggregate profiles available to a plugin application or API of a third party 105 (being a different provider of a different product or service than the provider of the aggregation service 102, e.g. a different party, company, organization or legal entity). For instance, the aggregator 102 may provide or endorse a plugin application which plugs in to a web browser or to another internet-enabled application such as an email client or instant messaging (IM) client, enabling users of that application to access the people search functionality through that application via the plugin. As another example, the aggregator 102 may provide or endorse an API (application programming interface) which a third party 105 can integrate into their own computer system in order to access the people search through that system. An example application of this would be to incorporate the API into the internal computer system of a recruitment company in order to allow recruiters to collect information on job applicants or potential applicants. Note that the API may allow the aggregate profile information to be accessed in an automated fashion based on a database of names or other search criteria stored in the third party's system 105, e.g. so a recruiter can automatically update information on a large database of potential applicants they may wish to contact about new job openings.
It will be appreciated that the above embodiments have been described by way of example only. Other variants may become apparent to a person skilled in the art given the disclosure herein. The scope of the present disclosure is not limited by the described embodiments, but only by the accompanying claims.

Claims

1. A method of aggregating profile information, comprising:
from multiple websites, automatically gathering profiles of multiple people profiled on those websites via the Internet, including, for at least some of the people, gathering multiple profiles of the same person from different ones of the websites;
identifying multiple features in each of the profiles, including determining a value of each of the features;
normalizing the values of the features of each profile into a common format;
forming a profile graph by representing each of the profiles as a corresponding node in the profile graph and, based on the normalization into said common format, connecting each of a plurality of pairs of the nodes with one or more edges, each edge representing a match between the values of one of the features found in both profiles represented by the pair of nodes;
matching together different ones of said profiles into groups based on the edges between the corresponding nodes in the profile graph, each group estimated to be the profiles of a respective same one of the people;
for each of the groups, aggregating at least some of the profiles of the group into an aggregate profile of the respective person; and
making the aggregate profiles available via the Internet.
2. The method of claim 1, wherein the matching of the profiles comprises: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of other nodes in common within a predetermined number of hops in said profile graph.
3. The method of claim 1 or 2, wherein for at least some of said plurality of pairs of nodes, the nodes in the pair are connected by more than one edge, each edge representing a match between the values of a different respective one of a plurality of said features found in both profiles represented by the pair of nodes.
4. The method of claim 3, wherein the matching of the profiles comprises: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a number of edges between the nodes of the pair.
5. The method of claim 3 or 4, wherein the matching of the profiles comprises: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a frequency of occurrence within the profile graph of a value of one of the features represented by an edge between the corresponding nodes.
6. The method of any preceding claim, wherein the matching of the profiles comprises: for each of the pairs of nodes, matching the corresponding profiles together into the same group in dependence on a measure of similarity between non-exactly matching values of a feature found in both the corresponding profiles.
7. The method of any preceding claim, wherein said identifying comprises an entity extraction phase which identifies which of the features occur in different ones of the profiles and represents each of those features as an entity in an entity graph, and which further identifies relationships between the features and represents the relationships in the entity graph; wherein the entity graph is an input to the step of forming the profile graph.
8. The method of any preceding claim, wherein the features include at least one of: name, academic institution, skills, occupation, employer, company, interests, and/or place of residence.
9. The method of any preceding claim, wherein the identification of the features is performed at least in part using natural language processing.
10. The method of any preceding claim, wherein:
the method further comprises a validation phase in which, for at least some of the groups, one or more of the profiles are eliminated from the group in dependence on a further comparison between the profiles in that group; and
said aggregation aggregates only the profiles remaining after said elimination.
11. The method of claim 10, wherein the further comparison comprises determining whether the group contains profiles from the same website, and if so, the validation phase eliminates one of the profiles from the same website.
12. The method of any preceding claim, wherein the aggregated profiles are made available through a searchable user interface.
13. The method of claim 12, wherein:
when a search query is entered through said user interface searching for a value not yet represented in the profile graph, the search query automatically triggers a gathering, via the Internet, of one or more further profiles from one or more websites based on the search query; and
the method further comprises updating the profile graph to include the one or more further profiles, and based thereon generating a new aggregate profile for a new person and/or an update to one or more of the existing aggregate profiles.
14. The method of any preceding claim, wherein the method is performed by a first provider, and the making available of the aggregated profiles comprises: making the aggregate profiles available to the public through a website run by the first provider.
15. The method of any preceding claim, wherein the method is performed by a first provider, and the making available of the aggregated profiles comprises: providing the aggregate profiles to a plugin of a web browser or other internet-enabled application provided by a second provider, such that the second provider can make the aggregate profiles available to users of said application.
16. The method of any preceding claim, wherein the method is performed by a first provider, and the making available of the aggregated profiles comprises: providing the aggregate profiles to an API of a computer system run by a second provider, so the second provider can make the aggregate profiles available to users of said computer system.
17. A server configured to perform the operations of any preceding claim.
18. A computer program product comprising code embodied on a computer-readable storage medium, and configured so as when run on one or more processors to perform the operations of any of claims 1 to 16.
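By way of illustration only, the graph-forming and matching steps recited in claims 1, 3 and 4 might be sketched in Python as follows. This is a minimal sketch and not the applicant's implementation: the function names, the sample profiles, the normalization rule (lowercasing and collapsing whitespace), the `min_edges` threshold and the use of union-find to merge matched pairs transitively are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def normalize(value):
    """Reduce a raw feature value to a common format (assumed: lowercase, single spaces)."""
    return " ".join(value.lower().split())


def build_profile_graph(profiles):
    """Return {(id_a, id_b): [feature, ...]}, one edge per feature whose values match."""
    # Index profiles by (feature, normalized value) so that matches are found
    # without comparing every pair of profiles directly.
    index = defaultdict(set)
    for pid, features in profiles.items():
        for feature, raw in features.items():
            index[(feature, normalize(raw))].add(pid)

    edges = defaultdict(list)
    for (feature, _value), pids in index.items():
        for a, b in combinations(sorted(pids), 2):
            edges[(a, b)].append(feature)
    return edges


def group_profiles(profiles, edges, min_edges=2):
    """Group nodes joined by at least min_edges edges (assumed threshold)."""
    parent = {pid: pid for pid in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), features in edges.items():
        if len(features) >= min_edges:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for pid in profiles:
        groups[find(pid)].add(pid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical profiles gathered from three different websites.
profiles = {
    "siteA/1": {"name": "Jane  Doe", "employer": "Acme Ltd", "city": "London"},
    "siteB/7": {"name": "jane doe", "employer": "ACME LTD", "city": "Leeds"},
    "siteC/3": {"name": "Jane Doe", "employer": "Globex", "city": "Paris"},
}
edges = build_profile_graph(profiles)
groups = group_profiles(profiles, edges)
print(groups)
```

In this sketch the first two profiles are joined by two edges (name and employer) and so fall into one group, while the third shares only a name with the others and remains on its own. Raising or lowering `min_edges` changes how readily profiles are treated as belonging to the same person.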
PCT/EP2016/072737 2015-09-25 2016-09-23 Aggregating profile information WO2017050991A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1517008.7A GB2543740A (en) 2015-09-25 2015-09-25 Aggregating profile information
GB1517008.7 2015-09-25

Publications (1)

Publication Number Publication Date
WO2017050991A1 true WO2017050991A1 (en) 2017-03-30

Family

ID=54544126

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/072737 WO2017050991A1 (en) 2015-09-25 2016-09-23 Aggregating profile information

Country Status (2)

Country Link
GB (1) GB2543740A (en)
WO (1) WO2017050991A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7082426B2 (en) * 1993-06-18 2006-07-25 Cnet Networks, Inc. Content aggregation method and apparatus for an on-line product catalog
US20150227579A1 (en) * 2014-02-12 2015-08-13 Tll, Llc System and method for determining intents using social media data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Web scraping - Wikipedia, the free encyclopedia", 19 June 2015 (2015-06-19), XP055303583, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Web_scraping&oldid=667566447> [retrieved on 20160919] *

Cited By (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489388B1 (en) 2018-05-24 2019-11-26 People.ai, Inc. Systems and methods for updating record objects of tenant systems of record based on a change to a corresponding record object of a master system of record
US10489462B1 (en) 2018-05-24 2019-11-26 People.ai, Inc. Systems and methods for updating labels assigned to electronic activities
US10489387B1 (en) 2018-05-24 2019-11-26 People.ai, Inc. Systems and methods for determining the shareability of values of node profiles
US10489457B1 (en) 2018-05-24 2019-11-26 People.ai, Inc. Systems and methods for detecting events based on updates to node profiles from electronic activities
US10489430B1 (en) 2018-05-24 2019-11-26 People.ai, Inc. Systems and methods for matching electronic activities to record objects using feedback based match policies
WO2019227081A1 (en) * 2018-05-24 2019-11-28 People.ai, Inc. Systems and methods for maintaining a group node graph for group entities
US10496634B1 (en) 2018-05-24 2019-12-03 People.ai, Inc. Systems and methods for determining a completion score of a record object from electronic activities
US10496635B1 (en) 2018-05-24 2019-12-03 People.ai, Inc. Systems and methods for assigning tags to node profiles using electronic activities
US10496688B1 (en) 2018-05-24 2019-12-03 People.ai, Inc. Systems and methods for inferring schedule patterns using electronic activities of node profiles
US10498856B1 (en) 2018-05-24 2019-12-03 People.ai, Inc. Systems and methods of generating an engagement profile
US10503719B1 (en) 2018-05-24 2019-12-10 People.ai, Inc. Systems and methods for updating field-value pairs of record objects using electronic activities
US10504050B1 (en) 2018-05-24 2019-12-10 People.ai, Inc. Systems and methods for managing electronic activity driven targets
US10503783B1 (en) 2018-05-24 2019-12-10 People.ai, Inc. Systems and methods for generating new record objects based on electronic activities
US10509786B1 (en) 2018-05-24 2019-12-17 People.ai, Inc. Systems and methods for matching electronic activities with record objects based on entity relationships
US10509781B1 (en) 2018-05-24 2019-12-17 People.ai, Inc. Systems and methods for updating node profile status based on automated electronic activity
US10516587B2 (en) 2018-05-24 2019-12-24 People.ai, Inc. Systems and methods for node resolution using multiple fields with dynamically determined priorities based on field values
US10516784B2 (en) 2018-05-24 2019-12-24 People.ai, Inc. Systems and methods for classifying phone numbers based on node profile data
US10515072B2 (en) 2018-05-24 2019-12-24 People.ai, Inc. Systems and methods for identifying a sequence of events and participants for record objects
US10521443B2 (en) 2018-05-24 2019-12-31 People.ai, Inc. Systems and methods for maintaining a time series of data points
US10528601B2 (en) 2018-05-24 2020-01-07 People.ai, Inc. Systems and methods for linking record objects to node profiles
US10535031B2 (en) 2018-05-24 2020-01-14 People.ai, Inc. Systems and methods for assigning node profiles to record objects
US10545980B2 (en) 2018-05-24 2020-01-28 People.ai, Inc. Systems and methods for restricting generation and delivery of insights to second data source providers
US10552932B2 (en) 2018-05-24 2020-02-04 People.ai, Inc. Systems and methods for generating field-specific health scores for a system of record
US10565229B2 (en) 2018-05-24 2020-02-18 People.ai, Inc. Systems and methods for matching electronic activities directly to record objects of systems of record
US10585880B2 (en) 2018-05-24 2020-03-10 People.ai, Inc. Systems and methods for generating confidence scores of values of fields of node profiles using electronic activities
US10599653B2 (en) 2018-05-24 2020-03-24 People.ai, Inc. Systems and methods for linking electronic activities to node profiles
US10649999B2 (en) 2018-05-24 2020-05-12 People.ai, Inc. Systems and methods for generating performance profiles using electronic activities matched with record objects
US10649998B2 (en) 2018-05-24 2020-05-12 People.ai, Inc. Systems and methods for determining a preferred communication channel based on determining a status of a node profile using electronic activities
US10657129B2 (en) 2018-05-24 2020-05-19 People.ai, Inc. Systems and methods for matching electronic activities to record objects of systems of record with node profiles
US10657132B2 (en) 2018-05-24 2020-05-19 People.ai, Inc. Systems and methods for forecasting record object completions
US10657130B2 (en) 2018-05-24 2020-05-19 People.ai, Inc. Systems and methods for generating a performance profile of a node profile including field-value pairs using electronic activities
US10657131B2 (en) 2018-05-24 2020-05-19 People.ai, Inc. Systems and methods for managing the use of electronic activities based on geographic location and communication history policies
US10671612B2 (en) 2018-05-24 2020-06-02 People.ai, Inc. Systems and methods for node deduplication based on a node merging policy
US10678796B2 (en) 2018-05-24 2020-06-09 People.ai, Inc. Systems and methods for matching electronic activities to record objects using feedback based match policies
US10678795B2 (en) 2018-05-24 2020-06-09 People.ai, Inc. Systems and methods for updating multiple value data structures using a single electronic activity
US10679001B2 (en) 2018-05-24 2020-06-09 People.ai, Inc. Systems and methods for auto discovery of filters and processing electronic activities using the same
US10769151B2 (en) 2018-05-24 2020-09-08 People.ai, Inc. Systems and methods for removing electronic activities from systems of records based on filtering policies
US10860794B2 (en) 2018-05-24 2020-12-08 People.ai, Inc. Systems and methods for maintaining an electronic activity derived member node network
US10860633B2 (en) 2018-05-24 2020-12-08 People.ai, Inc. Systems and methods for inferring a time zone of a node profile using electronic activities
US10866980B2 (en) 2018-05-24 2020-12-15 People.ai, Inc. Systems and methods for identifying node hierarchies and connections using electronic activities
US10872106B2 (en) 2018-05-24 2020-12-22 People.ai, Inc. Systems and methods for matching electronic activities directly to record objects of systems of record with node profiles
US10878015B2 (en) 2018-05-24 2020-12-29 People.ai, Inc. Systems and methods for generating group node profiles based on member nodes
US10901997B2 (en) 2018-05-24 2021-01-26 People.ai, Inc. Systems and methods for restricting electronic activities from being linked with record objects
US10922345B2 (en) 2018-05-24 2021-02-16 People.ai, Inc. Systems and methods for filtering electronic activities by parsing current and historical electronic activities
US11017004B2 (en) 2018-05-24 2021-05-25 People.ai, Inc. Systems and methods for updating email addresses based on email generation patterns
US11048740B2 (en) 2018-05-24 2021-06-29 People.ai, Inc. Systems and methods for generating node profiles using electronic activity information
US11153396B2 (en) 2018-05-24 2021-10-19 People.ai, Inc. Systems and methods for identifying a sequence of events and participants for record objects
US11265390B2 (en) 2018-05-24 2022-03-01 People.ai, Inc. Systems and methods for detecting events based on updates to node profiles from electronic activities
US11265388B2 (en) 2018-05-24 2022-03-01 People.ai, Inc. Systems and methods for updating confidence scores of labels based on subsequent electronic activities
US11277484B2 (en) 2018-05-24 2022-03-15 People.ai, Inc. Systems and methods for restricting generation and delivery of insights to second data source providers
US11283888B2 (en) 2018-05-24 2022-03-22 People.ai, Inc. Systems and methods for classifying electronic activities based on sender and recipient information
US11283887B2 (en) 2018-05-24 2022-03-22 People.ai, Inc. Systems and methods of generating an engagement profile
US11363121B2 (en) 2018-05-24 2022-06-14 People.ai, Inc. Systems and methods for standardizing field-value pairs across different entities
US11394791B2 (en) 2018-05-24 2022-07-19 People.ai, Inc. Systems and methods for merging tenant shadow systems of record into a master system of record
US11418626B2 (en) 2018-05-24 2022-08-16 People.ai, Inc. Systems and methods for maintaining extracted data in a group node profile from electronic activities
US11451638B2 (en) 2018-05-24 2022-09-20 People.ai, Inc. Systems and methods for matching electronic activities directly to record objects of systems of record
US11457084B2 (en) 2018-05-24 2022-09-27 People.ai, Inc. Systems and methods for auto discovery of filters and processing electronic activities using the same
US11463534B2 (en) 2018-05-24 2022-10-04 People.ai, Inc. Systems and methods for generating new record objects based on electronic activities
US11463545B2 (en) 2018-05-24 2022-10-04 People.ai, Inc. Systems and methods for determining a completion score of a record object from electronic activities
US11470170B2 (en) 2018-05-24 2022-10-11 People.ai, Inc. Systems and methods for determining the shareability of values of node profiles
US11470171B2 (en) 2018-05-24 2022-10-11 People.ai, Inc. Systems and methods for matching electronic activities with record objects based on entity relationships
US11503131B2 (en) 2018-05-24 2022-11-15 People.ai, Inc. Systems and methods for generating performance profiles of nodes
US11563821B2 (en) 2018-05-24 2023-01-24 People.ai, Inc. Systems and methods for restricting electronic activities from being linked with record objects
US11641409B2 (en) 2018-05-24 2023-05-02 People.ai, Inc. Systems and methods for removing electronic activities from systems of records based on filtering policies
US11647091B2 (en) 2018-05-24 2023-05-09 People.ai, Inc. Systems and methods for determining domain names of a group entity using electronic activities and systems of record
US11805187B2 (en) 2018-05-24 2023-10-31 People.ai, Inc. Systems and methods for identifying a sequence of events and participants for record objects
US11831733B2 (en) 2018-05-24 2023-11-28 People.ai, Inc. Systems and methods for merging tenant shadow systems of record into a master system of record
US11876874B2 (en) 2018-05-24 2024-01-16 People.ai, Inc. Systems and methods for filtering electronic activities by parsing current and historical electronic activities
US11888949B2 (en) 2018-05-24 2024-01-30 People.ai, Inc. Systems and methods of generating an engagement profile
US11895207B2 (en) 2018-05-24 2024-02-06 People.ai, Inc. Systems and methods for determining a completion score of a record object from electronic activities
US11895205B2 (en) 2018-05-24 2024-02-06 People.ai, Inc. Systems and methods for restricting generation and delivery of insights to second data source providers
US11895208B2 (en) 2018-05-24 2024-02-06 People.ai, Inc. Systems and methods for determining the shareability of values of node profiles
US11909836B2 (en) 2018-05-24 2024-02-20 People.ai, Inc. Systems and methods for updating confidence scores of labels based on subsequent electronic activities
US11909837B2 (en) 2018-05-24 2024-02-20 People.ai, Inc. Systems and methods for auto discovery of filters and processing electronic activities using the same
US11909834B2 (en) 2018-05-24 2024-02-20 People.ai, Inc. Systems and methods for generating a master group node graph from systems of record
US11924297B2 (en) 2018-05-24 2024-03-05 People.ai, Inc. Systems and methods for generating a filtered data set
US11930086B2 (en) 2018-05-24 2024-03-12 People.ai, Inc. Systems and methods for maintaining an electronic activity derived member node network
US11949682B2 (en) 2018-05-24 2024-04-02 People.ai, Inc. Systems and methods for managing the generation or deletion of record objects based on electronic activities and communication policies
US11949751B2 (en) 2018-05-24 2024-04-02 People.ai, Inc. Systems and methods for restricting electronic activities from being linked with record objects

Also Published As

Publication number Publication date
GB201517008D0 (en) 2015-11-11
GB2543740A (en) 2017-05-03

Similar Documents

Publication Publication Date Title
US10936959B2 (en) Determining trustworthiness and compatibility of a person
Gurini et al. Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization
Ozsoy From word embeddings to item recommendation
Abdel-Basset et al. A group decision making framework based on neutrosophic VIKOR approach for e-government website evaluation
Peng et al. A multi-valued neutrosophic qualitative flexible approach based on likelihood for multi-criteria decision-making problems
Rafiei et al. A novel method for expert finding in online communities based on concept map and PageRank
US20160203221A1 (en) System and apparatus for an application agnostic user search engine
Leme et al. Identifying candidate datasets for data interlinking
Cossu et al. A review of features for the discrimination of twitter users: application to the prediction of offline influence
US20150032751A1 (en) Methods and Systems for Utilizing Subject Matter Experts in an Online Community
CN105045931A (en) Video recommendation method and system based on Web mining
Abbasi et al. A social network system for analyzing publication activities of researchers
US10496716B2 (en) Discovery of network based data sources for ingestion and recommendations
CN104899236B (en) A kind of comment information display methods, apparatus and system
Shi et al. A social sensing model for event detection and user influence discovering in social media data streams
US11442972B2 (en) Methods and systems for modifying a search result
Hong et al. GRSAT: a novel method on group recommendation by social affinity and trustworthiness
Rezaie et al. Measuring time-sensitive user influence in Twitter
Razis et al. Discovering similar Twitter accounts using semantics
WO2017050991A1 (en) Aggregating profile information
CN106575418B (en) Suggested keywords
Yang et al. Finding experts in community question answering based on topic-sensitive link analysis
Kim et al. Topic-Driven SocialRank: Personalized search result ranking by identifying similar, credible users in a social network
Dai et al. The workforce analyzer: group discovery among LinkedIn public profiles
Dhekane et al. Talash: Friend Finding In Federated Social Networks.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16778715

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16778715

Country of ref document: EP

Kind code of ref document: A1