US20170310772A1 - Inferring the location of users in online social media platforms using social network analysis - Google Patents

Inferring the location of users in online social media platforms using social network analysis

Info

Publication number
US20170310772A1
Authority
US
United States
Prior art keywords
location
mapping
user
users
est
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/210,265
Other versions
US9794358B1 (en
Inventor
David A. Jurgens
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HRL Laboratories LLC
Original Assignee
HRL Laboratories LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HRL Laboratories LLC
Priority to US14/210,265 (patent US9794358B1)
Assigned to HRL LABORATORIES, LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JURGENS, DAVID A.
Priority to US14/295,101 (patent US9953080B1)
Priority to US14/535,812 (patent US10255352B1)
Priority to US14/539,828 (patent US10726090B1)
Priority to US14/639,979 (patent US10305845B1)
Priority to US15/163,547 (patent US9892168B1)
Publication of US9794358B1
Application granted
Publication of US20170310772A1
Current legal status: Active
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • H04W4/21 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel for social networking applications
    • H04L67/22
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/024 Guidance services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services

Definitions

  • the present invention relates to a system for inferring the location of users in online social media platforms and, more particularly, to a system for inferring the location of users in online social media platforms using social network analysis.
  • Social media provides a new data source for observing the rapidly changing focus of public interests. Detecting the location from which a message originates provides a powerful way of aggregating content spatially. This spatial focus enables detecting regional differences, detecting emerging trends specific to regions, or even measuring information flow. However, little content is connected with ground truth location data.
  • Sadilek et al. (see Literature Reference No. 9) perform social network inference in order to estimate the user's true location.
  • their approach requires that both users' locations be known in order to estimate the social relationship, which limits the approach to only those individuals with known locations.
  • Hetch et al. (see Literature Reference No. 4) and Pontes et al. (see Literature Reference No. 8) infer user locations from self-provided location information in Twitter™ and FourSquare™, respectively. While Pontes et al. (see Literature Reference No. 8) reported more than 90% coverage of users with this method, no attempt was made to infer the locations of the remaining users. Hetch et al. (see Literature Reference No. 4) found significantly less information in Twitter™ with a high error rate.
  • the present invention relates to a system for inferring the location of users in online social media platforms and, more particularly, to a system for inferring the location of users in online social media platforms using social network analysis.
  • the system comprises one or more processors and a memory having instructions such that when the instructions are executed, the one or more processors perform multiple operations.
  • the system extracts a social network from data from at least one social media platform, wherein the social network comprises a plurality of users connected through social relationships, and wherein each user in the plurality of users has an identity on each social media platform.
  • a mapping in the social network is generated from each user in the plurality of users to the user's estimated geographical location, resulting in an estimated location mapping Est.
  • a mapping in the social network is generated from each user in the plurality of users having known geographical location data to the user's known geographical location, resulting in a known location mapping SL.
  • the estimated location mapping Est is updated to have the same mapping as the known location mapping SL until a predetermined convergence criterion is met.
  • the location of j in a current estimated location mapping Est′ is updated to be the location in the known location mapping SL.
  • the estimated geographical location of the users in N is added to a set of locations NL.
  • a set of final geographical locations of the users in N is estimated using a geometric median metric.
  • the users in N are mapped to their final estimated geographical locations in the social network.
  • the system provides a subgraph of the social network for N and the set of locations NL.
  • the current estimated location mapping Est′ is updated with the final estimated geographical locations of the users in N, and the mappings in the estimated location mapping Est are replaced with those in the current estimated location mapping Est′.
  • the system combines a user's identities from all social media platforms, such that each user is represented as a single individual in the social network.
  • estimated geographical location data and known geographical location data from all social media platforms for a user is merged.
  • only those users in N who also have social relationships with each other are selected for geographical location estimation using the geometric median metric.
  • the present invention also comprises a method for causing a processor to perform the operations described herein.
  • the present invention also comprises a computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform the operations described herein.
  • FIG. 1 is a flow diagram for inferring the location of users in online social media platforms using social network analysis according to the principles of the present invention.
  • FIG. 2 is a table of performance metrics for user location inference according to the principles of the present invention.
  • FIG. 3 is an illustration of a data processing system according to the principles of the present invention.
  • FIG. 4 is an illustration of a computer program product according to the principles of the present invention.
  • the following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses, in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded with the widest scope consistent with the principles and novel features disclosed herein.
  • any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6.
  • the use of “step of” or “act of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
  • the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counter-clockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object. As such, as the present invention is changed, the above labels may change their orientation.
  • the present invention has three “principal” aspects.
  • the first is a system for inferring the location of users in online social media platforms.
  • the system is typically in the form of a computer system, computer component, or computer network operating software or in the form of a “hard-coded” instruction set.
  • This system may take a variety of forms with a variety of hardware devices and may include computer networks, handheld computing devices, cellular networks, satellite networks, and other communication devices. As can be appreciated by one skilled in the art, this system may be incorporated into a wide variety of devices that provide different functionalities.
  • the second principal aspect is a method for inferring the location of users in online social media platforms.
  • the third principal aspect is a computer program product.
  • the computer program product generally represents computer-readable instruction means (instructions) stored on a non-transitory computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape.
  • Other, non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories.
  • the term “instructions” generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules.
  • Non-limiting examples of “instructions” include computer program code (source or object code) and “hard-coded” electronics (i.e., computer operations coded into a computer chip).
  • the “instructions” may be stored on any non-transitory computer-readable medium such as a floppy disk, a CD-ROM, a flash drive, and in the memory of a computer.
  • FIG. 1 illustrates the information flow according to the principles of the present invention.
  • the method begins with one or more social network platforms.
  • FIG. 1 depicts multiple social media platforms 100, though the method is generalizable from a single platform to any number of platforms. All that is required is that the social media platforms 100 have some explicitly stated or implicitly visible social relationships.
  • a social network inference stage 102 is responsible for extracting the social network from the available data from the social media platform 100.
  • Twitter™ and FourSquare™ social media platforms were used.
  • the Twitter™ network offers implicit and explicit networks. It was found that the implicit network is generated by observing users who have both communicated with each other at least once. This implicit network is referred to as the mention network. In contrast, FourSquare™ allows users to explicitly state their friendship with each other. These relationships can be accessed using the FourSquare™ application programming interface (API). As can be appreciated by one skilled in the art, the present invention is also generalizable to other types of social media platforms that have different social network constraints. For instance, the present invention was also tested using the Twitter™ follower network, which generated good performance, but not as high as the performance of the mention network.
  • the social media platform's 100 data is mined to obtain a list of ground truth locations for users. These locations form the seed locations from which other users' locations may be inferred in an exact location extraction stage 104.
  • from global positioning system (GPS)-tagged tweets, those users that have at least five messages that all occur within a 30 kilometer (km) distance of each other are selected. This effectively removes individuals with too few tweets and those users who travel frequently and tweet from many locations. From the remaining users, each user's location is estimated as the location that is the L1 multivariate median (i.e., the geometric median) among the locations of their tweets, as will be described in detail below.
  • a single individual may have multiple identities on different social media platforms 100 .
  • a social network merger stage 106 combines those identities together so that an individual is represented only as a single node in the inferred social network.
  • the system according to the principles of the present invention uses metadata provided by users to link their FourSquare™ and Twitter™ accounts. This approach is generalizable to work on other social media platforms such as LinkedIn™ and Tumblr™. However, additional methods could be used that infer common identities from the user's profiles or discussed content.
  • Multiple social media platforms 100 may report different location data for the same user.
  • An exact location merger stage 108 merges the combined information together, using all available location data from the different social media platforms 100 as well as the discovery of an individual's multiple identities on those social media platforms 100.
  • a priority-based ranking was used, reporting the location extracted from GPS data from Twitter™, and, if not available, using the location reported on FourSquare™. Further work could generalize this to fuse the two based on the amount of GPS data available, or to use the FourSquare™ location as a prior when computing the location from the GPS data.
  • the locations of the remaining individuals in the social network may be estimated in a location inference stage 110 to generate location estimates.
  • the location estimates are then used in the final location estimates stage 112.
  • the present invention comprises a process that derives from a standard label propagation framework, but takes geography into account when selecting the new label.
  • Literature Reference No. 10 provides a description of a standard label propagation framework. The process according to the principles of the present invention proceeds as follows:
  • the location inference process (defined by the geometric multivariate median above) is not expected to converge and, therefore, some stopping criterion is needed.
  • a criterion could be a fixed number of iterations, the number of users who have been located, or the percentage change in users with new locations. In experimental studies, it was found that the network was sufficiently covered after a few iterations (usually four), after which the performance did not improve.
  • the individual's locations as determined by the geometric multivariate median are emitted, and individuals are mapped to the final, estimated locations. These locations can serve as strong priors as to where the user's messages come from.
  • the method described herein was tested using a 10% sample of Twitter™ messages from April 2012 to November 2012. This sample produced a network of bidirectional user mentions with 47,760,573 users and with 254,263,081 inferred social relationships between those users. Using the FourSquare™ API, user profiles and friends of users were crawled, resulting in a network with 3,976,819 users and 17,619,191 relations between these users. Using the information about linked accounts with the two social media platforms, the networks were combined into a single social network that had 50,741,905 unique users, with approximately 1.6 million (M) users having identities in both platforms.
  • the combination of networks also served to validate the social relationship inference on Twitter™, with approximately 7.5M users who had edges in the Twitter™ mention network also having explicitly indicated friendships in the FourSquare™ social network. Locations were extracted for 2,554,064 users in Twitter™, which was approximately 5% of the network.
  • the table 200 in FIG. 2 reports the performance of the experimental study. Specifically, the table 200 highlights three trends in performance.
  • the social triangle 202 heuristic produces the highest performance on the matching metrics 204 and median error metrics 206.
  • noise is reduced and accuracy improves.
  • the method suffers from the lowest recall (i.e., percent located 208), only estimating users for 54% of the network. Further experiments revealed that increasing the number of iterations does not increase this percentage substantially.
  • the system according to the principles of the present invention was compared against an oracle-based method (upper bound column 214) that always estimates the location of an individual as the location of their closest neighbor. Due to the presence of noise in the neighbors, this should not be considered a true upper bound on performance; however, it does represent what performance would be expected if the closest location was always selected from among the neighboring locations at algorithm initialization time.
  • the system infers the locations of users in arbitrary online social media platforms where users are connected by inferred or explicitly stated social relationships, and where at least a small number of users share their true or estimated location.
  • the invention significantly advances the state-of-the-art by (1) providing better data coverage than is available using current methods and (2) being able to infer locations from users whose associated content in the social media platform offers no indication of their geographic vicinity.
  • an example of a computer system 300 in accordance with one aspect is shown in FIG. 3.
  • the computer system 300 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm.
  • certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 300. When executed, the instructions cause the computer system 300 to perform specific actions and exhibit specific behavior, such as described herein.
  • the computer system 300 may include an address/data bus 302 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 304, are coupled with the address/data bus 302.
  • the processor 304 is configured to process information and instructions. In one aspect, the processor 304 is a microprocessor. Alternatively, the processor 304 may be a different type of processor such as a parallel processor, or a field programmable gate array.
  • the computer system 300 is configured to utilize one or more data storage units.
  • the computer system 300 may include a volatile memory unit 306 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 302, wherein a volatile memory unit 306 is configured to store information and instructions for the processor 304.
  • the computer system 300 further may include a non-volatile memory unit 308 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with the address/data bus 302, wherein the non-volatile memory unit 308 is configured to store static information and instructions for the processor 304.
  • the computer system 300 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing.
  • the computer system 300 also may include one or more interfaces, such as an interface 310, coupled with the address/data bus 302.
  • the one or more interfaces are configured to enable the computer system 300 to interface with other electronic devices and computer systems.
  • the communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
  • the computer system 300 may include an input device 312 coupled with the address/data bus 302, wherein the input device 312 is configured to communicate information and command selections to the processor 304.
  • the input device 312 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys.
  • the input device 312 may be an input device other than an alphanumeric input device.
  • the computer system 300 may include a cursor control device 314 coupled with the address/data bus 302, wherein the cursor control device 314 is configured to communicate user input information and/or command selections to the processor 304.
  • the cursor control device 314 is implemented using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen.
  • the cursor control device 314 is directed and/or activated via input from the input device 312, such as in response to the use of special keys and key sequence commands associated with the input device 312.
  • the cursor control device 314 is configured to be directed or guided by voice commands.
  • the computer system 300 further may include one or more optional computer usable data storage devices, such as a storage device 316, coupled with the address/data bus 302.
  • the storage device 316 is configured to store information and/or computer executable instructions.
  • the storage device 316 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive (“HDD”), floppy diskette, compact disk read only memory (“CD-ROM”), digital versatile disk (“DVD”)).
  • a display device 318 is coupled with the address/data bus 302, wherein the display device 318 is configured to display video and/or graphics.
  • the display device 318 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
  • the computer system 300 presented herein is an example computing environment in accordance with one aspect.
  • the non-limiting example of the computer system 300 is not strictly limited to being a computer system.
  • the computer system 300 represents a type of data processing analysis that may be used in accordance with various aspects described herein.
  • other computing systems may also be implemented.
  • the spirit and scope of the present technology is not limited to any single data processing environment.
  • one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types.
  • one aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.
  • an illustrative diagram of a computer program product embodying the present invention is depicted in FIG. 4.
  • the computer program product is depicted as either a floppy disk 400 or an optical disk 402.
  • the computer program product generally represents computer readable code (i.e., instruction means or instructions) stored on any compatible non-transitory computer readable medium.
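
The seed-location rule described for the exact location extraction stage 104 (at least five GPS-tagged messages, all within 30 km of each other, summarized by the L1 multivariate median) can be illustrated with a minimal sketch. This is not the patented implementation. The function names, the reading of "within 30 km of each other" as a pairwise test, the planar treatment of coordinates, and the use of Weiszfeld iteration to compute the median are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def geometric_median(points: List[LatLon], iterations: int = 100,
                     eps: float = 1e-9) -> LatLon:
    """Weiszfeld iteration for the L1 multivariate (geometric) median.

    Assumption: coordinates are treated as planar, which is tolerable only
    because seed points are required to lie within a few tens of kilometers.
    """
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                # Simplification: stop if the estimate lands on a data point.
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < eps:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)


def seed_location(points: List[LatLon], min_messages: int = 5,
                  max_spread_km: float = 30.0) -> Optional[LatLon]:
    """Return a seed location for one user, or None if the user is filtered out."""
    if len(points) < min_messages:
        return None  # too few GPS-tagged messages
    if any(haversine_km(a, b) > max_spread_km for a, b in combinations(points, 2)):
        return None  # messages are too spread out (frequent traveler)
    return geometric_median(points)
```

A user whose messages fail either test returns `None` and would be left for the location inference stage 110 rather than being treated as ground truth.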

Abstract

Described is a system for inferring the location of users in online social media platforms using social network analysis. A social network is first extracted from data from at least one social media platform. A mapping is generated from each user to the user's estimated geographical location in the social network, resulting in an estimated location mapping. A mapping is generated from each user to the user's known geographical location, if known, resulting in a known location mapping. The estimated location mapping is updated to match the known location mapping. The location for each user j in the known location mapping is updated in a current estimated location mapping. The final geographical locations of users connected with j are estimated using a geometric median metric. Finally, the final estimated geographical locations of users connected with j are mapped into the social network.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This is a Non-Provisional patent application of U.S. Provisional Application No. 61/809,160, filed in the United States on Apr. 5, 2013, entitled, “Inferring the Location of Users in Online Social Media Platforms Using Social Network Analysis.”
  • GOVERNMENT LICENSE RIGHTS
  • This invention was made with government support under U.S. Government Contract Number BFL1 OSI, IARPA Open Source Indicator. The government has certain rights in the invention.
  • BACKGROUND OF THE INVENTION
  • (1) Field of Invention
  • The present invention relates to a system for inferring the location of users in online social media platforms and, more particularly, to a system for inferring the location of users in online social media platforms using social network analysis.
  • (2) Description of Related Art
  • Social media provides a new data source for observing the rapidly changing focus of public interests. Detecting the location from which a message originates provides a powerful way of aggregating content spatially. This spatial focus enables detecting regional differences, detecting emerging trends specific to regions, or even measuring information flow. However, little content is connected with ground truth location data.
  • Several works have examined location inference on the Twitter™ social media platform. Cheng et al. (see the List of Incorporated Cited Literature References, Literature Reference No. 1), Mahmud et al. (see Literature Reference No. 6), and Ikawa et al. (see Literature Reference No. 5) have examined using the text content produced by a user for inferring their location. While this has produced good results, the approach is limited to only those users who generated text that contained geographic references. Furthermore, their approaches were only tested on English.
  • Sadilek et al. (see Literature Reference No. 9) perform social network inference in order to estimate the user's true location. However, their approach requires that both users' locations be known in order to estimate the social relationship, which limits the approach to only those individuals with known locations.
  • Davis Jr. et al. (see Literature Reference No. 2) use a user's follower network in Twitter™ to perform location inference. They use only one round of standard label propagation to infer the location, which can result in limited coverage. Furthermore, their work was tested only on a small set of users, so whether their work is generalizable to larger sets of users remains untested.
  • Hetch et al. (see Literature Reference No. 4) and Pontes et al. (see Literature Reference No. 8) infer user locations from self-provided location information in Twitter™ and FourSquare™, respectively. While Pontes et al. (see Literature Reference No. 8) reported more than 90% coverage of users with this method, no attempt was made to infer the locations of the remaining users. Hetch et al. (see Literature Reference No. 4) found significantly less information in Twitter™ with a high error rate.
  • Each of the prior methods described above exhibits limitations that make it incomplete. Thus, a continuing need exists for a method for inferring a user's location from their social network and a small amount of ground truth data using an inferred social network designed to maximize the location inference accuracy.
  • SUMMARY OF THE INVENTION
  • The present invention relates to a system for inferring the location of users in online social media platforms and, more particularly, to a system for inferring the location of users in online social media platforms using social network analysis. The system comprises one or more processors and a memory having instructions such that when the instructions are executed, the one or more processors perform multiple operations. The system extracts a social network from data from at least one social media platform, wherein the social network comprises a plurality of users connected through social relationships, and wherein each user in the plurality of users has an identity on each social media platform. A mapping in the social network is generated from each user in the plurality of users to the user's estimated geographical location, resulting in an estimated location mapping Est. Then a mapping in the social network is generated from each user in the plurality of users having known geographical location data to the user's known geographical location, resulting in a known location mapping SL. The estimated location mapping Est is updated to have the same mapping as the known location mapping SL until a predetermined convergence criterion is met. For each user j in the plurality of users having a mapping in the known location mapping SL, the location of j in a current estimated location mapping Est′ is updated to be the location in the known location mapping SL. For each user in the plurality of users in a group of users N having a social relationship with j and having a mapping in the estimated location mapping Est, the estimated geographical location of the users in N is added to a set of locations NL. A set of final geographical locations of the users in N is estimated using a geometric median metric. The users in N are mapped to their final estimated geographical locations in the social network.
  • In another aspect, the system provides a subgraph of the social network for N and the set of locations NL. The current estimated location mapping Est′ is updated with the final estimated geographical locations of the users in N, and the mappings in the estimated location mapping Est are replaced with those in the current estimated location mapping Est′.
  • In another aspect, the system combines a user's identities from all social media platforms, such that each user is represented as a single individual in the social network.
  • In another aspect, estimated geographical location data and known geographical location data from all social media platforms for a user are merged.
  • In another aspect, only those users in N who also have social relationships with each other are selected for geographical location estimation using the geometric median metric.
  • In another aspect, the present invention also comprises a method for causing a processor to perform the operations described herein.
  • In yet another aspect, the present invention also comprises a computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform the operations described herein.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:
  • FIG. 1 is a flow diagram for inferring the location of users in online social media platforms using social network analysis according to the principles of the present invention;
  • FIG. 2 is a table of performance metrics for user location inference according to the principles of the present invention;
  • FIG. 3 is an illustration of a data processing system according to the principles of the present invention; and
  • FIG. 4 is an illustration of a computer program product according to the principles of the present invention.
  • DETAILED DESCRIPTION
  • The present invention relates to a system for inferring the location of users in online social media platforms and, more particularly, to a system for inferring the location of users in online social media platforms using social network analysis. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses, in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded with the widest scope consistent with the principles and novel features disclosed herein.
  • In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
  • The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification, (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
  • Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
  • Please note, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counter-clockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object. As such, as the present invention is changed, the above labels may change their orientation.
  • Before describing the invention in detail, first a list of cited literature references used in the description is provided. Next, a description of various principal aspects of the present invention is provided. Finally, specific details of the present invention are provided to give an understanding of the specific aspects.
  • (1) List of Incorporated Cited Literature References
  • The following references are cited throughout this application. For clarity and convenience, the references are listed herein as a central resource for the reader. The following references are hereby incorporated by reference as though fully included herein. The references are cited in the application by referring to the corresponding literature reference number, as follows:
    • 1. Cheng, Z.; Caverlee, J.; and Lee, K. 2010. You are where you tweet: a content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM international conference on Information and knowledge management, 759-768. ACM.
    • 2. Davis Jr, C.; Pappa, G.; de Oliveira, D.; and de L Arcanjo, F. 2011. Inferring the location of twitter messages based on user relationships. Transactions in GIS 15(6):735-751.
    • 3. Goldenberg, J., and Levy, M. 2009. Distance is not dead: Social interaction and geographical distance in the internet era. arXiv preprint arXiv:0906.3202.
    • 4. Hecht, B.; Hong, L.; Suh, B.; and Chi, E. 2011. Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles. In Proceedings of the 2011 annual conference on Human factors in computing systems, 237-246. ACM.
    • 5. Ikawa, Y.; Enoki, M.; and Tatsubori, M. 2012. Location inference using microblog messages. In Proceedings of the 21st international conference companion on World Wide Web, 687-690. ACM.
    • 6. Mahmud, J.; Nichols, J.; and Drews, C. 2012. Where is this tweet from? inferring home locations of twitter users. Proc AAAI ICWSM 12.
    • 7. Mok, D.; Wellman, B.; and Carrasco, J. 2010. Does distance matter in the age of the internet? Urban Studies 47(13): 2747-2783.
    • 8. Pontes, T.; Vasconcelos, M.; Almeida, J.; Kumaraguru, P.; and Almeida, V. 2012. We know where you live: Privacy characterization of foursquare behavior. In UbiComp '12.
    • 9. Sadilek, A.; Kautz, H.; and Bigham, J. 2012. Finding your friends and following them to where you are. In Proceedings of the fifth ACM international conference on Web search and data mining, 723-732. ACM.
    • 10. Zhu, Xiaojin, and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
    • 11. Vincenty, Thaddeus. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey review 23.176 (1975): 88-93.
    • 12. Ronkainen, Oja, and Orponen. 2003. Computation of the multivariate Oja median. Developments in Robust Statistics, 344-359.
    • 13. Vardi and Zhang. 2000. The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences. 97(4): 1423-6.
  • (2) Principal Aspects
  • The present invention has three “principal” aspects. The first is a system for inferring the location of users in online social media platforms. The system is typically in the form of a computer system, computer component, or computer network operating software or in the form of a “hard-coded” instruction set. This system may take a variety of forms with a variety of hardware devices and may include computer networks, handheld computing devices, cellular networks, satellite networks, and other communication devices. As can be appreciated by one skilled in the art, this system may be incorporated into a wide variety of devices that provide different functionalities. The second principal aspect is a method for inferring the location of users in online social media platforms. The third principal aspect is a computer program product. The computer program product generally represents computer-readable instruction means (instructions) stored on a non-transitory computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape. Other, non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories.
  • The term “instructions” as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules. Non-limiting examples of “instructions” include computer program code (source or object code) and “hard-coded” electronics (i.e., computer operations coded into a computer chip). The “instructions” may be stored on any non-transitory computer-readable medium such as a floppy disk, a CD-ROM, a flash drive, and in the memory of a computer.
  • (3) Specific Details
  • Described is a system that leverages social relationships in online social media platforms. Recent work has demonstrated that the locality of social relationships still permeates the online space with a strong bias towards having online social relationships with individuals that are nearby (see Literature Reference Nos. 3 and 7). Accordingly, when the locations of a user's relations are aggregated, they present a noisy, but useful, source of data from which the user's location can be inferred.
  • FIG. 1 illustrates the information flow according to the principles of the present invention. The method begins with one or more social network platforms. FIG. 1 depicts multiple social media platforms 100, though the method is generalizable from a single platform to any number of platforms. All that is required is that the social media platforms 100 have some explicitly stated or implicitly visible social relationships.
  • Once the social media platforms 100 have been selected, two stages proceed in parallel. A social network inference stage 102 is responsible for extracting the social network from the available data from the social media platform 100. As a non-limiting example and in experimental studies using the system according to the principles of the present invention, both the Twitter™ and FourSquare™ social media platforms were used.
  • The Twitter™ network offers implicit and explicit networks. It was found that the implicit network is generated by observing users who have both communicated with each other at least once. This implicit network is referred to as the mention network. In contrast, FourSquare™ allows users to explicitly state their friendship with each other. These relationships can be accessed using the FourSquare™ application programming interface (API). As can be appreciated by one skilled in the art, the present invention is also generalizable to other types of social media platforms that have different social network constraints. For instance, the present invention was also tested using the Twitter™ follower network, which generated good performance, but not as high as the performance of the mention network.
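  • As an illustration of the mention network described above, the following is a minimal sketch that keeps an edge only when two users have each mentioned the other at least once. The function name `build_mention_network` and the input format (a list of author/mentioned-user pairs) are assumptions made for this example and are not taken from the specification.

```python
from collections import defaultdict


def build_mention_network(mentions):
    """Build an undirected network from directed mention pairs.

    `mentions` is an iterable of (author, mentioned_user) pairs (assumed format).
    An edge a-b is kept only if a mentioned b AND b mentioned a.
    """
    # Ignore self-mentions and collapse repeated mentions.
    directed = {(a, b) for a, b in mentions if a != b}
    network = defaultdict(set)
    for a, b in directed:
        if (b, a) in directed:  # reciprocal mention found
            network[a].add(b)
            network[b].add(a)
    return dict(network)


# Hypothetical sample data: only alice and bob mention each other.
mentions = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"), ("dave", "dave")]
network = build_mention_network(mentions)
```

  • In this sketch, carol receives a mention but never replies, so she does not appear in the resulting network.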
  • Concurrently with the social network inference stage 102, the social media platform's 100 data is mined to obtain a list of ground truth locations for users. These locations form the seed locations from which other users' locations may be inferred in an exact location extraction stage 104. For the Twitter™ platform, global positioning system (GPS)-tagged messages were used as a source of ground truth information. These messages occur rarely, accounting for approximately 0.7% of all messages. For each user with GPS-tagged tweets, those users that have at least five messages that all occur within a 30 kilometer (km) distance of each other are selected. This effectively removes individuals with too few tweets and those users who travel frequently and tweet from many locations. From the remaining users, each user's location is estimated as the location that is the L1 multivariate median (i.e., geometric multivariate median) among the locations of their tweets, as will be described in detail below.
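  • The seed-user filter described above can be sketched as follows. This example reads the rule as requiring at least five GPS-tagged messages with every pair of messages no more than 30 km apart, which is one possible interpretation. It also substitutes the simpler haversine great-circle distance for Vincenty's formula; the function names and sample coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_seed_user(points, min_messages=5, max_spread_km=30.0):
    """True if the user has enough GPS-tagged messages and all are mutually close."""
    if len(points) < min_messages:
        return False
    return all(haversine_km(p, q) <= max_spread_km
               for p, q in combinations(points, 2))


# Hypothetical users: one stays near a single city, one travels, one rarely posts.
home_body = [(34.05, -118.24), (34.06, -118.25), (34.04, -118.23),
             (34.05, -118.26), (34.07, -118.24)]
traveler = home_body[:4] + [(40.71, -74.00)]
rare_poster = home_body[:3]
```

  • Under these assumptions, only the first user would be retained as a seed; the traveler fails the distance check and the rare poster fails the message count.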
  • In the FourSquare™ platform, users may specify their location in their profile. Literature Reference No. 8 reported that this location is reliable, and the source of information covers over 90% of the users. Through testing it was confirmed that this self-reported location matched the location computed from non-FourSquare™ data for that user. Therefore, that location data was selected for use in the system according to the principles of the present invention. The data itself is in the form of a text name, which must then be converted to a specific coordinate location. The Google™ Geocoding API developed by Google™ along with the GeoNames geographical database was used to map each name to a canonical latitude and longitude.
  • Referring to FIG. 1, a single individual may have multiple identities on different social media platforms 100. A social network merger stage 106 combines those identities together so that an individual is represented only as a single node in the inferred social network. In one aspect, the system according to the principles of the present invention uses metadata provided by users to link their FourSquare™ and Twitter™ accounts. This approach is generalizable to work on other social media platforms such as LinkedIn™ and Tumblr™. However, additional methods could be used that infer common identities from the users' profiles or discussed content.
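  • One way to picture the social network merger stage 106 is the sketch below, which relabels each linked FourSquare™ identity with its Twitter™ identity before taking the union of the two edge lists. The `linked_accounts` dictionary, the tuple-based node labels, and the function name are assumptions for illustration only.

```python
def merge_networks(twitter_edges, foursquare_edges, linked_accounts):
    """Combine two edge lists into one undirected network.

    `linked_accounts` maps a FourSquare id to the same person's Twitter id
    (an assumed representation of the user-provided linking metadata).
    Nodes are labeled as (platform, id) tuples so unlinked ids never collide.
    """
    def canonical(platform, user_id):
        if platform == "foursquare" and user_id in linked_accounts:
            return ("twitter", linked_accounts[user_id])
        return (platform, user_id)

    merged = {}
    sources = (("twitter", twitter_edges), ("foursquare", foursquare_edges))
    for platform, edges in sources:
        for a, b in edges:
            u, v = canonical(platform, a), canonical(platform, b)
            if u == v:
                continue  # skip self-loops created by merging
            merged.setdefault(u, set()).add(v)
            merged.setdefault(v, set()).add(u)
    return merged


# Hypothetical data: "f_ann" and "t_ann" are the same person.
merged = merge_networks(
    twitter_edges=[("t_ann", "t_bo")],
    foursquare_edges=[("f_ann", "f_cy")],
    linked_accounts={"f_ann": "t_ann"},
)
```

  • After merging, the linked individual appears as a single node whose relations come from both platforms.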
  • Multiple social media platforms 100 may report different location data for the same user. An exact location merger stage 108 merges the combined information together, using all available location data from the different social media platforms 100 as well as the discovery of an individual's multiple identities on those social media platforms 100. A priority-based ranking was used, reporting the location extracted from GPS data from Twitter™, and, if not available, using the location reported on FourSquare™. Further work could generalize this to fuse the two based on the amount of GPS data available, or to use the FourSquare™ location as a prior when computing the location from the GPS data.
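  • The priority-based ranking described above can be sketched in a few lines: a GPS-derived location is used when present, and the profile location is used otherwise. The dictionary-based representation and the names used here are assumptions for this example.

```python
def merge_locations(gps_locations, profile_locations):
    """Return one location per user, preferring GPS-derived locations.

    Both arguments map a user id to a (lat, lon) tuple (assumed format).
    """
    merged = dict(profile_locations)  # start from the lower-priority source
    merged.update(gps_locations)      # GPS entries overwrite profile entries
    return merged


# Hypothetical data: "ann" has both sources, "bo" has only a profile location.
gps = {"ann": (34.05, -118.24)}
profile = {"ann": (37.77, -122.42), "bo": (40.71, -74.00)}
seed_locations = merge_locations(gps, profile)
```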
  • Once the network has been constructed and seed locations have been computed, the locations of the remaining individuals in the social network may be inferred in a location inference stage 110 to generate location estimates. The location estimates are then used in the final location estimates stage 112. The present invention comprises a process that derives from a standard label propagation framework, but takes geography into account when selecting the new label. Literature Reference No. 10 provides a description of a standard label propagation framework. The process according to the principles of the present invention proceeds as follows:
      • 1. Let SN be the social network
      • 2. Let Est be a mapping from an individual to their estimated location
      • 3. Let SL be a mapping from an individual to their known location (seed locations)
      • 4. Update Est to have the same mapping as SL
      • 5. Repeat until some convergence criterion is met
        • a) Let Est′ be the updated individual->location mapping for this iteration
        • b) For each individual j
          • i. if j has a mapping in SL
            • 1. Update the location of j in Est′ to be the location in SL
            • 2. Continue to next individual in step (b)
          • ii. Let N be the set of individuals who have a social relationship with j
          • iii. Let NL be a set of locations
          • iv. For each individual k in N
            • 1. If k has a mapping in Est
            •  a. Add the location of k in Est to NL
          • v. Estimate the location of j using a geometric median, providing the subgraph of the social network for N and the locations NL
          • vi. Update Est′ with the new location of j
        • c) Replace the mappings in Est with those in Est′
          The estimate of individual j's location is the geometric median of j's neighbors' locations, where k is used to denote the neighbors, and j denotes the user that is being estimated in step b) above.
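  • The steps above can be sketched in code as follows. This is a minimal illustration under several assumptions: the network is an adjacency dictionary, the convergence criterion is a fixed number of iterations, distances use the haversine formula rather than Vincenty's formula, and the geometric median is searched only among the neighbors' own locations. All function and variable names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geometric_median(points):
    """Candidate point with the smallest total distance to all points."""
    return min(points, key=lambda x: sum(haversine_km(x, p) for p in points))


def propagate_locations(network, seeds, iterations=4):
    """Spatial label propagation following steps 1-5 of the listing."""
    est = dict(seeds)                                  # step 4
    for _ in range(iterations):                        # step 5
        new_est = {}                                   # step 5a (Est')
        for j in set(network) | set(seeds):            # step 5b
            if j in seeds:                             # step 5.b.i
                new_est[j] = seeds[j]
                continue
            # steps 5.b.ii-iv: locations of neighbors that are already located
            neighbor_locs = [est[k] for k in network.get(j, []) if k in est]
            if neighbor_locs:                          # steps 5.b.v-vi
                new_est[j] = geometric_median(neighbor_locs)
        est = new_est                                  # step 5c
    return est


# Hypothetical network: dee has three seeded friends, eve only knows dee.
network = {"ann": ["dee"], "bo": ["dee"], "cy": ["dee"],
           "dee": ["ann", "bo", "cy", "eve"], "eve": ["dee"], "fay": []}
seeds = {"ann": (34.05, -118.24), "bo": (34.10, -118.30), "cy": (40.71, -74.00)}
first_round = propagate_locations(network, seeds, iterations=1)
final = propagate_locations(network, seeds, iterations=4)
```

  • In this sketch, dee is located in the first iteration, eve is located only in a later iteration once dee has an estimate, and fay, who has no relations, is never located.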
  • Key to the process outlined above is the estimation step in (5.b.v). Traditional label propagation would select the new location for individual j as the location that appears most frequently amongst its neighbors. However, this ignores the fact that the labels are related. Because the labels are actually locations, they may be spatially compared to reveal more information about where the individual could be located. Therefore, the system according to the principles of the present invention uses the geometric median to estimate the new location. Furthermore, two strategies are adopted: (1) using the geometric multivariate median only; and (2) first applying a novel heuristic, which is referred to as the “Social Triangle Median.” Given a set of points $x_1, \ldots, x_n$ in a space $M$, the geometric multivariate median is defined as:
  • $$m = \operatorname*{arg\,min}_{x \in M} \sum_{i=1}^{n} w_i \, d(x, x_i),$$
  • where $w_i$ is the weight (or multiplicity) of point $i$, $d$ is a distance function, and $x$ and $x_i$ are two points in the space $M$. Because distances on a globe are being measured, Euclidean distance cannot be applied. Instead, geodetic distance is computed according to the curvature of the Earth using Vincenty's formula (see Literature Reference No. 11).
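  • A weighted version of this definition can be sketched as shown below. Two simplifications are assumed: the search space is restricted to the input points themselves rather than all of the space M, and the haversine great-circle distance stands in for Vincenty's formula. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def weighted_geometric_median(points, weights=None):
    """Input point minimizing the weighted sum of distances to all points."""
    if not points:
        raise ValueError("at least one point is required")
    if weights is None:
        weights = [1.0] * len(points)

    def cost(x):
        return sum(w * haversine_km(x, p) for p, w in zip(points, weights))

    return min(points, key=cost)


# Hypothetical neighbor locations: three near one city, one far away.
points = [(34.0, -118.0), (34.0, -118.1), (34.0, -118.2), (40.7, -74.0)]
unweighted = weighted_geometric_median(points)
heavy_outlier = weighted_geometric_median(points, weights=[1, 1, 1, 10])
```

  • With equal weights the distant point has little influence on the result, which is what makes the median robust to a noisy neighbor. Giving that point a weight of 10 in this example is enough to pull the median to it.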
  • The second heuristic leverages work in social theory suggesting that the closest part of an individual's social group should exhibit triadic closure (i.e., if A is friends with B and C, then B and C will also be friends). Therefore, given an individual's relations to others in the inferred network according to the principles of the present invention, the network is filtered before the median is computed such that only those connected individuals who are also friends with each other (i.e., exhibit triadic closure) will have their locations used for inference. The “social triangle median” is otherwise the same geometric median used elsewhere; the distinction is that edges from the social network which are not part of a closed triangle are removed.
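  • The triadic closure filter can be sketched as follows, assuming the network is stored as a dictionary mapping each individual to a set of neighbors with no self-loops. A neighbor k of j is kept only if k is also connected to at least one other neighbor of j, so that the edge j-k is part of a closed triangle. The function name is hypothetical.

```python
def triangle_neighbors(network, j):
    """Neighbors of j whose edge to j lies on at least one closed triangle."""
    neighbors = network.get(j, set())
    return {
        k for k in neighbors
        # k must share at least one other neighbor with j
        if network.get(k, set()) & (neighbors - {k, j})
    }


# Hypothetical network: j, a, and b form a triangle; c is a singleton friendship.
network = {
    "j": {"a", "b", "c"},
    "a": {"j", "b"},
    "b": {"j", "a"},
    "c": {"j"},
}
kept = triangle_neighbors(network, "j")
```

  • Only the locations of the kept neighbors would then be passed to the geometric median, which is why this strategy can leave individuals with only singleton friendships without an estimate.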
  • The location inference process (defined by the geometric multivariate median above) is not expected to converge and, therefore, some stopping criterion is needed. As non-limiting examples, a criterion could be a fixed number of iterations, the number of users who have been located, or the percentage change in users with new locations. In experimental studies, it was found that the network was sufficiently covered after a few iterations (usually four), after which the performance did not improve.
  • Referring to FIG. 1, in the final location estimates stage 112, the individuals' locations as determined by the geometric multivariate median are emitted, and individuals are mapped to the final, estimated locations. These locations can serve as strong priors as to where the users' messages come from.
  • The method described herein was tested using a 10% sample of Twitter™ messages from April 2012 to November 2012. This sample produced a network of bidirectional user mentions with 47,760,573 users and with 254,263,081 inferred social relationships between those users. Using the FourSquare™ API, user profiles and friends of users were crawled, resulting in a network with 3,976,819 users and 17,619,191 relations between these users. Using the information about linked accounts with the two social media platforms, the networks were combined into a single social network that had 50,741,905 unique users, with approximately 1.6 million (M) users having identities in both platforms. The combination of networks also served to validate the social relationship inference on Twitter™, with approximately 7.5M users who had edges in the Twitter™ mention network also having explicitly indicated friendships in the FourSquare™ social network. Locations were extracted for 2,554,064 users in Twitter™, which was approximately 5% of the network.
  • For evaluation, five-fold cross validation was used. Given the set of seed locations, 80% of those locations were used, and the full inference process depicted in FIG. 1 was performed, stopping after four iterations. The locations of the held-out 20% were then compared with their true locations. The process was repeated using a distinct 20% held-out set for each fold such that all users were evaluated once.
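  • The five-fold protocol described above can be sketched as a generator that yields, for each fold, the 80% of seed users kept as training seeds and the 20% held out for comparison. The shuffling seed and the function name are assumptions for this example.

```python
import random


def five_fold_splits(seed_users, k=5, rng_seed=0):
    """Yield (training_users, held_out_users) so each user is held out once."""
    users = sorted(seed_users)               # fixed order before shuffling
    random.Random(rng_seed).shuffle(users)   # reproducible shuffle
    folds = [users[i::k] for i in range(k)]  # k disjoint folds
    for fold in folds:
        held_out = set(fold)
        training = set(users) - held_out
        yield training, held_out


# Hypothetical seed users.
splits = list(five_fold_splits([f"user{i}" for i in range(10)]))
```

  • For each split, the inference process would be run with only the training users as seeds, and the estimates for the held-out users would be compared with their known locations.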
  • Five metrics were used for evaluation. First, the median error in estimated distance was considered. The distribution of errors follows a power law distribution and, therefore, the median is a preferable estimate of performance compared with the mean. Second, the percentage of the network that was found after four iterations was considered. This loosely corresponds to the recall metric, but due to the stopping criterion, represents a soft upper bound on the number of users that could be located. The remaining three metrics were all based on name matching. Each latitude and longitude was mapped to a city, state, and country name using a reverse geocoding process. Reverse geocoding is the process of converting a point location (latitude, longitude) to a readable address or place name. A comparison was made with regard to whether the names mapped to the inferred location matched those of the true location. This evaluation was difficult due to the nature of naming locations; the irregularities in naming boundaries can cause locations that are very close in distance to have distinct names. Furthermore, reverse geocoding is not an exact process and can introduce noise.
  • The table 200 in FIG. 2 reports the performance of the experimental study. Specifically, the table 200 highlights three trends in performance. First, the social triangle 202 heuristic produces the highest performance on the matching metrics 204 and median error metrics 206. By limiting the inference to only those individuals estimated to be in the closer social circle, noise is reduced and accuracy improves. However, due to the constraint that users must have relationships with triadic closure, the method suffers from the lowest recall (i.e., percent located 208), only estimating users for 54% of the network. Further experiments revealed that increasing the number of iterations does not increase this percentage substantially.
  • In the second trend, using the geometric median 210 (also referred to as geometric multivariate median) alone generated significantly better performance at locating more users (i.e., percent located 208) than when the social triangle 202 is applied. This suggests that despite the noise, the additional locations of users with singleton friendships can still provide sufficient data to estimate an individual's true location. Further experiments with other medians, such as Oja's Simplex Median (see Literature Reference No. 12 for a description of Oja's Simplex Median) and standard label propagation, showed that the geometric median offered the best performance on the matching metrics and median error metrics.
  • Last, the addition of FourSquare™ relationships (geometric median+FourSquare™ 212) increased the recall by 0.7% while only creating a small drop in performance. Though a small increase percentage-wise, this represents an increase in coverage of over 335,000 new individuals. Further experimentation included adding location information from FourSquare™ profiles. However, the locations of 846,000 additional individuals did not significantly change the performance.
  • For comparison, the system according to the principles of the present invention was compared against an oracle-based method (upper bound column 214) that always estimates the location of an individual as the location of their closest neighbor. Due to the presence of noise in the neighbors, this should not be considered a true upper bound on performance; however, it does represent what performance would be expected if the closest location was always selected from among the neighboring locations at algorithm initialization time.
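  • The oracle-based comparison can be sketched as shown below: given an individual's true location, the oracle picks whichever neighboring location is closest to it. The haversine distance is used here as a simplifying assumption in place of Vincenty's formula, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def oracle_estimate(true_location, neighbor_locations):
    """Neighbor location nearest to the true location (uses ground truth)."""
    return min(neighbor_locations,
               key=lambda p: haversine_km(true_location, p))


# Hypothetical individual with one nearby and one distant neighbor.
true_location = (34.05, -118.24)
neighbors = [(40.71, -74.00), (34.10, -118.30)]
best = oracle_estimate(true_location, neighbors)
oracle_error_km = haversine_km(true_location, best)
```

  • Because the oracle consults the true location, it is usable only for evaluation and not as an inference method.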
  • In summary, the system according to the principles of the present invention infers the locations of users in arbitrary online social media platforms where users are connected by inferred or explicitly stated social relationships, and where at least a small number of users share their true or estimated location. The invention significantly advances the state-of-the-art by (1) providing better data coverage than is available using current methods and (2) being able to infer locations from users whose associated content in the social media platform offers no indication of their geographic vicinity.
  • An example of a computer system 300 in accordance with one aspect is shown in FIG. 3. The computer system 300 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one aspect, certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 300. When executed, the instructions cause the computer system 300 to perform specific actions and exhibit specific behavior, such as described herein.
  • The computer system 300 may include an address/data bus 302 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 304, are coupled with the address/data bus 302. The processor 304 is configured to process information and instructions. In one aspect, the processor 304 is a microprocessor. Alternatively, the processor 304 may be a different type of processor such as a parallel processor, or a field programmable gate array.
  • The computer system 300 is configured to utilize one or more data storage units. The computer system 300 may include a volatile memory unit 306 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 302, wherein the volatile memory unit 306 is configured to store information and instructions for the processor 304. The computer system 300 further may include a non-volatile memory unit 308 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with the address/data bus 302, wherein the non-volatile memory unit 308 is configured to store static information and instructions for the processor 304. Alternatively, the computer system 300 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing. In an embodiment, the computer system 300 also may include one or more interfaces, such as an interface 310, coupled with the address/data bus 302. The one or more interfaces are configured to enable the computer system 300 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
  • In one aspect, the computer system 300 may include an input device 312 coupled with the address/data bus 302, wherein the input device 312 is configured to communicate information and command selections to the processor 304. In accordance with one aspect, the input device 312 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. Alternatively, the input device 312 may be an input device other than an alphanumeric input device. In one aspect, the computer system 300 may include a cursor control device 314 coupled with the address/data bus 302, wherein the cursor control device 314 is configured to communicate user input information and/or command selections to the processor 304. In one aspect, the cursor control device 314 is implemented using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen. The foregoing notwithstanding, in one aspect, the cursor control device 314 is directed and/or activated via input from the input device 312, such as in response to the use of special keys and key sequence commands associated with the input device 312. In an alternative aspect, the cursor control device 314 is configured to be directed or guided by voice commands.
  • In one aspect, the computer system 300 further may include one or more optional computer usable data storage devices, such as a storage device 316, coupled with the address/data bus 302. The storage device 316 is configured to store information and/or computer executable instructions. In one aspect, the storage device 316 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive (“HDD”), floppy diskette, compact disk read only memory (“CD-ROM”), digital versatile disk (“DVD”)). Pursuant to one aspect, a display device 318 is coupled with the address/data bus 302, wherein the display device 318 is configured to display video and/or graphics. In one aspect, the display device 318 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
  • The computer system 300 presented herein is an example computing environment in accordance with one aspect. However, the non-limiting example of the computer system 300 is not strictly limited to being a computer system. For example, one aspect provides that the computer system 300 represents a type of data processing analysis that may be used in accordance with various aspects described herein. Moreover, other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in one aspect, one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer. In one implementation, such program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, one aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.
  • An illustrative diagram of a computer program product embodying the present invention is depicted in FIG. 4. As a non-limiting example, the computer program product is depicted as either a floppy disk 400 or an optical disk 402. However, as mentioned previously, the computer program product generally represents computer readable code (i.e., instruction means or instructions) stored on any compatible non-transitory computer readable medium.

Claims (15)

1. A system for inferring the location of users of online social media platforms, the system comprising:
one or more processors and a non-transitory computer-readable medium having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
(a) extracting a social network from data from at least one social media platform, wherein the social network comprises a plurality of users connected through social relationships, and wherein each user in the plurality of users has an identity on each social media platform;
(b) generating a mapping in the social network from each user in the plurality of users to the user's estimated geographical location, resulting in an estimated location mapping Est;
(c) generating a mapping in the social network from each user in the plurality of users having known geographical location data to the user's known geographical location, resulting in a known location mapping SL;
(d) updating the estimated location mapping Est to have the same mapping as the known location mapping SL;
(e) for a user j in the plurality of users having a mapping in the known location mapping SL, updating the location of j in Est′ to be the location in the known location mapping SL, wherein Est′ is an updated individual-location mapping for a current iteration;
(f) repeating operation (e) for each user in the plurality of users;
wherein N is a set of users having a social relationship with user j, and
wherein NL is a set of locations;
(g) for each user k in N having a mapping in Est, adding the estimated geographical location of k in Est to NL;
(h) estimating a new geographical location of j using a geometric median metric;
(i) updating Est′ with the new geographical location of j;
(j) iterating through operations (e) through (h) until a stopping criterion is met;
(k) replacing the mappings in Est with those in Est′; and
(l) mapping users to final, estimated locations based on the mappings in Est.
2. The system as set forth in claim 1, wherein the one or more processors further perform an operation of:
providing a subgraph of the social network for N and the set of locations NL.
3. The system as set forth in claim 2, wherein the one or more processors further perform an operation of combining a user's identities from all social media platforms, such that each user is represented as a single individual in the social network.
4. The system as set forth in claim 3, wherein the one or more processors further perform an operation of merging estimated geographical location data and known geographical location data from all social media platforms for a user.
5. The system as set forth in claim 4, wherein the one or more processors further perform an operation of selecting only those users in N who also have social relationships with each other for geographical location estimation using the geometric median metric.
6. A computer-implemented method for inferring the location of users of online social media platforms, comprising an act of:
causing one or more processors to execute instructions stored on a non-transitory memory such that upon execution, the one or more processors performs operations of:
(a) extracting a social network from data from at least one social media platform, wherein the social network comprises a plurality of users connected through social relationships, and wherein each user in the plurality of users has an identity on each social media platform;
(b) generating a mapping in the social network from each user in the plurality of users to the user's estimated geographical location, resulting in an estimated location mapping Est;
(c) generating a mapping in the social network from each user in the plurality of users having known geographical location data to the user's known geographical location, resulting in a known location mapping SL;
(d) updating the estimated location mapping Est to have the same mapping as the known location mapping SL;
(e) for a user j in the plurality of users having a mapping in the known location mapping SL, updating the location of j in Est′ to be the location in the known location mapping SL, wherein Est′ is an updated individual-location mapping for a current iteration;
(f) repeating operation (e) for each user in the plurality of users;
wherein N is a set of users having a social relationship with user j, and
wherein NL is a set of locations;
(g) for each user k in N having a mapping in Est, adding the estimated geographical location of k in Est to NL;
(h) estimating a new geographical location of j using a geometric median metric;
(i) updating Est′ with the new geographical location of j;
(j) iterating through operations (e) through (h) until a stopping criterion is met;
(k) replacing the mappings in Est with those in Est′; and
(l) mapping users to final, estimated locations based on the mappings in Est.
7. The method as set forth in claim 6, wherein the one or more processors further performs an operation of:
providing a subgraph of the social network for N and the set of locations NL.
8. The method as set forth in claim 7, wherein the one or more processors further performs an operation of combining a user's identities from all social media platforms, such that each user is represented as a single individual in the social network.
9. The method as set forth in claim 8, wherein the one or more processors further performs an operation of merging estimated geographical location data and known geographical location data from all social media platforms for a user.
10. The method as set forth in claim 9, wherein the one or more processors further performs an operation of selecting only those users in N who also have social relationships with each other for geographical location estimation using the geometric median metric.
11. A computer program product for inferring the location of users of online social media platforms, the computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform operations of:
(a) extracting a social network from data from at least one social media platform, wherein the social network comprises a plurality of users connected through social relationships, and wherein each user in the plurality of users has an identity on each social media platform;
(b) generating a mapping in the social network from each user in the plurality of users to the user's estimated geographical location, resulting in an estimated location mapping Est;
(c) generating a mapping in the social network from each user in the plurality of users having known geographical location data to the user's known geographical location, resulting in a known location mapping SL;
(d) updating the estimated location mapping Est to have the same mapping as the known location mapping SL;
(e) for a user j in the plurality of users having a mapping in the known location mapping SL, updating the location of j in Est′ to be the location in the known location mapping SL, wherein Est′ is an updated individual-location mapping for a current iteration;
(f) repeating operation (e) for each user in the plurality of users;
wherein N is a set of users having a social relationship with user j, and
wherein NL is a set of locations;
(g) for each user k in N having a mapping in Est, adding the estimated geographical location of k in Est to NL;
(h) estimating a new geographical location of j using a geometric median metric;
(i) updating Est′ with the new geographical location of j;
(j) iterating through operations (e) through (h) until a stopping criterion is met;
(k) replacing the mappings in Est with those in Est′; and
(l) mapping users to final, estimated locations based on the mappings in Est.
12. The computer program product as set forth in claim 11, further comprising instructions for causing the processor to perform an operation of:
providing a subgraph of the social network for N and the set of locations NL.
13. The computer program product as set forth in claim 12, further comprising instructions for causing the processor to perform an operation of combining a user's identities from all social media platforms, such that each user is represented as a single individual in the social network.
14. The computer program product as set forth in claim 13, further comprising instructions for causing the processor to perform an operation of merging estimated geographical location data and known geographical location data from all social media platforms for a user.
15. The computer program product as set forth in claim 14, further comprising instructions for causing the processor to perform an operation of selecting only those users in N who also have social relationships with each other for geographical location estimation using the geometric median metric.
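As an illustration only, operations (d) through (l) recited in claims 1, 6, and 11 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the social network is assumed to be a dictionary from each user to a set of neighbors, locations are assumed to be (latitude, longitude) pairs in degrees, distance is assumed to be great-circle (haversine) distance, the geometric median is assumed to be chosen from among the neighbor locations themselves, and the stopping criterion is assumed to be an iteration cap or an unchanged mapping. The names `haversine_km`, `geometric_median`, `propagate_locations`, and `max_iterations` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))


def geometric_median(points):
    """Return the member of `points` with the smallest total distance to the rest.

    Assumption: the median is restricted to the candidate points themselves.
    """
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))


def propagate_locations(graph, known, max_iterations=4):
    """graph: user -> set of neighbors; known: user -> (lat, lon), i.e. SL."""
    est = dict(known)                    # (d): Est starts as a copy of SL
    for _ in range(max_iterations):      # (j): assumed stopping criterion
        est_next = dict(known)           # (e)-(f): Est' keeps every known location
        for j in graph:
            if j in known:
                continue
            # (g): collect the current estimates of j's neighbors into NL
            nl = [est[k] for k in graph[j] if k in est]
            if nl:                       # (h)-(i): new estimate for j goes into Est'
                est_next[j] = geometric_median(nl)
        if est_next == est:              # mapping unchanged, so stop early
            break
        est = est_next                   # (k): replace Est with Est'
    return est                           # (l): final user-to-location mapping


# Hypothetical example: users a, b, d have known locations; c, e, f do not.
graph = {
    "a": {"c"}, "b": {"c"}, "d": {"c"},
    "c": {"a", "b", "d", "e"},
    "e": {"c"},
    "f": set(),
}
known = {"a": (40.0, -74.0), "b": (40.0, -75.0), "d": (40.0, -120.0)}
est = propagate_locations(graph, known)
```

In this example, user c receives a location in the first iteration from its three located neighbors, and user e, whose only neighbor is c, receives one in the second iteration. User f has no neighbors and therefore remains unmapped.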
US14/210,265 2013-04-05 2014-03-13 Inferring the location of users in online social media platforms using social network analysis Active 2034-11-13 US9794358B1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US14/210,265 US9794358B1 (en) 2013-04-05 2014-03-13 Inferring the location of users in online social media platforms using social network analysis
US14/295,101 US9953080B1 (en) 2013-04-05 2014-06-03 Social media data mining for early detection of newsworthy civil unrest events
US14/535,812 US10255352B1 (en) 2013-04-05 2014-11-07 Social media mining system for early detection of civil unrest events
US14/539,828 US10726090B1 (en) 2013-04-05 2014-11-12 Per-user accuracy measure for social network based geocoding algorithms
US14/639,979 US10305845B1 (en) 2013-04-05 2015-03-05 Accurate user alignment across online social media platforms
US15/163,547 US9892168B1 (en) 2013-04-05 2016-05-24 Tracking and prediction of societal event trends using amplified signals extracted from social media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361809160P 2013-04-05 2013-04-05
US14/210,265 US9794358B1 (en) 2013-04-05 2014-03-13 Inferring the location of users in online social media platforms using social network analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/539,828 Continuation-In-Part US10726090B1 (en) 2013-04-05 2014-11-12 Per-user accuracy measure for social network based geocoding algorithms

Related Child Applications (4)

Application Number Title Priority Date Filing Date
US14/295,101 Continuation-In-Part US9953080B1 (en) 2013-04-05 2014-06-03 Social media data mining for early detection of newsworthy civil unrest events
US14/539,828 Continuation-In-Part US10726090B1 (en) 2013-04-05 2014-11-12 Per-user accuracy measure for social network based geocoding algorithms
US14/639,979 Continuation-In-Part US10305845B1 (en) 2013-04-05 2015-03-05 Accurate user alignment across online social media platforms
US15/163,547 Continuation-In-Part US9892168B1 (en) 2013-04-05 2016-05-24 Tracking and prediction of societal event trends using amplified signals extracted from social media

Publications (2)

Publication Number Publication Date
US9794358B1 US9794358B1 (en) 2017-10-17
US20170310772A1 US20170310772A1 (en) 2017-10-26

Family

ID=51659126

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/210,265 Active 2034-11-13 US9794358B1 (en) 2013-04-05 2014-03-13 Inferring the location of users in online social media platforms using social network analysis

Country Status (4)

Country Link
US (1) US9794358B1 (en)
EP (1) EP2981903B1 (en)
CN (1) CN105339927B (en)
WO (1) WO2014165306A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10834042B2 (en) 2015-08-31 2020-11-10 International Business Machines Corporation Inference of location where each textual message was posted
US20170195434A1 (en) * 2015-12-31 2017-07-06 Palantir Technologies Inc. Computer-implemented systems and methods for analyzing electronic communications
CN106850410A (en) * 2017-02-13 2017-06-13 焦慧 A kind of method and device by the quick locating personnel position of social platform
US20190228321A1 (en) * 2018-01-19 2019-07-25 Runtime Collective Limited Inferring Home Location of Document Author
US20190252078A1 (en) * 2018-02-15 2019-08-15 X Development Llc Predicting the spread of contagions

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352183B2 (en) * 2006-02-04 2013-01-08 Microsoft Corporation Maps for social networking and geo blogs
CN102577494B (en) * 2009-09-28 2016-03-16 瑞典爱立信有限公司 Support the method and apparatus of the social network analysis in communication network
US20110125826A1 (en) * 2009-11-20 2011-05-26 Avaya Inc. Stalking social media users to maximize the likelihood of immediate engagement
CN102088419B (en) * 2009-12-07 2012-08-15 倪加元 Method and system for searching information of good friends in social network
CA2840395A1 (en) * 2011-06-27 2013-01-03 Cadio, Inc. Triggering collection of information based on location data
CN102883259B (en) * 2011-07-11 2017-12-12 欢聚时代科技(北京)有限公司 A kind of method and system that friend location is provided
US8965974B2 (en) * 2011-08-19 2015-02-24 Board Of Regents, The University Of Texas System Systems and methods for determining user attribute values by mining user network data and information
US8909771B2 (en) 2011-09-15 2014-12-09 Stephan HEATH System and method for using global location information, 2D and 3D mapping, social media, and user behavior and information for a consumer feedback social media analytics platform for providing analytic measurements data of online consumer feedback for global brand products or services of past, present or future customers, users, and/or target markets
US8726142B2 (en) * 2011-09-21 2014-05-13 Facebook, Inc. Selecting social networking system user information for display via a timeline interface

Also Published As

Publication number Publication date
EP2981903A4 (en) 2016-11-16
US9794358B1 (en) 2017-10-17
CN105339927A (en) 2016-02-17
WO2014165306A1 (en) 2014-10-09
EP2981903A1 (en) 2016-02-10
CN105339927B (en) 2017-12-08
EP2981903B1 (en) 2020-08-05

Similar Documents

Publication Publication Date Title
Jurgens et al. Geolocation prediction in twitter using social networks: A critical analysis and review of current practice
Yao et al. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model
US9824156B1 (en) Targeting of digital content to geographic regions
Tong et al. A linear road object matching method for conflation based on optimization and logistic regression
Kong et al. Spot: Locating social media users based on social network context
US10318884B2 (en) Venue link detection for social media messages
AU2017201653B2 (en) System and method for visual bayesian data fusion
US9794358B1 (en) Inferring the location of users in online social media platforms using social network analysis
Safra et al. Location‐based algorithms for finding sets of corresponding objects over several geo‐spatial data sets
Rodrigues et al. Exploring multiple evidence to infer users’ location in Twitter
Hong et al. Efficient measurement of continuous space shortest distance around barriers
US10366134B2 (en) Taxonomy-based system for discovering and annotating geofences from geo-referenced data
US20160381154A1 (en) Predicting Geolocation Of Users On Social Networks
Murray Evolving location analytics for service coverage modeling
Liu et al. Discovery of statistically significant regional co-location patterns on urban road networks
Lu et al. Online spatial data analysis and visualization system
Jeong et al. Decentralized and coordinate-free computation of critical points and surface networks in a discretized scalar field
US11023465B2 (en) Cross-asset data modeling in multi-asset databases
US9706352B2 (en) System and method for determining a boundary of a geographic area
Rodrigues et al. Uncovering the location of twitter users
US20160378774A1 (en) Predicting Geolocation Of Users On Social Networks
Apreleva et al. Predicting the location of users on Twitter from low density graphs
US10726090B1 (en) Per-user accuracy measure for social network based geocoding algorithms
US20150169626A1 (en) System and method for identifying a new geographical area name
Park et al. Hybrid approach using deep learning and graph comparison for building change detection

Legal Events

Date Code Title Description
AS Assignment

Owner name: HRL LABORATORIES, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JURGENS, DAVID A.;REEL/FRAME:032500/0984

Effective date: 20140320

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4