WO2018237098A1 - Methods and systems for identifying markers of coordinated activity in social media movements - Google Patents

Info

Publication number
WO2018237098A1
Authority
WO
WIPO (PCT)
Prior art keywords
social media
campaign
cluster
clusters
network
Prior art date
Application number
PCT/US2018/038639
Other languages
French (fr)
Inventor
Vladimir D. Barash
John W. Kelly
Original Assignee
Graphika, Inc.
Priority date
Filing date
Publication date
Application filed by Graphika, Inc.
Priority to CA3068264A (patent CA3068264C)
Priority to EP18819788.3A (patent EP3642739A4)
Publication of WO2018237098A1
Priority to US16/442,544 (patent US11409825B2)
Priority to IL271650A
Priority to US17/883,005 (patent US20220391460A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Definitions

  • the present disclosure relates to methods for classifying at least one contagious phenomenon propagating on a network.
  • Hyperlinks result from a combination of choices, from those made by individual, autonomous authors to those made programmatically by designed systems, such as permalinks, site navigation, embedded advertising, tracking services, and the like. Human authors practice the same kind of information selectivity online that they do offline, i.e., what authors (including those representing organizations) write about and link to reflects somewhat stable interests, attitudes, and social/organizational relationships.
  • methods and systems generally include determining coordinated activity in social media movements on a social media channel.
  • the method includes identifying a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements.
  • the method includes configuring a data structure of the plurality of markers for a social media campaign on a social media channel.
  • the plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for
  • the method also includes analyzing the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign, including determining users within the social media campaign, determining clusters of users that make up the social media campaign and determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign.
  • identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users. In embodiments, the coordinated activity of the social media campaign is determined from user actions within the social media movement in the social media campaign.
  • identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users. In embodiments, the plurality of markers includes a day peakedness marker that indicates the percentage of the coordinated activity of the social media campaign that takes place on the day identified as most active in the social media campaign. In embodiments, the plurality of markers includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign. In embodiments, the plurality of markers includes a post regularity commitment signal that represents a deviation of commitment to participation by a user from natural human attention patterns.
  • identifying the plurality of markers includes determining a semantic diversity score for the coordinated activity of the social media campaign by assigning messages in the campaign to topics and calculating a diversity of the topics on a topic distance scale that facilitates determining the semantic diversity score.
  • identifying the plurality of markers includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions. In embodiments, identifying the plurality of markers includes computing semantic diversity over time to identify co-occurring topics in the social media campaign.
  • a relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, a relatively large value of the semantic diversity score is configured to be indicative of spambots, and a semantic diversity score having a value in between is indicative of normal human activity.
  • methods and systems generally include a computer system for determining coordinated activity in social media movements on a social media channel.
  • the system includes a user interface that configures a social media campaign on one or more social media channels and that communicates via a network.
  • the system includes a computing device that identifies a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements and that configures one or more data structures containing the plurality of markers for the social media campaign on one or more social media channels.
  • the plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing
  • the analysis of the campaign signals indicative of the coordinated activity of the social media movements
  • the social media campaign includes determining users within the social media campaign, determining clusters of users that make up the social media campaign and determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign.
  • the system includes a storage system that stores one or more of the data structures containing the plurality of markers for the social media campaign on one or more of the social media channels.
  • a processing system that executes computer-readable instructions that cause the processing system to: receive a request from an external system about the coordinated activity of the campaign signals from the social media movements; retrieve at least a portion of one or more data structures containing the plurality of markers for the social media campaign on one or more of the social media channels; and transmit contents of at least a portion of the analysis to the user interface that displays at least a portion of the plurality of markers indicative of one of coordinated activity and normal human activity.
  • identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users.
  • the coordinated activity of the social media campaign is determined from user actions within the social media movements in the social media campaign.
  • the coordinated activity includes a relatively large number of accounts on one or more of the social media channels controlled by a relatively small number of coordinated entities, resulting in a relative lack of diversity compared with similar accounts on one or more social media channels controlled by uncoordinated users.
  • identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users.
  • the plurality of markers includes a day peakedness marker that indicates the percentage of the coordinated activity of the social media campaign that takes place on the day identified as most active in the social media campaign. In embodiments, the plurality of indicators includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign.
  • the plurality of indicators includes a post regularity commitment signal that represents a deviation of commitment to participation by a user from natural human attention patterns. In embodiments, identifying the plurality of markers through analysis of campaign signals includes determining a semantic diversity score for the coordinated activity of the social media campaign. Determining a semantic diversity score includes
  • identifying the plurality of markers through analysis of campaign signals includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions.
  • identifying the plurality of markers through analysis of campaign signals includes computing semantic diversity over time to identify co-occurring topics in the social media campaign. A relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, a relatively large value of the semantic diversity score is configured to be indicative of spambots, and a semantic diversity score having a value in between is indicative of normal human activity.
  • a computer-readable storage medium with an executable program stored thereon, wherein the program instructs a processor to perform the steps of attentive clustering and analysis, which may include constructing an online author network, wherein constructing the online author network includes selecting a set of source nodes (S), a set of outlink targets (T) from at least one selected type of hyperlink, and a set of edges (E) between S and T defined by the at least one selected type of hyperlink from S to T during a specified time period; deriving a set of nodes, T', by any one of or a combination of a.) normalizing nodes in T, optionally to a selected level of abstraction, b.) using lists of target nodes for exclusion ("blacklists"), and c.) using lists of target nodes for inclusion ("whitelists"); transforming the online author network into a matrix of source nodes in S linked to targets in T; partitioning the online author network into at least one set
  • the element of the graphical representation may use at least one of size, thickness, color, and pattern to depict a type of activity.
  • Attentive clusters and their constituent nodes may be differentiated in the graphical representation by at least one of a color (including hue, intensity, and saturation), a shape (including 2D or 3D representations), a geometric arrangement, a shading, a transparency, and a size.
  • the size of the object representing the clustered nodes in the graphical representation may correlate with a metric.
  • the nodes, targets, and edges may be collected from public and private sources of information.
  • Constructing the matrix may include applying at least one threshold parameter from the group consisting of: maxnodes, targetmax, nodemin, targetmin, maxlinks, and linkmin. Constructing the matrix may include applying a minimum threshold for the number of included nodes that must link to a target to qualify it for inclusion in the matrix. Constructing the matrix may include applying a minimum threshold for the number of included targets that must link to a node to qualify it for inclusion in the matrix. The matrix may be a graph matrix. The method may further include applying any lists specifying inclusion or exclusion of particular nodes.
  • the term "author," as used herein, should be understood to encompass human and non-human creators and editors of content (including, without limitation, text, images, video, tweets, animations, multimedia, and any combinations or other types of content, and including, without limitation, original content, derivative works, commentary, analysis, and other genres of content) that can be consumed (e.g., read or viewed) by others, such as readers or viewers in a network.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting clickstream data for the source nodes of the attentive cluster.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting clickstream data for the target nodes of the outlink bundle.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting survey data for the source nodes of the attentive cluster.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting survey data for the target nodes of the outlink bundle.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting geo-location data for the source nodes of the attentive cluster.
  • a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting geo-location data for the target nodes of the outlink bundle.
  • a method of metadata tag analysis to facilitate interpretation of an attentive cluster may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, collecting a metadata tag associated with the source nodes in the attentive cluster, and performing a differential frequency analysis on the metadata tags that are associated with the attentive cluster.
  • the method may further include sorting cluster focus scores on a plurality of the metadata tags.
  • a method of metadata tag analysis to facilitate interpretation of an attentive cluster may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, collecting a metadata tag associated with the source nodes in the attentive cluster, and performing a differential frequency analysis on the metadata tags that are associated with the outlink bundle.
  • the method may further include sorting cluster focus scores on a plurality of the metadata tags.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, forming a density matrix of the attentive cluster and the outlink bundle, determining where there is a higher density in the density matrix than chance would predict, and identifying patterns of influence of a block of web sites on a block of authors by analyzing the higher-density area of the density matrix.
  • measurement of link density may include constructing an online author network, wherein constructing the online author network comprises selecting a set of source nodes (S), a set of outlink targets (T), and a set of edges (E) between S and T defined by the at least one selected type of hyperlink from S to T during a specified time period; deriving a set of nodes, T', by normalizing nodes in T; transforming the online author network into a matrix of source nodes in S linked to targets in T; and collapsing the matrix to aggregate link measures among clusters of sources and clusters of targets.
  • the aggregated link measure may be at least one of: a count of the number of nodes in source cluster S linking to any member of target set T; a density calculated by dividing counts by the product of the number of members in S and the number of members in T; and a standard score that is a standardized measure of the deviation from random chance for counts across each source node-outlink target crossing in the density matrix.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and associating the attentive cluster with a real-world group of people.
  • a method of multi-layer attentive clustering may include partitioning a multi-layered social segmentation into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and monitoring at least one of the attentive cluster and the outlink bundle on at least one layer of the social segmentation.
  • the social segmentation may be an online social media author network.
  • Monitoring may be tracking the growth of an attentive cluster over time.
  • the method may further include examining a source node associated with a specific player in the attentive cluster in order to determine a characteristic. The monitoring may be used to identify a group of people who are susceptible to a message and track downstream activities in response to the message.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and analyzing the attentive cluster over time to depict changes in a linking pattern of the attentive cluster over a time period.
  • the outlink bundle may be a list of semantic markers.
  • the semantic marker may be at least one of a text element, a post, a tweet, an online content, and a metadata tag. Analyzing may involve tracking a semantic marker or set of semantic markers across one or more attentive clusters within the online author network.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and calculating a set of cluster focus index (CFI) scores for the attentive cluster, wherein the CFI represents the degree to which a particular outlink target is disproportionately cited by members of a particular attentive cluster as compared to the average citation frequency for all nodes in … At least one source node may be a high-attention source node.
  • the method may further include automatically placing an advertisement at the particular outlink target.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and generating a graphical representation of attentive clusters and/or outlink bundles in the network to enable interpretation of network features and behavior and calculation of comparative statistical measures across the attentive clusters and outlink bundles, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network.
  • the method may further include further segmenting the network using at least one of a text, an item of online content, a link, and an object.
  • the source node in the graphical representation may be represented by an individual dot. The size of the dot may be determined based on the number of other source nodes that link to it.
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, calculating a
  • CFI (cluster focus index)
  • a method of attentive clustering may include defining a semantic bundle, searching a plurality of candidate nodes for items in the bundle in order to generate a relevance metric for use in selecting high-relevance online authors, partitioning the online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of out…
  • a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and generating a graphical representation of link targets, semantic events, and node-associated metadata scattered in an x-y coordinate space, wherein the dimensions of the graph are custom-defined using sets of attentive clusters, grouped to represent substantive dimensions of interest for a particular analysis.
  • a computerized search method may include presenting, to a user, a computer interface for specifying one or more search terms for a search query, presenting at least one selectable item corresponding to at least one of an M score and a CFI score filter for the search query, generating an amended search query based on a selected item, and performing a search using the amended search query.
  • the search may be of the Internet.
  • the search may be of a document corpus.
  • the search may be of a CFI-filtered set of clusters within an online network.
  • the search may be of a set of nodes having an M score greater than a threshold.
  • CFI may represent the degree to which an event, characteristic, or behavior disproportionately occurs in a particular cluster, or a particular cluster, relative to a network, preferentially manifests an event, characteristic, or behavior.
  • M score may be calculated using the formula $M = \alpha \cdot \mathrm{count} + (1 - \alpha) \cdot \mathrm{CFI}$ [normalized 1 to 10], where count is the overall number of members on a cluster focus map that have engaged with a target.
  • a computerized search method may include: presenting, to a user, a computer interface for specifying one or more search terms for a search query; presenting, to the user, a computer interface for selecting content to search with the search terms, wherein the content is taken from an online creator network partitioned into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle; and performing a search of the selected content using the search query.
  • a method to iteratively reduce the scale of a network to its most influential core communities and obtain a sub-graph of maximally connected sub-actors may include: assigning a variable, Kcore, to each individual member of the network, where Kcore relates to a minimum connectedness based on the number of other nodes in the network to which the individual is connected; removing inactive individuals and individuals with few followers from the network; temporarily removing certain individuals with a large number of followers for later re-joining; restricting the remaining individuals iteratively by removing individuals with the lowest Kcore values first, then removing individuals with the next highest Kcore values until a threshold is reached, wherein the threshold is at least one of a number of individuals removed, a number of individuals remaining, and a Kcore value; and re-joining the temporarily removed individuals.
  • a self-service tool to construct a social media map may include an automated process (e.g., a bot) that harvests data (e.g., nodes) and maps the data to one or more clusters/segments, a processor that provides cluster/segment labels and CFI scores for the clusters/segments, and an interface that enables user browsing of clusters/segments and the map, tagging nodes, and re-grouping/re-labeling of clusters/segments.
  • the automated process may also be capable of: automatically refreshing the social media map based on a relevance score for nodes in the map; positively or negatively weighting at least one cluster based on a CFI score calculation to include positively weighted nodes and exclude negatively weighted nodes from the map; filtering out unwanted nodes; obligatorily including nodes that were not clustered in a first version of the social media map; crowd-sourced information regarding nodes and/or links that drives nodes to bundles; processing social media map usage data for trends/indicators, wherein the usage data relates to one or more of what is ignored, what is further explored, what is used, how clusters are grouped, what name/label is assigned to a cluster, what color is used for a cluster, and what order/position the cluster is placed in within a report, and wherein nodes preferentially interacted with are weighted more heavily; and user-contributed data as
  • a method of strategic messaging may include generating a list of targets in a network/cluster/segment; filtering the list by a criterion to limit whom to message in the network/cluster/segment in order to maximize the impact of the message on the cluster/segment, wherein the filter is at least one of CFI score, M score, number of followers, following status, follower status, number of mentions/re-tweets, number of distinct mentions, status of exposure to content, status of exposure to content that has already peaked, footprint, and number of tweets/publication frequency; and ranking the list by the filtered criteria.
  • a method of strategic network building may include generating a list of targets in a network/cluster/segment, wherein the list is generated using at least one of CFI, M score, number of followers, mentions/re-tweets, distinct mentions, and number of tweets, and following the targets.
  • a method of calculating a score may include calculating a cluster focus index score based on a degree to which a target disproportionately occurs in a particular cluster, or a particular cluster, relative to a network, preferentially engages with a target; determining an overall number of members of the cluster or network that have engaged with that target; and calculating an M score based on the formula: count plus CFI, wherein count is the overall number of members of the cluster that have engaged with that target.
  • an M score filter for a list of targets may include taking a cluster focus index (CFI) score based on a degree to which a target disproportionately occurs in a particular cluster, or a particular cluster, relative to a network, preferentially engages with a target, and providing a slider to indicate an M score, wherein the M score is based on the formula $M = \alpha \cdot \mathrm{count} + (1 - \alpha) \cdot \mathrm{CFI}$, wherein count is the overall number of members of the cluster or network that have engaged with that target, and wherein the slider is used to indicate the value of $\alpha$ between 0 and 1.
  • a method of strategic ad placement may include generating a list of targets in a network/cluster/segment representing linkages in a social media environment, filtering the list by a criterion to limit the targets in order to maximize the impact of the ad on the network/cluster/segment, wherein the filter is at least one of CFI score and M score, ranking the list by the filtered criteria, and providing an interface to launch an ad campaign to place ads directly from the environment representing the linkages to the target/website.
  • Ad placement may be done via integration with various products, such as Twitter™ sponsored tweets, Facebook™ ad exchange, Google™ AdSense/AdWords, and third-party online ad networks.
  • the method may further include tracking interaction with the ad across social networks.
  • a method for using cosine similarity to determine the relationship between one or more clusters may include, for each cluster, building a vector based on the CFI scores calculated for a number of items; plotting the vectors in a 3D vector space; determining the cosine of the angle between the vectors as an indication of the relationship between the clusters; and, when a relationship is identified between clusters based on the cosine, automatically labeling the clusters with the same label. If the angle between the vectors is small (i.e., its cosine is close to 1), the confidence that there is a high degree of similarity is high.
  • a method may include publishing a map of content as a widget, and tracking interaction with the content in the widget to obtain behavioral data about a user of the map.
  • a method may include publishing a map of content as a widget; tracking interactions with the content in the widget to obtain behavioral data about a user of the published map; and analyzing the behavioral data in order to at least one of suggest content, track network evolution, modify the network in strategically valuable ways, and measure the success of an ad campaign.
  • FIG. 1 depicts a process flow for attentive clustering.
  • FIG. 2 depicts a social network map in the form of a proximity cluster map.
  • FIG. 3 depicts a social network map in the form of a proximity cluster map highlighting attentive clusters of liberal and conservative U.S. bloggers, and British bloggers.
  • FIG. 4 depicts a social network map in the form of a proximity cluster map focused on environmentalists, feminists, political bloggers, and parents.
  • FIG. 5 depicts a social network map in the form of a proximity cluster map with a cluster relationship identified.
  • FIG. 6 depicts a social network map in the form of a proximity cluster map with a bridge blog identified.
  • FIG. 7 depicts a flow diagram for attentive clustering.
  • FIG. 9 depicts a graph of CFI scores.
  • FIG. 10 depicts a graph of CFI scores.
  • FIG. 11 depicts a bipolar valence graph of link targets to the Russian blogosphere.
  • FIG. 12 depicts an interactive bursiwap interface.
  • FIG. 13 depicts a valence graph of outlink targets organized by proportion of links from liberal vs. conservative bloggers.
  • FIG. 14 depicts a flow diagram relating to social media maps.
  • FIG. 15 depicts a flow diagram relating to refreshing social media maps.
  • FIG. 16 depicts a flow diagram relating to social media maps.
  • FIG. 17 depicts formation of a ranked target list.
  • FIG. 18 depicts Peakedness vs. Commitment by Time Range for two sets of hashtags
  • FIG. 19a depicts Peakedness vs. Commitment by Subsequent Uses.
  • FIG. 19b depicts Peakedness vs. Commitment by Time Range.
  • FIG. 20 depicts a distribution of mention-weighted normal feed concentration by topic.
  • FIG. 21 depicts a distribution of Cohesion by topic.
  • FIG. 22a depicts a chronotope of the #metro29 hashtag.
  • FIG. 22b depicts a chronotope of the #samara hashtag.
  • FIG. 23 depicts a social media map platform user flow.
  • FIG. 24 depicts a recent activity page for a social media map platform.
  • FIG. 25 depicts a recent activity page for a social media map platform.
  • FIG. 26 depicts an overview page for a social media map platform.
  • FIG. 27 depicts an interactive map for a social media map platform.
  • FIG. 28 depicts an overview page for a social media map platform.
  • FIG. 29 depicts an influencers page for a social media map platform.
  • FIG. 30 depicts an influencer detail for a social media map platform.
  • FIG. 31 depicts a conversation leaders page for a social media map platform.
  • FIG. 32 depicts a tweets page for a social media map platform.
  • FIG. 33 depicts a websites page for a social media map platform.
  • FIG. 34 depicts a key content page for a social media map platform.
  • FIG. 35 depicts a media page for a social media map platform.
  • FIG. 36 depicts a terms page for a social media map platform.
  • FIG. 37 depicts a lists page for a social media map platform.
  • the present disclosure relates to a computer-implemented method for attentive clustering and analysis.
  • Attentive clusters are groups of authors who share similar linking profiles or collections of nodes whose use of sources indicates common attentive behavior.
  • Attentive clustering and related analytics may include measuring and visualizing the prominence and specificity of textual elements, semantic activity, sources of information, and hyperlinked objects across emergent categories of online authors within targeted subgraphs of the global Internet.
  • the disclosure may include a set of specialized parsers that identify and extract online conversations.
  • the disclosure may include algorithms that cluster data and map them into intuitive visualizations (publishing nodes, blogs, tweets, etc.) to determine emergent clusterings that are highly navigable.
  • the disclosure may include a front-end dashboard for interaction with the clustering data.
  • the disclosure may include a database for tracking clustering data.
  • the disclosure may include tools and data to visualize, interpret, and act upon measurable relationships in online media.
  • the approach may be to segment an online landscape based on the behavior of authors over time, thus creating an emergent segmentation of authors based on real behavior that drives metrics, rather than driving metrics based on preconceived lists. Because the analysis is a structural one, rather than language-based, the analysis is language agnostic.
  • the segmentation may be global, such as of the English-language blogosphere. In an embodiment, the segmentation may involve a relevance metric for every node based on semantic markers and a custom mapping of high-relevance nodes.
  • the disclosure enables identifying influence, such as who is authoritative about what to whom.
  • Attentive clustering may involve construction of a bipartite matrix; however, any number and variety of flat or hierarchical clustering algorithms may be used to obtain an attentive cluster in the disclosure.
  • a set of content-publishing source nodes ("authors") may be selected based on a chosen combination of linguistic, behavioral, semantic, network-based, or other criteria.
  • a mixed-mode network may be constructed, comprising the set S of all source nodes, the set T of all outlink targets from selected types of hyperlinks, and the set E of edges between them defined by the selected type or types of links from S to T found during a specified time period.
  • the matrix may be constructed of source nodes in S linked to targets in T′, derived by any combination of a.) normalizing nodes in T, optionally to a selected level of abstraction, b.) using lists of target nodes for exclusion ("blacklists"), and c.) using lists of target nodes for inclusion ("whitelists").
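The following sketch of matrix construction assumes that links arrive as (source, target) pairs and that normalization is supplied as a function. The helper name `build_matrix` is hypothetical. Source-by-target link counts are accumulated while blacklisted targets are dropped. Whitelists matter mainly once degree thresholds are applied, so they are not handled here.

```python
from collections import defaultdict
from typing import Callable, Iterable


def build_matrix(edges: Iterable[tuple[str, str]],
                 normalize: Callable[[str], str] = lambda target: target,
                 blacklist: frozenset[str] = frozenset()) -> dict[str, dict[str, int]]:
    """Return a sparse matrix V with V[source][target] = number of links."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, raw_target in edges:
        target = normalize(raw_target)  # e.g. collapse to a chosen node level
        if target in blacklist:
            continue
        counts[source][target] += 1
    # Convert to plain dicts; sources whose links were all blacklisted drop out.
    return {source: dict(row) for source, row in counts.items()}


edges = [("blogA", "X.example"), ("blogA", "x.example"), ("blogB", "ads.example")]
print(build_matrix(edges, normalize=str.lower, blacklist=frozenset({"ads.example"})))
# {'blogA': {'x.example': 2}}
```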
  • the matrix may represent a two-mode network (or actor-event network) that associates two completely different categories of nodes, actors and events, to build a network of actors through their participation in events or affiliations.
  • the matrix is, in effect, an affiliation matrix of all authors with the things that they link to, wherein the patterns of their linking may be used to do statistical clustering of their nodes.
  • the matrix may be processed according to user-selected parameters, and clustered in order to perform one or more of the following: 1.) partition the network into sets of source nodes with similar linking histories ("attentive clusters"); 2.) identify sets of targets (linked-to websites or objects) with similar citation profiles ("outlink bundles"); 3.) calculate comparative statistical measures across these partitions/attentive clusters; 4.) construct visualizations to aid in interpretation of network features and behavior; 5.) measure frequencies of links between attentive clusters and outlink bundles, allowing identification and measurement of large-scale regularities in the distribution of attention by authors across sources of information, and the like.
  • An arbitrary number and variety of flat or hierarchical clustering algorithms may be used to partition the matrix, and the results may be stored in order to select any solution for output generation.
  • the resulting outputs may provide novel, unique, and useful insights for determining influential authors and websites, planning communication strategies, targeting online advertising, and the like.
  • systems and methods for attentive clustering and analysis may be embodied in a computer system comprising hardware and software elements, including local or network access to a corpus of chronologically published Internet data, such as blog posts, RSS feeds, online articles, TwitterTM "tweets," FacebookTM postings, and the like.
  • attentive clustering and analysis may include: 1.) network selection 102, 2.) partitioning 104, which may include two-mode network clustering in this embodiment, and 3.) … 4.) visualization and metrics output 108.
  • selection 102 may include at least two operations: a.) node selection 110, and b.) link selection 112.
  • a third may be applied in which network analytic operations are used to further specify the set of source nodes under consideration for clustering.
  • the operation may be filtering. Filtering may be technology-based, blacklist-based, whitelist-based, and the like.
  • nodes may be URLs, at which chronologically published streams or elements of content may be available.
  • An initial set containing any number of nodes may be selected based on any combination of node-level characteristics and/or calculated relevance scores.
  • With respect to node-level characteristics, there may be a number of different kinds of nodes publishing content online, such as weblogs (blogs), online media sites (like newspaper websites), microblogs (like TwitterTM), forums/bulletin boards (like http://www.biology-online.org/biology-forum/), feeds (like RSS/ATOM), and the like.
  • nodes may differ according to an arbitrary number of other intrinsic or extrinsic node-level characteristics, such as the hosting platform (e.g., Blogspot, LiveJournal), the type of content published (text, images, audio), languages of textual content (e.g., French, Spanish), type of authoring entity (individual, group, corporation, CSO, government, online content aggregator, etc.), frequency or regularity of publication (daily, regular, monthly, bursty), network characteristics (e.g., central, authoritative, A-list, isolated, un-linked, long-tail), readership/traffic levels, geographical or political location of the authoring entity or focus of its concern (e.g., Russian language, Russian Federation, Bay Area, Calif.), membership in a particular online ad distribution network (e.g., BLOGADS, GOOGLETM ADSENSE), third-party categorizations, and the like.
  • To support node selection 110 based
  • tags may be collected automatically, such as by "spidering" sites for meta keywords.
  • the corpus of Internet data may be scanned and matches on list elements tabulated for each node.
  • a number of methods may be used to calculate a relevance score based on these match counts.
  • relevance scores may be calculated by calculating individual index scores for text matches (T), link matches (L), and metadata matches (M), and then summing them.
  • index scores (I) may be calculated for each node by scanning all content published by the node during a specified period of time using a list of $J$ relevance markers: $I = \frac{x_1 w_1}{t_1} + \frac{x_2 w_2}{t_2} + \dots + \frac{x_J w_J}{t_J}$, where $x$ is the number of matches for the item, $w$ is a user-assigned weight (a scale of 1 to 5 is typical), and $t$ is the total number of item matches in the scanned corpus. In an example, an initial set of source nodes may include the 100,000 Russian-language weblogs most highly cited during a particular time frame.
  • the initial set may include the 10,000 English-language weblogs with the highest relevance scores based on relevance marker lists associated with the political issue of healthcare. In another example, the initial set may include all nodes by Indian and Pakistani authors, in whatever language, that have published at least three times within the past six months.
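The index score and the relevance sum can be sketched in a few lines of Python, assuming match counts, weights, and corpus totals are already tabulated per relevance marker. The function names and the example markers below are hypothetical. Markers with no corpus matches are skipped to avoid dividing by zero.

```python
def index_score(matches: dict[str, int], weights: dict[str, float],
                totals: dict[str, int]) -> float:
    """I = sum over markers j of x_j * w_j / t_j."""
    return sum(x * weights[j] / totals[j]
               for j, x in matches.items()
               if j in weights and totals.get(j))


def relevance(node: dict[str, dict[str, int]],
              weights: dict[str, dict[str, float]],
              totals: dict[str, dict[str, int]]) -> float:
    """Relevance = T + L + M: index scores for text, link and metadata markers."""
    return sum(index_score(node.get(kind, {}), weights.get(kind, {}), totals.get(kind, {}))
               for kind in ("text", "link", "meta"))


weights = {"text": {"healthcare": 5, "medicare": 2}, "link": {"kff.example": 3}}
totals = {"text": {"healthcare": 100, "medicare": 10}, "link": {"kff.example": 20}}
node = {"text": {"healthcare": 4, "medicare": 1}, "link": {"kff.example": 2}}
print(relevance(node, weights, totals))  # T = 0.2 + 0.2, L = 0.3, M = 0
```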
  • objects may be particular units of chronologically published content found at a node, such as blog posts, "tweets," and the like.
  • Links, also referred to as outlinks herein, may be hyperlink URLs found within a node's source HTML code or its published objects. Many kinds of links exist, and the ability to choose which kinds are used for clustering may be a key feature of the method. There are links for navigation, links to archives, links to servers for embedded advertising, links in comments, links to link-tracking services, and the like.
  • Link selection 112 may be applied to links that represent deliberate choices made by authors, of which there may also be many kinds. These links may be to nodes (e.g., a weblog address found in a "blogroll"), objects (e.g., a particular YOUTUBETM video embedded in a blog post), and other classes of entity, such as "friends" and "followers." Some node hosting platforms define a typology of links to reflect explicitly defined relationships, such as "friend," "friend-of," "community member," and "community follower" in LIVEJOURNAL, or "follower" and "following" in TwitterTM, FacebookTM, and the like.
  • some link types are relatively static, meaning they are typically available as part of the interface used by a visitor to a node website, while others are dynamic, embedded within published content objects.
  • Link types may be parsed or estimated and stored with the link data. These links represent different types of relationships between authors and linked entities, and therefore, according to the user's objectives, certain classes of links may be selected for inclusion. Different sorts of links also have time values associated with them, such as the date/time of initial publication of an object in which a dynamic link is embedded, or the first-detected and most recently seen date/time of a static link. Links may be further selected for clustering based on these time values.
  • a mixed-mode network X 130 may be constructed, consisting of the set S of all source nodes, the set T of all outlink targets from selected types of hyperlinks, and the set E of edges between them defined by the selected type or types of links from S to T found during a specified time period.
  • the network 130 may be considered "mixed mode" because, while it may be formally bipartite, a number of nodes in S may also exist in T, which may be considered a violation of the normal concept of two-mode networks. Rather than excluding nodes that may be considered either S or T nodes, the systems and methods …
  • a particular node may be considered a source of attention (S) in one mode, and an object of attention (T) in the other.
  • Before clustering, the set of nodes may be further constrained by parameters applied to X, or to a one-mode subnetwork X′ consisting of the network 130 defined by nodes in S along with all nodes in T that are also in S (or at a level of abstraction under an element in S, collapsed to the parent node).
  • Standard network analytic techniques may be applied to X′ in order to reduce the source nodes under consideration for clustering. For instance, requirements for k-connectedness may be applied in order to limit consideration to well-connected nodes.
  • partitioning 104 may include: 1.) specification of node level for building the two-mode network, 2.) assembly of bipartite network matrix 132 using iterative processing of the matrix to conform with chosen threshold parameters, and 3.) statistical clustering (multiple methods possible) of nodes of each mode, that is, source node clustering 114 and outlink clustering 118. Outlink clustering 118 to form an outlink bundle may involve identifying sets of websites that are accessed by the same kinds of people.
  • With respect to specification of node level, a distinction may be made between "nodes" and
  • the node URL may correspond very simply to a "hostname" (the part of a URL after "http://" and before the next "/") or a hostname plus a uniform path element (like "blog" after the hostname).
  • multiple nodes may exist at pathnames under the same hostname.
  • a "node level" may be selected for building the two-mode network, such that second-mode nodes include (from most general to most specific level) a.) metanodes (collapsing sub-nodes into one) and independent nodes, b.) child, or sub-nodes (treated individually) and independent nodes, or c.) objects (of which a great many may exist for any given parent node).
  • a node with a webpage URL may often have one or more associated "feed" URLs, at which published content may be available. These feeds are generally considered the same logical node as the parent site, but may be considered independent nodes. If a target URL is not a publishing node, but another kind of website, the level may likewise be chosen, though more levels of hierarchy may be possible, and typically the practical choice may be between hostname level and full pathname level.
  • links may be reviewed and collapsed (if necessary) to the proper node level as described hereinabove, and the two-mode network may be built between all link sources (the initial node set) and all target (second-mode) nodes at the specified node level or levels.
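The three node levels can be illustrated with Python's standard `urllib.parse.urlsplit`. This is a simplified sketch. The level names are hypothetical labels for options a.) through c.), and the sub-node rule assumes the uniform path element is the first one. Feeds and `www.` variants are not merged.

```python
from urllib.parse import urlsplit


def collapse(url: str, level: str) -> str:
    """Collapse a link URL to a node level.

    "metanode": hostname only, merging all sub-nodes under it;
    "subnode":  hostname plus the first path element (e.g. ".../blog");
    "object":   hostname, full path and query, dropping only the fragment.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if level == "metanode":
        return host
    if level == "subnode":
        first = next((seg for seg in parts.path.split("/") if seg), "")
        return f"{host}/{first}" if first else host
    if level == "object":
        query = f"?{parts.query}" if parts.query else ""
        return host + parts.path.rstrip("/") + query
    raise ValueError(f"unknown node level: {level!r}")


for level in ("metanode", "subnode", "object"):
    print(level, collapse("https://Example.com/blog/post-1?id=7#top", level))
# metanode example.com
# subnode example.com/blog
# object example.com/blog/post-1?id=7
```

The query string is kept at object level because some object URLs, such as video pages, are identified only by their query parameters.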
  • blacklists and whitelists may be used to, respectively, exclude or force inclusion of specific source or target nodes.
  • an NxK bipartite matrix M, in which N is the set of final source nodes and K is the set of final target nodes, may be constructed according to user-specified optional parameters, such as maxnodes, nodemin, maxlinks, linkmin, and the like.
  • An iterative sorting algorithm may prioritize highly connected sources and widely cited targets, and then use these values to determine which nodes and targets from the full network data may be included in the matrix.
  • maxsources and maxtargets may set the maximum values for the number of elements in N and K.
  • Nodemin may specify the minimum number of included targets (degree) that a source is required to link to in order to qualify for inclusion in the matrix.
  • Linkmin similarly may specify the minimum number of included sources (degree) that must link to a target to qualify it for inclusion in the matrix.
  • Two other optional parameters, nodemax and linkmax, may be used to specify upper thresholds for source and target degree as well.
  • Each value ($V_{ij}$) in M is the number of individual links from source i to target j.
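One way to realize the iterative thresholding is to alternate between trimming targets and trimming sources until nothing changes. The sketch below is a hypothetical procedure, not the disclosed algorithm. It drops blacklisted targets up front, then repeats the following steps until stable:

- keep the `maxtargets` most-cited targets that meet `linkmin`, plus any whitelisted targets that are still cited;
- keep the `maxsources` best-connected sources that meet `nodemin`.

Links are treated as binary here, matching the binarization step applied before clustering.

```python
def prune(links: dict[str, set[str]], maxsources: int, maxtargets: int,
          nodemin: int, linkmin: int,
          whitelist: frozenset[str] = frozenset(),
          blacklist: frozenset[str] = frozenset()) -> dict[str, set[str]]:
    """Iteratively trim a source -> targets map until every threshold holds."""
    current = {s: set(ts) - blacklist for s, ts in links.items()}
    while True:
        # In-degree of each target among the surviving sources.
        indeg: dict[str, int] = {}
        for targets in current.values():
            for t in targets:
                indeg[t] = indeg.get(t, 0) + 1
        by_citation = sorted(indeg, key=lambda t: (-indeg[t], t))
        keep = {t for t in by_citation[:maxtargets] if indeg[t] >= linkmin}
        keep |= {t for t in whitelist if t in indeg}  # forced inclusion
        trimmed = {s: ts & keep for s, ts in current.items()}
        trimmed = {s: ts for s, ts in trimmed.items() if len(ts) >= nodemin}
        by_degree = sorted(trimmed, key=lambda s: (-len(trimmed[s]), s))
        trimmed = {s: trimmed[s] for s in by_degree[:maxsources]}
        if trimmed == current:  # stable: thresholds hold (whitelist exempt)
            return trimmed
        current = trimmed


links = {"a": {"t1", "t2", "t3"}, "b": {"t1", "t2"}, "c": {"t1", "t9"}, "d": {"t8"}}
print(prune(links, maxsources=100, maxtargets=100, nodemin=2, linkmin=2))
# keeps sources a and b, each linked to t1 and t2
```

Each pass can only remove sources or links, so the loop always terminates.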
  • there are many clustering algorithms which may be used to partition the network, including hierarchical agglomerative, divisive, k-means, spectral, and the like. They may each have merits for certain objectives.
  • one approach for producing interpretable results based on Internet data may be as follows: 1.) make M binary, reducing all values greater than 0 to 1; 2.) calculate distance matrices for M and its transpose, yielding an NxN matrix of distances between sources and a KxK matrix of distances between targets. Various distance measures may be possible, but good results may be obtained by converting Pearson correlations to distances by subtracting them from 1; 3.) using Ward's method for hierarchical agglomerative clustering, a cluster hierarchy (tree) may be computed and stored for each distance matrix. Results of an arbitrary number of clustering operations may be saved in their entirety, so that any particular flat cluster solution may be chosen as the basis for generating outputs.
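The three-step recipe can be sketched in dependency-free Python. This is an illustrative implementation rather than the disclosed code:

- it binarizes the rows;
- it uses 1 minus the Pearson correlation as the distance, assigning distance 1 to rows with no variance (an assumption);
- it merges clusters with the Lance-Williams form of Ward's update that SciPy also uses;
- it stops when `k` flat clusters remain instead of storing the full tree.

With SciPy available, `scipy.spatial.distance.pdist(..., 'correlation')`, `scipy.cluster.hierarchy.linkage(..., method='ward')` and `fcluster(..., criterion='maxclust')` cover the same steps and keep the whole hierarchy.

```python
import math


def pearson_distance(u: list[float], v: list[float]) -> float:
    """1 - Pearson correlation; a constant row gets distance 1 (assumed)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 1.0
    return 1.0 - cov / (su * sv)


def ward_clusters(rows: list[list[float]], k: int) -> list[int]:
    """Cluster the rows of M into k flat clusters; returns one label per row."""
    binary = [[1.0 if x > 0 else 0.0 for x in row] for row in rows]  # step 1
    n = len(binary)
    dist = {(i, j): pearson_distance(binary[i], binary[j])          # step 2
            for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}  # cluster id -> row indices

    def d(a: int, b: int) -> float:
        return dist[(a, b) if a < b else (b, a)]

    next_id = n
    while len(members) > k:                                         # step 3
        a, b = min(((x, y) for x in members for y in members if x < y),
                   key=lambda pair: d(*pair))
        na, nb = len(members[a]), len(members[b])
        for c in members:
            if c in (a, b):
                continue
            nc = len(members[c])
            # Lance-Williams update for Ward's method (as in SciPy's "ward").
            dist[(c, next_id)] = math.sqrt(
                ((na + nc) * d(a, c) ** 2 + (nb + nc) * d(b, c) ** 2
                 - nc * d(a, b) ** 2) / (na + nb + nc))
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1

    labels = [0] * n
    for label, indices in enumerate(members.values()):
        for i in indices:
            labels[i] = label
    return labels


M = [[1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
print(ward_clusters(M, 2))  # rows 0-1 share one label, rows 2-3 another
print(ward_clusters([list(col) for col in zip(*M)], 2))  # outlink bundles via the transpose
```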
  • the clustering algorithm may be language agnostic, that is, forming attentive clusters around similar targets of attention without a constraint on the language of the targets.
  • clustering may make use of metadata that may enable the system to know about the content of various websites without having to understand a language.
  • the algorithm may have a translator or work in conjunction with a translation application in order to find terms across publications of any language.
  • Any particular set of cluster solutions for source nodes may be selected by the user in order to generate one or more of the following classes of output: 1.) per-cluster network metrics for source nodes 120; 2.) comparative frequency measures across clusters of link, text, semantic, and other node- and link-level events, content, and features; 3.) visualizations 124 of the partitioned network, combined with these measures and …
  • any particular set of cluster solutions for target nodes may be selected and used in combination with the set of cluster solutions for source nodes in order to generate: 1.) measures of link frequencies and densities 128 between source clusters and target clusters; 2.) visualizations 124 of the previous as a network of nodes representing clusters of sources and targets, with ties corresponding to link densities 128; and 3.) visualizations 124 of one-mode calculated networks (networks of target nodes) with partition data.
  • the partitioning of the network into sets of source nodes may allow independent and comparative measures to be generated for any number of items associated with source nodes. These may include such items as: a.) the set of target nodes K in M; b.) any subset of all target nodes, including those on …
  • c.) any set of target objects, such as all URLs for videos on YOUTUBETM or all object URLs on user-created lists; d.) any other URLs; e.) any text string found in published material from source nodes; f.) any semantic entities found in published material from source nodes; g.) any class of metadata associated with source nodes, such as tags, location data, author demographics, and the like.
  • the following examples of measures may be generated per cluster: 1.) total count: number of occurrences of the item within the cluster (multiple occurrences per source node counted); 2.) node count: number of nodes with an item occurrence within the cluster (multiple occurrences per source node count as 1); 3.) item/cluster frequency: total count / # of nodes in the cluster; 4.) node/cluster frequency: node count / # of nodes in the cluster; 5.) standardized item/cluster frequency: multiple approaches are possible, including z-scores, and one approach is to use standardized Pearson residuals, which control for both cluster size and item frequency across clusters and …
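The per-cluster measures above can be sketched for a single item as follows. The standardized Pearson residual is computed from a two-column table per cluster (nodes using vs. not using the item). This is one common way to control for both cluster size and overall item frequency. The function name and the input layout are assumptions.

```python
import math


def cluster_measures(labels: dict[str, str], occurrences: dict[str, dict[str, int]],
                     item: str) -> dict[str, dict[str, float]]:
    """Per-cluster measures 1-5 for one item (a target URL, term, tag, ...).

    labels: source node -> cluster; occurrences: source node -> {item: count}.
    """
    size, total, nodes = {}, {}, {}
    for node, cluster in labels.items():
        count = occurrences.get(node, {}).get(item, 0)
        size[cluster] = size.get(cluster, 0) + 1
        total[cluster] = total.get(cluster, 0) + count
        nodes[cluster] = nodes.get(cluster, 0) + (1 if count else 0)
    n_all = sum(size.values())
    users = sum(nodes.values())  # nodes anywhere in the network using the item
    out = {}
    for cluster, n_c in size.items():
        expected = n_c * users / n_all
        variance = expected * (1 - n_c / n_all) * (1 - users / n_all)
        residual = (nodes[cluster] - expected) / math.sqrt(variance) if variance > 0 else 0.0
        out[cluster] = {
            "total_count": total[cluster],
            "node_count": nodes[cluster],
            "item_cluster_freq": total[cluster] / n_c,
            "node_cluster_freq": nodes[cluster] / n_c,
            "std_pearson_residual": residual,
        }
    return out


labels = {"a": "L", "b": "L", "c": "C", "d": "C"}
occurrences = {"a": {"x": 3}, "b": {"x": 1}, "d": {"y": 2}}
print(cluster_measures(labels, occurrences, "x")["L"])
# {'total_count': 4, 'node_count': 2, 'item_cluster_freq': 2.0, 'node_cluster_freq': 1.0, 'std_pearson_residual': 2.0}
```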
  • links where a message can be placed in order to reach specific clusters, the relative use of key terms across the clusters (which enables developing specific messages to communicate to each cluster), a hitcount (the raw number of times each outlink and term was found within all the identified nodes), source node and/or cluster geography and demographics, sentiment, and the like.
  • differential frequency analysis can be done on metadata, such as tags, that are associated with different attentive clusters to facilitate cluster interpretation.
  • interpretations of what the clusters are about may be derived without any manual review.
  • the metadata associated with the clusters may be used to facilitate interpretation of the meaning of the clusters.
  • the meta-data may be language independent, such as GIS map data.
  • a social network diagram may be generated and used to display link, text, semantic, and other node- and link-level events, content, and features ("event data"), such as that shown in FIG. 2.
  • the network map may be static, or it may be the basis of an interactive interface for user interaction via software, software-as-a-service (SaaS), or the like.
  • One method may be to use a "physics model" or "spring embedder" algorithm suitable for plotting large network diagrams.
  • the Fruchterman-Reingold algorithm may be used to plot nodes in two or three dimensions. In these maps, every node is represented by a dot, and its position is determined by links to, from, and among its neighbors. The size of the dot can vary according to network metrics, typically representing the chosen measures of node centrality.
  • the technique is analogous to a locally optimized multidimensional scaling algorithm.
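To make the "spring embedder" idea concrete, here is a compact two-dimensional layout in the style of Fruchterman and Reingold. All node pairs repel with force k^2/d, linked pairs attract with force d^2/k, and a cooling "temperature" caps how far a node may move per iteration. It is a teaching sketch under stated assumptions, not a production layout engine for large maps:

- a unit-square starting frame;
- no frame clamping;
- linear cooling;
- quadratic-time repulsion.

Dot sizes and cluster colors would be applied afterwards, when drawing.

```python
import math
import random


def fruchterman_reingold(nodes: list[str], edges: list[tuple[str, str]],
                         iterations: int = 200, width: float = 1.0,
                         seed: int = 0) -> dict[str, tuple[float, float]]:
    """Spring-embedder layout: returns an (x, y) position for every node."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width * math.sqrt(1.0 / len(nodes))  # ideal spacing in a width x width frame
    t0 = width / 10  # initial temperature: the largest step a node may take

    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                fx, fy = dx / d * (k * k / d), dy / d * (k * k / d)
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        # Linked nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            fx, fy = dx / d * (d * d / k), dy / d * (d * d / k)
            disp[u][0] -= fx
            disp[u][1] -= fy
            disp[v][0] += fx
            disp[v][1] += fy
        # Move along the net force, capped by a linearly cooling temperature.
        temp = t0 * (1 - it / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temp)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
    return {n: (p[0], p[1]) for n, p in pos.items()}


pos = fruchterman_reingold(["a", "b", "c", "d"],
                           [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print({n: (round(x, 2), round(y, 2)) for n, (x, y) in pos.items()})
```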
  • nodes may be colored according to selected cluster partitions, to allow easy identification of various partitions.
  • This projection of the cluster solution onto the two- or three-dimensional map may facilitate intuitive understanding of the "social geography" of the online network.
  • This type of visualization may be referred to as a "proximity cluster" map, because the proximity of nodes to one another indicates relationships of influence and interaction.
  • projection of event data onto the map may enable powerful and immediate insight into the network context of various online events, such as the use of particular words or phrases, linking to particular sources of information, or the embedding of particular videos.
  • metrics may be calculated for partitions at the aggregate level. Event metrics may include raw counts, node counts, frequencies (counts / # nodes in cluster), normalized and standardized scores, and the like.
  • Examples typically include values such as: the proportion of blogs in a cluster using a certain phrase; the number of blogs in a cluster linking to a target website; the standardized Pearson residual (representing deviation from expected values based on chance) of the links to a target list of online videos; the per-cluster "temperature" of an issue calculated from an array of weighted-value relevance markers; and the like.
  • any particular set of cluster solutions for target nodes may be selected and used in combination with the set of cluster solutions for source nodes in order to generate additional outputs.
  • Visualizations produced may include: 1.) a two-mode network diagram of relationships between clusters of sources and targets, treated as aggregate nodes and with tie strength corresponding to link density measures; and 2.) a second-mode co-citation network diagram.
  • targets are nodes, connected by ties representing the number of sources citing both of them, and colors corresponding to cluster solution partitions.
  • Another output may be macro measurement of link density.
  • the matrix M may be collapsed to aggregate link measures among clusters of sources and clusters of targets.
  • a series of SxT matrices may be used, with S as the set of source clusters ("attentive clusters") and T as the set of clustered targets ("outlink bundles").
  • These matrices may contain aggregated link measures, including: counts
  • Various standardized measures are possible, with standardized Pearson residuals obtaining good results. Any of these measures may be used as the basis of tie strength for the two-mode visualizations described above.
  • a density matrix may be constructed between attentive clusters and outlink bundles.
  • the attentive clusters may be represented as row headers and the outlink bundles may be represented as column headers.
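Below is a minimal sketch of the density matrix, assuming binary links and flat cluster assignments for both sources and targets. Density is taken here as observed links divided by the number of possible source-target pairs in each block. This is one of several possible aggregate measures; counts or standardized residuals could be substituted. Keys are (attentive cluster, outlink bundle) pairs, that is, row and column headers.

```python
from collections import Counter


def density_matrix(links: dict[str, set[str]],
                   source_cluster: dict[str, str],
                   target_bundle: dict[str, str]) -> dict[tuple[str, str], float]:
    """Link density for every (attentive cluster, outlink bundle) pair."""
    observed = Counter()
    for source, targets in links.items():
        if source not in source_cluster:
            continue
        for target in targets:
            if target in target_bundle:
                observed[(source_cluster[source], target_bundle[target])] += 1
    cluster_size = Counter(source_cluster.values())
    bundle_size = Counter(target_bundle.values())
    # Share of all possible source->target pairs in each block that are linked.
    return {(c, b): observed[(c, b)] / (cluster_size[c] * bundle_size[b])
            for c in cluster_size for b in bundle_size}


links = {"s1": {"t1", "t2"}, "s2": {"t1"}, "s3": {"t3"}}
dm = density_matrix(links, {"s1": "A", "s2": "A", "s3": "B"},
                    {"t1": "X", "t2": "X", "t3": "Y"})
for cluster in ("A", "B"):
    print(cluster, [dm[(cluster, bundle)] for bundle in ("X", "Y")])
# A [0.75, 0.0]
# B [0.0, 1.0]
```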
  • the density matrix may allow users to see patterns in attention between certain sets of websites and certain bundles.
  • the density matrix may provide a way to identify similar media sources. Further, the density matrix may provide information about attentive clusters that may be based on particular verticals.
  • a social network map of the English-language blogosphere is depicted.
  • the social network map graphically depicts the most linked-to blogs in the English-language blogosphere.
  • the size of the icons representing each individual blog may be representative of a network metric, such as the number of inbound links to the blog.
  • This visualization depicts the output from a method for attentive clustering and analysis which identified attentive clusters of linked-to blogs, wherein the attentive clusters included authors with similar interests.
  • the method for attentive clustering and analysis analyzes bloggers' patterns of linking to understand their interests.
  • the visualization in FIG. 3 highlights liberal and conservative U.S. bloggers, and British bloggers, as attentive clusters. By zooming in on the visualization, subgroups such as conservatives focused on economics or liberals focused on defense may be identified from among the attentive clusters depicted.
  • the method for attentive clustering and analysis enables building a custom network map
  • the network map features attentive clusters of bloggers attuned to these topics: environmentalists, feminists, political bloggers, and parents. Subgroups within each topic may be delineated by a different color, a different icon shape, and the like. For example, within the parent bloggers, icons representing the liberal parent bloggers may be colored differently than those of the traditional parent bloggers. Surprising relationships may be discovered among groups of bloggers. For example, in FIG. 5, two parent bloggers with very different social values are closer in the network than either is to political bloggers who share their broader political views.
  • each attentive cluster may have its own core concerns, viewpoints, and opinion leaders.
  • the method for attentive clustering and analysis enables identification of blogs that are considered bridge blogs, such as the one shown circled, which indicates that the blog is popular among multiple attentive clusters.
  • the method for attentive clustering and analysis enables identification of whose opinions matter, about what, and among what groups.
  • the steps of attentive clustering and analysis may include constructing an online author network, wherein constructing the online author network includes selecting a set of source nodes (S), a set of outlink targets (T) from at least one selected type of hyperlink, and a set of edges (E) between S and T defined by the at least one selected type or types of hyperlink from S to T during a specified time period 702; deriving a set of nodes, T′, by any combination of a.) normalizing nodes in T, optionally to a selected level of abstraction, b.) using lists of target nodes for exclusion ("blacklists"), and c.) using lists of target nodes for inclusion ("whitelists") 704; transforming the online author network into a matrix of source nodes in S linked to targets in T′ 708; and partitioning the online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one …
  • the steps may optionally include generating a graphical representation of attentive clusters and/or outlink bundles in the network to enable interpretation of network features and behavior and calculation of comparative statistical measures across the attentive clusters and outlink bundles 712, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network; and optionally measuring frequencies of links between attentive clusters and outlink bundles, enabling identification and measurement of large-scale regularities in the distribution of attention by online authors across sources of information 714.
  • the element of the graphical representation may use at least one of size, thickness, color, and pattern to depict a type of activity. Attentive clusters may be visually differentiated in the graphical representation by at least one of a.) …
  • Constructing the matrix may include applying at least one threshold parameter from the group consisting of: maxnodes, targetmax, nodemin, targetmin, maxlinks, and linkmin.
  • Constructing the matrix may include applying a minimum threshold for the number of included nodes that must link to a target to qualify it for inclusion in the matrix.
  • Constructing the matrix may include applying a minimum threshold for the number of included targets that must link to a node to qualify it for inclusion in the matrix.
  • Constructing the matrix may include using blacklists to exclude particular nodes, and whitelists to force inclusion of particular nodes.
  • the matrix may be a graph matrix.
  • a valence graph may be constructed that depicts words, phrases, links, objects, and the like that are preferred by one sub-cluster over another sub-cluster; such valence graphs may use aggregated sets of clusters defined by users to display dimensions of substantive interest, such as in FIG. 11.
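A simple way to place items on a two-pole valence axis is to compute, for each target, the share of its links that come from one user-defined group of clusters (pole "A") rather than the other (pole "B"). This mirrors the liberal vs. conservative layout of FIG. 13. The scoring rule, the `min_links` cutoff, and all names below are illustrative assumptions. A fuller version might also correct for the different sizes of the two groups.

```python
from collections import Counter


def valence(links: dict[str, set[str]], source_cluster: dict[str, str],
            pole_of_cluster: dict[str, str], min_links: int = 1) -> dict[str, float]:
    """Fraction of each target's links that come from pole "A" (vs. pole "B").

    Returns targets sorted from most B-leaning (0.0) to most A-leaning (1.0).
    """
    from_a, from_b = Counter(), Counter()
    for source, targets in links.items():
        pole = pole_of_cluster.get(source_cluster.get(source, ""))
        if pole not in ("A", "B"):
            continue  # sources outside both aggregated groups are ignored
        for target in targets:
            (from_a if pole == "A" else from_b)[target] += 1
    scores = {}
    for target in from_a.keys() | from_b.keys():
        total = from_a[target] + from_b[target]
        if total >= min_links:
            scores[target] = from_a[target] / total
    return dict(sorted(scores.items(), key=lambda item: (item[1], item[0])))


links = {"lib1": {"x", "y"}, "lib2": {"x"}, "con1": {"y", "z"}}
clusters = {"lib1": "liberal", "lib2": "liberal", "con1": "conservative"}
print(valence(links, clusters, {"liberal": "A", "conservative": "B"}))
# {'z': 0.0, 'y': 0.5, 'x': 1.0}
```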
  • works from, authors who are mosi relevant in a particular cluster may be displayed and then published as a widget, which may be custom-based on a valence graph, as a way of raomtoring an ongoing stream of information from that cluster.
  • Clusters may he customizable within the widget, such as via a dialog box, menu item, or the like. Further examples will be described hereinbe!ow.
  • a user may be able to, optionally in real time through a user interface, select a stream of information based on looking at the environment, zoom, in based on clustering, figure out a valid emergent segmentation, and then set up monitors to watch the flow of events, such as media objects, text, key wOrds/language, and the like, in. real time.
  • differences in word frequency use by attentive clusters may be used to differentiate and segment clusters.
  • the attentive clusters "militant feminism" and "feminist mom" may both frequently use terms associated with feminism in their publications, but additional use of terms related to consism in one case and maternity in another case may have been used to subdivide a cluster of civils into the two attentive clusters "militant feminism" and "feminist mom." In extending this concept, not just word usage but the frequency of word usage may also be useful in segmenting clusters.
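The idea of separating clusters by word frequency can be sketched as follows. The text does not specify a statistic, so the smoothed relative-frequency ratio below, the whitespace tokenizer, and the function name `distinctive_terms` are assumptions; the sample documents are invented purely to mirror the feminism example.

```python
from collections import Counter


def distinctive_terms(docs_a, docs_b, top=3, alpha=1.0):
    """Rank terms by how much more often cluster A uses them than cluster B.

    Uses an add-alpha smoothed ratio of relative frequencies (an assumed
    statistic), so terms shared equally by both clusters score near 1.
    """
    counts_a = Counter(w for d in docs_a for w in d.lower().split())
    counts_b = Counter(w for d in docs_b for w in d.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    ratio = {
        w: ((counts_a[w] + alpha) / total_a) / ((counts_b[w] + alpha) / total_b)
        for w in vocab
    }
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(vocab, key=lambda w: (-ratio[w], w))[:top]
```

Here a shared term such as "feminism" scores near 1 and drops out, while terms one cluster uses disproportionately rise to the top, which is the kind of signal that could split one cluster into two.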
  • an application may automatically craft an advertisement to be placed at one or more outlinks in an outlink bundle using high-frequency terms used by an attentive cluster. Further in the embodiment, the advertisement may be automatically sent to the appropriate ad space vendor for placement at the one or more outlinks.
  • the data collection may include collection of web-based data, such as, for example, clickstream data, data about websites, photos, emails, tweets, blogs, phone calls, online shopping behavior, and the like.
  • tags may be collected automatically or manually for every website that is a node.
  • the tags may be non-hierarchical keywords or terms. These tags may help describe an item and may also allow the item to be found again by browsing or searching.
  • tags may be associated in third-party collections such as DELICIOUS tags, and the like.
  • tags may be generated by human coders.
  • CFI: cluster focus index
  • the system may apply a further data collection process in order to associate respondents to a survey and their news sources with various corners of the internet landscape.
  • the influence of a particular news outlet across a segmented environment of the online network may be obtained by examining clustering in conjunction with a downstream data collection process, such as obtaining survey research, clickstream data, extraction of textual features for content analysis including automated sentiment analysis, content coding of a sample of nodes or messages, or other data.
  • clustering data may be overlaid on GIS maps, "human terrain" maps, asset data on a terrain, cyberterrain, and the like.
  • a method of determining a probability that a user will be exposed to a media source given a known media source exposure is provided.
  • the media source may include newspapers, magazines, radio stations, television stations, and the like.
  • a user who may be exposed to a particular media source may be clustered in a specific attentive cluster. Accordingly, the system may determine that users in that particular attentive cluster are more likely to be exposed to another media source because the second media source may also be present in an outlink bundle preferred by the cluster.
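The exposure reasoning above can be made concrete with a small probability sketch. The text gives no formula; the estimate P(B | A) ≈ Σ over clusters c of P(c | A) · P(B | c) used here, which assumes exposure to A and B is independent once cluster membership is known, the `(cluster, sources)` input format, and the function name are all assumptions. The sample users are invented.

```python
def exposure_probability(users, source_a, source_b):
    """Estimate P(exposed to source_b | exposed to source_a) via attentive clusters.

    users: list of (cluster_name, set_of_media_sources) pairs.
    Assumes sources are conditionally independent within a cluster, so
    P(B | A) ~= sum over clusters c of P(c | A) * P(B | c).
    """
    by_cluster = {}
    for cluster, sources in users:
        by_cluster.setdefault(cluster, []).append(sources)
    exposed_a = [cluster for cluster, sources in users if source_a in sources]
    if not exposed_a:
        return 0.0
    prob = 0.0
    for cluster, members in by_cluster.items():
        p_cluster_given_a = exposed_a.count(cluster) / len(exposed_a)
        p_b_given_cluster = sum(1 for s in members if source_b in s) / len(members)
        prob += p_cluster_given_a * p_b_given_cluster
    return prob
```

With three "news" users (two reading paperA, two listening to radioB) and two "sports" users (one reading paperA, none listening to radioB), the estimate is 2/3 × 2/3 = 4/9.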
  • a method of attentive clustering on a meso level is provided.
  • the method may enable identifying emergent audiences (Attentive Clusters) and monitoring how messages (as specific as a single article in print, as broad as core campaign themes) traverse cyberspace.
  • the method may involve mapping the attentive clusters where messages have, or are likely to find, receptive audiences. Mapping may enable identifying opinion leaders and information sources, online and offline, that help shape their views.
  • the method may enable identification of the mindset/social trends of a group of users.
  • the system may be able to associate an attentive cluster with a known network, such as a political party, a political movement, a group of activists, people organizing demonstrations, people planning protests, and the like.
  • the system may be able to track the evolution of a movement or identity over time.
  • the system may track the impact of the political movement of the cluster on society. The system may track whether the political movement has been accepted by a majority of the people of the society or rejected by the society, whether there is debate about the political movement, and the like.
  • the method may enable growth of a brand, sale of a product, conveying a message, prediction of what people care about or do, and the like.
  • a system and method for multi-layer attentive clustering may be provided.
  • attentive clusters may be tracked across various layers of a social segmentation, such as specific social media networks (Twitter™, Facebook™, Orkut™, and the like), a blogosphere, and the like.
  • the system may be able to track development of an attentive cluster in a single layer or across multiple layers at every stage of the development of the cluster.
  • the growth of an attentive cluster supporting a political movement may be tracked back in time and over a period of time. In the example, once an attentive cluster is identified, the system may examine the nodes associated with specific players in the attentive cluster in order to determine characteristics, such as who is talking to whom, key nodes or hubs that link many other layers and/or media sources, apparent patterns of affinity or antagonism among clusters or other known networks, who may have started the political movement, when the political movement may have started, what messages were used at the forefront of the political movement's establishment, the size of the movement, the number of people who initially joined the political movement, growth of the political movement, influential people from various stages of the political movement, and the like.
  • an attentive cluster may be tracked in a single layer, such as by monitoring the number of Twitter™ followers (or followers on other applicable social platforms), the frequency of new followers added, the content associated with that attentive cluster, inter-cluster associations, and the like, to determine whether a political movement is being spawned, expanded, diminished, or the like.
  • the socio-ideological configuration of the people who spawned the political movement may be evident from analyzing one or more of a blog layer, a social networking layer, a traditional media layer, and the like.
  • For example, a Twitter™ (or other applicable platform) map may be formed where each colored dot is an individual Twitter™ account and the position is a function of the "follows" relationship. People are close to people they are following or who are following them. The pattern of the map may be related to the structure of influence across the network.
  • the system may be deployed on a social networking site to identify and track attentive clusters and linkage patterns associated with the attentive clusters.
  • the system for attentive clustering may be applied on Facebook™ to identify attentive clusters in the Facebook™ audience and track the cluster's activity within Facebook™.
  • the system may be used to identify a group of people who may be susceptible to a message. By identifying and tracking an attentive cluster in the Facebook™ layer that may be susceptible to a message, downstream activities, such as organizing in response to the message, may be examined. For example, an attentive cluster of university students may be presented with a message regarding a proposed law lowering the drinking age.
  • the system may track activity within the cluster related to the message, identify new groups formed around the topic of the message, invitations to other groups regarding the message, opposition from other groups in response to the message, and the like. Indeed, the system may be able to track the formation of new attentive clusters in the Facebook™ layer in response to the message. In this case, the system may identify individuals or groups that link to one another who share a common interest or target of attention, such as concerned parents opposing the proposed law, anti-government groups supporting the proposed law, child advocate groups opposing the law, and the like. Discoveries related to the original layer may be applied to strongly associated clusters in other layers. For instance, a determination about the interests of a cluster in the Facebook™ layer may be used to drive a communications or advertising strategy in associated clusters of other layers, such as weblogs or Twitter™.
  • Measures for characterizing contagious phenomena propagating on networks may include peakedness, commitment (such as by subsequent uses and time range), and dispersion (including normalized concentration and cohesion), and will be further described herein.
  • two-mode networks may be generated by projecting modes one onto another.
  • certain social networks may not allow handling of individual data, but may allow public-page data to be accessed. In this way, data from individuals who comment on public pages may be obtained.
  • Public pages may be treated as a two-mode network that is collapsed to one mode. For example, a two-mode network may be formed from two classes of actors: people and the cocktail parties that the people attend.
  • One class of actors could be labeled 1-5 and the other A-E to generate a scatter diagram depicting a two-mode network, either a network of cocktail parties attended by the same people or a network of people who attended the same cocktail parties.
  • networks may be formed based on who participates in the stream of objects that come from different public pages, the relationship between public pages, such as whether there is a direct "like" relationship between public pages, weighted by how many people commented on objects from two or more pages, and the like.
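Collapsing a two-mode network to one mode, as in the cocktail-party example, can be sketched in a few lines. The dictionary input format and the function name `project` are assumptions; the labels 1-5 and A-E follow the text, while the specific attendance data is invented. The same projection would apply to public pages and commenters, with page pairs weighted by how many people commented on both.

```python
from collections import defaultdict
from itertools import combinations


def project(attendance):
    """Collapse a two-mode person -> parties network onto each single mode.

    Returns (people_edges, party_edges): person pairs weighted by the number
    of parties both attended, and party pairs weighted by shared attendees.
    Pairs with zero overlap are omitted.
    """
    by_party = defaultdict(set)  # party -> people who attended it
    for person, parties in attendance.items():
        for party in parties:
            by_party[party].add(person)
    people_edges = {}
    for a, b in combinations(sorted(attendance), 2):
        shared = len(set(attendance[a]) & set(attendance[b]))
        if shared:
            people_edges[(a, b)] = shared
    party_edges = {}
    for x, y in combinations(sorted(by_party), 2):
        shared = len(by_party[x] & by_party[y])
        if shared:
            party_edges[(x, y)] = shared
    return people_edges, party_edges
```

For instance, if people 1 and 2 both attended parties A and B, the person-mode edge (1, 2) gets weight 2, and the party-mode edge (A, B) is weighted by everyone who attended both.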
  • a method of analyzing attentive clusters over time is provided.
  • the analysis of these attentive clusters may enable the system to depict changes in the linking patterns of attentive clusters over a time period. Further, the analysis may allow depiction of any changes in the structure of the network itself.
  • a time-based reporting method may be used by the system to demonstrate the effects of events/actions throughout a network of attentive clusters for a period of time. In the method, bundles that may be lists of semantic markers, including text elements embedded in a post or tweet, links to pieces of online content, metadata tags, and the like, may be tracked in clusters across a network, such as a blogosphere.
  • a bundle of semantic markers related to obesity may be tracked over time to determine how the topic of obesity is being discussed.
  • a particular bundle (with text, link, and metadata elements) can be tracked across clusters to see where it is getting attention or not.
  • the measure of attention may be defined as a "temperature.”
  • the "temperature" is based conceptually on Fahrenheit temperatures (without negatives) as compared to other issues, where 100 is very hot and 0 is ice cold.
  • the method may have a tracking report as an output for tracking issues in a map across time. In this example, the tracking report may be focused on a collection of blogs most focused on childhood obesity, organized into attentive clusters over a moving 12-month period of time.
  • the blogs may be clustered broadly into policy/politics, issue focus, culture, family/parenting, and food attentive clusters. There may be sub-clusters defined for each of those clusters, such as conservative, social conservative, and liberal sub-clusters under the policy/politics cluster.
  • the report may indicate the issue intensity for each cluster/sub-cluster by assigning it an average temperature per blog of conversation on the broad topic of childhood obesity within each group.
  • the report may indicate the issue distribution for each cluster/sub-cluster by calculating the percentage of childhood obesity conversations taking place on blogs not in the map and within each cluster within the map.
  • specific terms may be tracked across the clusters/sub-clusters over time, and the method may indicate an average temperature based on the uses of specific terms in blogs within each cluster.
  • the term "school lunch" has a high "temperature" in certain issue focus clusters, liberal policy clusters, and foodie clusters, and has steadily increased over the last eight moving 12-month periods.
  • the intensity of sites, or the average temperature based on links to specific websites on blogs within each cluster, may be provided by the report.
  • the intensity of source objects, or the average temperature based on the links to specific web content (articles, videos, etc.), may be provided by the report.
  • the intensity of sub-issues, or the average temperature of conversation on identified issues defined by a set of terms and links, may be provided by the report. In the report, specific terms may be tracked on a monthly and per-cluster basis, specific sites may be tracked on a monthly and per-cluster basis, and specific objects may be tracked on a monthly and per-cluster basis.
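The tracking-report metrics above (issue intensity, issue distribution, and the 0-100 temperature scale) can be sketched as follows. The text does not say how raw attention is converted to a temperature, so the min-max rescaling onto 0..100, the `(blog, issue) -> mention count` input, the "not in map" bucket name, and all function names are assumptions for illustration.

```python
def issue_intensity(blog_cluster, mentions, issue):
    """Average per-blog mentions of `issue` within each cluster.

    blog_cluster: blog -> cluster name for every blog in the map.
    mentions: (blog, issue) -> number of posts/mentions in the period.
    """
    totals, sizes = {}, {}
    for blog, cluster in blog_cluster.items():
        sizes[cluster] = sizes.get(cluster, 0) + 1
        totals[cluster] = totals.get(cluster, 0) + mentions.get((blog, issue), 0)
    return {c: totals[c] / sizes[c] for c in sizes}


def issue_distribution(blog_cluster, mentions, issue):
    """Percentage of the issue's conversation by cluster; unmapped blogs pooled."""
    counts = {}
    for (blog, iss), n in mentions.items():
        if iss == issue:
            cluster = blog_cluster.get(blog, "not in map")
            counts[cluster] = counts.get(cluster, 0) + n
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in counts.items()} if total else {}


def temperatures(raw):
    """Rescale raw attention values onto 0 (ice cold) .. 100 (very hot)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {k: 0.0 for k in raw}
    return {k: 100.0 * (v - lo) / (hi - lo) for k, v in raw.items()}
```

A relative scale like this matches the "compared to other issues" framing but would shift whenever the comparison set changes; a report meant to be compared across months might instead fix the scale's endpoints.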
  • the system may identify and track structural changes in a network. For example, during the recent US elections, blogs appeared instantaneously that were anti-Obama, pro-Palin, or pro-McCain but were outside the conservative blogosphere. This rapid change in the network structure may be indicative of a coordinated, synchronized campaign to message and blog.
  • a method of attentive clustering by partitioning an author network into a set of source nodes with similar adoption and use of technology features is provided.
  • a feature or a piece of technology, such as an embedded Facebook™ "Like" button, may be a target of attention or clustering item.
  • a method of creating clusters of people and describing probabilistic relationships with other clusters, such as words, brands, people, and the like, is provided.
  • the system may describe the probability of any relation between them.
  • a cluster focus index score (CFI) may be calculated. CFI represents the degree to which an event, characteristic, or behavior disproportionately occurs in a particular cluster, or to which a particular cluster, relative to the network, preferentially manifests an event, characteristic, or behavior.
  • FIG. 9 depicts a graph of cluster focus index scores for targets of a conservative-grassroots attentive cluster.
  • the targets circled on FIG. 9 are those that everyone in the network links to, according to their CFI.
  • the targets circled in FIG. 10 (A through E) are those that are disproportionately linked to by the conservative-grassroots attentive cluster, according to their CFI.
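This section defines the CFI only in words. One simple reading, offered here purely as an assumption and not as the patent's formula, is an observed-to-expected ratio: the share of cluster members linking to a target divided by the share of all network nodes linking to it. Under that reading, targets that everyone links to score about 1, and targets a cluster prefers score above 1. The `links` and `members` inputs are hypothetical.

```python
def cluster_focus_index(links, members, target):
    """Hypothetical CFI: observed / expected share of links to `target`.

    links: node -> set of targets that node links to (the whole network).
    members: set of nodes belonging to the attentive cluster (subset of links).
    Returns ~1.0 for targets linked at the network-wide rate, >1.0 for targets
    the cluster links to disproportionately, and 0.0 if nobody links to it.
    """
    in_cluster = sum(1 for node in members if target in links.get(node, ()))
    overall = sum(1 for targets in links.values() if target in targets)
    if overall == 0 or not members:
        return 0.0
    return (in_cluster / len(members)) / (overall / len(links))
```

In a four-node network where everyone links to "news" but only the two cluster members link to "grassroots-site", the first target scores 1.0 and the second 2.0.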
  • a method of identifying websites with high attention from an identified attentive cluster or author is provided.
  • the method may include determining the websites frequently or preferentially cited by identified authors by examining the websites' cluster focus index (CFI) score. Further, the method may include automatically sending or placing advertisements, alerts, notifications, and the like to the websites.
  • a social network analysis may generate a network map with thousands of nodes clustered into attentive clusters.
  • influence data that result from the network analysis may be influence metrics for sites from across the Internet to which bloggers link, including mainstream media, niche media, Web 2.0, other bloggers, and the like.
  • influence data may be metrics that reveal network influence among bloggers directly. Bloggers are usually thought of as simply being more influential or less, but this data lets the analyst discover which blogs are influential among which online clusters (segments), a far more granular and targeted approach.
  • Cluster targeting can be further refined to identify which nodes in a specific cluster have influence on any of the other clusters on the map. Because the conversation within social media covers a wide variety of topics, source and network influence alone do not necessarily reflect influence on a specific topic. A relevance index metric for discussion regarding particular topics, events, and the like may be added to a social network analysis to identify which nodes are most focused on this topic.
  • index scores for nodes may also be calculated using lists of semantic markers to provide further metrics of value for targeting communications, advertising, and the like.
  • specific sorts of the data will create lists of likely high-value targets for further action. While count, CFI, and relevance index scores are all important, they can be combined in order to maximize certain objectives.
  • the following use case examples include combining count and relevance into a targeting index by multiplying their values. Other, more complicated maximization formulas are possible as well.
  • the examples demonstrate specific influence sorts that can be generated from the Russian network data to address each use case. The network data is based on the linking patterns of the nodes in the RuNet map over a nine-month period ending in February 2010.
  • Use Case 1 and Use Case 2 involve finding influential sources.
  • Use Case 1 involves identifying sources with the most influence over the entire map by sorting on the highest values of count. While extremely influential, and in many cases suitable for advertising campaigns, these universally salient sites also tend to be much harder to reach out to than sites that are smaller but specifically important to targeted segments.
  • Use Case 2 involves identifying sources that reach a targeted cluster by sorting sources by Cluster Focus Index. CFIs may be sorted for any of the attentive clusters. Count metrics from the map as a whole and from the targeted cluster can be used to further prioritize for action. This sort is the equivalent of identifying traditional media trade press, the go-to sites for the selected segment. Frequently, these include specifically influential bloggers in addition to niche media and other sources.
  • Use Cases 3-6 involve finding influential nodes.
  • Use Case 3 involves identifying the greatest network influence by sorting the nodes by in-degree (total number of links from other nodes within the entire network). This sort specifically identifies the network's "A-list" nodes, the most influential network members (bloggers). Like prominent sources, these are often more difficult to reach than more targeted niche influentials, but they contribute greatly to spreading viral niche messages across the wider network.
  • Use Case 4 involves finding the most targeted influencers for a particular cluster by sorting the Cluster Focus Index scores for a targeted cluster to find nodes with cluster-specific influence. This identifies the nodes with particular influence, interest, or prestige among the target cluster. These nodes tend to be much more "on topic" than others, and much easier to reach than map-wide A-list nodes. Cluster-specific influentials are not always from the target cluster itself, which can be very useful for trying to move discussion between particular clusters. Link metrics provide further assistance in deciding targeting priorities.
  • Use Case 5 involves following a particular topic at the map level by sorting using topic focus target scores, which combine links (network influence) and topic focus index (issue relevance).
  • Formulas for calculating focus target score can be varied, but the default may be to multiply links by topic focus index. This may allow identification of those nodes in the entire map that discuss the target issue most frequently. These may be monitored to gauge dominant threads of discussion and opinion about the issue, and targeted for outreach.
  • Use Case 6 involves targeting a particular cluster's conversation on a topic by sorting within a cluster by the topic focus target score. This may allow members of the target cluster who write about the target issue to be identified for monitoring or persuasion. Variations of the formula for combining influence and relevance metrics into a single targeting metric can be used to bias the sort toward relevance or toward influence, depending on strategic objective.
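The six use cases reduce to sorting records on different metrics, optionally within one cluster. The sketch below assumes each node or source is a dict with hypothetical `name`, `cluster`, `count` (inbound links), `cfi`, and `topic_focus` fields; `target_score` applies the default described in the text, links multiplied by topic focus index. The records are invented for illustration.

```python
def top_by(records, metric, n=3, cluster=None):
    """Return the names of the top-n records by `metric`, optionally in one cluster.

    Use Cases 1/3: metric = count (in-degree) over the whole map.
    Use Cases 2/4: metric = CFI for the targeted cluster.
    Use Cases 5/6: metric = target_score, map-wide or within a cluster.
    """
    pool = [r for r in records if cluster is None or r["cluster"] == cluster]
    ranked = sorted(pool, key=lambda r: (-metric(r), r["name"]))
    return [r["name"] for r in ranked[:n]]


def target_score(record):
    """Default topic focus target score: links (influence) x topic focus (relevance)."""
    return record["count"] * record["topic_focus"]
```

For example, a widely linked but off-topic node can top the count sort while a mid-sized, on-topic node tops the combined score:

```python
records = [
    {"name": "big", "cluster": "A", "count": 100, "cfi": 0.9, "topic_focus": 0.1},
    {"name": "niche", "cluster": "B", "count": 20, "cfi": 3.0, "topic_focus": 0.8},
    {"name": "mid", "cluster": "B", "count": 50, "cfi": 1.5, "topic_focus": 0.5},
]
top_by(records, lambda r: r["count"], n=1)   # ["big"]
top_by(records, target_score, n=1)           # ["mid"]
```

Raising `topic_focus` to a power, or adding rather than multiplying, is one way to bias the sort toward relevance or influence as the text suggests.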
  • a proximity cluster map method may be used to visualize 124 attentive-cluster-based data and generate a network map.
  • attentive clusters and their constituent nodes may be displayed in a proximity cluster map.
  • Nodes in the network map may be represented by individual dots, optionally represented by different colors, whose size is determined by the number of other nodes in the map that link to them.
  • a general force may act to move dots toward the circular border of the map, while a specific force pulls together every pair of nodes connected by a link. In static images or an
  • nodes may receive a visual treatment to display additional data of interest.
  • dots representing nodes may be lit or highlighted to represent all nodes linking to a particular target, or using a particular word, with other nodes darkened.
  • dot size may be varied to indicate a selected node metric.
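The proximity-map forces described above can be approximated with a toy iterative layout. The force constants, step count, unit-circle map, random start positions, and function names are all assumptions; a production map would more likely use an established force-directed layout algorithm. Dot size follows the text: the number of other nodes linking to a node.

```python
import math
import random


def dot_sizes(nodes, edges):
    """Dot size per node: the number of other nodes that link to it."""
    size = {n: 0 for n in nodes}
    for source, target in set(edges):
        if source != target:
            size[target] += 1
    return size


def proximity_layout(nodes, edges, steps=200, push=0.01, pull=0.05, seed=0):
    """Toy layout: outward push toward the unit-circle border plus link attraction."""
    rng = random.Random(seed)
    pos = {n: (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for n, (x, y) in pos.items():
            r = math.hypot(x, y) or 1e-9
            # General force: move every dot outward, toward the circular border.
            force[n][0] += push * x / r
            force[n][1] += push * y / r
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            # Specific force: pull the two endpoints of each link together.
            force[a][0] += pull * dx
            force[a][1] += pull * dy
            force[b][0] -= pull * dx
            force[b][1] -= pull * dy
        new_pos = {}
        for n in nodes:
            x = pos[n][0] + force[n][0]
            y = pos[n][1] + force[n][1]
            r = math.hypot(x, y)
            if r > 1.0:  # clamp dots onto the border instead of past it
                x, y = x / r, y / r
            new_pos[n] = (x, y)
        pos = new_pos
    return pos
```

Highlighting (lighting the dots of all nodes that link to one target and darkening the rest) would then be a rendering step over these positions and sizes.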
  • a valence graph method may be used to visualize 124 attentive-cluster-based data and generate a valence graph.
  • targets of attention or semantic elements occurring in the output of nodes may be displayed in a valence graph.
  • the valence graph method may be understood via a description of how a particular valence graph is built, such as a Political Video Barometer valence graph (FIG. 8), useful for discovering what videos liberal and conservative bloggers are writing about. This particular valence graph may be used to watch and track videos linked to by bloggers who share a user's political opinions, view clips popular with the user's political "enemies," and the like.
  • the videos shown in the Barometer are chosen by queries against a large database built by network analysis engines performing network selection 102.
  • a crawler or "spider" visits millions of blogs and collects their contents and links.
  • the system mines the links in these blogs to perform partitioning 104 and forms attentive clusters based on how the blogs link to one another (primarily via their blogrolls) and, over time, what else the bloggers link to in common. Attentive clusters may be large or small, and the bigger ones can contain many sub-clusters and even sub-sub-clusters.
  • determining what the blogs have in common may be done by examining metadata, tags, language analysis, link target patterns, contextual understanding technology, or by human examination of the blogs or a subset thereof. In the example, American liberal bloggers and American conservative bloggers form the two largest sets of clusters in the English-language blogosphere, and the Barometer draws upon roughly the 8,000 "most linked-to" blogs in each of these groups to position the videos on the graph by calculating the proportions of links to each target from the two political cluster groupings.
  • the Barometer may be continually updated by scanning the blogs periodically, looking for new links to videos (or videos embedded right in the blogs). By counting these links, it can be determined what videos political bloggers are promoting. In embodiments, the link count may be displayed on the valence graph using an identifier such as an icon or marker. In this example, some videos are linked to almost exclusively by liberal bloggers, some are linked to mostly by conservative bloggers, and a few are linked to more or less evenly by both groups. Once the system determines that a video has traction in the political clusters, it scans through data from other parts of the blogosphere to count how many "non-political" bloggers link to it as well.
  • the Political Video Barometer example illustrates one kind of valence graph, the insight that can be gained, and the applications that can be built based on the method and the data obtained by the method. It should be understood that the method may be used to examine any sort of potentially cluster-able data, such as technology, celebrity gossip, the use of linguistic elements, the identification of new sub-clusters of particular interest, and the like. All aspects of the valence graph method, and the underlying attentive clustering analysis, may be customized along multiple variables to enable planning and monitoring campaigns of all kinds.
  • a multi-cluster focus comparison method may enable comparing cluster focus index (CFI) scores of multiple attentive clusters.
  • the CFI score may be a measure of the degree to which a particular outlink is of disproportionate interest to the attentive cluster being analyzed; in other words, the CFI indicates what link targets are of specific interest to a particular cluster beyond their general interest to the network as a whole.
  • X may be the CFI score for cluster A and Y may be the CFI score for cluster B.
  • the multi-cluster focus comparison method may compare the two clusters, A and B, based on their CFI scores, X and Y. This would allow a user to discern elements of common interest vs. divergent interest between the two clusters. Insights derived from this method would be of great value in creating and targeting advertising and communications campaigns.
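Comparing the CFI scores X and Y of two clusters can be sketched as a simple partition of targets. The cut-off of 1.0 (treating "above the network-wide rate" as elevated interest) and the function name are assumptions; the text does not fix a threshold. Plotting each target at (X, Y) would give the same split graphically.

```python
def compare_focus(cfi_a, cfi_b, threshold=1.0):
    """Split targets into common vs divergent interest from two CFI maps.

    cfi_a / cfi_b: target -> CFI score for cluster A / cluster B (X and Y).
    A target counts as of interest to a cluster when its CFI exceeds `threshold`.
    """
    result = {"common": [], "A only": [], "B only": []}
    for target in sorted(set(cfi_a) | set(cfi_b)):
        x, y = cfi_a.get(target, 0.0), cfi_b.get(target, 0.0)
        if x > threshold and y > threshold:
            result["common"].append(target)
        elif x > threshold:
            result["A only"].append(target)
        elif y > threshold:
            result["B only"].append(target)
    return result
```

Targets in "common" suggest shared ground for a campaign aimed at both clusters, while the one-sided lists suggest cluster-specific placements.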
  • link targets, semantic events, and node-associated metadata may be scattered in x-y coordinate space, and the dimensions of the graph may be custom-defined using sets of clusters grouped to represent substantive dimensions of interest for a particular analysis. Elements are plotted on X and Y according to the proportions of links from defined cluster groupings. For example, and referring to FIG. 11, using data from the Russian blogosphere, the top 2000 link targets for Russian bloggers may be plotted such that the proportion of links from "news-attentive" blog clusters vs. links from "non-news-attentive" clusters determines the position on Y, while the proportion of links from the "Democratic Opposition" cluster vs. the "Nationalist" cluster determines the position on X, as shown in FIG. 11.
  • popular outlink targets for the US blogosphere may be displayed with the X dimension representing the proportion of liberal vs. conservative bloggers linking to them, and the proportion of political bloggers of any type vs. non-political bloggers represented by the Y dimension, as shown in FIG. 13.
  • Various data may be visualized in the graph associated with the clusters of news-attentive and political bloggers, such as metadata tags, words, links, tweets, words that occur within 10 words of a target word, and the like.
  • These visualizations may be used in interactive software allowing user-driven exploration of the data graphed in valence space, optionally allowing user-defined sets of clusters to be used in calculating valence metrics.
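The valence-space coordinates described above can be computed from per-cluster link counts. The text says positions follow proportions of links from two cluster groupings per axis; mapping a proportion p to the signed range -1..+1 as 2p - 1 is an assumption, as are the input format, cluster names, and sample counts.

```python
def valence_coordinates(link_counts, x_groups, y_groups):
    """Place each target in valence space from link proportions.

    link_counts: target -> {cluster_name: number of links to the target}.
    x_groups / y_groups: (group_one, group_two) pairs of cluster-name sets.
    Each axis value is 2p - 1, where p is group one's share of the links
    from both groups: +1 means all links from group one, -1 all from group two.
    """
    def axis(counts, groups):
        one = sum(counts.get(c, 0) for c in groups[0])
        two = sum(counts.get(c, 0) for c in groups[1])
        return (one - two) / (one + two) if one + two else 0.0

    return {t: (axis(c, x_groups), axis(c, y_groups)) for t, c in link_counts.items()}
```

With X as liberal vs. conservative and Y as political vs. non-political, a video linked nine times by liberal blogs and once by conservative blogs, and by no non-political blogs, lands at (0.8, 1.0).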
  • a method of node selection 110 based on node relevance to a defined issue, also known as semantic slicing, is provided.
  • Semantic slicing may involve clustering according to a relevance bundle.
  • a relevance bundle may include one or more of key markers, what the nodes may have linked to, what the nodes have posted, text elements, links, tags, and the like.
  • semantic slicing involves pre-screening nodes for relevance based on semantic analysis.
  • the relevance bundles may be used to sort through all of the network data to select the top high-relevance nodes.
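Pre-screening nodes against a relevance bundle can be sketched as scoring each node's output items and keeping the best matches before clustering. The scoring rule (fraction of a node's items found in the bundle), the flat item lists, and the function names are assumptions; the text only says relevance metrics are computed and top nodes selected. The sample bundle and blogs are invented.

```python
def relevance_score(node_items, bundle):
    """Fraction of a node's items (words, links, tags) that appear in the bundle."""
    if not node_items:
        return 0.0
    return sum(1 for item in node_items if item in bundle) / len(node_items)


def semantic_slice(nodes, bundle, top=2):
    """Keep the `top` nodes whose output best matches the relevance bundle.

    nodes: node -> list of items it posted or linked to.
    bundle: set of semantic markers defining the issue or vertical market.
    """
    scored = {node: relevance_score(items, bundle) for node, items in nodes.items()}
    return sorted(scored, key=lambda node: (-scored[node], node))[:top]
```

The selected nodes would then feed the matrix-construction and partitioning steps, so the resulting attentive clusters are specific to the chosen issue.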
  • a custom mapping of a subset of the link economy may be done.
  • semantic slicing may enable generating a contextualized report of interest to a user on an industry level. Semantic slicing may enable focusing attentive clustering on selected vertical markets.
  • the vertical markets may be a group of similar businesses and customers who may engage in trade based on specific and specialized needs.
  • Lists of semantic markers, such as key words and phrases, links to relevant websites and online content, and relevant metadata tags, are built which represent the relevant vertical market. Relevance metrics are calculated for candidate nodes, and a selection o
  • the semantic slice may be done to analyze an energy policy vertical market by focusing the attentive clustering around one or more selected, highly relevant nodes.
  • the attentive clusters may be more specific to an identified domain of interest or vertical market.
  • the attentive clusters discovered include topic-relevant segmentations of particular kinds of Conservative bloggers discussing the issue, such as Conservative-Grassroots and Conservative-Beltway. Additional high-relevance attentive clusters may be identified, such as Climate Skeptics, Middle East policy, and the like.
  • Cluster focus index scores may be used to determine what sites everyone in each cluster links to and which sites are preferred by the cluster.
  • semantic slicing may be done using a single node, such as a particular website, a particular piece of content, and the like. In an embodiment, semantic slicing may be done over a period of time to enable monitoring the impact of a campaign.
  • a tool, such as software-as-a-service, may be provided for enabling users to define one or more semantic bundles for attentive clustering and as the basis of report outputs.
  • the tool may be an on-demand tool that may be used for semantic slicing.
  • a user may declare a semantic bundle of nodes and/or links prior to attentive clustering.
  • the system may provide an application programming interface (API) for delivering a segmentation to track one or more particular clusters of attention, or to track how an audience is interacting with a piece of content, and the like.
  • API: application programming interface
  • the data about the various clusters may be collected directly from the API. For example, a user may wish to track a cluster. The user may enter keywords related to the cluster in a search option provided by the API. Thereafter, the tool may track various websites and report back the weblinks and data that may be relevant to the cluster.
  • the API may be used to interact with a valence graph at various resolutions.
  • the API may provide segmentation data and metadata derived from the segmentation to other analytics and web data tracking firms, for use in their own client-facing tools and products.
  • the segmentation and resultant data from attentive clustering provide an additional dimension of high value against which third-party tools and other analytic capabilities, such as automated sentiment monitoring, may be leveraged.
  • the system may enable real-time selection of elements to visualize based on attentive clustering of social media.
  • the system may facilitate selection of a stream of information based on looking at the environment, zooming in on a data element based on clustering, determining a valid emergent segmentation, and monitoring the flow of events in real time.
  • the events may include media objects, text, keywords/language, and the like.
  • the real-time selection of elements may facilitate an analysis of trends/events, especially for financial purposes.
  • a search engine may be provided that prioritizes search results being displayed to a user based on a determination of real-time attention, including attention from a particular cluster or set of clusters. A user may be able to customize the prioritization of search results, such as by getting real-time attention from a particular cluster, from a particular sub-cluster, and the like.
  • a search engine searches within only those Sites accounts with high cluster focus for a chosen segment.
  • a GOOGLETM search may be restricted to the 30 websites with the highest CFf scores for the Dirt Bike racing cluster of OAKLEY'S TWITTERTM followers map.
  • the search may only return results from a. list of ke influential sites related to the chose segment
  • the search may be • restricted to websites (or domains within them), with a particular CF1 score. Websites (or domains) that meet a threshold.
  • the search query may restrict the search to particular websites that are identified based on the CFI scores.
  • the search query may be restricted by the CFI score of a website, and the CFI score restriction may be indicated in the settings of the search engine. In other embodiments, the CFI score for sites to search may be indicated in the search string itself. For example, a user may indicate a particular
  • the slider may be provided with a normalised scale, such as ascribing 1 to low CFI scores and 10 to high CFI scores, using a linear, logarithmic, or other scaling process.
  • the system may then search a database of websites for the range of CFI scores to identify one or more websites to which to limit the search. These websites are then included in a search string that is provided to a search engine.
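The slider-based restriction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: CFI scores are assumed to be non-negative, the slider is assumed to select a closed range on the 1 to 10 scale, and the target search engine is assumed to accept `site:` operators joined with `OR`. The function names and example domains are hypothetical.

```python
import math


def normalize_cfi(score, lo, hi, scale="linear"):
    """Map a raw CFI score in [lo, hi] onto a 1..10 slider scale (hypothetical helper).

    Assumes non-negative CFI scores; the "log" option uses log1p so that a
    score of zero is still defined.
    """
    if hi == lo:
        return 10.0  # every site has the same score; treat all as "high"
    score = min(max(score, lo), hi)  # clamp into the observed range
    if scale == "log":
        frac = (math.log1p(score) - math.log1p(lo)) / (math.log1p(hi) - math.log1p(lo))
    else:
        frac = (score - lo) / (hi - lo)
    return 1.0 + 9.0 * frac


def build_restricted_query(query, site_cfi, slider_min, slider_max, scale="linear"):
    """Append site: restrictions for each site whose normalized CFI is in the slider range.

    Assumes the search engine understands `site:` operators combined with OR.
    """
    lo, hi = min(site_cfi.values()), max(site_cfi.values())
    sites = sorted(
        site for site, cfi in site_cfi.items()
        if slider_min <= normalize_cfi(cfi, lo, hi, scale) <= slider_max
    )
    if not sites:
        return query  # nothing qualifies; fall back to an unrestricted search
    return f"{query} (" + " OR ".join(f"site:{s}" for s in sites) + ")"
```

For example, with hypothetical scores `{"motoforum.example": 1.0, "dirtbikes.example": 5.0, "mxnews.example": 9.0}` and a slider range of 5 to 10, the query `helmet reviews` becomes `helmet reviews (site:dirtbikes.example OR site:mxnews.example)`. A logarithmic scale spreads out low scores when a few sites have very high CFI.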
  • the search can be restricted to only specific content, or specific content may be promoted to high ranking within a search, leaving other content to the lower ranked results.
  • One way to do this restriction is to utilize the valence mapping functionality of the system.
  • a valence graph may be constructed for a chosen segment that depicts words, phrases, links, objects, and the like that are preferred by one cluster over another cluster.
  • Content indicated in the valence graph may be indexed by the system, and only that content may be searched by a search engine. Further restriction of the content may be employed, such as by website, CFI score, and the like.
  • attentive clustering and related analyses may result in identifying issues, attitudes, and messaging language that may be specific to discourse for a target market, and may be suitable for presentation in a report.
  • for example, in a clustering of bloggers sympathetic to Arts in Schools, examining intra-cluster linking patterns may show that most of the bloggers within each cluster tend to keep the discussion within their cluster, except for the bloggers in the "Interesting/teachers educators" cluster, who tend to spread conversation to each of the other clusters.
  • the report may feed into a method of generating a campaign blueprint for both social and upstream media sources and a method of identifying inter-cluster and intra-cluster influence in order to plan a campaign.
  • the blueprint may include target audience, demographic details, objectives of the campaign, flow of the campaign, messaging to use in the campaign, outlinks to target, and the like.
  • the campaign tracker may track data from a variety of sources to provide closed-loop return on investment (ROI) analysis.
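The text does not specify how a valence graph scores what one cluster prefers over another. As one plausible sketch, assuming preference is measured as a smoothed log-ratio of each item's relative frequency in two clusters, the items above a hypothetical threshold could form the restricted search index. Function names and the threshold are illustrative only.

```python
import math
from collections import Counter


def valence(items_a, items_b, smoothing=1.0):
    """Smoothed log-ratio of each item's relative frequency in cluster A vs cluster B.

    Positive values mean cluster A prefers the item; negative values mean B does.
    Items can be words, phrases, links, or other objects.
    """
    count_a, count_b = Counter(items_a), Counter(items_b)
    vocab = set(count_a) | set(count_b)
    # additive smoothing keeps items unseen in one cluster from producing log(0)
    total_a = sum(count_a.values()) + smoothing * len(vocab)
    total_b = sum(count_b.values()) + smoothing * len(vocab)
    return {
        item: math.log((count_a[item] + smoothing) / total_a)
        - math.log((count_b[item] + smoothing) / total_b)
        for item in vocab
    }


def indexable_items(scores, threshold=0.5):
    """Items cluster A prefers strongly enough to include in the restricted search index."""
    return {item for item, v in scores.items() if v >= threshold}
```

With cluster A mentioning `solar` twice and `ev` once, and cluster B mentioning `ev` once and `oil` twice, only `solar` clears the threshold; `ev` scores exactly zero because both clusters use it equally.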
  • the tool may parse the information of each website accessed by the users, keywords entered, any information about the campaign, and the like. Further, the tool may track how people react to the campaigns and which ones are most successful.
  • the campaign tracker may track and analyze results in real time to determine the effectiveness of the campaigns.
  • the tool may enable the system to generate reports for clients.
  • the reports may include details about the campaigns such as campaign type, number of people who have viewed the campaign, any feedback from the people, and the like.
  • analyst coding tools and a survey integrator may support distributed metadata collection for qualitative analysis to best interpret quantitative findings.
  • the tools may include an interactive visual interface for navigating complex data sets and harvesting content. This interface may contain an interactive proximity cluster map that can display specific node data, metadata, search results, and the like. The proximity cluster map interface may enable the user to click on nodes to see node-specific metadata and to open the node URL in a browser window or external browser.
  • a user can add metadata and view metadata about any given blogger on a map.
  • the tools may enable grabbing whole sets of blogs or items to add to semantic lists, and may enable a user to define surveys so that a team of human coders can open the website and fill out surveys.
  • a dashboard may be provided.
  • the dashboard may combine advanced network and text analysis, real-time updates, team-based data collection and management, and the like.
  • the dashboard may also include flexible tools and interfaces for both "big picture" views and minute-by-minute updates on messages as they move through networks.
  • a user may define bundles and track them in aggregate through networks over time.
  • a user may be able to see how specific media objects are doing with a particular cluster over time.
  • the dashboard may provide a burstmap feature in which the history of selected events or sets of events over a timeframe may be displayed using a proximity cluster map.
  • nodes in the map will light up at a time corresponding to their participation in the selected event or events.
  • this burstmap feature may include a timeline view displaying event-related metrics over time, such as the number of nodes linking to a particular video.
  • the burstmap feature may include lists of events available for display. An example of a burstmap interface is found in FIG. 12.
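The burstmap behavior described above (nodes lighting up when they first participate, plus a timeline of how many nodes have linked to an item) can be sketched as below. This is a minimal illustration assuming events arrive as `(node_id, timestamp_in_seconds)` pairs for one selected event and that the timeline is bucketed at a fixed width; `burst_timeline` is a hypothetical name, not part of the described system.

```python
from collections import defaultdict


def burst_timeline(events, bucket_seconds):
    """events: iterable of (node_id, timestamp_in_seconds) for one selected event.

    Returns (first_seen, timeline):
      first_seen: when each node first participated, i.e. when it would "light up"
      timeline: [(bucket_start, cumulative distinct nodes)], one entry per bucket
    """
    first_seen = {}
    for node, ts in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(node, ts)  # keep only the earliest participation
    new_per_bucket = defaultdict(int)
    for ts in first_seen.values():
        new_per_bucket[int(ts // bucket_seconds)] += 1
    timeline, running = [], 0
    if new_per_bucket:
        for b in range(min(new_per_bucket), max(new_per_bucket) + 1):
            running += new_per_bucket[b]  # empty buckets add 0 (defaultdict)
            timeline.append((b * bucket_seconds, running))
    return first_seen, timeline
```

Repeat participation by the same node (for example, a second link to the same video) does not re-light the node or inflate the cumulative count, which matches a "number of nodes linking to" metric rather than a raw link count.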
  • techniques disclosed herein may be used to generate social media maps that visualize social media relationship data and enable utilization of a suite of metrics on the data.
  • Social media maps may be constructed via clustering of various social media communities, including TWITTER™, FACEBOOK™, Mop, online social media, and others.
  • the clustering technique used may be manual, relationship-based, attentive clustering as previously disclosed herein, network segmentation, or another analogous technique.
  • the social media maps may be organised in portfolios that are targeted to market segments or relate to an issue/topic campaign.
  • Social media maps may be offered via an API or as raw data to plug into a third party dashboard.
  • Services related to the social media maps that may be offered include robust tools for searching, comparing, and generating integrated reports across multiple maps, searchable indexing, and map browsing.
  • Pricing for social media maps may be via subscription, for one or more maps, a portfolio of maps, the whole portfolio of maps, the whole portfolio of maps save some exclusive custom items, or the like.
  • Systems and methods for how to generate, utilize, update, and offer social media maps will be further described herein.
  • a comprehensive catalog of social media maps and network segmentations may be offered and updated on a regular basis. The catalog may include targeted portfolios for key markets, such as consumer goods, media and entertainment, politics and public policy, energy, science and technology, government, and more.
  • the catalog may contain maps for each layer of the social media system, such as blogs, TWITTER™, social network services, forums, and the like. It may contain maps for all major languages, countries, and regions of the world. Social media map data may be used within partner dashboard systems, so that a range of commercial tools can be leveraged by subscribers and so that the social media map data are "portable" across various tools.
  • a suite of reporting tools may be used in conjunction with the social media maps.
  • one or more social media maps and network segmentations may be constructed via clustering of data from at least one social media community.
  • the social media map or network segmentation may be offered via an API or as raw data.
  • the social media community may be based on at least one of a social media layer, a language, a country, a region, or the like.
  • the clustering technique may be attentive clustering, as described previously herein, relationship-based, manual, network segmentation, or the like.
  • relationship-based clustering of data from at least one social media community 1402 is used to construct one or more social media maps and network segmentations using the clustering 1404.
  • One or more social media maps and network segmentations may be offered via an API 1408 or as raw data 1410.
  • a report may demonstrate the interaction of nodes/links between the maps 1412.
  • the maps may be generated by an autonomous process.
  • The autonomous process may create maps based on one or more criteria, a scope definition, an instruction, or the like.
  • a social graph may be generated based on followers of an individual or entity in a social network.
  • the map criteria may be semantically based, such as based on keywords or hashtags.
  • the maps may be geo-based, such as based on which users/nodes are in a territory.
  • the maps may be based on previous mappings.
  • segments in other maps on health and fitness may be used to triangulate or iterate to a mapping of a new category.
  • the map may be based on an arbitrary set of accounts generated by a third party.
  • One scenario might be a mapping of the social network accounts for all the users of a mobile application. In still another example, the maps may be based on a nomination of individuals based on some criteria, such as demographics. Once generated, the maps may be stored and indexed.
  • maps may be based on CFI scores for dynamic data (e.g., YOUTUBE™ videos).
  • the amount of data may be increased to obtain a better indication of what the segment is communicating about, such as whether data can be obtained on the influencers of a segment, which may be coming from off the map. In addition to looking at data coming from the segment, the system may be able to access data from social media accounts that have high CFI for that segment (not just the ones that are "in" the segment).
  • CFI scores may be calculated for a first segment.
  • CFI scores may be calculated for those influencers of the first segment.
  • the first segment may be followers of a particular art gallery, but the system can also examine the CFI for the first segment's influencers, which may include several well-known art gallery aficionados who may or may not be followers of the particular art gallery.
  • certain maps may be based only on the CFI scores calculated for the influencers.
  • a searchable index for a catalog of social media maps may be constructed 1414.
  • social media maps in the catalog may be searchable.
  • the maps may be searchable by a keyword, a URL, a semantic marker, and the like.
  • the social media maps may be indexed by one or more of a keyword, URL, or semantic marker so as to form a searchable index of social media maps.
  • the searchable index may include metrics to indicate a statistic regarding the social media maps.
  • the statistic may represent a dimension of popularity, relevance, semantic density, or a similar feature.
  • a search engine may be enabled to return maps ranked by relevance by using certain statistics in the searchable index.
  • a semantic marker may include a keyword, a phrase, a URL (node or object level), a tag (such as those from bookmarking and annotation services, meta keywords extracted from HTML, tags assigned by coders, etc.), and the like.
  • Semantic markers may also include those used in particular social network environments, such as TWITTER™, and may include follows relationships, mentions, retweets, replies, hashtags, URL targets, and the like. Any of these semantic markers may be used to index a social media map.
  • a new social media map subscription may be suggested. For example, if a user searches a social media map index for the terms "Nissan LEAF™," "electric vehicle," and leai3 ⁇ 4ations,coni, subscriptions to social media maps such as automobiles, eco-friendly products, and California trends may be suggested.
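A searchable index of maps keyed by semantic markers can be sketched as a simple inverted index. This is a minimal illustration, not the described system: the per-map weight attached to each marker stands in for the unspecified relevance statistic (for example, how densely the marker occurs in that map), and `MapIndex` and the example map names and weights are hypothetical.

```python
from collections import defaultdict


class MapIndex:
    """Inverted index from semantic markers (keywords, URLs, hashtags...) to map names."""

    def __init__(self):
        # marker -> {map_name: weight}; the weight is a hypothetical per-map
        # statistic, such as how densely the marker occurs in that map
        self._postings = defaultdict(dict)

    def add_map(self, map_name, marker_weights):
        for marker, weight in marker_weights.items():
            self._postings[marker.lower()][map_name] = weight

    def search(self, markers):
        """Return map names ranked by the summed weight of the matched markers."""
        scores = defaultdict(float)
        for marker in markers:
            for map_name, weight in self._postings.get(marker.lower(), {}).items():
                scores[map_name] += weight
        return sorted(scores, key=lambda m: (-scores[m], m))
```

Ranked results can double as subscription suggestions: a query for `Nissan LEAF` and `electric vehicle` against hypothetical "automobiles" and "eco-friendly products" maps returns both, with the map matching more of the markers first.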
  • a dashboard may be used for browsing, visualizing, manipulating, and calculating metrics for one or more social media maps constructed via clustering of data from at least one social media community. Clustering techniques may include relationship-based, manual, attentive clustering, or the like.
  • the dashboard may be a third-party dashboard that supports visualization of data from clustering, wherein the data may be delivered by a raw data feed, an API plug-in, or any other data delivery method. In embodiments, the data from clustering may be joined with or otherwise integrated with data from other data sources to form a new data set.
  • the new data set may be similarly browsed, visualized, manipulated, and processed by dashboards.
  • APIs, dashboards, and partner tools may be used with social media maps for planning/assessment.
  • social media maps may be used for enterprise resource planning, business insight, marketing, search engine optimization, intelligence, politics, industry verticals, the financial industry, and the like.
  • for example, an entertainment promotion company may own a plurality of social media accounts. If they could navigate sector-level mappings related to genres of music, they could use the maps to target genre-specific messages using the most appropriate of those accounts for maximum effectiveness.
  • custom maps may be derived from mashing up sets of social media maps.
  • the social media maps may be constructed via clustering (e.g., relationship-based, manual, attentive, etc.) of data from at least one social media community targeted to a specific market segment.
  • the market segments may include government intelligence, public diplomacy, social media landscapes in other countries, pharmaceuticals, medical, health care, sports, parenting, consumer products, energy, and the like. In these embodiments, the market segment may be used to index the social media maps.
  • a reporting product may leverage social media maps to demonstrate the interaction of nodes and/or links between social media maps.
  • a multi-map report may be generated comparing the nodes and links in different social media communities in a particular market environment.
  • the reporting product may be integrated with a dashboard or analytics platform.
  • Multi-map reports generated by the reporting product may be used to demonstrate various phenomena, such as how particular items can be found in particular social media layers.
  • for example, a multi-map report may demonstrate how weblog hosts are having customers driven to them from TWITTER™.
  • a multi-map report may demonstrate how FACEBOOK™ pages are getting attention from a segment of TWITTER™.
  • information derived from the social media maps may be published or displayed as a map widget, which may enable monitoring an ongoing stream of information from one or more clusters or one or more maps.
  • Information being displayed that is derived from the social media map may be customizable within the widget, such as via a dialog box, menu item, or the like.
  • a user may be able, optionally in real time through a user interface, to select a stream of information based on looking at the environment, zoom in based on clustering, figure out a valid emergent segmentation, and then set up monitors to watch the flow of events, such as media objects, text, keywords/language, and the like, in real time.
  • the published, widgetized map acts as a sensor network to obtain a host of behavioral data and leads that can be leveraged by the map's users or hosts.
  • users may interact with other users' map widgets to discover content and individuals/entities.
  • Using other users' map widgets, users may grow their own networks by engaging with the content and people/entities in the widget, such as by starting to follow a person or retweeting an item.
  • a social media map may be automatically refreshed via calculating a relevance score for nodes or bundles in the map 1502 and reconstructing the map based on a relevance ranking revealed by the relevance score 1504.
  • Semantic/relevance marker bundles may include lists of semantic markers like keywords, phrases, relevant link targets, accounts that are followed on TWITTER™, and the like. Semantic markers may be manually curated. In an embodiment, the refresh process may involve performing the relevance search/semantic slice that generated the original map for new relevance/semantic markers. A relevance calculation may be performed on the nodes to calculate a relevance score.
  • a social media map may be automatically refreshed via positively or negatively weighting at least one cluster based on a CFI score calculation 1508 and reconstructing the map to modify the nodes in the clusters 1510. Modifying the nodes may be done to include positively weighted nodes and exclude negatively weighted nodes. CFI scores for clusters may be leveraged to evolve a map in a certain direction. Clusters in the map that include preferred/wanted nodes/links are positively weighted. Clusters are negatively weighted if they are deemed not to be relevant. Applying weightings to the map may enable pulling in additional nodes that are more relevant. Weighting map clusters for the CFI bias operation may be done by humans.
  • a social media map may be automatically refreshed via filtering out unwanted nodes 1512.
  • a social media map may be automatically refreshed via obligatorily including nodes that were not clustered in the original map 1514. Semantic markers that are known not to fit, based on their relevance ranking, or that are not allowed for some other reason, are filtered out.
  • nodes may be forced into the map whether or not they were identified in the relevance search/semantic slice. Curating black lists of nodes may be done by humans.
  • a social media map may be automatically refreshed via crowd-sourced information regarding nodes and/or links that drive nodes to bundles 1518.
  • a social media map may be automatically refreshed via processing social media map usage data for trends/indicators 1520. Usage data may relate to one or more of what is ignored, what is further explored, what is used, how clusters are grouped, what name/label is assigned to a cluster, what color is used for a cluster, what order/position the cluster is placed in within a report, and the like. Nodes preferentially interacted with may be weighted more heavily.
  • community feedback may influence each of the three streams of automated map refresh described herein.
  • Community feedback provides an indication of news, events, information, etc. that may drive the addition of nodes to the bundles, such as, for example, if a new website is a target link. This sort of feedback may provide guidance as to the CFI bias operation. For example, if feedback suggests that a cluster is relevant, then that cluster may be positively weighted.
  • Feedback and updating may be based on how people are using the maps, such as understanding what they ignore, what they drill down on, what they use, how they want to group things, what name/label they assign a cluster, what color they use for a cluster, which clusters are most important to a client based on the order/position the client places them in a report, and the like. Refreshing the maps may leverage this captured information.
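The refresh streams above (relevance scoring against a marker bundle, CFI-based cluster weighting, filtering out black-listed nodes, and forcing in white-listed nodes) can be combined in one sketch. This is a minimal illustration under stated assumptions, not the described method: relevance is assumed to be the count of bundle markers a node exhibits, cluster weights are assumed to act as multipliers (negative weights push a cluster's nodes out), and the black list is assumed to override the white list. All names are hypothetical.

```python
def refresh_map(candidates, markers, cluster_weight=None,
                blacklist=frozenset(), whitelist=frozenset(), top_n=100):
    """Rebuild a map's node list (hypothetical sketch).

    candidates: {node_id: (cluster_id, set of semantic markers seen for the node)}
    markers: the map's semantic/relevance marker bundle
    cluster_weight: assumed CFI-bias multipliers per cluster (default 1.0);
        a negative weight pushes that cluster's nodes out of the map
    """
    cluster_weight = cluster_weight or {}
    scores = {}
    for node, (cluster, node_markers) in candidates.items():
        if node in blacklist:
            continue  # curated black list always wins in this sketch
        relevance = len(node_markers & markers)  # simple marker-overlap relevance
        scores[node] = relevance * cluster_weight.get(cluster, 1.0)
    # keep positively scored nodes, highest first, ties broken by node id
    ranked = sorted((n for n, s in scores.items() if s > 0),
                    key=lambda n: (-scores[n], n))[:top_n]
    # obligatory inclusions, even if they were never clustered
    forced = sorted(n for n in whitelist if n not in ranked and n not in blacklist)
    return ranked + forced
```

Usage data and community feedback could feed the same function by adjusting `cluster_weight`, for example raising the weight of clusters users interact with most.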
  • feedback may be received passively from clickable/interactive maps via built-in feedback system.
  • This feedback system may be used as a naive weighting system.
  • the map may include a flag available to provide commentary or feedback.
  • a map may include raw clusters, human-made groupings, and the attachment of other sorts of metadata, such as the coloring of a cluster.
  • an example may be that of the Russian blogosphere, which may contain 40 clusters and 7-8 groups, including 5 right-wing Russian nationalist groups and a liberal opposition group.
  • Clusters may be processed by human-assigned re-aggregation, and metrics may be run against them to progressively refine the clusters. Different clients, even on a base map, may want to group things differently, name a cluster in an interface differently, color a cluster in an interface differently, and the like. Users need to be able to define groups, relabel clusters, select clusters, and the like. Community feedback may provide observations as to how users are grouping the same map, and that yields data about which clusters are related to each other that is crowd-sourced to the user. Users may define the order in which the data are presented in the reporting. For example, a user may want to place data on preferred clusters higher in a chart. Cluster ordering and positioning information is customizable, and can be harvested as an importance weighting by the community.
  • map users may contribute to map metadata to generate a community data set established and/or expanded by users. For example, users could input the gender of a Tweeter or blogger.
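Harvesting crowd-sourced grouping and ordering data, as described above, can be sketched with two small aggregations. This is one possible reading, not the described method: relatedness is assumed to be the number of users who placed a pair of clusters in the same group, and importance is assumed to be the average reciprocal rank of a cluster across users' report orderings. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_cooccurrence(user_groupings):
    """Count how many users placed each pair of clusters in the same group.

    user_groupings: one entry per user; each entry is a list of sets of cluster ids.
    """
    counts = Counter()
    for grouping in user_groupings:
        pairs = set()  # each user contributes at most once per pair
        for group in grouping:
            pairs.update(combinations(sorted(group), 2))
        counts.update(pairs)
    return counts


def importance_from_order(user_orders):
    """Average reciprocal rank of each cluster across users' report orderings."""
    totals = Counter()
    for order in user_orders:
        for rank, cluster in enumerate(order, start=1):
            totals[cluster] += 1.0 / rank  # earlier placement counts more
    n = len(user_orders)
    return {cluster: total / n for cluster, total in totals.items()}
```

Reciprocal rank is one simple choice that rewards clusters users place near the top of a chart; any monotone weighting of position would fit the same idea.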
  • the user community itself may be a segmentable population.
  • the user community can contribute to scoping a map for a particular topic. For example, something about a disease might appear in various places: consumer segments, politics, medical/science, sports, and the like. User feedback may also help scope the size of the map. For example, a user may ask: should the map be constructed on the first 5,000 targets, or should 20,000 targets be used? In an embodiment, user-contributed data may be used to provide metadata for a social media map constructed via clustering (e.g., relationship-based, manual, attentive, or the like) of data from at least one social media community.
  • data, including user-contributed data, may form a searchable, editable metadata and basic information repository for URLs 1602, such as to form a URLipedia.
  • the repository may be linked to one or more social media maps 1604.
  • clustering of data from at least one social media community may be used to generate an actionable targeting list.
  • Targeting lists combine network centrality 1704, issue relevance 1708, and CFI for a cluster 1710 into a ranked target list 1702 that may be used by marketers or other interested parties in order to reach certain nodes in some meaningful order for strategic communication or another business purpose. The formula of combination may be adjusted to optimize the ranking for client/user objectives.
  • network centrality may be a universal score related to how central a node is in the network.
  • for example, daytime talk show hosts may have a network centrality of 100 in the general population, while economists may be a zero.
  • a Cluster Focus Index score may be calculated for each cluster.
  • daytime talk show hosts may have a CFI of zero for economics, but economists may have a CFI of 100.
  • an issue relevance score may be calculated for each cluster.
  • for example, the issue relevance related to the budget deficit may be calculated based on a publication frequency score (e.g., number of tweets).
  • Other scoring techniques may be used to calculate issue relevance.
  • a user may be able to purchase ads or message placements on a target from the targeting list 1712.
  • users may be enabled to buy an ad placement or message placement on the target site at the click of a button. In an embodiment, the effect, or impact, of the ad/message placements may be tracked for the node and across a social media map.
  • the system may enable users to identify targets according to a ranked list based on network centrality, CFI, and issue relevance, and then place and track ads/messages on the targets from the lists.
  • targeting lists may be used in connection with any ad network for ad/message placement. Tracking ads/messages may involve receiving feedback on actions taken with respect to the ads/messages, calculating impact metrics, and the like.
  • a historical data browser may provide a mechanism for visualizing archived, historical social media map data, such as for research or historical purposes.
  • there may be value to academia in accumulating old social media maps and showing the delta between them, such as to explore how the market has evolved over some period of time.
  • Historical social media map data may also be useful for financial industry forensics and intelligence analysis.
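Because the text leaves the combination formula adjustable, the ranked target list can be sketched as a weighted sum whose weights are the tuning knobs. This is an assumption for illustration only; the real combination may differ. All scores are assumed to share a 0 to 100 scale, and the issue relevance values in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One potential target node; all scores assumed to be on a 0-100 scale."""
    name: str
    centrality: float       # network centrality in the general population
    issue_relevance: float  # e.g., derived from publication frequency on the issue
    cfi: float              # cluster focus index for the cluster being targeted


def target_score(c, w_centrality=1.0, w_relevance=1.0, w_cfi=1.0):
    """Hypothetical linear combination; the weights stand in for the adjustable formula."""
    return w_centrality * c.centrality + w_relevance * c.issue_relevance + w_cfi * c.cfi


def ranked_targets(candidates, **weights):
    """Highest combined score first; ties broken by name for a stable order."""
    return sorted(candidates, key=lambda c: (-target_score(c, **weights), c.name))
```

Using the talk-show-host and economist example above (centrality 100 vs 0, economics CFI 0 vs 100) with invented issue relevance of 10 and 60, equal weights rank the economist first (160 vs 110), while tripling the centrality weight ranks the talk show host first (310 vs 160).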
  • CFI metrics may be displayed on a social media map.
  • a CFI metric for items in clusters indicates how much attention there is to that item for that cluster.
  • An attention score indicates the relative attention to an item as compared to other items for a cluster over a range of time or at a "point" in time. A higher attention score means the item is more specific to the cluster. Attention scores are non-linear in the sense that anything below two is not significant, while scores above two are exponentially significant.
  • CFI scores may be a metric for measuring search engine optimisation and/or advertising effectiveness because they represent cluster specificity. CFI metrics would have to be combined with a more global metric to enable companies to shift from thinking at the execution/implementation layer (e.g., where do I advertise?) to the strategic layer (e.g., where are we going with this community?).
  • a CFI graph may include CFI scores for sources and nodes on the map. In the upper right of the map are clusters with high focus on the particular cluster, a high overall level of attention, and many in-links. On the CFI graph, users can see various items at a glance. For example, users may find the key players related to a topic or the landscape of players to determine who has influence.
  • a CFI graph may include a Cluster Map Properties Editor/User Interface.
  • the interface enables users to label clusters, assign clusters to a group, and perform group metrics.
  • Maps may be generated based on semantic elements, bundles, white lists, black lists, and the like in an automated fashion in some embodiments, but labeling the clusters in an automated way, such as when a map update is made, may be difficult.
  • Draft labels may be assigned when the cluster is created or updated based on a previous storehouse of knowledge. A confidence score for that labeling may be generated.
  • members of a cluster may be compared with the membership of clusters of past maps, and if a high percentage are the same, then it is assumed the clusters relate to the same thing and are labeled similarly. In another embodiment, automated labeling is based on structural equivalence.
  • Labeling a node or an object that has well-defined properties may be easier than labeling a cluster, which is a collection of objects.
  • Structural equivalence involves examining the node's outlinks. For example, if people are friends with the same people, then they may have similar interests. In another example, blogs that link to the same sets of things are likely to be similar. In yet another example, if there are two people who have superior relationships to twenty soldiers, chances are that the two people are sergeants or some other form of commander. While this may work at the node level, it is harder to do at the cluster level. CFI scores, which are already generated for clusters, may be used in the generation of labels.
  • CFI scores enable a comparison between two items or sets of items that a cluster may be disproportionately paying attention to.
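This section does not give the formula behind the attention score. As a stand-in, assuming "relative attention" means a lift ratio (an item's share of a cluster's attention divided by its share of attention across the whole map over the same time window), the significance cut-off of two from the text can be applied as below. This is not the patent's CFI definition, and the function names are hypothetical.

```python
def attention_scores(cluster_counts, map_counts):
    """Lift of each item for one cluster (assumed definition, not the patent's formula).

    cluster_counts / map_counts: {item: mentions or links} within one cluster and
    across the whole map, for the same time window.
    """
    c_total = sum(cluster_counts.values())
    m_total = sum(map_counts.values())
    scores = {}
    for item, c in cluster_counts.items():
        m = map_counts.get(item, 0)
        if c_total == 0 or m == 0:
            continue  # no baseline to compare against
        scores[item] = (c / c_total) / (m / m_total)
    return scores


def significant(scores, threshold=2.0):
    """Keep items above the significance cut-off of two described in the text."""
    return {item: s for item, s in scores.items() if s > threshold}
```

If a cluster's ten mentions split six for `horses` and four for `news`, while the whole map's hundred mentions split ten and ninety, `horses` scores about 6 (significant) and `news` about 0.44 (not significant).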
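Node-level structural equivalence, as described above, can be sketched by comparing outlink sets. This is a minimal illustration assuming Jaccard overlap as the similarity measure (the text does not name one) and a hypothetical minimum similarity before a label is borrowed from an already-labeled node. Function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two outlink sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_equivalent(node, outlinks):
    """Return (other_node, similarity) for the node whose outlinks best match `node`'s."""
    best, best_sim = None, -1.0
    for other, links in outlinks.items():
        if other == node:
            continue
        sim = jaccard(outlinks[node], links)
        if sim > best_sim:
            best, best_sim = other, sim
    return best, best_sim


def propose_label(node, outlinks, labels, min_similarity=0.5):
    """Borrow the label of the most structurally equivalent labeled node, if similar enough."""
    labeled = {n: links for n, links in outlinks.items() if n in labels or n == node}
    other, sim = most_equivalent(node, labeled)
    if other is not None and sim >= min_similarity:
        return labels[other]
    return None
```

Two blogs that link to mostly the same targets (for example `{x, y, z}` and `{x, y}`) score 2/3, so the unlabeled one inherits the other's label; a blog with no shared targets gets no proposal.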
  • Cluster 1 is very interested in horses and baseball.
  • Cluster 2 is very interested in horses and basketball.
  • vector cosine similarity can be used to determine the relationship between the two clusters.
  • vectors can be built based on the CFI scores calculated for each of the clusters for the same items (e.g., for Cluster 1: CFI1(1), CFI1(2), ...).
  • the vectors may be plotted in a 3D vector space.
  • the cosine of the angle between the two vectors may be one indication of the relationship between the two clusters. If the angle is small (i.e., the cosine is close to 1), the confidence is high.
  • clusters in the new map can be compared to clusters of old maps. When there is a match, that is, a small angle between two cluster vectors, the label from the cluster in the old map is assigned to the cluster in the new map.
  • the cosine of the angle may also act as a similarity score. There are a number of measures for vector distance, including correlation distance, cosine similarity, Euclidean distance, and the like.
  • the CFIs may be filtered to include only a CFI of two or more on a particular cluster. This effectively reduces the dimensionality of the space.
  • items that are similar may be aggregated in labeling. For example, using outlink bundles rather than an individual CFI score may enable grouping items into target clusters, and examining the density of links to the target cluster.
  • an advertising campaign planning tool can enable running a campaign on blogs and tracking success in other layers (e.g., TWITTER™, FACEBOOK™, segment-specific online forums).
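The label-transfer procedure described above (build a CFI vector per cluster, drop entries below two, compare new clusters to old ones by cosine similarity, and reuse the old label on a close match) can be sketched as follows. This is a minimal illustration: vectors are stored sparsely as `{item: CFI score}`, and the 0.8 similarity cut-off for a "small angle" is a hypothetical parameter, not a value from the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {item: CFI score}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def transfer_labels(new_clusters, old_clusters, old_labels, min_cfi=2.0, min_similarity=0.8):
    """Assign each new cluster the label of the most similar old cluster.

    min_cfi drops items below a CFI of two, reducing dimensionality;
    min_similarity (hypothetical) decides what counts as a small angle.
    Returns {new_cluster_id: (label, similarity)} for matched clusters only.
    """
    def keep(vec):
        return {k: s for k, s in vec.items() if s >= min_cfi}

    result = {}
    for new_id, new_vec in new_clusters.items():
        nv = keep(new_vec)
        best_id, best_sim = None, 0.0
        for old_id, old_vec in old_clusters.items():
            sim = cosine(nv, keep(old_vec))
            if sim > best_sim:
                best_id, best_sim = old_id, sim
        if best_id is not None and best_sim >= min_similarity:
            result[new_id] = (old_labels[best_id], best_sim)
    return result
```

Using the horses/baseball and horses/basketball clusters above as old clusters, a new cluster with high CFI for horses and baseball (and a low, filtered-out CFI for golf) matches the first at a cosine near 0.997 and inherits its label; a new cluster focused only on an unrelated item gets no label and would need human review.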
  • URL shorteners included in social media content may be tracked.
  • the system may provide reporting outputs that track the success of a social media campaign including a URL shortener in different layers of the social media system.
  • the system may not only be used to plan the campaign, but may also be used to report on the TWITTER™ bounce from blog activity or the FACEBOOK™ bounce from blog activity, for example.
  • the system may enable campaign planning (e.g., domestic, international, multi-platform, multi-network, etc.) where language is not a required first limitation.
  • the system may enable campaign planning in marketing, such as for consumer goods, media and entertainment, movie marketing, video games, social games, music, international product launches, talent agencies, public diplomacy, public health, political campaigns, and the like.
  • Campaigns may be tracked, such as with a chronotope analysis, as will be further described herein, to determine a pattern that exists in time and space, determined by combining temporal and network features in the analysis of the segments/clusters.
  • the system may marry internal reporting with other reporting tools, such as splash, resonance, clicks, transactions, and the like.
  • the system enables analysis and prediction, such as in the financial industry (e.g., market predictions and trading positions), social media firms whose value is built around prediction, and the like.
  • third-party data and clusters may be used with the mapping techniques described herein.
  • models may be built on one or more clusters using tools that can be accessed across clusters.
  • a social media map and network segmentation may be constructed via clustering of data from a single user's social media community.
  • In FIG. 23, a user flow for becoming a user and interacting with a map is depicted. Starting from logical block 2300, processing flow proceeds to a login screen at logical block 2302, where users may log in, such as via a social media authorization. If the user is a new user, the user is sent to a sign-up.
  • processing flow proceeds to logical block 2308 to check a wait-list status. If the user is a beta user, processing flow proceeds to logical block 2310, where it is determined whether the login is a first login. If so, processing flow proceeds to logical block 2312, where a tour may be taken.
  • processing flow may proceed to logical block 2318, where a map overview is presented, including a competitive overview, a text description, a cluster power, and the like. If the user is not a beta user, processing flow may proceed to logical block 2314, where the delta since the last visit is presented, including new followers, recent activity with map indicators, and the like. Processing flow may then proceed to logical block 2318. From logical block 2318, processing flow may proceed back to logical block 231 if recent activity is requested again.
  • processing flow may proceed to logical block 2320 to obtain a cluster overview, including local competitive performance, influencers, conversation, images, videos, recent tweets, and the like. If the user chooses to delve into the entire interactive map, processing flow may proceed to logical block 2322 for cluster map navigation.
  • Processing flow may alternatively proceed from logical block 2320 to logical block 2324, where the user may take action. In an alternative embodiment, processing flow may first proceed to logical block 2328, where the user may view full lists, and then to logical block 2324, where only actions that are relevant to the list being reviewed are displayed. From logical block 2324, the user may choose to build a network, save one or more clusters as a list, move a message, engage with content, or the like. If choosing to build a network, processing flow may proceed to logical block 2330, where the user is prompted to make a list of influencers.
  • From there, user details may be entered at logical block 2332, and then actions such as engaging one of the users (logical block 2334) or following (logical block 2338) may be taken. From logical block 2330, a follow list may be generated at logical block 2340, or the current view may be saved as a Twitter™ list or some other social media list at logical block 2342.
  • processing flow may proceed to save the current view as a Twitter™ list or some other social media list at logical block 2342. If the move-message action is selected, a list of followers may be made at logical block 2344, and from there the current view may be saved as a Twitter™ list or some other social media list at logical block 2342, or a message, which may include content and context, may be composed at logical block 2348. If engage with content is chosen at logical block 2324, processing flow may proceed to logical block 2358, where a list of content, such as URLs, key content, and media, may be made. Users may choose to screen content details at logical block 2332, after which processing flow may proceed to logical block 2360, where a new tweet is generated; to logical block 2358, where a retweet is generated; or to logical block 2354, where tweets by influencers who tweeted the content are found and then potentially retweeted at logical block 2358.
  • clustering techniques may need to be modified.
  • some set of nodes pays attention to some set of targets, and the nodes get clustered based on the targets they pay attention to.
  • a very large number of nodes pay attention to a very large number of targets.
  • the number of operations scales at least polynomially (e.g., with the cube of the number of nodes). For example, for 10,000 nodes the number of operations is in the billions.
  • computing power may need to be augmented.
  • attentive gravity may be used to scale up the size of the social media maps.
  • Nodes pay attention to targets (the input data). An object may be created in which nodes are not discretely assigned to a cluster but are instead drawn to different poles, such as ideological, thematic, or topical poles. Depending on which targets a node pays attention to, it can be drawn to one pole, another pole, or the middle.
  • an attentive gravity map may have poles where the nodes are distributed based on how close they are to each pole.
  • a node may have a set of scores, which represent a gravitational coefficient for each of the poles of gravity.
  • the gravitational coefficient may be used with other visualizations in order to modify the size, color, or opacity of the cluster representation based on the attentive gravity toward a pole.
  • the gravitational coefficient may simply be used as a metric on the cluster map previously described herein. The gravitational coefficient provides the degree to which a node matches a segmentation (e.g., a sports weight and a parenting weight for the same node, rather than just sorting the nodes into different clusters/segmentations and throwing out the relationship to other clusters or segmentations).
  • Clusters themselves may not be definitive. For example, a node might not be in just one cluster. Such characteristics may be reflected in mapping technologies.
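One way to make the gravitational coefficient concrete is sketched below in Python. It assumes, purely for illustration, that each pole is defined by a hypothetical set of targets and that a node's coefficient for a pole is the share of its attended targets belonging to that pole; the section does not specify the actual formula, and `gravity_coefficients` is a hypothetical name.

```python
def gravity_coefficients(attended_targets, poles):
    """Hypothetical per-pole score: share of a node's attended targets in each pole.

    attended_targets: iterable of targets the node pays attention to.
    poles: dict mapping a pole name to the set of targets assumed to define it.
    """
    attended = set(attended_targets)
    if not attended:
        # A node that attends to nothing is not pulled toward any pole.
        return {pole: 0.0 for pole in poles}
    return {pole: len(attended & targets) / len(attended)
            for pole, targets in poles.items()}
```

A node attending to both sports and parenting targets thus keeps a weight for each pole instead of being forced into a single cluster.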
  • One technique may be a Discrimination Function.
  • 1,000,000 nodes may be clustered.
  • An initial condition may be a seed attentive clustering for a small number of nodes, such as 10,000.
  • the centroids of the clusters (the X, Y average of the dots) are used to assign the other nodes to clusters. For example, it can be determined whether a new node is closer to the centroid of one cluster or of another.
  • this technique applies to nodes 10,001 through 1,000,000.
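The seed-then-assign Discrimination Function described above can be sketched as follows. This is a minimal sketch that assumes every remaining node already has an (x, y) position on the map (for example, from a layout step, which the section does not describe) and uses plain Euclidean distance to the seed-cluster centroids; `cluster_centroids` and `assign_to_nearest_centroid` are hypothetical names.

```python
import math


def cluster_centroids(seed_positions, seed_labels):
    """Average (x, y) map position of the seed nodes in each cluster."""
    totals = {}
    for (x, y), label in zip(seed_positions, seed_labels):
        sx, sy, count = totals.get(label, (0.0, 0.0, 0))
        totals[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / count, sy / count)
            for label, (sx, sy, count) in totals.items()}


def assign_to_nearest_centroid(positions, centroids):
    """Give each remaining node the label of its closest cluster centroid."""
    return [min(centroids, key=lambda label: math.dist(p, centroids[label]))
            for p in positions]
```

Because each remaining node needs only one distance per cluster, this assignment step grows linearly with the number of nodes rather than polynomially.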
  • Another technique may be to iteratively cluster the 1,000,000 nodes in batches of 10,000. Then, the CFI scores of those clusters may be used to cluster like clusters with each other. The clusters may be combined at a meta-cluster level. To make that work well, how similar some clusters are may need to be tracked across large groups of sub-clusters to see which ones are idiosyncratic and should stand alone versus ones that are somewhat consistent and should be joined.
  • It may be desired to reduce the scale of the map to just those actors connected at a mesoscale while eliminating actors who are not really active members of the network and are just "star" followers.
  • An Influence Network Discovery method may be used to reduce very large networks to their most influential core communities and obtain a sub-graph of maximally connected sub-actors.
  • a variable, K_core, may be assigned to each member of the network, where K_core relates to a minimum connectedness, or the number of other nodes in the network to which the individual is connected (e.g., a known measure of connectedness in networks).
  • one use of the K_core value is to restrict the network by K_core value.
  • a network may be restricted to only those with a K_core of five and up, that is, only those people connected to at least five other people.
  • Another way to reduce the network is iteratively.
  • a network of people surrounding the Democratic Party may be reduced iteratively.
  • inactive members and members with few followers may be eliminated.
  • certain network members, such as public figures or those who have a lot of followers, may be removed temporarily from the network and reserved in a "keep" set.
  • the remaining network may be examined and refined by K_core.
  • members of the network with a K_core of one are removed from the network.
  • Removal of these people from the network may change the K_core values for the remaining members of the network.
  • the process iterates, removing those network members with the lowest K_core values.
  • the process can iterate until a specified number of network members is obtained.
  • at this point, any members in the keep set may be added back to the network.
  • a K_core calculation of the keep-set members may be done and limited to the node threshold. Based on the follow patterns of the members retained in the map, they may be assigned to a cluster.
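The iterative reduction with a keep set can be sketched as below. The sketch treats connectedness as a member's degree in the current undirected network, removes every member tied for the lowest degree on each pass, and stops once at most `target_size` non-keep members remain; it omits the preliminary removal of inactive and low-follower members and the final cluster assignment. `reduce_network` and its parameters are hypothetical.

```python
def reduce_network(adjacency, keep, target_size):
    """Iteratively strip the least-connected members, then restore the keep set.

    adjacency: dict mapping each member to the set of members it is connected
        to (assumed undirected, i.e., symmetric).
    keep: set of members (e.g., public figures) set aside before pruning.
    target_size: stop once at most this many non-keep members remain.
    """
    # Temporarily remove the keep set, including their edges to everyone else.
    core = {node: set(nbrs) - keep
            for node, nbrs in adjacency.items() if node not in keep}
    while len(core) > target_size:
        # Connectedness here is simply the member's degree in the current network.
        lowest = min(len(nbrs) for nbrs in core.values())
        dropped = {node for node, nbrs in core.items() if len(nbrs) == lowest}
        # Removing members lowers the degree of their remaining neighbors.
        core = {node: nbrs - dropped
                for node, nbrs in core.items() if node not in dropped}
    # Add the keep set back once the core has been reduced.
    return set(core) | (keep & set(adjacency))
```

Repeatedly stripping the lowest-degree members is the same peeling procedure used to compute k-cores, which is why removals cascade: dropping one member can lower the connectedness of its neighbors on the next pass.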
  • a delta report may be provided to examine the evolution of a cluster map and capture the most salient point of change in the last interval.
  • the delta report may identify which clusters have grown, which sites are being targeted more by clusters now than before, which topics are being discussed more now than before, which clusters are more active than before, and the like.
  • the delta report may be provided on a periodic basis, such as weekly, monthly, and the like.
  • Generating the delta report may involve reporting which CFI scores changed the most and which clusters are more active than before.
  • Delta reports may be enabled by organization into a self-updating database with time snapshots. A delta report may be useful in customizing a stream of content. For example, a stream of new objects of interest for clusters in the map can be provided as a delta report and fed to a user.
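A minimal sketch of the "which CFI scores changed the most" step of a delta report, assuming each time snapshot is stored as a dictionary keyed by hypothetical (cluster, object) pairs and that a pair absent from a snapshot scores zero; `largest_cfi_changes` is a hypothetical name.

```python
def largest_cfi_changes(previous, current, top_n=3):
    """Rank (cluster, object) pairs by how much their CFI score moved.

    previous, current: dicts mapping (cluster, object) to a CFI score in two
    snapshots; a pair missing from a snapshot is treated as scoring 0.
    """
    keys = set(previous) | set(current)
    deltas = {k: current.get(k, 0.0) - previous.get(k, 0.0) for k in keys}
    # Largest absolute movement first, whether the score rose or fell.
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
```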
  • a self-service tool may be designed to let users access the system and initiate generation of a social media map.
  • a user may log in to the system or, in embodiments, to a social network or other third-party website in order to initiate the map creation process.
  • a bot may be spawned that harvests data and maps the data to clusters. The bot may further provide cluster labels and CFI scores.
  • the output may be a social media map data object with CFI scores.
  • the self-service tool may enable user browsing of clusters and the map, tagging nodes, grouping and labeling clusters, and the like. In an embodiment, a machine learning labeler may suggest cluster labels.
  • the user-generated labels may be fed into the machine learning facility used to label clusters for the social media maps.
  • the focus of the self-service tool may be on actions that strategically build a user's network and strategically message components of the network. CFIs can be used to determine a similarity among maps so that an existing social media map that is similar to the self-service map may be recommended for review.
  • Social media maps may be used to enable users to strategically message components of their network.
  • a social media map may be created for the Twitter™ followers of a live entertainment company.
  • Certain clusters relate to dense communities around particular stars or particular genres of music. For the live entertainment company, there are relatively few messages that it transmits that everyone in the map cares about; however, using social media maps, clustering enables more discrete message targeting.
  • CFI scores may be used to limit the messaging in order to maximize the impact on the cluster.
  • Such discrete targeting may be particularly useful in the case where direct messaging to followers may be limited.
  • Social media maps may be used to enable users to strategically build their network.
  • the country music cluster may be growing in size.
  • the social media map may be used to identify niche influential nodes for the country music cluster, such as by using segment CFI data to maximize connections with targeted segments/key influencers. Then, the user can start following those influential nodes in the hope that they will follow back. Such a process may help build the network in a desired strategic direction. Users may be able to see how they are doing against competitors for any given segment by examining the proportion of influencers (high-CFI targets), who may or may not be in the map, following them versus others.
  • social media maps may be organized and navigated as a map of maps, where each map appears as a node on a larger map.
  • the strength of the connection between maps is the maximum of the ratios of how many nodes are in one map versus another map.
  • an indication may be given when a cluster in one map is very similar to another cluster in another map that may or may not be accessible by the user. For example, if one map relates to diabetes and another relates to obesity, a common cluster may be groups actively modifying lifestyles to avoid both pathologies. In embodiments, the system may provide an interface from the search screen with which the user may purchase the map they do not currently have access to.
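The map-of-maps connection strength can be read several ways. The sketch below assumes, as one interpretation only, that each ratio is the number of nodes the two maps share divided by the size of one of the maps, and takes the larger of the two ratios; `map_connection_strength` is a hypothetical name.

```python
def map_connection_strength(nodes_a, nodes_b):
    """Hypothetical map-of-maps edge weight from node overlap.

    Takes the larger of (shared / size of map A) and (shared / size of map B).
    """
    a, b = set(nodes_a), set(nodes_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return max(shared / len(a), shared / len(b))
```

Taking the maximum means a small, specialized map that sits almost entirely inside a larger map is still shown as strongly connected to it.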
  • user segmentation may be used to find segments for targeting as customers.
  • Maps may be automatically generated for the target customer, and conversion rates to paying customers may be tracked.
  • Described herein is a system for examining social media phenomena, such as hashtags, and how they spread in a network.
  • Patterns of spreading may include salience, commitment, or a combination thereof termed resonant salience, where there is a burst of activity followed by a sustained commitment, or resonance, pattern.
  • chronotopes, i.e., patterns that exist in time and space.
  • a timeline view may be used to examine messages across clusters. The timeline may include the chronotope as the drill-down.
  • a primary timeline may be organized in rows by grouping of clusters (e.g., similar clusters are assigned together into a group). There may be several bands for groups (e.g., things for which there is a CFI score).
  • the timeline may be examined for objects of interest that have very high CFI scores at some point.
  • One example may be a hashtag in a Twitter network.
  • a dot may be placed at the point in time when the activity (attention) peaked (had the most citations, re-tweets, etc.) for that object of interest.
  • a dot may be placed in the macro timeline for the group (showing the peak points of all objects of interest) where the peaks were for each group (a group corresponds to a band below the main timeline).
  • the chronotope for that object of interest may appear in a window below the timeline.
  • the timeline view may include time on the X axis and groups/clusters on the Y axis. Peak interest points for objects may appear as dots at points in time corresponding to the groups that have interest. Clicking on that object reveals the chronotope for that object for all of those groups.
  • interacting with data in the chronotope view may reveal what the object of interest is.
  • a group of items may be selected at a time period for a certain cluster/group, and a word cloud or semantic analysis of proper nouns that appear in those items may be assembled.
  • Social media sites enable users to engage in the spread of contagious phenomena: everything from information and rumor to social movements and virally marketed products.
  • Twitter™ has been observed to function as a platform for political discourse, allowing political movements to spread their message and engage supporters, and also as a platform for information diffusion, allowing everyone from mass media to citizens to reach a wide audience with a critical piece of news.
  • Described herein is a system for classifying contagious phenomena based on the properties of their propagation dynamics, by combining temporal and network features.
  • Methods and systems described herein are designed to explore the propagation of contagious hashtags in two dimensions: their dynamics, that is, the properties of the time series of the contagious phenomena, and their dispersion, that is, the distribution of the contagious phenomena across communities within a population of interest.
  • Further described is a method for simultaneously visualizing both the dynamics and dispersion of particular contagious phenomena. Using this method, particular contagious phenomenon chronotopes, or persistent patterns across time and network structure, may help a taxonomy for contagious phenomena in general to emerge.
  • measures for characterizing contagious phenomena propagating on networks may include peakedness, commitment (such as by subsequent uses and time range), and dispersion (including normalized concentration and cohesion).
  • a peak may be defined as a day-long period where total first mentions by day lie two standard deviations above the median first mentions. The specific duration of the peak window and the required deviation can be varied to maximize usefulness for particular kinds of phenomena and for particular social media networks.
  • Median may be used instead of mean because, due to the skewed distribution of first mentions by day for most contagious phenomena, the mean is over-inflated.
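The peak rule above can be sketched in Python as follows. It is a minimal sketch that assumes a list of daily first-mention counts and uses the population standard deviation from the standard `statistics` module (the section does not say which variant); `num_sd` is a hypothetical parameter standing in for the adjustable deviation requirement.

```python
import statistics


def peak_days(first_mentions_by_day, num_sd=2.0):
    """Indices of days whose first mentions exceed median + num_sd standard deviations.

    The median, not the mean, is the baseline because daily first mentions
    are usually heavily skewed.
    """
    median = statistics.median(first_mentions_by_day)
    sd = statistics.pstdev(first_mentions_by_day)  # population SD: an assumption
    threshold = median + num_sd * sd
    return [day for day, count in enumerate(first_mentions_by_day) if count > threshold]
```

With a flat series the standard deviation is zero and no day strictly exceeds the median, so no peak is reported.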
  • Contagious phenomena with short lifespans tend to have a sharp peak, when a large number of people mention the phenomenon, but the number of mentions is very small on either side of the peak. In contrast, long-lifespan contagious phenomena tend to grow slowly, with a less pronounced peak of mentions.
  • the peakedness of a contagious phenomenon is the fraction of all engagements with that phenomenon that occur on the day with the most engagements with that phenomenon.
  • a high peakedness means that most of the network's engagement with the phenomenon (e.g., for a social network, people in the network mentioning it) occurs within a short span of time, typically, hours to days.
  • a low peakedness means that the network's engagement with the phenomenon is spread over a long period of time, typically, weeks to months.
  • Phenomena with high peakedness, such as news stories, may propagate rapidly through the network and then dissipate just as rapidly in the course of the daily news cycle.
  • Phenomena with low peakedness may include popular websites and videos, which may maintain a slow but steady rate of engagement: individuals in the network are constantly discovering these phenomena, even as others get tired of them and stop engaging.
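Peakedness, as defined above, reduces to a single ratio. A minimal sketch, assuming engagements have already been bucketed into per-day counts:

```python
def peakedness(engagements_by_day):
    """Fraction of all engagements that fall on the single busiest day."""
    total = sum(engagements_by_day)
    # Guard against a phenomenon with no engagements at all.
    return max(engagements_by_day) / total if total else 0.0
```

A news story concentrated on one day scores close to 1, while a steadily discovered video spread over months scores close to 0.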
  • Commitment is the measure of the average scope of engagement with a particular contagious phenomenon by nodes in the network, or the staying power of a phenomenon.
  • the commitment with a particular piece of online content can be the average scope of mentions of that content by pieces of the network. This measure would, for example, differentiate between a political movement that is just a fad and one that accumulates a number of diehard supporters who keep the movement alive.
  • Scope may be measured in at least two ways, which leads to the following two sub-measures: Commitment by Subsequent Uses and Commitment by Time Range.
  • the cost in terms of time and effort to mention something for the second or third or tenth time is relatively small; therefore, for a second dimension, two quantities may be defined: first, the average number of subsequent mentions (all mentions excluding the first mention of the phenomenon by a user) of a contagious phenomenon among the adopting users; and second, the average time difference (in days) between the first and last mention of the phenomenon among the adopting users.
  • the second measure indicates long-term commitment to mentioning the phenomenon by a set of users.
  • Commitment by Subsequent Uses is the average number of subsequent engagements with a phenomenon after a node's first engagement. For instance, if each person in a social network played an online game at most once, Commitment by Subsequent Uses for that game would be zero. In contrast, if just one percent of the people in a social network played an online game thirty times each, Commitment by Subsequent Uses for that game would be twenty-nine. Phenomena with high Commitment by Subsequent Uses may include online games, which encourage repeat engagements. Other phenomena with high Commitment by Subsequent Uses may include astroturfed content, where a third party may encourage repeated interest in the content by paying or otherwise endorsing people who engage with it.
  • Commitment by Time Range is the average time period between the first and last engagement with a phenomenon by nodes in the network, measured over some large time window (e.g., a year). For example, if each person in a social network read articles on a blog ten times over the course of one day and never visited it again, Commitment by Time Range for that blog would be one day. However, if just one percent of the people in a social network read articles on a blog once every week for ten weeks and then abandoned it, Commitment by Time Range for that blog would be ten weeks. Phenomena with high Commitment by Time Range include blogs with loyal followers who keep coming back for more content. Phenomena with low Commitment by Time Range include news stories that, on average, a person reads only once and never sees again.
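Both commitment sub-measures can be computed from a log of mentions. This minimal sketch assumes each mention is a hypothetical (user, day) pair with the day as a number, and averages over adopting users only, which matches the online-game example above (one adopter playing thirty times yields twenty-nine).

```python
from collections import defaultdict


def commitment(mentions):
    """Return (Commitment by Subsequent Uses, Commitment by Time Range).

    mentions: iterable of (user, day) pairs. Both averages are taken over
    adopting users only, i.e., users who mentioned the phenomenon at least once.
    """
    days_by_user = defaultdict(list)
    for user, day in mentions:
        days_by_user[user].append(day)
    if not days_by_user:
        return 0.0, 0.0
    n = len(days_by_user)
    # Every mention after a user's first one counts as a subsequent use.
    subsequent_uses = sum(len(days) - 1 for days in days_by_user.values()) / n
    # Span in days between each user's first and last mention.
    time_range = sum(max(days) - min(days) for days in days_by_user.values()) / n
    return subsequent_uses, time_range
```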
  • Dispersion is a measure of the distribution of engagements with a contagious phenomenon over the network through which it propagates. Phenomena that are highly dispersed are broadly popular but may have less focused engagement from a particular group; phenomena that are not dispersed are not broadly popular, but may have focused engagement with a particular group. There are many ways of measuring the distribution of engagements with a phenomenon over a network, including the following two sub-measures: Normalized Concentration and Cohesion.
  • the Normalized Concentration of a contagious phenomenon presupposes a partition of the underlying network into discrete clusters, which usually represent communities. Given such a partition, the Normalized Concentration of a contagious phenomenon is the fraction of all engagements that come from the cluster that engages most with the phenomenon, or the Majority Cluster. For instance, if a social network were divided into two clusters, one of which engaged with a particular news story nine times and the other only once, the Normalized Concentration for that phenomenon would be 0.9.
  • Phenomena with high Normalized Concentration tend to be the cause célèbre of a particular community, e.g., political and social movements that have not gained wide traction.
  • Phenomena with low Normalized Concentration may include headline news stories that touch many communities at once. Depending on the size of individual communities, Concentration may or may not correlate inversely with popularity.
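Given a cluster label for each engagement, Normalized Concentration is the Majority Cluster's share. A minimal sketch, assuming the partition of the network into clusters has already been computed:

```python
from collections import Counter


def normalized_concentration(engaging_clusters):
    """Share of engagements that come from the Majority Cluster.

    engaging_clusters: one cluster label per engagement with the phenomenon.
    """
    counts = Counter(engaging_clusters)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0
```

The two-cluster news-story example above (nine engagements from one cluster, one from the other) gives 0.9.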
  • a measure of Cohesion may be defined as the network density over the subgraph on all users engaged in a particular contagious phenomenon. Contagious phenomena that spread over strongly connected sets of users will have a Cohesion close to one, whereas phenomena that spread over weakly connected sets of users will have a Cohesion close to zero.
  • the Cohesion of a contagious phenomenon is the network density of the sub-graph of all nodes engaging with the phenomenon.
  • the network density of a graph is the total number of actual connections between nodes in the graph divided by the total possible number of connections (usually n(n-1)/2 for undirected graphs, where n is the number of nodes in the graph).
  • Phenomena with low Cohesion include news and rumors that move between acquaintances, such that, for example, after multiple propagations, the person who hears the rumor and the person who started it may be total strangers.
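Cohesion is the density of the subgraph induced on the engaged nodes, using the n(n-1)/2 denominator for undirected graphs given above. A minimal sketch, assuming the network is supplied as a list of undirected (u, v) edges:

```python
def cohesion(edges, engaged_nodes):
    """Density of the subgraph induced on the nodes engaging with a phenomenon.

    edges: iterable of (u, v) pairs describing an undirected network.
    engaged_nodes: nodes that engaged with the phenomenon.
    """
    engaged = set(engaged_nodes)
    n = len(engaged)
    if n < 2:
        return 0.0
    # Count each undirected edge once, and only if both endpoints engaged.
    inside = {frozenset((u, v)) for u, v in edges
              if u != v and u in engaged and v in engaged}
    return len(inside) / (n * (n - 1) / 2)
```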
  • phenomena with high Peakedness tend to have low Commitment, making those two measures a natural pair for comparing different online phenomena.
  • FIG. 18 depicts Commitment by Time Range on the Y axis and Peakedness on the X axis for two different sets of data depicted by different icons.
  • the two datasets are: 1) 112 bundled hashtags relating to specific topics, shown in red or as icon #1; and 2) a baseline dataset of the top 500 hashtags for all users, shown in black or as icon #2.
  • the bundled hashtags display a generally lower level of Commitment by Time Range than the top 500 hashtags at the same level of Peakedness.
  • some of the top 500 hashtags have extreme levels of Commitment, up to 150 days.
  • Hashtags with the highest levels of Commitment are of several sorts, which notably include regional-location tags, tags for particular sports, religion tags (e.g., "Catholic," "Jewish"), tags for particular news outlets, and general tags related to investing and financial markets. Intuitively, all of these are topics that might engage a stable set of users over a long time.
  • hashtags with low Peakedness and low Commitment by Subsequent Uses are generally not very popular. Some of them are very generic (#moscow, #metro), and some just never had a peak nor became adopted by a committed user base. Some of these are tags that are similar to popular tags but reflect less-used variations.
  • hashtags with low Peakedness and high Commitment by Subsequent Uses are all regional hashtags (with the exception of the Nashi hashtag, which refers to a pro-government political youth movement in Russia). These regional hashtags were tangentially related to the forest fire events, but their main use is likely in talking about local affairs, hence the high commitment of a few users.
  • #Putinoirt in particular has relatively long temporal staying power (an average of 50 days between first and last mention by a user in the dataset) but relatively short staying power by mentions (an average of fewer than six subsequent mentions).
  • measures of dispersion of hashtags are analyzed across a core set of Twitter™ users.
  • the distribution across nine topics of Normalized Concentration is plotted by hashtag within each topic. Comparing across all nine topics enables distinctive patterns to emerge; the minimum Concentration among pro-government hashtags in the Seliger and modernization topics is between 0.3 and 0.4.
  • the maximum Concentration among opposition hashtags in the Kashin and Russian Drivers' Movement topics is between 0.4 and 0.5.
  • Pro-government hashtags are on the whole more concentrated within one cluster than opposition hashtags.
  • Hashtags related to news events tend to be diffuse, which is in line with the intuition that major news events tend to engage the population as a whole rather than specific communities.
  • In FIG. 1, the distribution across nine topics of Cohesion is plotted by hashtag within each topic. For ease of visualizing, the distribution plots are cut off at 0.2, and all hashtags with Cohesion >0.2 are assigned a value of 0.2.
  • FIGS. 18 through 21 provide a high-level analysis of hashtag diffusion among the Russian-speaking Twitter™ community, both from the temporal and the spatial (network) perspective. However, this analysis necessarily leaves out the idiosyncrasies of individual hashtags.
  • In FIG. 22a, FIG. 22b, and FIG. 22c, chronotopes of the #metro29 (a), #samara (b), and #iRu (c) hashtags are depicted. In typical chronotope images, color indicates cluster group, and color brightness indicates volume of engagements.
  • FIG. 22 shows three such visualizations: the #metro29 hashtag related to the Moscow Metro bombings on Mar. 29, 2010; the #samara hashtag related to the Russian city of Samara; and the #iRu hashtag related to President Dmitri Medvedev's policy of modernizing Russia.
  • These three visualizations display three distinctive patterns across space and time: #metro29 in FIG. 22a has a "salience" chronotope, with engagements across the spectrum of cluster groups during the week around March 29. In contrast, #samara in FIG. 22b has a "resonance" chronotope, with consistent engagements from the local cluster group, presumably residents of Samara talking about their city.
  • #iRu in FIG. 22c has a "resonant salience" chronotope, with an initial cross-group burst of activity in late November 2010 (around the time of Medvedev's announcement of his new policies), followed by consistent engagements from the Pro-Government cluster group over the next month.
  • FIG. 22 does not contradict FIG. 1%, which suggests that pro-government hashtags have low staying power, but instead presents a more subtle picture: the cluster group of pro-government users remains active in the #iRu hashtag over the course of a month, but, as FIG. 1b indicates, individuals within that cluster rarely carry on with adoptions for more than 5 days. There may be a high turnover of users of the #iRu hashtag, with new enthusiasts coming in even as the original adopters lose interest in the topic.
  • a flexible algorithm may be used for optimizing a targeted network influence campaign. For example, a user may have a high CFI score but may not message their social networks frequently; thus, targeting these individuals may not optimize the campaign.
  • the algorithm may output an M score, which may be calculated from a CFI score plus some other network or behavioral metric.
  • the score may instead be used to maximize campaign effectiveness.
  • the M score may be an interpolation of the number of followers of the target item (influence) and the CFI score of the target item (specificity). This mathematical calculation may result in a normalized score or a scale, such as a scale from 1 to 10, where 1 is low impact and 10 is high impact.
  • the M score is a general measure of influence and specificity.
  • the M score may be user-tunable, so that there is a choice to prioritize "segment specificity" vs. "global footprint" and/or "network position" vs. "behavioral profile" (e.g., someone who retweets frequently) when selecting behavioral and/or network metrics to calculate the M score.
  • a slider 2902 may be provided to users so that they can select a target that is more niche or more global.
  • the M score enables optimizing a campaign on network position or on behavior. If the slider is dragged towards "niche," alpha approaches zero and the M score is nearly equivalent to just the CFI score of the target item (high specificity). If the slider is dragged towards "broad," alpha approaches 1 so that the M score is nearly equivalent to just the number of followers of the target item (high influence). Setting the slider somewhere in between "niche" and "broad" allows users to tune the set of individuals/entities that they want to target.
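The slider behavior above can be sketched as a linear interpolation controlled by alpha. The normalization is not specified in this section, so the sketch assumes min-max normalization of follower counts and CFI scores across the candidate targets and a linear mapping onto the 1-to-10 scale; `m_scores` is a hypothetical name.

```python
def m_scores(targets, alpha):
    """Blend reach and specificity into a 1-10 M score per target.

    targets: dict mapping a target name to (follower_count, cfi_score).
    alpha: slider position in [0, 1]; 0 is "niche" (CFI only), 1 is "broad"
        (followers only). Min-max normalization and the 1-10 rescaling are
        illustrative assumptions.
    """
    def min_max(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    names = list(targets)
    followers = min_max([targets[name][0] for name in names])
    cfi = min_max([targets[name][1] for name in names])
    # alpha weights influence (followers); 1 - alpha weights specificity (CFI).
    return {name: 1 + 9 * (alpha * f + (1 - alpha) * c)
            for name, f, c in zip(names, followers, cfi)}
```

Follower counts are typically heavily skewed, so a log transform before normalization may be worth considering; that choice goes beyond what the section specifies.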
  • direct ad placement may be enabled by CFI scores/M scores.
  • using CFI scores and/or M scores, a list of targets/websites may be created, and ads may be placed directly on the targets/websites via integration with various products, such as Twitter™ sponsored tweets, Facebook™ ad exchange, Google™ AdSense/AdWords, third-party online ad networks, and the like.
  • a recent activity page of a social media map platform provides recent activity, such as new followers, new influencers following the user, an indication of any retweets including the number of people who have retweeted an item, changes to the user's cluster groups with links to respective group overview screens, a list of new influencers including their cluster group and their number of followers, the current conversation leaders including their cluster group and their number of followers, a view of all media being shared in the network including the latest influential media and the segments in which the media is influential, links to an overview page, links to a lists page, links to a help and support page, and the like. The user may continue to their map from this screen.
  • Graphics, such as a bar graph, may be included in the changes to the user cluster groups box to indicate the number of users in each cluster group. Graphics, such as a bubble chart, may also be included in the media box to indicate the size of the segments in which the displayed latest media is influential.
  • FIG. 25 another example of a recent activit page of a social media map platform is shown,
  • new followers- are shown; .included in the number of followers are new iniiuencers and group changes, including a percent change for each cluster group, information on ne mfiuencers, such as their name, handle, number of tweets, number of followers, number of people they -are following, and a button to message, thera or follow them.
  • trending terms/URLs are also shown on this page.
  • the overview page includes a table listing the cluster groups, the number of members in each group, the power of each cluster, and its tweet activity.
  • a power score is an indication of which segment is worth engaging with and may be an indication of which segments are most dense and represent the greatest signal of interest.
  • power may be calculated based on network density: the number of connections divided by the number of possible connections.
  • power may also be calculated based on coordinates, such as the average distance from the center of a cluster map.
  • power may be calculated as the average distance from the centroid of the cluster that emerges in the clustering computation.
  • power is, in effect, the segment/cluster-level counterpart of the M score.
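The density and centroid-distance variants of power can be sketched in Python. The sketch assumes that a cluster is given as a list of member nodes, a set of directed edges (matching the directed network described later in this section), and 2-D map coordinates. The function names are hypothetical. The section does not say how these measures are combined into a single power score, or whether a smaller average distance should map to higher power.

```python
import math
from typing import Hashable, Iterable, Sequence


def density(nodes: Sequence[Hashable], edges: Iterable[tuple[Hashable, Hashable]]) -> float:
    """Directed network density: existing connections / possible connections n(n-1)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    members = set(nodes)
    # Count distinct directed edges inside the cluster, ignoring self-loops.
    internal = {(u, v) for u, v in edges if u in members and v in members and u != v}
    return len(internal) / (n * (n - 1))


def mean_distance_to_centroid(coords: Sequence[tuple[float, float]]) -> float:
    """Average Euclidean distance of a cluster's map coordinates from their centroid."""
    if not coords:
        raise ValueError("cluster has no coordinates")
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return sum(math.hypot(x - cx, y - cy) for x, y in coords) / len(coords)
```

A three-node cycle (a follows b, b follows c, c follows a) has 3 of 6 possible directed connections, so its density is 0.5.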
  • an individual cluster may be selected and a representation of that cluster in a map may be highlighted.
  • the UK design cluster has been highlighted, and a dialog box appears showing more information about the individual group, including the number of members and graphics depicting the power and tweet activity associated with the group.
  • a box may appear with more information.
  • the map and group information items may remain visible when the page scrolls, such that they are in a fixed position. Selecting "clear" on the overview page causes the selected row to be cleared and makes all map nodes visible. An alarm icon on the overview page allows the user to review all recent activity, including the number of tweets from various members of the network.
  • Referring to FIG. 27, a full-screen map is displayed. In this map, the international cluster has been selected, and the South America sub-cluster has been selected.
  • the colored nodes in the map may indicate one or both of the selected clusters and sub-clusters.
  • the influencers in a particular sub-cluster may be viewed, and when an influencer is selected, the URLs associated with that influencer may be shown. A node overview may appear, including the influencer's name, their handle, their location, their URL, when they joined the social network, their number of tweets, their number of followers, the ...
  • a segment or cluster has been selected and data regarding that segment is displayed, such as key influencers, current conversation leaders (mentions), an interactive map, key photos and videos or other media, key tweets/retweets, key websites, key content, latest conversation terms, and the like.
  • this page shows an enhanced version of cluster-focused data and makes it more accessible.
  • the power score for the segment is displayed, as well as an icon from which the user may take certain actions, such as build their network, find content, find media, find tweets, message followers, launch a Twitter™ campaign, launch a Facebook™ campaign, launch a mobile campaign, launch a social media campaign, launch an AdWords campaign, launch an advertisement campaign, and the like.
  • the overview page may be a user interface. Notifications of certain data and data presentations may be made in the user interface, which may, for example, be implemented by software and embodied in a tangible medium, such as a mobile device, smartphone, tablet computer, or the like.
  • the user interface may be a touchscreen embodiment, such that to utilize the user interface, a user touches the screen of the device displaying the user interface.
  • the user interface may be accessible on different computing devices and capable of dynamically accessing user-specific data stored on a network server and/or local device.
  • Referring now to FIG. 29, the "influencers" tab has been invoked.
  • Various ways to filter the influencers are provided, such as by follower status (all followers, follows the user, does not follow the user) or by following status (show all, the user follows, the user does not follow).
  • Another way to filter influencers may be by M score, follower count, mentions, name, or screen name.
  • One way to filter by M score is by use of a slider 2902 to obtain more niche or broader individuals/entities, as described elsewhere herein.
  • Another way to filter individuals/entities may be by their exposure to particular content. By utilizing this filter, the user may target individuals/entities who have not already been exposed to the content. Users may take action from this page, such as to follow selected individuals/entities, save individuals/entities to a Twitter™ list, create a new list, add a selection to a list, send a direct message, send a sponsored tweet, and the like.
  • a dialog box may appear with list choices for the user, such as a list for my influencers following me, a list for my influencers not following me, a branding group, and the like.
  • one action being taken is to follow seven new users.
  • the user's network may potentially expand to include the newly followed individuals/entities.
  • Another action that is taken is to compose a message.
  • the compose message screen may include suggested content, such as ...
  • the suggested content may be filtered by the exposure of target individuals/entities to the content.
  • Data related to the content, such as its peakedness, first appearance, and the like, may be exposed to the user so that the user can decide whether it makes sense to share the content with other individuals/entities.
  • users may be able to drill down to the individual influencer level to see in what other segments/clusters the individual is influential, their latest tweets, M score, number of tweets, number of followers, number following, footprint, following/follower status with respect to the user, demographic information, URL, and the like. Icons may be available to follow, act (i.e., add the person to a list, retweet their latest tweet, send a direct message, etc.), view a social media profile, and the like.
  • a tab for conversation leaders is displayed.
  • Various ways to filter the conversation leaders are provided, such as by follower status (all followers, follows the user, does not follow the user) or by following status (show all, the user follows, the user does not follow).
  • Another way to filter conversation leaders is by peak date, such as all, today, past week, past month, custom date range, and the like.
  • Another way to filter conversation leaders may be by score, follower count, mentions, peak, peakedness, name, screen name, and the like.
  • Another way to filter conversation leaders may be by their exposure to particular content.
  • the user may target individuals/entities who have not already been exposed to the content. Users may take action from this page, such as to follow selected individuals/entities, save individuals/entities to a Twitter™ list, create a new list, add a selection to a list, send a direct message, send a sponsored tweet, and the like.
  • a tweets tab is displayed.
  • the tweets may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like.
  • The tweets may be filtered by M score, retweets, original post date, peak, peakedness, name of poster, screen name of poster, and the like.
  • One way to filter by M score is by use of a slider to obtain an audience that is more niche or broader, as described elsewhere herein.
  • Data regarding each displayed tweet may include an M score, the number of influential retweets, the number of retweets, the posted date, the peak date, a graphic of the peak pattern, icons with which to take action such as reply/retweet/favorite, name, screen name, and the like.
  • Selecting one of the tweets may cause a drill down box to appear with additional information about the individual/entity who made the tweet, such as M score, number of tweets, number of followers, number following, footprint, number of friends, follower/following status, demographic data, URL, which segments the individual/entity is retweeting in, who they have been retweeted by, icons to social media profiles, icons with which to take actions such as reply/retweet/favorite/add to list, and the like.
  • a websites tab is displayed.
  • the websites can be sorted by mentions, M score, subpages mentioned, hostname, and the like.
  • information about the website in the drill down box may include M score, distinct mentions, mentions, subpages mentioned, excerpt, peak date, a graphic of the peak pattern, segments/clusters the website is mentioned in, who mentioned the website, latest tweets mentioning this URL, a button to take action, and the like.
  • a tab for key content may be displayed. Information about the key content included in this view includes the name of the website, name of an article, URL, peak date, a peak pattern, M score, citations, distinct citations, and the like.
  • the key content may be sorted by score, citations, peak, peakedness, host name, content title, and the like. One way to filter by score is by use of a slider to obtain an audience that is more niche or broader, as described elsewhere herein.
  • the key content may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like.
  • Users may take action from this page, such as to compose a message, compose a tweet, view a drill down box for the key content, and the like.
  • in the compose message or compose Tweet view, users may be able to select one or more individuals/entities, such as influencers and/or conversation leaders, to message with suggested content (most used hashtags, popular terms, key content, etc.).
  • the individuals/entities may be part of a list, such that either certain members of the list or the entire list may be easily included as recipients of the message. Selecting a key content item reveals a drill down box for the content.
  • Information about the content in the drill down box may include name of website, title of article, M score, distinct mentions, mentions, subpages mentioned, excerpt, peak date, a graphic of the peak pattern, segments/clusters the content is mentioned in, who mentioned the content, latest tweets mentioning this URL, most used hashtags, a button to take action (tweet this, use in direct message, add to list, etc.), and the like.
  • Media may be filtered by images, videos, audio, GIFs, and the like.
  • the media may be filtered by peak date such as all, today, past week, past month, custom date range, and the like.
  • the media may be sorted by M score, citations, peak, peakedness, host name, content title and the like.
  • Information about the media in this view may include title, duration, media type, score, mentions, distinct mentions, peak date, peak pattern, and the like.
  • a drill down box may appear. Information in the drill down box may include title of media, URL, M score, mentions, distinct mentions, peak date, peak pattern, media type, duration, what segments/clusters the media is mentioned in, most used hashtags, who has mentioned the media, latest tweets mentioning this media, an icon to take action with, and the like.
  • a tab for terms is displayed. The terms may be filtered by hashtags, one word, 2 words, 3 words, and the like.
  • the terms may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like.
  • the terms may be sorted by M score, citations, peak, peakedness, host name, content title, and the like. Information about terms in the list may include the term, peak date, peak pattern, M score, mentions, distinct mentions, and the like. Selecting a term may reveal a drill down box where additional information about the term may be displayed, including which segments/clusters the term has been mentioned in frequently, what other terms have been mentioned with the selected term, who has mentioned the term, latest tweets mentioning this term, an icon to take action with, and the like.
  • an analytical framework for coordinated campaign identification includes a proposed framework for analyzing fabricated social movements. In many embodiments, not only is there the ability to distinguish these movements from truly organic ones, there is also the ability to create a formal method for studying patterns of fabricated, pseudo-grassroots (also, "astroturf") collective action.
  • any such collective action may be required to give the impression of a large group of people coalescing around a movement that is easy to describe and share with others. If the group is not well-connected enough, then it may be logistically difficult for any actor to organize the group's online behavior. If the group is not acting in temporal lockstep, then its message may not achieve a high frequency. In embodiments, low-frequency messages do not appear as global trends.
  • For example, Twitter's "trending" algorithm appears to identify topics that are popular now, rather than topics that have been popular for a while or on a daily basis, to help you discover the hottest emerging topics of discussion on Twitter™.
  • the many examples remain applicable to myriad social platforms.
  • the framework operates on three levels: 1) Event, the level of an entire social campaign; 2) Segment, the level of a community of users participating in a social media campaign (e.g., Russian social media troll accounts); and 3) Actor, the level of an individual user participating in a social media campaign.
  • Table 1 shows examples of the three-dimensional analysis framework in more detail; specifically, the signals relevant for particular combinations of level and dimension. It will be appreciated in light of the disclosure that not every combination of level and dimension has corresponding relevant signals.
  • each signal in Table 1 above is mapped to a discrete metric in Table 3. Further detail regarding key definitions needed to understand these metrics, and regarding any non-obvious activity metrics, is provided herein.
  • the network dimension assumes that actors participating in a campaign are connected to each other in a directed network G (i.e., a connection from user a to user b does not imply the reverse).
  • Twitter™ following networks are an example of directed networks: many people follow Twitter™ celebrities, but those celebrities, as a general rule, do not follow their fans back. Other social media platforms and connected platforms are also applicable.
  • the unit of analysis is a "map," which may be a collection of key social media accounts around a particular social context.
  • a map may be composed of "nodes," which are the social media accounts in question.
  • Each node may be connected to one or more nodes in the map through "edges," and edges may represent social relationships embedded in the respective social media platform (e.g., "following" on Twitter™, Facebook™, or the like).
  • each node in the map may belong to exactly one "segment" and one "group.”
  • a segment may be a collection of nodes with a shared pattern of interests (e.g., a collection of Twitter™ accounts who all follow US Tea Party politicians).
  • Each segment may have a label (e.g., "Tea Party").
  • a group may be a collection of segments with similar interest profiles (e.g., a collection of "Tea Party," "Constitutional Conservatives," etc. segments into a "Conservative" group).
  • the process for generating segments, groups, labels, and colors for a map may be fully or partially automated, as follows: a proprietary clustering algorithm may automatically generate segments and groups for a map; subsequently, the map-making process may use supervised machine learning to generate labels for segments and groups from human-labeled examples.
  • a Subject Matter Expert, an individual well-versed in the topic and/or geographical area covered by the map, may perform a quality assurance check on the segment and group labels.
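The map concepts (nodes, directed "following" edges, one segment per node, segments rolled up into groups) can be held in a small data structure. The following is a minimal Python sketch. The `TerrainMap` class and its methods are hypothetical, and the clustering and labeling steps that would populate it are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class TerrainMap:
    """Hypothetical container for a map: nodes, directed edges, segments and groups."""
    segment_of: dict[str, str] = field(default_factory=dict)  # node -> segment label
    group_of: dict[str, str] = field(default_factory=dict)    # segment label -> group label
    edges: set[tuple[str, str]] = field(default_factory=set)  # (follower, followed)

    def add_node(self, node: str, segment: str) -> None:
        # Each node belongs to exactly one segment (and, through it, one group).
        self.segment_of[node] = segment

    def add_edge(self, source: str, target: str) -> None:
        # Directed: source follows target; the reverse edge is not implied.
        self.edges.add((source, target))

    def group_for_node(self, node: str) -> str:
        return self.group_of[self.segment_of[node]]

    def segments_in_group(self, group: str) -> list[str]:
        return sorted(s for s, g in self.group_of.items() if g == group)
```

Storing the group on the segment rather than on each node is one way to keep "each node belongs to exactly one segment and one group" consistent by construction.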
  • the example consists of 100 users connected in a network G.
  • the network G further breaks down into exactly two communities A and B, each with exactly one half of the total population.
  • the overall number of connections from members of A to any other actor in the network is 500, while the number of connections from members of A to members of B is 200.
  • the campaign proceeds over the course of ten days, and the first of those days features the highest level of campaign activity, with exactly one quarter of all actors participating.
  • a concentration metric is the degree to which a particular campaign is concentrated in one community versus diffused among many different communities.
  • the entropy of a campaign may be, as known in the art, the information-theoretic entropy of the distribution of users active in the campaign among different communities. In the toy example, the entropy of the campaign would be:
  • Values of H below 1.0 may be shown to represent heterophily, or lower-than-expected interconnectivity between communities. Values of H equal to 1.0 may be shown to represent the baseline random expectation. Values of H above 1.0 may be shown to represent homophily, or higher-than-expected interconnectivity.
  • H values for community pairs where low or high values may be expected (e.g., ideologically separate or ideologically aligned communities) in the same network terrain as the case study are used as a baseline.
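The formula for H is not reproduced in this section. The following Python sketch therefore assumes one natural reading that matches the stated semantics (1.0 is the random baseline, lower means less interconnectivity than expected). Under that reading, H is the observed share of community A's outgoing edges that reach community B, divided by the share expected if each member of A linked to the other actors uniformly at random. The function name and this definition are hypothetical.

```python
def interconnectivity_ratio(edges_from_a: int, edges_a_to_b: int,
                            size_b: int, population: int) -> float:
    """Hypothetical H for a community pair (A, B).

    observed = fraction of A's outgoing edges that land in B
    expected = fraction of the other actors (excluding oneself) who are in B
    """
    if edges_from_a <= 0 or population < 2:
        raise ValueError("need outgoing edges and at least two actors")
    observed = edges_a_to_b / edges_from_a
    expected = size_b / (population - 1)  # actors a member of A could link to
    return observed / expected
```

With the toy example's numbers (500 edges from A, 200 of them to B, |B| = 50, 100 actors), this reading gives H = 0.4 / (50/99) = 0.792. That value is below the random baseline of 1.0.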
  • Semantic diversity of a particular actor's, segment's, or campaign's messaging is based on the assignment of messages to topics.
  • Latent Dirichlet Allocation (LDA) is a common method for identifying topics in text data.
  • a semantic diversity score may be calculated for the message set.
  • the authors of the referenced work may represent their measure of semantic diversity as the probability that two documents chosen from the corpus at random with replacement will be on the same topic.
  • the corpus may be the message set, and the documents may be user Tweet histories, post histories, etc.
  • the LDA algorithm may run for 15 iterations, with a number of topics no less than 20% of the number of documents and no more than 30, and semantic diversity may be averaged over 20 distinct runs of the LDA algorithm on the same corpus to smooth out variations due to the initial conditions of a particular run.
  • a topic may be assigned a distance score of 1000.
  • versions of the semantic diversity metric are run for individual users, communities, or entire campaigns. These metrics can also be run for all messages within a particular time period to calculate the change in semantic diversity over time.
  • Semantic diversity scores of less than one may represent users who exclusively post about the same topic, characteristic of fabricated campaigns. Semantic diversity scores between 1 and 100 may represent users who post on a variety of topics, characteristic of normal human activity. Finally, semantic diversity scores above 100 may represent users who post on an extremely diverse set of topics, characteristic of spambots or of users who bridge different cultural and/or linguistic communities (e.g., users who post in different languages, etc.).
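The same-topic probability described above (two documents drawn at random with replacement share a topic) can be sketched in Python. The sketch assumes that each document, such as one user's post history, has already been assigned to a single dominant topic by some LDA run. The topic modeling itself is out of scope here. The scaling that turns this probability into the semantic diversity score ranges quoted above is not specified in this section, so it is not attempted. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def same_topic_probability(topic_of_doc: Sequence[Hashable]) -> float:
    """P(two documents drawn uniformly at random, with replacement, share a topic).

    With p_k the fraction of documents assigned to topic k, this is sum_k p_k**2.
    """
    n = len(topic_of_doc)
    if n == 0:
        raise ValueError("corpus is empty")
    return sum((count / n) ** 2 for count in Counter(topic_of_doc).values())


def averaged_over_runs(runs: Sequence[Sequence[Hashable]]) -> float:
    """Average the measure over several LDA runs to smooth out initialization effects."""
    if not runs:
        raise ValueError("need at least one run")
    return sum(same_topic_probability(run) for run in runs) / len(runs)
```

A corpus whose documents all fall on one topic scores 1.0, and an even split over two topics scores 0.5.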
  • Campaign Peakedness may be defined as the fraction of all activity that occurs on the day with the most campaign-related activity during some time frame. In the toy example, P = 0.25.
  • Dynamic Time Warp Alignment
  • the Dynamic Time Warp is an algorithm known in the art for comparing two temporal sequences of activity.
  • the Dynamic Time Warp may be used to compare the activities of individual users (DU) or entire segments (DS).
  • the Dynamic Time Warp between two sequences S1 and S2 is the number of warping transformations that are required to change S1 into S2.
  • Dynamic Time Warp may be used to identify bots and trolls in a different social media setting.
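The following Python sketch shows how two activity sequences could be compared. This section describes DTW informally as a count of warping transformations. The sketch instead uses the standard dynamic-programming formulation: the minimum cumulative absolute difference over all monotone alignments of the two sequences. It assumes the inputs are per-period activity counts (e.g., daily posts) for two users or two segments. The function name is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |a - b| as the local cost."""
    n, m = len(s1), len(s2)
    # d[i][j] = cheapest alignment of the first i items of s1 with the first j of s2.
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s1[i - 1] - s2[j - 1])
            # Extend by a match, or stretch one sequence against the other.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because warping can stretch one sequence against the other, two accounts whose posting bursts have the same shape but are slightly offset in time score close to zero. Accounts acting in temporal lockstep would be expected to produce small distances.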
  • a framework of signals (or metrics) along at least three dimensions may be constructed and may include, without limitation:
  • a Network dimension that may, for example, represent how accounts are connected;
  • From this framework, a plurality of hypotheses may be derived for "signals" exploring potentially hidden coordination in social media movements on a social media channel such as Twitter™, Facebook™, or the like. This exploration of potentially hidden coordination may occur at the level of the entire campaign (e.g., nine signals), at a cluster level of the campaign (e.g., a set of well-interwoven accounts), at the individual account level, and the like. In embodiments, the plurality of hypotheses may include twenty-five or more such hypotheses.
  • Empirical evidence associated with these signals can be shown across a number of case studies of known coordinated (i.e., inorganic, centrally controlled) and spontaneous (i.e., organic, individually driven) campaigns.
  • three of the campaign signals may systematically reveal coordination in social media movements on Twitter™, Facebook™, and other platforms.
  • Some signals, either at the cluster or at the individual account level, may facilitate campaign analysis, and some of them may be transformed into campaign-level signals.
  • Each campaign may include a set of "seeds" from a specified timeframe that may be, for example, a hashtag, a sentence shared in posts, a URL shared in posts, or the like.
  • clusters may be communities of users active within the campaign.
  • users may be defined by their individual accounts, as identified by their Twitter™ handle, Facebook™ identification, user name on other social media platforms, or the like.
  • Network Terrain - Campaigns may occur in a specific context referred to as the "network terrain."
  • the #BlackLivesMatter movement may be better analyzed within its "network terrain," which displays the US political conversation on Twitter™, Facebook™, or other relevant social media platforms.
  • social media platforms like Twitter™ and Facebook™ may constitute a cyber-social "network terrain" formed by the relationships (such as following on Twitter™, Facebook™, or the like) among actors.
  • the structure of the network or social media platform may determine who and what may be visible to whom, and thus it may be the social landscape on which the struggle for influence occurs.
  • the methods and systems may include analyzing case study campaigns across specific network terrain maps in order to understand the relationships between participants and the patterns of campaign propagation across specific online communities (e.g., clusters discovered using machine learning analysis of network relationships, and the like).
  • Campaign versus Investigatory Signals - Signals measured at the cluster and individual actor (user) levels may facilitate investigating the inner workings of specific campaigns, building a more qualitative understanding of how these campaigns unfolded, and helping form campaign-level metrics, among other things.
  • the methods and systems may include testing the signal set on a set of case studies and exemplary campaigns.
  • the investigatory signals may operate at the cluster or at the individual level.
  • the investigatory signals may facilitate building a qualitative understanding of the dynamics of a campaign and may provide tools to build campaign-level signals.
  • [C] indicates a signal operating at the cluster level
  • [ ⁇ ] indicates a signal is operating at the user level
  • a priority signal name is Concentration in Lead Cluster.
  • the concentration in lead cluster signal description: large-scale spontaneous campaigns may be more likely to engage participants from a range of different clusters, whereas coordinated campaigns are typically highly concentrated in a specific cluster of the network or social media platform.
  • the concentration in lead cluster signal evaluates the degree to which an entire campaign's activity is concentrated in a particular cluster of participants.
  • the concentration in lead cluster signal may be measured as the fraction of all campaign participants who are members of the most campaign-active cluster in the network terrain map.
  • the range of score values of the concentration in lead cluster signal (metric) is zero to 100%.
  • the concentration in lead cluster signal (metric) value is computed as the fraction of a campaign's participants that are members of the most active community in the campaign. In an example including a three-community map, if 30 participants are from community A, 25 from community B, and 25 from community C, then the value of the concentration in lead cluster signal (metric) for the campaign on this map equals 37.5% (30 of 80 participants).
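The lead-cluster computation reduces to counting cluster memberships. The following Python sketch assumes that each campaign participant has already been mapped to exactly one cluster of the terrain map. The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def concentration_in_lead_cluster(cluster_of_item: Iterable[Hashable]) -> float:
    """Fraction of items (participants or, alternatively, posts) in the most active cluster."""
    counts = Counter(cluster_of_item)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("campaign has no participants")
    return max(counts.values()) / total
```

Passing one cluster label per participant gives the user-based metric. Passing one label per post (the poster's cluster) gives the action-based variant discussed below, which heavy posters can skew.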
  • possible values of the concentration in lead cluster signal (or metric) may be between 0% (i.e., not concentrated) and 100% (i.e., fully concentrated in one cluster).
  • the concentration in lead cluster signal (or metric) may be consistent across a set of campaigns, which may cover a variety of geographies and dates. It will be appreciated in light of the disclosure that coordinated campaigns, on average, may be shown to have larger values of the concentration in lead cluster signal (or metric) than spontaneous campaigns. It will also be appreciated in light of the disclosure that there may be some overlap between the coordinated and spontaneous ranges, due at least in part to the large number of sociocultural settings and time periods in the data sets.
  • An exemplary range of values of the concentration in lead cluster signal score for coordinated campaigns is 20% to 89%. The range here is the full range between the lowest value and the highest value for this category in the campaigns.
  • An exemplary average value of the concentration in lead cluster signal for spontaneous (organic) campaigns is 22%.
  • An exemplary range of values of the concentration in lead cluster signal score for spontaneous campaigns is 9% to 50%.
  • the performance of the concentration in lead cluster signal may be sensitive to the specific terrain map being used, because the signal (metric) may be less successful if the terrain map used captures only the active participants in a campaign.
  • the concentration in lead cluster signal (metric) may be more successful when capturing the broader terrain in which the campaign under scrutiny unfolds.
  • the methods and systems described herein also include computing the value of the concentration in lead cluster signal (or metric) using actions rather than users, measuring what proportion of the total actions (Tweets™ or the like) in the campaign came from the most active community. This approach can be shown to be less reliable because heavy posters (those who Tweet™ or the like frequently) may create skews in the measurements.
  • a priority signal name is Concentration via Entropy.
  • the concentration via entropy signal (metric) may be shown to be a useful signal for knowing whether more than one community is driving a coordinated campaign, which could be missed by relying on the concentration in lead cluster signal (metric) alone.
  • the concentration via entropy signal (metric) may calculate the concentration of the distribution among all clusters. In embodiments, coordinated campaigns generally tend to have values of the concentration via entropy signal (metric) that are less than 2.0.
  • the lowest score is zero (all participants belong to the same community).
  • the highest score depends on the number of communities active in the map. Because the highest number of communities in an exemplary case study map may be 50, the highest entropy value in this example would be four (assuming a perfectly even distribution of participants amongst the 50 communities).
  • the concentration via entropy signal (metric) may be an entropy of the distribution of participants among communities. In an example with a two-community map, the value of the concentration via entropy signal would be 1.0 when 50 participants are from community A and 50 participants are from community B, so that the distribution is (0.5, 0.5).
  • in the entropy H = −Σ_i p(c(i)) · log p(c(i)), c(i) is the count of participants in the i-th cluster and p(c(i)) is the fraction of all participants coming from the i-th cluster.
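The entropy can be computed directly from per-participant cluster labels. The following is a minimal Python sketch with a hypothetical function name. The two-community worked example (a 50/50 split gives 1.0) implies base-2 logarithms, so base 2 is the default. The figure of about four for 50 evenly populated communities and the "factor of about three per point" rule of thumb are closer to natural logarithms (ln 50 ≈ 3.9, e ≈ 2.7), so the base is left as a parameter rather than fixed.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def concentration_via_entropy(cluster_of_participant: Iterable[Hashable],
                              base: float = 2.0) -> float:
    """Shannon entropy H = -sum_i p(c(i)) * log(p(c(i))) of participants across clusters."""
    counts = Counter(cluster_of_participant)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("campaign has no participants")
    # Only clusters with at least one participant appear, so log(0) never occurs.
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())
```

Lower values mean the campaign is concentrated in fewer clusters. A campaign driven by a handful of battleground communities lands between the single-cluster minimum of zero and the even-spread maximum.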
  • the concentration via entropy signal (metric) is based on a logarithmic scale, so a small difference in entropy belies a large difference in the unevenness of the underlying distribution. It will be appreciated in light of the disclosure that a very rough rule of thumb is that a difference of one point in the value of the concentration via entropy signal may be equivalent to a change in concentration by a factor of three, so a campaign with a concentration via entropy signal equal to two is three times more concentrated in a few clusters than a campaign with a concentration via entropy signal equal to three.
  • An exemplary average value of the concentration via entropy signal for coordinated campaigns is 3L43.
  • An exemplary range of values of the concentration via entropy signal for coordinated campaigns is 0.46 to 2.19.
  • An exemplary average value of the concentration via entropy signal for spontaneous campaigns is 2.52.
  • the concentration via entropy signal may be useful to analyze "battleground campaigns," where a few clusters fight for control over the social media narrative (e.g., on a dedicated hashtag). Such campaigns may be concentrated in these few communities, and simply using a measure focused on the lead community may miss this activity.
  • In embodiments, a priority signal name is Day Peakedness.
  • the daypeakedness signal may detail the percentage of all activity that the busiest day of the campaign represents.
  • the daypeakedness signal (metric) of a campaign is measured as the percentage of campaign actions (Tweets™ or the like) that take place on the most active day of the campaign. It will be appreciated in light of the disclosure that spontaneous campaigns generally appear to be more "bursty" because, for example, spontaneous campaigns exhibit more of a peak (or a larger number of peaks) than coordinated campaigns.
  • the range of the values of the daypeakedness signal (metric) is 0% to 100%.
  • the value of the daypeakedness signal is computed by determining the fraction of all activity that occurs on the day with the most campaign-related activity. For example, consider a campaign that proceeds over the course of ten days, where the first of those days features the highest level of campaign activity, with one quarter of all actors participating. In this example, the value of the daypeakedness signal (metric) is 25%.
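Day peakedness can be sketched in Python by bucketing post timestamps into calendar days. The sketch assumes timezone-aware timestamps, one per campaign action. The `tz` parameter makes the date-boundary and time-zone sensitivity discussed below explicit: the same posts can be split across different calendar days depending on the zone chosen. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone, tzinfo
from typing import Iterable


def day_peakedness(post_times: Iterable[datetime], tz: tzinfo = timezone.utc) -> float:
    """Fraction of all campaign posts that fall on the single busiest calendar day."""
    days: Counter = Counter()
    for t in post_times:
        if t.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        days[t.astimezone(tz).date()] += 1  # calendar day as seen in zone tz
    total = sum(days.values())
    if total == 0:
        raise ValueError("campaign has no activity")
    return max(days.values()) / total
```

Evenly spread activity over N days yields exactly 1/N, which is why a campaign's length puts a floor under this value.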
  • the metric can be shown to be consistent across campaigns despite the variety of geographies and dates.
  • coordinated campaigns, on average, may have a lower value of the daypeakedness signal (metric) than spontaneous campaigns. It will be appreciated in light of the disclosure that there may be some overlap between the coordinated and spontaneous ranges due to the large number of sociocultural settings and time periods in the campaigns.
  • An exemplary range of values of the daypeakedness signal for coordinated campaigns is 0.08 to 0.22.
  • An exemplary standard deviation of the value of the daypeakedness signal for coordinated campaigns is 0.05.
  • An exemplary range of values of the daypeakedness signal for spontaneous campaigns is 0 to 0.71.
  • the daypeakedness signal (metric) may be sensitive to date boundaries/time zones, most notably when the campaign is being analyzed over only a few days.
  • the daypeakedness signal (metric) may be improved by making it less sensitive to time zones.
  • the peak time may be identified as- the median of time stamps of a dynamic phenomenon to be able to observe a logarithmic distribution of volume around the peak.
  • the methods and systems described herein may identify peaks as days when, volume exceeds two standard deviations above the median, and may calculate the value -of the daypeakedness signal as a fraction of ' all content, that occurred during a 24-hour period, it will be appreciated in light of the disclosure that the median volume may- be used instead of mean volume due in part to the .-observation that volume ' follows a skewed distribution, so the mean may not be an appropriate statistic to use to characterize it.
  • the measure of peakedness in the methods and systems described herein may be relatively less sophisticated and, therefore, may be easier to interpret while giving a good initial impression of the utility of the signal from a social media platform for identifying coordinate campaigns,
  • the value of the daypeakedness signal may be affected by the overall time range of a campaign.
  • for example, if a campaign lasts three days, the value of the daypeakedness signal may not go below 33%, but if the campaign lasts 10 days, then the value of the daypeakedness signal cannot go below 10%.
  • campaigns may last as little as one week and may last as long as several months.
  • the value of the daypeakedness signal may be shown to follow the pattern described in the campaign value examples across these time ranges.
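The daypeakedness computation described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes each campaign action is a timezone-aware `datetime`, buckets actions by UTC calendar day (one possible response to the date-boundary sensitivity noted above), and also includes the median-plus-two-standard-deviations peak test. The names `daily_series`, `daypeakedness`, and `peak_days` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import median, pstdev


def daily_series(timestamps):
    """Count campaign actions per UTC calendar day, including zero-activity days.

    Bucketing by UTC day is an assumption; any fixed time zone could be used.
    Requires at least one timezone-aware timestamp.
    """
    counts = Counter(ts.astimezone(timezone.utc).date() for ts in timestamps)
    first, last = min(counts), max(counts)
    n_days = (last - first).days + 1
    return [counts.get(first + timedelta(days=i), 0) for i in range(n_days)]


def daypeakedness(timestamps):
    """Fraction of all campaign actions that fall on the single busiest day."""
    series = daily_series(timestamps)
    return max(series) / sum(series)


def peak_days(series):
    """Indices of days whose volume exceeds the median by more than two
    population standard deviations (median used because volume is skewed)."""
    threshold = median(series) + 2 * pstdev(series)
    return [i for i, n in enumerate(series) if n > threshold]


# Worked example from the text: ten days, a quarter of all actions on day one.
start = datetime(2018, 6, 1, 12, tzinfo=timezone.utc)
per_day = [25, 9, 9, 9, 8, 8, 8, 8, 8, 8]  # 100 actions in total
stamps = [start + timedelta(days=d) for d, n in enumerate(per_day) for _ in range(n)]
print(daypeakedness(stamps))  # 0.25
```

Because the busiest day must hold at least 1/n of the actions in an n-day series, this sketch also reproduces the lower bounds mentioned above (about 33% for three days, 10% for ten days).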
  • a signal name is Commitment: Average Posts Count in the Campaign.
  • the value of the commitment: average posts count in campaign signal (metric) can include the average number of campaign-related posts that participants publish after their first campaign post.
  • the range of values of the commitment: average posts count in campaign signal (metric) is bounded below by zero, which corresponds to a user posting only once about the campaign. In embodiments, the commitment: average posts count in campaign signal (metric) may have a range of values between 0 and 10 posts. It will be appreciated in light of the disclosure that the maximum value of the commitment: average posts count in campaign signal (metric) could be much higher.
  • participants in a campaign may be very dedicated and may post 100 times about a certain subject during the scope of analysis, and the like.
  • the methods and systems disclosed herein determine the average number of subsequent participation actions, e.g., Tweets™ (or other postings) with a campaign hashtag, across all participants in a campaign.
  • participants (i.e., posters) in a campaign can be a smaller subset of the participants in a map.
  • the map may capture some of their followers and/or other members of the network terrain when those are highly connected to active participants in the campaign.
  • campaign participation can include Tweets™ or the like with campaign-related hashtags (for campaigns organized around a hashtag); Tweets™ or the like with links to a video or article (for campaigns organized around a video or article); retweets of the above Tweets™; and the like.
  • Examples of actions out of scope for participation include favorites of Tweets™ with campaign-related hashtags or links, and @-replies or @-mentions of Tweets™ (or the like) with campaign-related hashtags or links.
  • it will be appreciated in light of the disclosure that participants in spontaneous campaigns post more about their campaigns than participants in coordinated campaigns. It will also be appreciated in light of the disclosure that this pattern may be counterintuitive, as one may expect participants in coordinated campaigns to be extrinsically motivated to hit certain participation targets (e.g., by being paid by number of posts) and thus to post more than participants in spontaneous campaigns, who lack such motivation.
  • An exemplary average range of values of the commitment: average posts count in campaign signal (metric) for coordinated campaigns is 1.28 to 3.40.
  • An exemplary standard deviation of the value of the commitment: average posts count in campaign signal (metric) for coordinated campaigns is 0.84.
  • An exemplary average value of the commitment: average posts count in campaign signal (metric) for spontaneous campaigns is 3.53.
  • the commitment: average posts count in campaign signal (metric) can be analyzed at the community level, the cluster level, and the participant level.
  • The commitment: average posts count in campaign signal (metric) can be analyzed at the participant level to identify individuals who have extremely high commitment values, e.g., posting about a campaign one hundred times.
  • the commitment: average posts count in campaign signal (metric) is focused on participation after the first post and is complemented by a measurement of the proportion of participants in the campaign who have participated only once.
  • the commitment: average posts count in campaign signal (metric) may be combined with a commitment: average time range of participation signal (metric) into a commitment: post regularity signal (metric) that may capture the deviation of campaign participants from natural human attention patterns.
  • the commitment: average posts count in campaign signal (metric) may be normalized to take into account the average posts per user in order to control for users with very heavy activity across all campaigns.
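A minimal sketch of the commitment: average posts count in campaign signal and its complementary one-time-participant share follows. It assumes participation actions have already been filtered to in-scope actions (campaign-hashtag Tweets™, link shares, and retweets) and reduced to one author identifier per action; the function name `commitment_posts` is hypothetical, and the per-user normalization mentioned above is left out.

```python
from collections import Counter


def commitment_posts(author_ids):
    """Return (average posts after each participant's first campaign post,
    share of participants who posted only once).

    `author_ids` holds one entry per in-scope participation action.
    """
    posts_per_user = Counter(author_ids)
    if not posts_per_user:
        return 0.0, 0.0
    subsequent = [n - 1 for n in posts_per_user.values()]  # drop the first post
    average = sum(subsequent) / len(subsequent)
    one_time_share = sum(1 for n in subsequent if n == 0) / len(subsequent)
    return average, one_time_share


actions = ["a", "a", "a", "b", "c", "c"]  # a: 3 posts, b: 1, c: 2
print(commitment_posts(actions))  # (1.0, 0.3333333333333333)
```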
  • a priority signal name is Commitment: Average Time Range of Participation.
  • the commitment: average time range of participation signal (metric) may be used to facilitate looking at how long (in days) participants remained engaged in pushing the campaign.
  • the loyalty of participants to the campaign may be measured by the time range (in days) of their campaign-related Tweets™ (or other postings), which may be averaged across all participants.
  • the values of the commitment: average time range of participation signal have no fixed upper bound and can range from zero days to the total length of the campaign.
  • the commitment: average time range of participation signal (metric) may look at the time frame between the first and last participation actions, which can be averaged across all participants in a campaign.
  • the commitment: average time range of participation signal (metric) may measure whether actors participate in a "one-off" way (one Tweet™ and done) or demonstrate a commitment to the campaign (multiple Tweets™ or other postings over time).
  • participants in coordinated campaigns engage with the campaign over a longer period than participants in spontaneous campaigns. It will also be appreciated in light of the disclosure that participants in coordinated campaigns may be more likely than participants in spontaneous campaigns to receive extrinsic motivation, such as payment, for engaging with the campaign and, as such, the extrinsic motivation may lead to a longer engagement period than intrinsic motivation.
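  • An exemplary average range of values of the commitment: average time range of participation signal (metric) is 0.08 to 22.33 days.
  • An exemplary average value of the commitment: average time range of participation signal (metric) is 1.53 days.
  • An exemplary average range of values of the commitment: average time range of participation signal (metric) is 0 to 3.36 days.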
  • the commitment: average time range of participation signal (metric) may be affected by the overall time range of a campaign; e.g., if a campaign lasts three days, then this metric cannot go above a value of three.
  • the commitment: average time range of participation signal (metric) may be combined into a commitment: post regularity signal that may capture the deviation of campaign participants from natural human attention patterns.
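The average time range of participation can be sketched in the same way. The sketch below assumes each participation action is an `(author_id, datetime)` pair and reports fractional days; the function name `average_participation_days` is hypothetical.

```python
from datetime import datetime


def average_participation_days(actions):
    """Average, over participants, of the time between each participant's
    first and last campaign action, in (fractional) days.

    `actions` is an iterable of (author_id, datetime) pairs.
    """
    first, last = {}, {}
    for author, ts in actions:
        first[author] = min(first.get(author, ts), ts)
        last[author] = max(last.get(author, ts), ts)
    if not first:
        return 0.0
    spans = [(last[a] - first[a]).total_seconds() / 86400 for a in first]
    return sum(spans) / len(spans)


actions = [
    ("a", datetime(2018, 6, 1)),
    ("a", datetime(2018, 6, 4)),  # a stays engaged for 3 days
    ("b", datetime(2018, 6, 2)),  # b is a one-off participant (0 days)
]
print(average_participation_days(actions))  # 1.5
```

As the text notes, the result can never exceed the campaign's own length, since every span lies inside the campaign window.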
  • a signal name is Semantic Diversity for all Messages.
  • the semantic diversity for all messages signal (metric) description: The semantic diversity for all messages signal (metric) looks to detail how generally on-message the campaign is. The semantic diversity for all messages signal (metric) also looks to determine whether the interaction or activity looks more like a diverse conversation covering a range of topics and expressions or more like a fairly uniform campaign with low semantic diversity. It will be appreciated in light of the disclosure that people tend to Tweet™ (or otherwise post) on a variety of topics related to their daily lives, work, and interests. A group trying to promote a coordinated campaign, however, may be interested only in the narrow range of topics relevant to that campaign.
  • bots or propaganda accounts may also be interested in any Tweet™ (or applicable posting) relevant to any campaign they are trying to push, and therefore could be Tweeting™ (or otherwise posting) on an extremely wide range of topics.
  • the semantic diversity for all messages signal may measure the extent to which participants in the campaign are Tweeting™ (or otherwise posting) on an intermediate range of topics, which suggests that their activities are spontaneous and human rather than automated or coordinated to propagate a specific message.
  • the range of values of the semantic diversity for all messages signal (metric) is zero to 100%.
  • raw values of the semantic diversity for all messages signal fall into three categories: (i) when the value of the semantic diversity for all messages signal (metric) is less than one, it may represent users who exclusively post about the same topic, which may be characteristic of fabricated campaigns; (ii) when the value of the semantic diversity for all messages signal (metric) is between one and 100, it may represent users who post on a variety of topics, which is characteristic of normal human activity; (iii) when the value of the semantic diversity for all messages signal (metric) is above 100, it may represent users who post on an extremely diverse set of topics.
  • the semantic diversity for all messages signal may be bounded at 1000 because it may be necessary to fix a maximum value for the "distance" between any pair of topics for which no document includes terms from both topics. It will be appreciated in light of the disclosure that, mathematically, this distance should be infinite but, in practice, the value can be set to 1000.
  • the semantic diversity for all messages signal may be reported as the percentage of users whose raw value is greater than or equal to 1.0 and less than 100, and thus varies between zero and 100%.
  • the value of the semantic diversity for all messages signal (metric) of a particular actor's (or cluster's, or campaign's) messaging may be based on the assignment of messages to topics.
  • the computation of the semantic diversity for all messages signal (metric) may use a Latent Dirichlet Allocation algorithm.
  • the semantic diversity for all messages signal (metric) is determined for the message set.
  • the value of the semantic diversity for all messages signal (metric) is determined as the probability that two documents chosen from the corpus at random with replacement will be on the same topic.
  • the corpus is the message set, and the documents may be user Tweet™ (or other posting) histories, aggregated by user.
  • the Latent Dirichlet Allocation (LDA) algorithm may be run for fifteen iterations with a number of topics no less than 20% and no more than 30% of the number of documents.
  • the average value of the semantic diversity for all messages signal (metric) over twenty distinct runs of the LDA algorithm on the same corpus is used to smooth out variations due to the initial conditions of a particular run.
  • a topic-distance score of 1000 may be assigned in the semantic diversity for all messages signal (metric) for topics that do not co-occur in documents.
  • the semantic diversity for all messages signal may refer to all campaign-related messages.
  • An exemplary average range of values of the semantic diversity for all messages signal (metric) for spontaneous campaigns is 50% to 98%.
  • the semantic diversity for all messages signal may be very sensitive to confounds.
  • news organizations may tend to have low semantic diversity because news organizations may post the same story headlines over and over even though such news organizations are not coordinated actors.
  • Tweets™ (or other postings) in one language tend to appear more coordinated than Tweets™ (or other postings) in multiple languages, because the Latent Dirichlet Allocation (LDA) algorithm may not translate terms across languages.
  • the semantic diversity for all messages signal may point to the differentiation between natural language use and the use of language to push a particular message. It will be appreciated in light of the disclosure that coordination around a message may require that the message be as clear and simple as possible, whereas natural language can be complex, metaphorical, and even slightly confusing. To that end, coordinated campaigns may not wish to increase the semantic diversity of their messages even if the technical or organizational opportunity were available.
  • the semantic diversity for all messages signal includes separating language diversity from semantic diversity, either by grouping Tweets™ (or other postings) by post language prior to analysis or by using automated machine translation to convert all Tweets™ (or other postings) to the same language.
  • the semantic diversity for all messages signal also includes leveraging existing natural language processing approaches to identify certain kinds of low-semantic-diversity language that may not be of interest, e.g., news headlines and press releases.
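The same-topic probability described above can be computed once each document (a user's aggregated posting history) has been assigned a topic. The sketch below assumes each document is assigned to its single most probable topic by an LDA run performed elsewhere (LDA itself yields topic mixtures, so this hard assignment is an assumption), and that assignments arrive as plain lists of topic identifiers, one list per run. It does not run LDA and does not implement the topic-distance scoring with the 1000 cap; `share_in_human_range` illustrates the percentage-of-users reading of the signal with the [1.0, 100) band from the text. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def same_topic_probability(topic_of_doc):
    """Probability that two documents drawn at random *with replacement* share
    a topic: the sum over topics of (fraction of documents in it) ** 2.

    `topic_of_doc` lists one topic id per document (user history).
    """
    counts = Counter(topic_of_doc)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def smoothed_over_runs(runs):
    """Average the measure over several LDA runs (the text uses twenty) to
    smooth out the effect of each run's initial conditions."""
    return mean(same_topic_probability(run) for run in runs)


def share_in_human_range(raw_values, low=1.0, high=100.0):
    """Percentage of users whose raw diversity value is >= low and < high."""
    values = list(raw_values)
    return 100.0 * sum(low <= v < high for v in values) / len(values)


print(same_topic_probability([0, 0, 1, 1]))  # 0.5
print(share_in_human_range([0.5, 10, 50, 500]))  # 50.0
```

A message set where every document lands on one topic scores 1.0, the most "on-message" case; spreading documents evenly over more topics drives the value toward zero.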
  • the semantic diversity for all messages signal may measure the temporal alignment of campaign-related Tweets™ (or other postings) for all participants. It will be appreciated in light of the disclosure that users generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ (or postings) of others. When the Tweet™ (or other posting) histories of campaign participants follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement.
  • the semantic diversity for all messages signal may include temporal coordination of Tweets™ (or other postings) between campaign participants, measured by the alignment of Tweet™ (or other posting) histories across all participants in the campaign.
  • the values of the semantic diversity for all messages signal range between 0% and 100% and represent the percent alignment of two users' temporally normalized sequences of participation in the campaign. Toward that end, 0% alignment may mean that the users' sequences do not match at all, while 100% alignment may indicate a perfect match.
  • the semantic diversity for all messages signal may be computed with a dynamic time warp algorithm for comparing two temporal sequences of activity.
  • the dynamic time warp distance between two sequences S1 and S2 is the number of warping transformations that are required to change S1 into S2.
  • the methods and systems described herein may, for example, use the dynamic time warp algorithm to identify bots and trolls in a different social media setting.
  • the number of warping transformations may be normalized by the length of both sequences S1 and S2 and multiplied by 100 to get a percent value. Finally, the normalized number may be subtracted from 100 in order to calculate the percent alignment of S1 and S2.
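The percent-alignment computation can be sketched with a textbook dynamic time warping (DTW) recurrence. The text does not define a "warping transformation" precisely, so this sketch assumes it means a non-diagonal step (one sequence held while the other advances) on the optimal DTW path, and it assumes each sequence is normalized to sum to 1 before alignment. Other DTW formulations report an accumulated distance instead, so treat this as one plausible reading rather than the disclosed algorithm. The function names are hypothetical.

```python
def normalize(seq):
    """Scale activity counts so they sum to 1 (an assumed normalization)."""
    total = sum(seq)
    return [x / total for x in seq] if total else list(seq)


def dtw_path(a, b):
    """Optimal DTW alignment path between a and b using absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end; on cost ties, tuple ordering prefers the diagonal.
    i, j, path = n, m, [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]


def percent_alignment(s1, s2):
    """100 minus the count of non-diagonal (warping) steps, normalized by
    len(s1) + len(s2) and expressed as a percentage."""
    path = dtw_path(normalize(s1), normalize(s2))
    warps = sum(1 for (i0, j0), (i1, j1) in zip(path, path[1:])
                if (i1 - i0, j1 - j0) != (1, 1))
    return 100.0 - 100.0 * warps / (len(s1) + len(s2))


print(percent_alignment([0, 1, 0, 0], [0, 0, 1, 0]))  # 75.0
```

In the example, a single burst shifted by one time step needs two warping steps, giving 100 - 100 * 2 / 8 = 75% alignment; identical sequences align on the diagonal and score 100%.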
  • a priority signal name is temporal coordination per cluster.
  • the temporal coordination per cluster signal (metric) may look at the communities that participate in a campaign to identify different communities exhibiting very similar patterns of engagement, which may be considered odd. In embodiments, such a pattern may be even odder when the postings originate in different time zones.
  • the temporal coordination per cluster signal (metric) measures the temporal alignment of campaign-related Tweets™ (or other postings) aggregated at the cluster level. With that in mind, communities generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ (or other postings) of other communities.
  • when the Tweet™ (or other posting) histories of participating clusters follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement.
  • the range of values for the temporal coordination per cluster signal is 0% to 100%.
  • the value of the temporal coordination per cluster signal (metric) represents the percent alignment of two users' temporally normalized sequences of participation in the campaign. Toward that end, 0% alignment may mean that the users' sequences do not match at all, while 100% alignment indicates a perfect match.
  • the temporal coordination per cluster signal (metric) is a per-user take on examining temporal coordination, which might be helpful when other metrics are noisy.
  • Temporal coordination per user is technically the temporal coordination between pairs of users. In embodiments, the temporal coordination per cluster signal (metric) may measure the temporal alignment of campaign-related Tweets™ (or other postings) between individual campaign participants. As noted before, users generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ of others. When the Tweet™ (or other posting) histories of campaign participants follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement.
  • the temporal coordination per cluster signal may provide a good high-level description of the rate of unusual coordination across the users participating in a campaign.
  • the temporal coordination per cluster signal (metric) may suffer from the same overestimation of actual temporal coordination, so the algorithm may be adjusted to include the average temporal coordination across users in the calculation.
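To compare clusters rather than users, campaign posts first have to be aggregated into one activity sequence per cluster on a common time grid. The sketch below assumes hourly bins and timezone-aware timestamps, and takes the pairwise alignment function as a parameter (for instance, a DTW-based percent alignment) so that the aggregation step stands on its own; averaging over all cluster pairs is an assumed summary, and the toy `exact_match` scorer exists only to make the example runnable. All names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone
from itertools import combinations


def cluster_sequences(posts, bin_seconds=3600):
    """Aggregate (cluster_id, aware datetime) posts into per-cluster activity
    counts over a shared grid of time bins (hourly by default)."""
    bins = defaultdict(Counter)
    for cluster, ts in posts:
        bins[cluster][int(ts.timestamp()) // bin_seconds] += 1
    lo = min(min(c) for c in bins.values())
    hi = max(max(c) for c in bins.values())
    return {cl: [c.get(b, 0) for b in range(lo, hi + 1)] for cl, c in bins.items()}


def mean_pairwise_alignment(sequences, align):
    """Average align(a, b) over every unordered pair of clusters."""
    pairs = list(combinations(sorted(sequences), 2))
    if not pairs:
        return None
    return sum(align(sequences[x], sequences[y]) for x, y in pairs) / len(pairs)


def exact_match(a, b):
    """Toy stand-in for a real percent-alignment function."""
    return 100.0 if a == b else 0.0


t0 = datetime(2018, 6, 1, tzinfo=timezone.utc)
hour = timedelta(hours=1)
posts = [("A", t0), ("A", t0 + hour), ("B", t0), ("B", t0 + hour), ("C", t0 + 2 * hour)]
seqs = cluster_sequences(posts)
print(seqs)  # {'A': [1, 1, 0], 'B': [1, 1, 0], 'C': [0, 0, 1]}
print(mean_pairwise_alignment(seqs, exact_match))
```

Clusters A and B move in lockstep while C does not, so only one of the three pairs counts as aligned under the toy scorer.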
  • a signal name is client diversity per cluster.
  • the client diversity per cluster signal (metric) may determine how accounts in a given cluster use Twitter™, Facebook™, or other social media platforms.
  • the client diversity per cluster signal (metric) may also determine whether Twitter™ users (or posters on other relevant platforms) post through a mobile device, a computer, or direct access to the Twitter™ APIs to Tweet™ (or make other social media postings).
  • some clients may be used to coordinate Tweets™ (or other social media postings), and the client diversity per cluster signal (metric) may be used to determine how coordinated the Tweets™ (or other social media postings) are, and whether such coordinating clients are used heavily in some of the communities that participate in the campaign.
  • the client diversity per cluster signal (metric) is the same as the client diversity at campaign scale signal (metric) but analyzed at the cluster level.
  • the value of the client diversity per cluster signal is computed by using the "source" field of the Tweet™ (or other posting) to identify the client used to make the Tweet™ (or other posting), as in the client diversity at campaign scale signal (metric). The Tweets™ (or other postings) are then aggregated by the cluster of the author of the Tweet™ (or other posting) in the campaign map.
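A sketch of the per-cluster client aggregation follows. On Twitter™, the "source" value of a Tweet™ is typically an HTML anchor whose text names the client (for example, `Twitter for iPhone`), so the sketch strips the tags to recover the client name. Because this section defers the exact diversity formula to the campaign-scale signal, the sketch uses the Shannon entropy of each cluster's client distribution as an assumed stand-in: 0 bits means every post in the cluster came from a single client. The function names and example URLs are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

TAG = re.compile(r"<[^>]+>")


def client_name(source_field):
    """Strip HTML tags from a Tweet's "source" value, leaving the client name."""
    return TAG.sub("", source_field).strip()


def client_entropy_per_cluster(posts):
    """posts: iterable of (cluster_id, source_field).

    Returns {cluster_id: Shannon entropy in bits of its client distribution};
    entropy is an assumed diversity measure (0.0 = a single client).
    """
    by_cluster = defaultdict(Counter)
    for cluster, source in posts:
        by_cluster[cluster][client_name(source)] += 1
    result = {}
    for cluster, counts in by_cluster.items():
        n = sum(counts.values())
        result[cluster] = sum(c / n * math.log2(n / c) for c in counts.values())
    return result


iphone = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
android = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
posts = [("X", iphone), ("X", android), ("Y", iphone), ("Y", iphone)]
print(client_entropy_per_cluster(posts))  # {'X': 1.0, 'Y': 0.0}
```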
  • a signal name is Time Delta between Communities.
  • the time delta between communities signal (metric) description: the time delta between communities signal (metric) may identify a community that is engaging with the campaign significantly ahead of others, in one example because it is kick-starting the campaign, or significantly behind others, possibly because there is a need to coordinate talking points before engaging. It will be appreciated in light of the disclosure that the time delta between communities signal (metric) was inspired by qualitative analysis initially done in the Syrian Civil War context, in which communities pretending to portray civilians while being led by military intelligence engaged with popular topics with a lag of several hours to days. Toward that end, the time delta between communities signal (metric) may examine when clusters are most active in the campaign. By way of this example, the time delta between communities signal (metric) may measure the distance between a given cluster's peak and the more general peak of the overall campaign.
  • the values of the time delta between communities signal represent a number of days.
  • Negative values may indicate that a community's peak of temporal activity happens before the average peak date for all other communities. Positive values may indicate the peak happens after the average peak date for all other communities. A score of zero may indicate a community peaking in sync with the rest of the communities.
  • the time delta between communities signal (metric) may be helpful to analyze disputed hashtags, with both spontaneous and coordinated clusters engaging in the same campaign.
  • the time delta between communities signal (metric) may point to the natural logistical cost of coordinating a campaign's message in response to a sudden event, such as a late-breaking news story. It will be appreciated in light of the disclosure that even the most sophisticated coordinated campaigns cannot anticipate such events and, at the same time, cannot respond to these events spontaneously, as doing so may distract from their message and may hurt the overall aim of the campaign.
  • the time delta between communities signal may include automatic identification of sudden events as they happen, e.g., by matching campaign-related terms against Google™ News, other news sources, and the like.
  • a subsequent step may be to automatically track responses to the same events from campaign-related clusters compared to non-campaign-related clusters.
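The time delta between communities can be sketched as each cluster's peak day minus the average peak day of all other clusters, following the sign convention above. The sketch assumes each campaign action is a `(cluster_id, date)` pair, takes a cluster's peak to be its busiest day (breaking ties toward the earlier day), and works at day granularity even though the text notes that lags of hours are also of interest. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def time_deltas(posts):
    """posts: iterable of (cluster_id, date) campaign actions.

    Returns {cluster_id: peak day minus mean peak day of all other clusters},
    in days: negative = ahead, positive = behind, 0 = in sync.
    """
    per_cluster = defaultdict(Counter)
    for cluster, day in posts:
        per_cluster[cluster][day] += 1
    # Peak = busiest day; ties resolve to the earliest day (an assumption).
    peaks = {}
    for cluster, counts in per_cluster.items():
        peak_day = min(counts, key=lambda d: (-counts[d], d))
        peaks[cluster] = peak_day.toordinal()
    deltas = {}
    for cluster, peak in peaks.items():
        others = [p for other, p in peaks.items() if other != cluster]
        deltas[cluster] = peak - sum(others) / len(others) if others else 0.0
    return deltas


d = date(2018, 6, 1)
posts = [("A", d), ("A", d), ("A", date(2018, 6, 2)), ("B", d),
         ("C", date(2018, 6, 4)), ("C", date(2018, 6, 4)), ("C", d)]
print(time_deltas(posts))  # {'A': -1.5, 'B': -1.5, 'C': 3.0}
```

Here cluster C peaks three days after A and B, the kind of lag the text associates with coordinating talking points before engaging.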
  • a signal name is Commitment by User.
  • the commitment by user signal (metric) description: loyalty of participants to the campaign may be measured by the number of times the participants Tweet™ (or otherwise post) about the campaign and the time range (in days) of their campaign-related Tweets™ (or other postings).
  • the commitment by user signal (metric) may be measured per user.
  • the commitment by user signal (metric) looks at whether individual users are particularly committed to a campaign. In embodiments, the commitment by user signal (metric) may facilitate looking at users and their own commitments by determining whether there are, for example, people who Tweet™ (or otherwise post) exactly 100 times, or some other predictable predetermined number of times.
  • the value of the commitment by user signal (metric) may facilitate identifying and singling out accounts that might be incentivized to participate x number of times or for x days straight.
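One way to single out accounts that may be meeting an incentive target is to check for exact round post counts and long unbroken daily streaks. The sketch below is illustrative only: the targets `(50, 100)` and the 30-day streak threshold are hypothetical placeholders rather than values from the disclosure, actions are assumed to be `(user_id, date)` pairs, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def longest_daily_streak(days):
    """Longest run of consecutive calendar days with at least one post."""
    ordinals = sorted({d.toordinal() for d in days})
    best = run = 0
    prev = None
    for o in ordinals:
        run = run + 1 if prev is not None and o == prev + 1 else 1
        best = max(best, run)
        prev = o
    return best


def flag_users(actions, target_counts=(50, 100), min_streak=30):
    """Flag users whose post count equals a (hypothetical) round target or
    whose daily posting streak reaches min_streak days."""
    days_by_user = defaultdict(list)
    for user, day in actions:
        days_by_user[user].append(day)
    flagged = {}
    for user, days in days_by_user.items():
        reasons = []
        if len(days) in target_counts:
            reasons.append(f"exactly {len(days)} posts")
        streak = longest_daily_streak(days)
        if streak >= min_streak:
            reasons.append(f"{streak}-day streak")
        if reasons:
            flagged[user] = reasons
    return flagged


start = date(2018, 6, 1)
actions = ([("u1", start)] * 100
           + [("u2", start + timedelta(days=i)) for i in range(30)]
           + [("u3", start)] * 3)
print(flag_users(actions))  # {'u1': ['exactly 100 posts'], 'u2': ['30-day streak']}
```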
  • a signal name is Commitment by Cluster.
  • the commitment by cluster signal may facilitate looking at clusters and their own commitments.
  • the commitment by cluster signal may facilitate the determination of whether there are clusters that Tweet™ (or otherwise post) exactly 100 times.
  • the commitment by cluster signal may be used to determine whether a group of accounts showed up, Tweeted™ (or otherwise posted) 100 times over five days, and then left.
  • the commitment by cluster signal (metric) may look at the loyalty of participants to the campaign, which may be measured by the number of times the participants Tweet™ (or otherwise post) about the campaign and the time range (in days) of their campaign-related Tweets™ (or other postings).
  • the commitment by cluster signal (metric) may measure the degree to which a body of actors in the campaign sticks with it after their first engagement with the campaign. It will be appreciated in light of the disclosure that, for most human activity, the value of the commitment by cluster signal (metric) follows a skewed distribution, in which most actors participate once and a few die-hard supporters participate a lot, in measurable contrast to coordinated activity. Deviations from the skewed distribution detailing human activity may, therefore, reveal coordination. By way of this example, if an actor participates in a campaign exactly 100 times, this may suggest that they were incentivized by a coordinating body to meet that threshold.
  • the values of the commitment by cluster signal (metric) are unbounded values starting at zero, i.e., no subsequent actions, or zero days between the first and last action.
  • the value of the commitment by cluster signal (metric) by subsequent actions is between zero and ten actions.
  • the value of the commitment by cluster signal (metric) by time frame is between zero and thirty days.
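Aggregating the same commitment measures at the cluster level might look like the following sketch. It assumes each participation action is a `(cluster_id, user_id, date)` triple and reports, per cluster, the mean number of subsequent actions per participant and the mean whole-day span between each participant's first and last action. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def commitment_by_cluster(actions):
    """actions: iterable of (cluster_id, user_id, date) participation actions.

    Returns {cluster_id: (mean subsequent actions per participant,
                          mean days between first and last action)}.
    """
    days = defaultdict(lambda: defaultdict(list))
    for cluster, user, day in actions:
        days[cluster][user].append(day)
    result = {}
    for cluster, users in days.items():
        subsequent = [len(d) - 1 for d in users.values()]
        spans = [(max(d) - min(d)).days for d in users.values()]
        result[cluster] = (sum(subsequent) / len(subsequent), sum(spans) / len(spans))
    return result


actions = [
    ("K", "a", date(2018, 6, 1)), ("K", "a", date(2018, 6, 3)), ("K", "a", date(2018, 6, 6)),
    ("K", "b", date(2018, 6, 2)),
    ("L", "c", date(2018, 6, 1)),
]
print(commitment_by_cluster(actions))  # {'K': (1.0, 2.5), 'L': (0.0, 0.0)}
```

Looking at the full per-participant lists inside each cluster, rather than only these means, is what would expose the skew (or its absence) described above.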
  • a signal name is Account Creation Date Diversity for Cluster.
  • the account creation date diversity for cluster signal (metric) description: this signal (metric) may facilitate observing how close in time all accounts participating in a campaign were created. If 90% of the participating accounts within a given cluster were created within a span of five days, for example, then such activity may indicate heavy coordination within that cluster.
  • the account creation date diversity for cluster signal (metric) may be particularly helpful to spot bots, troll farms, and the like on networks using fake accounts generated in bulk.
  • the range of values of the account creation date diversity for cluster signal (metric) is zero to 4,015 days. It will be appreciated in light of the disclosure that the maximum may range from zero to the total number of days since the founding of Twitter™ or the other applicable social media platform.
  • the values of the account creation date diversity for cluster signal (metric) in the datasets evaluated have ranged from zero to 1,200 days.
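The account creation date diversity for a cluster can be summarized in more than one way: the text gives a range in days and a five-day-window example. The sketch below computes both an overall creation-date span in days and the largest share of accounts created within any window of five consecutive days, using a two-pointer sweep over sorted dates. Both summaries and the function names are assumptions for illustration.

```python
from datetime import date


def creation_span_days(created):
    """Days between the earliest and latest account-creation dates."""
    return (max(created) - min(created)).days


def max_share_in_window(created, window_days=5):
    """Largest fraction of accounts whose creation dates fall within any
    window of `window_days` consecutive days (two-pointer sweep)."""
    ordinals = sorted(d.toordinal() for d in created)
    best, left = 0, 0
    for right, o in enumerate(ordinals):
        # Shrink the window until it spans fewer than window_days days.
        while o - ordinals[left] >= window_days:
            left += 1
        best = max(best, right - left + 1)
    return best / len(ordinals)


created = [date(2015, 3, 1), date(2015, 3, 2), date(2015, 3, 5),
           date(2015, 3, 5), date(2012, 1, 1)]
print(max_share_in_window(created))  # 0.8
print(creation_span_days([date(2015, 3, 1), date(2015, 3, 11)]))  # 10
```

A cluster where 0.9 or more of the accounts fall inside one five-day window would match the heavy-coordination example given above.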
  • a signal name is Homophily.
  • the homophily signal (metric) description: this signal (metric) may facilitate looking for communities that pay a "disproportionate" amount of attention to one another, for instance across ideologies, languages, cultures, or the like. In embodiments, the homophily signal (metric) can identify disproportionate attention relationships between clusters, measured by the number of following relationships between clusters. When looking at communities (clusters), it will be appreciated in light of the disclosure that it is just as important to understand who the community pays attention to as who is in the community. With this in mind, the homophily signal (metric) may measure deviations from expected patterns of attention in social media.
  • the homophily signal may facilitate the identification of patterns of intense inter-attention across ideologies, culture, and language that may imply evidence for coordination.
  • the range of values of the homophily signal (metric) can be shown to be zero to ten.
  • the homophily signal (metric), as a telltale of cluster attention, is the ratio of the actual number of edges connecting members of the clusters to the number that would be expected if each cluster paid attention to every other cluster strictly in proportion to that cluster's size.
  • the baseline for such a signal (metric) is random connection patterns.
  • the homophily signal (metric) may include relatively more aggressive baselines, because no actual human relationships follow a random pattern.
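The homophily ratio can be sketched as the actual edges from cluster A to cluster B divided by the edges expected if A's outgoing attention were spread over clusters in proportion to their size. This reading of "expected" (A's total out-edges times B's share of all nodes), the directed follower-to-followed edge convention, and the function name are assumptions; the text leaves the exact baseline open and notes that more aggressive baselines may be preferable. Ratios above 1 indicate more attention than the size-proportional baseline.

```python
from collections import Counter


def homophily_ratios(cluster_of, edges):
    """cluster_of: dict node -> cluster id (must cover every edge endpoint).
    edges: iterable of (follower, followed) pairs.

    Returns {(A, B): actual / expected}, where expected assumes A's outgoing
    attention is spread over clusters in proportion to their size.
    """
    sizes = Counter(cluster_of.values())
    total_nodes = sum(sizes.values())
    actual, out_degree = Counter(), Counter()
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        actual[(a, b)] += 1
        out_degree[a] += 1
    ratios = {}
    for a in sizes:
        for b in sizes:
            expected = out_degree[a] * sizes[b] / total_nodes
            if expected:  # skip clusters with no outgoing edges
                ratios[(a, b)] = actual[(a, b)] / expected
    return ratios


cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(1, 2), (2, 1), (1, 3), (3, 4)]
ratios = homophily_ratios(cluster_of, edges)
print(ratios[("B", "B")])  # 2.0
```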
  • a signal name is Language Mismatch.
  • the language mismatch signal (metric) description: the default language for a new Twitter™ (or other social media) account appears to be English. Users may, however, choose to change their profile language if they want. It will be appreciated in light of the disclosure that users posting frequently in a language that differs from their default Twitter™ (or other social media) profile language may be part of a foreign-language propaganda operation on behalf of some coordinated entity.
  • the language mismatch signal may measure the percentage of a campaign's Tweets™ (or other postings), at both the cluster and campaign level, that are in a language that differs from the users' default Twitter™ (or other social media) profile language.
  • the range of values of the language mismatch signal is zero to one hundred percent, where one hundred percent would indicate that all campaign participation actions in this cluster/campaign are Tweeted™ (or otherwise posted) in a language different from the accounts' default profile language.
  • the language mismatch signal (metric) may be computed as follows.
  • the language mismatch signal (metric) may identify the language of the Tweet™ (or other posting) and the language profile setting in the Twitter™ API or the API of another social media platform.
  • the language mismatch signal (metric) may also aggregate the Tweets™ (or other postings) by the cluster of the author of the Tweet™ (or other posting) in a campaign map. By way of this example, the percentage of Tweets™ (or other postings) for each cluster whose Tweet™ language did not match the poster's profile language may be reported.
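The per-cluster and campaign-level aggregation described above can be sketched as follows. The sketch assumes each campaign post has already been annotated with its cluster, the post's detected language, and the author's profile language as comparable codes; how those values are obtained from a given platform API is not shown. The function name is hypothetical.

```python
from collections import Counter


def language_mismatch(posts):
    """posts: iterable of (cluster_id, post_lang, profile_lang).

    Returns (per-cluster mismatch percentage, campaign-wide mismatch percentage).
    """
    total, mismatched = Counter(), Counter()
    for cluster, post_lang, profile_lang in posts:
        total[cluster] += 1
        if post_lang != profile_lang:
            mismatched[cluster] += 1
    per_cluster = {c: 100.0 * mismatched[c] / total[c] for c in total}
    overall = 100.0 * sum(mismatched.values()) / sum(total.values())
    return per_cluster, overall


posts = [("A", "ar", "en"), ("A", "en", "en"), ("B", "ru", "ru"), ("B", "ru", "ru")]
print(language_mismatch(posts))  # ({'A': 50.0, 'B': 0.0}, 25.0)
```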
  • the methods and systems described herein may be deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor.
  • the present disclosure may be implemented as a method on the machine, as a system or apparatus as part of or in relation to the machine, or as a computer program product embodied in a computer readable medium executing on one or more of the machines.
  • the processor may be part of a server, cloud server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform.
  • a processor may be any kind of computational or processing device capable of executing program instructions, codes, binary instructions, and the like.
  • the processor may be or may include a signal processor, digital processor, embedded processor, microprocessor, or any variant such as a co-processor (math co-processor, graphic co-processor, communication co-processor and the like) and the like that may directly or indirectly facilitate execution of program code or program instructions stored thereon.
  • the processor may enable execution of multiple programs, threads, and codes.
  • the thread may be executed simultaneously to enhance the performance of the processor and to facilitate simultaneous operations of the application.
  • methods, program codes, program instructions and the like described herein may be implemented in one or more threads.
  • the thread may spawn other threads that may have assigned priorities associated with them; the processor may execute these threads based on priority or any other order based on instructions provided in the program code.
  • the processor may include non-transitory memory that stores methods, codes, instructions, and programs as described herein and elsewhere.
  • the processor may access a non-transitory storage medium, through an interface that may store methods, codes, and instructions as described herein and elsewhere.
  • the storage medium associated with the processor for storing methods, programs, codes, program instructions or other type of instructions capable of being executed by the computing or processing device may include but may not be limited to one or more of a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, cache, and the like.
  • a processor may include one or more cores mat may enhance speed and performance of a multiprocessor.
  • the process may be a dual core processor, quad core processors, other chip-level multiprocessor and the like that combine two or more independent cores (called a die).
  • the methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware.
  • the software program may be associated with a server that may include a file server, print server, domain server, Internet server, intranet server, cloud server, and other variants such as secondary server, host server, distributed server, and the like.
  • the server may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other servers, clients, machines, and devices through a wired or a wireless medium, and the like.
  • the methods, programs, or codes as described herein and elsewhere may be executed by the server.
  • other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the server.
  • the server may provide an interface to other devices including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, distributed servers, social networks, and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure.
  • any of the devices attached to the server through an interface may include at least one storage medium capable of storing methods, programs, code and/or instructions.
  • a central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
  • the software program may be associated with a client that may include a file client, print client, domain client, Internet client, intranet client and other variants such as secondary client, host client, distributed client, and the like.
  • the client may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other clients, servers, machines, and devices through a wired or wireless medium, and the like.
  • the methods, programs, or codes as described herein and elsewhere may be executed by the client.
  • other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the client.
  • the client may provide an interface to other devices including, without limitation, servers, other clients, printers, database servers, print servers, file servers, communication servers, distributed servers, and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure. In addition, any of the devices attached to the client through an interface may include at least one storage medium capable of storing methods, programs, applications, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
  • the methods and systems described herein may be deployed in part or in whole through network infrastructures.
  • the network infrastructure may include elements such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, routing devices and other active and passive devices, modules and/or components as known in the art.
  • the computing and/or non-computing device(s) associated with the network infrastructure may include, apart from other components, a storage medium such as flash memory, buffer, stack, RAM, ROM, and the like.
  • the processes, methods, program codes, instructions described herein and elsewhere may be executed by one or more of the network infrastructural elements.
  • the methods and systems described herein may be adapted for use with any kind of private, community, or hybrid cloud computing network or cloud computing environment, including those which involve features of software as a service (SaaS), platform as a service (PaaS), and/or infrastructure as a service (IaaS).
  • the methods, program codes, and instructions described herein and elsewhere may be implemented on a cellular network having multiple cells.
  • the cellular network may either be a frequency division multiple access (FDMA) network or a code division multiple access (CDMA) network.
  • the cellular network may include mobile devices, cell sites, base stations, repeaters, antennas, towers, and the like. The cell network may be a GSM, GPRS, 3G, EVDO, mesh, or other network type.
  • the methods, program codes, and instructions described herein and elsewhere may be implemented on or through mobile devices.
  • the mobile devices may include navigation devices, cell phones, mobile phones, mobile personal digital assistants, laptops, palmtops, netbooks, pagers, electronic book readers, music players and the like. These devices may include, apart from other components, a storage medium such as a flash memory, buffer, RAM, ROM and one or more computing devices.
  • the computing devices associated with mobile devices may be enabled to execute program codes, methods, and instructions stored thereon. Alternatively, the mobile devices may be configured to execute instructions in collaboration with other devices.
  • the mobile devices may communicate with base stations interfaced with servers and configured to execute program codes.
  • the mobile devices may communicate on a peer-to-peer network, mesh network, or other communications network.
  • the program code may be stored on the storage medium associated with the server and executed by a computing device embedded within the server.
  • the base station may include a computing device and a storage medium.
  • the storage device may store program codes and instructions executed by the computing devices associated with the base station.
  • the computer software, program codes, and/or instructions may be stored and/or accessed on machine readable media that may include: computer components, devices, and recording media that retain digital data used for computing for some interval of time; semiconductor storage known as random access memory (RAM); mass storage typically for more permanent storage, such as optical discs, forms of magnetic storage like hard disks, tapes, drums, cards and other types; processor registers, cache memory, volatile memory, non-volatile memory; optical storage such as CD, DVD; removable media such as flash memory (e.g., USB sticks or keys), floppy disks, magnetic tape, paper tape, punch cards, standalone RAM disks, Zip drives, removable mass storage, off-line, and the like; other computer memory such as dynamic memory, static memory, read/write storage, mutable storage, read only, random access, sequential access, location addressable, file addressable, content addressable, network attached storage, storage area network, bar codes, magnetic
  • the methods and systems described herein may transform physical and/or intangible items from one state to another.
  • the methods and systems described herein may also transform data representing physical and/or intangible items from one state to another.
  • machines may include, but may not be limited to, personal digital assistants, laptops, personal computers, mobile phones, other handheld computing devices, medical equipment, wired or wireless communication devices, transducers, chips, calculators, satellites, tablet PCs, electronic books, gadgets, electronic devices, devices having artificial intelligence, computing devices, networking equipment, servers, routers, and the like.
  • the elements depicted in the flowchart and block diagrams (or any other logical component) may be implemented on a machine capable of executing program instructions.
  • the methods and/or processes described above, and steps associated therewith, may be realized in hardware, software or any combination of hardware and software suitable for a particular application.
  • the hardware may include a general-purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device.
  • the processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory.
  • the processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as a computer executable code capable of being executed on a machine-readable medium.
  • the computer executable code may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.
  • methods described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof.
  • the methods may be embodied in systems that perform the steps thereof and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware.
  • the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Methods and systems generally include determining coordinated activity in social media movements on a social media channel. The method includes identifying a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements. The method includes configuring a data structure of the plurality of markers for a social media campaign on a social media channel. The plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for representing a diversity of topics and meanings of the social media movements. The method includes analyzing the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign, including determining users within the social media campaign, determining clusters of users that make up the social media campaign, determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign.

Description

METHODS AND SYSTEMS FOR IDENTIFYING MARKERS OF COORDINATED ACTIVITY IN SOCIAL MEDIA MOVEMENTS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of the following two provisional applications: U.S. Patent Application No. 62/522,644, filed June 20, 2017, and U.S. Patent Application No. 62/534,172, filed July 18, 2017.
[0002] This application is a continuation-in-part of U.S. Patent Application No. 14/832,106, filed August 21, 2015, which claims the benefit of the following provisional application: U.S. Patent Application No. 62/040,075, filed August 21, 2014. U.S. Patent Application No. 14/832,106 is a continuation-in-part of the following patent application: U.S. Patent Application No. 13/859,396, filed April 9, 2013, which claims the benefit of the following provisional applications: U.S. Provisional Patent Application No. 61/621,845, filed April 9, 2012; and U.S. Provisional Patent Application No. 61/760,652, filed February 5, 2013. U.S. Patent Application No. 13/859,396 is a continuation-in-part of the following U.S. patent application: U.S. Patent Application No. 12/973,296, filed December 20, 2010, and issued January 21, 2014 as U.S. Pat. No. 8,635,281, which claims priority to U.S. Provisional Patent Application No. 61/287,766, filed December 18, 2009. The above applications are hereby incorporated by reference in their entirety as if fully set forth herein.
BACKGROUND
1. Field
[0003] The present disclosure relates to methods for classifying at least one contagious phenomenon propagating on a network.
2. Description of the Related Art
[0004] Internet-based technologies, and the manifold genres of interaction they afford, are re-architecting public and private communications alike and thus altering the relationships between all manner of social actors, from individuals, to organizations, to mass media institutions. Internet technologies have enabled shifts in methods and practices of interpersonal communication. Many-to-many and social scale-spanning internet communications technologies are eliminating the channel segregation that previously reinforced the independence of classes of actors at these levels of scale, enabling (or, more accurately in many cases, forcing) them to represent themselves to one another via a common medium, and increasingly in ways that are universally visible, searchable and persistent.
[0005] Online readers typically navigate hyperlinked chains of related stories, bouncing between numerous websites in a hypertext network, returning periodically to favored starting points to pick up new trails. Hyperlinks result from a combination of choices, from those made by individual, autonomous authors to those made programmatically by designed systems, such as permalinks, site navigation, embedded advertising, tracking services, and the like. Human authors practice the same kind of information selectivity online that they do offline, i.e., what authors (including those representing organizations) write about and link to reflects somewhat stable interests, attitudes, and social/organizational relationships. The structure of the network formed by these hyperlinks is a product of these choices, and thus large-scale regularities in choices will be evident in macro-level structure. This structure will thus bear the mark of individual preferences and characteristics of designed systems and allows a kind of "flow map" of how the internet channels attention to online resources. Discriminating among types of links, and the ability to select categories of those which represent author choices, allows structural analytics to discover similarities among authors. Errors, randomness, or noise in linking at the individual level have local, independent causes and do not bias large-scale macro patterns.
[0006] Thus, in order to understand and leverage the online information ecosystem, there remains a need for systems and methods for structural analytics aimed at identifying clusters of online readers and influential authors, discovering how they drive traffic to particular online resources, and leveraging that knowledge across various applications ranging from targeted advertising and communication to expert identification, and the like. This need includes a need for understanding the role of structures and similarities among authors and readers in situations involving phenomena that follow a pattern of contagion, i.e., where an item of interest, such as a news story, a political topic, a product, an item of entertainment content, or the like, initiates with a single point or a small group, then spreads and grows through the network. Predicting the pattern of spread or contagion, the parties who will take interest in, be involved with, or be influenced by a particular item, and the like may have great value in a range of applications; accordingly, a need exists for methods and systems that assist in or enable such prediction of the behavior of contagious phenomena.
SUMMARY
[0007] In embodiments, methods and systems generally include determining coordinated activity in social media movements on a social media channel. The method includes identifying a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements. The method includes configuring a data structure of the plurality of markers for a social media campaign on a social media channel. The plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for representing a diversity of topics and meanings of the social media movements. The method also includes analyzing the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign, including determining users within the social media campaign, determining clusters of users that make up the social media campaign, determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign.
[0008] In embodiments, identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users. In embodiments, the coordinated activity of the social media campaign is determined from user actions within the social media movement in the social media campaign. In embodiments, identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users. In embodiments, the plurality of markers includes a day peakedness marker that indicates a percentage of the coordinated activity of the social media campaign that takes place on the day identified as the most active day of the social media campaign. In embodiments, the plurality of markers includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign. In embodiments, the plurality of markers includes a post regularity commitment signal that represents a deviation of a user's commitment to participation from natural human attention patterns. In embodiments, identifying the plurality of markers includes determining a semantic diversity score for the coordinated activity of the social media campaign by assigning messages in the campaign to topics and calculating a diversity of the topics on a topic distance scale that facilitates determining the semantic diversity score. In embodiments, identifying the plurality of markers includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions. In embodiments, identifying the plurality of markers includes computing semantic diversity over time to identify co-occurring topics in the social media campaign, wherein a relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, wherein a relatively large value of the semantic diversity score is configured to be indicative of spambots, and wherein a semantic diversity score having a value in between is indicative of normal human activity.
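The day peakedness, commitment, and temporal alignment markers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each campaign action is assumed to be an `(account_id, datetime)` pair, a participant's first action is treated as joining (so only later actions count as "subsequent participation"), and temporal alignment is approximated by the mean pairwise Jaccard overlap of the time buckets in which accounts act. The function names and the `bucket_minutes` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

EPOCH = datetime(1970, 1, 1)  # reference point for bucketing naive timestamps


def day_peakedness(actions):
    """Percent of all campaign actions that fall on the campaign's busiest day."""
    per_day = Counter(ts.date() for _, ts in actions)
    return 100.0 * max(per_day.values()) / len(actions)


def commitment_signal(actions):
    """Average number of actions each participant takes after their first one."""
    per_user = Counter(user for user, _ in actions)
    return sum(n - 1 for n in per_user.values()) / len(per_user)


def temporal_alignment(actions, bucket_minutes=60):
    """Mean pairwise Jaccard overlap of the time buckets in which accounts act.

    Values near 1.0 mean accounts act in lockstep. This overlap measure is an
    assumed stand-in for "comparing temporal sequences of campaign-related actions".
    """
    width = timedelta(minutes=bucket_minutes)
    buckets = defaultdict(set)
    for user, ts in actions:
        buckets[user].add((ts - EPOCH) // width)  # integer bucket index
    pairs = list(combinations(buckets.values(), 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


actions = [
    ("acct_a", datetime(2017, 6, 20, 9, 5)),
    ("acct_a", datetime(2017, 6, 20, 9, 40)),
    ("acct_b", datetime(2017, 6, 20, 9, 10)),
    ("acct_b", datetime(2017, 6, 21, 14, 0)),
]
print(day_peakedness(actions))      # 75.0
print(commitment_signal(actions))   # 1.0
print(temporal_alignment(actions))  # 0.5
```

Bucketing is used here because it makes lockstep posting easy to detect without aligning sequences of different lengths; a sequence-alignment measure would be an equally valid reading of the text.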
[0009] In embodiments, methods and systems generally include a computer system for determining coordinated activity in social media movements on a social media channel. The system includes a user interface that configures a social media campaign on one or more social media channels and that communicates via a network. The system includes a computing device that identifies a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements and that configures one or more data structures containing the plurality of markers for the social media campaign on one or more social media channels. The plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for representing a diversity of topics and meanings of the social media movements. The analysis of the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign includes determining users within the social media campaign, determining clusters of users that make up the social media campaign, determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign. The system includes a storage system that stores one or more of the data structures containing the plurality of markers for the social media campaign on one or more of the social media channels. The system includes:
[0010] a processing system that executes computer-readable instructions that cause the processing system to: receive a request from an external system about the coordinated activity of the campaign signals from the social media movements; retrieve at least a portion of one or more data structures containing the plurality of markers for the social media campaign on one or more of the social media channels; and transmit contents of at least a portion of the analysis to the user interface, which displays at least a portion of the plurality of markers indicative of one of coordinated activity and normal human activity.
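One way to picture the three-dimensional marker data structure and the processing system's receive, retrieve, and transmit steps is the sketch below. It assumes an in-memory store keyed by campaign and channel; the `CampaignMarkers` and `MarkerStore` classes, the `handle_request` function, and the example marker names placed in each dimension are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CampaignMarkers:
    """Markers for one campaign on one channel, grouped by dimension."""
    campaign_id: str
    channel: str
    network: dict = field(default_factory=dict)   # how accounts are connected
    temporal: dict = field(default_factory=dict)  # message patterns over time
    semantic: dict = field(default_factory=dict)  # diversity of topics and meanings


class MarkerStore:
    """Stand-in for the storage system: an in-memory map keyed by (campaign, channel)."""

    def __init__(self):
        self._records = {}

    def save(self, markers):
        self._records[(markers.campaign_id, markers.channel)] = markers

    def get(self, campaign_id, channel):
        return self._records.get((campaign_id, channel))


def handle_request(store, campaign_id, channels):
    """Receive a request, retrieve stored markers, and build a payload for the UI."""
    payload = {}
    for channel in channels:
        markers = store.get(campaign_id, channel)
        if markers is None:
            continue  # nothing analyzed yet for this channel
        payload[channel] = {
            "network": markers.network,
            "temporal": markers.temporal,
            "semantic": markers.semantic,
        }
    return payload


store = MarkerStore()
store.save(CampaignMarkers("camp-1", "channel-x",
                           network={"cluster_hhi": 0.8},
                           temporal={"day_peakedness": 75.0},
                           semantic={"diversity": 0.05}))
print(handle_request(store, "camp-1", ["channel-x", "channel-y"]))
```

Keeping the three dimensions as separate fields mirrors the network, temporal, and semantic split in the text, so a user interface can display each group independently.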
[0011] In embodiments, identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users. In embodiments, the coordinated activity of the social media campaign is determined from user actions within the social media movements in the social media campaign. The coordinated activity includes a relatively large number of accounts on one or more of the social media channels controlled by a relatively small number of coordinated entities, resulting in a relative lack of diversity compared with similar accounts on one or more social media channels controlled by uncoordinated users. In embodiments, identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users.
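The concentration and distribution evaluations above can be illustrated with two standard summary statistics over each cluster's share of campaign activity. This sketch assumes per-user action counts and a user-to-cluster map are already available, and it swaps in the Herfindahl-Hirschman index for concentration and normalized Shannon entropy for distribution; the disclosure does not name these measures, and the function names are hypothetical.

```python
import math
from collections import Counter


def cluster_shares(actions_by_user, cluster_of):
    """Fraction of all campaign actions contributed by each cluster of users."""
    counts = Counter()
    for user, n_actions in actions_by_user.items():
        counts[cluster_of[user]] += n_actions
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def concentration(shares):
    """Herfindahl-Hirschman index: 1.0 when one cluster carries all activity."""
    return sum(s * s for s in shares.values())


def distribution(shares, n_clusters):
    """Shannon entropy of the shares, normalized by log(n_clusters) to lie in [0, 1]."""
    if n_clusters < 2:
        return 0.0
    entropy = -sum(s * math.log(s) for s in shares.values() if s > 0)
    return entropy / math.log(n_clusters)


actions_by_user = {"u1": 3, "u2": 1, "u3": 4}
cluster_of = {"u1": "A", "u2": "A", "u3": "B"}
shares = cluster_shares(actions_by_user, cluster_of)
print(shares)                   # {'A': 0.5, 'B': 0.5}
print(concentration(shares))    # 0.5
print(distribution(shares, 2))  # approximately 1.0
```

A campaign driven by a few coordinated entities would be expected to score high on `concentration` and low on `distribution`; `n_clusters` should be the total number of clusters in the network map, not only those that participated.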
[0012] In embodiments, the plurality of markers includes a day peakedness marker that indicates a percentage of the coordinated activity of the social media campaign that takes place on the day identified as the most active day of the social media campaign. In embodiments, the plurality of markers includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign. In embodiments, the plurality of markers includes a post regularity commitment signal that represents a deviation of a user's commitment to participation from natural human attention patterns. In embodiments, identifying the plurality of markers through analysis of campaign signals includes determining a semantic diversity score for the coordinated activity of the social media campaign. Determining a semantic diversity score includes assigning messages in the campaign to topics and calculating a diversity of the topics on a topic distance scale that facilitates determining the semantic diversity score. In embodiments, identifying the plurality of markers through analysis of campaign signals includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions. In embodiments, identifying the plurality of markers through analysis of campaign signals includes computing semantic diversity over time to identify co-occurring topics in the social media campaign. A relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, a relatively large value of the semantic diversity score is configured to be indicative of spambots, and a semantic diversity score having a value in between is indicative of normal human activity.
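A semantic diversity score that combines topic assignments with a topic distance scale can be sketched with Rao's quadratic entropy, the expected distance between the topics of two randomly drawn messages. This is a swapped-in measure, not one named by the disclosure; the topic assignments, the 0/1 distance function, and the `low`/`high` cutoffs in `interpret` are hypothetical placeholders chosen only to show the three-band reading (fabricated, normal, spambot).

```python
from collections import Counter


def semantic_diversity(message_topics, topic_distance):
    """Rao's quadratic entropy: expected topic distance between two random messages."""
    counts = Counter(message_topics)
    total = sum(counts.values())
    p = {topic: n / total for topic, n in counts.items()}
    return sum(p[a] * p[b] * topic_distance(a, b) for a in p for b in p)


def interpret(score, low=0.1, high=0.6):
    """Map a score to the three bands; the cutoffs are hypothetical placeholders."""
    if score < low:
        return "possibly fabricated campaign"
    if score > high:
        return "possibly spambot activity"
    return "consistent with normal human activity"


# Hypothetical distance scale: 0 for the same topic, 1 for different topics.
def distance(a, b):
    return 0.0 if a == b else 1.0


print(interpret(semantic_diversity(["tax", "tax", "tax", "tax"], distance)))
print(interpret(semantic_diversity(["tax", "tax", "jobs", "jobs"], distance)))
```

With a 0/1 distance the score reduces to the Gini-Simpson index; a graded distance scale (for example, distances between topic embeddings) would let closely related topics count as less diverse than unrelated ones.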
[0013] In an aspect of the disclosure, methods and systems are provided that allow characterization of structures and features of networks, such as online networks of creators and consumers of items of content, in turn enabling prediction of the course of action of actors in such networks and of the flow of items, such as items of content, through such networks, including the growth and spreading of contagious phenomena.
[0014] In an aspect of the disclosure, a computer-readable storage medium with an executable program stored thereon, wherein the program instructs a processor to perform the steps of attentive clustering and analysis, may include: constructing an online author network, wherein constructing the online author network includes selecting a set of source nodes (S), a set of outlink targets (T) from at least one selected type of hyperlink, and a set of edges (E) between S and T defined by the at least one selected type of hyperlink from S to T during a specified time period; deriving a set of nodes, T', by any one of or a combination of a.) normalizing nodes in T, optionally to a selected level of abstraction, b.) using lists of target nodes for exclusion ("blacklists"), and c.) using lists of target nodes for inclusion ("whitelists"); transforming the online author network into a matrix of source nodes in S linked to targets in T'; partitioning the online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and/or at least one set of outlink targets with a similar citation profile to form an outlink bundle; optionally, generating a graphical representation of attentive clusters and/or outlink bundles in the network to enable interpretation of network features and behavior and calculation of comparative statistical measures across the attentive clusters and outlink bundles, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network; and measuring frequencies of links between attentive clusters and outlink bundles, enabling identification and measurement of large-scale regularities in the distribution of attention by online authors across sources of information. The element of the graphical representation may use at least one of size, thickness, color and pattern to depict a type of activity.
Attentive clusters and their constituent nodes may be differentiated in the graphical representation by at least one of a color (including hue, intensity and saturation), a shape (including 2D or 3D representations), a geometric arrangement, a shading, a transparency and a size. The size of the object representing the clustered nodes in the graphical representation may correlate with a metric. The nodes, targets, and edges may be collected from public and private sources of information. Constructing the matrix may include applying at least one threshold parameter from the group consisting of: maxnodes, targetmax, nodemin, targetmin, maxlinks, and linkmin. Constructing the matrix may include applying a minimum threshold for the number of included nodes that must link to a target to qualify it for inclusion in the matrix. Constructing the matrix may include applying a minimum threshold for the number of included targets to which a node must link to qualify it for inclusion in the matrix. The matrix may be a graph matrix. The method may further include applying any lists specifying inclusion or exclusion of particular nodes.
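The construction, thresholding, partitioning, and link-frequency steps described above can be sketched as below. This is a minimal illustration under assumptions, not the disclosed algorithm: targets are normalized to the host name (one possible "level of abstraction"), `nodemin` and `targetmin` are interpreted as the two minimum thresholds just described, and partitioning uses single-linkage grouping on cosine similarity of linking (or citation) profiles, a swapped-in technique since this passage does not specify the clustering method. All function names and the example URLs are hypothetical.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def build_matrix(edges, blacklist=frozenset(), whitelist=None, nodemin=1, targetmin=1):
    """Return {source: set_of_targets} after normalization, list filtering, and thresholds."""
    links = {}
    for source, url in edges:
        target = urlparse(url).netloc.lower() or url  # normalize targets to the site level
        if target in blacklist or (whitelist is not None and target not in whitelist):
            continue
        links.setdefault(source, set()).add(target)
    # Re-apply both thresholds until nothing changes: dropping a node can push a
    # target below targetmin, and dropping a target can push a node below nodemin.
    while True:
        citers = defaultdict(set)
        for source, targets in links.items():
            for target in targets:
                citers[target].add(source)
        pruned = {s: {t for t in ts if len(citers[t]) >= targetmin} for s, ts in links.items()}
        pruned = {s: ts for s, ts in pruned.items() if len(ts) >= nodemin}
        if pruned == links:
            return pruned
        links = pruned


def transpose(links):
    """Turn {source: targets} into {target: citing sources} (citation profiles)."""
    profiles = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            profiles[target].add(source)
    return dict(profiles)


def group(profiles, threshold):
    """Single-linkage grouping of keys whose profiles have cosine similarity >= threshold."""
    keys = sorted(profiles)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            pa, pb = profiles[a], profiles[b]
            if pa and pb and len(pa & pb) / math.sqrt(len(pa) * len(pb)) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(set)
    for k in keys:
        groups[find(k)].add(k)
    return sorted(groups.values(), key=sorted)


def cluster_bundle_counts(links, clusters, bundles):
    """Count links from each attentive cluster (row) to each outlink bundle (column)."""
    bundle_of = {t: j for j, bundle in enumerate(bundles) for t in bundle}
    counts = defaultdict(int)
    for i, cluster in enumerate(clusters):
        for source in cluster:
            for target in links[source]:
                counts[(i, bundle_of[target])] += 1
    return dict(counts)


edges = [
    ("alice", "https://a.example/x"), ("alice", "https://b.example/y"),
    ("bob", "https://a.example/z"), ("bob", "https://b.example/"),
    ("carol", "https://c.example/1"), ("dave", "https://c.example/2"),
    ("dave", "https://spam.example/"),
]
links = build_matrix(edges, blacklist={"spam.example"}, targetmin=2)
clusters = group(links, threshold=0.5)            # attentive clusters of sources
bundles = group(transpose(links), threshold=0.5)  # outlink bundles of targets
print(cluster_bundle_counts(links, clusters, bundles))
```

The same `group` function is applied to source linking histories and to target citation profiles, which reflects the symmetry between attentive clusters and outlink bundles in the text; the resulting counts are the cluster-by-bundle link frequencies.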
[0015] It should be understood that, except where context prevents, the term "author," as used herein, should be understood to encompass human and non-human creators and editors of content (including, without limitation, text, images, video, tweets, animations, multimedia and any combinations or other type of content and including, without limitation, original content, derivative works, commentary, analysis, and other genres of content) that can be consumed (e.g., read or viewed) by others, such as readers or viewers in a network.
[0016] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting clickstream data for the source nodes of the attentive cluster.
[0017] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting clickstream data for the target nodes of the outlink bundle.
[0018] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting survey data for the source nodes of the attentive cluster.
[0019] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting survey data for the target nodes of the outlink bundle.
[0020] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting geo-location data for the source nodes of the attentive cluster.
[0021] In an aspect of the disclosure, a method of using attentive clustering to steer a further data collection process may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and collecting geo-location data for the target nodes of the outlink bundle.
[0022] In an aspect of the disclosure, a method of metadata tag analysis to facilitate interpretation of an attentive cluster may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, collecting a metadata tag associated with the source nodes in the attentive cluster, and performing a differential frequency analysis on the metadata tags that are associated with the attentive cluster. The method may further include sorting cluster focus scores on a plurality of the metadata tags.
[0023] In an aspect of the disclosure, a method of metadata tag analysis to facilitate interpretation of an attentive cluster may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, collecting a metadata tag associated with the source nodes in the attentive cluster, and performing a differential frequency analysis on the metadata tags that are associated with the outlink bundle. The method may further include sorting cluster focus scores on a plurality of the metadata tags.
[0024] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, forming a density matrix of the attentive cluster and the outlink bundle, determining where there is a higher density in the density matrix than chance would predict, and identifying patterns of influence of a block of websites on a block of authors by analyzing the higher-density area of the density matrix.
[0025] In an aspect of the disclosure, a method of macro measurement of link density may include constructing an online author network, wherein constructing the online author network comprises selecting a set of source nodes (S), a set of outlink targets (T), and a set of edges (E) between S and T defined by at least one selected type of hyperlink from S to T during a specified time period; deriving a set of nodes, T', by normalizing nodes in T; transforming the online author network into a matrix of source nodes in S linked to targets in T'; and collapsing the matrix to aggregate link measures among clusters of sources and clusters of targets. The aggregated link measure may be at least one of: a count of the number of nodes in source cluster S linking to any member of target set T; a density calculated by dividing counts by the product of the number of members in S and the number of members in T; and a standard score that is a standardized measure of the deviation from random chance for counts across each source node-outlink target crossing in the density matrix.
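The three aggregated link measures can be computed as in the sketch below. Count and density follow the wording above literally. For the standard score, this sketch assumes "random chance" can be approximated by the mean density over all cluster-bundle crossings and standardizes against that; other baselines (for example, an expected count from row and column totals) are equally plausible. The function name `collapse_matrix` and the cluster and bundle names are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev

def collapse_matrix(edges, clusters, bundles):
    """Aggregate link measures for every (attentive cluster, outlink bundle) crossing.

    edges: set of (source, target) pairs; clusters: name -> set of sources;
    bundles: name -> set of targets.
    """
    cells = {}
    for (c_name, sources), (b_name, targets) in product(clusters.items(), bundles.items()):
        # Count: sources in the cluster that link to at least one target in the bundle.
        count = sum(1 for s in sources if any((s, t) in edges for t in targets))
        # Density: count divided by |S| * |T| for this crossing.
        density = count / (len(sources) * len(targets))
        cells[(c_name, b_name)] = {"count": count, "density": density}
    # Standard score against an assumed baseline: the mean density over all crossings.
    densities = [cell["density"] for cell in cells.values()]
    mu, sigma = mean(densities), pstdev(densities)
    for cell in cells.values():
        cell["z"] = (cell["density"] - mu) / sigma if sigma else 0.0
    return cells

edges = {("a", "x"), ("b", "x"), ("c", "y")}
cells = collapse_matrix(edges, {"C1": {"a", "b"}, "C2": {"c"}}, {"B1": {"x"}, "B2": {"y"}})
print(cells[("C1", "B1")])  # {'count': 2, 'density': 1.0, 'z': 1.0}
```

Crossings with a positive standard score are the "higher density than chance would predict" regions described in [0024].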
[0026] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and associating the attentive cluster with a real-world group of people.
[0027] In an aspect of the disclosure, a method of multi-layer attentive clustering may include partitioning a multi-layered social segmentation into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and monitoring at least one of the attentive cluster and the outlink bundle on at least one layer of the social segmentation. The social segmentation may be an online social media author network. Monitoring may be tracking the growth of an attentive cluster over time. The method may further include examining a source node associated with a specific player in the attentive cluster in order to determine a characteristic. The monitoring may be used to identify a group of people who are susceptible to a message and track downstream activities in response to the message.
[0028] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and analyzing the attentive cluster over time to depict changes in a linking pattern of the attentive cluster over a time period. The outlink bundle may be a list of semantic markers. The semantic marker may be at least one of a text element, a post, a tweet, an item of online content, and a metadata tag. Analyzing may involve tracking a semantic marker or set of semantic markers across one or more attentive clusters within the online author network.
[0029] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and calculating a set of cluster focus index (CFI) scores for the attentive cluster, wherein the CFI represents the degree to which a particular outlink target is disproportionately cited by members of a particular attentive cluster as compared to the average citation frequency for all nodes in the network. At least one source node may be a high-attention source node. The method may further include automatically placing an advertisement at the particular outlink target.
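The text defines CFI only qualitatively. One simple reading, assumed here and not confirmed by the disclosure, is a lift ratio: the share of cluster members citing a target divided by the share of all network sources citing it, so a value above 1 means the cluster over-cites that target. The function name `cfi_scores` and the node names are hypothetical.

```python
def cfi_scores(edges, cluster, all_sources):
    """Cluster focus index per target, assumed here to be a lift ratio:
    (share of cluster members citing the target) / (share of all sources citing it)."""
    targets = {t for _, t in edges}
    scores = {}
    for t in targets:
        in_cluster = sum(1 for s in cluster if (s, t) in edges) / len(cluster)
        overall = sum(1 for s in all_sources if (s, t) in edges) / len(all_sources)
        scores[t] = in_cluster / overall if overall else 0.0
    # Highest CFI first: targets the cluster cites most disproportionately.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

edges = {("a", "x"), ("b", "x"), ("a", "y"), ("c", "y")}
print(cfi_scores(edges, {"a", "b"}, {"a", "b", "c", "d"}))  # [('x', 2.0), ('y', 1.0)]
```

Here every cluster member cites `x` while only half the network does, giving a CFI of 2.0; `y` is cited at the network-wide rate and scores 1.0.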
[0030] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and generating a graphical representation of attentive clusters and/or outlink bundles in the network to enable interpretation of network features and behavior and calculation of comparative statistical measures across the attentive clusters and outlink bundles, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network. The method may further include further segmenting the network using at least one of a text, an item of online content, a link, and an object. The source node in the graphical representation may be represented by an individual dot. The size of the dot may be determined based on the number of other source nodes that link to it.
[0031] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, calculating a set of cluster focus index (CFI) scores for the attentive cluster, wherein the CFI represents the degree to which a particular outlink target is disproportionately cited by at least one source node of a particular attentive cluster, and generating a graphical representation of attentive clusters and/or outlink bundles in the network, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network, and wherein the higher the CFI score, the higher the outlink target appears along at least one axis of the graphical representation.
[0032] In an aspect of the disclosure, a method of attentive clustering may include defining a semantic bundle, searching a plurality of candidate nodes for items in the bundle in order to generate a relevance metric for use in selecting high-relevance online authors, partitioning the online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and calculating metrics within and across clusters for items in the semantic bundle.
[0033] In an aspect of the disclosure, a method may include partitioning an online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle, and generating a graphical representation of link targets, semantic events, and node-associated metadata scattered in an x-y coordinate space, wherein the dimensions of the graph are custom-defined using sets of attentive clusters, grouped to represent substantive dimensions of interest for a particular analysis.
[0034] In an aspect, a computerized search method may include presenting, to a user, a computer interface for specifying one or more search terms for a search query, presenting at least one selectable item corresponding to at least one of an M score filter and a CFI score filter for the search query, generating an amended search query based on a selected item, and performing a search using the amended search query. The search may be of the Internet. The search may be of a document corpus. The search may be of a CFI-filtered set of clusters within an online network. The search may be of a set of nodes having an M score greater than a threshold.
[0035] CFI may represent the degree to which an event, characteristic or behavior disproportionately occurs in a particular cluster, or the degree to which a particular cluster, relative to a network, preferentially manifests an event, characteristic or behavior. The M score may be calculated using the formula $M = \alpha \cdot \mathrm{count} + (1 - \alpha) \cdot \mathrm{CFI}$ [normalized 1 to 10], where count is the overall number of members on a cluster focus map that have engaged with a target.
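The M score blends a raw engagement count with a CFI value using a weight alpha between 0 and 1. The text says the result is "normalized 1 to 10" without saying where; this sketch assumes each term is min-max rescaled to the 1 to 10 range across the candidate targets before blending, so the two terms are on comparable scales. The helper names `rescale` and `m_scores` are hypothetical.

```python
def rescale(values, lo=1.0, hi=10.0):
    """Min-max rescale to [lo, hi]; constant inputs map to lo (an assumed convention)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo] * len(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def m_scores(counts, cfis, alpha):
    """M = alpha * count + (1 - alpha) * CFI, with both terms rescaled to 1..10 first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    c, f = rescale(counts), rescale(cfis)
    return [alpha * ci + (1 - alpha) * fi for ci, fi in zip(c, f)]

print(m_scores([10, 20, 30], [3.0, 2.0, 1.0], alpha=1.0))  # [1.0, 5.5, 10.0]
```

With alpha = 1 the score depends on reach (count) alone; with alpha = 0 it depends only on cluster focus. An interactive slider, as described for the M score filter, would simply vary `alpha`.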
[0036] In an aspect, a computerized search method may include: presenting, to a user, a computer interface for specifying one or more search terms for a search query; presenting, to the user, a computer interface for selecting content to search with the search terms, wherein the content is taken from an online creator network partitioned into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle; and performing a search of the selected content using the search query.
[0037] In an aspect, a method to iteratively reduce the scale of a network to its most influential core communities and obtain a sub-graph of maximally connected sub-actors may include: assigning a variable, Kcore, to each individual member of the network, where Kcore relates to a minimum connectedness based on the number of other nodes in the network to which the individual is connected; removing inactive individuals and individuals with few followers from the network; temporarily removing certain individuals with a large number of followers for later re-joining; restricting the remaining individuals iteratively by removing individuals with the lowest Kcore values first, then removing individuals with the next-higher Kcore values until a threshold is reached, wherein the threshold is at least one of a number of individuals removed, a number of individuals remaining, and a Kcore value; and re-joining the temporarily removed individuals.
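The reduction steps can be sketched as follows. As a simplifying assumption, each individual's current number of neighbors within the remaining set stands in for its Kcore value and is recomputed after every removal, and the threshold used is the number of individuals remaining. The function and parameter names (`reduce_to_core`, `star_followers`, `target_size`) are hypothetical.

```python
def reduce_to_core(adj, followers, active, min_followers, star_followers, target_size):
    """Sketch of the iterative reduction; degree within the remaining set stands in for Kcore.

    adj: dict node -> set of neighbour nodes (undirected)
    followers: dict node -> follower count; active: dict node -> bool
    """
    # Remove inactive individuals and individuals with few followers.
    kept = {n for n in adj if active.get(n, False) and followers.get(n, 0) >= min_followers}
    # Temporarily set aside heavily followed individuals for later re-joining.
    stars = {n for n in kept if followers[n] >= star_followers}
    core = kept - stars
    # Repeatedly remove the least-connected individual (ties broken by name)
    # until the remaining-size threshold is reached.
    while len(core) > target_size:
        weakest = min(core, key=lambda n: (len(adj[n] & core), n))
        core.discard(weakest)
    # Re-join the temporarily removed individuals.
    return core | stars

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"},
       "e": {"d", "s"}, "s": {"e"}, "z": {"a"}}
followers = {n: 100 for n in adj}
followers["s"] = 10000
active = {n: n != "z" for n in adj}
print(sorted(reduce_to_core(adj, followers, active, 10, 5000, 3)))  # ['a', 'b', 'c', 's']
```

The peripheral chain `e`, `d` is stripped first, leaving the triangle `a`, `b`, `c`, and the highly followed account `s` is re-joined at the end even though it was never part of the dense core.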
[0038] In an aspect, a self-service tool to construct a social media map may include an automated process (e.g., a bot) that harvests data (e.g., nodes) and maps the data to one or more clusters/segments, a processor that provides cluster/segment labels and CFI scores for the clusters/segments, and an interface that enables user browsing of clusters/segments and the map, tagging of nodes, and re-grouping/re-labeling of clusters/segments. The automated process may also be capable of: automatically refreshing the social media map based on using a relevance score for nodes in the map; positively or negatively weighting at least one cluster based on a CFI score calculation to include positively weighted nodes and exclude negatively weighted nodes from the map; filtering out unwanted nodes; obligatorily including nodes that were not clustered in a first version of the social media map; using crowd-sourced information regarding nodes and/or links that drives nodes to bundles; processing social media map usage data for trends/indicators, wherein the usage data relates to one or more of what is ignored, what is further explored, what is used, how clusters are grouped, what name/label is assigned to a cluster, what color is used for a cluster, and what order/position the cluster is placed in within a report, and wherein nodes preferentially interacted with are weighted more heavily; and using user-contributed data as metadata for the social media map.
[0039] In an aspect, a method of strategic messaging may include generating a list of targets in a network/cluster/segment, filtering the list by a criterion to limit whom to message in the network/cluster/segment in order to maximize the impact of the message on the cluster/segment, wherein the filter is at least one of CFI score, M score, number of followers, following status, follower status, number of mentions/re-tweets, number of distinct mentions, status of exposure to content, status of exposure to content that has already peaked, footprint, and number of tweets/publication frequency, and ranking the list by the filtered criteria.
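The filter-then-rank step can be sketched with a few of the listed criteria (CFI score, number of followers, following status) and a sort on M score. The `Target` record, its fields, the account names and the threshold values are all hypothetical; a real list would carry whichever of the listed criteria are available.

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    cfi: float
    m_score: float
    followers: int
    already_following: bool = False

def rank_targets(targets, min_cfi=0.0, min_followers=0, skip_followed=True, key="m_score"):
    """Filter candidate targets on a few of the listed criteria, then rank by one of them."""
    eligible = [
        t for t in targets
        if t.cfi >= min_cfi and t.followers >= min_followers
        and not (skip_followed and t.already_following)
    ]
    return sorted(eligible, key=lambda t: getattr(t, key), reverse=True)

candidates = [
    Target("@alpha", cfi=4.2, m_score=7.5, followers=1200),
    Target("@beta", cfi=1.1, m_score=9.0, followers=50000),
    Target("@gamma", cfi=3.0, m_score=8.1, followers=800, already_following=True),
    Target("@delta", cfi=2.5, m_score=6.0, followers=300),
]
print([t.name for t in rank_targets(candidates, min_cfi=2.0)])  # ['@alpha', '@delta']
```

Note that `@beta` has the highest M score but is filtered out for low cluster focus, which is the kind of trade-off the filtering step is meant to expose.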
[0040] In an aspect, a method of strategic network building may include generating a list of targets in a network/cluster/segment, wherein the list is generated using at least one of CFI, M score, number of followers, mentions/re-tweets, distinct mentions, and number of tweets, and following the targets.
[0041] In an aspect, a method of calculating a score may include calculating a cluster focus index score based on the degree to which a target disproportionately occurs in a particular cluster, or the degree to which a particular cluster, relative to a network, preferentially engages with a target; determining an overall number of members of the cluster or network that have engaged with that target; and calculating an M score based on the formula count plus CFI, wherein count is the overall number of members of the cluster that have engaged with that target.
[0042] In an aspect, an M score filter for a list of targets may include taking a cluster focus index (CFI) score based on the degree to which a target disproportionately occurs in a particular cluster, or the degree to which a particular cluster, relative to a network, preferentially engages with a target, and providing a slider to indicate an M score, wherein the M score is based on the formula $M = \alpha \cdot \mathrm{count} + (1 - \alpha) \cdot \mathrm{CFI}$, wherein count is the overall number of members of the cluster or network that have engaged with that target, and wherein the slider is used to indicate the value of alpha between 0 and 1.
[0043] In an aspect, a method of strategic ad placement may include generating a list of targets in a network/cluster/segment representing linkages in a social media environment, filtering the list by a criterion to limit the targets in order to maximize the impact of the ad on the network/cluster/segment, wherein the filter is at least one of CFI score and M score, ranking the list by the filtered criteria, and providing an interface to launch an ad campaign to place ads directly from the environment representing the linkages to the target/website. Ad placement may be done via integration with various products, such as Twitter™ sponsored tweets, Facebook™ ad exchange, Google™ AdSense/AdWords, and third-party online ad networks. The method may further include tracking interaction with the ad across social networks.
[0044] In an aspect, a method for using cosine similarity to determine the relationship between one or more clusters may include, for each cluster, building a vector based on the CFI scores calculated for a number of items; plotting the vectors in a 3D vector space; determining the cosine of the angle between the vectors as an indication of the relationship between the clusters; and, when a relationship is identified between clusters based on the cosine, automatically labeling the clusters with the same label. If the angle is small (that is, the cosine is close to 1), the confidence that there is a high degree of similarity is high.
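The cosine comparison and automatic labeling can be sketched as below. It assumes every cluster's CFI vector lists the same items in the same order and that a label is copied only when the similarity meets a chosen threshold; the 0.9 threshold, the function names and the cluster names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length CFI vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def propagate_labels(vectors, labels, threshold=0.9):
    """Copy the label of the most similar labeled cluster onto each unlabeled cluster
    whose cosine similarity meets the (assumed) threshold."""
    result = dict(labels)
    for cluster, vec in vectors.items():
        if cluster in labels:
            continue
        best = max(labels, key=lambda c: cosine_similarity(vec, vectors[c]), default=None)
        if best is not None and cosine_similarity(vec, vectors[best]) >= threshold:
            result[cluster] = labels[best]
    return result

vectors = {"A": [3.0, 1.0, 0.0], "B": [6.0, 2.0, 0.0], "C": [0.0, 0.0, 5.0]}
print(propagate_labels(vectors, {"A": "environmentalists"}))
# {'A': 'environmentalists', 'B': 'environmentalists'}
```

Clusters `A` and `B` point in the same direction (cosine of 1) even though `B` has larger CFI values, so `B` inherits the label; `C` is orthogonal and stays unlabeled.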
[0045] In an aspect, a method may include publishing a map of content as a widget, and tracking interaction with the content in the widget to obtain behavioral data about a user of the map.
[0046] In an aspect, a method may include publishing a map of content as a widget; tracking interactions with the content in the widget to obtain behavioral data about a user of the published map; and analyzing the behavioral data in order to at least one of suggest content, track network evolution, modify the network in strategically valuable ways, and measure the success of an ad campaign.
[0047] These and other systems, methods, objects, features, and advantages of the present disclosure will be apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings.
[0048] All documents mentioned herein are hereby incorporated in their entirety by reference. References to items in the singular should be understood to include items in the plural and vice versa, unless explicitly stated otherwise or clear from the text. Grammatical conjunctions are intended to express any and all disjunctive and conjunctive combinations of conjoined clauses, sentences, words, and the like, unless otherwise stated or clear from the context.
BRIEF DESCRIPTION OF THE FIGURES
[0049] The structures, methods, systems, inventions and the following detailed description of certain embodiments thereof may be understood by reference to the following figures:
[0050] FIG. 1 depicts a process flow for attentive clustering.
[0051] FIG. 2 depicts a social network map in the form of a proximity cluster map.
[0052] FIG. 3 depicts a social network map in the form of a proximity cluster map highlighting attentive clusters of liberal and conservative U.S. bloggers, and British bloggers.
[0053] FIG. 4 depicts a social network map in the form of a proximity cluster map focused on environmentalists, feminists, political bloggers, and parents.
[0054] FIG. 5 depicts a social network map in the form of a proximity cluster map with a cluster relationship identified.
[0055] FIG. 6 depicts a social network map in the form of a proximity cluster map with a bridge blog identified.
[0056] FIG. 7 depicts a flow diagram for attentive clustering.
[0057] FIG. 8 depicts a Political Video Barometer valence graph.
[0058] FIG. 9 depicts a graph of CFI scores.
[0059] FIG. 10 depicts a graph of CFI scores.
[0060] FIG. 11 depicts a bi-polar valence graph of link targets to the Russian blogosphere.
[0061] FIG. 12 depicts an interactive burstmap interface.
[0062] FIG. 13 depicts a valence graph of outlink targets organized by proportion of links from liberal vs. conservative bloggers.
[0063] FIG. 14 depicts a flow diagram relating to social media maps.
[0064] FIG. 15 depicts a flow diagram relating to refreshing social media maps.
[0065] FIG. 16 depicts a flow diagram relating to social media maps.
[0066] FIG. 17 depicts formation of a ranked target list.
[0067] FIG. 18 depicts Peakedness vs. Commitment by Time Range for two sets of hashtags.
[0068] FIG. 19a depicts Peakedness vs. Commitment by Subsequent Uses.
[0069] FIG. 19b depicts Peakedness vs. Commitment by Time Range.
[0070] FIG. 20 depicts a distribution of mention-weighted normal feed concentration by topic.
[0071] FIG. 21 depicts a distribution of Cohesion by topic.
[0072] FIG. 22a depicts a chronotope of the #metro29 hashtag.
[0073] FIG. 22b depicts a chronotope of the #samara hashtag.
[0074] FIG. 22c depicts a chronotope of the iRu hashtag.
[0075] FIG. 23 depicts a social media map platform user flow.
[0076] FIG. 24 depicts a recent activity page for a social media map platform.
[0077] FIG. 25 depicts a recent activity page for a social media map platform.
[0078] FIG. 26 depicts an overview page for a social media map platform.
[0079] FIG. 27 depicts an interactive map for a social media map platform.
[0080] FIG. 28 depicts an overview page for a social media map platform.
[0081] FIG. 29 depicts an influencers page for a social media map platform.
[0082] FIG. 30 depicts an influencer detail for a social media map platform.
[0083] FIG. 31 depicts a conversation leaders page for a social media map platform.
[0084] FIG. 32 depicts a tweets page for a social media map platform.
[0085] FIG. 33 depicts a websites page for a social media map platform.
[0086] FIG. 34 depicts a key content page for a social media map platform.
[0087] FIG. 35 depicts a media page for a social media map platform.
[0088] FIG. 36 depicts a terms page for a social media map platform.
[0089] FIG. 37 depicts a lists page for a social media map platform.
DETAILED DESCRIPTION
[0090] The present disclosure relates to a computer-implemented method for attentive clustering and analysis. Attentive clusters are groups of authors who share similar linking profiles, or collections of nodes whose use of sources indicates common attentive behavior. Attentive clustering and related analytics may include measuring and visualizing the prominence and specificity of textual elements, semantic activity, sources of information, and hyperlinked objects across emergent categories of online authors within targeted subgraphs of the global Internet. The disclosure may include a set of specialized parsers that identify and extract online conversations. The disclosure may include algorithms that cluster data and map them into intuitive visualizations (publishing nodes, blogs, tweets, etc.) to determine emergent clusterings that are highly navigable. The disclosure may include a front-end dashboard for interaction with the clustering data. The disclosure may include a database for tracking clustering data. The disclosure may include tools and data to visualize, interpret and act upon measurable relationships in online media. The approach may be to segment an online landscape based on the behavior of authors over time, thus creating an emergent segmentation of authors based on real behavior that drives metrics, rather than driving metrics based on preconceived lists. Because the analysis is a structural one, rather than language-based, the analysis is language agnostic. In an embodiment, the segmentation may be global, such as of the English-language blogosphere. In an embodiment, the segmentation may involve a relevance metric for every node based on semantic markers and a custom mapping of high-relevance nodes. The disclosure enables identifying influencers, such as who is authoritative about what to whom.
[0091] One method of obtaining attentive clusters may involve construction of a bipartite matrix; however, any number and variety of flat or hierarchical clustering algorithms may be used to obtain an attentive cluster in the disclosure. In an embodiment, a set of content-publishing source nodes ("authors") may be selected based on a chosen combination of linguistic, behavioral, semantic, network-based or other criteria. A mixed-mode network may be constructed, comprising the set S of all source nodes, the set T of all outlink targets from selected types of hyperlinks, and the set E of edges between them defined by the selected type or types of links from S to T found during a specified time period. A matrix, such as a bipartite graph matrix, may be constructed of source nodes in S linked to targets in T', derived by any combination of a) normalizing nodes in T, optionally to a selected level of abstraction, b) using lists of target nodes for exclusion ("blacklists"), and c) using lists of target nodes for inclusion ("whitelists"). The matrix may represent a two-mode network (or actor-event network) that associates two completely different categories of nodes, actors and events, to build a network of actors through their participation in events or affiliations. In embodiments, the matrix is, in effect, an affiliation matrix of all authors with the things that they link to, wherein the patterns of their linking may be used to do statistical clustering of their nodes.
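The construction of the affiliation matrix from raw links can be sketched as follows. The sketch stores the sparse 0/1 matrix as a mapping from each source node to the set of targets it links to, and it assumes that "normalizing to a selected level of abstraction" means collapsing each link URL to its host name with any leading "www." removed; other levels (full URL, path prefix, registered domain) would be equally valid. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def normalize_target(url):
    """Collapse a link URL to its host name (one possible level of abstraction)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def build_affiliation_matrix(links, blacklist=frozenset(), whitelist=None):
    """Map each source node to the set of normalized targets it links to (T')."""
    matrix = {}
    for source, url in links:
        target = normalize_target(url)
        if not target or target in blacklist:
            continue  # unparseable or explicitly excluded target
        if whitelist is not None and target not in whitelist:
            continue  # not on the inclusion list
        matrix.setdefault(source, set()).add(target)
    return matrix

links = [
    ("blogA", "http://www.Example.com/post/1"),
    ("blogA", "https://example.com/about"),
    ("blogB", "http://ads.tracker.net/pixel"),
    ("blogB", "https://news.example.org/story"),
]
print(build_affiliation_matrix(links, blacklist={"ads.tracker.net"}))
# {'blogA': {'example.com'}, 'blogB': {'news.example.org'}}
```

Both `blogA` links collapse to the same target after normalization, and the blacklisted advertising host never enters the matrix.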
[0092] The matrix may be processed according to user-selected parameters, and clustered in order to perform one or more of the following: 1) partition the network into sets of source nodes with similar linking histories ("attentive clusters"); 2) identify sets of targets (linked-to websites or objects) with similar citation profiles ("outlink bundles"); 3) calculate comparative statistical measures across these partitions/attentive clusters; 4) construct visualizations to aid in interpretation of network features and behavior; 5) measure frequencies of links between attentive clusters and outlink bundles, allowing identification and measurement of large-scale regularities in the distribution of attention by authors across sources of information; and the like. An arbitrary number and variety of flat or hierarchical clustering algorithms may be used to partition the matrix, and the results may be stored in order to select any solution for output generation. The resulting outputs (measures and visualizations) may provide novel, unique, and useful insights for determining influential authors and websites, planning communication strategies, targeting online advertising, and the like.
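Because the text leaves the clustering algorithm open, the sketch below uses one simple flat method, which is a stand-in and not the disclosure's specified algorithm: single-linkage grouping in which two source nodes join the same cluster whenever the Jaccard similarity of their link sets meets a threshold, tracked with a union-find structure. Transposing the matrix (target to citing sources) and applying the same function yields outlink bundles. The names `attentive_clusters` and `transpose` and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two link sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def attentive_clusters(matrix, threshold):
    """Single-linkage grouping: rows whose link sets are similar enough share a cluster."""
    parent = {node: node for node in matrix}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for n1, n2 in combinations(matrix, 2):
        if jaccard(matrix[n1], matrix[n2]) >= threshold:
            parent[find(n1)] = find(n2)
    groups = {}
    for node in matrix:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)

def transpose(matrix):
    """Turn source -> targets into target -> citing sources, for outlink bundles."""
    out = {}
    for source, targets in matrix.items():
        for t in targets:
            out.setdefault(t, set()).add(source)
    return out

matrix = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"q"}}
print(attentive_clusters(matrix, 0.5))             # [{'a', 'b'}, {'c'}]
print(attentive_clusters(transpose(matrix), 0.5))  # [{'q'}, {'x', 'y', 'z'}]
```

Single linkage is chosen only for brevity; it tends to chain loosely related nodes together, so a production system would more likely use a hierarchical or modularity-based method, consistent with the text's note that many algorithms may be substituted. The printed set ordering may vary between runs.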
[0093] In an embodiment, systems and methods for attentive clustering and analysis may be embodied in a computer system comprising hardware and software elements, including local or network access to a corpus of chronologically published Internet data, such as blog posts, RSS feeds, online articles, Twitter™ "tweets," Facebook™ postings, and the like.
[0094] Referring to FIG. 1, attentive clustering and analysis may include: 1) network selection 102, 2) partitioning 104, which may include two-mode network clustering in this embodiment, and 3) visualization and metrics output 108. Network selection 102 may include at least two operations: a) node selection 110, and b) link selection 112. Optionally, a third may be applied in which network analytic operations are used to further specify the set of source nodes under consideration for clustering. For example, the operation may be filtering. Filtering may be technology-based, blacklist-based, whitelist-based, and the like.
[0095] In an embodiment, nodes may be URLs at which chronologically published streams or elements of content may be available. An initial set containing any number of nodes may be selected based on any combination of node-level characteristics and/or calculated relevance scores. Regarding node-level characteristics, there may be a number of different kinds of nodes publishing content online, such as weblogs (blogs), online media sites (like newspaper websites), microblogs (like Twitter™), forums/bulletin boards (like http://www.biology-online.org/biology-forum/), feeds (like RSS/ATOM), and the like. In addition to different technical genres of node, nodes may differ according to an arbitrary number of other intrinsic or extrinsic node-level characteristics, such as the hosting platform (e.g., Blogspot, LiveJournal), the type of content published (text, images, audio), languages of textual content (e.g., French, Spanish), type of authoring entity (individual, group, corporation, CSO, government, online content aggregator, etc.), frequency or regularity of publication (daily, regular, monthly, bursty), network characteristics (e.g., central, authoritative, A-list, isolated, un-linked, long-tail), readership/traffic levels, geographical or political location of the authoring entity or focus of its concern (e.g., Russian language, Russian Federation, Bay Area Calif.), membership in a particular online ad distribution network (e.g., BLOGADS, GOOGLE™ ADSENSE), third-party categorizations, and the like.
[0096] To support node selection 110 based on relevance to particular issues or actors, or relevance-based node selection 110, lists of relevance markers may be used to calculate composite scores across nodes. These lists may include such items as key words and phrases, semantic entities, full or partial URLs, meta tags embedded in site code and/or published documents, associated tags in third-party collections (e.g., DELICIOUS tags), and the like.
For example, tags may be collected automatically, such as by "spidering" sites for meta keywords. The corpus of Internet data may be scanned and matches on list elements tabulated for each node. A number of methods may be used to calculate a relevance score based on these match counts. In an embodiment, relevance scores may be calculated by calculating individual index scores for text matches (T), link matches (L), and metadata matches (M), and then summing them. These individual index scores (I) may be calculated for each node by scanning all content published by a node during a specified period of time using a list of j relevance markers: $I = \sum_{i=1}^{j} \frac{x_i w_i}{t_i}$, where $x_i$ is the number of matches for item i, $w_i$ is a user-assigned weight (a scale of 1 to 5 is typical), and $t_i$ is the total number of matches for item i in the scanned corpus. In an example, an initial set of source nodes may include the 100,000 Russian-language weblogs most highly cited during a particular time frame. In another example, the initial set may include the 10,000 English-language weblogs with the highest relevance scores based on relevance marker lists associated with the political issue of healthcare. In another example, the initial set may include all nodes by Indian and Pakistani authors, in whatever language, that have published at least three times within the past six months.
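The index-score formula and the composite relevance score translate directly into code. In this sketch each dictionary is keyed by relevance marker; markers never matched anywhere in the corpus are skipped to avoid dividing by zero, which is an assumption the text does not address. The function names and the marker words are hypothetical.

```python
def index_score(matches, weights, corpus_totals):
    """I = sum over markers of x * w / t for one node and one match type.

    matches: marker -> x (matches by this node); weights: marker -> w (user-assigned);
    corpus_totals: marker -> t (matches across the whole scanned corpus).
    """
    return sum(
        matches[m] * weights[m] / corpus_totals[m]
        for m in matches
        if corpus_totals.get(m)  # skip markers with no corpus matches
    )

def relevance_score(text_idx, link_idx, meta_idx):
    """Composite relevance: sum of the text (T), link (L) and metadata (M) index scores."""
    return text_idx + link_idx + meta_idx

text_idx = index_score(
    matches={"healthcare": 4, "medicare": 1},
    weights={"healthcare": 2, "medicare": 5},
    corpus_totals={"healthcare": 8, "medicare": 10},
)
print(text_idx)                               # 1.5
print(relevance_score(text_idx, 0.25, 0.25))  # 2.0
```

Dividing by the corpus total t down-weights markers that are common everywhere, so a node scores highly only when it accounts for a large share of a marker's overall use.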
[0097] With respect to the link selection 112 component of network selection 102, objects may be particular units of chronologically published content found at a node, such as blog posts, "tweets," and the like. Links, also referred to as outlinks herein, may be hyperlink URLs found within a node's source HTML code or its published objects. Many kinds of links exist, and the ability to choose which kinds are used for clustering may be a key feature of the method. There are links for navigation, links to archives, links to servers for embedded advertising, links in comments, links to link-tracking services, and the like. Link selection 112 may be applied to links that represent deliberate choices made by authors, of which there may also be many kinds. These links may be to nodes (e.g., a weblog address found in a "blogroll"), objects (e.g., a particular YOUTUBE™ video embedded in a blog post), and other classes of entity, such as "friends" and "followers." Some node hosting platforms define a typology of links to reflect explicitly defined relationships, such as "friend," "friend-of," "community member," and "community follower" in LIVEJOURNAL, or "follower" and "following" in Twitter™, Facebook™, and the like. In other cases, informal conventions, such as "blogrolls," define a type of link. Some of these link types are relatively static, meaning they are typically available as part of the interface used by a visitor to a node website, while others are dynamic, embedded within published content objects. Link types may be parsed or estimated and stored with the link data. These links represent different types of relationships between authors and linked entities, and therefore, according to the user's objectives, certain classes of links may be selected for inclusion. Different sorts of links also have time values associated with them, such as the date/time of initial publication of an object in which a dynamic link is embedded, or the first-detected and most recently seen date/time of a static link. Links may be further selected for clustering based on these time values.
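Selecting links by type and by their time values might look like the following sketch. The `Link` record, the type labels, and the rule that a link qualifies when its observed span overlaps the analysis window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Link:
    """A harvested link with its type and observed time span (hypothetical schema)."""
    source: str
    target: str
    kind: str             # e.g. "blogroll", "embedded", "follower"
    first_seen: datetime  # publication time (dynamic) or first detection (static)
    last_seen: datetime   # most recent sighting; equals first_seen for dynamic links


def select_links(links, kinds, start, end):
    """Keep links of the chosen kinds whose observed span overlaps [start, end]."""
    return [
        link for link in links
        if link.kind in kinds and link.first_seen <= end and link.last_seen >= start
    ]
```

Treating "overlaps the window" as the criterion lets a long-lived static link (such as a blogroll entry) qualify even if it was first detected before the window opened.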
[0098] From the parameters defined for node selection 110 and link selection 112, a mixed-mode network X 130 may be constructed, consisting of the set S of all source nodes, the set T of all outlink targets from selected types of hyperlinks, and the set E of edges between them defined by the selected type or types of links from S to T found during a specified time period. The network 130 may be considered "mixed mode" because, while it may be formally bipartite, a number of nodes in S may also exist in T, which may be considered a violation of the normal concept of two-mode networks. Rather than excluding nodes that may be considered either S or T nodes, the systems and methods of the present disclosure consider them logically separate: a particular node may be considered a source of attention (S) in one mode and an object of attention (T) in the other. Before clustering, the set of nodes may be further constrained by parameters applied to X, or to a one-mode subnetwork X' consisting of the network 130 defined by nodes in S along with all nodes in T that are also in S (or at a level of abstraction under an element in S, collapsed to the parent node). Standard network analytic techniques may be applied to X' in order to reduce the source nodes under consideration for clustering. For instance, requirements for k-connectedness may be applied in order to limit consideration to well-connected nodes.
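A minimal sketch of deriving the one-mode subnetwork X' and pruning weakly connected sources follows. Note the substitution: it uses a k-core (repeated removal of nodes with fewer than k neighbours), which is simpler than, and not the same as, true k-connectedness. The function names are hypothetical.

```python
def one_mode_subnetwork(edges):
    """X': edges from S whose target is itself a source node (self-links dropped).

    edges: a set or list of (source, target) pairs from the mixed-mode network X.
    """
    sources = {s for s, _ in edges}
    return {(s, t) for s, t in edges if t in sources and s != t}


def k_core_nodes(edges, k):
    """Nodes surviving repeated removal of nodes with fewer than k distinct neighbours.

    A k-core (minimum-degree) filter is used here as a simpler stand-in for
    k-connectedness; edge direction is ignored.
    """
    nbrs = {}
    for s, t in edges:
        nbrs.setdefault(s, set()).add(t)
        nbrs.setdefault(t, set()).add(s)
    while True:
        weak = [n for n, ns in nbrs.items() if len(ns) < k]
        if not weak:
            return set(nbrs)
        for n in weak:
            for m in nbrs.pop(n):
                if m in nbrs:  # neighbour may already have been removed
                    nbrs[m].discard(n)
```

A true k-connectivity test would require max-flow computations per node pair; the k-core is a cheap approximation that tends to remove the same peripheral nodes first.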
[0099] In an embodiment, partitioning 104 may include: 1.) specification of node level for building the two-mode network; 2.) assembly of bipartite network matrix 132 using iterative processing of the matrix to conform with chosen threshold parameters; and 3.) statistical clustering (multiple methods possible) of nodes of each mode, that is, source node clustering 114 and outlink clustering 118. Outlink clustering 118 to form an outlink bundle may involve identifying sets of websites that are accessed by the same kinds of people.
[0100] With respect to specification of node level, a distinction may be made between "nodes" and "objects," considering the node as a stable URL at which a number of objects are published. This may result in a straightforward two-level hierarchy (object-node); however, nodes sometimes have a hierarchical relationship with one another (object-node-metanode). Consider the following three URLs:
[0101] 1.) http://www.bloghost.com/;
[0102] 2.) http://www.bloghost.com/users/johndoe/blog/; and
[0103] 3.) http://www.bloghost.com/users/johndoe/blog/09/6/21/myblogpost.html
[0104] Here, a three-level hierarchy exists, with a metanode [1], a node [2], and an object [3]. In some embodiments, the node URL may correspond very simply to a "hostname" (the part of a URL after "http://" and before the next "/") or a hostname plus a uniform path element (like "/blog" after the hostname). In other embodiments, though, multiple nodes may exist at pathnames under the same hostname. Depending on the objective of the user, a "node level" may be selected for building the two-mode network, such that second-mode nodes include (from most general to most specific level): a.) metanodes (collapsing sub-nodes into one) and independent nodes; b.) child or sub-nodes (treated individually) and independent nodes; or c.) objects (of which a great many may exist for any given parent node). In embodiments, it may be possible to mix node levels according to a rule set based on defining levels for particular sets of nodes and metanodes, or on link thresholds for qualifying objects independently. Furthermore, a node with a webpage URL may often have one or more associated "feed" URLs at which published content may be available. These feeds are generally considered the same logical node as the parent site, but may be considered independent nodes. If a target URL is not a publishing node but another kind of website, the level may likewise be chosen, though more levels of hierarchy may be possible; typically, the practical choice may be between hostname level and full pathname level.
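Collapsing link targets to a chosen node level could be sketched with the standard library's `urllib.parse.urlsplit`. The helper name `node_key` and the `node_depth=3` default (matching a hypothetical "/users/<name>/blog" layout on a shared host) are assumptions; a real rule set would be configured per host.

```python
from urllib.parse import urlsplit


def node_key(url, level, node_depth=3):
    """Collapse a target URL to the chosen node level (hypothetical helper).

    level="metanode": hostname only, e.g. www.bloghost.com
    level="node":     hostname plus the first `node_depth` path segments
    level="object":   hostname plus the full path
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # urlsplit lower-cases the hostname
    segments = [s for s in parts.path.split("/") if s]
    if level == "metanode":
        return host
    if level == "node":
        return "/".join([host] + segments[:node_depth])
    if level == "object":
        return "/".join([host] + segments)
    raise ValueError(f"unknown level: {level}")
```

Links whose targets map to the same key are merged, so the choice of level directly controls how many second-mode nodes the network contains.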
[0105] With respect to the assembly of the bipartite network matrix 132 using iterative processing of the matrix 132 to conform with chosen threshold parameters, links may be reviewed and collapsed (if necessary) to the proper node level as described hereinabove, and the two-mode network may be built between all link sources (the initial node set) and all target (second-mode) nodes at the specified node level or levels. Optionally, blacklists and whitelists may be used to, respectively, exclude or force inclusion of specific source or target nodes. From this full network data, an NxK bipartite matrix M, in which N is the set of final source nodes and K is the set of final target nodes, may be constructed according to user-specified optional parameters, such as maxnodes, nodemin, maxlinks, linkmin, and the like. An iterative sorting algorithm may prioritize highly connected sources and widely cited targets, and then use these values to determine which nodes and targets from the full network data may be included in the matrix. Maxsources and maxtargets may set the maximum values for the number of elements in N and K. Nodemin may specify the minimum number of included targets (degree) that a source is required to link to in order to qualify for inclusion in the matrix. Linkmin similarly may specify the minimum number of included sources (degree) that must link to a target to qualify it for inclusion in the matrix. Two other optional parameters, nodemax and linkmax, may be used to specify upper thresholds for source and target degree as well. Each value ($V_{ij}$) in M is the number of individual links from source i to target j.
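One way to realize the iterative thresholding is sketched below, assuming links arrive as (source, target) pairs already collapsed to the chosen node level and already filtered by any blacklists or whitelists. The alternating "rank targets, then rank sources, repeat until stable" loop is an illustrative reading of the paragraph, not the disclosed algorithm.

```python
from collections import Counter


def build_matrix(links, nodemin=1, linkmin=1, nodemax=None, linkmax=None,
                 maxsources=None, maxtargets=None):
    """Iteratively threshold a two-mode network into an N x K count matrix.

    links: iterable of (source, target) pairs; a repeated pair is a repeated link.
    Returns (sources, targets, M) where M[i][j] = V_ij, links from source i to target j.
    """
    counts = Counter(links)
    sources = {s for s, _ in counts}
    targets = {t for _, t in counts}

    def keep(degree, lo, hi, cap):
        # Rank by degree (highest first), apply min/max thresholds, then the size cap.
        ranked = [n for n, d in degree.most_common()
                  if d >= lo and (hi is None or d <= hi)]
        return set(ranked[:cap])

    while True:
        # Target degree: distinct included sources that cite the target.
        tdeg = Counter(t for s, t in counts if s in sources and t in targets)
        new_targets = keep(tdeg, linkmin, linkmax, maxtargets)
        # Source degree: distinct included targets that the source cites.
        sdeg = Counter(s for s, t in counts if s in sources and t in new_targets)
        new_sources = keep(sdeg, nodemin, nodemax, maxsources)
        if new_sources == sources and new_targets == targets:
            break
        sources, targets = new_sources, new_targets

    src, tgt = sorted(sources), sorted(targets)
    matrix = [[counts.get((s, t), 0) for t in tgt] for s in src]
    return src, tgt, matrix
```

Because each pass can only shrink the included sets, the loop always terminates; iterating matters because dropping a target can push a source below nodemin, and vice versa.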
[0106] With respect to statistical clustering in each mode, that is, source node clustering 114 and outlink clustering 118, there may be a number of clustering algorithms which may be used to partition the network, including hierarchical agglomerative, divisive, k-means, spectral, and the like. They may each have merits for certain objectives. In an embodiment, one approach for producing interpretable results based on internet data may be as follows: 1.) make M binary, reducing all values greater than 0 to 1; 2.) calculate distance matrices for M and its transpose, yielding an NxN matrix of distances between sources and a KxK matrix of distances between targets. Various distance measures may be possible, but good results may be obtained by converting Pearson correlations to distances by subtracting them from 1; 3.) using Ward's method for hierarchical agglomerative clustering, a cluster hierarchy (tree) may be computed and stored for each distance matrix. Results of an arbitrary number of clustering operations may be saved in their entirety, so that any particular flat cluster solutions may be chosen as the basis for generating outputs.
[0107] In an embodiment, the clustering algorithm may be language agnostic, that is, forming attentive clusters around similar targets of attention without a constraint on the language of the targets. In an embodiment, clustering may make use of metadata that may enable the system to know about the content of various websites without having to understand a language. In another embodiment, the algorithm may have a translator or work in conjunction with a translation application in order to find terms across publications of any language.
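The three-step recipe (binarize, 1 minus Pearson correlation, Ward's method) can be sketched in pure Python as below. It stops agglomerating once k flat clusters remain, which equals cutting the Ward tree at k clusters. Two caveats: Ward's criterion is formally defined for Euclidean distances, so pairing it with 1 - r is a heuristic, as in the paragraph; and in practice SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` with `fcluster(..., criterion="maxclust")` would be the usual library route. Function names here are hypothetical.

```python
import math


def binarize(matrix):
    """Step 1: reduce every positive link count to 1."""
    return [[1 if v > 0 else 0 for v in row] for row in matrix]


def transpose(matrix):
    """Targets become rows, for clustering the second mode."""
    return [list(col) for col in zip(*matrix)]


def pearson_distance(u, v):
    """Step 2: distance = 1 - Pearson correlation (range 0..2).

    Constant rows have undefined correlation; they are given distance 1 here.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 1.0
    return 1.0 - cov / (su * sv)


def ward_flat_clusters(rows, k):
    """Step 3: Ward agglomeration (k >= 1), stopped when k flat clusters remain.

    Uses the Lance-Williams update on squared distances.
    """
    clusters = {i: [i] for i in range(len(rows))}
    d2 = {(i, j): pearson_distance(rows[i], rows[j]) ** 2
          for i in range(len(rows)) for j in range(i + 1, len(rows))}
    next_id = len(rows)
    while len(clusters) > k:
        a, b = min(d2, key=d2.get)  # closest pair of active clusters
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        dab = d2.pop((a, b))
        for c, members in clusters.items():
            nc = len(members)
            dac = d2.pop((min(a, c), max(a, c)))
            dbc = d2.pop((min(b, c), max(b, c)))
            d2[(c, next_id)] = ((na + nc) * dac + (nb + nc) * dbc - nc * dab) / (na + nb + nc)
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * len(rows)
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels
```

Source clusters come from `ward_flat_clusters(binarize(M), k)`; outlink bundles come from running the same function on `transpose(binarize(M))`.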
[0108] Now that the first two stages of attentive clustering, network selection and two-mode network clustering, have been described, we turn to a description of visualization and metrics output. Any particular set of cluster solutions for source nodes (an assignment of each node to a cluster) may be selected by the user in order to generate one or more of the following classes of output: 1.) per-cluster network metrics for source nodes 120; 2.) across-cluster comparative frequency measures of link, text, semantic, and other node and link-level events, content, and features; 3.) visualizations 124 of the partitioned network, combined with these measures and other data on node and link-level events, content, and features; and 4.) aggregate cluster metrics reflecting ties among clusters taken as groups. Further, any particular set of cluster solutions for target nodes may be selected and used in combination with the set of cluster solutions for source nodes in order to generate: 1.) measures of link frequencies and densities 128 between source clusters and target clusters; 2.) visualizations 124 of the previous as a network of nodes representing clusters of sources and targets, with ties corresponding to link densities 128; and 3.) visualizations 124 of one-mode calculated networks (networks of target nodes) with partition data.
[0109] In one class of output, and with respect to per-cluster network metrics for source nodes 120, in addition to standard network metrics for source nodes that are generated over the entire network, and which reflect various properties important for determining influence and role in information flow, user-selected cluster solutions may be used to generate a set of measures for each node, per cluster. These measures may represent the node's direct and indirect influence on, or visibility to, each cluster, as well as its attentiveness to each cluster. For every node i, these measures may include the following: same-in: the number of nodes in the same cluster that link to i; same-out: the number of nodes in the same cluster that i links to; diff-in: the number of nodes in other clusters that link to i; diff-out: the number of nodes in other clusters that i links to; same-in-ratio: the proportion of in-linking nodes from the same cluster; same-out-ratio: the proportion of in-linking nodes from other clusters; w-same-in: same-in scores where the value of in-linking blogs is weighted by their centrality measure; w-diff-in: diff-in scores where the value of in-linking blogs is weighted by their centrality measure; and per-cluster influence scores: similar scores (raw and weighted) for in-links from, and out-links to, each cluster on the map.
[0110] In another class of output, and with respect to across-cluster comparative frequency measures of link, text, semantic, and other node and link-level events, content, and features, the partitioning of the network into sets of source nodes may allow independent and comparative measures to be generated for any number of items associated with source nodes. These may include such items as: a.) the set of target nodes K in M; b.) any subset of all target nodes, including those on user-generated lists; c.) any set of target objects, such as all URLs for videos on YOUTUBE™ or all object URLs on user-created lists; d.) any other URLs; e.) any text string found in published material from source nodes; f.) any semantic entities found in published material from source nodes; and g.) any class of metadata associated with source nodes, such as tags, location data, author demographics, and the like. For any item I in a set of items associated with source nodes, the following examples of measures may be generated per cluster: 1.) total count: the number of occurrences of the item within the cluster (multiple occurrences per source node counted); 2.) node count: the number of nodes with an occurrence of the item within the cluster (multiple occurrences per source node count as 1); 3.) item/cluster frequency: total count / # of nodes in the cluster; 4.) node/cluster frequency: node count / # of nodes in the cluster; 5.) standardized item/cluster frequency: multiple approaches are possible, including z-scores, and one approach is to use standardized Pearson residuals, which control for both cluster size and item frequency across clusters and items in the set; and 6.) standardized node/cluster frequency: multiple approaches are possible, including z-scores, and one approach is to use standardized Pearson residuals, or Cluster Focus Index (CFI) scores 122. The higher the CFI score for the item, the greater the degree of its disproportionate use by the cluster. A score of zero indicates that the cluster cites the source at the same frequency as the network does on average. Other detailed data may be possible to obtain, such as the top nodes in each cluster, lists of all nodes in the cluster, lists of relevant Internet sites that each of the clusters links to (which enables identifying target outlinks where a message can be placed in order to reach specific clusters), the relative use of key terms across the clusters (which enables developing specific messages to communicate to each cluster), a hitcount (the raw number of times each outlink and term was found within all the identified nodes), source node and/or cluster geography and demographics, sentiment, and the like.
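A sketch of node counts and standardized Pearson residuals as a CFI-style score follows. The table layout is an assumption: rows are attentive clusters, columns are items, and cell $n_{ij}$ is the node count of item j in cluster i. The residual used is the standard adjusted form $r_{ij} = (n_{ij} - e_{ij}) / \sqrt{e_{ij}(1 - R_i/N)(1 - C_j/N)}$ with $e_{ij} = R_i C_j / N$, where $R_i$, $C_j$, and $N$ are row, column, and grand totals. Function names are hypothetical.

```python
import math


def node_count_table(cluster_of, items_of, clusters, items):
    """Clusters x items table of node counts (each node counts an item at most once)."""
    row = {c: i for i, c in enumerate(clusters)}
    col = {it: j for j, it in enumerate(items)}
    table = [[0] * len(items) for _ in clusters]
    for node, found in items_of.items():
        for it in set(found):  # multiple occurrences per node count as 1
            if it in col:
                table[row[cluster_of[node]]][col[it]] += 1
    return table


def standardized_residuals(table):
    """Standardized Pearson residuals for a non-empty clusters x items count table."""
    R = [sum(r) for r in table]
    C = [sum(c) for c in zip(*table)]
    N = sum(R)
    out = []
    for i, r in enumerate(table):
        res_row = []
        for j, n in enumerate(r):
            e = R[i] * C[j] / N  # expected count under independence
            var = e * (1 - R[i] / N) * (1 - C[j] / N)
            res_row.append((n - e) / math.sqrt(var) if var > 0 else 0.0)
        out.append(res_row)
    return out
```

A positive residual marks an item the cluster uses more than its size and the item's overall popularity would predict; zero means the cluster matches the network average, consistent with the reading of a zero CFI score above.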
[0111] For example, differential frequency analysis can be done on metadata, such as tags, that are associated with different attentive clusters to facilitate cluster interpretation. In the example, by sorting cluster focus scores 122 on the metadata tags, interpretations of what the clusters are about may be derived without any manual review. The metadata associated with the clusters may be used to facilitate interpretation of the meaning of the clusters. In an example, the metadata may be language independent, such as GIS map data.
[0112] In another class of output, and with respect to visualizations of the partitioned network 124, a social network diagram may be generated and used to display link, text, semantic, and other node and link-level events, content, and features ("event data"), such as that shown in FIG. 2. The network map may be static, or it may be the basis of an interactive interface for user interaction via software, software-as-a-service (SaaS), or the like. There may be two components to this process of visualization: 1.) creating a map of source nodes in a dimensional space for viewing; and 2.) use of colors, opacity, and sizes of graphical elements to represent clusters, nodes, and event data. With the dimensional mapping component, multiple approaches may be possible. One method may be to use a "physics model" or "spring embedder" algorithm suitable for plotting large network diagrams. The Fruchterman-Reingold algorithm may be used to plot nodes in two or three dimensions. In these maps, every node is represented by a dot, and its position is determined by links to, from, and among its neighbors. The size of the dot can vary according to network metrics, typically representing the chosen measures of node centrality. The technique is analogous to a locally-optimized multidimensional scaling algorithm.
With the component related to use of colors, opacity, and sizes of graphical elements to represent clusters and event data, nodes may be colored according to selected cluster partitions to allow easy identification of various partitions. This projection of the cluster solution onto the dimensional map may facilitate intuitive understanding of the "social geography" of the online network. This type of visualization may be referred to as a "proximity cluster" map, because proximity of nodes to one another indicates relationships of influence and interaction. Further, projection of event data onto the map may enable powerful and immediate insight into the network context of various online events, such as the use of particular words or phrases, linking to particular sources of information, or the embedding of particular videos. This may be produced as static images, and may also be the basis of software-based interactive tools for exploring content and link behavior among network nodes.
[0113] In another class of output, and with respect to aggregate cluster metrics 128, metrics may be calculated for partitions at the aggregate level. Event metrics may include raw counts, node counts, frequencies (counts / # nodes in cluster), normalized and standardized scores, and the like. Examples typically include values such as: the proportion of blogs in a cluster using a certain phrase; the number of blogs in a cluster linking to a target website; the standardized Pearson residual (representing deviation from expected values based on chance) of the links to a target list of online videos; the per-cluster "temperature" of an issue calculated from an array of weighted-value relevance markers; and the like.
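For the dimensional mapping component of [0112], a basic two-dimensional Fruchterman-Reingold spring embedder can be sketched as follows. This is a textbook-style illustration (repulsion $k^2/d$ between all pairs, attraction $d^2/k$ along links, a cooling step limit); the frame size, iteration count, cooling factor of 0.95, and fixed seed are arbitrary assumptions. It is O(n^2) per iteration, so large maps would use a library such as NetworkX's `spring_layout` instead.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Basic 2-D Fruchterman-Reingold layout; every edge endpoint must be in nodes."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed: repeatable maps
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal edge length
    temp = width / 10  # maximum step, cooled each round

    def push(disp, u, v, dx, dy, d, f):
        # Apply force f to u along the unit vector from v to u, and -f to v.
        disp[u][0] += dx / d * f
        disp[u][1] += dy / d * f
        disp[v][0] -= dx / d * f
        disp[v][1] -= dy / d * f

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):  # repulsion k^2/d between all pairs
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                push(disp, u, v, dx, dy, d, k * k / d)
        for u, v in edges:  # attraction d^2/k along links (negative = inward)
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            push(disp, u, v, dx, dy, d, -d * d / k)
        for n in nodes:  # move at most `temp`, clamped to the frame
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / d * step))
        temp *= 0.95
    return {n: (x, y) for n, (x, y) in pos.items()}
```

The returned coordinates would then be drawn as dots colored by cluster label and sized by a centrality measure, as the text describes.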
[0114] As described above, any particular set of cluster solutions for target nodes may be selected and used in combination with the set of cluster solutions for source nodes in order to generate additional outputs. Visualizations produced may include: 1.) a two-mode network diagram of relationships between clusters of sources and targets, treated as aggregate nodes and with tie strength corresponding to link density measures; and 2.) a second-mode ("co-citation") network diagram in which targets are nodes, connected by ties representing the number of sources citing both of them, with colors corresponding to cluster solution partitions. Another output may be macro measurement of link density. To reveal and measure large-scale patterns in the distribution of links from source nodes to targets, the matrix M may be collapsed to aggregate link measures among clusters of sources and clusters of targets. A series of SxT matrices may be used, with S as the set of source clusters ("attentive clusters") and T as the set of clustered targets ("outlink bundles"). These matrices may contain aggregated link measures, including: counts (c): the number of nodes in source cluster s linking to any member of target set t; densities (d): c divided by the product of the number of members in s and the number of members in t; and standard scores (s): standardized measures of the deviation from random chance for counts across each cell. Various standardized measures are possible, with standardized Pearson residuals obtaining good results. Any of these measures may be used as the basis of tie strength for the two-mode visualizations described above.
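The count and density measures, plus the co-citation network, can be sketched directly from the definitions in this paragraph. Inputs are assumed to be the matrix M (sources by targets) with one cluster label per row and one bundle label per column; the function names are hypothetical, and standard scores could be obtained by applying standardized Pearson residuals to the resulting count table (not shown).

```python
def aggregate_link_measures(matrix, source_cluster, target_bundle):
    """Collapse M (sources x targets) to attentive-cluster x outlink-bundle measures.

    source_cluster[i]: cluster label of source row i
    target_bundle[j]:  bundle label of target column j
    Returns (counts, densities), both keyed by (cluster, bundle).
    """
    clusters = sorted(set(source_cluster))
    bundles = sorted(set(target_bundle))
    counts = {(s, t): 0 for s in clusters for t in bundles}
    for i, row in enumerate(matrix):
        # Each source counts once per bundle it links to ("any member of t").
        for t in {target_bundle[j] for j, v in enumerate(row) if v > 0}:
            counts[(source_cluster[i], t)] += 1
    densities = {
        (s, t): c / (source_cluster.count(s) * target_bundle.count(t))
        for (s, t), c in counts.items()
    }
    return counts, densities


def cocitation(matrix):
    """K x K co-citation counts: number of sources linking to both targets."""
    rows = [[1 if v > 0 else 0 for v in row] for row in matrix]
    k = len(rows[0]) if rows else 0
    return [[sum(r[a] * r[b] for r in rows) if a != b else 0 for b in range(k)]
            for a in range(k)]
```

Laid out with clusters as rows and bundles as columns, the densities dictionary is the density matrix discussed in the following paragraph.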
[0115] In an embodiment, a density matrix may be constructed between attentive clusters and outlink bundles. The attentive clusters may be represented as row headers and the outlink bundles as column headers. The density matrix may allow users to see patterns in attention between certain sets of websites and certain bundles. The density matrix may provide a way to identify similar media sources. Further, the density matrix may provide information about attentive clusters that may be based on particular verticals.
[0116] Having described the process for attentive clustering, we now turn to examples of applications of the technique and various related analytical applications thereof for measuring frequencies of links between attentive clusters and outlink bundles, thus enabling identification and measurement of large-scale regularities in the distribution of attention by online authors across sources of information.
[0117] In an embodiment, and referring to FIG. 2, a social network map of the English-language blogosphere is depicted. The social network map graphically depicts the most linked-to blogs in the English-language blogosphere. The size of the icon representing each individual blog may be representative of a network metric, such as the number of inbound links to the blog. This visualization depicts the output from a method for attentive clustering and analysis which identified attentive clusters of linked-to blogs, wherein the attentive clusters included authors with similar interests.
[0118] Referring to FIG. 3, the method for attentive clustering and analysis analyzes bloggers' patterns of linking to understand their interests. The visualization in FIG. 3 highlights liberal and conservative U.S. bloggers, and British bloggers, as attentive clusters. By zooming in on the visualization, subgroups such as conservatives focused on economics or liberals focused on defense may be identified from among the attentive clusters depicted.
[0119] Referring to FIG. 4, the method for attentive clustering and analysis enables building a custom network map. In FIG. 4, the network map features attentive clusters of bloggers attuned to these topics: environmentalists, feminists, political bloggers, and parents. Subgroups within each topic may be delineated by a different color, a different icon shape, and the like. For example, within the parent bloggers, icons representing the liberal parent bloggers may be colored differently than those representing the traditional parent bloggers. Surprising relationships may be discovered among groups of bloggers. For example, in FIG. 5, two parent bloggers with very different social values are closer in the network than either is to political bloggers who share their broader political views.
[0120] Referring to FIG. 6, each attentive cluster may have its own core concerns, viewpoints, and opinion leaders. The method for attentive clustering and analysis enables identification of blogs that are considered bridge blogs, such as the one shown circled, which is popular among multiple attentive clusters. The method for attentive clustering and analysis enables identification of whose opinions matter, about what, and among which groups.
[0121] Referring to FIG. 7, the steps of attentive clustering and analysis may include: constructing an online author network, wherein constructing the online author network includes selecting a set of source nodes (S), a set of outlink targets (T) from at least one selected type of hyperlink, and a set of edges (E) between S and T defined by the at least one selected type or types of hyperlink from S to T during a specified time period 702; deriving a set of nodes, T', by any combination of a.) normalizing nodes in T, optionally to a selected level of abstraction, b.) using lists of target nodes for exclusion ("blacklists"), and c.) using lists of target nodes for inclusion ("whitelists") 704; transforming the online author network into a matrix of source nodes in S linked to targets in T 708; and partitioning the online author network into at least one set of source nodes with a similar linking history to form an attentive cluster and at least one set of outlink targets with a similar citation profile to form an outlink bundle 710. The steps may optionally include generating a graphical representation of attentive clusters and/or outlink bundles in the network to enable interpretation of network features and behavior and calculation of comparative statistical measures across the attentive clusters and outlink bundles 712, wherein at least one element of the graphical representation depicts a measure of an extent of a type of activity within the network; and optionally measuring frequencies of links between attentive clusters and outlink bundles, enabling identification and measurement of large-scale regularities in the distribution of attention by online authors across sources of information 714. The element of the graphical representation may use at least one of size, thickness, color, and pattern to depict a type of activity. Attentive clusters may be visually differentiated in the graphical representation by at least one of a color, a shape, a shading, and a size. The size of the object representing the attentive clusters in the graphical representation may correlate with a metric. The nodes, targets, and edges may be collected from public and private sources of information. Constructing the matrix may include applying at least one threshold parameter from the group consisting of: maxnodes, targetmax, nodemin, targetmin, maxlinks, and linkmin. Constructing the matrix may include applying a minimum threshold for the number of included nodes that must link to a target to qualify it for inclusion in the matrix. Constructing the matrix may include applying a minimum threshold for the number of included targets that must link to a node to qualify it for inclusion in the matrix. Constructing the matrix may include using blacklists to exclude particular nodes, and whitelists to force inclusion of particular nodes. The matrix may be a graph matrix.
[0122] By identifying and measuring the frequencies of links between attentive clusters and outlink bundles, all manner of information about the distribution of attention by online authors across sources of information may be obtained. Various examples of the sorts of information, visualizations, applications, reports, APIs, widgets, tools, and the like that are possible using the methods described herein will be described. For example, two playlists for YOUTUBE™ videos may be identified, one that has traction with sub-cluster A and the other with sub-cluster B. In another example, two RSS feeds may be organized that supply a user with items that have more attention from sub-cluster A versus sub-cluster B. In another example, a valence graph may be constructed that depicts words, phrases, links, objects, and the like that are preferred by one sub-cluster over another sub-cluster; such valence graphs may use aggregated sets of clusters defined by users to display dimensions of substantive interest, such as in FIG. 11. In yet another example, works from authors who are most relevant in a particular cluster may be displayed and then published as a widget, which may be custom-based on a valence graph, as a way of monitoring an ongoing stream of information from that cluster. Clusters may be customizable within the widget, such as via a dialog box, menu item, or the like. Further examples will be described hereinbelow.
[0123] A user may be able to, optionally in real time through a user interface, select a stream of information based on looking at the environment, zoom in based on clustering, figure out a valid emergent segmentation, and then set up monitors to watch the flow of events, such as media objects, text, key words/language, and the like, in real time.
[0124] In an embodiment, differences in word frequency use by attentive clusters may be used to differentiate and segment clusters. For example, the attentive clusters "militant feminism" and "feminist mom" may both frequently use terms associated with feminism in their publications, but additional use of terms related to militancy in one case and maternity in the other may be used to subdivide a cluster of feminists into the two attentive clusters "militant feminism" and "feminist mom." Extending this concept, not just word usage but the frequency of word usage may also be useful in segmenting clusters. For example, in clusters of parents, the ones actually doing home schooling did not use the term "home school" frequently, but rather used the term "home education" with greater frequency. By identifying the specific language/words used by a cluster, the system may enable crafting messages, brands, language, and the like for particular clusters. In an embodiment, an application may automatically craft an advertisement to be placed at one or more outlinks in an outlink bundle using high-frequency terms used by an attentive cluster. Further in the embodiment, the advertisement may be automatically sent to the appropriate ad space vendor for placement at the one or more outlinks.
[0125] In an embodiment, a method of using attentive clustering based on analysis of link structures to steer a further data collection process is provided. The data collection may include collection of web-based data, such as, for example, clickstream data, data about websites, photos, emails, tweets, blogs, phone calls, online shopping behavior, and the like. For example, tags may be collected automatically or manually for every website that is a node. The tags may be non-hierarchical keywords or terms. These tags may help describe an item and may also allow the item to be found again by browsing or searching. In an example, tags may be associated in third-party collections, such as DELICIOUS™ tags and the like. In another example, web crawlers may extract meta keywords and tags included within node HTML. Further, specific keywords and phrases may be exported to a database. In yet another example, the tags may be generated by human coders. Once a cluster partitioning exists, the system may do differential frequency analysis on the tags that are associated with different attentive clusters. By sorting cluster focus index (CFI) scores along with the tags, the system can come up with an interpretation of the meaning of a cluster without requiring further analysis of the cluster itself. In an embodiment, the system may apply a further data collection process in order to associate respondents to a survey and their news sources with various corners of the internet landscape. For example, the influence of a particular news outlet across a segmented environment of the online network may be obtained by examining clustering in conjunction with a downstream data collection process, such as obtaining survey research, clickstream data, extraction of textual features for content analysis including automated sentiment analysis, content coding of a sample of nodes or messages, or other data.
[0126] In an embodiment, clustering data may be overlaid on GIS maps, "human terrain" maps, asset data on a terrain, cyberterrain, and the like.
[0127] In an embodiment of the present disclosure, a method of determining a probability that a user will be exposed to a media source given a known media source exposure is provided. The media source may include newspapers, magazines, radio stations, television stations, and the like. For example, a user who may be exposed to a particular media source may be clustered in a specific attentive cluster. Accordingly, the system may determine that users in that particular attentive cluster are more likely to be exposed to another media source because the second media source may also be present in an outlink bundle preferred by the cluster.
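One simple way to put a number on such a probability is an empirical conditional frequency: among nodes observed to cite the known source, the share that also cite the candidate source, optionally restricted to one attentive cluster. This is an illustrative estimator, not the method the disclosure specifies; the function name and input layout are hypothetical.

```python
def exposure_probability(citations, known, candidate, cluster_of=None, cluster=None):
    """Estimate P(cites candidate | cites known) from observed citation sets.

    citations:  {node: set of media sources the node links to}
    cluster_of: {node: attentive-cluster label}; used only when cluster is given
    Returns None when no node in scope cites the known source.
    """
    pool = [n for n in citations if cluster is None or cluster_of.get(n) == cluster]
    exposed = [n for n in pool if known in citations[n]]
    if not exposed:
        return None
    return sum(candidate in citations[n] for n in exposed) / len(exposed)
```

Comparing the within-cluster estimate to the network-wide estimate shows whether membership in a cluster raises the likelihood of exposure, which is the intuition the paragraph describes.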
[0128] In an embodiment of the present disclosure, a method of attentive clustering on a meso level is provided. The method may enable identifying emergent audiences (attentive clusters) and monitoring how messages (as specific as a single article in print; as broad as core campaign themes) traverse cyberspace. The method may involve mapping the attentive clusters where messages have, or are likely to find, receptive audiences. Mapping may enable identifying opinion leaders and information sources, online and offline, which help shape their views.
[0129] The method may enable identification of the mindset/social trends of a group of users. For example, the system may be able to associate an attentive cluster with a known network, such as a political party, a political movement, a group of activists, people organizing demonstrations, people planning protests, and the like. With the ability to associate attentive clusters with particular groups of people, the system may be able to track the evolution of a movement or identity over time. Further, if a cluster supports a political movement, the system may track the impact of the cluster's political movement on society. The system may track whether the political movement has been accepted by a majority of the people of the society or rejected by the society, whether there is debate about the political movement, and the like. Accordingly, the method may enable growth of a brand, sale of a product, conveying a message, prediction of what people care about or do, and the like.
[0130] In an embodiment of the present disclosure, a system and method for multi-layer attentive clustering may be provided. In the system and method, attentive clusters may be tracked across various layers of a social segmentation, such as specific social media networks (Twitter™, Facebook™, Orkut™, and the like), a blogosphere, and the like. The system may be able to track development of an attentive cluster in a single layer or across multiple layers at every stage of the development of the cluster. When different layers of online media (such as weblogs, microblogs, and social network services) are clustered individually, measures of association may be created between clusters across layers, based on density of hyperlinks between them, common identities of underlying authors, mutual citation of the same sources, mutual preference for certain topics or language, and the like. The system may also track the major players of clusters at every stage of development of the cluster.
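A cross-layer association based on common identities of underlying authors could be sketched with Jaccard overlap, as below. The identity map from platform accounts to underlying authors is a hypothetical input (building it is a separate problem), and Jaccard is only one plausible choice of overlap measure.

```python
def jaccard(a, b):
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_layer_association(layer1, layer2, identity):
    """Score every cluster pair across two layers by shared underlying authors.

    layer1, layer2: {cluster name: iterable of account ids in that layer}
    identity:       {account id: author id}, a hypothetical cross-platform map
    """
    def authors(accounts):
        return {identity[a] for a in accounts if a in identity}

    return {(c1, c2): jaccard(authors(m1), authors(m2))
            for c1, m1 in layer1.items() for c2, m2 in layer2.items()}
```

The same pairwise table could instead be filled with hyperlink densities or shared-source citation overlap, the other association bases the paragraph lists.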
[0131] For example, the growth of an attentive cluster supporting a political movement may be tracked back in time and over a period of time. In the example, once an attentive cluster has been identified, the system may examine the nodes associated with specific players in the attentive cluster in order to determine characteristics such as who is talking to whom; key nodes or hubs that link many other layers and/or media sources; apparent patterns of affinity or antagonism among clusters or other networks; who may have started the political movement; when the political movement may have started; what messages were used at the forefront of the political movement's establishment; the size of the movement; the number of people who initially joined the political movement; the growth of the political movement; influential people from various stages of the political movement; and the like. In this example, all of the analysis may be confined to activity in a single layer of a social segmentation, or it may be undertaken across multiple layers. Continuing with the example, the impact of the political movement on society may be examined by tracking the penetration of an attentive cluster or its message across layers, or the expansion of the attentive cluster in a single layer. Likewise, attentive cluster analysis may enable predictions. For example, an attentive cluster may be tracked in a single layer, such as by monitoring the number of Twitter™ followers (or followers on other applicable social platforms), the frequency of new followers added, the content associated with that attentive cluster, inter-cluster associations, and the like, to determine whether a political movement is being spawned, expanded, diminished, or the like. In an embodiment, the socio-ideological configuration of the people who spawned the political movement may be evident from analyzing one or more of a blog layer, a social networking layer, a traditional media layer, and the like.
[0132] For example, a Twitter™ (or other applicable platform) map may be formed in which each colored dot is an individual Twitter™ account and the position of each dot is a function of the "follows" relationship: people are placed close to the people they follow or who follow them. The pattern of the map may be related to the structure of influence across the network.
[0133] In an embodiment, the system may be deployed on a social networking site to identify and track attentive clusters and linkage patterns associated with the attentive clusters. For example, the system for attentive clustering may be applied on Facebook™ to identify attentive clusters in the Facebook™ audience and track the clusters' activity within Facebook™. In an example, the system may be used to identify a group of people who may be susceptible to a message. By identifying and tracking an attentive cluster in the Facebook™ layer that may be susceptible to a message, downstream activities, such as organizing in response to the message, may be examined. For example, an attentive cluster of university students may be presented with a message regarding a proposed law lowering the drinking age. The system may track activity within the cluster related to the message, new groups formed around the topic of the message, invitations to other groups regarding the message, opposition from other groups in response to the message, and the like. Indeed, the system may be able to track the formation of new attentive clusters in the Facebook™ layer in response to the message. In this case, the system may identify individuals or groups that link to one another and share a common interest or target of attention, such as concerned parents opposing the proposed law, anti-government groups supporting the proposed law, child advocacy groups opposing the law, and the like. Discoveries related to the original layer may be applied to strongly associated clusters in other layers. For instance, determinations about the interests of a cluster in the Facebook™ layer may be used to drive a communications or advertising strategy in associated clusters of other layers, such as weblogs or Twitter™.
[0134] Measures for characterizing contagious phenomena propagating on networks may include peakedness, commitment (such as by subsequent uses and time range), and dispersion (including normalized concentration and cohesion), and will be further described herein.
[0135] In other embodiments, two-mode networks may be generated by projecting modes one onto another. For example, certain social networks may not allow handling of individual data but may allow public page data to be accessed. In this way, data from individuals who comment on public pages may be obtained. Public pages may be treated as a two-mode network that is collapsed to one mode. For example, a two-mode network may be formed from two classes of actors: people and the cocktail parties that the people attend. One class of actors could be labeled 1-5 and the other A-E to generate a scatter diagram depicting a two-mode network, which may be collapsed into either a network of cocktail parties attended by the same people or a network of people who attended the same cocktail parties. Likewise, networks may be formed based on who participates in the stream of objects that come from different public pages, the relationship between public pages (such as whether there is a direct "like" relationship between them), weighting by how many people commented on objects from two or more pages, and the like.
[0136] These data may be clustered as described herein. In embodiments, the weight between public pages, indicated by the number of users commenting on objects from both pages, may be used to visually indicate a stronger connection between pages with higher weights.
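The projection from two modes to one can be sketched as follows. This assumes memberships arrive as a hypothetical `{actor: set_of_events}` dictionary; the same function works for pages and commenters, where a tie's weight is the number of users who commented on objects from both pages.

```python
from collections import defaultdict
from itertools import combinations


def invert(memberships):
    """Swap modes: {actor: events} -> {event: actors}."""
    inverted = defaultdict(set)
    for actor, events in memberships.items():
        for event in events:
            inverted[event].add(actor)
    return dict(inverted)


def project(memberships):
    """Collapse a two-mode network onto its second mode.

    Two events are tied when at least one actor attended both; the tie
    weight is the number of actors they share.
    """
    weights = defaultdict(int)
    for events in memberships.values():
        for a, b in combinations(sorted(events), 2):
            weights[(a, b)] += 1
    return dict(weights)


# People 1-5 and the cocktail parties A-E they attended (toy data).
attended = {"1": {"A", "B"}, "2": {"A", "B", "C"}, "3": {"C", "D"},
            "4": {"D", "E"}, "5": {"A", "E"}}
party_network = project(attended)            # parties tied by shared guests
people_network = project(invert(attended))   # people tied by shared parties
print(party_network[("A", "B")], people_network[("1", "2")])  # 2 2
```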
[0137] Clustering of this public page data may result in the formation of poles. For example, two poles may form, where one set of pages is interacted with by one population and another set of pages is interacted with by a very different population. There may be individuals who interact with the sets of pages at both poles. In any event, in the process of attentive clustering, users who are most tenuously connected to anything are forced to the outer edges of the cluster map.
[0138] In an embodiment of the present disclosure, a method of analyzing attentive clusters over time is provided. The analysis of these attentive clusters may enable the system to depict changes in the linking patterns of attentive clusters over a time period. Further, the analysis may allow depiction of any changes in the structure of the network itself.
[0139] In an embodiment, a time-based reporting method may be used by the system to demonstrate the effects of events or actions throughout a network of attentive clusters for a period of time. In the method, bundles, which may be lists of semantic markers including text elements embedded in a post or tweet, links to pieces of online content, metadata tags, and the like, may be tracked in clusters across a network, such as a blogosphere.
[0140] For example, a bundle of semantic markers related to obesity may be tracked over time to determine how the topic of obesity is being discussed. In the example, a particular bundle (with text, link, and metadata elements) can be tracked across clusters to see where it is or is not getting attention. The measure of attention may be defined as a "temperature." The "temperature" is based conceptually on Fahrenheit temperatures (without negatives) as compared to other issues, where 100 is very hot and 0 is ice cold. The method may produce a tracking report as output for tracking issues in a map across time. In this example, the tracking report may be focused on a collection of blogs most focused on childhood obesity, organized into attentive clusters over a moving 12-month period of time. The blogs may be clustered broadly into policy/politics, issue focus, culture, family/parenting, and food attentive clusters. There may be sub-clusters defined for each of those clusters, such as conservative, social conservative, and liberal sub-clusters under the policy/politics cluster. The report may indicate the issue intensity for each cluster/sub-cluster by assigning it an average temperature per blog of conversation on the broad topic of childhood obesity within each group. The report may indicate the issue distribution for each cluster/sub-cluster by calculating the percentage of childhood obesity conversations taking place on blogs not in the map and within each cluster within the map. Continuing with this example, specific terms may be tracked across the clusters/sub-clusters over time, and the method may indicate an average temperature based on the uses of specific terms in blogs within each cluster. In the example, the term "school lunch" has a high "temperature" in certain issue focus clusters, liberal policy clusters, and foodie clusters, and steadily increased over the last eight moving 12-month periods.
Similarly, the intensity of sites, or the average temperature based on links to specific websites on blogs within each cluster, may be provided by the report. The intensity of source objects, or the average temperature based on links to specific web content (articles, videos, etc.), may be provided by the report. The intensity of sub-issues, or the average temperature of conversation on identified issues defined by a set of terms and links, may be provided by the report. In the report, specific terms, specific sites, and specific objects may each be tracked on a monthly and per-cluster basis.
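The report's temperature scale is described only conceptually. The sketch below assumes one possible reading: per-blog mention rates within a cluster, rescaled so the hottest tracked issue reads 100 and an unmentioned issue reads 0; issue distribution is a plain percentage split. Function names and figures are illustrative, not taken from the source.

```python
def temperatures(mentions: dict, n_blogs: int) -> dict:
    """Per-blog mention rates rescaled so the hottest issue reads 100."""
    rates = {issue: count / n_blogs for issue, count in mentions.items()}
    hottest = max(rates.values(), default=0.0)
    if hottest == 0:
        return {issue: 0.0 for issue in rates}
    return {issue: 100.0 * rate / hottest for issue, rate in rates.items()}


def issue_distribution(conversations: dict) -> dict:
    """Percentage of one issue's conversations held in each cluster or off-map."""
    total = sum(conversations.values())
    if total == 0:
        return {}
    return {where: 100.0 * n / total for where, n in conversations.items()}


# Toy numbers for a hypothetical 40-blog "food" cluster.
print(temperatures({"school lunch": 120, "soda tax": 60, "recess": 30}, n_blogs=40))
# {'school lunch': 100.0, 'soda tax': 50.0, 'recess': 25.0}
print(issue_distribution({"food": 50, "liberal policy": 30, "off-map": 20}))
# {'food': 50.0, 'liberal policy': 30.0, 'off-map': 20.0}
```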
[0141] In an exemplary embodiment, the system may identify and track structural changes in a network. For example, during the recent US elections, blogs appeared suddenly that were anti-Obama, pro-Palin, or pro-McCain but were outside the conservative blogosphere. This rapid change in the network structure may be indicative of a coordinated, synchronized campaign to message and blog.
[0142] In an embodiment of the present disclosure, a method of attentive clustering by partitioning an author network into a set of source nodes with similar adoption and use of technology features is provided. For example, instead of a website being a target of attention for an attentive cluster or around which an attentive cluster forms, a feature or a piece of technology, such as an embedded Facebook™ "Like" button, may be a target of attention or clustering item.
[0143] In an embodiment, a method of creating clusters of people and describing probabilistic relationships with other clusters, such as clusters of words, brands, people, and the like, is provided. The system may describe the probability of any relation between them.
[0144] To identify what an attentive cluster links to more than the network average, or what words and phrases it uses more than the network average, a cluster focus index (CFI) score may be calculated. CFI represents the degree to which an event, characteristic, or behavior disproportionately occurs in a particular cluster, or the degree to which a particular cluster, relative to the network, preferentially manifests an event, characteristic, or behavior. For example, a CFI score could be generated for a particular cluster across a set of target nodes, representing the degree to which a particular target is disproportionately and preferentially cited by members of the particular cluster, or the degree to which the particular cluster, relative to the network, preferentially cites the target. The CFI gives a sense of what is important to an attentive cluster, where its members go for their information, what words, phrases, and issues they discuss, and the like. FIG. 9 depicts a graph of cluster focus index scores for targets of a conservative-grassroots attentive cluster. The targets circled on FIG. 9 (F through J) are those that everyone in the network links to, according to their CFI. The targets circled in FIG. 10 (A through E) are those that are disproportionately linked to by the conservative-grassroots attentive cluster, according to their CFI.
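The text defines CFI qualitatively. One simple way to realize "disproportionately cited relative to the network" is a ratio of citation rates, sketched below under that assumption; the actual CFI formula is not given here, and the site names and memberships are made up.

```python
def cluster_focus_index(citers: dict, cluster: set, network: set) -> dict:
    """CFI sketch: a target's citation rate inside the cluster divided by its
    rate across the whole network. 1.0 means cited as often as anywhere else;
    above 1.0 means disproportionately cited by the cluster."""
    scores = {}
    for target, nodes in citers.items():
        network_rate = len(nodes & network) / len(network)
        if network_rate == 0:
            continue  # nobody in the network cites it; the ratio is undefined
        cluster_rate = len(nodes & cluster) / len(cluster)
        scores[target] = cluster_rate / network_rate
    return scores


network = {f"n{i}" for i in range(10)}
grassroots = {"n0", "n1", "n2", "n3"}           # hypothetical cluster
citers = {
    "bigsite.example": set(network),            # everyone links here
    "nichesite.example": {"n0", "n1", "n2", "n3", "n4"},
    "othersite.example": {"n5", "n6"},
}
print(cluster_focus_index(citers, grassroots, network))
# {'bigsite.example': 1.0, 'nichesite.example': 2.0, 'othersite.example': 0.0}
```

Under this reading, a site everyone links to (like F through J in FIG. 9) scores near 1.0, while a cluster's preferred sources (like A through E in FIG. 10) score well above it.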
[0145] In an embodiment, a method of identifying websites with high attention from an identified attentive cluster or author is provided. The method may include determining the websites frequently or preferentially cited by identified authors by examining the websites' cluster focus index (CFI) scores. Further, the method may include automatically sending or placing advertisements, alerts, notifications, and the like to the websites. For example, a social network analysis may generate a network map with thousands of nodes clustered into attentive clusters. In an example with bloggers, the influence data that result from the network analysis may be influence metrics for sites from across the Internet to which bloggers link, including mainstream media, niche media, Web 2.0, other bloggers, and the like. These are the influential sources (also called outlinks, or targets) used by specific groups of nodes across the map. For example, influencing a targeted cluster of bloggers can often be accomplished by targeting these sources, "upstream" in the information cycle, rather than going after the bloggers directly. In other embodiments, influence data may be metrics that reveal network influence among bloggers directly. Bloggers are usually thought of as simply being more influential or less, but these data let the analyst discover which blogs are influential among which online clusters (segments), a far more granular and targeted approach. Each of these data sets can be sorted to examine either influence over the entire map or disproportionate influence over particular clusters (i.e., how to reach particular audiences). Cluster targeting can be further refined to identify which nodes in a specific cluster have influence on any of the other clusters on the map. Because the conversation within social media covers a wide variety of topics, source and network influence alone do not necessarily reflect influence on a specific topic.
A relevance index metric for discussion regarding particular topics, events, and the like may be added to a social network analysis to identify which nodes are most focused on a given topic.
[0146] For both data sets there are two main sorts of metrics representing influence. First are metrics representing the influence of nodes in the one-mode network (the set of source nodes S) as a whole, or directly among particular clusters or among specific other nodes. For example, for any given node in S, count (also called in-degree) is the number of other nodes in S that link to it. Count can be calculated across the whole map or per cluster. Second, scores can be calculated that show the influence of target nodes (nodes in T or T') on clusters of nodes in S. Count can also be used, and CFI scores can be calculated that represent the influence of particular targets on specific attentive clusters; in other words, how specifically interesting or authoritative the target is for that cluster. Relevance index scores for nodes may also be calculated, using lists of semantic markers, to provide further metrics of value for targeting communications, advertising, and the like. Depending on the communications strategy, specific sorts of the data will create lists of likely high-value targets for further action. While count, CFI, and relevance index scores are all important, they can be combined in order to maximize certain objectives. The following use case examples include combining count and relevance into a targeting index by multiplying their values. Other, more complicated maximization formulas are possible as well. The examples demonstrate specific influence sorts that can be generated from the Russian network data to address each use case. The network data are based on the linking patterns of the nodes in the RuNet map over a nine-month period ending in February 2010.
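Count (in-degree), computed over the whole map or per cluster, and the targeting index formed by multiplying count by a relevance index can be sketched as below. The link list and relevance values are hypothetical, and relevance index scores are assumed to be precomputed.

```python
from collections import Counter


def in_degree(links, sources=None):
    """Count distinct nodes linking to each node ("count" / in-degree).

    links: iterable of (source, target) pairs; duplicate pairs and
    self-links are ignored. sources: optional set (e.g. one cluster's
    members) to get the per-cluster count instead of the whole-map count.
    """
    counts = Counter()
    for source, target in set(links):
        if source != target and (sources is None or source in sources):
            counts[target] += 1
    return counts


def targeting_index(counts, relevance):
    """Count multiplied by relevance index, the combination named in the text."""
    return {node: n * relevance.get(node, 0.0) for node, n in counts.items()}


links = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "e"), ("d", "e"), ("a", "c")]
whole_map = in_degree(links)                        # c: 3, e: 2
per_cluster = in_degree(links, sources={"a", "b"})  # c: 2, e: 1
index = targeting_index(whole_map, {"c": 0.2, "e": 0.9})
print(sorted(index, key=index.get, reverse=True))   # ['e', 'c']
```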
[0147] Use Case 1 and Use Case 2 involve finding influential sources. Use Case 1 involves identifying sources with the most influence over the entire map by sorting on the highest values of count. While extremely influential, and in many cases suitable for advertising campaigns, these universally salient sites also tend to be much harder to reach out to than sites that are smaller but specifically important to targeted segments.
[0148] Use Case 2 involves identifying sources that reach a targeted cluster by sorting sources by cluster focus index. CFIs may be sorted for any of the attentive clusters. Count metrics from the map as a whole and from the targeted cluster can be used to further prioritize for action. This sort is the equivalent of identifying the traditional media trade press: the go-to sites for the selected segment. Frequently, these include specifically influential bloggers in addition to niche media and other sources.
[0149] Use Cases 3-6 involve finding influential nodes. Use Case 3 involves identifying the greatest network influence by sorting the nodes by in-degree (total number of links from other nodes within the entire network). This sort specifically identifies the network's "A-list" nodes, the most influential network members (bloggers). Like prominent sources, these are often more difficult to reach than more targeted niche influentials, but they contribute greatly to spreading viral niche messages across the wider network.
[0150] Use Case 4 involves finding the most targeted influencers for a particular cluster by sorting the cluster focus index scores for a targeted cluster to find nodes with cluster-specific influence. This identifies the nodes with particular influence, interest, or prestige among the target cluster. These nodes tend to be much more "on topic" than others, and much easier to reach than map-wide A-list nodes. Cluster-specific influentials are not always from the target cluster itself, which can be very useful for trying to move discussion between particular clusters. Link metrics provide further assistance in deciding targeting priorities.
[0151] Use Case 5 involves following a particular topic at the map level by sorting using topic focus target scores, which combine links (network influence) and topic focus index (issue relevance). Formulas for calculating the focus target score can be varied, but the default may be to multiply links by topic focus index. This may allow identification of those nodes in the entire map that discuss the target issue most frequently. These may be monitored to gauge dominant threads of discussion and opinion about the issue, and targeted for outreach.
[0152] Use Case 6 involves targeting a particular cluster's conversation on a topic by sorting within a cluster by the topic focus target score. This may allow members of the target cluster who write about the target issue to be identified for monitoring or persuasion. Variations of the formula for combining influence and relevance metrics into a single targeting metric can be used to bias the sort toward relevance or toward influence, depending on the strategic objective.
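Use Cases 5 and 6 can be sketched together: rank by the product of links and topic focus index, optionally within one cluster. The exponents `alpha` and `beta` are an assumed way of biasing the sort toward influence or relevance, one of many possible variations; with both set to 1 the score is the default product. Node names and values are hypothetical.

```python
def focus_target_ranking(links, topic_focus, members=None, alpha=1.0, beta=1.0):
    """Rank nodes by links**alpha * topic_focus**beta.

    alpha = beta = 1 is the plain product of links and topic focus index.
    Raising beta biases toward relevance; raising alpha biases toward influence.
    Pass members (a set) to rank within one cluster, as in Use Case 6.
    """
    nodes = [n for n in links if members is None or n in members]

    def score(n):
        return links[n] ** alpha * topic_focus.get(n, 0.0) ** beta

    return sorted(nodes, key=score, reverse=True)


links = {"x": 100, "y": 10, "z": 30}          # network influence
topic_focus = {"x": 0.1, "y": 0.9, "z": 0.5}  # issue relevance
print(focus_target_ranking(links, topic_focus))             # ['z', 'x', 'y']
print(focus_target_ranking(links, topic_focus, beta=2.0))   # ['y', 'z', 'x']
print(focus_target_ranking(links, topic_focus, alpha=2.0))  # ['x', 'z', 'y']
```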
[0153] In an embodiment, a proximity cluster map method may be used to visualize 124 attentive cluster-based data and generate a network map. In the method, attentive clusters and their constituent nodes may be displayed in a proximity cluster map. Nodes in the network map may be represented by individual dots, optionally in different colors, whose size is determined based on the number of other nodes in the map that link to them. A general force may act to move dots toward the circular border of the map, while a specific force pulls together every pair of nodes connected by a link. In static images or in an interactive visualization via software connected to a database, nodes may receive a visual treatment to display additional data of interest. For example, dots representing nodes may be lit or highlighted to represent all nodes linking to a particular target, or using a particular word, with other nodes darkened. In another example, dot size may be varied to indicate a selected node metric.
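The two forces and the in-degree-based dot size can be illustrated with a toy iterative layout in Python. This is a simplified sketch, not a production force-directed algorithm: the step sizes, iteration count, random initialization, and the clamp that keeps dots inside the disc are all assumptions.

```python
import math
import random


def proximity_layout(nodes, edges, radius=1.0, steps=300,
                     border_pull=0.05, link_pull=0.05, seed=0):
    """Toy layout: every dot drifts toward the circular border while each
    link pulls its two endpoints together. Returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {}
    for n in nodes:
        angle, r = rng.uniform(0, 2 * math.pi), rng.uniform(0.1, radius)
        pos[n] = [r * math.cos(angle), r * math.sin(angle)]
    for _ in range(steps):
        move = {n: [0.0, 0.0] for n in nodes}
        for n, (x, y) in pos.items():      # general force toward the border
            d = math.hypot(x, y) or 1e-9
            move[n][0] += border_pull * (radius - d) * x / d
            move[n][1] += border_pull * (radius - d) * y / d
        for a, b in edges:                 # specific force along each link
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            move[a][0] += link_pull * dx
            move[a][1] += link_pull * dy
            move[b][0] -= link_pull * dx
            move[b][1] -= link_pull * dy
        for n in nodes:
            x, y = pos[n][0] + move[n][0], pos[n][1] + move[n][1]
            d = math.hypot(x, y)
            scale = radius / d if d > radius else 1.0  # clamp into the disc
            pos[n] = [x * scale, y * scale]
    return {n: (x, y) for n, (x, y) in pos.items()}


def dot_sizes(nodes, edges, base=2.0, per_link=1.0):
    """Dot size grows with how many distinct nodes link to each node."""
    indegree = {n: 0 for n in nodes}
    for source, target in set(edges):
        if source != target:
            indegree[target] += 1
    return {n: base + per_link * k for n, k in indegree.items()}


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("a", "d")]
layout = proximity_layout(nodes, edges)
sizes = dot_sizes(nodes, edges)
print(sizes["d"], sizes["a"])  # 4.0 3.0
```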
[0154] In an embodiment, a valence graph method may be used to visualize 124 attentive cluster-based data and generate a valence graph. In the method, targets of attention or semantic elements occurring in the output of nodes may be displayed in a valence graph. The valence graph method may be understood via a description of how a particular valence graph is built, such as the Political Video Barometer valence graph (FIG. 8), which is useful for discovering what videos liberal and conservative bloggers are writing about. This particular valence graph may be used to watch and track videos linked to by bloggers who share a user's political opinions, view clips popular with the user's political "enemies," and the like.
[0155] The videos shown in the Barometer are chosen by queries against a large database built by network analysis engines performing network selection 102. Periodically, a crawler (or "spider") visits millions of blogs and collects their contents and links. Next, the system mines the links in these blogs to perform partitioning 104 and forms attentive clusters based on how the blogs link to one another (primarily via their blogrolls) and, over time, what else the bloggers link to in common. Attentive clusters may be large or small, and the bigger ones can contain many sub-clusters and even sub-sub-clusters. In embodiments, determining what the blogs have in common may be done by examining metadata, tags, language analysis, link target patterns, contextual understanding technology, or by human examination of the blogs or a subset thereof. In the example, American liberal bloggers and American conservative bloggers form the two largest sets of clusters in the English-language blogosphere, and the Barometer draws upon roughly the 8,000 "most linked-to" blogs in each of these groups to position the videos on the graph by calculating the proportions of links to each target from the two political cluster groupings.
[0156] The Barometer may be continually updated by scanning the blogs periodically, looking for new links to videos (or videos embedded directly in the blogs). By counting these links, it can be determined what videos political bloggers are promoting. In embodiments, the link count may be displayed on the valence graph using an identifier such as an icon or marker. In this example, some videos are linked to almost exclusively by liberal bloggers, some are linked to mostly by conservative bloggers, and a few are linked to more or less evenly by both groups. Once the system determines that a video has traction in the political clusters, it scans through data from other parts of the blogosphere to count how many "non-political" bloggers link to it as well.
[0157] The Political Video Barometer example illustrates one kind of valence graph, the insight that can be gained, and the applications that can be built based on the method and the data obtained by the method. It should be understood that the method may be used to examine any sort of potentially cluster-able data, such as technology, celebrity gossip, the use of linguistic elements, the identification of new sub-clusters of particular interest, and the like. All aspects of the valence graph method, and the underlying attentive clustering analysis, may be customized along multiple variables to enable planning and monitoring campaigns of all kinds.
[0158] In an embodiment, a multi-cluster focus comparison method may enable comparing the cluster focus index (CFI) scores of multiple attentive clusters. The CFI score may be a measure of the degree to which a particular outlink is of disproportionate interest to the attentive cluster being analyzed; in other words, the CFI indicates which link targets are of specific interest to a particular cluster beyond their general interest to the network as a whole. In an example, X may be the CFI score for cluster A and Y may be the CFI score for cluster B. The multi-cluster focus comparison method may compare the two clusters, A and B, based on their CFI scores, X and Y. This would allow a user to discern elements of common interest versus divergent interest between the two clusters. Insights derived from this method would be of great value in creating and targeting advertising and communications campaigns.
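A minimal comparison of CFI scores X and Y for clusters A and B, assuming the ratio reading of CFI (so 1.0 marks network-average interest) and a single threshold for "disproportionate"; the threshold and target names are hypothetical.

```python
def compare_focus(cfi_a: dict, cfi_b: dict, threshold: float = 1.0) -> dict:
    """Split targets into shared vs. divergent focus for clusters A and B.

    A target counts as a focus of a cluster when its CFI exceeds threshold.
    Targets that neither cluster favors are left out.
    """
    result = {"common": [], "A only": [], "B only": []}
    for target in sorted(set(cfi_a) | set(cfi_b)):
        a_hot = cfi_a.get(target, 0.0) > threshold
        b_hot = cfi_b.get(target, 0.0) > threshold
        if a_hot and b_hot:
            result["common"].append(target)
        elif a_hot:
            result["A only"].append(target)
        elif b_hot:
            result["B only"].append(target)
    return result


x_scores = {"s1": 3.0, "s2": 2.5, "s3": 0.4}  # CFI scores X for cluster A
y_scores = {"s1": 2.0, "s3": 4.0, "s4": 0.9}  # CFI scores Y for cluster B
print(compare_focus(x_scores, y_scores))
# {'common': ['s1'], 'A only': ['s2'], 'B only': ['s3']}
```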
[0159] In another embodiment, link targets, semantic events, and node-associated metadata may be scattered in x-y coordinate space, and the dimensions of the graph may be custom-defined using sets of clusters grouped to represent substantive dimensions of interest for a particular analysis. Elements are plotted on X and Y according to the proportions of links from defined cluster groupings. For example, and referring to FIG. 11, using data from the Russian blogosphere, the top 2000 link targets for Russian bloggers may be plotted such that the proportion of links from "news-attentive" blog clusters versus links from "non-news-attentive" clusters determines the position on Y, while the proportion of links from the "Democratic Opposition" cluster versus the "Nationalist" cluster determines the position on X. In another example, popular outlink targets for the US blogosphere may be displayed with the X dimension representing the proportion of Liberal versus Conservative bloggers linking to them, and the Y dimension representing the proportion of political bloggers of any type versus non-political bloggers, as shown in FIG. 13. Various data may be visualized in the graph associated with the clusters of news-attentive and political bloggers, such as metadata tags, words, links, tweets, words that occur within 10 words of a target word, and the like. These visualizations may be used in interactive software allowing user-driven exploration of the data graphed in valence space, optionally allowing user-defined sets of clusters to be used in calculating valence metrics.
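The valence coordinates can be sketched as, for each axis, the proportion of a target's links that come from one cluster grouping rather than the other. The grouping names below loosely echo FIG. 11 but are hypothetical, and expressing each proportion as the share from grouping A (0 to 1) is an assumption. The same function with liberal and conservative groupings gives a Barometer-style single axis.

```python
from typing import Optional


def valence(link_counts: dict, side_a: set, side_b: set) -> Optional[float]:
    """Share of a target's links that come from grouping A rather than
    grouping B (0.0 = all B, 1.0 = all A); None if neither links to it."""
    a = sum(n for cluster, n in link_counts.items() if cluster in side_a)
    b = sum(n for cluster, n in link_counts.items() if cluster in side_b)
    return a / (a + b) if a + b else None


# Hypothetical groupings; a cluster may belong to groupings on both axes.
X_AXIS = ({"democratic_opposition"}, {"nationalist"})
Y_AXIS = ({"democratic_opposition", "nationalist", "news_politics"}, {"hobbies"})

links_to_target = {"democratic_opposition": 30, "nationalist": 10,
                   "news_politics": 25, "hobbies": 15}
point = (valence(links_to_target, *X_AXIS), valence(links_to_target, *Y_AXIS))
print(point)  # (0.75, 0.8125)
```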
[0160] In an embodiment, a method of node selection 110 based on node relevance to a defined issue, also known as semantic slicing, is provided. Semantic slicing may involve clustering according to a relevance bundle. A relevance bundle may include one or more of key markers, what the nodes may have linked to, what the nodes have posted, text elements, links, tags, and the like. In essence, semantic slicing involves pre-screening nodes for relevance based on semantic analysis.
[0161] The relevance bundles may be used to sort through all of the network data to select the top high-relevance nodes. In an embodiment, a custom mapping of a subset of the link economy may be done.
[0162] In an embodiment, semantic slicing may enable generating a contextualized report of interest to a user on an industry level. Semantic slicing may enable focusing attentive clustering on selected vertical markets. A vertical market may be a group of similar businesses and customers who engage in trade based on specific and specialized needs. Lists of semantic markers, such as keywords and phrases, links to relevant websites and online content, and relevant metadata tags, are built to represent the relevant vertical market. Relevance metrics are calculated for candidate nodes, and a selection of high-relevance nodes is mapped and clustered. For example, the semantic slice may be done to analyze an energy policy vertical market by focusing the attentive clustering around one or more selected, highly relevant nodes. Thus, the attentive clusters may be more specific to an identified domain of interest or vertical market. In this example, instead of just forming an attentive cluster of Conservative bloggers, focusing attentive clustering on one or more key markers related to energy policy yields attentive clusters that include topic-relevant segmentations of particular kinds of Conservative bloggers discussing the issue, such as Conservative-Grassroots and Conservative-Beltway. Additional high-relevance attentive clusters may be identified, such as Climate Skeptics, Middle East policy, and the like. Cluster focus index scores may be used to determine which sites everyone in each cluster links to and which sites are preferred by the cluster. In an embodiment, semantic slicing may be done using a single node, such as a particular website, a particular piece of content, and the like. In an embodiment, semantic slicing may be done over a period of time to enable monitoring the impact of a campaign.
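Semantic slicing can be sketched as scoring each candidate node against a relevance bundle and keeping the top nodes for mapping and clustering. The relevance metric here (the share of a node's posts containing any marker, by case-insensitive substring match) is an assumption; the markers and blogs are hypothetical.

```python
def relevance(posts: list, bundle: set) -> float:
    """Fraction of a node's posts containing at least one bundle marker
    (case-insensitive substring match)."""
    if not posts:
        return 0.0
    markers = [m.lower() for m in bundle]
    hits = sum(1 for post in posts if any(m in post.lower() for m in markers))
    return hits / len(posts)


def semantic_slice(node_posts: dict, bundle: set, top_k: int) -> list:
    """Keep the top_k candidate nodes by relevance (ties broken by name)."""
    scores = {node: relevance(posts, bundle) for node, posts in node_posts.items()}
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_k]


energy_bundle = {"energy policy", "oil subsidies", "eia.gov"}  # hypothetical markers
candidates = {
    "blogA": ["New Energy Policy draft", "Cat photos"],
    "blogB": ["Cutting oil subsidies", "See eia.gov data", "Weekend recap"],
    "blogC": ["Recipe of the day"],
}
print(semantic_slice(candidates, energy_bundle, top_k=2))  # ['blogB', 'blogA']
```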
[0163] In an embodiment, a tool, such as software-as-a-service, for enabling users to define one or more semantic bundles for attentive clustering and as the basis of report outputs is provided. The tool may be an on-demand tool that may be used for semantic slicing. In such models, a user may declare a semantic bundle of nodes and/or links prior to attentive clustering.
[0164] In an embodiment, the system may provide an application programming interface (API) for delivering a segmentation to track one or more particular clusters of attention, to track how an audience is interacting with a piece of content, and the like. The data about the various clusters may be collected directly from the API. For example, a user may wish to track a cluster. The user may enter keywords related to the cluster in a search option provided by the API. Thereafter, the tool may track various websites and report back the weblinks and data that may be relevant to the cluster. The API may be used to interact with a valence graph at various resolutions. The API may provide segmentation data and metadata derived from the segmentation to other analytics and web data tracking firms for use in their own client-facing tools and products. The segmentation and resultant data from attentive clustering provide an additional dimension of high value against which third-party tools and other analytic capabilities, such as automated sentiment monitoring, may be leveraged.
[0165] In an embodiment, the system may enable real-time selection of elements to visualize based on attentive clustering of social media. The system may facilitate selection of a stream of information based on looking at the environment, zooming in on a data element based on clustering, determining a valid emergent segmentation, and monitoring the flow of events in real time. The events may include media objects, text, keywords/language, and the like. For example, the real-time selection of elements may facilitate an analysis of trends/events, especially for financial purposes.
[0166] In an embodiment, a search engine may be provided that prioritizes search results displayed to a user based on a determination of real-time attention, including attention from a particular cluster or set of clusters. A user may be able to customize the prioritization of search results, such as by prioritizing real-time attention from a particular cluster, from a particular sub-cluster, and the like.
[0167] In an embodiment, a search engine is provided that searches only within those sites or accounts with high cluster focus for a chosen segment. For example, a GOOGLE™ search may be restricted to the 30 websites with the highest CFI scores for the Dirt Bike racing cluster of OAKLEY's TWITTER™ followers map. Thus, the search may return results only from a list of key influential sites related to the chosen segment. In other embodiments, the search may be restricted to websites (or domains within them) with a particular CFI score, such as websites (or domains) that meet a threshold CFI score, websites that fall into a range of CFI scores for a chosen segment, websites with a particular score, and the like. In an embodiment, the search query may restrict the search to particular websites that are identified based on their CFI scores. In an embodiment, the search query may be restricted by the CFI score of a website, and the CFI score restriction may be indicated in the settings of the search engine; in other embodiments, the CFI score for sites to search may be indicated in the search string itself. For example, a user may indicate a particular search they want to perform and may be provided with a slider bar on which the user indicates that the search should be restricted to those websites with a CFI score falling into the range selected on the slider bar. The slider may be provided with a normalised scale, such as ascribing 1 to low CFI scores and 10 to high CFI scores, using a linear, logarithmic, or other scaling process. The system may then search a database of websites for the selected range of CFI scores to identify one or more websites to which to limit the search. These websites are then included in a search string that is provided to a search engine.
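As a rough illustration of the slider-based restriction described above, the following Python sketch maps slider positions on the normalised 1 to 10 scale back onto a raw CFI range and appends every qualifying site to the query. The site-to-CFI table, the function names, and the choice of a logarithmic mapping are assumptions for illustration only; `site:` and `OR` are common web-search operators, but the disclosure does not specify how the search string is assembled.

```python
def slider_to_cfi(position, cfi_min, cfi_max, log_scale=True):
    """Map a slider position on the 1-10 scale to a raw CFI value in [cfi_min, cfi_max]."""
    t = (position - 1) / 9  # 0.0 at the low end of the slider, 1.0 at the high end
    if log_scale:
        # Geometric interpolation; assumes cfi_min > 0.
        return cfi_min * (cfi_max / cfi_min) ** t
    return cfi_min + t * (cfi_max - cfi_min)


def restricted_query(query, site_cfi, low_pos, high_pos, cfi_min, cfi_max, log_scale=True):
    """Append site: restrictions for every site whose CFI falls in the selected slider range.

    site_cfi is a hypothetical {domain: CFI score} table for the chosen segment.
    Returns None when no site qualifies, so the caller can decide what to do.
    """
    low = slider_to_cfi(low_pos, cfi_min, cfi_max, log_scale)
    high = slider_to_cfi(high_pos, cfi_min, cfi_max, log_scale)
    sites = sorted(site for site, cfi in site_cfi.items() if low <= cfi <= high)
    if not sites:
        return None
    return f"{query} (" + " OR ".join(f"site:{site}" for site in sites) + ")"


segment_sites = {"a.example": 1.0, "b.example": 20.0, "c.example": 80.0}
print(restricted_query("dirt bike", segment_sites, 5.5, 10, 1.0, 100.0))
# dirt bike (site:b.example OR site:c.example)
```

A logarithmic mapping is used here because CFI-like scores are often heavily skewed; a linear mapping can be selected with `log_scale=False`.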
[0168] Similarly, the search can be restricted to only specific content, or specific content may be promoted to high ranking within a search, leaving other content to the lower ranked results. One way to do this restriction is to utilize the valence mapping functionality of the system. As described herein, a valence graph may be constructed for a chosen segment that depicts words, phrases, links, objects, and the like that are preferred by one cluster over another cluster. Content indicated in the valence graph may be indexed by the system, and only that content in the valence graph may be searched by a search engine. Further restriction of the content may be employed, such as by website, CFI score, and the like.
[0169] In an embodiment, attentive clustering and related analyses may result in identifying issues, attitudes, and messaging language that may be specific to discourse for a target market and may be suitable for presentation in a report. For example, in a clustering of bloggers sympathetic to Arts in Schools, by examining intra-cluster linking patterns, it may be determined that most of the bloggers within each cluster tend to keep the discussion within their cluster, except for the bloggers in the "Interesting/Teachers Educators" cluster, who have a tendency to spread conversation to each of the other clusters. This behavior points to an opportunity to work with these bloggers to spread messages across the space. Continuing with the example, by examining clustering related to specific keywords, websites, outlinks, objects, and the like, it may be determined that there is a broader discussion about education and education reform than about arts and arts education. Therefore, a conclusion may be that introducing an arts education message into education discussions has more potential than introducing arts education messages into arts discussions. In the report, various valence graphs may be presented, such as cluster-specific term valence maps, maps of sources, outlink maps, term-specific maps, issue maps, and the like. Alternatively, the report may be presented as a spreadsheet of data.
[0170] In an embodiment of the present disclosure, the report may feed into a method of generating a campaign blueprint for both social and upstream media sources and a method of identifying inter-cluster and intra-cluster influence in order to plan a campaign. The blueprint may include the target audience, demographic details, objectives of the campaign, flow of the campaign, messaging to use in the campaign, outlinks to target, and the like. Systems and methods are also provided for measuring the success of a campaign in various online segments and for generating targeted data sets identifying sub-clusters specific to a user's identity or objective.
[0171] In an exemplary embodiment, the campaign tracker may track data from a variety of sources to provide closed-loop return on investment (ROI) analysis. The tool may parse the information of each website accessed by the users, keywords entered, any information about the campaign, and the like. Further, the tool may track how people react to the campaigns and which ones are most successful. The campaign tracker may track and analyze results in real time to determine the effectiveness of the campaigns.
[0172] In addition, the tool may enable the system to generate reports for clients. The reports may include details about the campaigns such as campaign type, number of people who have viewed the campaign, any feedback from the people, and the like.
[0173] In an embodiment, analyst coding tools (ACT) and a survey integrator may support distributed metadata collection for qualitative analysis to best interpret quantitative findings. The tools may include an interactive visual interface for navigating complex data sets and harvesting content. This interface may contain an interactive proximity cluster map that can display specific node data, metadata, search results, and the like. This proximity cluster map interface may enable the user to click on nodes to see node-specific metadata and to open the node URL in a browser window or external browser. Using the tools, a user can add metadata and view metadata about any given blogger on a map. The tools enable grabbing whole sets of blogs or items to add to semantic lists, and may enable a user to define surveys so that a team of human coders can open the website and fill out the surveys.
[0174] In an embodiment of the present disclosure, a dashboard may be provided. The dashboard may combine advanced network and text analysis, real-time updates, team-based data collection and management, and the like. In the embodiment, the dashboard may also include flexible tools and interfaces for both "big picture" views and minute-by-minute updates on messages as they move through networks. Using the dashboard, a user may define bundles and track them in the aggregate through networks over time. Using the dashboard, a user may be able to see how specific media objects are doing with a particular cluster over time.
[0175] In an embodiment, the dashboard may provide a burstmap feature in which the history of selected events or sets of events over a timeframe may be displayed using a proximity cluster map. During playback, nodes in the map will light up at a time corresponding to their participation in the selected event or events. For example, at a time in playback representing a certain date, every node that linked to a particular YOUTUBE™ video will light up, allowing the user to see the pattern of linking as it unfolded over time. Optionally, the burstmap feature may include a timeline view displaying event-related metrics over time, such as the number of nodes linking to a particular video. Optionally, the burstmap feature may include lists of events available for display. An example of a burstmap interface is found in FIG. 12.
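The playback described above can be sketched by grouping events by the day on which each node first participates. This is a minimal sketch assuming events arrive as hypothetical `(node_id, day)` pairs and that a node lights up at its first participation; the disclosure does not prescribe a data model. The running total doubles as the "number of nodes linking" timeline metric.

```python
from collections import defaultdict
from datetime import date


def burstmap_frames(events):
    """Build playback frames for a burstmap.

    events: iterable of (node_id, day) pairs, e.g. a blog linking to a video on a given day.
    Each node lights up on the first day it participates.
    Returns a chronological list of (day, nodes_lit_that_day, cumulative_lit_count).
    """
    first_seen = {}
    for node, day in events:
        if node not in first_seen or day < first_seen[node]:
            first_seen[node] = day

    lit_on = defaultdict(list)
    for node, day in first_seen.items():
        lit_on[day].append(node)

    frames, total = [], 0
    for day in sorted(lit_on):
        total += len(lit_on[day])
        frames.append((day, sorted(lit_on[day]), total))
    return frames


events = [
    ("blogA", date(2011, 3, 1)),
    ("blogB", date(2011, 3, 2)),
    ("blogA", date(2011, 3, 3)),  # repeat link: blogA is already lit
    ("blogC", date(2011, 3, 2)),
]
for day, nodes, total in burstmap_frames(events):
    print(day, nodes, total)
# 2011-03-01 ['blogA'] 1
# 2011-03-02 ['blogB', 'blogC'] 3
```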
In an embodiment, techniques disclosed herein may be used to generate social media maps that visualize social media relationship data and enable utilization of a suite of metrics on the data. Social media maps may be constructed via clustering of various social media communities, including TWITTER™, FACEBOOK™, online social media, and others. In one embodiment, the clustering technique used may be manual, relationship-based, attentive clustering such as previously disclosed herein, network segmentation, or another analogous technique. The social media maps may be organised in portfolios that are targeted to market segments or that relate to an issue/topic campaign. Social media maps may be offered via an API or as raw data to plug into a third-party dashboard. Services related to the social media maps that may be offered include robust tools for searching, comparing, and generating integrated reports across multiple maps, searchable indexing, and map browsing. Pricing for social media maps may be via subscription, for one or more maps, a portfolio of maps, the whole portfolio of maps, the whole portfolio of maps save some exclusive custom items, or the like. Systems and methods for how to generate, utilize, update, and offer social media maps will be further described herein.
[0177] A comprehensive catalog of social media maps and network segmentations may be offered and updated on a regular basis. The catalog may include targeted portfolios for key markets, such as consumer goods, media and entertainment, politics and public policy, energy, science and technology, government, and more. The catalog may contain maps for each layer of the social media system, such as blogs, TWITTER™, social network services, forums, and the like.
It may contain maps for all major languages, countries, and regions of the world. Social media map data may be used within partner dashboard systems, so that a range of commercial tools can be leveraged by subscribers and so that the social media map data are "portable" across various tools. In addition, a suite of reporting tools may be used in conjunction with the social media maps.
[0178] In an embodiment, one or more social media maps and network segmentations may be constructed via clustering of data from at least one social media community. The social media map or network segmentation may be offered via an API or as raw data. The social media community may be based on at least one of a social media layer, a language, a country, a region, or the like. In some embodiments, the clustering technique may be attentive clustering, as described previously herein, relationship-based, manual, network segmentation, or the like. Referring now to FIG. 1, relationship-based clustering of data from at least one social media community 1402 is used to construct one or more social media maps and network segmentations using the clustering 1404. The one or more social media maps and network segmentations may be offered via an API 1408 or as raw data 1410. A report may demonstrate the interaction of nodes/links between the maps 1412.
[0179] In embodiments, the maps may be generated by an autonomous process. The autonomous process may create maps based on one or more criteria, a scope definition, an instruction, or the like. For example, a social graph may be generated based on followers of an individual or entity in a social network. In another example, the map criteria may be semantically based, such as based on key words or hashtags. In yet another example, the maps may be geo-based, such as based on which users/nodes are in a territory. In still another example, the maps may be based on previous mappings. In this example, segments in other maps on health and fitness may be used to triangulate or iterate toward a mapping of a new category. In another example, the map may be based on an arbitrary set of accounts generated by a third party. One scenario might be a mapping of the social network accounts for all the users of a mobile application. In still another example, the maps may be based on a nomination of individuals based on some criteria, such as demographics. Once generated, the maps may be stored and indexed.
[0180] In embodiments, maps may be based on CFI scores for dynamic data (e.g., YOUTUBE™ videos). However, the amount of data may be increased, giving a better indication of what the segment is communicating about, if data can be obtained on the influencers of a segment, which may be coming from off the map. In addition to looking at data coming from the segment, the system may be able to access data from social media accounts that have high CFI for that segment (not just the ones that are "in" the segment). Thus, calculating cluster focus for the dynamic data may be improved: CFI scores may be calculated for a first segment, and then CFI scores may be calculated for the influencers of the first segment. For example, the first segment may be followers of a particular art gallery, but the system can also examine the CFI for the first segment's influencers, which may be several well-known art gallery aficionados who may or may not be followers of the particular art gallery. In embodiments, certain maps may be based only on the CFI scores calculated for the influencers.
[0181] A searchable index for a catalog of social media maps may be constructed 1414. Further, social media maps in the catalog may be searchable. For example, the maps may be searchable by a keyword, a URL, a semantic marker, and the like. In embodiments, the social media maps may be indexed by one or more of a keyword, URL, or semantic marker so as to form a searchable index of social media maps. In embodiments, the searchable index may include metrics that indicate a statistic regarding the social media maps. For example, the statistic may represent a dimension of popularity, relevance, semantic density, or a similar feature. For example, a search engine may be enabled to return maps in order of relevance by using certain statistics in the searchable index.
[0182] For example, a semantic marker may include a keyword, a phrase, a URL (node or object level), a tag (such as those from bookmarking and annotation services, meta keywords extracted from HTML, tags assigned by coders, etc.), and the like. Semantic markers may also include those used in particular social network environments, such as TWITTER™, and may include follows relationships, mentions, retweets, replies, hashtags, URL targets, and the like. Any of these semantic markers may be used to index a social media map.
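One way to realize such an index is an inverted index from semantic markers to maps. The Python sketch below is a hypothetical illustration, not the disclosed implementation: it assumes each map is summarized as marker occurrence counts, and it uses each marker's share of a map's occurrences as a simple stand-in for the "semantic density" statistic used to rank results.

```python
from collections import defaultdict


class MapIndex:
    """Inverted index from semantic markers (keywords, URLs, hashtags, ...) to social media maps."""

    def __init__(self):
        self.postings = defaultdict(dict)  # marker -> {map_id: weight}

    def add_map(self, map_id, marker_counts):
        """marker_counts: {marker: occurrences in the map}."""
        total = sum(marker_counts.values()) or 1
        for marker, count in marker_counts.items():
            # Weight = the marker's share of the map's marker occurrences (a simple density).
            self.postings[marker.lower()][map_id] = count / total

    def search(self, *markers, top_k=10):
        """Return map ids ranked by summed marker weight (ties broken by id)."""
        scores = defaultdict(float)
        for marker in markers:
            for map_id, weight in self.postings.get(marker.lower(), {}).items():
                scores[map_id] += weight
        return sorted(scores, key=lambda m: (-scores[m], m))[:top_k]


index = MapIndex()
index.add_map("automobiles", {"electric vehicle": 3, "#cars": 7})
index.add_map("eco-friendly products", {"electric vehicle": 5, "solar": 5})
print(index.search("Electric Vehicle"))  # ['eco-friendly products', 'automobiles']
```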
[0183] Based on at least one of the search terms or the search results, a new social media map subscription may be suggested. For example, if a user searches a social media map index for the terms "Nissan LEAF™," "electric vehicle," and leai¾ations,coni, subscriptions to social media maps such as automobiles, eco-friendly products, and California trends may be suggested.
[0184] In an embodiment, a dashboard may be used for browsing, visualizing, manipulating, and calculating metrics for one or more social media maps constructed via clustering of data from at least one social media community. Clustering techniques may include relationship-based, manual, attentive clustering, or the like. In some embodiments, the dashboard may be a third-party dashboard that supports visualization of data from clustering, wherein the data may be delivered by a raw data feed, an API plug-in, or any other data delivery method. In embodiments, the data from clustering may be joined with or otherwise integrated with data from other data sources to form a new data set. The new data set may be similarly browsed, visualized, manipulated, and processed by dashboards.
[0185] In an embodiment, APIs, dashboards, and partner tools may be used with social media maps for planning/assessment. For example, social media maps may be used for enterprise resource planning, business insight, marketing, search engine optimization, intelligence, politics, industry verticals, the financial industry, and the like. For example, an entertainment promotion company may own a plurality of social media accounts. If they could navigate sector-level mappings related to genres of music, they could use the maps to target music genre-specific messages using the most appropriate of those accounts for maximum effectiveness.
[0186] In embodiments, custom maps may be derived from mashing up sets of social media maps.
[0187] In an embodiment, the social media maps may be constructed via clustering (e.g., relationship-based, manual, attentive, etc.) of data from at least one social media community targeted to a specific market segment. For example, the market segments may include government intelligence, public diplomacy, social media landscapes in other countries, pharmaceuticals, medical, health care, sports, parenting, consumer products, energy, and the like. In these embodiments, the market segment may be used to index the social media maps.
[0188] In an embodiment, a reporting product may leverage social media maps to demonstrate the interaction of nodes and/or links between social media maps. For example, a multi-map report may be generated comparing the nodes and links in different social media communities in a particular market environment. The reporting product may be integrated with a dashboard or analytics platform. Multi-map reports generated by the reporting product may be used to demonstrate various phenomena, such as how particular items can be found in particular social media layers. For example, a multi-map report may demonstrate how weblog hosts are having customers driven to them from TWITTER™. In another example, a multi-map report may demonstrate how FACEBOOK™ pages are getting attention from a segment of TWITTER™.
[0189] In an embodiment, information derived from the social media maps, including portions of or the entire map itself, may be published or displayed as a map widget, which may enable monitoring an ongoing stream of information from one or more clusters or one or more maps. Information being displayed that is derived from the social media map may be customizable within the widget, such as via a dialog box, menu item, or the like. A user may be able, optionally in real time through a user interface, to select a stream of information based on looking at the environment, zoom in based on clustering, figure out a valid emergent segmentation, and then set up monitors to watch the flow of events, such as media objects, text, key words/language, and the like, in real time. The published, widgetized map acts as a sensor network to obtain a host of behavioral data and leads that can be leveraged by the map's users or hosts. In embodiments, users may interact with other users' map widgets to discover content and individuals/entities. Using other users' map widgets, users may grow their own
networks by engaging with the content and people/entities in the widget, such as to start following a person or to retweet an item.
[0190] There are at least three processes that yield attributes of nodes: calculating a relevance score, performing a CFI bias weighting, and identifying nodes as "allowed" or "not allowed" (e.g., blacklist/whitelist). Automated social media map refresh may leverage one or more of these processes.
[0191] In an embodiment and referring to FIG. 15, a social media map may be automatically refreshed via calculating a relevance score for nodes or bundles in the map 1502 and reconstructing the map based on a relevance ranking revealed by the relevance score 1504. Semantic/relevance marker bundles may include lists of semantic markers such as key words, phrases, relevant link targets, accounts that are followed on TWITTER™, and the like. Semantic markers may be manually curated. In an embodiment, the refresh process may involve performing the relevance search/semantic slice that generated the original map for new relevance/semantic markers. A relevance calculation may be performed on the nodes to calculate a relevance score.
[0192] In another embodiment, a social media map may be automatically refreshed via positively or negatively weighting at least one cluster based on a CFI score calculation 1508 and reconstructing the map to modify the nodes in the clusters 1510. Modifying the nodes may be done to include positively weighted nodes and exclude negatively weighted nodes. CFI scores for clusters may be leveraged to evolve a map in a certain direction. Clusters in the map that include preferred/wanted nodes/links are positively weighted. Clusters are negatively weighted if they are deemed not to be relevant. Applying weightings to the map may enable pulling in additional nodes that are more relevant. Weighting map clusters for the CFI bias operation may be done by humans.
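The two refresh streams above can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the disclosed scoring: the relevance score is taken to be the share of a node's semantic markers found in the curated bundle, and the CFI bias is taken to be a weighted sum of the CFI each cluster shows toward a candidate node, with human-assigned positive weights for wanted clusters and negative weights for irrelevant ones. All function names are illustrative.

```python
def relevance_score(node_markers, bundle):
    """Hypothetical relevance: share of a node's semantic markers found in the curated bundle."""
    markers = set(node_markers)
    return len(markers & set(bundle)) / len(markers) if markers else 0.0


def rank_by_relevance(nodes, bundle, keep_top):
    """nodes: {node_id: [markers]}. Returns the keep_top most relevant node ids."""
    scores = {n: relevance_score(m, bundle) for n, m in nodes.items()}
    return sorted(scores, key=lambda n: (-scores[n], n))[:keep_top]


def cfi_bias(candidates, cluster_weights):
    """candidates: {node_id: {cluster: cfi}}, the CFI each cluster shows toward the node.
    cluster_weights: {cluster: positive weight if wanted, negative weight if irrelevant}.
    Returns (include, exclude) sets according to the sign of the weighted sum."""
    include, exclude = set(), set()
    for node, per_cluster in candidates.items():
        bias = sum(cluster_weights.get(c, 0.0) * v for c, v in per_cluster.items())
        if bias > 0:
            include.add(node)
        elif bias < 0:
            exclude.add(node)
    return include, exclude


nodes = {"a": ["#ev", "tesla"], "b": ["#ev", "#cooking"], "c": ["#cooking"]}
print(rank_by_relevance(nodes, ["#ev", "tesla"], keep_top=2))  # ['a', 'b']
```

Nodes whose weighted sum is exactly zero are left untouched, so unweighted clusters neither pull nodes in nor push them out.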
[0193] In an embodiment, a social media map may be automatically refreshed via filtering out unwanted nodes 1512. In an embodiment, a social media map may be automatically refreshed via obligatorily including nodes that were not clustered in the original map 1514. Semantic markers that are known not to fit based on their relevance ranking, or that are not allowed for some other reason, are filtered out. In embodiments, nodes may be forced into the map whether or not they were identified in the relevance search/semantic slice. Curating black lists of nodes may be done by humans.
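The allow/deny step above amounts to a filter followed by a forced union. The sketch below assumes hypothetical curated lists of node ids; giving the black list precedence when a node appears on both lists is an assumption the disclosure does not address.

```python
def apply_node_lists(clustered_nodes, blacklist=(), forced=()):
    """Filter out blacklisted nodes, then append forced nodes the clustering did not pick up.
    A node on both lists stays excluded (an assumed precedence)."""
    blocked = set(blacklist)
    result = [node for node in clustered_nodes if node not in blocked]
    present = set(result)
    for node in forced:
        if node not in blocked and node not in present:
            result.append(node)
            present.add(node)
    return result


print(apply_node_lists(["a", "b", "c"], blacklist=["b"], forced=["d", "a", "b"]))  # ['a', 'c', 'd']
```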
[0194] In an embodiment, a social media map may be automatically refreshed via crowd-sourced information regarding nodes and/or links that drive nodes to bundles 1518. In an embodiment, a social media map may be automatically refreshed via processing social media map usage data for trends/indicators 1520. Usage data may relate to one or more of what is ignored, what is further explored, what is used, how clusters are grouped, what name/label is assigned to a cluster, what color is used for a cluster, the order/position in which the cluster is placed in a report, and the like. Nodes preferentially interacted with may be weighted more heavily.
[0195] In embodiments, community feedback may influence each of the three streams of automated map refresh described herein. Community feedback provides an indication of news, events, information, etc. that may drive the addition of nodes to the bundles, such as, for example, if a new website is a target link. This sort of feedback may also provide guidance as to the CFI bias operation. For example, if feedback suggests that a cluster is relevant, then that cluster may be positively weighted.
[0196] Feedback and updating may be based on how people are using the maps, such as understanding what they ignore, what they drill down on, what they use, how they want to group things, what name/label they assign a cluster, what color they use for a cluster, which clusters are most important to a client based on the order/position in which the client places them in a report, and the like. Refreshing the maps may leverage this captured information.
[0197] In an embodiment, feedback may be received passively from clickable/interactive maps via a built-in feedback system. This feedback system may be used as a naive weighting system. In an embodiment, the map may include a flag available to provide commentary or feedback.
[0198] In an example, a map may include raw clusters and human-made groupings and the attachment of other sorts of metadata, such as the coloring of a cluster. The example may be that of the Russian blogosphere, which may contain 40 clusters and 7-8 groups, including 5 right-wing Russian nationalist groups and a liberal opposition group. Clusters may be processed by human-assigned re-aggregation, and metrics may be run against them to progressively refine the clusters. Different clients, even on a base map, may want to group things differently, name a cluster in an interface differently, color a cluster in an interface differently, and the like. Users need to be able to define groups, re-label clusters, select clusters, and the like. Community feedback may provide observations as to how users are grouping the same map, and that yields data about which clusters are related to each other that is crowd-sourced to the user. Users may define the order in which the data are presented in the reporting. For example, a user may want to place data on preferred clusters higher in a chart. Cluster ordering and positioning information is customizable, and it can be harvested from the community as an importance weighting.
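Harvesting cluster ordering as an importance weighting could look like the following minimal sketch, which assumes each user's report ordering is available as a list of cluster names (most important first). Scoring positions linearly from 1.0 down to 0.0 and averaging across users is an illustrative choice, not something the disclosure specifies; note that a cluster omitted from a user's report is simply not scored by that user.

```python
from collections import defaultdict


def importance_weights(orderings):
    """orderings: list of per-user cluster orderings (first = most important).
    Returns {cluster: mean normalized weight in [0, 1]}, where 1.0 means always placed first."""
    totals, counts = defaultdict(float), defaultdict(int)
    for order in orderings:
        n = len(order)
        for pos, cluster in enumerate(order):
            weight = 1.0 if n == 1 else 1 - pos / (n - 1)
            totals[cluster] += weight
            counts[cluster] += 1
    return {c: totals[c] / counts[c] for c in totals}


orderings = [
    ["liberal opposition", "nationalist A", "nationalist B"],
    ["nationalist A", "liberal opposition"],
]
print(importance_weights(orderings))
# {'liberal opposition': 0.5, 'nationalist A': 0.75, 'nationalist B': 0.0}
```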
[0199] In another example, map users may contribute to map metadata to generate a community data set established and/or expanded by users. For example, users could input the gender of a tweeter or blogger. The user community itself may be a segmentable population. The user community can contribute to scoping a map for a particular topic. For example, something about a disease might appear in various places: consumer segments, politics, medical/science, sports, and the like. User feedback may also help scope the size of the map. For example, a user may ask: should the map be constructed on the first 5,000 targets, or should 20,000 targets be used? In an embodiment, user-contributed data may be used to provide metadata for a social media map constructed via clustering (e.g., relationship-based, manual, attentive, or the like) of data from at least one social media community.
[0200] In an embodiment and referring to FIG. 16, data, including user-contributed data, may form a searchable, editable metadata and basic information repository for URLs 1602, such as to form a "URLipedia." The repository may be linked to one or more social media maps 1604.
[0201] In an embodiment and referring to FIG. 17, clustering (e.g., relationship-based, manual, attentive, or the like) of data from at least one social media community may be used to generate an actionable targeting list. Targeting lists combine network centrality 1704, issue relevance 1708, and CFI for a cluster 1710 into a ranked target list 1702 that may be used by marketers or other interested parties to reach certain nodes in a meaningful order for strategic communication or other business purposes. The formula of combination may be adjusted to maximize ranking to suit client/user objectives. In an embodiment, network centrality may be a universal score related to how central a node is in the network. For example, daytime talk show hosts may have a network centrality of 1.00 in the general population, while economists may be a zero. In an embodiment, a Cluster Focus Index score may be calculated for each cluster. For example, daytime talk show hosts may have a CFI of zero for economics, but economists have a CFI of 100. In an embodiment, an issue relevance score may be calculated for each cluster. For example, the issue relevance related to the budget deficit may be calculated based on a publication frequency score (e.g., # of tweets). Other scoring techniques may be used to calculate issue relevance.
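Because the three inputs live on different scales (centrality on 0 to 1.00, CFI on 0 to 100 in the examples above), some normalization is needed before they can be combined. The sketch below assumes min-max normalization followed by an adjustable weighted sum; the disclosure leaves the combination formula open, so both choices and the node names are illustrative only.

```python
def rank_targets(nodes, weights=(1.0, 1.0, 1.0)):
    """nodes: {node: (network_centrality, issue_relevance, cfi)}.
    weights: adjustable multipliers for the three components.
    Each component is min-max normalized across nodes, then combined linearly."""
    names = list(nodes)
    columns = list(zip(*(nodes[n] for n in names)))  # one tuple per component
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        normalized.append([(v - lo) / span if span else 0.0 for v in col])
    scores = {n: sum(w * comp[i] for w, comp in zip(weights, normalized))
              for i, n in enumerate(names)}
    return sorted(names, key=lambda n: (-scores[n], n))


targets = {
    "talk_show_host": (1.00, 0.2, 0.0),
    "economist": (0.0, 0.9, 100.0),
    "policy_blog": (0.5, 0.5, 60.0),
}
print(rank_targets(targets))                           # ['economist', 'policy_blog', 'talk_show_host']
print(rank_targets(targets, weights=(1.0, 0.0, 0.0)))  # ['talk_show_host', 'policy_blog', 'economist']
```

Changing `weights` is one way to express "adjusting the formula of combination" for a given client objective, e.g. favoring reach (centrality) over topical focus (CFI).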
[0202] In an embodiment, a user may be able to purchase ads or message placements on a target from the targeting list 1712. From the targeting list, users may be enabled to buy an ad placement or message placement on the target site at the click of a button. In an embodiment, the effect, or impact, of the ad/message placements may be tracked for the node and across a social media map. Thus, the system may enable users to identify targets according to a ranked list based on network centrality, CFI, and issue relevance, and then place and track ads/messages on the targets from the lists. In another embodiment, targeting lists may be used in connection with any ad network for ad/message placement. Tracking ads/messages may involve receiving feedback on actions taken with respect to the ads/messages, calculating impact metrics, and the like.
[0203] In an embodiment, a historical data browser may provide a mechanism for visualizing archived, historical social media map data, such as for research or historical purposes. For example, there may be value to academia in accumulating old social media maps and showing the delta between them, such as to explore how the market has evolved over some period of time. Historical social media map data may also be useful for financial industry forensics and intelligence analysis.
[0204] In an embodiment, CFI metrics may be displayed on a social media map. A CFI metric for items in clusters indicates how much attention there is to that item for that cluster. An attention score indicates the relative attention to an item as compared to other items for a cluster over a range of time or at a "point" in time. A higher attention score means the item is more specific to the cluster. Attention scores are non-linear in the sense that anything below two is not significant, while scores above two are exponentially significant.
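This section does not give the CFI formula, so the following is only a sketch under an assumption: CFI is modeled as a lift ratio, i.e. the rate at which cluster members attend to an item divided by the rate in a baseline population, so that a value of 1 means "no more attention than expected." Items are then kept only when the score exceeds the significance threshold of two mentioned above. The counts and function names are hypothetical.

```python
def cluster_focus_index(cluster_hits, cluster_size, baseline_hits, baseline_size):
    """Hypothetical CFI as a lift ratio: how many times more often cluster members
    attend to an item than the baseline population does."""
    cluster_rate = cluster_hits / cluster_size
    baseline_rate = baseline_hits / baseline_size
    if baseline_rate == 0:
        return float("inf") if cluster_rate else 0.0
    return cluster_rate / baseline_rate


def significant_items(item_stats, cluster_size, baseline_size, threshold=2.0):
    """item_stats: {item: (cluster_hits, baseline_hits)}.
    Returns items whose CFI exceeds the threshold, most cluster-specific first."""
    scores = {item: cluster_focus_index(ch, cluster_size, bh, baseline_size)
              for item, (ch, bh) in item_stats.items()}
    return sorted((item for item, s in scores.items() if s > threshold),
                  key=lambda item: -scores[item])


stats = {"horses": (30, 500), "baseball": (20, 500), "news": (10, 1000)}
print(significant_items(stats, cluster_size=100, baseline_size=10_000))  # ['horses', 'baseball']
```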
[0205] CFI scores may be a metric for measuring search engine optimisation and/or advertising effectiveness because they represent cluster specificity. CFI metrics would have to be combined with a more global metric to enable companies to shift from thinking at the execution/implementation layer (e.g., where do I advertise?) to the strategic layer (e.g., where are we going with this community?).
[0206] In an embodiment, a CFI graph may include CFI scores for sources and nodes on the map. In the upper right of the map are clusters with high focus on the particular cluster, a high overall level of attention, and many in-links. On the CFI graph, users can see various items at a glance. For example, users may find the key players related to a topic or the landscape of players to determine who has influence.
[0207] In an embodiment, a CFI graph may include a Cluster Map Properties Editor/User Interface. The interface enables users to label clusters, assign clusters to a group, and perform group metrics.
[0208] Maps may be generated based on semantic elements, bundles, white lists, black lists, and the like in an automated fashion in some embodiments, but labeling the clusters in an automated way, such as when a map update is made, may be difficult. Draft labels may be assigned when the cluster is created or updated based on a previous storehouse of knowledge. A confidence score for that labeling may be generated. To automate the labeling, members of a cluster may be compared with the membership of clusters of past maps, and if a high percentage are the same, then it is assumed that the clusters relate to the same thing, and they are labeled similarly. In another embodiment, automated labeling is based on structural equivalence. Labeling a node or an object that has well-defined properties may be easier than labeling a cluster, which is a collection of objects. Structural equivalence involves examining the node's outlinks. For example, if people are friends with the same people, then they may have similar interests. In another example, blogs that link to the same sets of things are likely to be similar. In yet another example, if there are two people who have superior relationships to twenty soldiers, chances are that the two people are sergeants or some other form of commander. While this may work at the node level, it is harder to do at the cluster level. CFI scores, which are already generated for clusters, may be used in the generation of labels. For example, for two clusters with numerous links from nodes in these clusters to other nodes, it is difficult to compare the clusters at face value. One might just be larger, more popular, or have more links. However, CFI scores enable a comparison between two items or sets of items that a cluster may be disproportionately paying attention to. For example, Cluster 1 is very interested in horses and baseball, while Cluster 2 is very interested in horses and basketball.
Given the CFI scores, vector cosine similarity can be used to determine the relationship between the two clusters. For each cluster, a vector can be built from the CFI scores calculated for that cluster over the same items (e.g., Cluster 1: CFI1(1), CFI1(2), ...; Cluster 2: CFI2(1), CFI2(2), ...). The vectors may be plotted in a vector space with one dimension per item. The cosine of the angle between the two vectors may be one indication of the relationship between the two clusters. If the angle is small (i.e., the cosine is close to 1), the confidence is high. As maps are updated with new content, clusters in the new map can be compared to clusters of old maps. When there is a match, that is, a small angle between two cluster vectors, the label from the cluster in the old map is assigned to the cluster in the new map. In embodiments, the cosine of the angle may also act as a similarity score. There are a number of measures for vector distance, including cosine similarity, Euclidean distance, and the like.
[0209] In embodiments, to limit the number of CFIs to include in vector generation, the CFIs may be filtered to include only items with a CFI of two or more for a particular cluster. This effectively reduces the dimensionality of the space.
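The label-transfer procedure above (shared item axes, the CFI of two or more filter, cosine similarity, and reuse of the best-matching old label) can be sketched in plain Python. The similarity threshold of 0.8 and all names are hypothetical; the disclosure does not fix a cutoff.

```python
import math


def cfi_vector(scores, items, min_cfi):
    """Vector of a cluster's CFI scores over a shared item list; scores below min_cfi are zeroed."""
    vec = []
    for item in items:
        s = scores.get(item, 0.0)
        vec.append(s if s >= min_cfi else 0.0)
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transfer_labels(old_clusters, new_clusters, min_similarity=0.8, min_cfi=2.0):
    """old_clusters: {label: {item: cfi}}; new_clusters: {cluster_id: {item: cfi}}.
    Returns {cluster_id: (label, cosine)} for new clusters whose closest old cluster
    clears min_similarity."""
    # Keep only items that reach min_cfi in at least one cluster (the dimensionality filter).
    items = sorted({item
                    for scores in (*old_clusters.values(), *new_clusters.values())
                    for item, s in scores.items() if s >= min_cfi})
    old_vecs = {label: cfi_vector(s, items, min_cfi) for label, s in old_clusters.items()}
    labels = {}
    for cid, scores in new_clusters.items():
        vec = cfi_vector(scores, items, min_cfi)
        sims = {label: cosine(vec, ov) for label, ov in old_vecs.items()}
        if sims:
            best = max(sims, key=sims.get)
            if sims[best] >= min_similarity:
                labels[cid] = (best, sims[best])
    return labels


old = {"Cluster 1": {"horses": 8.0, "baseball": 5.0, "news": 1.0},
       "Cluster 2": {"horses": 3.0, "basketball": 9.0}}
new = {"c1": {"horses": 7.0, "baseball": 6.0, "news": 1.5}, "c2": {"cooking": 5.0}}
print({cid: label for cid, (label, _) in transfer_labels(old, new).items()})  # {'c1': 'Cluster 1'}
```

Here "news" falls below the filter in every cluster and drops out of the vector space entirely, while "c2" shares no focused items with any old cluster and is left unlabeled for human review.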
[0210] In other embodiments, items that are similar may be aggregated in labeling. For example, using outlink bundles rather than individual CFI scores may enable grouping items into target clusters and examining the density of links to each target cluster.
[0211] In an embodiment, an advertising campaign planning tool can enable running a campaign on blogs and tracking success in other layers (e.g., TWITTER™, FACEBOOK™, segment-specific online forums).
[0212] In an embodiment, URL shorteners included in social media content may be tracked. The system may provide reporting outputs that track the success of a social media campaign that includes a URL shortener in different layers of the social media system. The system may be used not only to plan the campaign but also to report on, for example, the TWITTER™ bounce or the FACEBOOK™ bounce from blog activity.
[0213] In an embodiment, the system may enable campaign planning (e.g., domestic, international, multi-platform, multi-network, etc.) where language is not a required first limitation. For example, the system may enable campaign planning in marketing, such as for consumer goods, media and entertainment, movie marketing, video games, social games, music, international product launches, talent agencies, public diplomacy, public health, political campaigns, and the like. Campaigns may be tracked, such as with a chronotope analysis, as further described herein, to determine a pattern that exists in time and space, determined by combining temporal and network features in the analysis of the segments/clusters.
[0214] In an embodiment, the system may marry internal reporting with other reporting tools such as splash, resonance, clicks, transactions, and the like.
[0215] In an embodiment, the system enables analysis and prediction, such as in the financial industry (e.g., market predictions and trading positions), for social media firms whose value is built around prediction, and the like.
[0216] In embodiments, third-party data and clusters may be used with the mapping techniques described herein.
[0217] In embodiments, models may be built on one or more clusters using tools that can be accessed across clusters.
[0218] In some embodiments, a social media map and network segmentation may be constructed via clustering of data from a single user's social media community. Referring now to FIG. 23, a user flow for becoming a user and interacting with a map is depicted. Starting from logical block 2300, processing flow proceeds to a login screen at logical block 2302, where users may log in, such as via a social media authorization. If the user is a new user, the user is sent to a sign-up page at logical block 230, where they may sign up or be given additional content to entice a signup. If the user is already on a list as having requested access, processing flow proceeds to logical block 2308 to check a wait-list status. If the user is a beta user, processing flow proceeds to logical block 2310, where it is determined whether the login is a first login. If so, processing flow proceeds to logical block 2312, where a tour may be taken. After the tour, processing flow may proceed to logical block 2318, where a map overview is presented, including a competitive overview, a text description, a cluster power, and the like. If the user is not a beta user, processing flow may proceed to logical block 2314, where the delta since the last visit is presented, including new followers, recent activity with map indicators, and the like. Processing flow may then proceed to logical block 2318. From logical block 2318, processing flow may proceed back to logical block 231 if recent activity is requested again.
[0219] Alternatively, if the user chooses a cluster or group at logical block 2318, processing flow may proceed to logical block 2320 to obtain a cluster overview, including local competitive performance, influencers, conversation, images, videos, recent tweets, and the like. If the user chooses to delve into the entire interactive map, processing flow may proceed to logical block 2322 for cluster map navigation. Processing flow may alternatively proceed from logical block 2320 to logical block 2324, where the user may take action. In an alternative embodiment, processing flow may first proceed to logical block 2328, where the user may first view full lists, and then processing flow may proceed to logical block 2324, where only actions that are relevant to the list being reviewed are displayed. From logical block 2324, the user may choose to build a network, save one or more clusters as a list, move a message, engage with content, or the like. If the user chooses to build a network, processing flow may proceed to logical block 2330, where the user is prompted to make a list of influencers. From there, user details may be entered at logical block 2332, and then actions such as engaging one of the users at logical block 2334 or a follow action at logical block 2338 may be taken. From logical block 2330, a follow list may be generated at logical block 2340, or the current view may be saved as a Twitter™ list or some other social media list at logical block 2342.
Likewise, if the "Save Cluster as List" action is selected, processing flow may proceed to save the current view as a Twitter™ list or some other social media list at logical block 2342. If the move-message action is selected, a list of followers may be made at logical block 2344, and from there the current view may be saved as a Twitter™ list or some other social media list at logical block 2342, or a message may be composed at logical block 2348, which may include content, context, and the message. If engage with content is chosen at logical block 2324, processing flow may proceed to logical block 2358, where a list of content, such as URLs, key content, and media, may be made. Users may choose to screen content details at logical block 2332, after which processing flow may proceed to logical block 2360, where a new tweet is generated; logical block 2358, where a retweet is generated; or logical block 2354, where tweets by influencers who tweeted the content are found and then potentially retweeted at logical block 2358.
[0220] In order to scale the amount of information in the social media maps, clustering techniques may need to be modified. In general, some set of nodes pays attention to some set of targets, and the nodes get clustered based on the targets they pay attention to. There are at least two extensions of this general approach. In one embodiment, a very large number of nodes pay attention to a very large number of targets. Thus, for clustering, the number of operations scales at least polynomially (e.g., with the cube of the number of nodes). For example, for 10,000 nodes, a cubic algorithm requires on the order of a trillion (10^12) operations. To accommodate this scale, computing power may need to be augmented.
[0221] In another embodiment, attentive gravity may be used to scale up the size of the social media maps. Nodes pay attention to targets (input data); however, an object may be created where nodes are not discretely assigned to a cluster but are drawn to different poles, such as ideological, thematic, or topical poles. Depending on which targets a node pays attention to, it can be drawn to one pole, another pole, or the middle. Instead of discrete maps with a plurality of clusters (e.g., 40) in a plurality of colors (e.g., 40), an attentive gravity map may have poles, with the nodes distributed based on how close they are to each pole. A node may have a set of scores that represent a gravitational coefficient for each of the poles of gravity. The gravitational coefficient may be used with other visualizations in order to modify the size, color, or opacity of the cluster representation based on the attentive gravity toward a pole. In another embodiment, the gravitational coefficient may simply be used as a metric on the cluster map previously described herein. The gravitational coefficient provides the degree to which a node matches a segmentation (e.g., a sports weight and a parenting weight for the same node, rather than just sorting the nodes into different clusters/segmentations and throwing out the relationship to other clusters or segmentations).
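The source does not specify how the gravitational coefficients are computed. One plausible sketch, assuming each target has already been given an affinity for each pole (for example by an analyst or an earlier clustering), scores a node by the attention-weighted share of pull it receives from each pole. The names `gravitational_coefficients`, `attention`, and `target_poles` are hypothetical.

```python
from collections import defaultdict

def gravitational_coefficients(attention, target_poles):
    """Per-pole gravitational coefficients for each node.

    attention:    node -> {target: attention weight (e.g. follow or mention count)}
    target_poles: target -> {pole: affinity in [0, 1]}  (assumed to be given)
    Returns node -> {pole: coefficient}; a node's coefficients sum to 1, so a
    node pulled equally by two poles sits "in the middle".
    """
    coefficients = {}
    for node, targets in attention.items():
        pull = defaultdict(float)
        for target, weight in targets.items():
            for pole, affinity in target_poles.get(target, {}).items():
                pull[pole] += weight * affinity
        total = sum(pull.values())
        coefficients[node] = {p: v / total for p, v in pull.items()} if total else {}
    return coefficients

poles = {"espn": {"sports": 1.0}, "parents_mag": {"parenting": 1.0},
         "youth_soccer": {"sports": 0.5, "parenting": 0.5}}
attn = {"n1": {"espn": 2, "parents_mag": 1, "youth_soccer": 2}}
print(gravitational_coefficients(attn, poles))
```

Node n1 keeps both a sports weight (0.6) and a parenting weight (0.4) instead of being forced into a single cluster, which is the kind of graded membership described above.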
[0222] Clusters themselves may not be definitive. For example, a node might not belong to just one cluster. Such characteristics may be reflected in mapping technologies.
[0223] One technique may be a Discrimination Function. In an example, 1,000,000 nodes may be clustered. An initial condition may be a seed attentive clustering for a small number of nodes, such as 10,000. To expand the clustering, the centroids of the seed clusters (the X, Y average of the dots) are used to assign the remaining nodes to clusters. For example, it can be determined whether a new node is closer to the centroid of one cluster or of another. As many nodes as are desired to be incorporated into a map may be clustered via this technique; in this example, this technique applies to nodes 10,001 through 1,000,000.
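A minimal sketch of the centroid-based expansion above, assuming every node (seed or new) already has X, Y coordinates, for example from a layout in which the new nodes have been placed; the source does not say how new nodes obtain positions, and the function names are hypothetical.

```python
import math

def cluster_centroids(seed_positions, seed_clusters):
    """Centroid (mean X, mean Y) of each seed cluster."""
    sums = {}
    for node, cluster in seed_clusters.items():
        x, y = seed_positions[node]
        sx, sy, n = sums.get(cluster, (0.0, 0.0, 0))
        sums[cluster] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

def assign_to_nearest_centroid(new_positions, centroids):
    """Assign each new node to the cluster whose centroid is closest (Euclidean)."""
    return {
        node: min(centroids, key=lambda c: math.dist(pos, centroids[c]))
        for node, pos in new_positions.items()
    }

seed_pos = {"a": (0, 0), "b": (2, 0), "c": (10, 10), "d": (12, 10)}
seed_cl = {"a": 1, "b": 1, "c": 2, "d": 2}
cents = cluster_centroids(seed_pos, seed_cl)   # {1: (1.0, 0.0), 2: (11.0, 10.0)}
print(assign_to_nearest_centroid({"z": (3, 1), "w": (9, 9)}, cents))
```

In the 1,000,000-node example, the seed clustering would cover the first 10,000 nodes and this assignment step the rest; with a few dozen clusters, a linear scan over centroids per node is already cheap.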
[0224] Another technique may be to iteratively cluster the 1,000,000 nodes in batches of 10,000. Then, the CFI scores of those clusters may be used to cluster like clusters with each other, combining the clusters at a meta-cluster level. To make that work well, the similarity of clusters may need to be tracked across large groups of sub-clusters to see which ones are idiosyncratic and should stand alone versus which ones are somewhat consistent and should be joined.
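One way to sketch the batch-then-merge idea, assuming sub-clusters from different batches are joined when the cosine similarity of their CFI vectors clears a threshold, with union-find providing transitive merging. Sub-clusters that match nothing remain standalone, corresponding to the "idiosyncratic" case. The threshold and all names are assumptions, not taken from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse CFI vectors (dicts: item -> score)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_batch_clusters(batch_cfis, threshold=0.8):
    """batch_cfis: {(batch_id, cluster_id): {item: cfi}} from clustering each batch.

    Sub-clusters from different batches whose CFI vectors are similar are joined
    into one meta-cluster; unmatched sub-clusters stay on their own.
    """
    keys = list(batch_cfis)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps trees shallow
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if a[0] != b[0] and cosine(batch_cfis[a], batch_cfis[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())

subclusters = {
    (0, "a"): {"horses": 5, "baseball": 4},
    (1, "a"): {"horses": 4, "baseball": 5},
    (1, "b"): {"knitting": 3},
}
print(merge_batch_clusters(subclusters))
```

The pairwise comparison is quadratic in the number of sub-clusters, but that number is far smaller than the number of nodes being mapped.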
[0225] In an embodiment, it may be desired to reduce the scale of the map to just those actors connected at a mesoscale, while eliminating actors who are not really active members of the network and are just "star" followers. An Influence Network Discovery method may be used to reduce very large networks to their most influential core communities and obtain a sub-graph of maximally connected sub-actors. A variable K may be assigned to each member of the network, where K relates to a minimum connectedness, or the number of other nodes in the network to which the individual is connected (e.g., a known measure of connectedness in networks). One way to reduce the network quickly is to restrict the network by K value. For example, a network may be restricted to only those with a K of five and up, that is, only those people connected to at least five other people. Another way is to reduce the network iteratively. For example, a network of people surrounding the Democratic Party may be reduced iteratively. In a first step, inactive members and members with few followers may be eliminated. Then, certain network members, such as public figures or those who have a lot of followers, may be removed temporarily from the network and reserved in a "keep" set. Then, the remaining network may be examined and refined by K. In the example, members of the network with a K of one are removed from the network. Removal of these people from the network may change the K values for the remaining members of the network. The process iterates, removing those network members with the lowest K values, and can continue until a specified number of network members is obtained. At this point, any members in the keep set may be added back to the network. As a second pass, a K-based reduction of the keep set members may be performed and limited to the node threshold. Based on the follow patterns of the members retained in the map, they may be assigned to a cluster.
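A minimal sketch of the iterative reduction, assuming K is each member's degree (number of connections) within the remaining undirected network, that the keep set has already been chosen (e.g., by follower count), and that removal stops once the non-keep network reaches a target size. The optional second pass over the keep set is omitted, and the function name is hypothetical.

```python
def reduce_network(adjacency, keep, target_size):
    """Iteratively peel off the members with the lowest K until at most
    target_size non-keep members remain, then add the keep set back.

    adjacency: member -> set of connected members (undirected, symmetric)
    keep:      members set aside temporarily (public figures, heavily followed accounts)
    """
    # Set the keep members aside and compute K over the remaining network only.
    remaining = {m: set(nbrs) - keep for m, nbrs in adjacency.items() if m not in keep}
    while len(remaining) > target_size:
        k_min = min(len(nbrs) for nbrs in remaining.values())
        lowest = [m for m, nbrs in remaining.items() if len(nbrs) == k_min]
        # Remove only as many lowest-K members as needed to reach the target.
        for m in lowest[: len(remaining) - target_size]:
            for neighbor in remaining.pop(m):
                remaining[neighbor].discard(m)  # removal lowers neighbors' K
    return set(remaining) | keep

adj = {
    "A": {"B", "C", "D", "VIP"}, "B": {"A", "C", "VIP"}, "C": {"A", "B", "VIP"},
    "D": {"A", "E", "VIP"}, "E": {"D", "VIP"}, "VIP": {"A", "B", "C", "D", "E"},
}
print(sorted(reduce_network(adj, keep={"VIP"}, target_size=3)))
```

E (K = 1) is removed first, which drops D to K = 1, so D goes next, leaving the triangle A, B, C plus the reserved VIP.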
[0226] In an embodiment, a delta report may be provided to examine the evolution of a cluster map and capture the most salient points of change in the last interval. The delta report may identify which clusters have grown, which sites are being targeted more by clusters now than before, which topics are being discussed more now than before, which clusters are more active than before, and the like. The delta report may be provided on a periodic basis, such as weekly, monthly, and the like. Generating the delta report may involve reporting which CFI scores changed the most and which clusters are more active than before. Delta reports may be enabled by organization into a self-updating database with time snapshots. A delta report may be useful in customizing a stream of content. For example, a stream of new objects of interest for clusters in the map can be provided as a delta report and fed to a user.
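A sketch of how a delta report might be computed from two time snapshots, assuming each snapshot stores per-cluster CFI scores and a per-cluster activity count for the interval; the snapshot structure and the name `delta_report` are assumptions.

```python
def delta_report(prev_cfi, curr_cfi, prev_activity, curr_activity, top_n=5):
    """Compare two snapshots of a cluster map.

    prev_cfi / curr_cfi:           {cluster: {item: CFI score}}
    prev_activity / curr_activity: {cluster: activity count in the interval}
    """
    changes = []
    for cluster in prev_cfi.keys() | curr_cfi.keys():
        before = prev_cfi.get(cluster, {})
        after = curr_cfi.get(cluster, {})
        for item in before.keys() | after.keys():
            delta = after.get(item, 0.0) - before.get(item, 0.0)
            if delta:
                changes.append((cluster, item, delta))
    changes.sort(key=lambda c: abs(c[2]), reverse=True)  # biggest movers first
    more_active = sorted(c for c in curr_activity if curr_activity[c] > prev_activity.get(c, 0))
    return {"top_cfi_changes": changes[:top_n], "more_active_clusters": more_active}

prev = {"c1": {"#a": 1.0, "#b": 3.0}}
curr = {"c1": {"#a": 4.0, "#b": 2.5}, "c2": {"#c": 2.0}}
report = delta_report(prev, curr, {"c1": 10}, {"c1": 12, "c2": 5}, top_n=2)
print(report)
```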
[0227] In an embodiment, a self-service tool may be designed to let users access the system and initiate generation of a social media map. In an embodiment, a user may log in to the system or, in embodiments, to a social network or other third-party website, in order to initiate the map creation process. A bot may be spawned that harvests data and maps the data to clusters. The bot may further provide cluster labels and CFI scores. The output may be a social media map data object with CFI scores. The self-service tool may enable user browsing of clusters and the map, tagging nodes, grouping and labeling clusters, and the like. In an embodiment, a machine learning labeler may suggest cluster labels. The user-generated labels may be fed into the machine learning facility used to label clusters for the social media maps. The focus of the self-service tool may be on actions that strategically build a user's network and strategically message to components of the network. CFIs can be used to determine a similarity among maps, so that an existing social media map that is similar to the self-service map may be recommended for review.
[0228] Social media maps may be used to enable users to strategically message components of their network. In an example, a social media map may be created for the Twitter™ followers of a live entertainment company. Certain clusters relate to dense communities around particular st re or particular genres of music. For the live entertainment company, there are relatively few messages that it transmits that everyone in the map cares about; however, clustering in social media maps enables more discrete message targeting. If the company wants to use Twitter™ to get the word out about a country artist, for example, it can target only the country music cluster with its messaging. If the company wants to target only those nodes within the country music cluster that have the highest influence, CFI scores may be used to limit the messaging in order to maximize the impact on the cluster. Such discrete targeting may be particularly useful where direct messaging to followers is limited.
[0229] Social media maps may be used to enable users to strategically build their network. For example, for the live entertainment company, the country music cluster may be growing in size. The social media map may be used to identify influential nodes for the country music cluster, such as by using segment CFI data to maximize connections with targeted segments/key influencers. Then, the user can start following those influential nodes in the hope that they will follow back. Such a process may help build the network in a desired strategic direction. Users may be able to see how they are doing against competitors for any given segment by examining the proportion of influencers (high-CFI targets), who may or may not be in the map, following them versus others.
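A hedged sketch of the targeting and competitive comparison in [0228] and [0229], assuming a segment's influencers are simply its top-k targets by CFI score and that follower sets are available for the user and for competitors; `top_influencers` and `influencer_share` are hypothetical helpers.

```python
def top_influencers(segment_cfi, k):
    """The k targets with the highest CFI score for one segment (cluster)."""
    return sorted(segment_cfi, key=segment_cfi.get, reverse=True)[:k]

def influencer_share(influencers, followers_of):
    """Proportion of a segment's influencers that follow each account.

    followers_of: account -> set of accounts that follow it (user and competitors).
    """
    influencers = set(influencers)
    if not influencers:
        return {account: 0.0 for account in followers_of}
    return {account: len(influencers & followers) / len(influencers)
            for account, followers in followers_of.items()}

country_cfi = {"@a": 9.0, "@b": 7.5, "@c": 3.0, "@d": 1.2}
targets = top_influencers(country_cfi, 3)          # candidates to message or follow
print(influencer_share(targets, {"us": {"@a", "@x"}, "rival": {"@a", "@b", "@c"}}))
```

In this toy example, one of the three country-segment influencers follows "us" while all three follow "rival", which suggests where follow-back outreach might be focused.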
[0230] In one embodiment, social media maps may be organized and navigated as a map of maps, where each map appears as a node on a larger map. The strength of the connection between two maps is the maximum of the ratios of how many nodes they share relative to the size of one map versus the other map. In navigating and searching the maps for a particular target, an indication may be given when a cluster in one map is very similar to a cluster in another map that may or may not be accessible to the user. For example, if one map relates to diabetes and another relates to obesity, a common cluster may be groups actively modifying lifestyles to avoid both pathologies. In embodiments, the system may provide an interface from the search screen with which the user may purchase a map they do not currently have access to.
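Reading the connection strength in [0230] as the larger of the two shared-node ratios gives max(|A ∩ B|/|A|, |A ∩ B|/|B|), which equals the overlap (Szymkiewicz–Simpson) coefficient |A ∩ B|/min(|A|, |B|). This reading and the function name are assumptions; a sketch:

```python
def map_connection_strength(nodes_a, nodes_b):
    """max(|A & B| / |A|, |A & B| / |B|): the larger of the two shared-node ratios."""
    a, b = set(nodes_a), set(nodes_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return max(shared / len(a), shared / len(b))

diabetes_map = {"u1", "u2", "u3", "u4"}
obesity_map = {"u3", "u4", "u5", "u6", "u7", "u8", "u9", "u10"}
print(map_connection_strength(diabetes_map, obesity_map))
```

The two maps share two nodes, so the ratios are 2/4 and 2/8 and the strength is 0.5; taking the maximum means a small map largely contained in a big one is still shown as strongly connected.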
[0231] In an embodiment, user segmentation may be used to find segments for targeting as customers. Maps may be automatically generated for the target customer, and conversion rates to paying customers may be tracked.
[0232] Described herein is a system for examining social media phenomena, such as hashtags, and how they spread in a network. Patterns of spreading may include salience, commitment, or a combination thereof termed resonant salience, where there is a burst of activity followed by a sustained commitment, or resonance, pattern. By combining temporal and network features in the analysis of the segments/clusters, chronotopes (i.e., patterns that exist in time and space) emerge.
[0233] In an embodiment, a timeline view may be used to examine messages across clusters. The timeline may include the chronotope as the drill-down. For example, a primary timeline may be organized in rows by grouping of clusters (e.g., similar clusters are assigned together into a group). There may be several bands for groups (e.g., things for which there is a CFI score). The timeline may be examined for objects of interest that have very high CFI scores at some point. One example may be a hashtag in a Twitter™ network. A dot may be placed at the point in time when the activity (attention) for that object of interest peaked (had the most citations, retweets, etc.). A dot may be placed in the macro timeline for the group (showing the peak points of all objects of interest) where the peaks were for each group (a group corresponds to a band below the main timeline). When the dot that corresponds to the peak of attention to an object of interest for a group/cluster is clicked, the chronotope is revealed. The chronotope for that object of interest may appear in a window below the timeline. The timeline view may include time on the X axis and groups/clusters on the Y axis. Peak interest points for objects may appear as dots at points in time corresponding to the groups that have interest. Clicking on that object reveals the chronotope for that object for all of those groups.
[0234] Interacting with data in the chronotope view may reveal what the object of interest is.
In some embodiments, a group of items may be selected at a time period for a certain cluster/group, and a word cloud or semantic analysis of proper nouns that appear in those items may be assembled.
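The data behind the timeline of [0233] can be sketched as a cluster-by-day count grid for each object of interest (its chronotope) plus, for each cluster, the day of peak attention, which is where a timeline dot would be drawn. Events are assumed to be (object, cluster, day) tuples with integer day indices; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def chronotope(events, obj):
    """Cluster-by-day attention counts for one object of interest (e.g. a hashtag).

    events: iterable of (object, cluster, day) tuples, one per mention or retweet.
    """
    grid = defaultdict(Counter)
    for o, cluster, day in events:
        if o == obj:
            grid[cluster][day] += 1
    return grid

def peak_points(events, obj):
    """Timeline dots: cluster -> (day of peak attention, count); ties go to the earlier day."""
    return {
        cluster: max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        for cluster, counts in chronotope(events, obj).items()
    }

events = [("#x", "c1", 1), ("#x", "c1", 2), ("#x", "c1", 2),
          ("#x", "c2", 5), ("#y", "c1", 1)]
print(peak_points(events, "#x"))
```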
[0235] Social media sites enable users to engage in the spread of contagious phenomena: everything from information and rumors to social movements and virally marketed products. For example, Twitter™ has been observed to function as a platform for political discourse, allowing political movements to spread their message and engage supporters, and also as a platform for information diffusion, allowing everyone from mass media to citizens to reach a wide audience with a critical piece of news. Different contagious phenomena may display distinct propagation dynamics; in particular, news may spread differently through a population than other phenomena. Described herein is a system for classifying contagious phenomena based on the properties of their propagation dynamics, by combining temporal and network features. Methods and systems described herein are designed to explore the propagation of contagious hashtags in two dimensions: their dynamics, that is, the properties of the time series of the contagious phenomena, and their dispersion, that is, the distribution of the contagious phenomena across communities within a population of interest. Further described is a method for simultaneously visualizing both the dynamics and dispersion of particular contagious phenomena. Using this method, chronotopes of particular contagious phenomena, or persistent patterns across time and network structure, may emerge and help build a taxonomy for contagious phenomena in general.
[0236] Given some contagious phenomenon p, p may be considered to have spread to user u the first time that u engages with p. For simplicity, engagement is measured as mentioning the phenomenon. For news, mentioning is likely a sufficient form of engagement, while for a political movement, stronger evidence of engagement may be preferable (contributing money, attending a rally, etc.). However, in social media sites, higher levels of mentioning often correlate with higher levels of engagement (e.g., users tweet about a political rally), while false indicators of engagement are rare: if a user wishes to mention a political movement to disagree with it, she will often not use a tag or specific name referring to that movement, but a variant of it (e.g., a Twitter™ user who wants Vladimir Putin out of power may use the tag #Putinout instead of Putin when tweeting about the prime minister and future Russian president). Therefore, the number of first mentions of p by users in some social media site is used as a proxy for the number of users that p has spread to.
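A sketch of the first-mention proxy, assuming the mentions of one phenomenon p arrive as (user, day) pairs. The number of distinct users is the number of users p has spread to, and the per-day counts of first mentions are the input to the peak measures that follow. The function names are hypothetical.

```python
from collections import Counter

def first_mentions(mentions):
    """user -> day of that user's first mention of p (when p 'spread' to the user).

    mentions: iterable of (user, day) pairs for one phenomenon p, in any order.
    """
    first = {}
    for user, day in mentions:
        if user not in first or day < first[user]:
            first[user] = day
    return first

def first_mentions_by_day(mentions):
    """Number of users p spread to on each day (new adopters per day)."""
    return Counter(first_mentions(mentions).values())

mentions = [("u1", 3), ("u2", 1), ("u1", 2), ("u3", 5), ("u2", 4)]
adopters = first_mentions(mentions)
print(len(adopters), first_mentions_by_day(mentions))
```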
[0237] In an embodiment, measures for characterizing contagious phenomena propagating on networks may include peakedness, commitment (such as by subsequent uses and by time range), and dispersion (including normalized concentration and cohesion).
[0238] The peakedness of a contagious phenomenon is a scale-invariant measure of how concentrated that phenomenon is in time. A peak may be defined as a day-long period where total first mentions by day lie two standard deviations above the median first mentions. The specific duration of the peak window and the required deviation can be varied to maximize usefulness for particular kinds of phenomena and for particular social media networks. The median may be used instead of the mean because, due to the skewed distribution of first mentions by day for most contagious phenomena, the mean is over-inflated. Contagious phenomena with short lifespans tend to have a sharp peak, when a large number of people mention the phenomenon, but the number of mentions is very small on either side of the peak. In contrast, long-lifespan contagious phenomena tend to grow slowly, with a less pronounced peak of mentions. The peakedness of a contagious phenomenon is the fraction of all engagements with that phenomenon that occur on the day with the most engagements with that phenomenon. A high peakedness means that most of the network's engagement with the phenomenon (e.g., for a social network, people in the network mentioning it) occurs within a short span of time, typically hours to days. In contrast, a low peakedness means that the network's engagement with the phenomenon is spread over a long period of time, typically weeks to months. Phenomena with high peakedness, such as news stories, may propagate rapidly through the network and then dissipate just as rapidly in the course of the daily news cycle.
Phenomena with low peakedness may include popular websites and videos, which may maintain a slow but steady rate of engagement: individuals in the network are constantly discovering these phenomena, even as others get tired of them and stop engaging.
[0239] Commitment is the measure of the average scope of engagement with a particular contagious phenomenon by nodes in the network, or the staying power of a phenomenon. Using the example of people engaging with online content in a social network, the commitment to a particular piece of online content can be the average scope of mentions of that content by people in the network. This measure would, for example, differentiate between a political movement that is just a fad and one that accumulates a number of diehard supporters who keep the movement alive. Scope may be measured in at least two ways, which leads to the following two sub-measures: Commitment by Subsequent Uses and Commitment by Time Range. In social media sites, the cost in terms of time and effort to mention something for the second or third or tenth time is relatively small; therefore, for a second dimension, two quantities may be defined: first, the average number of subsequent mentions (all mentions excluding a user's first mention of the phenomenon) of a contagious phenomenon among the adopting users; and second, the average time difference (in days) between the first and last mention of the phenomenon among the adopting users. While the first measure, "Commitment by Subsequent Uses," is relatively easy to inflate by mentioning the phenomenon multiple times in a short period, the second measure, "Commitment by Time Range," indicates long-term commitment to mentioning the phenomenon by a set of users.
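Both peak-related quantities of [0238] can be sketched as below: peakedness as the busiest day's share of all engagements, and peak detection as the days whose first-mention count exceeds the median by two standard deviations. The source does not say whether a population or sample standard deviation is meant; the population version (`statistics.pstdev`) is an assumption here, as are the function names.

```python
import statistics
from collections import Counter

def peakedness(engagement_days):
    """Fraction of all engagements that fall on the single busiest day."""
    per_day = Counter(engagement_days)
    total = sum(per_day.values())
    return max(per_day.values()) / total if total else 0.0

def peak_days(daily_first_mentions, n_std=2.0):
    """Indices of days whose first-mention count lies more than n_std standard
    deviations above the median. Include zero-mention days in the list."""
    median = statistics.median(daily_first_mentions)
    std = statistics.pstdev(daily_first_mentions)  # population std: an assumption
    threshold = median + n_std * std
    return [day for day, count in enumerate(daily_first_mentions) if count > threshold]

print(peakedness([1, 1, 1, 1, 1, 1, 2, 3, 4, 5]))   # day 1 holds 6 of 10 engagements
print(peak_days([1, 1, 1, 1, 20, 1, 1, 1, 1, 1]))
```

Anchoring the threshold at the median rather than the mean keeps a single burst from inflating the baseline, which is the rationale given in [0238].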
[0240] Commitment by Subsequent Uses is the average number of subsequent engagements with a phenomenon after a node's first engagement. For instance, if each person in a social network played an online game at most once, Commitment by Subsequent Uses for that game would be zero. In contrast, if just one percent of the people in a social network played an online game thirty times each, Commitment by Subsequent Uses for that game would be twenty-nine, because the average is taken over the people who engaged. Phenomena with high Commitment by Subsequent Uses may include online games, which encourage repeat engagements. Other phenomena with high Commitment by Subsequent Uses may include astroturfed content, where a third party may encourage repeated interest in the content by paying or otherwise endorsing people who engage with it.
[0241] Commitment by Time Range is the average time period between the first and last engagement with a phenomenon by nodes in the network, measured over some large time window (e.g., a year). For example, if each person in a social network read articles on a blog ten times over the course of one day and never visited it again, Commitment by Time Range for that blog would be one day. However, if just one percent of the people in a social network read articles on a blog once every week for ten weeks and then abandoned it, Commitment by Time Range for that blog would be ten weeks. Phenomena with high Commitment by Time Range include blogs with loyal followers who keep coming back for more content. Phenomena with low Commitment by Time Range include news stories that, on average, a person reads only once and never sees again.
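The two commitment sub-measures can be computed together from the same (user, day) mention pairs, averaging only over adopting users as [0239] specifies. Integer day indices and the function name are assumptions.

```python
from collections import defaultdict

def commitment(mentions):
    """Return (Commitment by Subsequent Uses, Commitment by Time Range in days).

    mentions: iterable of (user, day) pairs for one phenomenon, one per mention.
    Both averages are taken over adopting users only (users with >= 1 mention).
    """
    days_by_user = defaultdict(list)
    for user, day in mentions:
        days_by_user[user].append(day)
    if not days_by_user:
        return 0.0, 0.0
    n = len(days_by_user)
    subsequent = sum(len(days) - 1 for days in days_by_user.values()) / n
    time_range = sum(max(days) - min(days) for days in days_by_user.values()) / n
    return subsequent, time_range

# One adopter who plays a game thirty times on thirty consecutive days.
print(commitment([("u1", day) for day in range(30)]))
```

Because non-adopters never appear in the mention data, the "one percent of the people" in the game example does not dilute the average, which is why that example yields twenty-nine.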
[0242] In addition to measuring the dynamics of contagious phenomena (the properties of the time series of engagements with a phenomenon), the dispersion of contagious phenomena (the properties of the distribution of a contagious phenomenon throughout a population) may be measured. Dispersion is a measure of the distribution of engagements with a contagious phenomenon over the network through which it propagates. Phenomena that are highly dispersed are broadly popular but may have less focused engagement from a particular group; phenomena that are not dispersed are not broadly popular but may have focused engagement from a particular group. There are many ways of measuring the distribution of engagements with a phenomenon over a network, including the following two sub-measures: Normalized Concentration and Cohesion.
[0243] The Normalized Concentration of a contagious phenomenon presupposes a partition of the underlying network into discrete clusters, which usually represent communities. Given such a partition, the Normalized Concentration of a contagious phenomenon is the fraction of all engagements that come from the cluster that engages most with the phenomenon, or the Majority Cluster. For instance, if a social network were divided into two clusters, one of which engaged with a particular news story nine times and the other only once, the Normalized Concentration for that phenomenon would be 0.9. However, if both clusters had engaged with the story five times, the Normalized Concentration for that phenomenon would be 0.5. Phenomena with high Normalized Concentration tend to be the cause célèbre of a particular community, e.g., political and social movements that have not gained wide traction. Phenomena with low Normalized Concentration may include headline news stories that touch many communities at once. Depending on the size of individual communities, Concentration may or may not correlate inversely with popularity.
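Normalized Concentration reduces to the Majority Cluster's share of engagements. A sketch, assuming each engagement has already been tagged with the cluster of the engaging user; the function name is hypothetical.

```python
from collections import Counter

def normalized_concentration(engagement_clusters):
    """Fraction of all engagements that come from the Majority Cluster.

    engagement_clusters: the cluster id of the user behind each engagement.
    """
    counts = Counter(engagement_clusters)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0

print(normalized_concentration(["A"] * 9 + ["B"]))      # the nine-versus-one example
print(normalized_concentration(["A"] * 5 + ["B"] * 5))  # the five-versus-five example
```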
[0244] In addition to Normalized Concentration, some aspect of the connections between the engaged users may be measured. For example, it is possible that a contagious phenomenon is widely spread across a number of communities but diffuses only through strong ties, so that the engaged users form a clique. Conversely, it is possible that a contagious phenomenon is confined to a single community but spreads through weak ties, and the engaged users are sparsely interconnected. Therefore, a measure of Cohesion may be defined as the network density over the subgraph on all users engaged in a particular contagious phenomenon. Contagious phenomena that spread over strongly connected sets of users will have a Cohesion close to one, whereas phenomena that spread over weakly connected sets of users will have a Cohesion close to zero. The Cohesion of a contagious phenomenon is the network density of the subgraph of all nodes engaging with the phenomenon. The network density of a graph is the total number of actual connections between nodes in the graph divided by the total possible number of connections (usually n(n-1)/2 for undirected graphs, where n is the number of nodes in the graph). For example, if only three people read a particular blog, but all those people knew each other, the Cohesion of that blog would be 1.0. In contrast, if ten people read a particular blog, but every one of those ten people knew exactly two of the others (the people were connected in a circle graph), the Cohesion of that blog would be 10/(10 × 9/2) = 10/45 ≈ 0.22. Phenomena with high Cohesion may include stories and memes that propagate in an "echo chamber" of people who already know each other and engage with similar kinds of online content. Phenomena with low Cohesion include news and rumors that move between acquaintances, such that, for example, after multiple propagations, the person who hears the rumor and the person who started it may be total strangers.
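Cohesion is the density of the subgraph induced by the engaged users. A sketch, assuming an undirected edge list for the full network; the function name is hypothetical, and returning 0.0 for fewer than two engaged users is an assumed convention, since density is undefined there.

```python
def cohesion(engaged, edges):
    """Density of the subgraph induced by the engaged users.

    engaged: users who engaged with the phenomenon
    edges:   iterable of undirected (u, v) connections in the full network
    """
    engaged = set(engaged)
    n = len(engaged)
    if n < 2:
        return 0.0
    # Count each undirected connection among engaged users once.
    actual = {frozenset((u, v)) for u, v in edges
              if u != v and u in engaged and v in engaged}
    return len(actual) / (n * (n - 1) / 2)

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
circle = [(i, (i + 1) % 10) for i in range(10)]
print(cohesion({"a", "b", "c"}, triangle))   # three readers who all know each other
print(cohesion(set(range(10)), circle))      # ten readers in a circle: 10/45
```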
[0245] In embodiments, phenomena with high Peakedness tend to have low Commitment, making those two measures a natural pair for comparing different online phenomena. For example, FIG. 18 depicts Commitment by Time Range on the Y axis and Peakedness on the X axis for two different sets of data depicted by different icons. In this example, the two datasets are: (1) 112 bundled hashtags relating to specific topics, shown in red or as icon #1, and (2) a baseline dataset of the top 500 hashtags for all users, shown in black or as icon #2. The bundled hashtags display a generally lower level of Commitment by Time Range than the top 500 hashtags at the same level of Peakedness. Some of the top 500 hashtags have extreme levels of Commitment, up to 150 days. Hashtags with the highest levels of Commitment are of several sorts, which notably include regional-location tags, tags for particular sports, religion tags (e.g., "Catholic," "Jewish"), tags for particular news outlets, and general tags related to investing and financial markets. Intuitively, all of these are topics that might engage a stable set of users over a long time.
[0246] Referring to FIG. 19, and in an example dealing primarily with topics related to Russia, Peakedness is plotted for the bundled hashtags against both levels of Commitment: subsequent uses (FIG. 19a) and time range (FIG. 19b). In FIG. 19a, there are several distinct regions of the distribution. On the bottom right, hashtags with high Peakedness and low Commitment by Subsequent Uses are all directly related to salient news events, which in this case are the airport and metro bombings in Russia (#Domodedovo, #explosion, #metro29, #Moscow29). On the bottom left, hashtags with low Peakedness and low Commitment by Subsequent Uses are generally not very popular. Some of them are very generic (#moscow, #metro), and some just never had a peak nor became adopted by a committed user base. Some of these are tags that are similar to popular tags but reflect less-used variations. On the top left, hashtags with low Peakedness and high Commitment by Subsequent Uses are all regional hashtags (with the exception of the Nashi hashtag, which refers to a pro-government political youth movement in Russia). These regional hashtags were tangentially related to the forest fire events, but their main use is likely in talking about local affairs, hence the high commitment of a few users. Finally, on the top right, there are a number of hashtags with both high Peakedness and high Commitment by Subsequent Uses. These tend to be pro-government political hashtags (#sRu and #GoRu are both related to Medvedev's policy of modernization, while #rospioner and #seliger are both related to the Seliger youth camp). This observation suggests that pro-government political hashtags have some event (such as the Seliger camp) that is linked to a sudden burst of popularity, but subsequent to that event, users continue to include the hashtag in their tweets. This suggests that pro-government political hashtags may have "staying power" in the Russian Twitter™ community.
Alternatively, or in combination with this, a committed set of users may use the pro-government hashtag both before and after the event, perhaps in an organizational or mobilizing capacity.
[0247] In contrast, and referring to FIG. 19b, some of the same clustering seen in FIG. 19a is depicted: news is on the bottom right and regional hashtags are on the top left, but the top-right group dominated by pro-government hashtags has moved down, indicating that these hashtags do not have staying power over long periods of time; they may be mentioned multiple times, but within a relatively short time range around the peak (days or weeks, not months). Instead, the hashtags on the top right in FIG. 19b are the regional hashtag #Moscow and the political hashtag #Putinout (referring to the anti-Putin movement). It is important to note that #Putinout in particular has relatively long temporal staying power (an average of 50 days between first and last mention by a user in the dataset) but relatively short staying power by mentions (an average of fewer than six subsequent mentions).
[0248] Referring to FIG. 20 and FIG. 21, measures of dispersion of hashtags are analyzed across a core set of Twitter™ users. In FIG. 20, the distribution of Normalized Concentration by hashtag is plotted within each of nine topics. Comparing across all nine topics allows distinctive patterns to emerge: the minimum Concentration among pro-government hashtags in the Seliger and modernization topics is between 0.3 and 0.4, whereas the maximum Concentration among opposition hashtags in the Kashin and Russian Drivers' Movement topics is between 0.4 and 0.5. Pro-government hashtags are, on the whole, more concentrated within one cluster than opposition hashtags. Hashtags related to news events, such as the Moscow Metro Bombing and the Domodedovo attack, tend to be diffuse, which is in line with the intuition that major news events tend to engage the population as a whole rather than specific communities.
[0249] In FIG. 21, the distribution of Cohesion by hashtag is plotted within each of the nine topics. For ease of visualization, the distribution plots are cut off at 0.2, and all hashtags with Cohesion > 0.2 are assigned a value of 0.2. Again, there is a contrast between opposition hashtags, which have extremely small Cohesion of 0.03 and below, and some pro-government hashtags (especially those in the Seliger and modernization topics), which have the much higher Cohesion of 0.1-0.30. Curiously, a few news-related hashtags have very high Cohesion, which suggests that some news-related hashtags may spread through strong ties.
[0250] FIGS. 18 through 21 provide a high-level analysis of hashtag diffusion in the Russian-speaking Twitter™ community, from both the temporal and the spatial (network) perspective. However, this analysis necessarily leaves out the idiosyncrasies of individual hashtags. Referring now to FIG. 22a, FIG. 22b, and FIG. 22c, chronotopes of the #metro29 (a), #samara (b), and #iRu (c) hashtags are depicted. In typical chronotope images, color indicates cluster group, and color brightness indicates volume of engagements. Detailed analysis of individual contagious phenomena makes it possible to cross their dimensions of dynamics (loosely, temporal properties) and dispersion (loosely, spatial properties). Therefore, spatiotemporal analyses of contagious phenomena, such as hashtags, may be constructed, and patterns in their diffusion across time and space may be discovered. Such patterns may be called the chronotopes of the hashtags. A chronotope is simply a pattern that persists across a spatiotemporal context; the term was originally used in literary theory to describe genres or tropes.
[0251] In order to discover hashtag chronotopes, the diffusion of individual hashtags is visualized both across different communities and across time. First, a particular hashtag is selected, and the set of engagements of Twitter™ users with this hashtag is binned by day. Next, for each day, the volume of engagements for that day is broken down by cluster group. Finally, a grid is created in which columns correspond to cluster groups and rows correspond to days. Each row-column cell of the grid is filled with a color corresponding to the cluster group. A cue as to the volume of engagements corresponding to a particular cell is given via the brightness of the color: the brighter the cell, the more engagements with the hashtag on that day from that cluster group. Black cells correspond to days when a particular cluster group has no engagements with the hashtag.
[0252] FIG. 22 shows three such visualizations: the #metro29 hashtag, related to the Moscow Metro bombings on Mar. 29, 2010; the #samara hashtag, related to the Russian city of Samara; and the #iRu hashtag, related to President Dmitri Medvedev's policy of modernizing Russia. These three visualizations display three distinctive patterns across space and time. #metro29, in FIG. 22a, has a "salience" chronotope, with engagements across the spectrum of cluster groups during the week around March 29. In contrast, #samara, in FIG. 22b, has a "resonance" chronotope, with consistent engagements from the local cluster group, presumably residents of Samara talking about their city. Finally, #iRu, in FIG. 22c, has a "resonant salience" chronotope, with an initial cross-group burst of activity in late November 2010 (around the time of Medvedev's announcement of his new policies), followed by consistent engagements from the Pro-Government cluster group over the next month. Note that FIG. 22 does not contradict FIG. 19b, which suggests that pro-government hashtags have low staying power, but instead presents a more subtle picture: the cluster group of pro-government users remains active in the #iRu hashtag over the course of a month, but, as FIG. 19b indicates, individuals within that cluster rarely carry on with adoptions for more than 5 days. There may be a high turnover of users of the #iRu hashtag, with new enthusiasts coming in even as the original adopters lose interest in the topic.
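The grid construction described in [0251] can be sketched as follows. This is a minimal illustration, not the platform's renderer: it assumes engagements arrive as hypothetical `(day, cluster_group)` pairs and that brightness scales linearly with volume relative to the busiest cell; the source states only that brighter cells mean more engagements and that empty cells are black.

```python
from collections import Counter


def chronotope_grid(engagements, days, groups):
    """Build a day-by-cluster-group grid of engagement volumes for one hashtag.

    engagements: iterable of (day, cluster_group) pairs, one per engagement
    days: ordered list of day labels (grid rows)
    groups: ordered list of cluster-group labels (grid columns)
    Returns (counts, brightness); brightness scales each cell linearly to
    [0, 1] relative to the busiest cell (an assumption), and 0.0 marks a
    black cell with no engagements.
    """
    # Bin engagements by (day, cluster group).
    binned = Counter(engagements)
    counts = [[binned.get((day, group), 0) for group in groups] for day in days]
    peak = max((c for row in counts for c in row), default=0)
    brightness = [[c / peak if peak else 0.0 for c in row] for row in counts]
    return counts, brightness
```

A salience chronotope would show up as one bright row spanning many columns; a resonance chronotope as one column lit across many rows.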
[0253] In embodiments, phenomena with the Salience Chronotope tend to have high Peakedness and low Commitment, while phenomena with the Resonance Chronotope tend to have low Peakedness and high Commitment by Time Range. Phenomena with the Resonant Salience Chronotope tend to have both high Peakedness and high Commitment by Time Range.
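The tendencies in [0253] suggest a simple heuristic labeling rule. The sketch below is illustrative only: the source describes tendencies rather than a classifier, and the two thresholds are hypothetical cut-offs that an analyst would have to choose for a given dataset.

```python
def classify_chronotope(peakedness, commitment_time_range,
                        peak_threshold, commitment_threshold):
    """Map (Peakedness, Commitment by Time Range) to a chronotope label.

    Thresholds are hypothetical, analyst-chosen cut-offs; the source gives
    no numeric boundaries.
    """
    high_peak = peakedness >= peak_threshold
    high_commit = commitment_time_range >= commitment_threshold
    if high_peak and high_commit:
        return "resonant salience"
    if high_peak:
        return "salience"
    if high_commit:
        return "resonance"
    # Low on both measures: no named chronotope in the source.
    return "unclassified"
```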
[0254] In an embodiment, a flexible algorithm may be used for optimizing a targeted network influence campaign. For example, a user may have a high CFI score but may not message their social networks frequently; targeting such individuals may therefore not optimize the campaign. The algorithm may output an M score, which may be calculated from a CFI score plus some other network or behavioral metric. In embodiments, wherever use of the CFI score is described, the M score may instead be used to maximize campaign effectiveness. In embodiments, the M score may be an interpolation of the number of followers of the target item (influence) and the CFI score of the target item (specificity). This calculation may result in a score normalized on a scale, such as a scale from 1 to 10, where 1 is low impact and 10 is high impact. Thus, the M score is a general measure of influence and specificity.
[0255] One way to calculate the M score is to combine CFI and count in a formulaic way, where count is the overall number of members on the map that have engaged with that target. The formula is:

$$M = \alpha \cdot \mathrm{count} + (1 - \alpha) \cdot \mathrm{CFI}$$

normalized to a scale of 1 to 10.
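A minimal sketch of the M score blend follows. The source states only that the blended result is normalized to 1 to 10; because count and CFI live on very different scales, this sketch assumes (as a labeled assumption) that each is min-max scaled across the candidate set before mixing, so that `alpha` behaves like the niche/broad slider described in [0256]. The `candidates` dictionary layout is hypothetical.

```python
def m_scores(candidates, alpha):
    """Blend reach (count) and specificity (CFI) into a 1-10 score per target.

    candidates: dict mapping target id -> (count, cfi)   (hypothetical layout)
    alpha: slider position in [0, 1]; 0 = "niche" (CFI only), 1 = "broad" (count only)
    Assumption: both inputs are min-max scaled to [0, 1] across the candidate
    set before blending, so neither dominates merely because of its units.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not candidates:
        return {}

    def minmax(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    ids = list(candidates)
    counts = minmax([candidates[i][0] for i in ids])
    cfis = minmax([candidates[i][1] for i in ids])
    # Blend in [0, 1], then map linearly onto the 1-10 display scale.
    return {i: 1 + 9 * (alpha * c + (1 - alpha) * f)
            for i, c, f in zip(ids, counts, cfis)}
```

With `alpha=1.0` the widely followed target scores 10 and the niche one 1; with `alpha=0.0` the ranking flips, mirroring the two ends of the slider.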
[0256] In embodiments, the M score may be user-tunable, so that there is a choice to prioritize "segment specificity" vs. "global footprint" and/or "network position" vs. "behavioral profile" (e.g., someone who retweets frequently) when selecting behavioral and/or network metrics to calculate the M score. In an embodiment, for example, a slider 2902 may be provided so that users can select a target that is more niche or more global. The M score thus enables optimizing a campaign on network position or on behavior. If the slider is dragged towards "niche," alpha approaches zero and the M score is nearly equivalent to the CFI score of the target item alone (high specificity). If the slider is dragged towards "broad," alpha approaches 1, so that the M score is nearly equivalent to the number of followers of the target item alone (high influence). Setting the slider somewhere between "niche" and "broad" allows users to tune the set of individuals/entities that they want to target.
[0257] In an embodiment, direct ad placement may be enabled by CFI scores/M scores. Using CFI scores and/or M scores, a list of targets/websites may be created, and ads may be placed directly on the targets/websites via integration with various products, such as Twitter™ sponsored tweets, Facebook™ ad exchange, Google™ AdSense/AdWords, third-party online ad networks, and the like.
[0258] Referring now to FIG. 24, a recent activity page of a social media map platform provides recent activity, such as new followers, new influencers following the user, an indication of any retweets including the number of people who have retweeted an item, changes to the user's cluster groups with links to respective group overview screens, a list of new influencers including their cluster group and their number of followers, the current conversation leaders including their cluster group and their number of followers, a view of all media being shared in the network including the latest influential media and the segments in which the media is influential, links to an overview page, links to a lists page, links to a help and support page, and the like. The user may continue to their map from this screen. Graphics, such as a bar graph, may be included in the cluster-group changes box to indicate the number of users in each cluster group. Graphics, such as a bubble chart, may also be included in the media box to indicate the size of the segments in which the displayed latest media is influential.
[0259] Referring now to FIG. 25, another example of a recent activity page of a social media map platform is shown. In this example, the page shows new followers, with new influencers included in the follower count; group changes, including a percent change for each cluster group; and information on new influencers, such as their name, handle, number of tweets, number of followers, number of people they are following, and a button to message or follow them. Also on this page are trending terms/URLs, including the number of mentions of a hashtag that is related to the user, trending media and imagery, and latest influencer tweets. Icons may be provided to reply, retweet, favorite a tweet, share or embed a tweet, and the like.
[0260] Referring now to FIG. 26, an overview page is shown. The overview page includes a table of cluster groups, the number of members in each group, the power of the cluster, and the tweet activity. A power score is an indication of which segment is worth engaging with and may indicate which segments are most dense and represent the greatest signal of interest. In one embodiment, power may be calculated based on network density: the number of connections divided by the number of possible connections. In another embodiment, power is calculated based on coordinates, such as the average distance from the center of a cluster map. In another embodiment, power may be calculated as the average distance from the centroid of the cluster that emerges in the clustering computation. In embodiments, power is the segment-level analogue of the M score.
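Two of the power-score variants in [0260] can be sketched directly. This assumes a directed network (so n nodes have n(n-1) possible connections, self-loops excluded) and hypothetical 2-D map coordinates for cluster members; how a centroid distance is turned into a higher or lower power score is not specified in the source, so the function returns the raw average distance.

```python
import math


def density_power(n_nodes, edges):
    """Directed network density: distinct edges / possible edges n(n-1).

    edges: iterable of (source, target) pairs within the segment; self-loops
    are assumed absent.
    """
    possible = n_nodes * (n_nodes - 1)
    return len(set(edges)) / possible if possible else 0.0


def centroid_power(coords):
    """Average Euclidean distance of cluster members from the cluster centroid.

    coords: list of (x, y) map positions for the cluster's members
    (hypothetical layout coordinates). Smaller values mean a tighter cluster.
    """
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return sum(math.hypot(x - cx, y - cy) for x, y in coords) / len(coords)
```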
[0261] Continuing with the page in FIG. 26, an individual cluster may be selected, and a representation of that cluster in a map may be highlighted. For example, the UK design cluster has been highlighted, and a dialog box appears showing more information about the individual group, including number of members and graphics depicting the power and tweet activity associated with the group. When the user clicks the "Read more" link, a box may appear with more information. The map and group information items may remain in a fixed position, visible while the page scrolls. Selecting "clear" on the page overview clears the selected row and makes all map nodes visible. An alarm icon on the overview page allows the user to review all recent activity, including number of tweets from various members of the network. Selecting "view full-screen map" sends the user to a screen such as that shown in FIG. 27. Referring now to FIG. 27, a full-screen map is displayed. In this map, the international cluster has been selected, and within it the South America sub-cluster. The colored nodes in the map may indicate one or both of the selected clusters and sub-clusters. The influencers in a particular sub-cluster may be viewed, and when an influencer is selected, the URLs associated with that influencer may be shown. A node overview may appear including the influencer name, their handle, their location, their URL, when they joined the social network, their number of tweets, their number of followers, the number of people they are following, the groups they are linking in, the number of in-links in each group, as well as any other relevant information.
[0262] Referring now to FIG. 28, an embodiment of an overview page is shown. In this view, a segment or cluster has been selected, and data regarding that segment is displayed, such as key influencers, current conversation leaders (mentions), an interactive map, key photos and videos or other media, key tweets/retweets, key websites, key content, latest conversation terms, and the like. Effectively, this page shows an enhanced version of cluster-focused data and makes it more accessible. The power score for the segment is displayed, as well as an icon from which the user may take certain actions, such as build their network, find content, find media, find tweets, message followers, launch a Twitter™ campaign, launch a Facebook™ campaign, launch a mobile campaign, launch a social media campaign, launch an AdWords campaign, launch an advertisement campaign, and the like. The overview page may be a user interface. Notifications of certain data, and data presentation, may be made in the user interface, which may be implemented by software and embodied in a tangible medium, such as a mobile device, smartphone, tablet computer, or the like. The user interface may be a touchscreen embodiment, such that to utilize the user interface, a user is required to touch the screen of the device displaying the user interface. The user interface may be accessible on different computing devices and capable of dynamically accessing user-specific data stored on a network server and/or local device.
[0263] Referring now to FIG. 29, the "influencers" tab has been invoked. Various ways to filter the influencers are provided, such as by follower status (all followers, follows the user, does not follow the user) or by following status (show all, the user follows, the user does not follow). Influencers may also be filtered by M score, follower count, mentions, name, screen name, and the like. One way to filter by M score is by use of a slider 2902 to obtain more niche or broader individuals/entities, as described elsewhere herein. Another way to filter individuals/entities may be by their exposure to particular content.
By utilizing this filter, the user may target individuals/entities who have not already been exposed to the content. Users may take action from this page, such as to follow selected individuals/entities, save individuals/entities to a Twitter™ list, create a new list, add a selection to a list, send a direct message, send a sponsored tweet, and the like. When saving individuals/entities to a Twitter™ list, a dialog box may appear with list choices for the user, such as a list for my influencers following me, a list for my influencers not following me, a branding group, and the like. In this example, one action being taken is to follow seven new users. By following individuals/entities and engaging in behaviors that might make them aware of the user, the user's network may potentially expand to include the newly followed individuals/entities. Another action that is taken is to compose a message. The compose message screen may include suggested content, such as most used hashtags or other media based on a CFI, popular terms, key content such as high M score media, and the like. Influencer information may be leveraged in determining whom to message. The suggested content may be filtered by the exposure of target individuals/entities to the content. Data related to the content, such as its peakedness, first appearance, and the like, may be exposed to the user so that the user can decide whether it makes sense to share the content with other individuals/entities. Referring to FIG. 30, users may be able to drill down to the individual influencer level to see in what other segments/clusters the individual is influential, their latest tweets, M score, number of tweets, number of followers, number following, footprint, following/follower status with respect to the user, demographic information, URL, and the like. Icons may be available to follow, act (i.e., add the person to a list, retweet their latest tweet, send a direct message, etc.), view a social media profile, and the like.
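The influencer filters described for FIG. 29 (follower status, M score, and prior exposure to content) compose naturally. The sketch below is hypothetical: the `Influencer` record, its field names, and the sort order are illustrative choices, not the platform's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Influencer:
    """Hypothetical influencer record; field names are illustrative only."""
    handle: str
    m_score: float
    follows_user: bool


def filter_influencers(influencers, exposed, follower_status="all", min_m_score=1.0):
    """Keep influencers matching follower status and M score who have not seen the content.

    follower_status: "all", "follows", or "not_follows"
    exposed: set of handles already exposed to the content being promoted
    """
    if follower_status not in ("all", "follows", "not_follows"):
        raise ValueError(f"unknown follower_status: {follower_status}")
    kept = []
    for inf in influencers:
        if follower_status == "follows" and not inf.follows_user:
            continue
        if follower_status == "not_follows" and inf.follows_user:
            continue
        # Drop low scorers and anyone already exposed to the content.
        if inf.m_score < min_m_score or inf.handle in exposed:
            continue
        kept.append(inf)
    # Highest M score first, as a plausible default ordering.
    return sorted(kept, key=lambda inf: inf.m_score, reverse=True)
```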
[0264] Referring now to FIG. 31, a tab for conversation leaders is displayed. Various ways to filter the conversation leaders are provided, such as by follower status (all followers, follows the user, does not follow the user) or by following status (show all, the user follows, the user does not follow). Conversation leaders may also be filtered by peak date, such as all, today, past week, past month, custom date range, and the like, or by M score, follower count, mentions, peak, peakedness, name, screen name, and the like. Another way to filter conversation leaders may be by their exposure to particular content. By utilizing this filter, the user may target individuals/entities who have not already been exposed to the content. Users may take action from this page, such as to follow selected individuals/entities, save individuals/entities to a Twitter™ list, create a new list, add a selection to a list, send a direct message, send a sponsored tweet, and the like.
[0265] Referring now to FIG. 32, a tweets tab is displayed. The tweets may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like. The tweets may be filtered by M score, retweets, original post date, peak, peakedness, name of poster, screen name of poster, and the like. One way to filter by M score is by use of a slider to obtain an audience that is more niche or broader, as described elsewhere herein. Data regarding each displayed tweet may include an M score, the number of influential retweets, the number of retweets, the posted date, the peak date, a graphic of the peak pattern, icons with which to take action such as reply/retweet/favorite, name, screen name, and the like. Selecting one of the tweets may cause a drill-down box to appear with additional information about the individual/entity who made the tweet, such as M score, number of tweets, number of followers, number following, footprint, number of friends, follower/following status, demographic data, URL, which segments the individual/entity is retweeting in, who they have been retweeted by, icons to social media profiles, icons with which to take actions such as reply/retweet/favorite/add to list, and the like.
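The peak-date filters (all, today, past week, past month) recur across the tabs. The source does not define "peak date"; this sketch assumes it is the day with the most mentions (earliest day on ties) and that "past week" and "past month" mean the 7 and 30 days ending today. Item ids and the input layout are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def peak_date(mention_dates):
    """Day with the most mentions (earliest day wins ties); None if no mentions."""
    counts = Counter(mention_dates)
    if not counts:
        return None
    return min(counts, key=lambda d: (-counts[d], d))


def filter_by_peak(items: dict, today: date, window: str = "all") -> dict:
    """Keep items whose peak date falls in a window ending at `today`.

    items: dict mapping item id -> list of datetime.date mention dates
    window: "all", "today", "past_week" (7 days), or "past_month" (30 days, assumed)
    Returns a dict mapping kept item id -> its peak date.
    """
    spans = {"today": 1, "past_week": 7, "past_month": 30}  # days including today
    if window != "all" and window not in spans:
        raise ValueError(f"unknown window: {window}")
    kept = {}
    for item, dates in items.items():
        peak = peak_date(dates)
        if peak is None:
            continue
        if window == "all" or today - timedelta(days=spans[window] - 1) <= peak <= today:
            kept[item] = peak
    return kept
```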
[0266] Referring now to FIG. 33, a websites tab is displayed. The websites can be sorted by mentions, M score, subpages mentioned, hostname, and the like. One way to filter the websites by M score is by use of a slider to obtain an audience that is more niche or broader, as described elsewhere herein. Users may take action from this page, such as to buy an ad, create a new list, add a selection to a list, and the like. Selecting a website reveals a drill-down box for the website. Information about the website in the drill-down box may include M score, distinct mentions, mentions, subpages mentioned, excerpt, peak date, a graphic of the peak pattern, segments/clusters the website is mentioned in, who mentioned the website, latest tweets mentioning this URL, a button to take action, and the like.
[0267] Referring now to FIG. 34, a tab for key content may be displayed. Information about the key content included in this view includes the name of the website, name of an article, URL, peak date, a peak pattern, M score, citations, distinct citations, and the like. The key content may be sorted by M score, citations, peak, peakedness, host name, content title, and the like. One way to filter by M score is by use of a slider to obtain an audience that is more niche or broader, as described elsewhere herein. The key content may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like. Users may take action from this page, such as to compose a message, compose a tweet, view a drill-down box for the key content, and the like. In the compose message or compose tweet view, users may be able to select one or more individuals/entities and/or influencers/conversation leaders to message with suggested content (most used hashtags, popular terms, key content, etc.). In one embodiment, the individuals/entities may be part of a list, such that either certain members of the list or the entire list may be easily included as recipients of the message. Selecting a key content item reveals a drill-down box for the content. Information about the content in the drill-down box may include name of website, title of article, M score, distinct mentions, mentions, subpages mentioned, excerpt, peak date, a graphic of the peak pattern, segments/clusters the content is mentioned in, who mentioned the content, latest tweets mentioning this URL, most used hashtags, a button to take action (tweet this, use in direct message, add to list, etc.), and the like.
[0268] Referring now to FIG. 35, a media tab is displayed. Media may be filtered by images, videos, audio, GIFs, and the like. The media may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like. The media may be sorted by M score, citations, peak, peakedness, host name, content title, and the like. Information about the media in this view may include title, duration, media type, M score, mentions, distinct mentions, peak date, peak pattern, and the like. Selecting one of the media items may open a drill-down box. Information in the drill-down box may include title of media, URL, M score, mentions, distinct mentions, peak date, peak pattern, media type, duration, what segments/clusters the media is mentioned in, most used hashtags, who has mentioned the media, latest tweets mentioning this media, an icon to take action with, and the like.
[0269] Referring to FIG. 36, a tab for terms is displayed. The terms may be filtered by hashtags, one word, 2 words, 3 words, and the like. The terms may be filtered by peak date, such as all, today, past week, past month, custom date range, and the like. The terms may be sorted by M score, citations, peak, peakedness, host name, content title, and the like. Information about terms in the list may include the term, peak date, peak pattern, M score, mentions, distinct mentions, and the like. Selecting a term may reveal a drill-down box where additional information about the term may be displayed, including which segments/clusters the term has been mentioned in frequently, what other terms have been mentioned with the selected term, who has mentioned the term, latest tweets mentioning this term, an icon to take action with, and the like.
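The term filters in [0269] (hashtags, one word, 2 words, 3 words) correspond to counting hashtags and word n-grams. This is a minimal sketch; the tokenization (lowercased runs of Unicode word characters with an optional leading `#`, and hashtags excluded from word n-grams) is a simplifying assumption, not the platform's tokenizer.

```python
import re
from collections import Counter

# Lowercased tokens: an optional '#' followed by word characters (Unicode-aware).
TOKEN = re.compile(r"#?\w+")


def term_counts(tweets, kind):
    """Count terms of one kind across tweets.

    kind: "hashtags", or an int n >= 1 for word n-grams (1, 2, 3, ...)
    """
    counts = Counter()
    for text in tweets:
        tokens = TOKEN.findall(text.lower())
        if kind == "hashtags":
            counts.update(t for t in tokens if t.startswith("#"))
        else:
            words = [t for t in tokens if not t.startswith("#")]
            counts.update(" ".join(words[i:i + kind])
                          for i in range(len(words) - kind + 1))
    return counts
```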
[0270] Referring now to FIG. 37, a list page of a social media map platform is displayed. In this view, information may be provided in the form of lists, such as lists of influencers, conversation leaders, key content, terms, and the like. Information about each list member may include name, screen name, M score, followers, mentions, follower/following status, and the like. Lists may be sorted/filtered by any of the techniques mentioned herein, including by influence, M score (such as with a slider or other user input), and the like. Users may take action from the list view.
[0271] In further embodiments, an analytical framework for coordinated campaign identification includes a framework for analyzing fabricated social movements. In many embodiments, not only is it possible to distinguish these movements from truly organic ones, but it is also possible to create a formal method for studying patterns of fabricated, pseudo-grassroots (also, "astroturf") collective action.
[0272] It will be appreciated in light of the disclosure that any such collective action may be required to give the impression of a large group of people coalescing around a movement that is easy to describe and share with others. If the group is not well-connected enough, then it may be logistically difficult for any actor to organize the group's online behavior. If the group is not acting in temporal lockstep, then its message may not achieve a high frequency. In embodiments, low-frequency messages do not appear as global trends: for example, Twitter's "trending" algorithm appears to identify topics that are popular now, rather than topics that have been popular for a while or on a daily basis, to help users discover the hottest emerging topics of discussion on Twitter™. These examples remain applicable to the myriad other social platforms. Finally, if the group behind a fabricated social movement does not promote it with a coherent message, the movement's impact on the general public may be blunted by conflicting information.
[0273] It will be appreciated in light of the disclosure that these constraints suggest a natural set of three dimensions for analyzing fabricated social movements: 1.) the semantic dimension (how messages are formulated), 2.) the network dimension (how accounts within the campaigns are connected to one another), and 3.) the temporal dimension (when messages spread throughout the campaign). In many embodiments, these dimensions, and their intersections, yield discrete signals that can be used to scrutinize social media operations and assess whether they display a suspicious degree of hidden coordination.
[0274] In embodiments, the framework operates on three levels: 1.) Event, the level of an entire social media campaign; 2.) Segment, the level of a community of users participating in a social media campaign (e.g., Russian social media troll accounts); and 3.) Actor, the level of an individual user participating in a social media campaign.
[0275] Table 1 below shows examples of the three-dimensional analysis framework in more detail, specifically, the signals relevant for particular combinations of level and dimension. It will be appreciated in light of the disclosure that not every combination of level and dimension has corresponding relevant signals.
[Table 1: image not reproduced in this text]
[0276] This framework is a helpful methodological tool, but it would not be useful without operational definitions, which are captured via mathematical metrics of campaign activity. In embodiments, each signal in Table 1 above is mapped to a discrete metric in Table 2. Further detail regarding the key definitions needed to understand these metrics, and regarding any non-obvious activity metrics, is provided herein.
[Table 2: image not reproduced in this text]
Table 2. Mapping of Signals to Metrics
Key Definitions
Network
[0277] In many embodiments, the network dimension assumes that actors participating in a campaign are connected to each other in a directed network G (i.e., a connection from user a to user b does not imply the reverse). Twitter™ following networks are an example of directed networks: many people follow Twitter™ celebrities, but those celebrities do not, as a general rule, follow their fans back. Other social media platforms and connected platforms are also applicable.
Segment
[0278] When calculating metrics at the network level, it is assumed that each actor participating in a campaign belongs to exactly one community c, where c represents a group of actors with similar interests, whether social, political, or otherwise.
Identifying Networks and Communities
[0279] In order to identify relevant networks and communities within those networks, network segmentation technologies such as hierarchical agglomerative clustering are leveraged. In many examples, a network segmentation framework based on hierarchical agglomerative clustering has been tested on more than eight hundred different sociocultural contexts with many academic applications. By way of many examples, the unit of analysis is a "map," which may be a collection of key social media accounts around a particular social context. A map may be composed of "nodes," which are the social media accounts in question. Each node may be connected to one or more nodes in the map through "edges," and edges may represent social relationships embedded in the respective social media platform (e.g., "following" for Twitter™, Facebook™, or the like).
[0280] In embodiments, each node in the map may belong to exactly one "segment" and one "group." By way of these examples, a segment may be a collection of nodes with a shared pattern of interests (e.g., a collection of Twitter™ accounts that all follow US Tea Party politicians). Each segment may have a label (e.g., "Tea Party"). A group may be a collection of segments with similar interest profiles (e.g., a collection of "Tea Party," "Constitutional Conservatives," etc. segments into a "Conservative" group). The process for generating segments, groups, labels, and colors for a map may be fully or partially automated, as follows: a proprietary clustering algorithm may automatically generate segments and groups for a map; subsequently, the map-making process may use supervised machine learning to generate labels for segments and groups from human-labeled examples. At the end of the automated process, a Subject Matter Expert, an individual well-versed in the topic and/or geographical area covered by the map, may perform a quality assurance check on the segment and group labels.
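The segmentation above relies on hierarchical agglomerative clustering, but the proprietary algorithm itself is not described. As a generic illustration only, the sketch below groups nodes by shared interests using average-linkage agglomeration over the Jaccard similarity of each node's follow set; the input layout, similarity measure, and stopping rule (stop at `k` clusters) are all assumptions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two follow sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(follows: dict[str, set], k: int) -> list[set]:
    """Average-linkage agglomerative clustering of nodes until k clusters remain.

    follows: maps each node to the set of accounts it follows (hypothetical input).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [{node} for node in follows]
    # Pairwise node similarities, computed once.
    sim = {}
    for u, v in combinations(follows, 2):
        sim[(u, v)] = sim[(v, u)] = jaccard(follows[u], follows[v])

    def linkage(c1: set, c2: set) -> float:
        # Average pairwise similarity between members of two clusters.
        return sum(sim[(u, v)] for u in c1 for v in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (i < j by construction).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

Accounts that follow the same politicians end up in one cluster, which is the intuition behind a segment such as "Tea Party."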
Key Metrics Explained
[0281] To illustrate the metrics in this section, a toy campaign example may be employed. The example consists of 100 users connected in a network G. The network G further breaks down into exactly two communities, A and B, each with exactly one half of the total population. The overall number of connections from members of A to any other actor in the network is 500, while the number of connections from members of A to members of B is 200. The campaign proceeds over the course of ten days, and the first of those days features the highest level of campaign activity, with exactly one quarter of all actors participating.
Entropy E
[0282] This metric is the degree to which a particular campaign is concentrated in one community versus diffused among many different communities. Given a mapping of users to communities, as described herein, the entropy of a campaign may be, as known in the art, the information-theoretic entropy of the distribution of users active in the campaign among different communities. In the toy example, the Entropy of the campaign may be:
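$$E = -\sum_{c} P(c)\,\log_2 P(c) = -0.5\log_2(0.5) - 0.5\log_2(0.5) = 1$$

where P(c) is the fraction of users active in the campaign who belong to community c.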
In general, it may be shown that low values of E represent campaigns concentrated in one community, while high values of E represent campaigns distributed among a wide array of communities.
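The Entropy metric can be computed directly from a user-to-community mapping. This sketch assumes each user belongs to exactly one community (as stated above) and uses base-2 logarithms to match the worked value of 1 for two equally represented communities; the input names are hypothetical.

```python
import math
from collections import Counter


def campaign_entropy(active_users, community_of):
    """Shannon entropy (bits) of active campaign users across communities.

    active_users: iterable of user ids active in the campaign
    community_of: dict mapping user id -> community label (one per user)
    """
    counts = Counter(community_of[u] for u in active_users)
    total = sum(counts.values())
    # Sum of -P(c) * log2 P(c) over communities with at least one active user.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

In the toy example, with active users split evenly between A and B, this returns 1.0; a campaign confined to A alone returns 0.0.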
Inter-comtnunity Homophily H
[0283] It is known in the art that the Inter-community Homophily H is the degree to which communities active around the campaign are more interconnected than one would expect by random chance. Mathematically, H is calculated for an ordered pair of communities A, B. The quantity H(A,B) is the ratio of the actual number of connections from members of A to members of B, E(A,B), to a normalizing factor p that assumes that members of A make their connections to all other nodes at random. In the random baseline, the number of connections from members of A to members of B is the number of all connections from members of A to any other node in the network G, multiplied by the fraction of G that B represents. In the toy example, the Homophily from community A to community B is:
$$H(A,B) = \frac{E(A,B)}{p} = \frac{200}{500 \times 0.5} = 0.8$$
[0284] Values of H below 1.0 may be shown to represent heterophily, or lower-than-expected interconnectivity between communities. Values of H equal to 1.0 may be shown to represent the baseline random expectation. Values of H above 1.0 may be shown to represent homophily, or higher-than-expected interconnectivity.
[0285] H is superlinear, so a value of H = 4.0 is much more than twice as interconnected as H = 2.0.
[0286] While the random baseline for Homophily is established in the citation above, it will be appreciated in light of the disclosure that it may be an excessively low baseline for such empirical analyses. Therefore, when possible, H values for community pairs where low or high values may be expected (e.g., ideologically separate or ideologically aligned communities) in the same networked terrain as the case study are used as a baseline.
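The homophily ratio can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the edge-list input, the node-to-community dictionary, and both function names are assumptions, and "the fraction of G that B represents" is taken here to mean B's share of all nodes.

```python
def homophily_from_counts(edges_a_to_b, edges_a_to_all, fraction_b):
    """H(A, B) = E(A, B) / p, where p = (all connections from A) * (B's share of G)."""
    expected = edges_a_to_all * fraction_b  # random-baseline count p
    if expected == 0:
        raise ValueError("random baseline is zero; H is undefined")
    return edges_a_to_b / expected


def homophily_from_edges(edges, community, a, b):
    """Compute H(A, B) from directed (source, target) edges and a
    node -> community mapping (both hypothetical input formats)."""
    from_a = [(s, t) for s, t in edges if community[s] == a]
    observed = sum(1 for _, t in from_a if community[t] == b)
    share_b = sum(1 for c in community.values() if c == b) / len(community)
    return homophily_from_counts(observed, len(from_a), share_b)
```

With the toy numbers, `homophily_from_counts(200, 500, 0.5)` reproduces the 0.8 above, i.e., mild heterophily.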
Commitment M
[0287] Commitment to a particular campaign is measured in two ways: 1) M_E, the number of subsequent engagements with the campaign by an actor; or 2) M_T, the length of time between the first and last recorded engagement with the campaign by an actor.
Semantic Diversity Ω
[0288] Semantic diversity of a particular actor's, segment's, or campaign's messaging is based on the assignment of messages to topics. As known in the art, Latent Dirichlet Allocation (LDA) is a common method for identifying topics in text data. Once messages have been assigned to topics, a semantic diversity score may be calculated for the message set. The authors of the referenced work may represent their measure of semantic diversity as the probability that two documents chosen from the corpus at random with replacement will be on the same topic. By way of these examples, the corpus may be the message set, and the documents may be user Tweet histories, post histories, etc., aggregated by user. In many examples, the LDA algorithm may run for 15 iterations, with a number of topics no less than 20% of the number of documents and no more than 30 iterations, and semantic diversity may be averaged over 20 distinct runs of the LDA algorithm on the same corpus to smooth out variations due to the initial conditions of a particular run. For topics that do not co-occur in documents, a topic may be assigned a distance score of 1000.
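The probability that two documents drawn at random with replacement share a topic is the sum of the squared topic shares. The sketch below assumes each document has already been reduced to a single topic label (for example, its most probable LDA topic); the topic model itself is out of scope, and the function names are hypothetical. Note that this probability rises as diversity falls. The sketch does not reproduce the 1000 distance score for non-co-occurring topics, whose use the text leaves unspecified.

```python
from collections import Counter
from statistics import mean


def same_topic_probability(topic_labels):
    """Probability that two documents drawn uniformly at random, with
    replacement, carry the same topic label (sum of squared topic shares)."""
    if not topic_labels:
        raise ValueError("need at least one document")
    n = len(topic_labels)
    return sum((c / n) ** 2 for c in Counter(topic_labels).values())


def averaged_over_runs(runs):
    """Average the score over several topic-model runs on the same corpus,
    smoothing out run-to-run variation from initial conditions."""
    return mean(same_topic_probability(run) for run in runs)
```

For instance, four user histories labeled `["t1", "t1", "t2", "t3"]` give 0.5² + 0.25² + 0.25².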
[0289] In embodiments, versions of Ω are run for individual users (Ω_A), communities (Ω_C), or entire campaigns (Ω). These metrics can also be run for all messages within a particular time period (Ω_T) to calculate the change in semantic diversity over time.
[0290] Semantic diversity scores of less than one may represent users who exclusively post about the same topic, characteristic of fabricated campaigns. Semantic diversity scores between 1 and 100 may represent users who post on a variety of topics, characteristic of normal human activity. Finally, semantic diversity scores above 100 may represent users who post on an extremely diverse set of topics, characteristic of spambots or of users who bridge different cultural and/or linguistic communities (e.g., users who post in different languages).
Campaign Peakedness P
[0291] Campaign Peakedness may be defined as the fraction of all activity that occurs on the day with the most campaign-related activity during some time frame. In the toy example, P = 1/4 = 0.25.
Dynamic Time Warp Alignment D
[0292] The Dynamic Time Warp is an algorithm known in the art for comparing two temporal sequences of activity. In many embodiments, the Dynamic Time Warp may be used to compare the activities of individual users (D_U) or entire segments (D_S). In general, the Dynamic Time Warp between two sequences S1 and S2 is the number of warping transformations that are required to change S1 into S2. In many examples, Dynamic Time Warp may be used to identify bots and trolls in a different social media setting.
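A minimal sketch of classic dynamic time warping between two activity series (e.g., daily post counts of two segments). Standard DTW returns the minimum cumulative alignment cost rather than a literal count of transformations; the absolute-difference local cost used here is an assumption, and `dtw_distance` is a hypothetical name. Lower values indicate more closely aligned activity.

```python
import math


def dtw_distance(s1, s2):
    """Classic DTW distance between two numeric sequences, using |a - b|
    as the local cost. cost[i][j] is the best cumulative cost of aligning
    the first i items of s1 with the first j items of s2."""
    n, m = len(s1), len(s2)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s1[i - 1] - s2[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance s1 only (s2 item reused)
                cost[i][j - 1],      # advance s2 only (s1 item reused)
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]
```

Because warping lets one series stretch in time, `[0, 1, 2, 3]` and `[0, 1, 1, 2, 3]` align at zero cost; a quadratic-time loop like this is adequate for daily series but would need windowing for long, fine-grained ones.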
[0293] In many examples, this framework and these metrics have been tested on eighteen case studies of political campaigns in seven different sociocultural settings, spanning three continents and six years in all. These studies included ten groups of Twitter™ hashtags linked by subject matter experts (SMEs) to known coordinated campaigns, and eight groups of Twitter™ hashtags linked by SMEs to known spontaneous campaigns. Based on the eighteen case studies, it may be shown that there are clear differences between coordinated and spontaneous campaigns across sociocultural settings and time for four of the metrics listed above: Entropy E, Commitment by subsequent engagements M_E, Time delta M_T, and Peakedness P. The same analysis also showed that at least one especially coordinated campaign showed extremely low values of Semantic Diversity by Event Ω and high Dynamic Time Warp alignment D_S between the activity of different segments.
[0294] In further embodiments, methods and systems are disclosed for identifying markers of coordinated activity in social media movements. Such methods and systems may identify a large number of accounts that may be controlled by a small number of coordinated entities, which may result in a measurable lack of diversity compared with a similar number of accounts controlled by uncoordinated individual actors. To facilitate the methods and systems of identifying markers of coordinated activity in social media movements, a framework of signals (or metrics) along at least three dimensions may be constructed and may include, without limitation:
[0295] A Network dimension that may, for example, represent how accounts are connected;
[0296] [Figure imgf000075_0001]; and
[0297] A Semantic dimension that may represent, for example, diversity of topics and meaning.
[0298] From this framework, a plurality of hypotheses may be derived for "signals" exploring potentially hidden coordination in social media movements on a social media channel such as Twitter™, Facebook™, or the like. This exploration of potentially hidden coordination may occur at the level of the entire campaign (e.g., nine signals), at the cluster level of the campaign (e.g., a set of well-interwoven accounts), at the individual account level, and the like. In embodiments, the plurality of hypotheses may include twenty-five or more such hypotheses. Empirical evidence associated with these signals can be shown across a number of case studies of known coordinated (i.e., inorganic, centrally controlled) and spontaneous (i.e., organic, individually controlled) campaigns. In embodiments, three of the campaign signals may systematically reveal coordination in social media movements on Twitter™, Facebook™, and other platforms. Some signals, either at the cluster or at the individual account level, may facilitate campaign analysis, and some of them may be transformed into campaign-level signals.
[0299] Campaign / Cluster / User - Each campaign may include a set of "seeds" from a specified timeframe that may be, for example, a hashtag, a sentence shared in posts, a URL shared in posts, or the like. In embodiments, clusters may be communities of users active within the campaign. In embodiments, users may be defined by their individual accounts, e.g., by their Twitter™ handle, their Facebook™ identification, their user name on other social media platforms, or the like.
[0300] Network Terrain - Campaigns may occur in a specific context referred to as the "network terrain." In one example, it will be appreciated in light of the disclosure that the #BlackLivesMatter movement may be better analyzed within its "network terrain," which displays the US political conversation on Twitter™, Facebook™, or other relevant social media platforms. In a representative model, social media platforms like Twitter™ and Facebook™ may constitute a cyber-social "network terrain" formed by the relationships (such as following in Twitter™, Facebook™, or the like) among actors. The structure of the network or social media platform may determine who and what may be visible to whom, and thus it may be the social landscape on which the struggle for influence may occur. The methods and systems may include analyzing case study campaigns across specific network terrain maps in order to understand the relationships between participants and the patterns of campaign propagation across specific online communities (e.g., clusters discovered using machine learning analysis of network relationships and the like).
[0301] Campaign versus Investigatory Signals - Signals measured at the cluster and individual actor (user) levels may facilitate investigating the inner workings of specific campaigns, building a more qualitative understanding of how these campaigns unfolded, and helping form campaign-level metrics, among other things.
[0302] Case Studies - To date, the methods and systems may include testing the signal set on a set of case studies and exemplary campaigns.
SIGNAL SUMMARY
[0303] Exemplary Investigatory Signals - The investigatory signals may operate at the cluster or at the individual level. The investigatory signals may facilitate building a qualitative understanding of the dynamics of a campaign and may provide tools to build campaign-level signals. [C] indicates a signal operating at the cluster level, and [U] indicates a signal operating at the user level.
[0304] The following are exemplary priority signals:
[0305] Concentration in Lead Cluster [C];
[0306] Concentration via Entropy [C];
[0307] Day Peakedness [C];
[0308] Temporal coordination per cluster [C];
[0309] Temporal coordination per user [U];
[0310] Client-diversity per cluster [C]; and
[0311] Time delta between clusters [C].
[0312] Other signals include:
[0313] Commitment by user [U];
[0314] Commitment by cluster [C];
[0315] Account creation date diversity for cluster [C];
[0316] Homophily [C];
[0317] Language mismatch [C];
[0318] Russian language profile % [C];
[0319] % in cluster also active [C];
[0320] % of hits in own cluster [C];
[0321] Account creation date diversity [C];
[0322] Semantic diversity by user for user Tweets™ (or other postings) [U];
[0323] Semantic diversity by time slice by cluster [C]; and
[0324] Semantic diversity by time slice by user [U].
[0325] In embodiments, a priority signal name is Concentration in Lead Cluster.
[0326] The concentration in lead cluster signal description - Large-scale spontaneous campaigns may be more likely to engage participants from a range of different clusters, whereas coordinated campaigns are typically highly concentrated in a specific cluster of the network or social media platform. The concentration in lead cluster signal (metric) evaluates the degree to which an entire campaign's activity is concentrated in a particular cluster of participants. The concentration in lead cluster signal (metric) may be measured by the fraction of all campaign participants who are members of the most campaign-active cluster in the network terrain map.
[0327] The range of values of the concentration in lead cluster signal (metric) is zero to 100%. In embodiments, the concentration in lead cluster signal (metric) value is computed by determining the fraction of a campaign's participants that are members of the most active community in the campaign. In an example including a 3-community map, if 30 participants are from community A, 25 from community B, and 25 from community C, then the value of the concentration in lead cluster signal (metric) for the campaign on this map equals 30/80, or 37.5%. In embodiments, possible values of the concentration in lead cluster signal (or metric) may be between 0% (i.e., not concentrated) and 100% (i.e., fully concentrated in 1 cluster).
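The participant-based computation can be sketched as follows, assuming a hypothetical input that maps each campaign participant to their cluster on the terrain map. With the 30/25/25 split from the example, the function divides the lead cluster's count by all 80 participants.

```python
from collections import Counter


def lead_cluster_concentration(participant_clusters):
    """Fraction of campaign participants in the most campaign-active cluster.

    `participant_clusters` maps participant id -> cluster id (a hypothetical
    input format); only users who actually participated should be included.
    """
    counts = Counter(participant_clusters.values())
    if not counts:
        raise ValueError("no participants")
    return max(counts.values()) / sum(counts.values())
```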
[0328] The concentration in lead cluster signal (or metric) may be consistent across a set of campaigns, which may cover a variety of geographies and dates. It will be appreciated in light of the disclosure that coordinated campaigns, on average, may be shown to have larger values of the concentration in lead cluster signal (or metric) than spontaneous campaigns. It will also be appreciated in light of the disclosure that there may be some overlap between the coordinated and spontaneous ranges due at least in part to the large number of sociocultural settings and time periods in the data sets.
[0329] An exemplary average value of the concentration in lead cluster signal for coordinated campaigns is 48%.
[0330] An exemplary range of values of the concentration in lead cluster signal score for coordinated campaigns is 20% to 89%. The range here is the full range between the lowest value and the highest value for this category of campaigns.
[0331] An exemplary value of the standard deviation of the concentration in lead cluster signal for coordinated campaigns is 0.21.
[0332] An exemplary average value of the concentration in lead cluster signal for spontaneous (organic) campaigns is 22%.
[0333] An exemplary range of values of the concentration in lead cluster signal score for spontaneous campaigns is 9% to 50%.
[0334] An exemplary value of the standard deviation of the concentration in lead cluster signal for spontaneous campaigns is 0.12.
[0335] In embodiments, the performance of the concentration in lead cluster signal (metric) may be sensitive to the specific terrain map being used, because the signal (metric) may be less successful if the terrain map only captures the active participants in a campaign. The concentration in lead cluster signal (metric) may be more successful when the map captures the broader terrain in which the campaign under scrutiny unfolds.
[0336] The methods and systems described herein also include computing the value of the concentration in lead cluster signal (or metric) using actions rather than users, which may measure what proportion of the total actions (Tweets™ or the like) in the campaign came from the most active community. This approach can be shown to be less reliable because heavy posters (those who Tweet™ or otherwise post very frequently) may create skews in the measurements.
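The action-based variant, and the skew that heavy posters can introduce, can be illustrated with a short sketch. The input format (a list with the author of each campaign post, plus an author-to-cluster mapping) and the function name are assumptions.

```python
from collections import Counter


def lead_cluster_action_share(post_authors, user_cluster):
    """Share of campaign actions authored by members of the cluster that
    produced the most actions. `post_authors` lists the author of each
    campaign post; `user_cluster` maps author -> cluster (hypothetical)."""
    per_cluster = Counter(user_cluster[author] for author in post_authors)
    if not per_cluster:
        raise ValueError("no campaign actions")
    return max(per_cluster.values()) / len(post_authors)
```

In a toy case where one account in cluster A posts eight times and two accounts in cluster B post once each, the action-based value is 0.8 with A as the lead cluster, whereas the participant-based value would be 2/3 with B leading: a single heavy poster flips the result.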
[0337] In embodiments, a priority signal name is Concentration via Entropy.
[0338] The concentration via entropy signal description - The concentration via entropy signal is another approach to measuring concentration that looks at how the participants are distributed among the active communities in the campaign, rather than simply looking at how many of them belong to the most prevalent community. The concentration via entropy signal (metric) may be shown to be a useful signal for knowing if more than one community is driving a coordinated campaign, which could be missed by relying on the concentration in lead cluster signal (metric) alone. The concentration via entropy signal (metric) may calculate the concentration of the distribution among all clusters. In embodiments, coordinated campaigns generally tend to have values of the concentration via entropy signal (metric) that are less than 2.0.
[0339] The concentration via entropy signal value range - Relatively higher values of the concentration via entropy signal (metric) reflect more even distributions of participants between the communities active in the campaign. The lowest score is zero (all participants belong to the same community). The highest score depends on the number of communities active in the map. Because the highest number of communities in an exemplary case study map may be 50, the highest entropy value in this example would be approximately four (assuming a perfectly even distribution of participants amongst the 50 communities; ln 50 ≈ 3.9, so this figure corresponds to entropy computed with the natural logarithm, whereas a base-2 logarithm would give log2 50 ≈ 5.6).
[0340] How the concentration via entropy signal is computed - The concentration via entropy signal (metric) may be an entropy of the distribution of participants among communities. In an example with a two-community map, the value of the concentration via entropy signal would be 1.0 (using a base-2 logarithm) when 50 participants are from community A and 50 participants are from community B, so that the distribution is 0.5, 0.5.
[0341] Exemplary formula for the concentration via entropy signal (metric):
$$E = -\sum_{i} p(c(i)) \log_2 p(c(i))$$
[0342] In the formula, c(i) is the count of participants in the ith cluster and p(c(i)) is the fraction of all participants coming from the ith cluster.
[0343] In embodiments, the concentration via entropy signal (metric) is based on a logarithmic scale, so a small difference in entropy belies a large difference in the unevenness of the underlying distribution. It will be appreciated in light of the disclosure that a very rough rule of thumb is that a difference of one point in the value of the concentration via entropy signal may be equivalent to a change in concentration by a factor of three, so a campaign with a concentration via entropy signal equal to two is three times more concentrated in a few clusters than a campaign with a concentration via entropy signal equal to three.
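The rule of thumb can be checked numerically if entropy is taken in nats (natural logarithm). This is an assumption: it matches the "approximately four" maximum quoted for 50 communities, but not the base-2 two-community example. Under that assumption, exp(E) equals the number of equally sized clusters that would produce the same entropy, so one point of entropy corresponds to a factor of e ≈ 2.7, roughly three. Function names are hypothetical.

```python
import math


def entropy(counts, base=math.e):
    """Entropy of participants over clusters; `counts` maps cluster -> count.
    The default natural-log base is an assumption (see lead-in)."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def effective_clusters(counts):
    """exp(entropy in nats): the number of equally sized clusters that
    would give the same entropy as the observed distribution."""
    return math.exp(entropy(counts))
```

Fifty equally populated clusters give an entropy of about 3.91 nats and an effective cluster count of 50; two campaigns whose entropies differ by one point differ in effective cluster count by a factor of about 2.7.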
[0344] Analysis in case studies - The concentration via entropy signal (metric) can be shown to be consistent across campaigns despite the variety of geographies and dates. It will be appreciated in light of the disclosure that coordinated campaigns, on average, have a lower concentration via entropy signal.
[0345] An exemplary average value of the concentration via entropy signal for coordinated campaigns is 1.43.
[0346] An exemplary range of values of the concentration via entropy signal for coordinated campaigns is 0.46 to 2.19.
[0347] An exemplary standard deviation of the value of the concentration via entropy signal for coordinated campaigns is 0.57.
[0348] An exemplary average value of the concentration via entropy signal for spontaneous campaigns is 2.52.
[0349] An exemplary range of values of the concentration via entropy signal for spontaneous campaigns is 0.69 to 3.38.
[0350] An exemplary standard deviation of the value of the concentration via entropy signal for spontaneous campaigns is 0.71.
[0351] In embodiments, the concentration via entropy signal (metric) may be useful to analyze "battleground campaigns," where a few clusters fight for control over the social media narrative (e.g., on a dedicated hashtag). Such campaigns may be concentrated in these few communities, and simply using a measure focused on the lead community may miss this activity.
[0352] In embodiments, a priority signal name is Day Peakedness.
[0353] The daypeakedness signal description - A coordinated campaign may typically exhibit sustained activity by the accounts promoting it. Spontaneous activity, in contrast, is characterized by "bursty" cascades of activity. In embodiments, the daypeakedness signal may detail the percentage of all activity that the busiest day of the campaign may represent.
[0354] The daypeakedness signal (metric) of a campaign is measured as the percentage of campaign actions (Tweets™ or the like) that take place on the most active day of the campaign. It will be appreciated in light of the disclosure that spontaneous campaigns generally appear to be more "bursty" because, for example, spontaneous campaigns exhibit a sharper peak (or a greater number of peaks) than coordinated campaigns.
[0355] In embodiments, the range of the values of the daypeakedness signal (metric) is 0% to 100%.
[0356] In embodiments, the value of the daypeakedness signal (metric) is computed by determining the fraction of all activity that occurs on the day with the most campaign-related activity. Examples include a campaign that proceeds over the course of ten days, where the first of those days features the highest level of campaign activity, with one-quarter of all actors participating. In this example, the value of the daypeakedness signal (metric) is 25%.
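A minimal sketch of the daypeakedness computation from action timestamps. The assumptions here are that each campaign action carries a timezone-aware timestamp and that days are bucketed in UTC. The choice of time zone affects the result, which is the boundary sensitivity discussed below. `day_peakedness` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def day_peakedness(timestamps: Iterable[datetime]) -> float:
    """Fraction of campaign actions that fall on the single busiest day.

    Timestamps must be timezone-aware; they are bucketed by UTC calendar
    date (an assumption; bucketing in another zone can change the value).
    """
    per_day = Counter(ts.astimezone(timezone.utc).date() for ts in timestamps)
    total = sum(per_day.values())
    if total == 0:
        raise ValueError("no campaign actions")
    return max(per_day.values()) / total
```

For a ten-day campaign with 100 actions, 25 of them on the busiest day, the function returns 0.25, matching the example above.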
[0357] It will be appreciated in light of the disclosure that one-eighth of all activity in coordinated campaigns, on average, happens during the peak day, whereas over one-third of all activity for spontaneous campaigns happens during the peak day. In embodiments, the daypeakedness signal (metric) can be shown to be consistent across campaigns despite the variety of geographies and dates. By way of this example, coordinated campaigns, on average, may have a lower value of the daypeakedness signal (metric) than spontaneous campaigns. It will be appreciated in light of the disclosure that there may be some overlap between the coordinated and spontaneous ranges due to the large number of sociocultural settings and time periods in the campaigns.
[0358] An exemplary average value of the daypeakedness signal for coordinated campaigns is 0.14.
[0359] An exemplary range of values of the daypeakedness signal for coordinated campaigns is 0.08 to 0.22.
[0360] An exemplary standard deviation of the value of the daypeakedness signal for coordinated campaigns is 0.05.
[0361] An exemplary average value of the daypeakedness signal for spontaneous campaigns is 0. 1
[0362] An exemplary range of values of the daypeakedness signal for spontaneous campaigns is 0 to 0.71.
[0363] An exemplary standard deviation of the value of the daypeakedness signal for spontaneous campaigns is 0.21.
[0364] The daypeakedness signal (metric) may be sensitive to date boundaries and time zones, most notably when the campaign is being analyzed only over its last few days. In embodiments, the daypeakedness signal (metric) may be improved by making it less sensitive to time zones.
[0365] It will be appreciated in light of the disclosure that there are other, possibly more complex, ways to calculate the value of the daypeakedness signal. In embodiments, the peak time may be identified as the median of the time stamps of a dynamic phenomenon, so that a logarithmic distribution of volume around the peak can be observed. The methods and systems described herein may identify peaks as days when volume exceeds the median by more than two standard deviations, and may calculate the value of the daypeakedness signal as the fraction of all content that occurred during a 24-hour period. It will be appreciated in light of the disclosure that the median volume may be used instead of the mean volume due in part to the observation that volume follows a skewed distribution, so the mean may not be an appropriate statistic to characterize it. The measure of peakedness in the methods and systems described herein may be relatively less sophisticated and, therefore, may be easier to interpret, while giving a good initial impression of the utility of the signal for identifying coordinated campaigns on a social media platform.
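The peak-detection variant can be sketched under stated assumptions. Daily volumes arrive as a day-to-count mapping (a hypothetical format), and the standard deviation is the population standard deviation of the daily volumes, since the text does not name an estimator. The reported value is the share of all content that falls on the detected peak days. Function names are hypothetical.

```python
from statistics import median, pstdev


def peak_days(daily_counts, k=2.0):
    """Days whose volume exceeds the median daily volume by more than
    k standard deviations (population s.d. of the daily volumes)."""
    volumes = list(daily_counts.values())
    threshold = median(volumes) + k * pstdev(volumes)
    return sorted(day for day, v in daily_counts.items() if v > threshold)


def peak_share(daily_counts, k=2.0):
    """Fraction of all content that falls on the detected peak days."""
    total = sum(daily_counts.values())
    return sum(daily_counts[d] for d in peak_days(daily_counts, k)) / total
```

With nine quiet days of 10 posts and one burst day of 100, the median is 10 and the threshold is 64, so only the burst day is flagged and `peak_share` returns 100/190. Using the median keeps the single burst from inflating the baseline, in line with the rationale above.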
[0366] In embodiments, the value of the daypeakedness signal (metric) may be affected by the overall time range of a campaign. By way of this example, if a campaign lasts three days, then the value of the daypeakedness signal cannot go below 33%, but if the campaign lasts 10 days, then the value of the daypeakedness signal cannot go below 10%. In embodiments, campaigns may last as little as one week and as long as several months. The value of the daypeakedness signal may be shown to follow the pattern described in the campaign value examples across these time ranges.
[0367] In embodiments, a signal name is Commitment: Average Posts Count in the Campaign.
[0368] The commitment: average posts count in campaign signal description - Campaigns typically feature numerous die-hard supporters who post repeatedly and fewer casual participants who merely chime in. The commitment: average posts count in campaign signal (metric) may capture the degree to which a campaign's body of actors sticks with further posting after their first engagement with the social media platform. In embodiments, the value of the commitment: average posts count in campaign signal (metric) can include the average number of campaign-related posts that participants publish after their first campaign post.
[0369] The range of values of the commitment: average posts count in campaign signal (metric) is bounded below by zero, which corresponds to a user only posting once about the campaign. In embodiments, the commitment: average posts count in campaign signal (metric) may have a range of values between 0 and 10 posts. It will be appreciated in light of the disclosure that the maximum value of the commitment: average posts count in campaign signal (metric) could be much higher. In one example, participants in a campaign may be very dedicated and may post 100 times about a certain subject during the scope of analysis.
[0370] To compute the value of the commitment: average posts count in campaign signal (metric), the methods and systems disclosed herein determine the average number of subsequent participation actions, e.g., Tweets™ (or other postings) with the campaign hashtag, across all participants in a campaign. In embodiments, participants (i.e., posters) in a campaign can be a smaller subset of the participants in a map. In embodiments, the map may capture some of their followers and/or other members of the network terrain when those are highly connected to active participants in the campaign. In order to compute the commitment: average posts count in campaign signal (metric), only participants who actually posted about the campaign are taken into account. For example, when a participant posted through Twitter™, Facebook™, or the like with a campaign-related hashtag twice, their commitment is 1.0. In embodiments, campaign participation can include Tweets™ or the like with campaign-related hashtags (for campaigns organized around a hashtag), Tweets™ or the like with links to a video or article (for campaigns organized around a video or article), retweets of the above tweets, and the like. Examples of actions out of scope for participation include favorites of tweets with campaign-related hashtags or links, and @-replies or @-mentions of Tweets™ (or the like) with campaign-related hashtags or links.
[0371] It will be appreciated in light of the disclosure that participants in spontaneous campaigns post more about their campaigns than participants in coordinated campaigns. It will also be appreciated in light of the disclosure that this pattern may be counterintuitive, as one may expect participants in coordinated campaigns to be extrinsically motivated to hit certain participation targets (e.g., by being paid by number of posts) and thus to post more than participants in spontaneous campaigns, who lack such motivation.
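A minimal sketch of the average subsequent-posts computation. The action-type labels (`"post"`, `"retweet"`, `"favorite"`, `"reply"`, `"mention"`) are hypothetical stand-ins for the in-scope and out-of-scope actions listed above. Only users with at least one in-scope action count as participants.

```python
from collections import Counter

# Hypothetical labels: hashtag/link posts and their retweets count as
# participation; favorites, @-replies and @-mentions do not.
IN_SCOPE = {"post", "retweet"}


def average_subsequent_posts(actions):
    """Mean number of in-scope campaign actions per participant after their
    first one. `actions` is an iterable of (user_id, action_type) pairs."""
    per_user = Counter(user for user, kind in actions if kind in IN_SCOPE)
    if not per_user:
        raise ValueError("no participants with in-scope actions")
    return sum(n - 1 for n in per_user.values()) / len(per_user)
```

A participant who posted twice with the campaign hashtag contributes 1.0, as in the example above. A user whose only action is an @-reply is not counted as a participant at all.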
[0372] An exemplary average value of the commitment: average posts count in campaign signal (metric) for coordinated campaigns is 2.52.
[0373] An exemplary range of values of the commitment: average posts count in campaign signal (metric) for coordinated campaigns is 1.28 to 3.40.
[0374] An exemplary standard deviation of the value of the commitment: average posts count in campaign signal (metric) for coordinated campaigns is 0.84.
[0375] An exemplary average value of the commitment: average posts count in campaign signal (metric) for spontaneous campaigns is 3.53.
[0376] An exemplary range of values of the commitment: average posts count in campaign signal (metric) for spontaneous campaigns is 1.39 to 6.07.
[0377] An exemplary standard deviation of the value of the commitment: average posts count in campaign signal (metric) for spontaneous campaigns is 1.48.
[0378] In embodiments, the commitment: average posts count in campaign signal (metric) can be analyzed at the community level, at the cluster level, and at the participant level. The commitment: average posts count in campaign signal (metric) can be analyzed at the community level to single out communities whose participants are particularly committed to a campaign. The commitment: average posts count in campaign signal (metric) can be analyzed at the participant level to identify individuals who have extremely high commitment values, e.g., posting about a campaign one hundred times.
[0379] In embodiments, the commitment: average posts count in campaign signal (metric) is focused on participation after the first post and is complemented by a measurement of the proportion of participants in the campaign who have only participated once.
[0380] In embodiments, the commitment: average posts count in campaign signal (metric) may be combined with a commitment: average time range of participation signal (metric) into a commitment: post regularity signal (metric) that may capture the deviation of campaign participants from natural human attention patterns.
[0381] In embodiments, other statistical properties of the distribution of posts per user may be part of refining the commitment metrics. In embodiments, there may be a natural shape of this distribution for spontaneous campaigns, and that natural shape may be skewed. It will be appreciated in light of the disclosure that this skew may make average post count an inappropriate metric in many long-duration situations. Instead, it may be possible to identify coordinated campaigns by a lack of skewness and/or the presence of a second moment at some value above one, both of which may be indicative of an unusually large percentage of participants posting multiple times about a campaign, e.g., due to a coordinating body paying these participants per post.
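The skewness check suggested above can be sketched as follows. This is an assumption-laden illustration: the text does not specify which skewness estimator to use, so the sketch uses the simple moment-based (Fisher-Pearson) coefficient g1, and any threshold separating "skewed" from "not skewed" is left to the analyst. Function names are hypothetical.

```python
from collections import Counter


def posts_per_user(post_authors):
    """Campaign post counts per participant, from a list of post authors."""
    return list(Counter(post_authors).values())


def skewness(values):
    """Moment-based (biased) skewness g1 = m3 / m2**1.5 of a sample."""
    n = len(values)
    if n == 0:
        raise ValueError("empty sample")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # every participant posted the same number of times
    return m3 / m2 ** 1.5
```

A distribution with many one-time posters and a single heavy poster, such as `[1, 1, 1, 1, 6]`, is right-skewed (g1 = 1.5). By contrast, a campaign in which most participants post a similar, larger number of times has skewness near zero, which the text suggests may hint at coordination.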
[0382] In embodiments, the commitment: average posts count in campaign signal (metric) may be normalized to take into account average posts per user, in order to control for users with very heavy activity across all campaigns.
[0383] In embodiments, a priority signal name is Commitment: Average Time Range of Participation.
[0384] The commitment: average time range of participation signal description - To determine whether participants in a campaign are die-hard supporters or just people who chime in, the commitment: average time range of participation signal (metric) may be used to look at how long (in days) participants remained engaged in pushing the campaign. In embodiments, the loyalty of participants to the campaign may be measured by the time range (in days) of their campaign-related Tweets™ (or other postings), averaged across all participants.
[0385] The range of the values of the commitment: average time range of participation signal (metric) is not fixed in advance; it can run from zero days to the total length of the campaign.
[0386] In embodiments, the commitment: average time range of participation signal (metric) may look at the time frame between the first and last participation actions of each participant, averaged across all participants in a campaign. By way of this example, the commitment: average time range of participation signal (metric) may measure whether actors participate in a "one-off" way (one Tweet™ and done) or demonstrate a commitment to the campaign (multiple Tweets™ or other postings over time).
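A minimal sketch of the average time range of participation, assuming a hypothetical input of `(user_id, datetime)` pairs for in-scope campaign actions. The span between a participant's first and last action is converted to fractional days and averaged across participants.

```python
from datetime import datetime
from typing import Iterable, Tuple


def average_time_range_days(actions: Iterable[Tuple[str, datetime]]) -> float:
    """Mean over participants of (last action - first action), in days."""
    first, last = {}, {}
    for user, ts in actions:
        if user not in first or ts < first[user]:
            first[user] = ts
        if user not in last or ts > last[user]:
            last[user] = ts
    if not first:
        raise ValueError("no participants")
    spans = [(last[u] - first[u]).total_seconds() / 86400 for u in first]
    return sum(spans) / len(spans)
```

A one-off participant contributes zero days. Because no span can exceed the campaign's own duration, the result is capped by campaign length, as noted in [0394].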
[0387] It will be appreciated in light of the disclosure that participants in coordinated campaigns engage with the campaign over a longer period than participants in spontaneous campaigns. It will also be appreciated in light of the disclosure that participants in coordinated campaigns may be more likely than participants in spontaneous campaigns to receive extrinsic motivation, such as payment, for engaging with the campaign, and such extrinsic motivation may lead to a longer engagement period than intrinsic motivation.
[0388] An exemplary average value of the commitment: average time range of participation signal (metric) for coordinated campaigns is 7.24 days.
[0389] An exemplary average range of values of the commitment: average time range of participation signal (metric) for coordinated campaigns is 0.08 to 22.33 days.
[0390] An exemplary standard deviation of the value of the commitment: average time range of participation signal (metric) for coordinated campaigns is 9.04 days.
[0391] An exemplary average value of the commitment: average time range of participation signal (metric) for spontaneous campaigns is 1.53 days.
[0392] An exemplary average range of values of the commitment: average time range of participation signal (metric) for spontaneous campaigns is 0 to 3.36 days.
[0393] An exemplary standard deviation of the value of the commitment: average time range of participation signal (metric) for spontaneous campaigns is 1.23 days.
[0394] It will be appreciated in light of the disclosure that the commitment: average time range of participation signal (metric) may be affected by the overall time range of a campaign, e.g., if a campaign lasts three days, then this metric cannot go above a value of three. In embodiments, the commitment: average time range of participation signal (metric) may be combined into a commitment: post regularity signal that may capture the deviation of campaign participants from natural human attention patterns.
[0395] In embodiments, a signal name is Semantic Diversity for all Messages.
[0396] The semantic diversity for all messages signal (metric) description - The semantic diversity for all messages signal (metric) looks at how generally on-message the campaign is. The semantic diversity for all messages signal (metric) also looks to determine whether the interaction or activity appears like a diverse conversation covering a range of topics and expressions or may be a fairly uniform campaign with low semantic diversity. It will be appreciated in light of the disclosure that people tend to Tweet™ (or otherwise post) on a variety of topics related to their daily lives, work, and interests. A group trying to promote a coordinated campaign, however, may be interested only in the narrow range of topics relevant to that campaign. In embodiments, bots or propaganda accounts may also be interested in any Tweet™ (or applicable posting) relevant to any campaign they are trying to push, and therefore could be Tweeting™ (or otherwise posting) on an extremely wide range of topics. In embodiments, the semantic diversity for all messages signal (metric) may measure the extent to which participants in the campaign are Tweeting™ (or otherwise posting) on an intermediate range of topics, which suggests that their activities are spontaneous and human rather than automated or coordinated to propagate a specific message.
[0397] In embodiments, the range of values of the semantic diversity for all messages signal (metric) is zero to 100%.
[0398] In embodiments, raw values of the semantic diversity for all messages signal (metric) fall into three categories: (i) When the value of the semantic diversity for all messages signal (metric) is less than one, it may represent users who exclusively post about the same topic, which may be characteristic of fabricated campaigns. (ii) When the value of the semantic diversity for all messages signal (metric) is between one and 100, it may represent users who post on a variety of topics, which is characteristic of normal human activity. (iii) When the value of the semantic diversity for all messages signal (metric) is above 100, it may represent users who post on an extremely diverse set of topics, which is characteristic of spambots or of users who bridge different cultural and/or linguistic communities (e.g., users who post in different languages). In embodiments, the semantic diversity for all messages signal (metric) may be bounded at 1000 because it may be necessary to fix a maximum value for the "distance" between any pair of topics for which no document includes terms from both topics. It will be appreciated in light of the disclosure that mathematically this distance should be infinite but, in practice, it can be set to 1000. The percentage of users whose semantic diversity for all messages signal (metric) is greater than or equal to 1.0 and less than 100 therefore varies between zero and 100%.
[0399] How the semantic diversity for all messages signal (metric) is computed - The value of the semantic diversity for all messages signal (metric) of a particular actor's (or cluster's, or campaign's) messaging may be based on the assignment of messages to topics. In embodiments, the computation of the semantic diversity for all messages signal (metric) may use a Latent Dirichlet Allocation algorithm.
By way of this example, once messages have been assigned to topics, the semantic diversity for all messages signal (metric) is determined for the message set. In embodiments, the value of the semantic diversity for all messages signal (metric) is determined as the probability that two documents chosen from the corpus at random with replacement will be on the same topic.
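The same-topic probability described above can be sketched as follows, assuming each document (e.g., one user's aggregated posting history) has already been assigned a single dominant topic label by a topic model such as LDA. The single-label assignment and the function name are assumptions for illustration. This sketch covers only the stated probability; it does not reproduce the topic-distance capping at 1000 described in paragraph [0398].

```python
from collections import Counter


def same_topic_probability(doc_topics):
    """Probability that two documents drawn uniformly at random, with
    replacement, carry the same topic label.

    doc_topics: mapping from document id to one topic label (assumes each
    document was reduced to its dominant topic beforehand).
    """
    n = len(doc_topics)
    if n == 0:
        raise ValueError("corpus is empty")
    counts = Counter(doc_topics.values())
    # Sum over topics of P(first draw on topic) * P(second draw on topic).
    return sum((c / n) ** 2 for c in counts.values())


print(same_topic_probability({"u1": 0, "u2": 0, "u3": 1, "u4": 1}))
```

Drawing with replacement is what makes each term a simple square of the topic's share of documents.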
[0400] In the current exemplary case, the corpus is the message set, and the documents may be user Tweet™ (or other posting) histories, aggregated by user. The Latent Dirichlet Allocation (LDA) algorithm may be run for fifteen iterations with a number of topics no less than 20% and no more than 30% of the number of documents. The value of the semantic diversity for all messages signal (metric) is averaged over twenty distinct runs of the LDA algorithm on the same corpus to smooth out variations due to the initial conditions of any particular run. In embodiments, a topic-distance score of 1000 may be assigned to the semantic diversity for all messages signal (metric) for topics that do not co-occur in documents.
[0401] Because the focus of many embodiments is differentiating coordinated and/or automated campaigns from spontaneous and human-driven campaigns, the semantic diversity for all messages signal (metric) is computed as the percentage of all users in a campaign whose raw diversity score falls into the range of normal human activity, i.e., greater than or equal to 1.0 but less than 100. In embodiments, the semantic diversity for all messages signal (metric) may refer to all campaign-related messages.
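A minimal sketch of the percentage described in paragraph [0401], assuming the per-user raw diversity scores (already capped at 1000) are available as a list; the function name and default bounds parameters are illustrative.

```python
def share_in_human_range(raw_scores, low=1.0, high=100.0):
    """Percentage of users whose raw semantic-diversity score lies in
    [low, high), the range associated in the text with normal human activity."""
    if not raw_scores:
        return 0.0
    in_range = sum(1 for s in raw_scores if low <= s < high)
    return 100.0 * in_range / len(raw_scores)


print(share_in_human_range([0.5, 1.0, 50, 99.9, 100, 1000]))
```

The half-open interval means a score of exactly 1.0 counts as human-range activity while a score of exactly 100 does not.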
[0402] The values below show the percentage of users with the semantic diversity for all messages signal (metric) greater than or equal to 1.0 and less than 100.0.
[0403] An exemplary average value of the semantic diversity for all messages signal (metric) for coordinated campaigns is 55%.
[0404] An exemplary average range of values of the semantic diversity for all messages signal (metric) for coordinated campaigns is 17% to 90%.
[0405] An exemplary standard deviation of the value of the semantic diversity for all messages signal (metric) for coordinated campaigns is 36.59%.
[0406] An exemplary average value of the semantic diversity for all messages signal (metric) for spontaneous campaigns is 71%.
[0407] An exemplary average range of values of the semantic diversity for all messages signal (metric) for spontaneous campaigns is 50% to 98%.
[0408] An exemplary standard deviation of the value of the semantic diversity for all messages signal (metric) for spontaneous campaigns is 21.2%.
[0409] In embodiments, the semantic diversity for all messages signal (metric) may be very sensitive to confounds. By way of this example, news organizations may tend to have low semantic diversity because they may post the same story headlines over and over even though they are not coordinated actors. Moreover, Tweets™ (or other postings) in one language tend to appear more coordinated than Tweets™ (or other postings) in multiple languages, because the Latent Dirichlet Allocation (LDA) algorithm may not translate terms across languages.
[0410] At the same time, the semantic diversity for all messages signal (metric) may point to the differentiation between natural language use and the use of language to push a particular message. It will be appreciated in light of the disclosure that coordination around a message may require that the message be as clear and simple as possible, whereas natural language can be complex, metaphorical, and even slightly confusing. Coordinated campaigns may, therefore, not wish to increase the semantic diversity of their messages even if the technical or organizational opportunity were available.
[0411] In embodiments, the semantic diversity for all messages signal (metric) includes separating language diversity from semantic diversity, either by grouping Tweets™ (or other postings) by post language prior to analysis or by using automated machine translation to convert all Tweets™ (or other postings) to the same language. The semantic diversity for all messages signal (metric) also includes leveraging existing natural language processing approaches to identify certain kinds of low-semantic-diversity language that may not be of interest, e.g., news headlines and press releases.
[0412] In embodiments, the semantic diversity for all messages signal (metric) may measure the temporal alignment of campaign-related Tweets™ (or other postings) for all participants. It will be appreciated in light of the disclosure that users generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ (or postings) of others. When the Tweet™ (or other posting) histories of campaign participants follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement. The semantic diversity for all messages signal (metric) may include temporal coordination of Tweets™ (or other postings) between campaign participants, measured by the alignment of Tweet™ (or other posting) histories across all participants in the campaign.
[0413] In embodiments, the range of the values of the semantic diversity for all messages signal (metric) is between 0% and 100% and represents the percent alignment of two users' temporal normalized sequences of participation in the campaign. Toward that end, 0% alignment may mean that the users' sequences do not match at all, while 100% alignment may indicate a perfect match.
[0414] In embodiments, the semantic diversity for all messages signal (metric) may be computed with a dynamic time warp algorithm for comparing two temporal sequences of activity. In general, the dynamic time warp measure between two sequences S1 and S2 is the number of warping transformations that are required to change S1 into S2. The methods and systems described herein may, for example, use the dynamic time warp algorithm to identify bots and trolls in a different social media setting. The number of warping transformations may be normalized by the length of both sequences S1 and S2 and multiplied by 100 to get a percent value. Finally, the normalized number may be subtracted from 100 in order to calculate the percent alignment of S1 and S2.
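The percent-alignment steps in paragraph [0414] can be sketched as below. Two readings are assumptions of this sketch rather than statements from the disclosure: the "number of warping transformations" is taken to be the number of non-diagonal steps on one optimal dynamic-time-warping path (using an absolute-difference local cost), and "the length of both sequences" is taken to be the sum of their lengths. The function names are hypothetical.

```python
def dtw_warping_steps(s1, s2):
    """Classic DTW with |a - b| local cost; returns the number of
    non-diagonal (warping) steps on one optimal alignment path."""
    n, m = len(s1), len(s2)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s1[i - 1] - s2[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack to the first elements, preferring the diagonal
    # (non-warping) move whenever costs tie.
    i, j, warps = n, m, 0
    while i > 1 or j > 1:
        _, is_warp, i, j = min(
            (cost[i - 1][j - 1], 0, i - 1, j - 1),
            (cost[i - 1][j], 1, i - 1, j),
            (cost[i][j - 1], 1, i, j - 1),
        )
        warps += is_warp
    return warps


def percent_alignment(s1, s2):
    """100 minus the warping steps normalized by the combined length, in percent."""
    if not s1 or not s2:
        raise ValueError("sequences must be non-empty")
    warps = dtw_warping_steps(s1, s2)
    return 100.0 - 100.0 * warps / (len(s1) + len(s2))


print(percent_alignment([0, 1, 2, 1], [0, 1, 2, 1]))
print(percent_alignment([0, 0, 1], [0, 1]))
```

Identical sequences need no warping and score 100%; the second pair needs one repeated element to align, which lowers the score.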
[0415] In embodiments, a priority signal name is temporal coordination per cluster.
[0416] The temporal coordination per cluster signal (metric) description - The temporal coordination per cluster signal (metric) may look at the communities that participate in this campaign to identify different communities exhibiting very similar patterns of engagement, which may be considered odd. In embodiments, such a pattern may be even odder when the postings come from different time zones. The temporal coordination per cluster signal (metric) measures the temporal alignment of campaign-related Tweets™ (or other postings) aggregated at the cluster level. With that in mind, communities generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ (or other postings) of other communities. When the Tweet™ (or other posting) histories of participating clusters follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement.
[0417] The range of values for the temporal coordination per cluster signal (metric) is zero percent to 100%. The value of the temporal coordination per cluster signal (metric) represents the percent alignment of two users' temporal normalized sequences of participation in the campaign. Toward that end, 0% alignment may mean that the users' sequences do not match at all, while 100% alignment indicates a perfect match.
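Before clusters can be compared, each one needs a temporal normalized sequence of participation. The sketch below builds one daily sequence per cluster, assuming posts arrive as `(cluster_id, date)` pairs and that "normalized" means dividing each day's count by the cluster's total; both the input shape and that normalization are assumptions. The resulting sequences could then be compared pairwise with a percent-alignment measure such as the dynamic time warp approach of paragraph [0414].

```python
from collections import defaultdict
from datetime import date


def cluster_daily_sequences(posts, start, end):
    """One normalized daily activity sequence per cluster.

    posts: iterable of (cluster_id, date) pairs for campaign-related posts.
    Each sequence covers start..end inclusive; counts are divided by the
    cluster's total so that clusters of different sizes are comparable.
    """
    n_days = (end - start).days + 1
    counts = defaultdict(lambda: [0] * n_days)
    for cluster, day in posts:
        offset = (day - start).days
        if 0 <= offset < n_days:  # ignore posts outside the window
            counts[cluster][offset] += 1
    sequences = {}
    for cluster, row in counts.items():
        total = sum(row)  # at least one post, since the row was created on a hit
        sequences[cluster] = [c / total for c in row]
    return sequences


posts = [
    ("A", date(2017, 1, 25)),
    ("A", date(2017, 1, 25)),
    ("A", date(2017, 1, 26)),
    ("B", date(2017, 1, 27)),
]
print(cluster_daily_sequences(posts, date(2017, 1, 25), date(2017, 1, 27)))
```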
[0418] The temporal coordination per cluster signal (metric) description - The temporal coordination per cluster signal (metric) is a per-user take on examining temporal coordination, which might be helpful when other metrics are noisy. Temporal coordination per user is technically the temporal coordination between pairs of users. In embodiments, the temporal coordination per cluster signal (metric) may measure the temporal alignment of campaign-related Tweets™ (or other postings) between individual campaign participants. As noted before, users generally do not time their Tweets™ (or other postings) to coincide with the Tweets™ of others. When the Tweet™ (or other posting) histories of campaign participants follow the same pattern of ebb and flow, especially across time zone boundaries, this may be evidence that an actor is coordinating the activities of participants to create a concentrated temporal burst of engagement.
[0419] The temporal coordination per cluster signal (metric), especially its heatmap visualization, may provide a good high-level description of the rate of unusual coordination across the users participating in a campaign. The temporal coordination per cluster signal (metric), however, may suffer from the same overestimation of actual temporal coordination, so the algorithm may be adjusted to include the average temporal coordination across users in the calculation.
[0420] In embodiments, a signal name is client diversity per cluster.
[0421] The client diversity per cluster signal (metric) description - The client diversity per cluster signal (metric) may determine how accounts in a given cluster use Twitter™, Facebook™, or other social media platforms. The client diversity per cluster signal (metric) may also determine whether Twitter™ users (or users of other relevant platforms) post through a mobile device, a computer, or direct access to the Twitter™ APIs to Tweet™ (or make other social media postings). In one example, some clients may be used to coordinate Tweets™ (or other social media postings), and the client diversity per cluster signal (metric) may be used to determine how coordinated the Tweets™ (or other social media postings) are and whether such coordinating clients are used heavily in some of the communities that participate in this campaign. It will be appreciated in light of the disclosure that the client diversity per cluster signal (metric) is the same as the client diversity at campaign scale signal (metric) but analyzed at the cluster level.
[0422] There is no specific range of values applicable to the client diversity per cluster signal (metric) because it is a qualitative signal (metric).
[0423] The value of the client diversity per cluster signal (metric) is computed by using the "source" field of the Tweet™ (or other posting) to identify the client used to make the Tweet™ (or other posting), as in the client diversity at campaign scale signal (metric). The Tweets™ (or other postings) are then aggregated by the cluster of the Tweet™ (or other posting) author in the campaign map.
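A minimal sketch of the aggregation in paragraph [0423]: it tallies, per cluster, the share of campaign postings made from each client. It assumes the client name has already been extracted from each posting's "source" field (on Twitter™ that field can carry an HTML anchor around the client name) and that an author-to-cluster mapping from the campaign map is available; the input shapes, the function name, and the client name "ScheduleBot" in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


def client_shares_by_cluster(posts, author_cluster):
    """Share of each posting client within each cluster.

    posts: iterable of (author_id, client_name) pairs.
    author_cluster: mapping from author_id to cluster id in the campaign map;
    authors without a cluster are skipped.
    """
    counts = defaultdict(Counter)
    for author, client in posts:
        cluster = author_cluster.get(author)
        if cluster is not None:
            counts[cluster][client] += 1
    shares = {}
    for cluster, per_client in counts.items():
        total = sum(per_client.values())
        shares[cluster] = {client: n / total for client, n in per_client.items()}
    return shares


posts = [
    ("u1", "Twitter for iPhone"),
    ("u2", "Twitter Web Client"),
    ("u3", "ScheduleBot"),  # hypothetical automation client
    ("u3", "ScheduleBot"),
]
print(client_shares_by_cluster(posts, {"u1": "A", "u2": "A", "u3": "B"}))
```

Because the signal is qualitative, the output is meant for inspection (e.g., a cluster dominated by a single automation client) rather than for thresholding.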
[0424] In embodiments, a signal name is Time Delta between Communities.
[0425] The time delta between communities signal (metric) description - The time delta between communities signal (metric) may identify a community that is engaging with the campaign significantly ahead of others, for example because it is kick-starting the campaign, or significantly behind others, for example because it needs to coordinate talking points before engaging. It will be appreciated in light of the disclosure that the time delta between communities signal (metric) was inspired by qualitative analysis initially done in the Syrian Civil War context, in which communities pretending to be civilians while being led by military intelligence engaged with popular topics with a lag of several hours to days. Toward that end, the time delta between communities signal (metric) may examine when clusters are most active in the campaign. By way of this example, the time delta between communities signal (metric) may measure the distance between a given cluster's peak and the more general peak of the overall campaign.
[0426] In embodiments, the value of the time delta between communities signal (metric) represents a number of days. Negative values may indicate that a community's peak of temporal activity happens before the average peak date for all other communities. Positive values may indicate that the peak happens after the average peak date for all other communities. A score of zero may indicate a community peaking in sync with the rest of the communities.
[0427] How the time delta between communities signal (metric) is computed - This metric measures the number of days between the peak date of campaign participation in a given cluster and the peak date of campaign participation averaged across all other clusters. In one example with three clusters, where activity in cluster A peaks on 25 January 2017, activity in cluster B peaks on 26 January 2017, and activity in cluster C peaks on 27 January 2017, the value of the time delta between communities signal (metric) for A equals -1.5 (25 minus the average of 26 and 27), the value for B equals zero, and the value for C equals 1.5.
[0428] In embodiments, the time delta between communities signal (metric) may be helpful to analyze disputed hashtags, with both spontaneous and coordinated clusters engaging in the same campaign. In embodiments, the time delta between communities signal (metric) may point to the natural logistical cost of coordinating a campaign's message in response to a sudden event, such as a late-breaking news story. It will be appreciated in light of the disclosure that even the most sophisticated coordinated campaigns cannot anticipate such events, and at the same time, they cannot respond to these events spontaneously, as doing so may distract from their message and may hurt the overall aim of the campaign. It will also be appreciated in light of the disclosure that all coordinated campaigns will need at least a little time to respond to late-breaking events, and their responses will measurably lag behind spontaneous human reactions to the same events. In embodiments, the time delta between communities signal (metric) may include automatic identification of sudden events as they happen, e.g., by matching campaign-related terms against Google™ News, other news sources, and the like.
A subsequent step may be to automatically track responses to the same events from campaign-related clusters compared to non-campaign-related clusters.
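The peak-date computation of paragraph [0427] can be sketched in Python and checked against its three-cluster example. The function names are hypothetical, and breaking ties between equally busy days in favor of the earliest date is an assumption the disclosure does not address.

```python
from collections import Counter
from datetime import date
from statistics import mean


def peak_date(post_dates):
    """Most frequent posting date; ties go to the earliest date (an assumption)."""
    counts = Counter(post_dates)
    return min(counts, key=lambda d: (-counts[d], d))


def time_deltas(peaks):
    """For each cluster, days between its peak date and the mean peak date
    of all other clusters (negative = earlier than the others)."""
    if len(peaks) < 2:
        raise ValueError("need at least two clusters")
    deltas = {}
    for cluster, peak in peaks.items():
        others = [d.toordinal() for c, d in peaks.items() if c != cluster]
        deltas[cluster] = peak.toordinal() - mean(others)
    return deltas


peaks = {"A": date(2017, 1, 25), "B": date(2017, 1, 26), "C": date(2017, 1, 27)}
print(time_deltas(peaks))
```

Converting dates to day ordinals makes the averaging exact, so the example reproduces the -1.5, 0, and 1.5 values given in the text.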
[0429] In embodiments, a signal name is Commitment by User.
[0430] The commitment by user signal (metric) description - The loyalty of participants to the campaign may be measured by the number of times the participants Tweet™ (or otherwise post) about the campaign and by the time range (in days) of their campaign-related Tweets™ (or other postings). The commitment by user signal (metric) may be measured per user. In embodiments, the commitment by user signal (metric) looks at whether individual users are particularly committed to a campaign. In embodiments, the commitment by user signal (metric) may facilitate looking at users and their own commitments by determining whether there are, for example, people who Tweet™ (or otherwise post) exactly 100 times, or some other predictable, predetermined number of times. The value of the commitment by user signal (metric) may facilitate identifying and singling out accounts that might be incentivized to participate x number of times or for x days straight.
[0431] The values of the commitment by user signal (metric) are unbounded values starting at zero, i.e., no subsequent actions, or zero days between the first and last action. In embodiments, values of the commitment by user signal (metric) by subsequent actions are between zero and ten actions, and values by time frame are between zero and thirty days.
[0432] In embodiments, there may be users whose commitment by user signal (metric) is extremely high, and such behavior may also contribute to higher values of the commitment: average time range of participation signal (metric) noted above.
[0433] In embodiments, a signal name is Commitment by Cluster.
[0434] The commitment by cluster signal (metric) description - The commitment by cluster signal (metric) may be used to determine whether a specific cluster is particularly committed to a campaign. In embodiments, the commitment by cluster signal (metric) may facilitate looking at clusters and their own commitments. By way of this example, the commitment by cluster signal (metric) may facilitate the determination of whether there are clusters that Tweet™ (or otherwise post) exactly 100 times. In embodiments, the commitment by cluster signal (metric) may be used to single out clusters that might be incentivized to participate a certain number of times or for a certain length of time. In one example, the commitment by cluster signal (metric) may be used to determine whether a group of accounts showed up, Tweeted™ (or otherwise posted) 100 times over five days, and then left.
[0435] In embodiments, the commitment by cluster signal (metric) may look at the loyalty of participants to the campaign, which may be measured by the number of times the participants Tweet™ (or otherwise post) about the campaign and by the time range (in days) of their campaign-related Tweets™ (or other postings). In embodiments, the commitment by cluster signal (metric) may measure the degree to which a body of actors in the campaign sticks with it after their first engagement with the campaign. It will be appreciated in light of the disclosure that, for most human activity, the commitment by cluster signal (metric) follows a skewed distribution in which many participants engage once and a few die-hard supporters participate a lot, in measurable contrast to coordinated activity. Deviations from the skewed distribution describing human activity may, therefore, reveal coordination. By way of this example, if an actor participates in a campaign exactly 100 times, this may suggest that they were incentivized by a coordinating body to meet that threshold.
[0436] The values of the commitment by cluster signal (metric) are unbounded values starting at zero, i.e., no subsequent actions, or zero days between the first and last action. In embodiments, the value of the commitment by cluster signal (metric) by subsequent actions is between zero and ten actions. In further embodiments, the value of the commitment by cluster signal (metric) by time frame is between zero and thirty days.
[0437] How the value of the commitment by cluster signal (metric) is computed - There are two commitment metrics: (i) the number of subsequent participation "actions" (i.e., Tweets™ or other postings with a campaign hashtag), and (ii) the time frame (in days, which can be fractional) between the first and last participation action. Both metrics may be averaged across all participants in a campaign. Both metrics may measure whether actors participate in a "one-off" way (i.e., one Tweet™ or other posting and done) or demonstrate a commitment to the campaign (e.g., multiple Tweets™ or other postings over time).
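A minimal sketch of the two commitment metrics in paragraph [0437], averaged per cluster, together with a helper that flags accounts hitting an exact posting count, as in the "exactly 100 times" example of paragraphs [0434] and [0435]. It assumes campaign-hashtag postings arrive as `(author_id, timestamp)` pairs and that an author-to-cluster mapping exists; all names and input shapes are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean


def commitment_by_cluster(posts, author_cluster):
    """Per cluster: (average subsequent actions, average time frame in days).

    posts: iterable of (author_id, datetime) pairs for campaign-hashtag posts.
    """
    times = defaultdict(list)
    for author, ts in posts:
        times[author].append(ts)
    per_cluster = defaultdict(list)
    for author, stamps in times.items():
        cluster = author_cluster.get(author)
        if cluster is None:
            continue
        subsequent = len(stamps) - 1  # actions after the first one
        frame_days = (max(stamps) - min(stamps)).total_seconds() / 86400
        per_cluster[cluster].append((subsequent, frame_days))
    return {
        cluster: (mean(s for s, _ in rows), mean(d for _, d in rows))
        for cluster, rows in per_cluster.items()
    }


def exact_count_accounts(posts, target=100):
    """Accounts whose number of campaign posts equals `target` exactly."""
    counts = Counter(author for author, _ in posts)
    return sorted(a for a, n in counts.items() if n == target)


posts = [
    ("u1", datetime(2017, 1, 1)),
    ("u1", datetime(2017, 1, 3)),
    ("u2", datetime(2017, 1, 2)),
]
print(commitment_by_cluster(posts, {"u1": "A", "u2": "A"}))
print(exact_count_accounts(posts, target=2))
```

In the usage example, "u1" posts twice over two days and "u2" is a one-off participant, so cluster "A" averages 0.5 subsequent actions over a 1.0-day time frame.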
[0438] In embodiments, a signal name is Account Creation Date Diversity for Cluster.
[0439] The account creation date diversity for cluster signal (metric) description - This signal (metric) may facilitate observing how close in time all accounts participating in a campaign were created. If, for example, 90% of participating accounts within a given cluster were created within a span of five days, then such activity may indicate heavy coordination within that cluster. The account creation date diversity for cluster signal (metric) may be particularly helpful to spot bots, troll farms, and the like on networks that use fake accounts generated in bulk.
[0440] The range of values of the account creation date diversity for cluster signal (metric) is zero to 4,015 days. It will be appreciated in light of the disclosure that the maximum range may be from zero to the total number of days since the founding of Twitter™ or the other applicable social media platform. The values of the account creation date diversity for cluster signal (metric) in datasets evaluated have ranged from zero to 1,200 days.
[0441] How the account creation date diversity for cluster signal (metric) is computed - Account creation date diversity for a particular cluster and campaign combination is the standard deviation (in days) of Twitter™ (or other applicable social media platform) account creation dates for all accounts in that cluster that engaged with the campaign in question. As a baseline, embodiments may compare the account creation date diversity for a particular cluster to the account creation date diversity for the entire campaign.
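The standard-deviation computation of paragraph [0441], with the campaign-wide baseline comparison, might look like the following sketch. The mapping of account ids to creation dates is an assumed input, and using the population (rather than sample) standard deviation is an assumption, since the text says only "standard deviation".

```python
from datetime import date
from statistics import pstdev


def creation_date_diversity(created, accounts=None):
    """Standard deviation, in days, of account creation dates.

    created: mapping from account id to its creation date.
    accounts: optional subset (e.g., one cluster's campaign participants);
    defaults to every account in `created` (e.g., the whole campaign).
    """
    ids = list(created) if accounts is None else list(accounts)
    days = [created[a].toordinal() for a in ids]
    # Population standard deviation is an assumption of this sketch.
    return float(pstdev(days)) if len(days) > 1 else 0.0


created = {"a": date(2017, 1, 1), "b": date(2017, 1, 3), "c": date(2015, 6, 1)}
cluster_value = creation_date_diversity(created, ["a", "b"])
campaign_baseline = creation_date_diversity(created)
print(cluster_value, campaign_baseline)
```

A cluster value far below the campaign baseline would point to accounts created in a tight burst, which is the pattern the text associates with bulk-generated accounts.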
[0442] In embodiments, a signal name is Homophily.
[0443] The homophily signal (metric) description - This signal (metric) may facilitate looking for communities that pay a "disproportionate" amount of attention to one another, for instance across ideologies, languages, cultures, or the like. In embodiments, the homophily signal (metric) can identify disproportionate attention relationships between clusters, measured by the number of following relationships between clusters. When looking at communities (clusters), it will be appreciated in light of the disclosure that it is just as important to understand who the community pays attention to as who is in the community. With this in mind, the homophily signal (metric) may measure deviations from expected patterns of attention in social media. By way of this example, it will be appreciated in light of the disclosure that most people may pay most of their attention to like-minded friends, and the vast majority of people may pay most of their attention to friends in the same cultural and linguistic environment or affinity. In further examples, the homophily signal (metric) may facilitate the identification of patterns of intense inter-attention across ideologies, cultures, and languages that may imply evidence of coordination.
[0444] The range of values of the homophily signal (metric) is zero to ten.
[0445] How the homophily signal (metric) is computed - The homophily signal (metric), as a telltale of cluster attention, is the ratio of the actual number of edges connecting members of the clusters to the number that would be expected if each cluster paid attention to every other cluster strictly in proportion to that cluster's size. Typically, the baseline for such a signal (metric) is a random connection pattern. In embodiments, the homophily signal (metric) uses relatively more aggressive baselines because no actual human relationships follow a random pattern.
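The ratio in paragraph [0445] can be sketched with the simple size-proportional baseline: the expected number of follow edges from cluster A to cluster B is taken to be A's total outgoing edges multiplied by B's share of all clustered accounts. That exact formula, the input shapes, and the function name are assumptions; the sketch neither implements the "more aggressive baselines" mentioned in the text nor caps values at ten.

```python
from collections import Counter


def attention_ratios(follows, cluster_of):
    """Observed / size-proportional expected follow edges for each cluster pair.

    follows: iterable of (follower, followed) account pairs.
    cluster_of: mapping from account id to cluster id.
    """
    sizes = Counter(cluster_of.values())
    n = sum(sizes.values())
    observed = Counter()
    out_total = Counter()
    for src, dst in follows:
        a, b = cluster_of.get(src), cluster_of.get(dst)
        if a is None or b is None:
            continue  # ignore edges touching unclustered accounts
        observed[(a, b)] += 1
        out_total[a] += 1
    ratios = {}
    for a in sizes:
        for b in sizes:
            expected = out_total[a] * sizes[b] / n
            if expected > 0:
                ratios[(a, b)] = observed[(a, b)] / expected
    return ratios


cluster_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
follows = [("a1", "a2"), ("a2", "a1"), ("a1", "b1"), ("b1", "b2")]
print(attention_ratios(follows, cluster_of))
```

A ratio above 1 means a cluster pays more attention to the other cluster than its size alone would predict; values well above 1 across ideological or linguistic lines are the pattern the text flags.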
[0446] In embodiments, a signal name is Language Mismatch.
[0447] The language mismatch signal (metric) description - The default language for a new Twitter™ (or other social media) account appears to be English. Users may, however, change their profile language if they want. It will be appreciated in light of the disclosure that users posting frequently in a language that differs from their default Twitter™ (or other social media) profile language may be part of a foreign-language propaganda operation on behalf of some coordinated entity.
[0448] The language mismatch signal (metric) may measure the percentage of a campaign's Tweets™ (or other postings), at both the cluster and campaign level, that are in a language that differs from the users' default Twitter™ (or other social media) profile language.
[0449] The range of values of the language mismatch signal (metric) is zero to one hundred percent, where one hundred percent indicates that all campaign participation actions in this cluster/campaign are Tweeted™ (or otherwise posted) in a language different from the accounts' default profile language.
[0450] How the language mismatch signal (metric) is computed - For each Tweet™ (or other posting) with the campaign-related hashtag, the language mismatch signal (metric) may identify the language of the Tweet™ (or other posting) and the language profile setting in the Twitter™ API or the API of another social media platform. In embodiments, the language mismatch signal (metric) may also aggregate the Tweets™ (or other postings) by the cluster of the author of the Tweet™ (or other posting) in a campaign map. By way of this example, the percentage of Tweets™ (or other postings) for each cluster whose language did not match the poster's profile language may be reported.
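A minimal sketch of the per-cluster report in paragraph [0450]. It assumes each campaign posting has already been reduced to a triple of author id, detected posting language, and the author's profile language, and that an author-to-cluster mapping is available; these input shapes and the function name are hypothetical.

```python
from collections import defaultdict


def language_mismatch_by_cluster(posts, author_cluster):
    """Percentage of each cluster's campaign posts whose language differs
    from the author's profile language.

    posts: iterable of (author_id, post_lang, profile_lang) triples.
    author_cluster: mapping from author_id to cluster id in the campaign map.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for author, post_lang, profile_lang in posts:
        cluster = author_cluster.get(author)
        if cluster is None:
            continue
        totals[cluster] += 1
        if post_lang != profile_lang:
            mismatches[cluster] += 1
    return {c: 100.0 * mismatches[c] / totals[c] for c in totals}


posts = [("u1", "ru", "en"), ("u1", "en", "en"), ("u2", "ru", "ru")]
print(language_mismatch_by_cluster(posts, {"u1": "A", "u2": "B"}))
```

The campaign-level value described in paragraph [0448] follows the same pattern with every posting counted in a single group.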
[0451] Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the various disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.
[0452] The terms "a" or "an," as used herein, are defined as one or more than one. The term "another," as used herein, is defined as at least a second or more. The terms "including" and/or "having," as used herein, are defined as comprising (i.e., open transition).
[0453] While only a few embodiments of the present disclosure have been shown and described, it will be obvious to those skilled in the art that many changes and modifications may be made thereunto without departing from the spirit and scope of the present disclosure as described in the following claims. All patent applications and patents, both foreign and domestic, and all other publications referenced herein are incorporated herein in their entireties to the full extent permitted by law.
[0454] The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor. The present disclosure may be implemented as a method on the machine, as a system or apparatus as part of or in relation to the machine, or as a computer program product embodied in a computer readable medium executing on one or more of the machines. In embodiments, the processor may be part of a server, cloud server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform. A processor may be any kind of computational or processing device capable of executing program instructions, codes, binary instructions, and the like. The processor may be or may include a signal processor, digital processor, embedded processor, microprocessor, or any variant such as a co-processor (math co-processor, graphic co-processor, communication co-processor and the like) and the like that may directly or indirectly facilitate execution of program code or program instructions stored thereon. In addition, the processor may enable execution of multiple programs, threads, and codes. The threads may be executed simultaneously to enhance the performance of the processor and to facilitate simultaneous operations of the application. By way of implementation, methods, program codes, program instructions and the like described herein may be implemented in one or more threads. The thread may spawn other threads that may have assigned priorities associated with them; the processor may execute these threads based on priority or any other order based on instructions provided in the program code. The processor, or any machine utilizing one, may include non-transitory memory that stores methods, codes, instructions, and programs as described herein and elsewhere.
The processor may access a non-transitory storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere. The storage medium associated with the processor for storing methods, programs, codes, program instructions or other type of instructions capable of being executed by the computing or processing device may include but may not be limited to one or more of a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, cache, and the like.
[0455] A processor may include one or more cores that may enhance speed and performance of a multiprocessor. In embodiments, the processor may be a dual core processor, quad core processor, other chip-level multiprocessor and the like that combine two or more independent cores (called a die).
[0456] The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware. The software program may be associated with a server that may include a file server, print server, domain server, Internet server, intranet server, cloud server, and other variants such as secondary server, host server, distributed server, and the like. The server may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other servers, clients, machines, and devices through a wired or a wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the server. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the server.
[0457] The server may provide an interface to other devices including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, distributed servers, social networks, and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure. In addition, any of the devices attached to the server through an interface may include at least one storage medium capable of storing methods, programs, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
[0458] The software program may be associated with a client that may include a file client, print client, domain client, Internet client, intranet client and other variants such as secondary client, host client, distributed client, and the like. The client may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other clients, servers, machines, and devices through a wired or wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the client. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the client.
[0459] The client may provide an interface to other devices including, without limitation, servers, other clients, printers, database servers, print servers, file servers, communication servers, distributed servers, and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure. In addition, any of the devices attached to the client through an interface may include at least one storage medium capable of storing methods, programs, applications, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
[0460] The methods and systems described herein may be deployed in part or in whole through network infrastructures. The network infrastructure may include elements such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, routing devices and other active and passive devices, modules and/or components as known in the art. The computing and/or non-computing device(s) associated with the network infrastructure may include, apart from other components, a storage medium such as flash memory, buffer, stack, RAM, ROM, and the like. The processes, methods, program codes, instructions described herein and elsewhere may be executed by one or more of the network infrastructural elements. The methods and systems described herein may be adapted for use with any kind of private, community, or hybrid cloud computing network or cloud computing environment, including those which involve features of software as a service (SaaS), platform as a service (PaaS), and/or infrastructure as a service (IaaS).
[0461] The methods, program codes, and instructions described herein and elsewhere may be implemented on a cellular network having multiple cells. The cellular network may either be a frequency division multiple access (FDMA) network or a code division multiple access (CDMA) network. The cellular network may include mobile devices, cell sites, base stations, repeaters, antennas, towers, and the like. The cell network may be a GSM, GPRS, 3G, EVDO, mesh, or other network type.
[0462] The methods, program codes, and instructions described herein and elsewhere may be implemented on or through mobile devices. The mobile devices may include navigation devices, cell phones, mobile phones, mobile personal digital assistants, laptops, palmtops, netbooks, pagers, electronic book readers, music players and the like. These devices may include, apart from other components, a storage medium such as a flash memory, buffer, RAM, ROM and one or more computing devices. The computing devices associated with mobile devices may be enabled to execute program codes, methods, and instructions stored thereon. Alternatively, the mobile devices may be configured to execute instructions in collaboration with other devices. The mobile devices may communicate with base stations interfaced with servers and configured to execute program codes. The mobile devices may communicate on a peer-to-peer network, mesh network, or other communications network. The program code may be stored on the storage medium associated with the server and executed by a computing device embedded within the server. The base station may include a computing device and a storage medium. The storage device may store program codes and instructions executed by the computing devices associated with the base station.
[0463] The computer software, program codes, and/or instructions may be stored and/or accessed on machine readable media that may include: computer components, devices, and recording media that retain digital data used for computing for some interval of time; semiconductor storage known as random access memory (RAM); mass storage typically for more permanent storage, such as optical discs, forms of magnetic storage like hard disks, tapes, drums, cards and other types; processor registers, cache memory, volatile memory, non-volatile memory; optical storage such as CD, DVD; removable media such as flash memory (e.g., USB sticks or keys), floppy disks, magnetic tape, paper tape, punch cards, standalone RAM disks, Zip drives, removable mass storage, off-line, and the like; other computer memory such as dynamic memory, static memory, read/write storage, mutable storage, read only, random access, sequential access, location addressable, file addressable, content addressable, network attached storage, storage area network, bar codes, magnetic ink, and the like.
[0464] The methods and systems described herein may transform physical and/or intangible items from one state to another. The methods and systems described herein may also transform data representing physical and/or intangible items from one state to another.
[0465] The elements described and depicted herein, including in flowcharts and block diagrams throughout the figures, imply logical boundaries between the elements. However, according to software or hardware engineering practices, the depicted elements and the functions thereof may be implemented on machines through computer executable media having a processor capable of executing program instructions stored thereon as a monolithic software structure, as standalone software modules, or as modules that employ external routines, code, services, and so forth, or any combination of these, and all such implementations may be within the scope of the present disclosure. Examples of such machines may include, but may not be limited to, personal digital assistants, laptops, personal computers, mobile phones, other handheld computing devices, medical equipment, wired or wireless communication devices, transducers, chips, calculators, satellites, tablet PCs, electronic books, gadgets, electronic devices, devices having artificial intelligence, computing devices, networking equipment, servers, routers, and the like. Furthermore, the elements depicted in the flowchart and block diagrams or any other logical component may be implemented on a machine capable of executing program instructions. Thus, while the foregoing drawings and descriptions set forth functional aspects of the disclosed systems, no particular arrangement of software for implementing these functional aspects should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. Similarly, it will be appreciated that the various steps identified and described above may be varied, and that the order of steps may be adapted to particular applications of the techniques disclosed herein. All such variations and modifications are intended to fall within the scope of this disclosure.
As such, the depiction and/or description of an order for various steps should not be understood to require a particular order of execution for those steps, unless required by a particular application, or explicitly stated or otherwise clear from the context.
[0466] The methods and/or processes described above, and steps associated therewith, may be realized in hardware, software or any combination of hardware and software suitable for a particular application. The hardware may include a general-purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device. The processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory. The processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as a computer executable code capable of being executed on a machine-readable medium.
[0467] The computer executable code may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.
[0468] Thus, in one aspect, methods described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof. In another aspect, the methods may be embodied in systems that perform the steps thereof and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware. In another aspect, the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.
[0469] While the disclosure has been disclosed in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present disclosure is not to be limited by the foregoing examples, but is to be understood in the broadest sense allowable by law.
[0470] The use of the terms "a" and "an" and "the" and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to,") unless otherwise noted. Recitations of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.
[0471] While the foregoing written description enables one skilled in the art to make and use what is considered presently to be the best mode thereof, those skilled in the art will understand and appreciate the existence of variations, combinations, and equivalents of the specific embodiment, method, and examples herein. The disclosure should therefore not be limited by the above-described embodiment, method, and examples, but by all embodiments and methods within the scope and spirit of the disclosure.
[0472] Any element in a claim that does not explicitly state "means for" performing a specified function, or "step for" performing a specified function, is not to be interpreted as a "means" or "step" clause as specified in 35 U.S.C. § 112(f). In particular, any use of "step of" in the claims is not intended to invoke the provision of 35 U.S.C. § 112(f).
[0473] Persons skilled in the art may appreciate that numerous design configurations may be possible to enjoy the functional benefits of the inventive systems. Thus, given the wide variety of configurations and arrangements of embodiments of the present invention, the scope of the invention is reflected by the breadth of the claims below rather than narrowed by the embodiments described above.

Claims

1. A method for determining coordinated activity in social media movements on a social media channel, the method comprising:
identifying a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements;
configuring a data structure of the plurality of markers for a social media campaign on a social media channel, wherein the plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for representing a diversity of topics and meanings of the social media movements; and
analyzing the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign including determining users within the social media campaign, determining clusters of users that make up the social media campaign and determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign.
2. The method of claim 1, wherein identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users.
3. The method of claim 1, wherein the coordinated activity of the social media campaign is determined from user actions within the social media movements in the social media campaign.
4. The method of claim 1, wherein identifying the plurality of markers includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users.
5. The method of claim 1, wherein the plurality of markers includes a day peakedness marker that indicates a percentage of the coordinated activity of the social media campaign that takes place on a day identified as most active of the social media campaign.
6. The method of claim 1, wherein the plurality of markers includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign.
7. The method of claim 6, wherein the plurality of markers includes a post regularity commitment signal that represents a deviation of commitment to participation by a user from natural human attention patterns.
8. The method of claim 1, wherein identifying the plurality of markers includes determining a semantic diversity score for the coordinated activity of the social media campaign by assigning messages in the campaign to topics and calculating a diversity of the topics on a topic distance scale that facilitates determining the semantic diversity score.
9. The method of claim 1, wherein identifying the plurality of markers includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions.
10. The method of claim 1, wherein identifying the plurality of markers includes computing semantic diversity over time to identify co-occurring topics in the social media campaign, wherein a relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, wherein a relatively large value of the semantic diversity score is configured to be indicative of spambots, and wherein a semantic diversity score having a value in-between is indicative of normal human activity.
11. A computer system for determining coordinated activity in social media movements on a social media channel, the system comprising:
a user interface that configures a social media campaign on one or more social media channels and that communicates via a network;
a computing device that identifies a plurality of markers of coordinated activity through analysis of campaign signals from the social media movements and that configures one or more data structures containing the plurality of markers for the social media campaign on one or more social media channels, wherein the plurality of markers includes a network dimension for representing how accounts are connected, a temporal dimension for representing patterns of messages over time, and a semantic dimension for representing a diversity of topics and meanings of the social media movements, wherein the analysis of the campaign signals indicative of the coordinated activity of the social media movements in the social media campaign includes determining users within the social media campaign, determining clusters of users that make up the social media campaign and determining relationships between the users participating in the social media movements, and determining propagation patterns across clusters of users of the social media campaign; and
a storage system that stores one or more of the data structures containing the plurality of markers for the social media campaign on one or more of the social media channels;
a processing system that executes computer-readable instructions that cause the processing system to:
receive a request from an external system about the coordinated activity of the campaign signals from the social media movements;
retrieve at least a portion of one or more data structures containing the plurality of markers for the social media campaign on one or more of the social media channels; and
transmit contents of at least a portion of the analysis to the user interface that displays at least a portion of the plurality of markers indicative of one of coordinated activity and normal human activity.
12. The system of claim 11, wherein identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is concentrated in the clusters of users.
13. The system of claim 11, wherein the coordinated activity of the social media campaign is determined from user actions within the social media movements in the social media campaign, wherein the coordinated activity includes a relatively large number of accounts on one or more of the social media channels controlled by a relatively small number of coordinated entities resulting in a relative lack of diversity of similar accounts on one or more social media channels controlled by uncoordinated users.
14. The system of claim 11, wherein identifying the plurality of markers through analysis of campaign signals includes evaluating a degree to which the coordinated activity of the social media campaign is distributed among the clusters of users.
15. The system of claim 11, wherein the plurality of markers includes a day peakedness marker that indicates a percentage of the coordinated activity of the social media campaign that takes place on a day identified as most active of the social media campaign.
16. The system of claim 11, wherein the plurality of indicators includes a commitment signal that is computed by averaging a number of subsequent participation actions for each of a plurality of participants in the coordinated activity of the social media campaign.
17. The system of claim 16, wherein the plurality of indicators includes a post regularity commitment signal that represents a deviation of commitment to participation by a user from natural human attention patterns.
18. The system of claim 11, wherein identifying the plurality of markers through analysis of campaign signals includes determining a semantic diversity score for the coordinated activity of the social media campaign, wherein determining a semantic diversity score includes assigning messages in the campaign to topics and calculating a diversity of the topics on a topic distance scale that facilitates determining the semantic diversity score.
19. The system of claim 11, wherein identifying the plurality of markers through analysis of campaign signals includes computing temporal alignment of campaign-related actions for users in the campaign by comparing temporal sequences of campaign-related actions.
20. The system of claim 11, wherein identifying the plurality of markers through analysis of campaign signals includes computing semantic diversity over time to identify co-occurring topics in the social media campaign, wherein a relatively small value of the semantic diversity score is configured to be indicative of fabricated campaigns, wherein a relatively large value of the semantic diversity score is configured to be indicative of spambots, and wherein a semantic diversity score having a value in-between is indicative of normal human activity.
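The day peakedness marker (claims 5 and 15) and the commitment signal (claims 6 and 16) are described only as a percentage and an average, so the following is a minimal, hypothetical sketch of how they might be computed, not the applicant's implementation. It assumes each campaign-related action is available as an (account id, calendar day) pair, and it reads "number of subsequent participation actions" as a participant's total action count minus the first action that made them a participant. The names `Action`, `day_peakedness`, and `commitment_signal` are invented for this illustration.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

# Hypothetical input shape: one (account_id, day) pair per campaign-related action.
Action = Tuple[str, date]


def day_peakedness(actions: Iterable[Action]) -> float:
    """Percentage of all campaign actions that fall on the single most active day."""
    per_day = Counter(day for _, day in actions)
    total = sum(per_day.values())
    if total == 0:
        return 0.0  # an empty campaign has no peak
    return 100.0 * max(per_day.values()) / total


def commitment_signal(actions: Iterable[Action]) -> float:
    """Average number of actions each participant takes after their first one.

    Assumption: "subsequent participation actions" is read as a participant's
    total action count minus the first action that made them a participant.
    """
    per_user = Counter(user for user, _ in actions)
    if not per_user:
        return 0.0  # no participants, nothing to average
    subsequent = [count - 1 for count in per_user.values()]
    return sum(subsequent) / len(subsequent)


if __name__ == "__main__":
    sample = [
        ("a", date(2017, 6, 1)),
        ("a", date(2017, 6, 1)),
        ("b", date(2017, 6, 1)),
        ("b", date(2017, 6, 2)),
        ("c", date(2017, 6, 3)),
    ]
    print(day_peakedness(sample))     # 60.0
    print(commitment_signal(sample))  # 0.6666666666666666
```

In the sample, three of the five actions fall on the busiest day, giving a peakedness of 60.0, and accounts a, b, and c take 1, 1, and 0 follow-up actions, giving a commitment signal of about 0.667. The claims do not state thresholds for either marker in this section, so the sketch reports raw values only.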
PCT/US2018/038639 2009-12-18 2018-06-20 Methods and systems for identifying markers of coordinated activity in social media movements WO2018237098A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CA3068264A CA3068264C (en) 2017-06-20 2018-06-20 Methods and systems for identifying markers of coordinated activity in social media movements
EP18819788.3A EP3642739A4 (en) 2017-06-20 2018-06-20 Methods and systems for identifying markers of coordinated activity in social media movements
US16/442,544 US11409825B2 (en) 2009-12-18 2019-06-16 Methods and systems for identifying markers of coordinated activity in social media movements
IL271650A IL271650A (en) 2017-06-20 2019-12-22 Methods and systems for identifying markers of coordinated activity in social media movements
US17/883,005 US20220391460A1 (en) 2009-12-18 2022-08-08 Methods and systems for identifying markers of coordinated activity in social media movements

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762522644P 2017-06-20 2017-06-20
US62/522,644 2017-06-20
US201762534172P 2017-07-18 2017-07-18
US62/534,172 2017-07-18

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/832,106 Continuation-In-Part US10324598B2 (en) 2009-12-18 2015-08-21 System and method for a search engine content filter

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/442,544 Continuation-In-Part US11409825B2 (en) 2009-12-18 2019-06-16 Methods and systems for identifying markers of coordinated activity in social media movements

Publications (1)

Publication Number Publication Date
WO2018237098A1 true WO2018237098A1 (en) 2018-12-27

Family

ID=64737330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/038639 WO2018237098A1 (en) 2009-12-18 2018-06-20 Methods and systems for identifying markers of coordinated activity in social media movements

Country Status (4)

Country Link
EP (1) EP3642739A4 (en)
CA (1) CA3068264C (en)
IL (1) IL271650A (en)
WO (1) WO2018237098A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461118A (en) * 2020-03-31 2020-07-28 中国移动通信集团黑龙江有限公司 Interest feature determination method, device, equipment and storage medium
CN112231562A (en) * 2020-10-15 2021-01-15 北京工商大学 Network rumor identification method and system
CN112272213A (en) * 2020-09-30 2021-01-26 上海连尚网络科技有限公司 Activity registration method and equipment
CN112650851A (en) * 2020-12-28 2021-04-13 西安交通大学 False news identification system and method based on multilevel interactive evidence generation
WO2021076287A1 (en) * 2019-10-15 2021-04-22 Microsoft Technology Licensing, Llc Semantic sweeping of metadata enriched service data
CN113010578A (en) * 2021-03-22 2021-06-22 华南理工大学 Community data analysis method and device, community intelligent interaction platform and storage medium
US20220156393A1 (en) * 2020-11-19 2022-05-19 Tetrate.io Repeatable NGAC Policy Class Structure
CN115766555A (en) * 2022-11-11 2023-03-07 中国航空工业集团公司西安飞行自动控制研究所 TTE switch network test architecture and method
WO2023129166A1 (en) * 2021-12-30 2023-07-06 Eidelman Vlad Generating and analyzing policymaker and organizational issue graphs
CN118411194A (en) * 2024-07-04 2024-07-30 山东物慧信息科技有限公司 Community integral system operation service data information integration system

Citations (3)

Publication number Priority date Publication date Assignee Title
US20140222577A1 (en) * 2006-03-17 2014-08-07 Raj Abhyanker Campaign in a geo-spatial environment
US20150106370A1 (en) * 2009-03-31 2015-04-16 Microsoft Corporation Automatic generation of markers based on social interaction
US20160350868A1 (en) * 2013-09-20 2016-12-01 Bank Of America Corporation Interactive map for grouped activities within a financial and social management system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20130232263A1 (en) * 2009-12-18 2013-09-05 Morningside Analytics System and method for classifying a contagious phenomenon propagating on a network
US10324598B2 (en) * 2009-12-18 2019-06-18 Graphika, Inc. System and method for a search engine content filter

Non-Patent Citations (2)

Title
ADRIEN GUILLE ET AL.: "Predicting the Temporal Dynamics of Information Diffusion in Social Networks", ARXIV, 1 March 2013 (2013-03-01), pages 1 - 10, XP055560066 *
EYTAN BAKSHY ET AL.: "Everyone's an Influencer: Quantifying Influence on Twitter", PROCEEDINGS OF THE FOURTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 9 February 2011 (2011-02-09), pages 65 - 74, XP055560073, Retrieved from the Internet <URL:http://snap.stanford.edu/class/cs224w-readings/bakshy11influencers.pdf> *

Cited By (16)

Publication number Priority date Publication date Assignee Title
WO2021076287A1 (en) * 2019-10-15 2021-04-22 Microsoft Technology Licensing, Llc Semantic sweeping of metadata enriched service data
US11587095B2 (en) 2019-10-15 2023-02-21 Microsoft Technology Licensing, Llc Semantic sweeping of metadata enriched service data
CN111461118B (en) * 2020-03-31 2023-11-24 中国移动通信集团黑龙江有限公司 Interest feature determining method, device, equipment and storage medium
CN111461118A (en) * 2020-03-31 2020-07-28 中国移动通信集团黑龙江有限公司 Interest feature determination method, device, equipment and storage medium
CN112272213B (en) * 2020-09-30 2023-09-19 上海连尚网络科技有限公司 Activity registration method and equipment
CN112272213A (en) * 2020-09-30 2021-01-26 上海连尚网络科技有限公司 Activity registration method and equipment
CN112231562B (en) * 2020-10-15 2023-07-14 北京工商大学 Network rumor recognition method and system
CN112231562A (en) * 2020-10-15 2021-01-15 北京工商大学 Network rumor identification method and system
US20220156393A1 (en) * 2020-11-19 2022-05-19 Tetrate.io Repeatable NGAC Policy Class Structure
CN112650851A (en) * 2020-12-28 2021-04-13 西安交通大学 False news identification system and method based on multilevel interactive evidence generation
CN112650851B (en) * 2020-12-28 2023-04-07 西安交通大学 False news identification system and method based on multilevel interactive evidence generation
CN113010578A (en) * 2021-03-22 2021-06-22 华南理工大学 Community data analysis method and device, community intelligent interaction platform and storage medium
CN113010578B (en) * 2021-03-22 2024-03-15 华南理工大学 Community data analysis method and device, community intelligent interaction platform and storage medium
WO2023129166A1 (en) * 2021-12-30 2023-07-06 Eidelman Vlad Generating and analyzing policymaker and organizational issue graphs
CN115766555A (en) * 2022-11-11 2023-03-07 中国航空工业集团公司西安飞行自动控制研究所 TTE switch network test architecture and method
CN118411194A (en) * 2024-07-04 2024-07-30 山东物慧信息科技有限公司 Community integral system operation service data information integration system

Also Published As

Publication number Publication date
EP3642739A4 (en) 2020-11-11
EP3642739A1 (en) 2020-04-29
IL271650A (en) 2020-02-27
CA3068264C (en) 2023-10-03
CA3068264A1 (en) 2018-12-27

Similar Documents

Publication Publication Date Title
US11409825B2 (en) Methods and systems for identifying markers of coordinated activity in social media movements
US10324598B2 (en) System and method for a search engine content filter
Stieglitz et al. Social media analytics–Challenges in topic discovery, data collection, and data preparation
CA3068264C (en) Methods and systems for identifying markers of coordinated activity in social media movements
US20130232263A1 (en) System and method for classifying a contagious phenomenon propagating on a network
US8635281B2 (en) System and method for attentive clustering and analytics
US10176609B2 (en) Analysis and visualization of interaction and influence in a network
Tinati et al. Identifying communicator roles in twitter
Gundecha et al. Mining social media: a brief introduction
Bayrakdar et al. Semantic analysis on social networks: A survey
Amato et al. Multimedia story creation on social networks
EP4396765A2 (en) Analyzing social media data to identify markers of coordinated movements, using stance detection, and using clustering techniques
Roberts et al. Visualising business data: A survey
Tang et al. Group profiling for understanding social structures
Abu-Salih et al. Social big data analytics
Sijtsma et al. Tweetviz: Visualizing tweets for business intelligence
Bartal et al. Role-aware information spread in online social networks
Chung et al. A computational framework for social-media-based business analytics and knowledge creation: empirical studies of CyTraSS
WO2014123929A1 (en) System and method for classifying a contagious phenomenon propagating on a network
Kostakos et al. Where am I? Location archetype keyword extraction from urban mobility patterns
Kumar et al. Web Mining and Web Usage Mining for Various Human-Driven Applications
Lin et al. Advances in social networks analysis and mining (asonam)
Thom Visual analytics of social media for situation awareness
Yang et al. Comparison and modelling of country-level micro-blog user behaviour and activity in cyber-physical-social systems using weibo and twitter data
WO2024123876A1 (en) Analyzing social media data to identify markers of coordinated movements, using stance detection, and using clustering techniques

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 18819788; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 3068264; Country of ref document: CA
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2018819788; Country of ref document: EP; Effective date: 20200120